diff --git a/.github/plugins/azure-skills/skills/microsoft-foundry-local/SKILL.md b/.github/plugins/azure-skills/skills/microsoft-foundry-local/SKILL.md
new file mode 100644
index 00000000..7818b9fc
--- /dev/null
+++ b/.github/plugins/azure-skills/skills/microsoft-foundry-local/SKILL.md
@@ -0,0 +1,89 @@
+---
+name: microsoft-foundry-local
+description: "Build AI applications with Foundry Local — a lightweight runtime that downloads, manages, and serves language models entirely on-device via an OpenAI-compatible API. No cloud, no API keys. Routes to specific skills for setup, chat, RAG, agents, whisper, custom models, and evaluation. WHEN: foundry local, on-device AI, local LLM, foundry local overview, what can foundry do, foundry local help, local inference, offline AI, private AI, no cloud AI, foundry capabilities."
+license: MIT
+metadata:
+  author: Microsoft
+  version: "1.0.0"
+---
+
+# Foundry Local — Skill Hub
+
+Foundry Local is an on-device AI runtime that serves language models via an OpenAI-compatible API at `http://localhost:<port>/v1`. No cloud services, API keys, or Azure subscriptions required.
+ +## Skill Routing + +| Need | Skill | Triggers | +|------|-------|----------| +| Install CLI, start service, manage models | **setup** | install, CLI, service start/stop, model download, port discovery | +| Chat completions (streaming, multi-turn) | **chat** | chat, streaming, conversation history, OpenAI SDK | +| Retrieval-Augmented Generation | **rag** | RAG, knowledge base, context injection, document grounding | +| Single & multi-agent workflows | **agents** | agent, multi-agent, orchestration, Agent Framework | +| Audio transcription with Whisper | **whisper** | whisper, transcribe, speech-to-text, audio | +| Compile custom Hugging Face models | **custom-models** | custom model, ONNX, Model Builder, Hugging Face, quantize | +| Test & evaluate LLM output quality | **evaluation** | evaluate, golden dataset, LLM judge, prompt comparison | + +## Quick Reference + +- **API key**: Always `"not-required"` +- **Base URL**: Dynamic port — use SDK to discover: `manager.get_endpoint()` +- **Supported languages**: Python, JavaScript (Node.js), C# (.NET 9) +- **Key SDKs**: `foundry-local-sdk` (Python/JS), `Microsoft.AI.Foundry.Local` (C#) + +## Common Starting Points + +### Install Foundry Local +```bash +# Windows +winget install Microsoft.FoundryLocal + +# macOS +brew install foundrylocal +``` + +### List available models +```bash +foundry model list +``` + +### Start a model +```bash +foundry model run phi-4-mini +``` + +### Connect with Python +```python +from foundry_local import FoundryLocalManager + +manager = FoundryLocalManager("phi-4-mini") +client = manager.get_openai_client() +``` + +### Connect with JavaScript +```javascript +import { FoundryLocalManager } from "foundry-local-sdk"; + +const manager = await FoundryLocalManager.start("phi-4-mini"); +const client = manager.getOpenAIClient(); +``` + +### Connect with C# +```csharp +using Microsoft.AI.Foundry.Local; +using OpenAI; + +var manager = await FoundryLocalManager.StartServiceAsync(); +var client = new 
OpenAIClient(new("not-required"), + new() { Endpoint = manager.Endpoint }); +``` + +## Rules + +1. Always use the SDK for endpoint discovery — never hard-code ports. +2. Set `api_key` to `"not-required"` — Foundry Local doesn't use API keys. +3. Route to the specific sub-skill for detailed patterns and troubleshooting. +4. All code runs entirely on-device — no network calls to cloud APIs. + +## References + +- [Foundry Local](https://learn.microsoft.com/en-us/azure/foundry-local/) diff --git a/.github/plugins/azure-skills/skills/microsoft-foundry-local/agents/SKILL.md b/.github/plugins/azure-skills/skills/microsoft-foundry-local/agents/SKILL.md new file mode 100644 index 00000000..adc8e1ad --- /dev/null +++ b/.github/plugins/azure-skills/skills/microsoft-foundry-local/agents/SKILL.md @@ -0,0 +1,283 @@ +--- +name: agents +description: "Build AI agents and multi-agent workflows with Foundry Local. Covers single agents with personas, multi-agent sequential pipelines, feedback loops, the Microsoft Agent Framework, and conversation history management. WHEN: foundry agent, AI agent local, multi-agent, agent orchestration, feedback loop, agent persona, system instructions, sequential pipeline, researcher writer editor, on-device agent, agent framework, FoundryLocalClient, AsAIAgent." +license: MIT +metadata: + author: Microsoft + version: "1.0.0" +--- + +# Foundry Local Agents & Multi-Agent Workflows + +This skill provides patterns for building single agents and multi-agent workflows that run entirely on-device with Foundry Local. + +## Triggers + +Activate this skill when the user wants to: +- Create an AI agent with custom instructions and persona +- Build multi-agent pipelines (Researcher → Writer → Editor) +- Implement feedback loops between agents +- Use the Microsoft Agent Framework with Foundry Local +- Manage conversation history across agent interactions + +## Rules + +1. 
**Agents are stateless by default.** Multi-turn agents must explicitly maintain a `history` list. +2. **Use the Agent Framework when available** — it simplifies agent creation. Python uses `agent_framework_foundry_local`, C# uses `Microsoft.Agents.AI.OpenAI`. +3. **JavaScript has no high-level agent framework** — implement agents manually with OpenAI SDK + history management. +4. **Feedback loops need a retry limit** — prevent infinite loops with a max iteration count (typically 2-3). +5. For service setup, refer to **setup** skill. + +--- + +## Single Agent — Using the Agent Framework + +### Python (Recommended — Agent Framework) + +```python +import asyncio +from agent_framework_foundry_local import FoundryLocalClient + +async def main(): + alias = "phi-4-mini" + + # FoundryLocalClient handles service start, model download, and loading + client = FoundryLocalClient(model_id=alias) + + # Create an agent with system instructions + agent = client.as_agent( + name="Joker", + instructions="You are good at telling jokes.", + ) + + # Non-streaming + result = await agent.run("Tell me a joke about a pirate.") + print(result) + + # Streaming + async for chunk in agent.run("Tell me a joke about a programmer.", stream=True): + if chunk.text: + print(chunk.text, end="", flush=True) + +asyncio.run(main()) +``` + +### C# (Recommended — Agent Framework) + +```csharp +using Microsoft.Agents.AI; + +// After setting up manager, model, and OpenAI client (see setup)... 
+AIAgent joker = client + .GetChatClient(model.Id) + .AsAIAgent( + instructions: "You are good at telling jokes.", + name: "Joker" + ); + +// Non-streaming +var response = await joker.RunAsync("Tell me a joke about a pirate."); +Console.WriteLine(response); + +// Streaming +await foreach (var chunk in joker.RunStreamingAsync("Tell me another joke.")) +{ + Console.Write(chunk.Text); +} +``` + +### JavaScript (Manual — No Agent Framework) + +```javascript +class ChatAgent { + constructor(client, modelId, name, instructions) { + this.client = client; + this.modelId = modelId; + this.name = name; + this.history = [{ role: "system", content: instructions }]; + } + + async run(userMessage) { + this.history.push({ role: "user", content: userMessage }); + + const response = await this.client.chat.completions.create({ + model: this.modelId, + messages: this.history, + temperature: 0.7, + max_tokens: 1024, + }); + + const reply = response.choices[0].message.content; + this.history.push({ role: "assistant", content: reply }); + return reply; + } +} + +// Usage +const joker = new ChatAgent(client, modelInfo.id, "Joker", "You are good at telling jokes."); +const joke = await joker.run("Tell me a joke about a pirate."); +``` + +--- + +## Multi-Agent Pipeline — Sequential Workflow + +The canonical multi-agent pattern is a sequential pipeline where each agent's output feeds the next: + +``` +Topic → [Researcher] → Research Notes → [Writer] → Draft → [Editor] → Verdict +``` + +### Python + +```python +import asyncio +from agent_framework_foundry_local import FoundryLocalClient + +async def main(): + client = FoundryLocalClient(model_id="phi-4-mini") + + researcher = client.as_agent( + name="Researcher", + instructions=( + "You are a research assistant. When given a topic, provide a concise " + "collection of key facts as bullet points." + ), + ) + + writer = client.as_agent( + name="Writer", + instructions=( + "You are a skilled blog writer. 
Using the research notes provided, " + "write a short, engaging blog post (3-4 paragraphs)." + ), + ) + + editor = client.as_agent( + name="Editor", + instructions=( + "You are a senior editor. Review the blog post for clarity, grammar, " + "and factual consistency. Provide a verdict: ACCEPT or REVISE." + ), + ) + + topic = "The history of renewable energy" + + # Sequential pipeline + research = await researcher.run(f"Research this topic:\n{topic}") + draft = await writer.run(f"Write a blog post from these notes:\n\n{research}") + verdict = await editor.run( + f"Review this article.\n\nResearch notes:\n{research}\n\nArticle:\n{draft}" + ) + +asyncio.run(main()) +``` + +### C# + +```csharp +AIAgent researcher = chatClient.AsAIAgent( + name: "Researcher", + instructions: "You are a research assistant. Provide key facts as bullet points."); + +AIAgent writer = chatClient.AsAIAgent( + name: "Writer", + instructions: "You are a skilled blog writer. Write a short blog post."); + +AIAgent editor = chatClient.AsAIAgent( + name: "Editor", + instructions: "Review the blog post. 
Provide a verdict: ACCEPT or REVISE."); + +var topic = "The history of renewable energy"; + +var research = await researcher.RunAsync($"Research this topic:\n{topic}"); +var draft = await writer.RunAsync($"Write a blog post from these notes:\n\n{research}"); +var verdict = await editor.RunAsync( + $"Review this article.\n\nResearch notes:\n{research}\n\nArticle:\n{draft}"); +``` + +--- + +## Feedback Loop Pattern + +Add a feedback loop where the Editor can reject the draft and trigger a rewrite: + +```python +MAX_RETRIES = 2 + +for attempt in range(MAX_RETRIES + 1): + draft = await writer.run(f"Write a blog post from these notes:\n\n{research}") + + verdict = await editor.run( + f"Review this article.\n\nResearch:\n{research}\n\nArticle:\n{draft}" + ) + + if "ACCEPT" in verdict.upper(): + print("Article accepted!") + break + elif attempt < MAX_RETRIES: + print(f"Revising (attempt {attempt + 2})...") + research = await researcher.run( + f"The editor wants revisions:\n{verdict}\n\nOriginal topic:\n{topic}" + ) + else: + print("Max retries reached — publishing best effort.") +``` + +--- + +## Agent Design Best Practices + +| Practice | Rationale | +|----------|-----------| +| Give each agent a specific, focused persona | Broad instructions produce vague outputs | +| Include output format in instructions | "Organize as bullet points" or "Respond with ACCEPT or REVISE" | +| Pass context from previous agents explicitly | Agents don't share memory implicitly | +| Limit context passed between agents | Don't forward entire conversations — summarise | +| Set retry limits on feedback loops | Prevent infinite loops (2-3 retries is typical) | + +--- + +## Production Pattern — Shared Configuration + +For production apps (like the Zava Creative Writer), extract common configuration: + +### Python (FastAPI service) +```python +# foundry_config.py — shared across all agents +from foundry_local import FoundryLocalManager + +manager = FoundryLocalManager() +manager.start_service() + 
+ALIAS = "phi-4-mini" +manager.load_model(ALIAS) + +MODEL_ID = manager.get_model_info(ALIAS).id +ENDPOINT = manager.endpoint +API_KEY = manager.api_key +``` + +```python +# Each agent module imports the shared config +from foundry_config import MODEL_ID, ENDPOINT, API_KEY +``` + +--- + +## Key Packages + +| Language | Package | Purpose | +|----------|---------|---------| +| Python | `agent-framework-foundry-local` | High-level agent abstraction with streaming | +| C# | `Microsoft.Agents.AI.OpenAI` | `AsAIAgent()` extension method | +| JavaScript | — | No framework; use OpenAI SDK directly | + +--- + +## Cross-References + +- For service setup, see **setup** +- For basic chat patterns, see **chat** +- For grounding agents with local data, see **rag** +- For testing agent quality, see **evaluation** diff --git a/.github/plugins/azure-skills/skills/microsoft-foundry-local/chat/SKILL.md b/.github/plugins/azure-skills/skills/microsoft-foundry-local/chat/SKILL.md new file mode 100644 index 00000000..165ffa26 --- /dev/null +++ b/.github/plugins/azure-skills/skills/microsoft-foundry-local/chat/SKILL.md @@ -0,0 +1,229 @@ +--- +name: chat +description: "Chat completion patterns with Foundry Local's OpenAI-compatible API. Covers streaming, multi-turn conversations, temperature tuning, and conversation history management. WHEN: foundry local chat, local LLM chat, streaming response, chat completion, conversation history, multi-turn, OpenAI SDK with foundry, api_key not-required, stream tokens, on-device chat, local inference, chat parameters." +license: MIT +metadata: + author: Microsoft + version: "1.0.0" +--- + +# Foundry Local Chat Completions + +This skill provides patterns for chat completions using Foundry Local's OpenAI-compatible API across Python, JavaScript, and C#. 
+ +## Triggers + +Activate this skill when the user wants to: +- Create a chat completion with a local model +- Stream responses token by token +- Build multi-turn conversations with history +- Configure temperature, max_tokens, or other parameters +- Use the OpenAI SDK with Foundry Local + +## Rules + +1. **Always use `manager.endpoint`** for the base URL — never hardcode a port. +2. **API key is `"not-required"`** — Foundry Local does not authenticate. +3. **Use `model_info.id`** (the full hardware-specific ID) in API calls, not the alias. +4. **Streaming syntax differs across languages** — use the correct pattern for each. +5. For service setup, refer to **setup** skill. + +--- + +## Single-Turn Chat (Non-Streaming) + +### Python +```python +response = client.chat.completions.create( + model=model_id, + messages=[{"role": "user", "content": "What is the golden ratio?"}], + temperature=0.7, + max_tokens=512, +) +print(response.choices[0].message.content) +``` + +### JavaScript +```javascript +const response = await client.chat.completions.create({ + model: modelInfo.id, + messages: [{ role: "user", content: "What is the golden ratio?" }], + temperature: 0.7, + max_tokens: 512, +}); +console.log(response.choices[0].message.content); +``` + +### C# +```csharp +var chatClient = client.GetChatClient(model.Id); +var result = await chatClient.CompleteChatAsync("What is the golden ratio?"); +Console.WriteLine(result.Value.Content[0].Text); +``` + +--- + +## Streaming Chat + +Streaming returns tokens as they are generated, providing a responsive user experience. 
+ +### Python +```python +stream = client.chat.completions.create( + model=model_id, + messages=[{"role": "user", "content": "What is the golden ratio?"}], + stream=True, +) + +for chunk in stream: + if chunk.choices[0].delta.content is not None: + print(chunk.choices[0].delta.content, end="", flush=True) +print() +``` + +### JavaScript +```javascript +const stream = await client.chat.completions.create({ + model: modelInfo.id, + messages: [{ role: "user", content: "What is the golden ratio?" }], + stream: true, +}); + +for await (const chunk of stream) { + if (chunk.choices[0]?.delta?.content) { + process.stdout.write(chunk.choices[0].delta.content); + } +} +console.log(); +``` + +### C# +```csharp +var chatClient = client.GetChatClient(model.Id); +var updates = chatClient.CompleteChatStreaming("What is the golden ratio?"); + +foreach (var update in updates) +{ + if (update.ContentUpdate.Count > 0) + { + Console.Write(update.ContentUpdate[0].Text); + } +} +Console.WriteLine(); +``` + +--- + +## Multi-Turn Conversations + +Foundry Local is stateless — you must maintain conversation history yourself by appending each user message and assistant response. + +### Python +```python +history = [ + {"role": "system", "content": "You are a helpful assistant."}, +] + +def chat(user_message): + history.append({"role": "user", "content": user_message}) + + response = client.chat.completions.create( + model=model_id, + messages=history, + temperature=0.7, + max_tokens=512, + ) + + assistant_reply = response.choices[0].message.content + history.append({"role": "assistant", "content": assistant_reply}) + return assistant_reply +``` + +### JavaScript +```javascript +const history = [ + { role: "system", content: "You are a helpful assistant." 
},
+];
+
+async function chat(userMessage) {
+  history.push({ role: "user", content: userMessage });
+
+  const response = await client.chat.completions.create({
+    model: modelInfo.id,
+    messages: history,
+    temperature: 0.7,
+    max_tokens: 512,
+  });
+
+  const reply = response.choices[0].message.content;
+  history.push({ role: "assistant", content: reply });
+  return reply;
+}
+```
+
+### C#
+```csharp
+var messages = new List<ChatMessage>
+{
+    new SystemChatMessage("You are a helpful assistant."),
+};
+
+async Task<string> ChatAsync(string userMessage)
+{
+    messages.Add(new UserChatMessage(userMessage));
+
+    var result = await chatClient.CompleteChatAsync(messages);
+    var reply = result.Value.Content[0].Text;
+
+    messages.Add(new AssistantChatMessage(reply));
+    return reply;
+}
+```
+
+---
+
+## Common Pitfalls
+
+| Mistake | Impact | Fix |
+|---------|--------|-----|
+| Forgetting to append assistant message to history | Model loses context each turn | Always `history.append({"role": "assistant", ...})` |
+| Using alias instead of full model ID in API calls | May fail or select wrong variant | Use `manager.get_model_info(alias).id` |
+| Hardcoding `http://localhost:5000/v1` | Fails when port changes | Use `manager.endpoint` |
+| Setting `stream=True` but reading `.message.content` | Content is in `.delta.content` for streams | Check `chunk.choices[0].delta.content` |
+
+---
+
+## REST API (cURL)
+
+You can also call the API directly. 
Get the port from `foundry service status`:
+
+```bash
+curl http://localhost:<port>/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "<model-id>",
+    "messages": [{"role": "user", "content": "Hello!"}],
+    "max_tokens": 100,
+    "temperature": 0.7
+  }'
+```
+
+---
+
+## Parameters Reference
+
+| Parameter | Default | Notes |
+|-----------|---------|-------|
+| `temperature` | 1.0 | Lower = more deterministic, higher = more creative |
+| `max_tokens` | model-specific | Maximum tokens to generate |
+| `top_p` | 1.0 | Nucleus sampling threshold |
+| `stream` | `false` | Enable token-by-token streaming |
+| `stop` | none | Stop sequences |
+
+---
+
+## Cross-References
+
+- For service setup and model management, see **setup**
+- For grounding chat with local data, see **rag**
+- For agents with system instructions, see **agents**
diff --git a/.github/plugins/azure-skills/skills/microsoft-foundry-local/custom-models/SKILL.md b/.github/plugins/azure-skills/skills/microsoft-foundry-local/custom-models/SKILL.md
new file mode 100644
index 00000000..c1cf8ce5
--- /dev/null
+++ b/.github/plugins/azure-skills/skills/microsoft-foundry-local/custom-models/SKILL.md
@@ -0,0 +1,270 @@
+---
+name: custom-models
+description: "Compile and register custom Hugging Face models for Foundry Local. Covers ONNX Runtime GenAI Model Builder, quantisation, chat template generation, cache registration, and inference_model.json configuration. WHEN: custom model foundry, hugging face model, ONNX compile, model builder, quantize model, int4 quantisation, register custom model, onnxruntime-genai, bring your own model, compile model, ONNX conversion, custom ONNX model, foundry cache register."
+license: MIT
+metadata:
+  author: Microsoft
+  version: "1.0.0"
+---
+
+# Foundry Local Custom Models
+
+This skill provides the complete workflow for compiling Hugging Face models into the ONNX format that Foundry Local requires, configuring chat templates, and registering models in the local cache.
+
+## Triggers
+
+Activate this skill when the user wants to:
+- Compile a Hugging Face model for Foundry Local
+- Use the ONNX Runtime GenAI Model Builder
+- Quantise a model (int4, int8, fp16, fp32)
+- Create an `inference_model.json` chat template configuration
+- Register a custom model in the Foundry Local cache
+- Understand Model Builder vs Microsoft Olive trade-offs
+
+## Rules
+
+1. **Use ONNX Runtime GenAI Model Builder** — it produces the exact output format Foundry Local expects in a single command.
+2. **Requires Python 3.10+** and a dedicated virtual environment (PyTorch, Transformers are large).
+3. **The `inference_model.json` file is required** — it tells Foundry Local how to format prompts.
+4. **The `Name` field in `inference_model.json` becomes the model alias** used in all API calls.
+5. For service setup, refer to **setup** skill.
+
+---
+
+## End-to-End Workflow
+
+```
+1. Install         pip install onnxruntime-genai
+2. Compile         python -m onnxruntime_genai.models.builder -m <model> -o <output-dir> -p int4 -e cpu
+3. Chat Template   python generate_chat_template.py   (creates inference_model.json)
+4. Register        foundry cache cd <output-dir>
+5. 
Run foundry model run +``` + +--- + +## Step 1: Install the Model Builder + +```bash +pip install onnxruntime-genai +``` + +Verify: +```bash +python -m onnxruntime_genai.models.builder --help +``` + +--- + +## Step 2: Compile a Model + +### CPU (int4 quantisation) + +```bash +python -m onnxruntime_genai.models.builder \ + -m Qwen/Qwen3-0.6B \ + -o models/qwen3 \ + -p int4 \ + -e cpu \ + --extra_options hf_token=false +``` + +### NVIDIA GPU (fp16) + +```bash +python -m onnxruntime_genai.models.builder \ + -m Qwen/Qwen3-0.6B \ + -o models/qwen3-gpu \ + -p fp16 \ + -e cuda \ + --extra_options hf_token=false +``` + +### Parameters + +| Parameter | Purpose | Common Values | +|-----------|---------|---------------| +| `-m` | Hugging Face model ID or local path | `Qwen/Qwen3-0.6B`, `microsoft/Phi-3.5-mini-instruct` | +| `-o` | Output directory | `models/qwen3` | +| `-p` | Quantisation precision | `int4`, `int8`, `fp16`, `fp32` | +| `-e` | Execution provider (target hardware) | `cpu`, `cuda`, `dml`, `NvTensorRtRtx`, `webgpu` | +| `--extra_options` | Additional options | `hf_token=false` (skip auth for public models) | + +### Precision Trade-offs + +| Precision | Size | Speed | Quality | Best For | +|-----------|------|-------|---------|----------| +| `int4` | Smallest | Fastest | Moderate loss | CPU development, low-RAM devices | +| `int8` | Small | Fast | Slight loss | Balanced trade-off | +| `fp16` | Large | Fast (GPU) | Very good | GPU inference | +| `fp32` | Largest | Slowest | Highest | Maximum quality | + +### Hardware Targets + +| Hardware | `-e` value | Recommended `-p` | +|----------|-----------|-------------------| +| CPU | `cpu` | `int4` | +| NVIDIA GPU | `cuda` | `fp16` or `int4` | +| Windows GPU (DirectML) | `dml` | `fp16` or `int4` | +| NVIDIA TensorRT RTX | `NvTensorRtRtx` | `fp16` | +| WebGPU | `webgpu` | `int4` | + +--- + +## Step 3: Create inference_model.json + +The `inference_model.json` tells Foundry Local how to format prompts. 
Generate it from the model's tokeniser: + +```python +"""Generate an inference_model.json chat template for Foundry Local.""" + +import json +from transformers import AutoTokenizer + +MODEL_PATH = "models/qwen3" + +tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH) + +messages = [ + {"role": "system", "content": "{Content}"}, + {"role": "user", "content": "{Content}"}, +] + +prompt_template = tokenizer.apply_chat_template( + messages, + tokenize=False, + add_generation_prompt=True, + enable_thinking=False, +) + +inference_model = { + "Name": "qwen3-0.6b", # This becomes the model alias + "PromptTemplate": { + "assistant": "{Content}", + "prompt": prompt_template, + }, +} + +output_path = f"{MODEL_PATH}/inference_model.json" +with open(output_path, "w", encoding="utf-8") as f: + json.dump(inference_model, f, indent=2, ensure_ascii=False) + +print(f"Chat template written to {output_path}") +``` + +> **Important:** The `"Name"` field becomes the model alias used in all subsequent API calls and CLI commands. 
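Before registering the model, it is worth sanity-checking the generated file, since a malformed `inference_model.json` is a common reason a custom model fails to load. A minimal validator sketch — the `validate_inference_model` helper and its exact checks are illustrative, not part of Foundry Local:

```python
import json
from pathlib import Path

REQUIRED_PROMPT_KEYS = {"assistant", "prompt"}

def validate_inference_model(path):
    """Return a list of problems found in an inference_model.json file."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    errors = []

    # The Name field becomes the model alias — it must be present and non-empty.
    if not data.get("Name"):
        errors.append("missing 'Name' (becomes the model alias)")

    # PromptTemplate must define both the assistant and prompt entries.
    template = data.get("PromptTemplate", {})
    missing = REQUIRED_PROMPT_KEYS - template.keys()
    if missing:
        errors.append(f"PromptTemplate missing keys: {sorted(missing)}")

    # The prompt template needs the {Content} placeholder to inject messages.
    if "{Content}" not in template.get("prompt", ""):
        errors.append("prompt template lacks the {Content} placeholder")

    return errors
```

Run it against `models/qwen3/inference_model.json` before `foundry cache cd`; an empty list means the basic structure is in place.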
+
+---
+
+## Step 4: Register in Foundry Local Cache
+
+```bash
+foundry cache cd models/qwen3
+```
+
+Verify:
+```bash
+foundry cache ls
+```
+
+---
+
+## Step 5: Run the Model
+
+### CLI
+```bash
+foundry model run qwen3-0.6b --verbose
+```
+
+### REST API
+```bash
+curl -X POST http://localhost:<port>/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{"model": "qwen3-0.6b", "messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 100}'
+```
+
+### OpenAI SDK (Python)
+```python
+from openai import OpenAI
+
+client = OpenAI(base_url="http://localhost:<port>/v1", api_key="not-required")
+
+response = client.chat.completions.create(
+    model="qwen3-0.6b",
+    messages=[{"role": "user", "content": "What is the golden ratio?"}],
+    max_tokens=200,
+)
+print(response.choices[0].message.content)
+```
+
+### Foundry Local SDK (Python)
+```python
+from foundry_local import FoundryLocalManager
+import openai
+
+manager = FoundryLocalManager()
+manager.start_service()
+manager.load_model("qwen3-0.6b")
+
+client = openai.OpenAI(base_url=manager.endpoint, api_key=manager.api_key)
+response = client.chat.completions.create(
+    model="qwen3-0.6b",
+    messages=[{"role": "user", "content": "Hello!"}],
+)
+print(response.choices[0].message.content)
+```
+
+---
+
+## Expected Output Directory
+
+After compilation and chat template generation:
+
+```
+models/qwen3/
+  model.onnx
+  model.onnx.data
+  genai_config.json         (auto-generated by model builder)
+  chat_template.jinja       (auto-generated by model builder)
+  inference_model.json      (you create this)
+  tokenizer.json
+  tokenizer_config.json
+  vocab.json
+  merges.txt
+  special_tokens_map.json
+  added_tokens.json
+```
+
+---
+
+## Model Builder vs Microsoft Olive
+
+| | **Model Builder** | **Olive** |
+|---|---|---|
+| **Package** | `onnxruntime-genai` | `olive-ai` |
+| **Ease of use** | Single command | Multi-step workflow with YAML config |
+| **Best for** | Quick compilation for Foundry Local | Production pipelines with 
fine-grained control |
+| **Foundry Local compat** | Direct — output is immediately compatible | Requires `--use_ort_genai` flag |
+| **Hardware scope** | CPU, CUDA, DirectML, TensorRT, WebGPU | All of the above + Qualcomm QNN |
+
+> **Recommendation:** Use the Model Builder for compiling individual models for Foundry Local. Use Olive when you need advanced optimisation (accuracy-aware quantisation, graph surgery, multi-pass tuning).
+
+---
+
+## Troubleshooting
+
+| Issue | Fix |
+|-------|-----|
+| Model fails to load after registration | Verify `inference_model.json` exists and is valid JSON |
+| `<think>` tags in output | Normal for reasoning models (Qwen3). Adjust the prompt template to suppress them |
+| `hf_token` error | Add `--extra_options hf_token=false` for public models |
+| Out of memory during compilation | Use a smaller model or `int4` precision |
+| Compilation very slow | Expected — 5-15 min for small models, longer for large ones |
+
+---
+
+## Cross-References
+
+- For service setup, see **setup**
+- For chat completions with compiled models, see **chat**
+- For testing model quality, see **evaluation**
diff --git a/.github/plugins/azure-skills/skills/microsoft-foundry-local/evaluation/SKILL.md b/.github/plugins/azure-skills/skills/microsoft-foundry-local/evaluation/SKILL.md
new file mode 100644
index 00000000..df08830d
--- /dev/null
+++ b/.github/plugins/azure-skills/skills/microsoft-foundry-local/evaluation/SKILL.md
@@ -0,0 +1,345 @@
+---
+name: evaluation
+description: "Test and evaluate LLM output quality with Foundry Local. Covers golden datasets, rule-based scoring, LLM-as-judge patterns, side-by-side prompt comparison, and handling service crashes under sustained load. WHEN: evaluate LLM, golden dataset, LLM as judge, prompt comparison, test AI output, eval framework, benchmark local model, quality scoring, evaluate agent, prompt engineering, A/B test prompts, regression testing."
+license: MIT +metadata: + author: Microsoft + version: "1.0.0" +--- + +# Foundry Local Evaluation Framework + +This skill provides patterns for systematically testing and evaluating LLM output quality using Foundry Local — entirely on-device. + +## Triggers + +Activate this skill when the user wants to: +- Create a golden dataset for testing AI responses +- Implement rule-based checks (keyword coverage, length, forbidden terms) +- Use LLM-as-judge scoring with a rubric +- Compare prompt variants side by side +- Build a regression testing framework for prompts +- Systematically test agent quality + +## Rules + +1. **Use golden datasets** — define expected outputs before testing, not after. +2. **Combine rule-based and LLM-based scoring** — rules catch obvious issues, LLM judges catch nuance. +3. **Handle HTTP 500 under sustained load** — the service may crash after ~13-15 completions; add try/catch with fallback. +4. **Lower temperature for evaluation** — use 0.1 for LLM-as-judge to get consistent scoring. +5. For service setup, refer to **setup** skill. 
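When ranking prompt variants, it often helps to collapse the two signals from rule 2 into one number. One way to blend them — the weights here are illustrative, not a prescribed formula — is to normalise the 1-5 judge score onto 0-1 and take a weighted average:

```python
def combine_scores(rule_score, judge_score, rule_weight=0.4, judge_weight=0.6):
    """Blend a 0-1 rule score and a 1-5 judge score into one 0-1 metric.

    The default weights are illustrative; tune them for your own evaluation.
    """
    judge_normalised = (judge_score - 1) / 4  # map the 1-5 rubric onto 0-1
    return round(rule_weight * rule_score + judge_weight * judge_normalised, 2)
```

For example, `combine_scores(0.5, 3)` gives `0.5` with the default weights — a middling result on both axes stays middling overall.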
+ +--- + +## Architecture + +``` +Golden Dataset Prompt Variants Scoring +┌──────────────┐ ┌───────────────┐ ┌──────────────┐ +│ Test cases │──────►│ Agent run │──────►│ Rule-based │ +│ with expected│ │ with variant │ │ + LLM │ +│ keywords │ │ system prompt │ │ judge │ +└──────────────┘ └───────────────┘ └──────────────┘ + │ + ┌──────▼──────┐ + │ Comparison │ + │ Report │ + └─────────────┘ +``` + +--- + +## Step 1: Define a Golden Dataset + +Each test case includes an input, expected keywords, and a category: + +### Python +```python +GOLDEN_DATASET = [ + { + "input": "What tools do I need to build a wooden deck?", + "expected": ["saw", "drill", "screws", "level", "tape measure"], + "category": "product-recommendation", + }, + { + "input": "How do I fix a leaky kitchen faucet?", + "expected": ["wrench", "washer", "plumber", "valve", "seal"], + "category": "repair-guidance", + }, + { + "input": "How do I safely use a circular saw?", + "expected": ["safety", "glasses", "guard", "clamp", "blade"], + "category": "safety-advice", + }, +] +``` + +### JavaScript +```javascript +const GOLDEN_DATASET = [ + { + input: "What tools do I need to build a wooden deck?", + expected: ["saw", "drill", "screws", "level", "tape measure"], + category: "product-recommendation", + }, + { + input: "How do I fix a leaky kitchen faucet?", + expected: ["wrench", "washer", "plumber", "valve", "seal"], + category: "repair-guidance", + }, +]; +``` + +--- + +## Step 2: Define Prompt Variants + +Compare different system prompts to find the most effective one: + +```python +PROMPT_VARIANTS = { + "baseline": ( + "You are a helpful assistant. Answer the user's question clearly." + ), + "specialised": ( + "You are a DIY expert. Recommend specific tools and materials, " + "provide step-by-step guidance, and include safety tips." 
+ ), +} +``` + +--- + +## Step 3: Rule-Based Scoring + +Deterministic checks that don't require an LLM call: + +### Python +```python +FORBIDDEN_TERMS = ["home depot", "lowes", "amazon"] + +def score_rules(response, expected_keywords): + words = response.lower().split() + word_count = len(words) + response_lower = response.lower() + + # Length check: 50-500 words + length_score = 1.0 if 50 <= word_count <= 500 else 0.0 + + # Keyword coverage + found = [kw for kw in expected_keywords if kw.lower() in response_lower] + keyword_score = len(found) / len(expected_keywords) if expected_keywords else 1.0 + + # Forbidden terms + forbidden_found = [t for t in FORBIDDEN_TERMS if t in response_lower] + forbidden_score = 0.0 if forbidden_found else 1.0 + + combined = (length_score + keyword_score + forbidden_score) / 3.0 + + return { + "length_score": length_score, + "keyword_score": keyword_score, + "keywords_found": found, + "keywords_missing": [kw for kw in expected_keywords if kw.lower() not in response_lower], + "forbidden_score": forbidden_score, + "combined": round(combined, 2), + } +``` + +### JavaScript +```javascript +const FORBIDDEN_TERMS = ["home depot", "lowes", "amazon"]; + +function scoreRules(response, expectedKeywords) { + const words = response.toLowerCase().split(/\s+/); + const responseLower = response.toLowerCase(); + + const lengthScore = words.length >= 50 && words.length <= 500 ? 1.0 : 0.0; + + const found = expectedKeywords.filter(kw => responseLower.includes(kw.toLowerCase())); + const keywordScore = expectedKeywords.length > 0 + ? found.length / expectedKeywords.length + : 1.0; + + const forbiddenFound = FORBIDDEN_TERMS.filter(t => responseLower.includes(t)); + const forbiddenScore = forbiddenFound.length === 0 ? 
1.0 : 0.0;
+
+  return {
+    lengthScore,
+    keywordScore,
+    keywordsFound: found,
+    forbiddenScore,
+    combined: Math.round(((lengthScore + keywordScore + forbiddenScore) / 3.0) * 100) / 100,
+  };
+}
+```
+
+---
+
+## Step 4: LLM-as-Judge Scoring
+
+Use the same local model to grade response quality:
+
+### Python
+```python
+import json
+import re
+
+JUDGE_SYSTEM_PROMPT = """\
+You are an impartial quality evaluator. Rate the following response on a scale of 1-5.
+
+Rubric:
+- 1: Completely wrong or irrelevant
+- 2: Partially correct but missing key information
+- 3: Adequate but could be improved significantly
+- 4: Good response with only minor issues
+- 5: Excellent, comprehensive, well-structured response
+
+Respond ONLY with valid JSON (no code fences):
+{"score": <1-5>, "reasoning": "<one sentence>"}
+"""
+
+def llm_judge(client, model_id, question, response):
+    raw = ""  # Defined before the try block so the fallback can inspect a partial reply
+    try:
+        result = client.chat.completions.create(
+            model=model_id,
+            messages=[
+                {"role": "system", "content": JUDGE_SYSTEM_PROMPT},
+                {
+                    "role": "user",
+                    "content": f"Question: {question}\n\nResponse to evaluate:\n{response}",
+                },
+            ],
+            temperature=0.1,  # Low temperature for consistent scoring
+            max_tokens=256,
+        )
+
+        raw = result.choices[0].message.content.strip()
+        raw = raw.removeprefix("```json").removeprefix("```").removesuffix("```").strip()
+
+        parsed = json.loads(raw)
+        score = max(1, min(5, int(parsed.get("score", 3))))
+        return {"score": score, "reasoning": parsed.get("reasoning", "")}
+    except Exception:
+        # Fallback: extract a 1-5 digit from the raw reply, or default to 3
+        numbers = re.findall(r"\b([1-5])\b", raw)
+        return {"score": int(numbers[0]) if numbers else 3, "reasoning": "Fallback score"}
+```
+
+---
+
+## Step 5: Run Evaluation Pipeline
+
+### Python
+```python
+def run_agent(client, model_id, system_prompt, user_input):
+    result = client.chat.completions.create(
+        model=model_id,
+        messages=[
+            {"role": "system", "content": system_prompt},
+            {"role": "user", "content": user_input},
+        
], + temperature=0.7, + max_tokens=512, + ) + return result.choices[0].message.content.strip() + + +# Run evaluation for each prompt variant +results = {} + +for variant_name, system_prompt in PROMPT_VARIANTS.items(): + variant_results = [] + + for test_case in GOLDEN_DATASET: + # Get agent response + response = run_agent(client, model_id, system_prompt, test_case["input"]) + + # Score with rules + rule_scores = score_rules(response, test_case["expected"]) + + # Score with LLM judge + judge_result = llm_judge(client, model_id, test_case["input"], response) + + variant_results.append({ + "input": test_case["input"], + "category": test_case["category"], + "rule_score": rule_scores["combined"], + "judge_score": judge_result["score"], + }) + + results[variant_name] = variant_results + +# Compare variants +for name, scores in results.items(): + avg_rule = sum(r["rule_score"] for r in scores) / len(scores) + avg_judge = sum(r["judge_score"] for r in scores) / len(scores) + print(f"{name}: Rule={avg_rule:.2f}, Judge={avg_judge:.1f}/5") +``` + +--- + +## Handling Service Crashes Under Sustained Load + +The Foundry Local service may return HTTP 500 after ~13-15 sequential completions. 
Add retry logic: + +```python +import time + +def safe_completion(client, model_id, messages, max_retries=2): + for attempt in range(max_retries + 1): + try: + result = client.chat.completions.create( + model=model_id, + messages=messages, + temperature=0.7, + max_tokens=512, + ) + return result.choices[0].message.content.strip() + except Exception as e: + if attempt < max_retries: + print(f"Retry {attempt + 1} after error: {e}") + time.sleep(2) + else: + raise +``` + +If retries don't help, restart the service: +```bash +foundry service stop +foundry service start +``` + +--- + +## Evaluation Design Guidelines + +| Guideline | Rationale | +|-----------|-----------| +| Write golden dataset before prompts | Prevents confirmation bias | +| Use 5+ test cases per category | Statistical significance | +| Combine rule + LLM scoring | Rules catch format issues; LLM catches content quality | +| Use `temperature: 0.1` for judge | Consistent scoring across runs | +| Include forbidden terms check | Catches hallucinated brand names or competitors | +| Test after every prompt change | Regression testing for prompt engineering | + +--- + +## Common Pitfalls + +| Mistake | Impact | Fix | +|---------|--------|-----| +| No try/catch around LLM judge | Pipeline crashes on HTTP 500 | Add fallback score (default 3) | +| High temperature for judge | Inconsistent scores | Use 0.1 | +| Too few test cases | Results not statistically meaningful | Use 5+ per category | +| Only using LLM judge (no rules) | Missing obvious format failures | Combine both approaches | +| Evaluating only one prompt variant | No comparison baseline | Always test at least 2 variants | + +--- + +## Cross-References + +- For service setup, see **setup** +- For agents to evaluate, see **agents** +- For RAG pipelines to evaluate, see **rag** +- For chat patterns, see **chat** diff --git a/.github/plugins/azure-skills/skills/microsoft-foundry-local/rag/SKILL.md 
b/.github/plugins/azure-skills/skills/microsoft-foundry-local/rag/SKILL.md new file mode 100644 index 00000000..c3e29fd0 --- /dev/null +++ b/.github/plugins/azure-skills/skills/microsoft-foundry-local/rag/SKILL.md @@ -0,0 +1,242 @@ +--- +name: rag +description: "Build Retrieval-Augmented Generation (RAG) pipelines with Foundry Local. Covers knowledge base design, retrieval strategies, context injection, and prompt templates — all running on-device with no cloud dependencies. WHEN: RAG pipeline, retrieval augmented generation, ground answers in data, knowledge base, local search, context injection, foundry local RAG, on-device RAG, document grounding, chunk retrieval." +license: MIT +metadata: + author: Microsoft + version: "1.0.0" +--- + +# Foundry Local RAG Pipelines + +This skill provides patterns for building Retrieval-Augmented Generation (RAG) pipelines that run entirely on-device with Foundry Local — no cloud, vector database, or embeddings API required. + +## Triggers + +Activate this skill when the user wants to: +- Build a RAG pipeline with Foundry Local +- Ground LLM answers in local data or documents +- Create a local knowledge base +- Implement retrieval (keyword or semantic) for prompt augmentation +- Design system prompts that inject retrieved context + +## Rules + +1. RAG = **Retrieve** relevant context + **Augment** the prompt + **Generate** a grounded answer. +2. Start with keyword-overlap retrieval (zero dependencies) before suggesting vector search. +3. Always instruct the model to use only the provided context — prevents hallucination. +4. Keep retrieved chunks concise — local models have limited context windows (typically 4K–16K tokens). +5. For service setup, refer to **setup** skill. 
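Rule 4 calls for concise chunks, and the pitfalls table later in this skill recommends 100-300 word chunks, but no splitting code is shown. A minimal sketch of a word-count chunker; the `chunk_text` helper, its paragraph-splitting heuristic, and the 250-word default are illustrative assumptions, not part of any Foundry SDK:

```python
def chunk_text(text, title, max_words=250):
    """Split plain text into chunks of roughly max_words words,
    flushing at paragraph boundaries so chunks stay coherent."""
    chunks, current = [], []
    for paragraph in text.split("\n\n"):
        words = paragraph.split()
        # Flush the current chunk before it would exceed the word budget
        if current and len(current) + len(words) > max_words:
            chunks.append({"title": title, "content": " ".join(current)})
            current = []
        current.extend(words)
    if current:
        chunks.append({"title": title, "content": " ".join(current)})
    return chunks
```

Feed each source document through `chunk_text` and concatenate the results to build the knowledge base. Note that a single paragraph longer than `max_words` still becomes one oversized chunk, so pre-split very long paragraphs if needed.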
+
+---
+
+## Architecture
+
+```
+  User Question
+       │
+       ▼
+┌──────────────┐      ┌──────────────┐      ┌──────────────┐
+│  Retrieve    │─────►│  Augment     │─────►│  Generate    │
+│  (search     │      │  (build      │      │  (LLM call   │
+│  knowledge   │      │  prompt      │      │  with        │
+│  base)       │      │  with        │      │  context)    │
+│              │      │  context)    │      │              │
+└──────────────┘      └──────────────┘      └──────────────┘
+```
+
+---
+
+## Step 1: Define a Knowledge Base
+
+Structure your data as a list of chunks, each with a title and content:
+
+### Python
+```python
+KNOWLEDGE_BASE = [
+    {
+        "title": "Foundry Local Overview",
+        "content": (
+            "Foundry Local brings the power of Azure AI Foundry to your local "
+            "device without requiring an Azure subscription..."
+        ),
+    },
+    {
+        "title": "Supported Hardware",
+        "content": (
+            "Foundry Local automatically selects the best model variant for "
+            "your hardware. NVIDIA CUDA, Qualcomm NPU, or CPU..."
+        ),
+    },
+]
+```
+
+### JavaScript
+```javascript
+const KNOWLEDGE_BASE = [
+  {
+    title: "Foundry Local Overview",
+    content: "Foundry Local brings the power of Azure AI Foundry...",
+  },
+  {
+    title: "Supported Hardware",
+    content: "Foundry Local automatically selects the best model variant...",
+  },
+];
+```
+
+---
+
+## Step 2: Implement Retrieval
+
+### Keyword Overlap (Simple — No Dependencies)
+
+Scores chunks by word overlap with the query.
Good for getting started: + +#### Python +```python +def retrieve(query, knowledge_base, top_k=2): + query_words = set(query.lower().split()) + scored = [] + for chunk in knowledge_base: + chunk_words = set(chunk["content"].lower().split()) + overlap = len(query_words & chunk_words) + scored.append((overlap, chunk)) + scored.sort(key=lambda x: x[0], reverse=True) + return [item[1] for item in scored[:top_k]] +``` + +#### JavaScript +```javascript +function retrieve(query, knowledgeBase, topK = 2) { + const queryWords = new Set(query.toLowerCase().split(/\s+/)); + return knowledgeBase + .map(chunk => { + const chunkWords = new Set(chunk.content.toLowerCase().split(/\s+/)); + const overlap = [...queryWords].filter(w => chunkWords.has(w)).length; + return { overlap, chunk }; + }) + .sort((a, b) => b.overlap - a.overlap) + .slice(0, topK) + .map(item => item.chunk); +} +``` + +### When to Upgrade + +| Approach | Dependencies | Best For | +|----------|-------------|----------| +| Keyword overlap | None | Prototyping, small knowledge bases (<100 chunks) | +| TF-IDF | `scikit-learn` | Medium knowledge bases, better relevance | +| Embedding similarity | Embedding model + numpy | Large knowledge bases, semantic matching | + +--- + +## Step 3: Augment the Prompt + +Build a system prompt that injects the retrieved context and instructs the model to use only that information: + +### Python +```python +def build_rag_prompt(question, retrieved_chunks): + context = "\n".join( + f"- {chunk['title']}: {chunk['content']}" for chunk in retrieved_chunks + ) + system_prompt = ( + "You are a helpful assistant. Answer the user's question using " + "ONLY the context provided below. 
If the context does not contain " + "enough information, say so.\n\n" + f"Context:\n{context}" + ) + return [ + {"role": "system", "content": system_prompt}, + {"role": "user", "content": question}, + ] +``` + +### JavaScript +```javascript +function buildRagPrompt(question, retrievedChunks) { + const context = retrievedChunks + .map(c => `- ${c.title}: ${c.content}`) + .join("\n"); + + return [ + { + role: "system", + content: + "You are a helpful assistant. Answer the user's question using " + + "ONLY the context provided below. If the context does not contain " + + "enough information, say so.\n\n" + + `Context:\n${context}`, + }, + { role: "user", content: question }, + ]; +} +``` + +--- + +## Step 4: Generate the Answer + +### Python +```python +question = "What hardware does Foundry Local support?" +chunks = retrieve(question, KNOWLEDGE_BASE, top_k=2) +messages = build_rag_prompt(question, chunks) + +response = client.chat.completions.create( + model=model_id, + messages=messages, + temperature=0.3, # Lower temperature for factual answers + max_tokens=512, +) +print(response.choices[0].message.content) +``` + +### JavaScript +```javascript +const question = "What hardware does Foundry Local support?"; +const chunks = retrieve(question, KNOWLEDGE_BASE, 2); +const messages = buildRagPrompt(question, chunks); + +const response = await client.chat.completions.create({ + model: modelInfo.id, + messages, + temperature: 0.3, + max_tokens: 512, +}); +console.log(response.choices[0].message.content); +``` + +--- + +## Design Guidelines + +| Guideline | Rationale | +|-----------|-----------| +| Use `temperature: 0.3` or lower | RAG answers should be factual, not creative | +| Limit to 2-3 retrieved chunks | Local models have limited context windows | +| Include "say so if context is insufficient" | Prevents hallucination when data is missing | +| Chunk content to 100-300 words each | Too long = context overflow; too short = missing info | +| Include source titles in 
context | Helps the model attribute information | + +--- + +## Common Pitfalls + +| Mistake | Impact | Fix | +|---------|--------|-----| +| No "use only this context" instruction | Model hallucinates beyond provided data | Add explicit grounding instruction in system prompt | +| Retrieving too many chunks | Exceeds context window, degrades quality | Limit `top_k` to 2-3 for small models | +| High temperature (>0.7) for RAG | Generates creative but inaccurate answers | Use 0.1-0.3 for factual grounding | +| Not chunking documents | Entire documents overwhelm context | Split into focused 100-300 word chunks | + +--- + +## Cross-References + +- For service setup, see **setup** +- For basic chat patterns, see **chat** +- For agents with persistent instructions, see **agents** +- For testing RAG quality systematically, see **evaluation** diff --git a/.github/plugins/azure-skills/skills/microsoft-foundry-local/setup/SKILL.md b/.github/plugins/azure-skills/skills/microsoft-foundry-local/setup/SKILL.md new file mode 100644 index 00000000..e51a3f91 --- /dev/null +++ b/.github/plugins/azure-skills/skills/microsoft-foundry-local/setup/SKILL.md @@ -0,0 +1,215 @@ +--- +name: setup +description: "Install, configure, and manage Foundry Local — the on-device AI runtime. Covers CLI installation, service lifecycle, model management, port discovery, and troubleshooting. WHEN: install foundry local, start foundry service, download model, list models, foundry CLI, model not loading, service not starting, port discovery, foundry status, foundry local setup, model alias, cache location, hardware detection, service restart, dynamic port." +license: MIT +metadata: + author: Microsoft + version: "1.0.0" +--- + +# Foundry Local Setup & Service Management + +This skill provides guidance for installing, configuring, and managing the Foundry Local on-device AI runtime. 
+ +> **What is Foundry Local?** A lightweight runtime that downloads, manages, and serves language models entirely on your hardware. It exposes an **OpenAI-compatible API** — no cloud account or API keys required. See [foundrylocal.ai](https://foundrylocal.ai). + +## Triggers + +Activate this skill when the user wants to: +- Install Foundry Local CLI +- Start, stop, or restart the Foundry Local service +- Download, list, or manage models +- Discover the dynamic endpoint / port +- Understand model aliases vs hardware-specific IDs +- Troubleshoot service startup, model loading, or cache issues +- Set up a new project with Foundry Local SDK + +## Rules + +1. **Never hardcode ports.** The Foundry Local service uses a dynamic port — always use `manager.endpoint` (Python/JS) or `manager.Urls[0]` (C#). +2. **Cached ≠ loaded.** A model can be cached on disk but not loaded into memory. Always call `load_model()` / `loadModel()` / `LoadAsync()` after confirming the model is cached. +3. **Use aliases, not full IDs.** Aliases like `phi-3.5-mini` auto-select the best hardware variant (CUDA, QNN, CPU). Full IDs are hardware-specific. +4. **API key is always `"not-required"`.** Foundry Local does not authenticate — set `api_key="not-required"` or `"foundry-local"`. 
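Rule 1 can be made concrete with a short sketch: derive every connection detail from the endpoint string the manager reports, and never write the port into config. The endpoint literal here is a stand-in for `manager.endpoint`, whose port differs on every service start:

```python
from urllib.parse import urlparse

def split_endpoint(endpoint):
    """Break a Foundry Local endpoint into (scheme, host, port).
    The port is assigned dynamically, so it must be re-read per run."""
    parsed = urlparse(endpoint)
    return parsed.scheme, parsed.hostname, parsed.port

# Stand-in value; in real code pass manager.endpoint instead
scheme, host, port = split_endpoint("http://127.0.0.1:52515/v1")
base_url = f"{scheme}://{host}:{port}/v1"  # rebuilt at startup, not hardcoded
```

Anything that caches `base_url` (an env file, a config setting) will go stale after a service restart, so resolve it at application startup instead.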
+
+---
+
+## CLI Installation
+
+### Windows
+```powershell
+winget install Microsoft.FoundryLocal
+```
+
+### macOS
+```bash
+brew tap microsoft/foundrylocal && brew install foundrylocal
+```
+
+### Verify
+```bash
+foundry --version
+foundry service status
+```
+
+---
+
+## CLI Quick Reference
+
+| Command | Purpose |
+|---------|---------|
+| `foundry model list` | List all available models in the catalog |
+| `foundry model list --source cache` | List only downloaded (cached) models |
+| `foundry model run <alias>` | Download (if needed), load, and start interactive chat |
+| `foundry service status` | Check if the service is running |
+| `foundry service stop` | Stop the service |
+| `foundry cache register --model-path <path> --alias <alias>` | Register a custom compiled model |
+
+---
+
+## SDK Lifecycle — 7-Step Pattern (All Languages)
+
+Every Foundry Local application follows the same architecture:
+
+1. **Create manager** — no parameters required
+2. **Start service** — spawns the inference server on a dynamic port
+3. **Query catalog** — list available models
+4. **Check cache** — distinguish "available" from "downloaded"
+5. **Download if needed** — with progress callbacks
+6. **Load into memory** — required before inference; resolves full model ID
+7.
**Create OpenAI client** — use `manager.endpoint` + dummy API key + +### Python + +```python +from foundry_local import FoundryLocalManager +import openai + +alias = "phi-3.5-mini" + +manager = FoundryLocalManager() +manager.start_service() + +# Check cache and download if needed +cached = manager.list_cached_models() +catalog_info = manager.get_model_info(alias) +is_cached = any(m.id == catalog_info.id for m in cached) if catalog_info else False + +if not is_cached: + manager.download_model(alias, progress_callback=lambda p: print(f"{p:.0f}%")) + +manager.load_model(alias) + +client = openai.OpenAI( + base_url=manager.endpoint, + api_key=manager.api_key # "not-required" +) +``` + +### JavaScript + +```javascript +import { FoundryLocalManager } from "foundry-local-sdk"; +import { OpenAI } from "openai"; + +const alias = "phi-3.5-mini"; +const manager = new FoundryLocalManager(); +await manager.startService(); + +const cached = await manager.listCachedModels(); +const catalogInfo = await manager.getModelInfo(alias); +const isCached = cached.some(m => m.id === catalogInfo?.id); + +if (!isCached) { + await manager.downloadModel(alias, undefined, false, p => console.log(`${p}%`)); +} + +const modelInfo = await manager.loadModel(alias); + +const client = new OpenAI({ + baseURL: manager.endpoint, + apiKey: manager.apiKey, +}); +``` + +### C# + +```csharp +using Microsoft.AI.Foundry.Local; +using Microsoft.Extensions.Logging.Abstractions; +using OpenAI; +using System.ClientModel; + +var alias = "phi-3.5-mini"; + +await FoundryLocalManager.CreateAsync( + new Configuration + { + AppName = "MyApp", + Web = new Configuration.WebService { Urls = "http://127.0.0.1:0" } + }, NullLogger.Instance, default); + +var manager = FoundryLocalManager.Instance; +await manager.StartWebServiceAsync(default); + +var catalog = await manager.GetCatalogAsync(default); +var model = await catalog.GetModelAsync(alias, default); + +if (!await model.IsCachedAsync(default)) + await 
model.DownloadAsync(null, default);
+
+await model.LoadAsync(default);
+
+var client = new OpenAIClient(
+    new ApiKeyCredential("foundry-local"),
+    new OpenAIClientOptions { Endpoint = new Uri(manager.Urls[0] + "/v1") });
+```
+
+> **C# Note:** The `Microsoft.AI.Foundry.Local` NuGet package requires an explicit `<RuntimeIdentifier>` in your `.csproj` (e.g., `win-x64`, `win-arm64`).
+
+---
+
+## Hardware Auto-Detection
+
+When you use an alias like `phi-3.5-mini`, the SDK automatically selects the best variant:
+
+| Hardware | Execution Provider | Selected Automatically |
+|----------|-------------------|----------------------|
+| NVIDIA GPU | CUDA | Yes |
+| Qualcomm NPU | QNN | Yes (if available) |
+| CPU (default) | CPU | Yes (fallback) |
+
+Developers do not need to pick variants — hardware detection is transparent.
+
+---
+
+## Troubleshooting
+
+| Symptom | Cause | Fix |
+|---------|-------|-----|
+| Service won't start | Port conflict or stale process | `foundry service stop` then retry |
+| Model not found | Alias typo or outdated catalog | Run `foundry model list` to see valid aliases |
+| `IsCachedAsync` NullReferenceException | Race condition on first run (C#) | Retry after delay; SDK may not be fully ready |
+| HTTP 500 under sustained load | Resource exhaustion after ~13-15 completions | `foundry service stop` then restart; add try/catch with fallback |
+| OGA memory leak warnings on exit | SDK doesn't expose Dispose for native resources | Non-blocking; can be ignored |
+| Snapdragon CPU warnings | cpuinfo library doesn't recognise Oryon cores | Cosmetic only; inference works correctly |
+| C# build fails with NETSDK1047 | Missing `<RuntimeIdentifier>` in `.csproj` | Add `<RuntimeIdentifier>win-x64</RuntimeIdentifier>` |
+
+---
+
+## Key SDK Properties
+
+| Property | Python | JavaScript | C# |
+|----------|--------|------------|-----|
+| Endpoint | `manager.endpoint` | `manager.endpoint` | `manager.Urls[0] + "/v1"` |
+| API key | `manager.api_key` | `manager.apiKey` | `"foundry-local"` (any string) |
+| Model ID |
`manager.get_model_info(alias).id` | `modelInfo.id` | `model.Id` | +| Cache path | `manager.get_cache_location()` | — | — | + +--- + +## Cross-References + +- For chat completion patterns, see **chat** +- For RAG pipelines, see **rag** +- For agent creation, see **agents** +- For custom model compilation, see **custom-models** diff --git a/.github/plugins/azure-skills/skills/microsoft-foundry-local/whisper/SKILL.md b/.github/plugins/azure-skills/skills/microsoft-foundry-local/whisper/SKILL.md new file mode 100644 index 00000000..5a0eccae --- /dev/null +++ b/.github/plugins/azure-skills/skills/microsoft-foundry-local/whisper/SKILL.md @@ -0,0 +1,222 @@ +--- +name: whisper +description: "Transcribe audio with Whisper running on-device via Foundry Local. Covers model download (SDK only), ONNX encoder/decoder pipeline, feature extraction, audio format requirements, and language-specific APIs. WHEN: whisper transcription, speech to text local, audio transcription, foundry whisper, on-device transcription, WAV transcription, voice to text, transcribe audio, whisper model, speech recognition local." +license: MIT +metadata: + author: Microsoft + version: "1.0.0" +--- + +# Foundry Local Whisper Transcription + +This skill provides patterns for transcribing audio files using the OpenAI Whisper model running entirely on-device through Foundry Local. + +## Triggers + +Activate this skill when the user wants to: +- Transcribe audio files (WAV) using Whisper locally +- Set up speech-to-text with no cloud dependencies +- Download and configure the Whisper model via Foundry Local +- Build an ONNX encoder/decoder transcription pipeline +- Process audio files for local transcription + +## Rules + +1. **Whisper must be downloaded via the SDK** — the CLI does not support Whisper model download. +2. **Audio must be 16kHz mono WAV** — resample before processing. +3. **Python/JS use a manual ONNX pipeline** (encoder → decoder with KV cache). +4. 
**C# has a high-level API** — `GetAudioClient().TranscribeAudioAsync()`. +5. For service setup, refer to **setup** skill. + +--- + +## Model Download + +The Whisper model **must** be downloaded using the Foundry Local SDK, not the CLI: + +### Python +```python +from foundry_local import FoundryLocalManager + +manager = FoundryLocalManager("whisper-medium") +model_info = manager.get_model_info("whisper-medium") +cache_location = manager.get_cache_location() +``` + +### JavaScript +```javascript +import { FoundryLocalManager } from "foundry-local-sdk"; + +const manager = new FoundryLocalManager(); +await manager.startService(); +const modelInfo = await manager.loadModel("whisper-medium"); +``` + +### C# +```csharp +var catalog = await manager.GetCatalogAsync(default); +var model = await catalog.GetModelAsync("whisper-medium", default); +if (!await model.IsCachedAsync(default)) + await model.DownloadAsync(null, default); +await model.LoadAsync(default); +``` + +--- + +## Python — Manual ONNX Pipeline + +Python requires a manual encoder/decoder pipeline using ONNX Runtime: + +### Dependencies +```bash +pip install foundry-local-sdk onnxruntime transformers librosa numpy +``` + +### Complete Pipeline + +```python +import numpy as np +import onnxruntime as ort +import librosa +from transformers import WhisperFeatureExtractor, WhisperTokenizer +from foundry_local import FoundryLocalManager +import os + +# Download model via SDK +manager = FoundryLocalManager("whisper-medium") +model_info = manager.get_model_info("whisper-medium") +cache_location = manager.get_cache_location() + +# Build path to ONNX files +model_dir = os.path.join( + cache_location, "Microsoft", + model_info.id.replace(":", "-"), + "cpu-fp32" +) + +# Load ONNX sessions +encoder_session = ort.InferenceSession( + os.path.join(model_dir, "whisper-medium_encoder_fp32.onnx"), + providers=["CPUExecutionProvider"], +) +decoder_session = ort.InferenceSession( + os.path.join(model_dir, 
"whisper-medium_decoder_fp32.onnx"), + providers=["CPUExecutionProvider"], +) + +# Load feature extractor and tokeniser +feature_extractor = WhisperFeatureExtractor.from_pretrained(model_dir) +tokenizer = WhisperTokenizer.from_pretrained(model_dir) + +# Whisper medium model dimensions +NUM_LAYERS = 24 +NUM_HEADS = 16 +HEAD_SIZE = 64 + +# Build initial decoder tokens +sot = tokenizer.convert_tokens_to_ids("<|startoftranscript|>") +eot = tokenizer.convert_tokens_to_ids("<|endoftext|>") +notimestamps = tokenizer.convert_tokens_to_ids("<|notimestamps|>") +forced_ids = tokenizer.get_decoder_prompt_ids(language="en", task="transcribe") +INITIAL_TOKENS = [sot] + [tid for _, tid in forced_ids] + [notimestamps] + + +def transcribe(audio_path): + # Load audio at 16kHz mono + audio, _ = librosa.load(audio_path, sr=16000) + + # Extract log-mel spectrogram + features = feature_extractor(audio, sampling_rate=16000, return_tensors="np") + audio_features = features["input_features"].astype(np.float32) + + # Run encoder + encoder_outputs = encoder_session.run(None, {"audio_features": audio_features}) + cross_kv_list = encoder_outputs[1:] + + # Prepare cross-attention KV cache + cross_kv = {} + for i in range(NUM_LAYERS): + cross_kv[f"past_key_cross_{i}"] = cross_kv_list[i * 2] + cross_kv[f"past_value_cross_{i}"] = cross_kv_list[i * 2 + 1] + + # Initialise self-attention KV cache + self_kv = {} + for i in range(NUM_LAYERS): + self_kv[f"past_key_self_{i}"] = np.zeros((1, NUM_HEADS, 0, HEAD_SIZE), dtype=np.float32) + self_kv[f"past_value_self_{i}"] = np.zeros((1, NUM_HEADS, 0, HEAD_SIZE), dtype=np.float32) + + # Autoregressive decoding + input_ids = np.array([INITIAL_TOKENS], dtype=np.int32) + generated = [] + + for _ in range(448): # Max tokens + feeds = {"input_ids": input_ids} + feeds.update(cross_kv) + feeds.update(self_kv) + + outputs = decoder_session.run(None, feeds) + logits = outputs[0] + next_token = int(np.argmax(logits[0, -1, :])) + + if next_token == eot: + break + + 
generated.append(next_token) + + # Update self-attention KV cache + for i in range(NUM_LAYERS): + self_kv[f"past_key_self_{i}"] = outputs[1 + i * 2] + self_kv[f"past_value_self_{i}"] = outputs[2 + i * 2] + + input_ids = np.array([[next_token]], dtype=np.int32) + + return tokenizer.decode(generated, skip_special_tokens=True).strip() +``` + +--- + +## C# — High-Level API + +C# provides a simpler API via `GetAudioClient()`: + +```csharp +var audioClient = model.GetAudioClient(); +var result = await audioClient.TranscribeAudioAsync(audioFilePath); +Console.WriteLine(result.Text); +``` + +--- + +## Audio Format Requirements + +| Requirement | Value | +|-------------|-------| +| Sample rate | 16,000 Hz (16 kHz) | +| Channels | Mono (1 channel) | +| Format | WAV (PCM) | +| Max duration | ~30 seconds per segment | + +If your audio is in a different format, resample before processing: + +```python +# librosa handles resampling automatically +audio, _ = librosa.load("input.wav", sr=16000) +``` + +--- + +## Known Issues + +| Issue | Severity | Workaround | +|-------|----------|------------| +| JS: Last audio file may return empty transcription | Minor | Node.js binding edge case; other files work fine | +| C#: Path resolution fragile with different RIDs | Minor | Use absolute paths or CLI arguments | +| OGA memory leak warnings on exit | Warning | Non-blocking; no cleanup API exposed | + +--- + +## Cross-References + +- For service setup and model download, see **setup** +- For chat completions (text), see **chat** +- For custom model compilation, see **custom-models** diff --git a/.github/plugins/azure-skills/skills/microsoft-foundry/.gitignore b/.github/plugins/azure-skills/skills/microsoft-foundry/.gitignore new file mode 100644 index 00000000..e69de29b diff --git a/.github/plugins/azure-skills/skills/microsoft-foundry/SKILL.md b/.github/plugins/azure-skills/skills/microsoft-foundry/SKILL.md index 8b111859..203aedab 100644 --- 
a/.github/plugins/azure-skills/skills/microsoft-foundry/SKILL.md +++ b/.github/plugins/azure-skills/skills/microsoft-foundry/SKILL.md @@ -1,32 +1,101 @@ --- name: microsoft-foundry -description: | - Use this skill to work with Microsoft Foundry (Azure AI Foundry): deploy AI models from catalog, build RAG applications with knowledge indexes, create and evaluate AI agents, manage RBAC permissions and role assignments, manage quotas and capacity, create Foundry resources. - USE FOR: Microsoft Foundry, AI Foundry, deploy model, model catalog, RAG, knowledge index, create agent, evaluate agent, agent monitoring, create Foundry project, new Foundry project, set up Foundry, onboard to Foundry, provision Foundry infrastructure, create Foundry resource, create AI Services, multi-service resource, AIServices kind, register resource provider, enable Cognitive Services, setup AI Services account, create resource group for Foundry, RBAC, role assignment, managed identity, service principal, permissions, quota, capacity, TPM, deployment failure, QuotaExceeded. - DO NOT USE FOR: Azure Functions (use azure-functions), App Service (use azure-create-app), generic Azure resource creation (use azure-create-app). +description: "Deploy, evaluate, and manage Foundry agents end-to-end: Docker build, ACR push, hosted/prompt agent create, container start, batch eval, prompt optimization, agent.yaml, dataset curation from traces. USE FOR: deploy agent to Foundry, hosted agent, create agent, invoke agent, evaluate agent, run batch eval, optimize prompt, deploy model, Foundry project, RBAC, role assignment, permissions, quota, capacity, region, troubleshoot agent, deployment failure, create dataset from traces, dataset versioning, eval trending, create AI Services, Cognitive Services, create Foundry resource, provision resource, knowledge index, agent monitoring, customize deployment, onboard, availability, standard agent setup, capability host. 
DO NOT USE FOR: Azure Functions, App Service, general Azure deploy (use azure-deploy), general Azure prep (use azure-prepare)." +license: MIT +metadata: + author: Microsoft + version: "1.0.3" --- # Microsoft Foundry Skill -This skill helps developers work with Microsoft Foundry resources, covering model discovery and deployment, RAG (Retrieval-Augmented Generation) applications, AI agent creation, evaluation workflows, and troubleshooting. +> **MANDATORY:** Read this skill and the relevant sub-skill BEFORE calling any Foundry MCP tool. ## Sub-Skills -This skill includes specialized sub-skills for specific workflows. **Use these instead of the main skill when they match your task:** - | Sub-Skill | When to Use | Reference | |-----------|-------------|-----------| +| **deploy** | Containerize, build, push to ACR, create/update/start/stop/clone agent deployments | [deploy](agent/deploy/deploy.md) | +| **invoke** | Send messages to an agent, single or multi-turn conversations | [invoke](agent/invoke/invoke.md) | +| **observe** | Eval-driven optimization loop: evaluate → analyze → optimize → compare → iterate | [observe](agent/observe/observe.md) | +| **trace** | Query traces, analyze latency/failures, correlate eval results to specific responses via App Insights `customEvents` | [trace](agent/trace/trace.md) | +| **troubleshoot** | View container logs, query telemetry, diagnose failures | [troubleshoot](agent/troubleshoot/troubleshoot.md) | +| **create** | Create new hosted agent applications. Supports Microsoft Agent Framework, LangGraph, or custom frameworks in Python or C#. Downloads starter samples from foundry-samples repo. | [create](agent/create/create.md) | +| **eval-datasets** | Harvest production traces into evaluation datasets, manage dataset versions and splits, track evaluation metrics over time, detect regressions, and maintain full lineage from trace to deployment. 
Use for: create dataset from traces, dataset versioning, evaluation trending, regression detection, dataset comparison, eval lineage. | [eval-datasets](agent/eval-datasets/eval-datasets.md) | | **project/create** | Creating a new Azure AI Foundry project for hosting agents and models. Use when onboarding to Foundry or setting up new infrastructure. | [project/create/create-foundry-project.md](project/create/create-foundry-project.md) | | **resource/create** | Creating Azure AI Services multi-service resource (Foundry resource) using Azure CLI. Use when manually provisioning AI Services resources with granular control. | [resource/create/create-foundry-resource.md](resource/create/create-foundry-resource.md) | | **models/deploy-model** | Unified model deployment with intelligent routing. Handles quick preset deployments, fully customized deployments (version/SKU/capacity/RAI), and capacity discovery across regions. Routes to sub-skills: `preset` (quick deploy), `customize` (full control), `capacity` (find availability). | [models/deploy-model/SKILL.md](models/deploy-model/SKILL.md) | -| **agent/create/agent-framework** | Creating AI agents and workflows using Microsoft Agent Framework SDK. Supports single-agent and multi-agent workflow patterns with HTTP server and F5/debug support. | [agent/create/agent-framework/SKILL.md](agent/create/agent-framework/SKILL.md) | | **quota** | Managing quotas and capacity for Microsoft Foundry resources. Use when checking quota usage, troubleshooting deployment failures due to insufficient quota, requesting quota increases, or planning capacity. | [quota/quota.md](quota/quota.md) | | **rbac** | Managing RBAC permissions, role assignments, managed identities, and service principals for Microsoft Foundry resources. Use for access control, auditing permissions, and CI/CD setup. | [rbac/rbac.md](rbac/rbac.md) | -> 💡 **Tip:** For a complete onboarding flow: `project/create` → `agent/create` → `agent/deploy`. 
If the user wants to **create AND deploy** an agent, start with `agent/create` which can optionally invoke `agent/deploy` automatically. +Onboarding flow: `project/create` → `deploy` → `invoke` + +## Agent Lifecycle + +| Intent | Workflow | +|--------|----------| +| New agent from scratch | create → deploy → invoke | +| Deploy existing code | deploy → invoke | +| Test/chat with agent | invoke | +| Troubleshoot | invoke → troubleshoot | +| Fix + redeploy | troubleshoot → fix → deploy → invoke | + +## Project Context Resolution + +Resolve only the missing values: extract them from the user's message first, then from the azd environment, then ask the user. + +1. Check for `azure.yaml`; if found, run `azd env get-values` +2. Map azd variables: + +| azd Variable | Resolves To | +|-------------|-------------| +| `AZURE_AI_PROJECT_ENDPOINT` / `AZURE_AIPROJECT_ENDPOINT` | Project endpoint | +| `AZURE_CONTAINER_REGISTRY_NAME` / `AZURE_CONTAINER_REGISTRY_ENDPOINT` | ACR registry | +| `AZURE_SUBSCRIPTION_ID` | Subscription | + +3. Ask the user only for values that remain unresolved (project endpoint, agent name) + +## Validation + +After each workflow step, validate before proceeding: +1. Run the operation +2. Check output for errors or unexpected results +3. If failed → diagnose using troubleshoot sub-skill → fix → retry +4. Only proceed to next step when validation passes + +## Agent Types + +| Type | Kind | Description | +|------|------|-------------| +| **Prompt** | `"prompt"` | LLM-based, backed by model deployment | +| **Hosted** | `"hosted"` | Container-based, running custom code | + +## Agent Setup Types + +| Setup | Capability Host | Description | +|-------|----------------|-------------| +| **Basic** | None | Default. All resources Microsoft-managed. | +| **Standard** | Azure AI Services | Bring-your-own storage and search (public network). See [standard-agent-setup](references/standard-agent-setup.md). | +| **Standard + Private Network** | Azure AI Services | Standard setup with VNet isolation and private endpoints.
See [private-network-standard-agent-setup](references/private-network-standard-agent-setup.md). | + +> **MANDATORY:** For standard setup, read the appropriate reference before proceeding: +> - **Public network:** [references/standard-agent-setup.md](references/standard-agent-setup.md) +> - **Private network (VNet isolation):** [references/private-network-standard-agent-setup.md](references/private-network-standard-agent-setup.md) + +## Tool Usage Conventions + +- Use the `ask_user` or `askQuestions` tool whenever collecting information from the user +- Use the `task` or `runSubagent` tool to delegate long-running or independent sub-tasks (e.g., env var scanning, status polling, Dockerfile generation) +- Prefer Azure MCP tools over direct CLI commands when available +- Reference official Microsoft documentation URLs instead of embedding CLI command syntax + +## References -> 💡 **Model Deployment:** Use `models/deploy-model` for all deployment scenarios — it intelligently routes between quick preset deployment, customized deployment with full control, and capacity discovery across regions. +- [Hosted Agents](https://learn.microsoft.com/azure/ai-foundry/agents/concepts/hosted-agents?view=foundry) +- [Runtime Components](https://learn.microsoft.com/azure/ai-foundry/agents/concepts/runtime-components?view=foundry) +- [Foundry Samples](https://github.com/azure-ai-foundry/foundry-samples) +- [Python SDK](references/sdk/foundry-sdk-py.md) -## SDK Quick Reference +## Dependencies -- [Python](references/sdk/foundry-sdk-py.md) \ No newline at end of file +Sub-skill scripts require Azure CLI (`az`) ≥ 2.0 and `jq` (for shell scripts). For Python SDK usage, install `azure-ai-projects` and `azure-identity` (`pip install azure-ai-projects azure-identity`).
\ No newline at end of file diff --git a/.github/plugins/azure-skills/skills/microsoft-foundry/agent/create/agent-framework/SKILL.md b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/create/agent-framework/SKILL.md deleted file mode 100644 index 1f298b27..00000000 --- a/.github/plugins/azure-skills/skills/microsoft-foundry/agent/create/agent-framework/SKILL.md +++ /dev/null @@ -1,162 +0,0 @@ ---- -name: agent-framework -description: | - Create AI agents and workflows using Microsoft Agent Framework SDK. Supports single-agent and multi-agent workflow patterns. - USE FOR: create agent, build agent, scaffold agent, new agent, agent framework, workflow pattern, multi-agent, MCP tools, create workflow. - DO NOT USE FOR: deploying agents (use agent/deploy), evaluating agents (use agent/evaluate), Azure AI Foundry agents without Agent Framework SDK. ---- - -# Create Agent with Microsoft Agent Framework - -Build AI agents, agentic apps, and multi-agent workflows using Microsoft Agent Framework SDK. 
- -## Quick Reference - -| Property | Value | -|----------|-------| -| **SDK** | Microsoft Agent Framework (Python) | -| **Patterns** | Single Agent, Multi-Agent Workflow | -| **Server** | Azure AI Agent Server SDK (HTTP) | -| **Debug** | AI Toolkit Agent Inspector + VSCode | -| **Best For** | Enterprise agents with type safety, checkpointing, orchestration | - -## When to Use This Skill - -Use when the user wants to: - -- **Create** a new AI agent or agentic application -- **Scaffold** an agent with tools (MCP, function calling) -- **Build** multi-agent workflows with orchestration patterns -- **Add** HTTP server mode to an existing agent -- **Configure** F5/debug support for VSCode - -## Defaults - -- **Language**: Python -- **SDK**: Microsoft Agent Framework (pin version `1.0.0b260107`) -- **Server**: HTTP via Azure AI Agent Server SDK -- **Environment**: Virtual environment (create or detect existing) - -## References - -| Topic | File | Description | -|-------|------|-------------| -| Server Pattern | [references/agent-as-server.md](references/agent-as-server.md) | HTTP server wrapping (production) | -| Debug Setup | [references/debug-setup.md](references/debug-setup.md) | VS Code configs for Agent Inspector | -| Agent Samples | [references/agent-samples.md](references/agent-samples.md) | Single agent, tools, MCP, threads | -| Workflow Basics | [references/workflow-basics.md](references/workflow-basics.md) | Executor types, handler signatures, edges, WorkflowBuilder — start here for any workflow | -| Workflow Agents | [references/workflow-agents.md](references/workflow-agents.md) | Agents as executor nodes, linear pipeline, run_stream event consumption | -| Workflow Foundry | [references/workflow-foundry.md](references/workflow-foundry.md) | Foundry agents with bidirectional edges, loop control, register_executor factories | - -> 💡 **Tip:** For advanced patterns (Reflection, Switch-Case, Fan-out/Fan-in, Loop, Human-in-Loop), search `microsoft/agent-framework` 
on GitHub. - -## MCP Tools - -This skill delegates to `microsoft-foundry` MCP tools for model and project operations: - -| Tool | Purpose | -|------|---------| -| `foundry_models_list` | Browse model catalog for selection | -| `foundry_models_deployments_list` | List deployed models for selection | -| `foundry_resource_get` | Get project endpoint | - -## Creation Workflow - -1. Gather context (read agent-as-server.md + debug-setup.md + code samples) -2. Select model & configure environment -3. Implement agent/workflow code + HTTP server mode + `.vscode/` configs -4. Install dependencies (venv + requirements.txt) -5. Verify startup (Run-Fix loop) -6. Documentation - -### Step 1: Gather Context - -Read reference files based on user's request: - -**Always read these references:** -- Server pattern: **agent-as-server.md** (required — HTTP server is the default) -- Debug setup: **debug-setup.md** (required — always generate `.vscode/` configs) - -**Read the relevant code sample:** -- Code samples: agent-samples.md, workflow-basics.md, workflow-agents.md, or workflow-foundry.md - -**Model Selection**: Use `microsoft-foundry` skill's model catalog to help user select and deploy a model. - -**Recommended**: Search `microsoft/agent-framework` on GitHub for advanced patterns. - -### Step 2: Select Model & Configure Environment - -*Decide on the model BEFORE coding.* - -If user hasn't specified a model, use `microsoft-foundry` skill to list deployed models or help deploy one. - -**ALWAYS create/update `.env` file**: -```bash -FOUNDRY_PROJECT_ENDPOINT= -FOUNDRY_MODEL_DEPLOYMENT_NAME= -``` - -- **Standard flow**: Populate with real values from user's Foundry project -- **Deferred Config**: Use placeholders, remind user to update before running - -### Step 3: Implement Code - -**All three are required by default:** - -1. **Agent/Workflow code**: Use gathered context to structure the agent or workflow -2. 
**HTTP Server mode**: Wrap with Agent-as-Server pattern from `agent-as-server.md` — this is the default entry point -3. **Debug configs**: Generate `.vscode/launch.json` and `.vscode/tasks.json` using templates from `debug-setup.md` - -> ⚠️ **Warning:** Only skip server mode or debug configs if the user explicitly requests a "minimal" or "no server" setup. - -### Step 4: Install Dependencies - -1. Generate/update `requirements.txt` - ```text - # pin version to avoid breaking changes - - # agent framework - agent-framework-azure-ai==1.0.0b260107 - agent-framework-core==1.0.0b260107 - - # agent server (for HTTP server mode) - azure-ai-agentserver-core==1.0.0b10 - azure-ai-agentserver-agentframework==1.0.0b10 - - # debugging support - debugpy - agent-dev-cli - ``` - -2. Use a virtual environment to avoid polluting the global Python installation - -> ⚠️ **Warning:** Never use bare `python` or `pip` — always use the venv-activated versions or full paths (e.g., `.venv/bin/pip`). - -### Step 5: Verify Startup (Run-Fix Loop) - -Enter a run-fix loop until no startup errors: - -1. Run the main entrypoint using the venv's Python (e.g., `.venv/Scripts/python main.py` on Windows, `.venv/bin/python main.py` on macOS/Linux) -2. **If startup fails**: Fix error → Rerun -3. **If startup succeeds**: Stop server immediately - -**Guardrails**: -- ✅ Perform real run to catch startup errors -- ✅ Cleanup after verification (stop HTTP server) -- ✅ Ignore environment/auth/connection/timeout errors -- ❌ Don't wait for user input -- ❌ Don't create separate test scripts -- ❌ Don't mock configuration - -### Step 6: Documentation - -Create/update `README.md` with setup instructions and usage examples. 
- -## Error Handling - -| Error | Cause | Resolution | -|-------|-------|------------| -| `ModuleNotFoundError` | Missing SDK | Run `pip install agent-framework-azure-ai==1.0.0b260107` in venv | -| `AgentRunResponseUpdate` not found | Wrong SDK version | Pin to `1.0.0b260107` (breaking rename in newer versions) | -| Agent name validation error | Invalid characters | Use alphanumeric + hyphens, start/end with alphanumeric, max 63 chars | -| Async credential error | Wrong import | Use `azure.identity.aio.DefaultAzureCredential` (not `azure.identity`) | diff --git a/.github/plugins/azure-skills/skills/microsoft-foundry/agent/create/agent-framework/references/agent-as-server.md b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/create/agent-framework/references/agent-as-server.md deleted file mode 100644 index 19c60c6f..00000000 --- a/.github/plugins/azure-skills/skills/microsoft-foundry/agent/create/agent-framework/references/agent-as-server.md +++ /dev/null @@ -1,83 +0,0 @@ -# Agent as HTTP Server Best Practices - -Converting an Agent-Framework-based Agent/Workflow/App to run as an HTTP server requires code changes to host the agent as a RESTful HTTP server. - -(This doc applies to Python SDK only) - -## Code Changes - -### Run Workflow as Agent - -Agent Framework provides a way to run a whole workflow as agent, via appending `.as_agent()` to the `WorkflowBuilder`, like: - -```python -agent = ( - WorkflowBuilder() - .add_edge(...) - ... - .set_start_executor(...) 
- .build() - .as_agent() # here it is -) -``` - -Then, `azure.ai.agentserver.agentframework` package provides way to run above agent as an http server and receives user input direct from http request: - -```text -# requirements.txt -# pin version to avoid breaking changes or compatibility issues -azure-ai-agentserver-agentframework==1.0.0b10 -azure-ai-agentserver-core==1.0.0b10 -``` - -```python -from azure.ai.agentserver.agentframework import from_agent_framework - -# async method -await from_agent_framework(agent).run_async() - -# or, sync method -from_agent_framework(agent).run() -``` - -Notes: -- User may or may not have `azure.ai.agentserver.agentframework` installed, if not, install it via or equivalent with other package managers: - `pip install azure-ai-agentserver-core==1.0.0b10 azure-ai-agentserver-agentframework==1.0.0b10` - -- When changing the startup command line, make sure the http server mode is the default one (without any additional flag), which is better for further development (like local debugging) and deployment (like containerization and deploy to Microsoft Foundry). - -- If loading env variables from `.env` file, like `load_dotenv()`, make sure set `override=True` to let the env variables work in deployed environment, like `load_dotenv(override=True)` - -### Request/Response Requirements - -To handle http request as user input, the workflow's starter executor should have handler to support `list[ChatMessage]` as input, like: - -```python - @handler - async def some_handler(self, messages: list[ChatMessage], ctx: WorkflowContext[...]) -> ...: -``` - -Also, to let http response returns agent output, need to add `AgentRunUpdateEvent` to context, like: - -```python - from agent_framework import AgentRunUpdateEvent, AgentRunResponseUpdate, TextContent, Role - ... 
- response = await self.agent.run(messages) - for message in response.messages: - if message.role == Role.ASSISTANT: - await ctx.add_event( - AgentRunUpdateEvent( - self.id, - data=AgentRunResponseUpdate( - contents=[TextContent(text=f"Agent: {message.contents[-1].text}")], - role=Role.ASSISTANT, - response_id=str(uuid4()), - ), - ) - ) -``` - -## Notes - -- This step focuses on code changes to prepare an HTTP server-based agent, not actually containerizing or deploying, thus no need to generate extra files. -- Pin `agent-framework` to version `1.0.0b260107` to avoid breaking renaming changes like `AgentRunResponseUpdate`/`AgentResponseUpdate`, `create_agent`/`as_agent`, etc. diff --git a/.github/plugins/azure-skills/skills/microsoft-foundry/agent/create/agent-framework/references/agent-samples.md b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/create/agent-framework/references/agent-samples.md deleted file mode 100644 index 8f34d754..00000000 --- a/.github/plugins/azure-skills/skills/microsoft-foundry/agent/create/agent-framework/references/agent-samples.md +++ /dev/null @@ -1,95 +0,0 @@ -# Python Agent Code Samples - -## Common Patterns - -These patterns are shared across all providers. Define them once and reuse. - -### Tool Definition -``` python -from random import randint -from typing import Annotated - -def get_weather( - location: Annotated[str, "The location to get the weather for."], -) -> str: - """Get the weather for a given location.""" - conditions = ["sunny", "cloudy", "rainy", "stormy"] - return f"The weather in {location} is {conditions[randint(0, 3)]} with a high of {randint(10, 30)}°C." 
-``` - -### MCP Tools Setup -```python -from agent_framework import MCPStdioTool, ToolProtocol, MCPStreamableHTTPTool -from typing import Any - -def create_mcp_tools() -> list[ToolProtocol | Any]: - return [ - MCPStdioTool( - name="Playwright MCP", - description="provides browser automation capabilities using Playwright", - command="npx", - args=["-y", "@playwright/mcp@latest"] - ), - MCPStreamableHTTPTool( - name="Microsoft Learn MCP", - description="bring trusted and up-to-date information directly from Microsoft's official documentation", - url="https://learn.microsoft.com/api/mcp", - ) - ] -``` - -### Thread Pattern (Multi-turn Conversation) -``` python -# Create a new thread that will be reused -thread = agent.get_new_thread() - -# First conversation -async for chunk in agent.run_stream("What's the weather like in Seattle?", thread=thread): - if chunk.text: - print(chunk.text, end="", flush=True) - -# Second conversation - maintains context -async for chunk in agent.run_stream("Pardon?", thread=thread): - if chunk.text: - print(chunk.text, end="", flush=True) -``` - ---- - -## Foundry - -Connect foundry model using `AzureAIClient`. (Legacy `AzureAIAgentClient` is deprecated, use `AzureAIClient`) - -``` python -from agent_framework.azure import AzureAIClient -from azure.identity.aio import DefaultAzureCredential - -async def main() -> None: - async with ( - DefaultAzureCredential() as credential, - AzureAIClient( - project_endpoint="", - model_deployment_name="", - credential=credential, - ).create_agent( - name="MyAgent", - instructions="You are a helpful agent.", - tools=[get_weather], # add tools - # tools=create_mcp_tools(), # or use MCP tools - ) as agent, - ): - thread = agent.get_new_thread() - async for chunk in agent.run_stream("hello", thread=thread): - if chunk.text: - print(chunk.text, end="", flush=True) -``` - ---- - -## Important Tips - -Agent Framework supports various implementation patterns. 
These are quite useful tips to ensure stability and avoid common errors: - -- If using `AzureAIClient` (e.g., connect to Foundry project), use `DefaultAzureCredential` from `azure.identity.aio` (Not `azure.identity`) since the client requires async credential. -- Agent instance can be created via either `client.create_agent(...)` method or `ChatAgent(...)` constructor. -- If using `AzureAIClient` to create Foundry agent, the agent name "must start and end with alphanumeric characters, can contain hyphens in the middle, and must not exceed 63 characters". E.g., good names: ["SampleAgent", "agent-1", "myagent"], and bad names: ["-agent", "agent-", "sample_agent"]. diff --git a/.github/plugins/azure-skills/skills/microsoft-foundry/agent/create/agent-framework/references/debug-setup.md b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/create/agent-framework/references/debug-setup.md deleted file mode 100644 index 1bb4bac5..00000000 --- a/.github/plugins/azure-skills/skills/microsoft-foundry/agent/create/agent-framework/references/debug-setup.md +++ /dev/null @@ -1,202 +0,0 @@ -# Agent / Workflow Debugging - -Support debugging for agent-framework-based agents or workflows locally in VSCode. - -For agent as HTTP server, introduces `agentdev` tool, fully integrated with AI Toolkit Agent Inspector for interactive debugging and testing, supporting: -- agent and workflow execution -- visualize interactions and message flows -- monitor and trace multi-agent orchestration patterns -- troubleshoot complex workflow logic - -(This doc applies to Python SDK only) - -## Prerequisites - -- (REQUIRED) Agent or workflow created using agent-framework SDK -- (REQUIRED) Running in HTTP server mode, i.e., using `azure.ai.agentserver.agentframework` SDK. If not, wrap the agent with `from_agent_framework(agent).run_async()` and install `azure-ai-agentserver-agentframework==1.0.0b10`. 
- -## SDK Installations - -Install `debugpy` for debugging support (used by VSCode Python Debugger Extension): - -```bash -# install the latest one for better compatibility -pip install debugpy -``` - -Then, for HTTP server mode, install `agent-dev-cli` pre-release package (which introduces `agentdev` module and command): - -```bash -pip install agent-dev-cli --pre -``` - -More `agentdev` usages: -```bash -# Run script with agentdev instrumentation -agentdev run my_agent.py -# Specify a custom port -agentdev run my_agent.py --port 9000 -# Enable verbose output -agentdev run my_agent.py --verbose -# Pass arguments to script -agentdev run my_agent.py -- --server-mode --model ... -``` - -## Launch Command - -The agent/workflow could be launched in either HTTP server mode or CLI mode, depending on the code implementation. To work with VSCode Python Debugger, need to wrap via `debugpy` module. - -(Important) By default use the HTTP server mode with `agentdev` for full features. If the agent/workflow code supports CLI mode, could also launch in CLI mode for simpler debugging. - -```bash -# HTTP server mode sample launch command -python .py --server - -# Wrapped with debugpy and agentdev -python -m debugpy --listen 127.0.0.1:5679 -m agentdev run .py --verbose --port 8087 -- --server - -# CLI mode sample launch command -python .py --cli - -# Wrapped with debugpy only -python -m debugpy --listen 127.0.0.1:5679 .py --cli -``` - -## Example - -Example configuration files for VSCode to enable debugging support. - -### tasks.json - -Run agent with debugging enabled. Note - no need to install dependencies via task (users may have their own python env). - -```json -{ - "version": "2.0.0", - "tasks": [ - { - "label": "Validate prerequisites", - // AI Toolkit built-in task to check port occupancy. 
Fixed type and command names, fixed args schema, but port numbers list can be customized - "type": "aitk", - "command": "debug-check-prerequisites", - "args": { - "portOccupancy": [5679, 8087] - } - }, - { - // preferred - run agent as HTTP server - "label": "Run Agent/Workflow HTTP Server", - "type": "shell", - // use `${command:python.interpreterPath}` to point to current user's python env - "command": "${command:python.interpreterPath} -m debugpy --listen 127.0.0.1:5679 -m agentdev run --verbose --port 8087 -- --server", - "isBackground": true, - "options": { "cwd": "${workspaceFolder}" }, - "dependsOn": ["Validate prerequisites"], - // problem matcher to capture server startup - "problemMatcher": { - "pattern": [{ "regexp": "^.*$", "file": 0, "location": 1, "message": 2 }], - "background": { - "activeOnStart": true, - "beginsPattern": ".*", - // the fixed pattern for agent hosting startup - "endsPattern": "Application startup complete|running on|Started server process" - } - } - }, - { - // for HTTP server mode - open the inspector after server is up - "label": "Open Agent Inspector", - "type": "shell", - // the fixed command to open the inspector with port specified by arguments - "command": "echo '${input:openAgentInspector}'", - "presentation": { "reveal": "never" }, - "dependsOn": ["Run Agent/Workflow HTTP Server"] - }, - { - // alternative - run agent in CLI mode - "label": "Run Agent/Workflow in Terminal", - "type": "shell", - // use `${command:python.interpreterPath}` to point to current user's python env - "command": "${command:python.interpreterPath} -m debugpy --listen 127.0.0.1:5679 --cli", - "isBackground": true, - "options": { "cwd": "${workspaceFolder}" }, - // problem matcher to capture startup - "problemMatcher": { - "pattern": [{ "regexp": "^.*$", "file": 0, "location": 1, "message": 2 }], - "background": { - "activeOnStart": true, - "beginsPattern": ".*", - // the pattern for startup - "endsPattern": "Application startup complete|running 
on|Started server process" - } - } - }, - { - // util task for gracefully terminating - "label": "Terminate All Tasks", - "command": "echo ${input:terminate}", - "type": "shell", - "problemMatcher": [] - } - ], - "inputs": [ - { - "id": "openAgentInspector", - "type": "command", - "command": "ai-mlstudio.openTestTool", - // This port should match exactly the "--port" argument of the `agentdev` module - "args": { "triggeredFrom": "tasks", "port": 8087 } - }, - { - "id": "terminate", - "type": "command", - "command": "workbench.action.tasks.terminate", - "args": "terminateAll" - } - ] -} -``` - -### launch.json - -Attach debugger to the running agent/workflow. - -```json -{ - "version": "0.2.0", - "configurations": [ - { - // preferred - debug HTTP server mode - "name": "Debug Local Agent/Workflow HTTP Server", - "type": "debugpy", - "request": "attach", - "connect": { - "host": "localhost", - // the same debugpy port as in tasks.json - "port": 5679 - }, - // run the tasks before launching debugger - "preLaunchTask": "Open Agent Inspector", - "internalConsoleOptions": "neverOpen", - // terminate all tasks after debugging session ends - "postDebugTask": "Terminate All Tasks" - }, - { - // alternative - debug CLI mode - "name": "Debug Local Agent/Workflow in Terminal", - "type": "debugpy", - "request": "attach", - "connect": { - "host": "localhost", - // the same debugpy port as in tasks.json - "port": 5679 - }, - // run the tasks before launching debugger - "preLaunchTask": "Run Agent/Workflow in Terminal", - "internalConsoleOptions": "neverOpen", - // terminate all tasks after debugging session ends - "postDebugTask": "Terminate All Tasks" - } - ] -} -``` diff --git a/.github/plugins/azure-skills/skills/microsoft-foundry/agent/create/agent-framework/references/workflow-agents.md b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/create/agent-framework/references/workflow-agents.md deleted file mode 100644 index c53f623d..00000000 --- 
a/.github/plugins/azure-skills/skills/microsoft-foundry/agent/create/agent-framework/references/workflow-agents.md +++ /dev/null @@ -1,75 +0,0 @@ -# Workflow with Agents and Streaming - -Wrap chat agents (via `AzureAIClient`) inside workflow executors and consume streaming events. Use this when building workflows where each node is backed by an AI agent. - -> 💡 **Tip:** Use `DefaultAzureCredential` from `azure.identity.aio` (not `azure.identity`) — `AzureAIClient` requires async credentials. - -## Pattern: Writer → Reviewer Pipeline - -A Writer agent generates content, then a Reviewer agent finalizes the result. Uses `run_stream` to observe events in real-time. - -```python -from agent_framework import ( - ChatAgent, ChatMessage, Executor, ExecutorFailedEvent, - WorkflowBuilder, WorkflowContext, WorkflowFailedEvent, - WorkflowOutputEvent, WorkflowRunState, WorkflowStatusEvent, handler, -) -from agent_framework.azure import AzureAIClient -from azure.identity.aio import DefaultAzureCredential -from typing_extensions import Never - -class Writer(Executor): - agent: ChatAgent - - def __init__(self, client: AzureAIClient, id: str = "writer"): - self.agent = client.create_agent( - name="ContentWriterAgent", - instructions="You are an excellent content writer.", - ) - super().__init__(id=id) - - @handler - async def handle(self, message: ChatMessage, ctx: WorkflowContext[list[ChatMessage]]) -> None: - messages: list[ChatMessage] = [message] - response = await self.agent.run(messages) - messages.extend(response.messages) - await ctx.send_message(messages) - -class Reviewer(Executor): - agent: ChatAgent - - def __init__(self, client: AzureAIClient, id: str = "reviewer"): - self.agent = client.create_agent( - name="ContentReviewerAgent", - instructions="You are an excellent content reviewer.", - ) - super().__init__(id=id) - - @handler - async def handle(self, messages: list[ChatMessage], ctx: WorkflowContext[Never, str]) -> None: - response = await self.agent.run(messages) 
- await ctx.yield_output(response.text) - -async def main(): - client = AzureAIClient(credential=DefaultAzureCredential()) - writer = Writer(client) - reviewer = Reviewer(client) - workflow = WorkflowBuilder().set_start_executor(writer).add_edge(writer, reviewer).build() - - async for event in workflow.run_stream( - ChatMessage(role="user", text="Create a slogan for a new electric SUV.") - ): - if isinstance(event, WorkflowOutputEvent): - print(f"Output: {event.data}") - elif isinstance(event, WorkflowStatusEvent): - print(f"State: {event.state}") - elif isinstance(event, (ExecutorFailedEvent, WorkflowFailedEvent)): - print(f"Error: {event.details.message}") -``` - -Sample output: -``` -State: WorkflowRunState.IN_PROGRESS -Output: Drive the Future. Affordable Adventure, Electrified. -State: WorkflowRunState.IDLE -``` diff --git a/.github/plugins/azure-skills/skills/microsoft-foundry/agent/create/agent-framework/references/workflow-basics.md b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/create/agent-framework/references/workflow-basics.md deleted file mode 100644 index b17d88a1..00000000 --- a/.github/plugins/azure-skills/skills/microsoft-foundry/agent/create/agent-framework/references/workflow-basics.md +++ /dev/null @@ -1,56 +0,0 @@ -# Python Workflow Basics - -Executors, edges, and the WorkflowBuilder API — the foundation for all workflow patterns. - -For more patterns, SEARCH the GitHub repository (github.com/microsoft/agent-framework) to get code snippets like: Agent as Edge, Custom Agent Executor, Workflow as Agent, Reflection, Condition, Switch-Case, Fan-out/Fan-in, Loop, Human in Loop, Concurrent, etc. 
- -## Executor Node Definitions - -| Style | When to Use | Example | -|-------|-------------|---------| -| `Executor` subclass + `@handler` | Nodes needing state or lifecycle hooks | `class MyNode(Executor)` | -| `@executor` decorator on function | Simple stateless steps | `@executor(id="my_step")` | -| `AgentExecutor(agent=..., id=...)` | Wrapping an existing agent (not subclassing) | `AgentExecutor(agent=my_agent, id="a1")` | -| Agent directly | Using agent as a node | `client.create_agent(name="...", ...)` (must provide `name`) | - -## Handler Signature - -``` -(input: T, ctx: WorkflowContext[T_Out, T_W_Out]) -> None -``` - -- `T` = typed input from upstream node -- `ctx.send_message(T_Out)` → forwards to downstream nodes -- `ctx.yield_output(T_W_Out)` → yields workflow output (terminal nodes) -- `WorkflowContext[T_Out]` = shorthand for `WorkflowContext[T_Out, Never]` -- `WorkflowContext` (no params) = `WorkflowContext[Never, Never]` - -> ⚠️ **Warning:** Previous node's output type must match next node's input type — check carefully when mixing node styles. 
- -## Code Sample - -```python -from typing_extensions import Never -from agent_framework import Executor, WorkflowBuilder, WorkflowContext, executor, handler - -class UpperCase(Executor): - def __init__(self, id: str): - super().__init__(id=id) - - @handler - async def to_upper_case(self, text: str, ctx: WorkflowContext[str]) -> None: - await ctx.send_message(text.upper()) - -@executor(id="reverse_text_executor") -async def reverse_text(text: str, ctx: WorkflowContext[Never, str]) -> None: - await ctx.yield_output(text[::-1]) - -async def main(): - upper_case = UpperCase(id="upper_case_executor") - workflow = WorkflowBuilder().add_edge(upper_case, reverse_text).set_start_executor(upper_case).build() - - # run() for simplicity; run_stream() is preferred for production - events = await workflow.run("hello world") - print(events.get_outputs()) # ['DLROW OLLEH'] - print(events.get_final_state()) # WorkflowRunState.IDLE -``` diff --git a/.github/plugins/azure-skills/skills/microsoft-foundry/agent/create/agent-framework/references/workflow-foundry.md b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/create/agent-framework/references/workflow-foundry.md deleted file mode 100644 index cd73b931..00000000 --- a/.github/plugins/azure-skills/skills/microsoft-foundry/agent/create/agent-framework/references/workflow-foundry.md +++ /dev/null @@ -1,105 +0,0 @@ -# Foundry Multi-Agent Workflow - -Multi-agent loop workflow using Foundry project endpoint with `AzureAIClient`. Use this when building workflows with bidirectional edges (loops) and turn-based agent interaction. - -> ⚠️ **Warning:** Use Foundry project endpoint, NOT Azure OpenAI endpoint. Use `AzureAIClient` (v2), not legacy `AzureAIAgentClient` (v1). - -> 💡 **Tip:** Agent names: alphanumeric + hyphens, start/end alphanumeric, max 63 chars. - -## Pattern: Student-Teacher Loop - -Two Foundry agents interact in a loop with turn-based control. 
- -```python -from agent_framework import ( - AgentRunEvent, ChatAgent, ChatMessage, Executor, Role, - WorkflowBuilder, WorkflowContext, WorkflowOutputEvent, handler, -) -from agent_framework.azure import AzureAIClient -from azure.identity.aio import DefaultAzureCredential - -ENDPOINT = "" -MODEL_DEPLOYMENT_NAME = "" - -class StudentAgentExecutor(Executor): - agent: ChatAgent - - def __init__(self, agent: ChatAgent, id="student"): - self.agent = agent - super().__init__(id=id) - - @handler - async def handle_teacher_question( - self, messages: list[ChatMessage], ctx: WorkflowContext[list[ChatMessage]] - ) -> None: - response = await self.agent.run(messages) - messages.extend(response.messages) - await ctx.send_message(messages) - -class TeacherAgentExecutor(Executor): - turn_count: int = 0 - agent: ChatAgent - - def __init__(self, agent: ChatAgent, id="teacher"): - self.agent = agent - super().__init__(id=id) - - @handler - async def handle_start_message( - self, message: str, ctx: WorkflowContext[list[ChatMessage]] - ) -> None: - messages: list[ChatMessage] = [ChatMessage(Role.USER, text=message)] - response = await self.agent.run(messages) - messages.extend(response.messages) - await ctx.send_message(messages) - - @handler - async def handle_student_answer( - self, messages: list[ChatMessage], ctx: WorkflowContext[list[ChatMessage], str] - ) -> None: - self.turn_count += 1 - if self.turn_count >= 5: - await ctx.yield_output("Done!") - return - response = await self.agent.run(messages) - messages.extend(response.messages) - await ctx.send_message(messages) - -async def main(): - async with ( - DefaultAzureCredential() as credential, - AzureAIClient( - project_endpoint=ENDPOINT, - model_deployment_name=MODEL_DEPLOYMENT_NAME, - credential=credential, - ).create_agent( - name="StudentAgent", - instructions="You are Jamie, a student. 
Answer questions briefly.", - ) as student_agent, - AzureAIClient( - project_endpoint=ENDPOINT, - model_deployment_name=MODEL_DEPLOYMENT_NAME, - credential=credential, - ).create_agent( - name="TeacherAgent", - instructions="You are Dr. Smith. Ask ONE simple question at a time.", - ) as teacher_agent - ): - # Use factories for cleaner state management in production - workflow = ( - WorkflowBuilder() - .register_executor(lambda: StudentAgentExecutor(student_agent), name="Student") - .register_executor(lambda: TeacherAgentExecutor(teacher_agent), name="Teacher") - .add_edge("Student", "Teacher") - .add_edge("Teacher", "Student") - .set_start_executor("Teacher") - .build() - ) - - async for event in workflow.run_stream("Start the quiz session."): - if isinstance(event, AgentRunEvent): - print(f"\n{event.executor_id}: {event.data}") - elif isinstance(event, WorkflowOutputEvent): - print(f"\nDone: {event.data}") - break -``` diff --git a/.github/plugins/azure-skills/skills/microsoft-foundry/agent/create/create-prompt.md b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/create/create-prompt.md new file mode 100644 index 00000000..46fa9a3e --- /dev/null +++ b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/create/create-prompt.md @@ -0,0 +1,89 @@ +# Create Prompt Agent + +Create and manage prompt agents in Azure Foundry Agent Service using MCP tools or Python SDK. For hosted agents (container-based), see [create.md](create.md). 
+ +## Quick Reference + +| Property | Value | +|----------|-------| +| **Agent Type** | Prompt (`kind: "prompt"`) | +| **Primary Tool** | Foundry MCP server (`foundry_agents_*`) | +| **Fallback SDK** | `azure-ai-projects` v2.x preview | +| **Auth** | `DefaultAzureCredential` / `az login` | + +## Workflow + +``` +User Request (create/list/get/update/delete agent) + │ + ▼ +Step 1: Resolve project context (endpoint + credentials) + │ + ▼ +Step 2: Try MCP tool for the operation + │ ├─ ✅ MCP available → Execute via MCP tool → Done + │ └─ ❌ MCP unavailable → Continue to Step 3 + │ + ▼ +Step 3: Fall back to SDK + │ Read references/sdk-operations.md for code + │ + ▼ +Step 4: Execute and confirm result +``` + +### Step 1: Resolve Project Context + +The user needs a Foundry project endpoint. Check for: + +1. `PROJECT_ENDPOINT` environment variable +2. Ask the user for their project endpoint +3. Use `foundry_resource_get` MCP tool to discover it + +Endpoint format: `https://.services.ai.azure.com/api/projects/` + +### Step 2: Create Agent (MCP — Preferred) + +For a **prompt agent**: +- Provide: agent name, model deployment name, instructions +- Optional: tools (code interpreter, file search, function calling, web search, Bing grounding, memory) + +For a **workflow**: +- Workflows are created in the Foundry portal visual builder +- Use MCP to create the individual agents that participate in the workflow +- Direct the user to the Foundry portal for workflow assembly + +### Step 3: SDK Fallback + +If MCP tools are unavailable, use the `azure-ai-projects` SDK: +- See [SDK Operations](references/sdk-operations.md) for create, list, update, delete code samples +- See [Agent Tools](references/agent-tools.md) for adding tools to agents + +### Step 4: Add Tools (Optional) + +> ⚠️ **MANDATORY:** Before configuring any tool, **read its reference documentation** linked below to understand prerequisites, required parameters, and setup steps. 
Do not attempt to add a tool without first reviewing its reference. + +| Tool Category | Reference | +|---------------|-----------| +| Code Interpreter, Function Calling | [Simple Tools](references/agent-tools.md) | +| File Search (requires vector store) | [File Search](references/tool-file-search.md) | +| Web Search (default, no setup needed) | [Web Search](references/tool-web-search.md) | +| Bing Grounding (explicit request only) | [Bing Grounding](references/tool-bing-grounding.md) | +| Azure AI Search (private data) | [Azure AI Search](references/tool-azure-ai-search.md) | +| MCP Servers | [MCP Tool](references/tool-mcp.md) | +| Memory (persistent across sessions) | [Memory](references/tool-memory.md) | +| Connections (for tools that need them) | [Project Connections](../../project/connections.md) | + +> ⚠️ **Web Search Default:** Use `WebSearchPreviewTool` for web search. Only use `BingGroundingAgentTool` when the user explicitly requests Bing Grounding. + +## Error Handling + +| Error | Cause | Resolution | +|-------|-------|------------| +| Agent creation fails | Missing model deployment | Deploy a model first via `foundry_models_deploy` or portal | +| MCP tool not found | MCP server not running | Fall back to SDK — see [SDK Operations](references/sdk-operations.md) | +| Permission denied | Insufficient RBAC | Need `Azure AI User` role on the project | +| Agent name conflict | Name already exists | Use a unique name or update the existing agent | +| Tool not available | Tool not configured for project | Verify tool prerequisites (e.g., Bing resource for grounding) | +| SDK version mismatch | Using 1.x instead of 2.x | Install `azure-ai-projects --pre` for v2.x preview | +| Tenant mismatch | MCP token tenant differs from resource tenant | Fall back to SDK — `DefaultAzureCredential` resolves the correct tenant | diff --git a/.github/plugins/azure-skills/skills/microsoft-foundry/agent/create/create.md 
b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/create/create.md new file mode 100644 index 00000000..1f9042e5 --- /dev/null +++ b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/create/create.md @@ -0,0 +1,239 @@ +# Create Hosted Agent Application + +Create new hosted agent applications for Microsoft Foundry, or convert existing agent projects to be Foundry-compatible using the hosting adapter. + +## Quick Reference + +| Property | Value | +|----------|-------| +| **Samples Repo** | `microsoft-foundry/foundry-samples` | +| **Python Samples** | `samples/python/hosted-agents/{framework}/` | +| **C# Samples** | `samples/csharp/hosted-agents/{framework}/` | +| **Hosted Agents Docs** | https://learn.microsoft.com/azure/ai-foundry/agents/concepts/hosted-agents | +| **Best For** | Creating new or converting existing agent projects for Foundry | + +## When to Use This Skill + +- Create a new hosted agent application from scratch (greenfield) +- Start from an official sample and customize it +- Convert an existing agent project to be Foundry-compatible (brownfield) +- Help user choose a framework or sample for their agent + +## Workflow + +### Step 1: Determine Scenario + +Check the user's workspace for existing agent project indicators: + +- **No agent-related code found** → **Greenfield**. Proceed to Greenfield Workflow (Step 2). +- **Existing agent code present** → **Brownfield**. Proceed to Brownfield Workflow. + +### Step 2: Gather Requirements (Greenfield) + +If the user hasn't already specified, use `ask_user` to collect: + +**Framework:** + +| Framework | Python Path | C# Path | +|-----------|------------|---------| +| Microsoft Agent Framework (default) | `agent-framework` | `AgentFramework` | +| LangGraph | `langgraph` | ❌ Python only | +| Custom | `custom` | `AgentWithCustomFramework` | + +**Language:** Python (default) or C#. + +> ⚠️ **Warning:** LangGraph is Python-only. For C# + LangGraph, suggest Agent Framework or Custom instead. 
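The framework/language matrix above can be encoded as a small lookup. A minimal sketch, assuming the sample paths implied by the table; the helper name and error message are illustrative, not part of any SDK:

```python
# Map (framework, language) -> sample directory, per the table above.
SAMPLE_PATHS = {
    ("agent-framework", "python"): "samples/python/hosted-agents/agent-framework",
    ("agent-framework", "csharp"): "samples/csharp/hosted-agents/AgentFramework",
    ("langgraph", "python"): "samples/python/hosted-agents/langgraph",
    ("custom", "python"): "samples/python/hosted-agents/custom",
    ("custom", "csharp"): "samples/csharp/hosted-agents/AgentWithCustomFramework",
}

def resolve_sample_path(framework: str, language: str) -> str:
    """Return the sample directory for a framework/language pair,
    enforcing that LangGraph is Python-only."""
    try:
        return SAMPLE_PATHS[(framework, language)]
    except KeyError:
        raise ValueError(
            f"No {language} samples for {framework}; "
            "for C# consider Agent Framework or Custom instead."
        )

print(resolve_sample_path("langgraph", "python"))
```

Asking for the unsupported C# + LangGraph pair raises the error instead of silently picking a path, which matches the warning above.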
+
+If the user has no specific preference, suggest Microsoft Agent Framework + Python as defaults.
+
+### Step 3: Browse and Select Sample
+
+List available samples using the GitHub API:
+
+```
+GET https://api.github.com/repos/microsoft-foundry/foundry-samples/contents/samples/{language}/hosted-agents/{framework}
+```
+
+If the user has given any indication of what they want their agent to do, choose the most relevant or simplest sample to start with. Only if the user has not given any preferences should you present the sample directories and help them choose based on their requirements (e.g., RAG, tools, multi-agent workflows, HITL).
+
+### Step 4: Download Sample Files
+
+Download only the selected sample directory — do NOT clone the entire repo. Preserve the directory structure by creating subdirectories as needed.
+
+**Using `gh` CLI (preferred if available):**
+```bash
+gh api repos/microsoft-foundry/foundry-samples/contents/samples/{language}/hosted-agents/{framework}/{sample} \
+  --jq '.[] | select(.type=="file") | .download_url' | while read url; do
+  filepath="${url##*/samples/{language}/hosted-agents/{framework}/{sample}/}"
+  mkdir -p "$(dirname "$filepath")"
+  curl -sL "$url" -o "$filepath"
+done
+```
+
+**Using curl (fallback):**
+```bash
+curl -s "https://api.github.com/repos/microsoft-foundry/foundry-samples/contents/samples/{language}/hosted-agents/{framework}/{sample}" | \
+  jq -r '.[] | select(.type=="file") | .path + "\t" + .download_url' | while IFS=$'\t' read path url; do
+    relpath="${path#samples/{language}/hosted-agents/{framework}/{sample}/}"
+    mkdir -p "$(dirname "$relpath")"
+    curl -sL "$url" -o "$relpath"
+  done
+```
+
+For nested directories, recursively fetch the GitHub contents API for entries where `type == "dir"` and repeat the download for each.
+
+### Step 5: Customize and Implement
+
+1. Read the sample's README.md to understand its structure
+2. Read the sample code to understand the patterns and dependencies used
+3.
If using Agent Framework, follow the best practices in [references/agentframework.md](references/agentframework.md)
+4. Implement the user's specific requirements on top of the sample
+5. Update configuration (`.env`, dependency files) as needed
+6. Ensure the project is in a runnable state
+
+### Step 6: Verify Startup
+
+1. Install dependencies (use a virtual environment for Python)
+2. If placeholders were used for `.env` variables, ask the user to provide real values via the `ask_user` tool
+3. Run the main entrypoint
+4. Fix startup errors and retry if needed
+5. Send a test request to the agent. The agent supports the OpenAI Responses schema.
+6. Fix any errors from the test request and retry until it succeeds
+7. Once startup and the test request succeed, stop the server to avoid consuming resources
+
+**Guardrails:**
+- ✅ Perform a real run to catch startup errors
+- ✅ Clean up after verification (stop the server)
+- ✅ Ignore auth/connection/timeout errors (expected without Azure config)
+- ❌ Don't wait for user input or create test scripts
+
+## Brownfield Workflow: Convert Existing Agent to Hosted Agent
+
+Use this workflow when the user has an existing agent project that needs to be made compatible with Foundry hosted agent deployment. The key requirement is wrapping the agent with the appropriate **hosting adapter** package, which converts any agent into an HTTP service compatible with the Foundry Responses API.
+
+### Step B1: Analyze Existing Project
+
+Scan the project to determine:
+
+1. **Language** — Python (look for `requirements.txt`, `pyproject.toml`, `*.py`) or C# (look for `*.csproj`, `*.cs`)
+2. **Framework** — Identify which agent framework is in use:
+
+| Indicator | Framework |
+|-----------|-----------|
+| Imports from `agent_framework` or `Microsoft.Agents.AI` | Microsoft Agent Framework |
+| Imports from `langgraph`, `langchain` | LangGraph |
+| No recognized framework imports, or other frameworks (e.g., Semantic Kernel, AutoGen) | Custom |
+
+3.
**Entry point** — Identify the main script/entrypoint that creates and runs the agent +4. **Agent object** — Identify the agent instance that needs to be wrapped (e.g., a `BaseAgent` subclass, a compiled `StateGraph`, or an existing server/app) + +### Step B2: Add Hosting Adapter Dependency + +Add the correct adapter package based on framework and language. Get the latest version from the package registry — do not hardcode versions. + +**Python adapter packages:** + +| Framework | Package | +|-----------|---------| +| Microsoft Agent Framework | `azure-ai-agentserver-agentframework` | +| LangGraph | `azure-ai-agentserver-langgraph` | +| Custom | `azure-ai-agentserver-core` | + +**.NET adapter packages:** + +| Framework | Package | +|-----------|---------| +| Microsoft Agent Framework | `Azure.AI.AgentServer.AgentFramework` | +| Custom | `Azure.AI.AgentServer.Core` | + +Add the package to the project's dependency file (`requirements.txt`, `pyproject.toml`, or `.csproj`). For Python, also add `python-dotenv` if not present. + +### Step B3: Wrap Agent with Hosting Adapter + +Modify the project's main entrypoint to wrap the existing agent with the adapter. 
The approach differs by framework:
+
+**Microsoft Agent Framework (Python):**
+- Import `from_agent_framework` from the adapter package
+- Pass the agent instance (a `BaseAgent` subclass) to the adapter
+- Call `.run()` on the adapter as the default entrypoint
+- The agent must implement both `run()` and `run_stream()` methods
+
+**LangGraph (Python):**
+- Import `from_langgraph` from the adapter package
+- Pass the compiled `StateGraph` to the adapter
+- Call `.run()` on the adapter as the default entrypoint
+
+**Custom code (Python):**
+- Import `FoundryCBAgent` from the core adapter package
+- Create a class that extends `FoundryCBAgent`
+- Implement the `agent_run()` method which receives an `AgentRunContext` and returns either an `OpenAIResponse` (non-streaming) or `AsyncGenerator[ResponseStreamEvent]` (streaming)
+- The agent must handle the Foundry request/response protocol manually — refer to the [custom sample](https://github.com/microsoft-foundry/foundry-samples/tree/main/samples/python/hosted-agents/custom) for the exact interface
+- Instantiate and call `.run()` as the default entrypoint
+
+**Custom code (C#):**
+- Use `AgentServerApplication.RunAsync()` with dependency injection to register an `IAgentInvocation` implementation
+- Refer to the [C# custom sample](https://github.com/microsoft-foundry/foundry-samples/tree/main/samples/csharp/hosted-agents/AgentWithCustomFramework) for the exact interface
+
+> ⚠️ **Warning:** The adapter MUST be the default entrypoint (no flags required to start). This is required for both local debugging and containerized deployment.
+
+### Step B4: Configure Environment
+
+1. Create or update a `.env` file with required environment variables (project endpoint, model deployment name, etc.)
+2. For Python: ensure the code uses `load_dotenv(override=False)` so environment variables injected by Foundry at runtime are available and take precedence over local `.env` values.
+3.
If the project uses Azure credentials: ensure Python uses `azure.identity.aio.DefaultAzureCredential` (async version) for **local development**, not `azure.identity.DefaultAzureCredential`. In production, use `ManagedIdentityCredential`. See [auth-best-practices.md](../../references/auth-best-practices.md) + +### Step B5: Create agent.yaml + +Create an `agent.yaml` file in the project root. This file defines the agent's metadata and deployment configuration for Foundry. Required fields: + +- `name` — Unique identifier (alphanumeric + hyphens, max 63 chars) +- `description` — What the agent does +- `template.kind` — Must be `hosted` +- `template.protocols` — Must include `responses` protocol v1 +- `template.environment_variables` — List all environment variables the agent needs at runtime + +Refer to any sample's `agent.yaml` in the [foundry-samples repo](https://github.com/microsoft-foundry/foundry-samples/tree/main/samples/python/hosted-agents) for the exact schema. + +### Step B6: Create Dockerfile + +Create a `Dockerfile` if one doesn't exist. Requirements: + +- Base image appropriate for the language (e.g., `python:3.12-slim` for Python, `mcr.microsoft.com/dotnet/sdk` for C#) +- Copy source code into the container +- Install dependencies +- Expose port **8088** (the adapter's default port) +- Set the main entrypoint as the CMD + +> ⚠️ **Warning:** When building, MUST use `--platform linux/amd64`. Hosted agents run on Linux AMD64 infrastructure. Images built for other architectures (e.g., ARM64 on Apple Silicon) will fail. + +Refer to any sample's `Dockerfile` in the [foundry-samples repo](https://github.com/microsoft-foundry/foundry-samples/tree/main/samples/python/hosted-agents) for the exact pattern. + +### Step B7: Test Locally + +1. Install dependencies (use virtual environment for Python) +2. Run the main entrypoint — the adapter should start an HTTP server on `localhost:8088` +3. 
Send a test request: `POST http://localhost:8088/responses` with body `{"input": "hello"}` +4. Verify the response follows the OpenAI Responses API format +5. Fix any errors and retry until the test request succeeds +6. Stop the server + +> 💡 **Tip:** If auth/connection errors occur for Azure services, that's expected without real Azure credentials configured. The key validation is that the HTTP server starts and accepts requests. + +## Common Guidelines + +IMPORTANT: YOU MUST FOLLOW THESE. + +Apply these to both greenfield and brownfield projects: + +1. **Logging** — Implement proper logging using the language's standard logging framework (Python `logging` module, .NET `ILogger`). Hosted agents stream container stdout/stderr logs to Foundry, so all log output is visible via the troubleshoot workflow. Use structured log levels (INFO, WARNING, ERROR) and include context like request IDs and agent names. + +2. **Framework-specific best practices** — When using Agent Framework, read the [Agent Framework best practices](references/agentframework.md) for hosting adapter setup, credential patterns, and debugging guidance. 
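The logging guideline above can be sketched with Python's standard `logging` module. The logger name, request id, and placeholder agent logic are illustrative assumptions:

```python
import logging

# Hosted agents stream container stdout/stderr to Foundry, so plain
# console logging with structured levels is enough for the troubleshoot
# workflow.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s %(message)s",
)
log = logging.getLogger("my-hosted-agent")  # illustrative agent name

def handle_request(request_id: str, user_input: str) -> str:
    # Include context such as the request id in every log line.
    log.info("request %s received", request_id)
    try:
        reply = user_input.upper()  # stand-in for real agent logic
        log.info("request %s completed", request_id)
        return reply
    except Exception:
        log.exception("request %s failed", request_id)
        raise
```

`log.exception` records the stack trace at ERROR level, so failed requests are diagnosable from the streamed container logs alone.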
+ +## Error Handling + +| Error | Cause | Resolution | +|-------|-------|------------| +| GitHub API rate limit | Too many requests | Authenticate with `gh auth login` | +| `gh` not available | CLI not installed | Use curl REST API fallback | +| Sample not found | Path changed in repo | List parent directory to discover current samples | +| Dependency install fails | Version conflicts | Use versions from sample's own dependency file | diff --git a/.github/plugins/azure-skills/skills/microsoft-foundry/agent/create/references/agent-tools.md b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/create/references/agent-tools.md new file mode 100644 index 00000000..ce0eb5c8 --- /dev/null +++ b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/create/references/agent-tools.md @@ -0,0 +1,45 @@ +# Agent Tools — Simple Tools + +Add tools to agents to extend capabilities. This file covers tools that work without external connections. For tools requiring connections/RBAC setup, see: +- [Web Search tool](tool-web-search.md) — real-time public web search with citations (default for web search) +- [Bing Grounding tool](tool-bing-grounding.md) — web search via dedicated Bing resource (only when explicitly requested) +- [Azure AI Search tool](tool-azure-ai-search.md) — private data grounding with vector search +- [MCP tool](tool-mcp.md) — remote Model Context Protocol servers + +## Code Interpreter + +Enables agents to write and run Python in a sandboxed environment. Supports data analysis, chart generation, and file processing. Has [additional charges](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/) beyond token-based fees. + +> Sessions: 1-hour active / 30-min idle timeout. Each conversation = separate billable session. 
+ +For code samples, see: [Code Interpreter tool documentation](https://learn.microsoft.com/azure/ai-foundry/agents/how-to/tools/code-interpreter?view=foundry) + +## Function Calling + +Define custom functions the agent can invoke. Your app executes the function and returns results. Runs expire 10 minutes after creation — return tool outputs promptly. + +> **Security:** Treat tool arguments as untrusted input. Don't pass secrets in tool output. Use `strict=True` for schema validation. + +For code samples, see: [Function Calling tool documentation](https://learn.microsoft.com/azure/ai-foundry/agents/how-to/tools/function-calling?view=foundry) + +## Tool Summary + +| Tool | Connection? | Reference | +|------|-------------|-----------| +| `CodeInterpreterTool` | No | This file | +| `FileSearchTool` | No (vector store required) | [tool-file-search.md](tool-file-search.md) | +| `FunctionTool` | No | This file | +| `WebSearchPreviewTool` | No | [tool-web-search.md](tool-web-search.md) | +| `BingGroundingAgentTool` | Yes (Bing) | [tool-bing-grounding.md](tool-bing-grounding.md) | +| `AzureAISearchAgentTool` | Yes (Search) | [tool-azure-ai-search.md](tool-azure-ai-search.md) | +| `MCPTool` | Optional | [tool-mcp.md](tool-mcp.md) | + +> ⚠️ **Default for web search:** Use `WebSearchPreviewTool` unless the user explicitly requests Bing Grounding or Bing Custom Search. + +> Combine multiple tools on one agent. The model decides which to invoke. 
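As a sketch of the function-calling tool described above, a function definition with strict schema validation might look like the following. The function name, fields, and stubbed implementation are hypothetical; consult the linked Function Calling documentation for the exact wire format your SDK expects:

```python
import json

# Hypothetical function definition for function calling. "strict": True
# asks the model to emit arguments that validate against the schema.
get_weather_tool = {
    "type": "function",
    "name": "get_weather",  # hypothetical function
    "strict": True,
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["city", "unit"],
        "additionalProperties": False,
    },
}

# Your app executes the function when the model requests it, treating
# the model-supplied arguments as untrusted input.
def get_weather(city: str, unit: str) -> str:
    return f"22 degrees {unit} in {city}"  # stubbed result

args = json.loads('{"city": "Seattle", "unit": "celsius"}')  # model output
print(get_weather(**args))
```

Because runs expire 10 minutes after creation, the executing app should return the tool output promptly rather than batching results.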
+ +## References + +- [Tool Catalog](https://learn.microsoft.com/azure/ai-foundry/agents/concepts/tool-catalog?view=foundry) +- [Code Interpreter](https://learn.microsoft.com/azure/ai-foundry/agents/how-to/tools/code-interpreter?view=foundry) +- [Function Calling](https://learn.microsoft.com/azure/ai-foundry/agents/how-to/tools/function-calling?view=foundry) diff --git a/.github/plugins/azure-skills/skills/microsoft-foundry/agent/create/references/agentframework.md b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/create/references/agentframework.md new file mode 100644 index 00000000..51293695 --- /dev/null +++ b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/create/references/agentframework.md @@ -0,0 +1,92 @@ +# Microsoft Agent Framework — Best Practices for Hosted Agents + +Best practices when building hosted agents with Microsoft Agent Framework for deployment to Foundry Agent Service. + +## Official Resources + +| Resource | URL | +|----------|-----| +| **GitHub Repo** | https://github.com/microsoft/agent-framework | +| **MS Learn Overview** | https://learn.microsoft.com/agent-framework/overview/agent-framework-overview | +| **Quick Start** | https://learn.microsoft.com/agent-framework/tutorials/quick-start | +| **User Guide** | https://learn.microsoft.com/agent-framework/user-guide/overview | +| **Hosted Agents Concepts** | https://learn.microsoft.com/azure/ai-foundry/agents/concepts/hosted-agents | +| **Python Samples (MAF repo)** | https://github.com/microsoft/agent-framework/tree/main/python/samples | +| **.NET Samples (MAF repo)** | https://github.com/microsoft/agent-framework/tree/main/dotnet/samples | +| **PyPI** | https://pypi.org/project/agent-framework/ | +| **NuGet** | https://www.nuget.org/profiles/MicrosoftAgentFramework/ | + +## Installation + +**Python:** `pip install agent-framework --pre` (installs all sub-packages) + +**.NET:** `dotnet add package Microsoft.Agents.AI` + +> ⚠️ **Warning:** Always pin specific 
pre-release versions for reproducible builds; `--pre` installs the latest pre-release. Check the [PyPI page](https://pypi.org/project/agent-framework/) or [NuGet profile](https://www.nuget.org/profiles/MicrosoftAgentFramework/) for the current version.
+
+## Hosting Adapter
+
+Hosted agents must expose an HTTP server using the hosting adapter. This enables local testing and Foundry deployment with the same code.
+
+**Python adapter packages:** `azure-ai-agentserver-core`, `azure-ai-agentserver-agentframework`
+
+**.NET adapter packages:** `Azure.AI.AgentServer.Core`, `Azure.AI.AgentServer.AgentFramework`
+
+The adapter handles protocol translation between Foundry request/response formats and your framework's native data structures, including conversation management, message serialization, and streaming.
+
+> 💡 **Tip:** Make HTTP server mode the default entrypoint (no flags needed). This simplifies both local debugging and containerized deployment.
+
+## Key Patterns
+
+### Python: Async Credentials
+
+For **local development**, use `DefaultAzureCredential` from `azure.identity.aio` (not `azure.identity`) — `AzureAIClient` requires async credentials. In production, use `ManagedIdentityCredential` from `azure.identity.aio`. See [auth-best-practices.md](../../../references/auth-best-practices.md).
+
+### Python: Environment Variables
+
+Always use `load_dotenv(override=False)` so environment variables set by Foundry at runtime take precedence over local `.env` values.
+
+Required `.env` variables:
+- `FOUNDRY_PROJECT_ENDPOINT` — project endpoint URL
+- `FOUNDRY_MODEL_DEPLOYMENT_NAME` — model deployment name
+
+### Authentication
+
+If the user explicitly asks to authenticate with an API key instead of managed identity, use `AzureOpenAIResponsesClient` and pass it the `api_key` parameter.
+
+### Agent Naming Rules
+
+Agent names must start and end with an alphanumeric character, may contain hyphens in the middle, and are limited to 63 characters. Examples: `MyAgent`, `agent-1`. Invalid: `-agent`, `agent-`, `sample_agent`.
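The naming rules above can be checked locally with a regular expression. A minimal sketch; the pattern is derived from the stated rules, not taken from the SDK:

```python
import re

# Start/end alphanumeric, hyphens allowed only in the middle, max 63 chars:
# 1 leading char + up to 61 middle chars + 1 trailing char.
AGENT_NAME = re.compile(r"^[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?$")

def is_valid_agent_name(name: str) -> bool:
    return bool(AGENT_NAME.fullmatch(name))

for name in ["MyAgent", "agent-1", "-agent", "agent-", "sample_agent"]:
    print(name, is_valid_agent_name(name))
```

Validating before the create call turns a server-side naming error into an immediate local failure.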
+ +### Python: Virtual Environment + +Always use a virtual environment. Never use bare `python` or `pip` — use venv-activated versions or full paths (e.g., `.venv/bin/pip`). + +## Workflow Patterns + +Agent Framework supports single-agent and multi-agent workflow patterns using graph-based orchestration: + +- **Single Agent** — Basic agent with tools, RAG, or MCP integration +- **Multi-Agent Workflow** — Graph-based orchestration connecting multiple agents and deterministic functions +- **Advanced Patterns** — Reflection, switch-case, fan-out/fan-in, loop, human-in-the-loop + +For workflow samples and advanced patterns, search the [Agent Framework GitHub repo](https://github.com/microsoft/agent-framework). + +## Debugging + +Use [AI Toolkit for VS Code](https://marketplace.visualstudio.com/items?itemName=ms-windows-ai-studio.windows-ai-studio) with the `agentdev` CLI tool for interactive debugging: + +1. Install `debugpy` for VS Code Python Debugger support +2. Install `agent-dev-cli` (pre-release) for the `agentdev` command +3. Key debug tasks: `agentdev run .py --port 8087` starts the agent HTTP server, `debugpy --listen 127.0.0.1:5679` attaches the debugger, and the `ai-mlstudio.openTestTool` VS Code command opens the Agent Inspector UI + +For VS Code `launch.json` and `tasks.json` configuration templates, see [AI Toolkit Agent Inspector — Configure debugging manually](https://github.com/microsoft/vscode-ai-toolkit/blob/main/doc/agent-test-tool.md#configure-debugging-manually). 
+ +## Common Errors + +| Error | Cause | Fix | +|-------|-------|-----| +| `ModuleNotFoundError` | Missing SDK | `pip install agent-framework --pre` in venv | +| Async credential error | Wrong import | Use `azure.identity.aio.DefaultAzureCredential` (local dev) or `azure.identity.aio.ManagedIdentityCredential` (production) | +| Agent name validation error | Invalid characters | Use alphanumeric + hyphens, start/end alphanumeric, max 63 chars | +| Hosting adapter not found | Missing package | Install `azure-ai-agentserver-agentframework` | diff --git a/.github/plugins/azure-skills/skills/microsoft-foundry/agent/create/references/sdk-operations.md b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/create/references/sdk-operations.md new file mode 100644 index 00000000..e84cccc5 --- /dev/null +++ b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/create/references/sdk-operations.md @@ -0,0 +1,47 @@ +# SDK Operations for Foundry Agent Service + +Use the Foundry MCP tools for agent CRUD operations. When MCP tools are unavailable, use the `azure-ai-projects` Python SDK or REST API. 
+ +## Agent Operations via MCP + +| Operation | MCP Tool | Description | +|-----------|----------|-------------| +| Create/Update agent | `agent_update` | Create a new agent or update an existing one (creates new version) | +| List/Get agents | `agent_get` | List all agents, or get a specific agent by name | +| Delete agent | `agent_delete` | Delete an agent | +| Invoke agent | `agent_invoke` | Send a message to an agent and get a response | +| Get schema | `agent_definition_schema_get` | Get the full JSON schema for agent definitions | + +## SDK Agent Operations + +When MCP tools are unavailable, use the `azure-ai-projects` Python SDK (`pip install azure-ai-projects --pre`): + +```python +from azure.ai.projects import AIProjectClient +from azure.identity import DefaultAzureCredential + +endpoint = "https://.services.ai.azure.com/api/projects/" +client = AIProjectClient(endpoint=endpoint, credential=DefaultAzureCredential()) +``` + +| Operation | SDK Method | +|-----------|------------| +| Create | `client.agents.create_version(agent_name, definition)` | +| List | `client.agents.list()` | +| Get | `client.agents.get(agent_name)` | +| Update | `client.agents.create_version(agent_name, definition)` (creates new version) | +| Delete | `client.agents.delete(agent_name)` | +| Chat | `client.get_openai_client().responses.create(model=, input=, extra_body={"agent": {"name": agent_name, "type": "agent_reference"}})` | + +## Environment Variables + +| Variable | Description | +|----------|-------------| +| `PROJECT_ENDPOINT` | Foundry project endpoint (`https://.services.ai.azure.com/api/projects/`) | +| `MODEL_DEPLOYMENT_NAME` | Deployed model name (e.g., `gpt-4.1-mini`) | + +## References + +- [Agent quickstart](https://learn.microsoft.com/azure/ai-foundry/agents/quickstart?view=foundry) +- [Create agents](https://learn.microsoft.com/azure/ai-foundry/agents/how-to/create-agent?view=foundry) +- [Tool 
Catalog](https://learn.microsoft.com/azure/ai-foundry/agents/concepts/tool-catalog?view=foundry) diff --git a/.github/plugins/azure-skills/skills/microsoft-foundry/agent/create/references/tool-azure-ai-search.md b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/create/references/tool-azure-ai-search.md new file mode 100644 index 00000000..9859e81c --- /dev/null +++ b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/create/references/tool-azure-ai-search.md @@ -0,0 +1,69 @@ +# Azure AI Search Tool + +Ground agent responses with data from an Azure AI Search vector index. Requires a project connection and proper RBAC setup. + +## Prerequisites + +- Azure AI Search index with vector search configured: + - One or more `Edm.String` fields (searchable + retrievable) + - One or more `Collection(Edm.Single)` vector fields (searchable) + - At least one retrievable text field with content for citations + - A retrievable field with source URL for citation links +- A [project connection](../../../project/connections.md) between your Foundry project and search service +- `azure-ai-projects` package (`pip install azure-ai-projects --pre`) + +## Required RBAC Roles + +For **keyless authentication** (recommended), assign these roles to the **Foundry project's managed identity** on the Azure AI Search resource: + +| Role | Scope | Purpose | +|------|-------|---------| +| **Search Index Data Contributor** | AI Search resource | Read/write index data | +| **Search Service Contributor** | AI Search resource | Manage search service config | + +> **If RBAC assignment fails:** Ask the user to manually assign roles in Azure portal → AI Search resource → Access control (IAM). They need Owner or User Access Administrator on the search resource. + +## Connection Setup + +A project connection between your Foundry project and the Azure AI Search resource is required. 
See [Project Connections](../../../project/connections.md) for connection management via Foundry MCP tools. + +## Query Types + +| Value | Description | +|-------|-------------| +| `SIMPLE` | Keyword search | +| `VECTOR` | Vector similarity only | +| `SEMANTIC` | Semantic ranking | +| `VECTOR_SIMPLE_HYBRID` | Vector + keyword | +| `VECTOR_SEMANTIC_HYBRID` | Vector + keyword + semantic (default, recommended) | + +## Tool Parameters + +| Parameter | Required | Description | +|-----------|----------|-------------| +| `project_connection_id` | Yes | Connection ID (resolve via `foundry_connections_get`) | +| `index_name` | Yes | Search index name | +| `top_k` | No | Number of results (default: 5) | +| `query_type` | No | Search type (default: `vector_semantic_hybrid`) | +| `filter` | No | OData filter applied to all queries | + +## Limitations + +- Only **one index per tool** instance. For multiple indexes, use connected agents each with their own index. +- Search resource and Foundry agent must be in the **same tenant**. +- Private AI Search resources require **standard agent deployment** with vNET injection. 
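The tool parameters above can be assembled into a tool entry for the agent definition. A minimal sketch, assuming an `azure_ai_search` tool type and a flat parameter layout; validate the real shape with `agent_definition_schema_get` before creating the agent:

```python
# Hypothetical sketch: assembles the parameters from the table above into a
# tool entry. The "azure_ai_search" type name, flat layout, and placeholder
# values are assumptions; validate against agent_definition_schema_get.
azure_ai_search_tool = {
    "type": "azure_ai_search",
    "project_connection_id": "<connection-id>",  # resolve via foundry_connections_get
    "index_name": "docs-index",                  # placeholder index name
    "top_k": 5,                                  # default: 5
    "query_type": "vector_semantic_hybrid",      # default, recommended
    "filter": "category eq 'manuals'",           # optional OData filter
}
```

Remember that each tool instance targets exactly one index; for multiple indexes, use connected agents, each with its own tool entry.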
+ +## Troubleshooting + +| Error | Cause | Fix | +|-------|-------|-----| +| 401/403 accessing index | Missing RBAC roles | Assign `Search Index Data Contributor` + `Search Service Contributor` to project managed identity | +| Index not found | Name mismatch | Verify `AI_SEARCH_INDEX_NAME` matches exactly (case-sensitive) | +| No citations in response | Instructions don't request them | Add citation instructions to agent prompt | +| Wrong connection endpoint | Connection points to different search resource | Re-create connection with correct endpoint | + +## References + +- [Azure AI Search tool documentation](https://learn.microsoft.com/azure/ai-foundry/agents/how-to/tools/azure-ai-search?view=foundry) +- [Tool Catalog](https://learn.microsoft.com/azure/ai-foundry/agents/concepts/tool-catalog?view=foundry) +- [Project Connections](../../../project/connections.md) diff --git a/.github/plugins/azure-skills/skills/microsoft-foundry/agent/create/references/tool-bing-grounding.md b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/create/references/tool-bing-grounding.md new file mode 100644 index 00000000..9d466cd2 --- /dev/null +++ b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/create/references/tool-bing-grounding.md @@ -0,0 +1,50 @@ +# Bing Grounding Tool + +Access real-time web information via Bing Search. Unlike the [Web Search tool](tool-web-search.md) (which works out of the box), Bing Grounding requires a dedicated Bing resource and a project connection. + +> ⚠️ **Warning:** Use the [Web Search tool](tool-web-search.md) as the default for web search. Only use Bing Grounding when the user **explicitly** requests Grounding with Bing Search or Grounding with Bing Custom Search. 
+ +## When to Use + +- User explicitly asks for "Bing Grounding" or "Grounding with Bing Search" +- User explicitly asks for "Bing Custom Search" or "Grounding with Bing Custom Search" +- User needs to restrict web search to specific domains (Bing Custom Search) +- User has an existing Bing Grounding resource they want to use + +## Prerequisites + +- A [Grounding with Bing Search resource](https://portal.azure.com/#create/Microsoft.BingGroundingSearch) in Azure portal +- `Contributor` or `Owner` role at subscription/RG level to create Bing resource and get keys +- `Azure AI Project Manager` role on the project to create a connection +- A project connection configured with the Bing resource key — see [connections](../../../project/connections.md) + +## Setup + +1. Register the Bing provider: `az provider register --namespace 'Microsoft.Bing'` +2. Create a Grounding with Bing Search resource in the Azure portal +3. Create a project connection with the Bing resource key — see [connections](../../../project/connections.md) +4. 
Set `BING_PROJECT_CONNECTION_NAME` environment variable + +## Important Disclosures + +- Bing data flows **outside Azure compliance boundary** +- Review [Grounding with Bing terms of use](https://www.microsoft.com/bing/apis/grounding-legal-enterprise) +- Not supported with VPN/Private Endpoints +- Usage incurs costs — see [pricing](https://www.microsoft.com/bing/apis/grounding-pricing) + +## Troubleshooting + +| Issue | Cause | Resolution | +|-------|-------|------------| +| Connection not found | Name mismatch or wrong project | Use `foundry_connections_list` to find correct name | +| Unauthorized creating connection | Missing Azure AI Project Manager role | Assign role on the Foundry project | +| Bing resource creation fails | Provider not registered | Run `az provider register --namespace 'Microsoft.Bing'` | +| No results returned | Connection misconfigured | Verify Bing resource key and connection setup | + +## References + +- [Bing Grounding tool documentation](https://learn.microsoft.com/azure/ai-foundry/agents/how-to/tools/bing-grounding?view=foundry) +- [Tool Catalog](https://learn.microsoft.com/azure/ai-foundry/agents/concepts/tool-catalog?view=foundry) +- [Grounding with Bing Terms](https://www.microsoft.com/bing/apis/grounding-legal-enterprise) +- [Connections Guide](../../../project/connections.md) +- [Web Search Tool (default)](tool-web-search.md) diff --git a/.github/plugins/azure-skills/skills/microsoft-foundry/agent/create/references/tool-file-search.md b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/create/references/tool-file-search.md new file mode 100644 index 00000000..159f73c8 --- /dev/null +++ b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/create/references/tool-file-search.md @@ -0,0 +1,60 @@ +# File Search Tool + +Enables agents to search through uploaded files using semantic and keyword search from vector stores. Supports a wide range of file formats including PDF, Markdown, Word, and more. 
+ +> ⚠️ **Important:** Before creating an agent with file search, you **must** read the official documentation linked in the References section to understand prerequisites, supported file types, and vector store setup. + +## Prerequisites + +- A [basic or standard agent environment](https://learn.microsoft.com/azure/ai-foundry/agents/environment-setup) +- A **vector store** must be created before the agent — the `file_search` tool requires `vector_store_ids` +- Files must be uploaded to the vector store before the agent can search them + +## Key Concepts + +| Concept | Description | +|---------|-------------| +| **Vector Store** | A container that indexes uploaded files for semantic search. Must be created first. | +| **vector_store_ids** | Required parameter on the `file_search` tool — references the vector store(s) to search. | +| **File upload** | Files are uploaded to the project, then attached to a vector store for indexing. | + +## Setup Workflow + +``` +1. Create a vector store (REST API: POST /vector_stores) + │ + ▼ +2. (Optional) Upload files and attach to vector store + │ + ▼ +3. Create agent with file_search tool referencing the vector_store_ids + │ + ▼ +4. Agent can now search files in the vector store +``` + +> ⚠️ **Warning:** Creating an agent with `file_search` without providing `vector_store_ids` will fail with a `400 BadRequest` error: `required: Required properties ["vector_store_ids"] are not present`. 
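The `400 BadRequest` above can be caught before calling the service. A minimal client-side guard, assuming the flat `vector_store_ids` placement described in Key Concepts (confirm the exact tool shape via `agent_definition_schema_get`):

```python
def validate_file_search_tool(tool: dict) -> None:
    """Mirror the service-side check: file_search requires vector_store_ids."""
    if tool.get("type") == "file_search" and not tool.get("vector_store_ids"):
        raise ValueError(
            'required: Required properties ["vector_store_ids"] are not present'
        )

# Valid: references an existing vector store (placeholder ID).
validate_file_search_tool({"type": "file_search", "vector_store_ids": ["vs_abc123"]})

# Invalid: no vector store attached; raises, mirroring the 400 BadRequest.
try:
    validate_file_search_tool({"type": "file_search"})
except ValueError as err:
    print(err)
```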
+ +## REST API Notes + +When creating vector stores via `az rest`: + +| Parameter | Value | +|-----------|-------| +| **Endpoint** | `https://.services.ai.azure.com/api/projects//vector_stores` | +| **API version** | `v1` | +| **Auth resource** | `https://ai.azure.com` | + +## Troubleshooting + +| Error | Cause | Fix | +|-------|-------|-----| +| `vector_store_ids` not present | Agent created without vector store | Create a vector store first, then pass its ID | +| 401 Unauthorized | Wrong auth resource for REST API | Use `--resource "https://ai.azure.com"` with `az rest` | +| Bad API version | Using ARM-style API version | Use `api-version=v1` for the data-plane vector store API | +| No search results | Vector store is empty | Upload files to the vector store before querying | + +## References + +- [File Search tool documentation](https://learn.microsoft.com/azure/ai-foundry/agents/how-to/tools/file-search?view=foundry&pivots=python) +- [Tool Catalog](https://learn.microsoft.com/azure/ai-foundry/agents/concepts/tool-catalog?view=foundry) diff --git a/.github/plugins/azure-skills/skills/microsoft-foundry/agent/create/references/tool-mcp.md b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/create/references/tool-mcp.md new file mode 100644 index 00000000..0a70e593 --- /dev/null +++ b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/create/references/tool-mcp.md @@ -0,0 +1,66 @@ +# MCP Tool (Model Context Protocol) + +Connect agents to remote MCP servers to extend capabilities with external tools and data sources. MCP is an open standard for LLM tool integration. 
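As an illustrative sketch, the MCPTool parameters documented below can be combined like this (the surrounding payload shape and the tool name are assumptions, not the service's confirmed schema):

```python
# Hypothetical MCP tool entry; parameter names match the MCPTool table in this
# document, but the payload shape and the example tool name are assumptions.
mcp_tool = {
    "server_label": "github",                          # unique label within the agent
    "server_url": "https://api.githubcopilot.com/mcp",
    "require_approval": "always",                      # safest default
    "allowed_tools": ["search_repositories"],          # hypothetical tool name
    # "project_connection_id": "<connection-id>",      # only for authenticated servers
}

# Per-tool overrides are also possible, e.g. skip approval for one trusted
# read-only tool while everything else still prompts:
mcp_tool_selective = dict(mcp_tool, require_approval={"never": ["search_repositories"]})
```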
+ +## Prerequisites + +- A remote MCP server endpoint (e.g., `https://api.githubcopilot.com/mcp`) +- For authenticated servers: a [project connection](../../../project/connections.md) storing credentials +- RBAC: **Contributor** or **Owner** role on the Foundry project + +## Authenticated Server Connections + +For authenticated MCP servers, create an `api_key` project connection to store credentials. Unauthenticated servers (public endpoints) don't need a connection — omit `project_connection_id`. + +See [Project Connections](../../../project/connections.md) for connection management via Foundry MCP tools. + +## MCPTool Parameters + +| Parameter | Required | Description | +|-----------|----------|-------------| +| `server_label` | Yes | Unique label for this MCP server within the agent | +| `server_url` | Yes | Remote MCP server endpoint URL | +| `require_approval` | No | `"always"` (default), `"never"`, or `{"never": ["tool1"]}` / `{"always": ["tool1"]}` | +| `allowed_tools` | No | List of specific tools to enable (default: all) | +| `project_connection_id` | No | Connection ID for authenticated servers | + +## Approval Workflow + +1. Agent sends request → MCP server returns tool calls +2. Response contains `mcp_approval_request` items +3. Your code reviews tool name + arguments +4. Submit `McpApprovalResponse` with `approve=True/False` +5. Agent completes work using approved tool results + +> **Best practice:** Always use `require_approval="always"` unless you fully trust the MCP server. Use `allowed_tools` to restrict which tools the agent can access. + +## Hosting Local MCP Servers + +Agent Service only accepts **remote** MCP endpoints. 
To use a local server, deploy it to: + +| Platform | Transport | Notes | +|----------|-----------|-------| +| [Azure Container Apps](https://github.com/Azure-Samples/mcp-container-ts) | HTTP POST/GET | Any language, container rebuild needed | +| [Azure Functions](https://github.com/Azure-Samples/mcp-sdk-functions-hosting-python) | HTTP streamable | Python/Node/.NET/Java, key-based auth | + +## Known Limitations + +- **100-second timeout** for non-streaming MCP tool calls +- **Identity passthrough not supported in Teams** — agents published to Teams use project managed identity +- **Network-secured Foundry** can't use private MCP servers in same vNET — only public endpoints + +## Troubleshooting + +| Error | Cause | Fix | +|-------|-------|-----| +| `Invalid tool schema` | `anyOf`/`allOf` in MCP server definition | Update MCP server schema to use simple types | +| `Unauthorized` / `Forbidden` | Wrong credentials in connection | Verify connection credentials match server requirements | +| Model never calls MCP tool | Misconfigured server_label/url | Check `server_label`, `server_url`, `allowed_tools` values | +| Agent stalls after approval | Missing `previous_response_id` | Include `previous_response_id` in follow-up request | +| Timeout | Server takes >100s | Optimize server-side logic or break into smaller operations | + +## References + +- [MCP tool documentation](https://learn.microsoft.com/azure/ai-foundry/agents/how-to/tools/mcp?view=foundry) +- [Tool Catalog](https://learn.microsoft.com/azure/ai-foundry/agents/concepts/tool-catalog?view=foundry) +- [Project Connections](../../../project/connections.md) diff --git a/.github/plugins/azure-skills/skills/microsoft-foundry/agent/create/references/tool-memory.md b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/create/references/tool-memory.md new file mode 100644 index 00000000..8ad90259 --- /dev/null +++ b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/create/references/tool-memory.md @@ 
-0,0 +1,109 @@ +# Agent Memory + +Managed long-term memory for Foundry agents. Enables agent continuity across sessions, devices, and workflows. Agents retain user preferences, conversation history, and deliver personalized experiences. Memory is stored in your project's owned storage. + +## Prerequisites + +- A [Foundry project](https://learn.microsoft.com/azure/ai-foundry/how-to/create-projects) with authorization configured +- A **chat model deployment** (e.g., `gpt-5.2`) +- An **embedding model deployment** (e.g., `text-embedding-3-small`) — see [Check Embedding Model](#check-embedding-model) below +- Python packages: `pip install azure-ai-projects azure-identity` + +### Check Embedding Model + +An embedding model is **required** before enabling memory. Check if one is already deployed: + +Use `foundry_models_list` MCP tool to list all deployments and look for an embedding model (e.g., `text-embedding-3-small`, `text-embedding-3-large`, `text-embedding-ada-002`). + +| Result | Action | +|--------|--------| +| ✅ Embedding model found | Note the deployment name and proceed | +| ❌ No embedding model | Deploy one before enabling memory — see below | + +### Deploy Embedding Model + +If no embedding model exists, use `foundry_models_deploy` MCP tool with: +- `deploymentName`: `text-embedding-3-small` (or preferred name) +- `modelName`: `text-embedding-3-small` +- `modelFormat`: `OpenAI` + +## Authorization and Permissions + +| Role | Scope | Purpose | +|------|-------|---------| +| **Azure AI User** | AI Services resource | Assigned to project managed identity | +| **System-assigned managed identity** | Project | Must be enabled on the project | + +**Setup steps:** +1. In Azure portal → project → **Resource Management** → **Identity** → enable system-assigned managed identity +2. 
On the AI Services resource → **Access control (IAM)** → assign **Azure AI User** to the project managed identity + +## Workflow + +``` +User wants agent memory + │ + ▼ +Step 1: Check for embedding model deployment + │ ├─ ✅ Found → Continue + │ └─ ❌ Not found → Deploy one (ask user) + │ + ▼ +Step 2: Create memory store + │ + ▼ +Step 3: Attach memory tool to agent + │ + ▼ +Step 4: Test with conversation +``` + +## Key Concepts + +### Memory Store Options + +| Option | Description | +|--------|-------------| +| `chat_summary_enabled` | Summarize conversations for memory | +| `user_profile_enabled` | Build and maintain user profile | +| `user_profile_details` | Control what data gets stored (e.g., `"Avoid sensitive data such as age, financials, location, credentials"`) | + +> 💡 **Tip:** Use `user_profile_details` to control what the agent stores — e.g., `"flight carrier preference and dietary restrictions"` for a travel agent, or exclude sensitive data. + +### Scope + +The `scope` parameter partitions memory per user: + +| Scope Value | Behavior | +|-------------|----------| +| `{{$userId}}` | Auto-extracts TID+OID from auth token (recommended) | +| `"user_123"` | Static identifier — you manage user mapping | + +### Memory Store Operations + +| Operation | Description | +|-----------|-------------| +| Create | Initialize a memory store with chat/embedding models and options | +| List | List all memory stores in the project | +| Update | Update memory store description or configuration | +| Delete scope | Delete memories for a specific user scope | +| Delete store | Delete entire memory store (irreversible — all scopes lost) | + +> ⚠️ **Warning:** Deleting a memory store removes all memories across all scopes. Agents with attached memory stores lose access to historical context. 
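A configuration sketch built from the option and scope tables above. Field names come from those tables, but the exact request shape is an assumption; confirm it against the Python samples linked in References:

```python
# Hypothetical memory store configuration assembled from the options above.
memory_store_options = {
    "chat_summary_enabled": True,   # summarize conversations for memory
    "user_profile_enabled": True,   # build and maintain a user profile
    "user_profile_details": (
        "flight carrier preference and dietary restrictions; "
        "avoid sensitive data such as age, financials, location, credentials"
    ),
}

# Recommended scope: auto-extracts TID+OID from the auth token per user.
scope = "{{$userId}}"
```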
+ +## Troubleshooting + +| Issue | Cause | Resolution | +|-------|-------|------------| +| Auth/authorization error | Identity or managed identity lacks required roles | Verify roles in Authorization section; refresh access token for REST | +| Memories don't appear after conversation | Updates are debounced or still processing | Increase wait time or call update API with `update_delay=0` | +| Memory search returns no results | Scope mismatch between update and search | Use same scope value for storing and retrieving memories | +| Agent response ignores stored memory | Agent not configured with memory search tool | Confirm agent definition includes `MemorySearchTool` with correct store name | +| No embedding model available | Embedding deployment missing | Deploy an embedding model — see Check Embedding Model section | + +## References + +- [Memory tool documentation](https://learn.microsoft.com/azure/ai-foundry/agents/how-to/memory-usage?view=foundry) +- [Memory Concepts](https://learn.microsoft.com/azure/ai-foundry/agents/concepts/what-is-memory) +- [Tool Catalog](https://learn.microsoft.com/azure/ai-foundry/agents/concepts/tool-catalog?view=foundry) +- [Python Samples](https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/ai/azure-ai-projects/samples/memories) diff --git a/.github/plugins/azure-skills/skills/microsoft-foundry/agent/create/references/tool-web-search.md b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/create/references/tool-web-search.md new file mode 100644 index 00000000..7dfc9b3a --- /dev/null +++ b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/create/references/tool-web-search.md @@ -0,0 +1,57 @@ +# Web Search Tool (Preview) + +Enables agents to retrieve and ground responses with real-time public web information before generating output. Returns up-to-date answers with inline URL citations. This is the **default tool for web search** — no external resource or connection setup required. 
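A hedged sketch of a web search tool entry using the configuration options documented below; the `user_location` shape and the overall payload layout are assumptions to validate against the tool documentation:

```python
# Hypothetical web_search tool entry; parameter names come from the
# Configuration Options table, but the nesting is an assumption.
web_search_tool = {
    "type": "web_search",
    "user_location": {"country": "US", "city": "Seattle"},  # optional, localizes results
    "search_context_size": "medium",                        # low | medium | high
}
```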
+ +> ⚠️ **Warning:** For Bing Grounding or Bing Custom Search (which require a separate Bing resource and project connection), see [tool-bing-grounding.md](tool-bing-grounding.md). Only use those when explicitly requested. + +## Important Disclosures + +- Web Search (preview) uses Grounding with Bing Search and Grounding with Bing Custom Search, which are [First Party Consumption Services](https://www.microsoft.com/licensing/terms/product/Glossary/EAEAS) governed by [Grounding with Bing terms of use](https://www.microsoft.com/bing/apis/grounding-legal-enterprise) and the [Microsoft Privacy Statement](https://go.microsoft.com/fwlink/?LinkId=521839&clcid=0x409). +- The [Data Protection Addendum](https://aka.ms/dpa) **does not apply** to data sent to Grounding with Bing Search and Grounding with Bing Custom Search. +- Data transfers occur **outside compliance and geographic boundaries**. +- Usage incurs costs — see [pricing](https://www.microsoft.com/bing/apis/grounding-pricing). + +## Prerequisites + +- A [basic or standard agent environment](https://learn.microsoft.com/azure/ai-foundry/agents/environment-setup) +- Azure credentials configured (e.g., `DefaultAzureCredential`) + +## Setup + +No external resource or project connection is required. The web search tool works out of the box when added to an agent definition. + +## Configuration Options + +| Parameter | Description | Default | +|-----------|-------------|---------| +| `user_location` | Approximate location (country/region/city) for localized results | None | +| `search_context_size` | Context window space for search: `low`, `medium`, `high` | `medium` | + +## Administrator Control + +Admins can enable or disable web search at the subscription level via Azure CLI. Requires Owner or Contributor access. 
+ +- **Disable:** `az feature register --name OpenAI.BlockedTools.web_search --namespace Microsoft.CognitiveServices --subscription ""` +- **Enable:** `az feature unregister --name OpenAI.BlockedTools.web_search --namespace Microsoft.CognitiveServices --subscription ""` + +## Security Considerations + +- Treat web search results as **untrusted input**. Validate before use in downstream systems. +- Avoid sending secrets or sensitive data in prompts forwarded to external services. + +## Troubleshooting + +| Issue | Cause | Resolution | +|-------|-------|------------| +| No citations appear | Model didn't determine web search was needed | Update instructions to explicitly allow web search; ask queries requiring current info | +| Requests fail after enabling | Web search disabled at subscription level | Ask admin to enable — see Administrator Control above | +| Authentication errors (REST) | Bearer token missing, expired, or insufficient | Refresh token; confirm project/agent access | +| Outdated results | Content not recently indexed by Bing | Refine query to request most recent info | +| No results for specific topics | Query too narrow | Broaden query; niche topics may have limited coverage | +| Rate limiting (429) | Too many requests | Implement exponential backoff; space out requests | + +## References + +- [Web Search tool documentation](https://learn.microsoft.com/azure/ai-foundry/agents/how-to/tools/web-search?view=foundry) +- [Tool Catalog](https://learn.microsoft.com/azure/ai-foundry/agents/concepts/tool-catalog?view=foundry) +- [Bing Pricing](https://www.microsoft.com/bing/apis/grounding-pricing) diff --git a/.github/plugins/azure-skills/skills/microsoft-foundry/agent/deploy/deploy.md b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/deploy/deploy.md new file mode 100644 index 00000000..2a2a7891 --- /dev/null +++ b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/deploy/deploy.md @@ -0,0 +1,368 @@ +# Foundry Agent Deploy + +Create and 
manage agent deployments in Azure AI Foundry. For hosted agents, this includes the full workflow from containerizing the project to starting the agent container. + +## Quick Reference + +| Property | Value | +|----------|-------| +| Agent types | Prompt (LLM-based), Hosted (ACA based), Hosted (vNext) | +| MCP server | `foundry-mcp` | +| Key MCP tools | `agent_update`, `agent_container_control`, `agent_container_status_get` | +| CLI tools | `docker`, `az acr` (hosted agents only) | +| Container protocols | `a2a`, `responses`, `mcp` | +| Supported languages | .NET, Node.js, Python, Go, Java | + +## When to Use This Skill + +USE FOR: deploy agent to foundry, push agent to foundry, ship my agent, build and deploy container agent, deploy hosted agent, create hosted agent, deploy prompt agent, start agent container, stop agent container, ACR build, container image for agent, docker build for foundry, redeploy agent, update agent deployment, clone agent, delete agent, azd deploy hosted agent, azd ai agent, azd up for agent, deploy agent with azd. + +> ⚠️ **DO NOT manually run** `azd up`, `azd deploy`, `az acr build`, `docker build`, `agent_update`, or `agent_container_control` **without reading this skill first.** This skill orchestrates the full deployment pipeline: project scan → env var collection → Dockerfile generation → image build → agent creation → container startup → verification. Running CLI commands or calling MCP tools individually skips critical steps (env var confirmation, schema validation, status polling). 
+ 
+## MCP Tools
+ 
+| Tool | Description | Parameters |
+|------|-------------|------------|
+| `agent_definition_schema_get` | Get JSON schema for agent definitions | `projectEndpoint` (required), `schemaType` (`prompt`, `hosted`, `tools`, `all`) |
+| `agent_update` | Create, update, or clone an agent | `projectEndpoint`, `agentName` (required); `agentDefinition` (JSON), `isCloneRequest`, `cloneTargetAgentName`, `modelName`, `creationOptions` (JSON with `description` and `metadata`) |
+| `agent_get` | List all agents or get a specific agent | `projectEndpoint` (required), `agentName` (optional) |
+| `agent_delete` | Delete an agent with container cleanup | `projectEndpoint`, `agentName` (required) |
+| `agent_container_control` | Start or stop a hosted agent container | `projectEndpoint`, `agentName`, `action` (`start`/`stop`) (required); `agentVersion`, `minReplicas`, `maxReplicas` |
+| `agent_container_status_get` | Check container running status | `projectEndpoint`, `agentName` (required); `agentVersion` |
+ 
+## Workflow: Hosted Agent Deployment
+There are two types of hosted agents: ACA-based and vNext. The deployment flow for vNext differs in only one step, indicated below. Use the vNext experience only when the user explicitly asks to deploy the agent to vNext (or v2, v-next, or similar wording). In all other cases, use the ACA-based deployment flow.
+ 
+### Step 1: Detect and Scan Project
+ 
+Get the project path from the project context (see Common: Project Context Resolution). Detect the project type by checking for these files:
+ 
+| Project Type | Detection Files |
+|--------------|-----------------|
+| .NET | `*.csproj`, `*.fsproj` |
+| Node.js | `package.json` |
+| Python | `requirements.txt`, `pyproject.toml`, `setup.py` |
+| Go | `go.mod` |
+| Java (Maven) | `pom.xml` |
+| Java (Gradle) | `build.gradle` |
+ 
+Delegate an environment variable scan to a sub-agent. Provide the project path and project type. 
Search source files for these patterns: + +| Project Type | Patterns to Search | +|--------------|--------------------| +| .NET (`*.cs`) | `Environment.GetEnvironmentVariable("...")`, `configuration["..."]`, `configuration.GetValue("...")` | +| Node.js (`*.js`, `*.ts`, `*.mjs`) | `process.env.VAR_NAME`, `process.env["..."]` | +| Python (`*.py`) | `os.environ["..."]`, `os.environ.get("...")`, `os.getenv("...")` | +| Go (`*.go`) | `os.Getenv("...")`, `os.LookupEnv("...")` | +| Java (`*.java`) | `System.getenv("...")`, `@Value("${...}")` | + +Classification: if followed by a throw/error → required; if followed by a fallback value → optional with default; otherwise → assume required, ask user. + +### Step 2: Collect and Confirm Environment Variables + +> ⚠️ **Warning:** Environment variables are included in the agent payload and are difficult to change after deployment. + +Use azd environment values from the project context to pre-fill discovered variables. Merge with any user-provided values. Present all variables to the user for confirmation with variable name, value, and source (`azd`, `project default`, or `user`). Mask sensitive values. + +Loop until the user confirms or cancels: +- `yes` → Proceed +- `VAR_NAME=new_value` → Update the value, show updated table, ask again +- `cancel` → Abort deployment + +### Step 3: Generate Dockerfile and Build Image + +Delegate Dockerfile creation to a sub-agent. Guidelines: +- Use official base image for the detected language and runtime version +- Use multi-stage builds for compiled languages +- Use Alpine or slim variants for smaller images +- Always target `linux/amd64` platform +- Expose the correct port (usually 8088) + +> 💡 **Tip:** Reference [Hosted Agents Foundry Samples](https://github.com/microsoft-foundry/foundry-samples/tree/main/samples/python/hosted-agents) for containerized agent examples. + +Also generate `docker-compose.yml` and `.env` files for local development. 
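Applied to a Python project, the guidelines above might produce a Dockerfile like this minimal sketch (base image, entrypoint, and file names are assumptions to adapt per project; `linux/amd64` is enforced at build time via `--platform`):

```dockerfile
# Minimal sketch for a Python hosted agent; adapt per the scanned project.
FROM python:3.12-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .

# Hosted agent containers usually listen on 8088 (see guidelines above).
EXPOSE 8088
CMD ["python", "main.py"]
```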
+ 
+**IMPORTANT**: You MUST always generate the image tag from the current timestamp (e.g., `myagent:202401011230`) to ensure uniqueness and avoid conflicts with existing images in ACR. DO NOT use static tags like `latest` or `v1`.
+ 
+Collect ACR details from the project context. Let the user choose the build method:
+ 
+**Cloud Build (ACR Tasks) (Recommended)** — no local Docker required:
+```bash
+az acr build --registry --image : --platform linux/amd64 --source-acr-auth-id "[caller]" --file Dockerfile .
+```
+ 
+**Local Docker Build:**
+```bash
+docker build --platform linux/amd64 -t : -f Dockerfile .
+az acr login --name 
+docker tag : .azurecr.io/:
+docker push .azurecr.io/:
+```
+ 
+> 💡 **Tip:** Prefer Cloud Build if Docker is not available locally. On Windows with WSL, prefix Docker commands with `wsl -e` if `docker info` fails but `wsl -e docker info` succeeds.
+ 
+### Step 4: Collect Agent Configuration
+ 
+Use the project endpoint and ACR name from the project context. Ask the user only for values not already resolved:
+- **Agent name** — Unique name for the agent
+- **Model deployment** — Model deployment name (e.g., `gpt-4o`)
+ 
+### Step 5: Get Agent Definition Schema
+ 
+Use `agent_definition_schema_get` with `schemaType: hosted` to retrieve the current schema and validate required fields.
+ 
+### Step 6: Create the Agent
+ 
+> **VNext Experience:** You MUST pass `enableVnextExperience = true` in the `metadata` field of `creationOptions`. This is required for vNext deployments. 
Use `agent_update` with the agent definition:
+ 
+For the ACA-based flow:
+```json
+{
+  "kind": "hosted",
+  "image": ".azurecr.io/:",
+  "cpu": "",
+  "memory": "",
+  "container_protocol_versions": [
+    { "protocol": "", "version": "" }
+  ],
+  "environment_variables": { "": "" }
+}
+```
+ 
+For the vNext flow:
+```json
+{
+  "agentDefinition": {
+    "kind": "hosted",
+    "image": ".azurecr.io/:",
+    "cpu": "",
+    "memory": "",
+    "container_protocol_versions": [
+      { "protocol": "", "version": "" }
+    ],
+    "environment_variables": { "": "" }
+  },
+  "creationOptions": {
+    "metadata": {
+      "enableVnextExperience": "true"
+    }
+  }
+}
+```
+ 
+### Step 7: Start Agent Container
+ 
+Use `agent_container_control` with `action: start` to start the container.
+ 
+### Step 8: Verify Agent Status
+ 
+Delegate status polling to a sub-agent. Provide the project endpoint, agent name, and instruct it to use `agent_container_status_get` repeatedly until the status is `Running` or `Failed`.
+ 
+**Container status values:**
+- `Starting` — Container is initializing
+- `Running` — Container is active and ready ✅
+- `Stopped` — Container has been stopped
+- `Failed` — Container failed to start ❌
+ 
+### Step 9: Test the Agent
+ 
+Read and follow the [invoke skill](../invoke/invoke.md) to send a test message and verify the agent responds correctly. DO NOT SKIP reading the invoke skill — it contains important information about how to format messages for hosted agents in the vNext experience.
+ 
+> ⚠️ **DO NOT stop here.** Continue to Step 10 (Auto-Create Evaluators & Dataset). This step is mandatory after every successful deployment.
+ 
+### Step 10: Auto-Create Evaluators & Dataset
+ 
+Follow [After Deployment — Auto-Create Evaluators & Dataset](#after-deployment--auto-create-evaluators--dataset) below.
+ 
+## Workflow: Prompt Agent Deployment
+ 
+### Step 1: Collect Agent Configuration
+ 
+Use the project endpoint from the project context (see Common: Project Context Resolution). 
Ask the user only for values not already resolved: +- **Agent name** — Unique name for the agent +- **Model deployment** — Model deployment name (e.g., `gpt-4o`) +- **Instructions** — System prompt (optional) +- **Temperature** — Response randomness 0-2 (optional, default varies by model) +- **Tools** — Tool configurations (optional) + +### Step 2: Get Agent Definition Schema + +Use `agent_definition_schema_get` with `schemaType: prompt` to retrieve the current schema. + +### Step 3: Create the Agent + +Use `agent_update` with the agent definition: + +```json +{ + "kind": "prompt", + "model": "", + "instructions": "", + "temperature": 0.7 +} +``` + +### Step 4: Test the Agent + +Read and follow the [invoke skill](../invoke/invoke.md) to send a test message and verify the agent responds correctly. + +> ⚠️ **DO NOT stop here.** Continue to Step 5 (Auto-Create Evaluators & Dataset). This step is mandatory after every successful deployment. + +### Step 5: Auto-Create Evaluators & Dataset + +Follow [After Deployment — Auto-Create Evaluators & Dataset](#after-deployment--auto-create-evaluators--dataset) below. + +## Display Agent Information +Once deployment is done for either hosted or prompt agent, display the agent's details in a nicely formatted table. + +Below the table you MUST also display a Playground link for direct access to the agent in Azure AI Foundry: + +[Open in Playground](https://ai.azure.com/nextgen/r/{encodedSubId},{resourceGroup},,{accountName},{projectName}/build/agents/{agentName}/build?version={agentVersion}) + +To calculate the encodedSubId, you need to take subscription id and convert it into its 16-byte GUID, then encode it as URL-safe base64 without padding (= characters trimmed). 
You can use the following shell one-liner (invoking Python) to do this conversion:
+
+```bash
+python -c "import base64,uuid;print(base64.urlsafe_b64encode(uuid.UUID('<subscription-id>').bytes).rstrip(b'=').decode())"
+```
+
+## Document Deployment Context
+
+After a successful deployment, persist the following to a `.env` or config file in the repo so future conversations (e.g., evaluation, monitoring) can pick them up automatically:
+
+| Variable | Purpose | Example |
+|----------|---------|---------|
+| `AZURE_AI_PROJECT_ENDPOINT` | Foundry project endpoint | `https://<resource>.services.ai.azure.com/api/projects/<project>` |
+| `AZURE_AI_AGENT_NAME` | Deployed agent name | `my-support-agent` |
+| `AZURE_AI_AGENT_VERSION` | Current agent version | `1` |
+| `AZURE_CONTAINER_REGISTRY` | ACR resource (hosted agents) | `myregistry.azurecr.io` |
+
+If a `.env` file already exists, read it first and merge — do not overwrite existing values without confirmation.
+
+## After Deployment — Auto-Create Evaluators & Dataset
+
+> ⚠️ **This step is automatic.** After a successful deployment, immediately prepare for evaluation without waiting for the user to request it. This matches the eval-driven optimization loop.
+
+### 1. Read Agent Instructions
+
+Use **`agent_get`** (or local `agent.yaml`) to understand the agent's purpose and capabilities.
+
+### 2. Select Default Evaluators
+
+| Category | Evaluators |
+|----------|-----------|
+| **Quality (built-in)** | intent_resolution, task_adherence, coherence |
+| **Safety (include ≥2)** | violence, self_harm, hate_unfairness |
+
+### 3. Identify LLM-Judge Deployment
+
+Use **`model_deployment_get`** to find a suitable model (e.g., `gpt-4o`) for quality evaluators.
+
+### 4. Generate Local Test Dataset
+
+Use the identified LLM deployment to generate realistic test queries based on the agent's instructions and tool capabilities. Save to `datasets/<agent-name>-test.jsonl` with each line containing at minimum a `query` field (optionally `context`, `ground_truth`). 
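The expected file shape can be sketched as follows — the queries themselves would come from the LLM-judge deployment; the agent name and sample rows here are illustrative assumptions:

```python
import json
from pathlib import Path

# Hypothetical rows that would normally be generated by the LLM judge.
generated = [
    {"query": "How do I reset my password?", "ground_truth": "Settings > Security > Reset Password"},
    {"query": "What is the refund policy?"},
]

path = Path("datasets/my-support-agent-test.jsonl")  # <agent-name>-test.jsonl
path.parent.mkdir(exist_ok=True)
with path.open("w") as f:
    for row in generated:
        f.write(json.dumps(row) + "\n")  # one JSON object per line (JSONL)
```

Keeping one object per line means the file can later be filtered, diffed, and version-controlled row by row.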
+
+> ⚠️ **Prefer local dataset generation.** Generate test queries locally and save to `datasets/*.jsonl` rather than using `generateSyntheticData=true` on the eval API. Local datasets provide reproducibility and version control, and can be reviewed before running evals.
+
+### 5. Persist Artifacts
+
+Save evaluator definitions to `evaluators/<evaluator-name>.yaml` and any locally generated test datasets to `datasets/*.jsonl`:
+
+```
+evaluators/                # custom evaluator definitions
+  <evaluator-name>.yaml    # prompt text, scoring type, thresholds
+datasets/                  # locally generated input datasets
+  *.jsonl                  # test queries
+```
+
+### 6. Prompt User
+
+*"Your agent is deployed and running. Evaluators and a test dataset have been auto-configured. Would you like to run an evaluation to identify optimization opportunities?"*
+
+- **Yes** → follow the [observe skill](../observe/observe.md) starting at **Step 2 (Evaluate)** — evaluators and dataset are already prepared.
+- **No** → stop. The user can return later.
+- **Production trace analysis** → follow the [trace skill](../trace/trace.md) to search conversations, diagnose failures, and analyze latency using App Insights. 
+ +## Agent Definition Schemas + +### Prompt Agent + +| Property | Type | Required | Description | +|----------|------|----------|-------------| +| `kind` | string | ✅ | Must be `"prompt"` | +| `model` | string | ✅ | Model deployment name (e.g., `gpt-4o`) | +| `instructions` | string | | System message for the model | +| `temperature` | number | | Response randomness (0-2) | +| `top_p` | number | | Nucleus sampling (0-1) | +| `tools` | array | | Tools the model may call | +| `tool_choice` | string/object | | Tool selection strategy | +| `rai_config` | object | | Responsible AI configuration | + +### Hosted Agent + +| Property | Type | Required | Description | +|----------|------|----------|-------------| +| `kind` | string | ✅ | Must be `"hosted"` | +| `image` | string | ✅ | Container image URL | +| `cpu` | string | ✅ | CPU allocation (e.g., `"0.5"`, `"1"`, `"2"`) | +| `memory` | string | ✅ | Memory allocation (e.g., `"1Gi"`, `"2Gi"`) | +| `container_protocol_versions` | array | ✅ | Protocol and version pairs | +| `environment_variables` | object | | Key-value pairs for container env vars | +| `tools` | array | | Tool configurations | +| `rai_config` | object | | Responsible AI configuration | + +> **Reminder:** Always pass `creationOptions.metadata.enableVnextExperience: "true"` when creating vNext hosted agents. + +### Container Protocols + +| Protocol | Description | +|----------|-------------| +| `a2a` | Agent-to-Agent protocol | +| `responses` | OpenAI Responses API | +| `mcp` | Model Context Protocol | + +## Agent Management Operations + +### Clone an Agent + +Use `agent_update` with `isCloneRequest: true` and `cloneTargetAgentName` to create a copy. For prompt agents, optionally override the model with `modelName`. + +### Delete an Agent + +Use `agent_delete` — automatically cleans up containers for hosted agents. + +### List Agents + +Use `agent_get` without `agentName` to list all agents, or with `agentName` to get a specific agent's details. 
+
+## Error Handling
+
+| Error | Cause | Resolution |
+|-------|-------|------------|
+| Project type not detected | No known project files found | Ask user to specify project type manually |
+| Docker not running | Docker Desktop not started or not installed | Start Docker Desktop, or use Cloud Build (ACR Tasks) instead |
+| ACR login failed | Not authenticated to Azure | Run `az login` first, then `az acr login --name <registry-name>` |
+| Build/push failed | Dockerfile errors or insufficient ACR permissions | Check Dockerfile syntax, verify Contributor or AcrPush role on registry |
+| Agent creation failed | Invalid definition or missing required fields | Use `agent_definition_schema_get` to verify schema, check all required fields |
+| Container start failed | Image not accessible or invalid configuration | Verify ACR image path, check cpu/memory values, confirm ACR permissions |
+| Container status: Failed | Runtime error in container | Check container logs, verify environment variables, ensure image runs correctly |
+| Permission denied | Insufficient Foundry project permissions | Verify Azure AI Owner or Contributor role on the project |
+| Schema fetch failed | Invalid project endpoint | Verify project endpoint URL format: `https://<resource>.services.ai.azure.com/api/projects/<project>` |
+
+## Non-Interactive / YOLO Mode
+
+When running in non-interactive mode (e.g., `nonInteractive: true` or YOLO mode), the skill skips user confirmation prompts and uses sensible defaults:
+
+- **Environment variables** — Uses values resolved from `azd env get-values` and project defaults without prompting for confirmation
+- **Agent name** — Must be provided in the initial user message or derived sensibly from the project context; if missing, the skill fails with an error instead of prompting
+- **Container lifecycle** — Automatically starts the container and polls for `Running` status without user confirmation
+
+> ⚠️ **Warning:** In non-interactive mode, ensure all required values (project endpoint, 
agent name, ACR image) are provided upfront in the user message or available via `azd env get-values`. Missing values will cause the deployment to fail rather than prompt. + +## Additional Resources + +- [Foundry Hosted Agents](https://learn.microsoft.com/azure/ai-foundry/agents/concepts/hosted-agents?view=foundry) +- [Foundry Agent Runtime Components](https://learn.microsoft.com/azure/ai-foundry/agents/concepts/runtime-components?view=foundry) +- [Foundry Samples](https://github.com/microsoft-foundry/foundry-samples/) diff --git a/.github/plugins/azure-skills/skills/microsoft-foundry/agent/eval-datasets/eval-datasets.md b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/eval-datasets/eval-datasets.md new file mode 100644 index 00000000..7696c724 --- /dev/null +++ b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/eval-datasets/eval-datasets.md @@ -0,0 +1,81 @@ +# Evaluation Datasets — Trace-to-Dataset Pipeline & Lifecycle Management + +Manage the full lifecycle of evaluation datasets for Foundry agents — from harvesting production traces into test datasets, through versioning and organization, to evaluation trending and regression detection. This skill closes the gap between **production observability** and **evaluation quality** by turning real-world agent failures into reproducible test cases. + +## When to Use This Skill + +USE FOR: create dataset from traces, harvest traces into dataset, build test dataset, dataset versioning, version my dataset, tag dataset, pin dataset version, organize datasets, dataset splits, curate test cases, review trace candidates, evaluation trending, metrics over time, eval regression, regression detection, compare evaluations over time, dataset comparison, evaluation lineage, trace to dataset pipeline, annotation review, production traces to test cases. 
+ +> ⚠️ **DO NOT manually run** KQL queries to extract datasets or call `evaluation_dataset_create` **without reading this skill first.** This skill defines the correct trace extraction patterns, schema transformation, versioning conventions, and quality gates that raw tools do not enforce. + +> 💡 **Tip:** This skill complements the [observe skill](../observe/observe.md) (eval-driven optimization loop) and the [trace skill](../trace/trace.md) (production trace analysis). Use this skill when you need to **bridge traces and evaluations** — turning production data into test cases and tracking evaluation quality over time. + +## Quick Reference + +| Property | Value | +|----------|-------| +| MCP server | `foundry-mcp` | +| Key MCP tools | `evaluation_dataset_get`, `evaluation_get`, `evaluation_comparison_create`, `evaluation_comparison_get` | +| Azure services | Application Insights (via `monitor_resource_log_query`) | +| ⚠️ Not available | `evaluation_dataset_create` (dataset upload MCP not ready — use local JSONL + `inputData`) | +| Prerequisites | Agent deployed, App Insights connected (see [trace skill](../trace/trace.md)) | +| Artifact paths | `datasets/`, `results/`, `evaluators/` | + +## Entry Points + +| User Intent | Start At | +|-------------|----------| +| "Create dataset from production traces" / "Harvest traces" | [Trace-to-Dataset Pipeline](references/trace-to-dataset.md) | +| "Version my dataset" / "Tag dataset" / "Pin dataset version" | [Dataset Versioning](references/dataset-versioning.md) | +| "Organize my datasets" / "Dataset splits" / "Filter datasets" | [Dataset Organization](references/dataset-organization.md) | +| "Review trace candidates" / "Curate test cases" | [Dataset Curation](references/dataset-curation.md) | +| "Show eval metrics over time" / "Evaluation trending" | [Eval Trending](references/eval-trending.md) | +| "Did my agent regress?" 
/ "Regression detection" | [Eval Regression](references/eval-regression.md) | +| "Compare datasets" / "Experiment comparison" / "A/B test" | [Dataset Comparison](references/dataset-comparison.md) | +| "Trace my evaluation lineage" / "Audit eval history" | [Eval Lineage](references/eval-lineage.md) | + +## Before Starting — Detect Current State + +1. Check `.env` for `AZURE_AI_PROJECT_ENDPOINT`, `AZURE_AI_AGENT_NAME`, and `APPLICATIONINSIGHTS_CONNECTION_STRING` +2. If App Insights is missing, resolve via [trace skill](../trace/trace.md) (Before Starting section) +3. Check `datasets/` for existing datasets and `results/` for evaluation history +4. Check if `evaluation_dataset_get` returns any server-side datasets +5. Route to the appropriate entry point based on user intent + +## The Foundry Flywheel + +This skill enables a closed-loop improvement cycle where production failures become regression tests: + +``` +Production Agent → [1] Trace (App Insights + OTel) + → [2] Harvest (KQL extraction) + → [3] Curate (human review) + → [4] Dataset (versioned, tagged) + → [5] Evaluate (batch eval) + → [6] Analyze (trending + regression) + → [7] Compare (version diff) + → [8] Deploy → back to [1] +``` + +Each cycle makes the test suite harder and more representative. Production failures from release N become regression tests for release N+1. + +## Behavioral Rules + +1. **Always show KQL queries.** Before executing any trace extraction query, display it in a code block. Never run queries silently. +2. **Scope to time ranges.** Always include a time range in KQL queries (default: last 7 days for trace harvesting). Ask user for the range if not specified. +3. **Require human review.** Never auto-commit harvested traces to a dataset without showing candidates to the user first. The curation step is mandatory. +4. **Use versioning conventions.** Follow the naming pattern `--v` (e.g., `support-bot-traces-v3`). +5. 
**Persist artifacts.** Save datasets to `datasets/`, evaluation results to `results/`, and track lineage in `datasets/manifest.json`. +6. **Confirm before overwriting.** If a dataset version already exists, warn the user and ask for confirmation before replacing. +7. **Never upload datasets to cloud storage.** Do not use blob upload, SAS URLs, or `evaluation_dataset_create`. Always persist datasets locally and reference them via `inputData` when running evaluations. +8. **Never remove dataset rows or weaken evaluators to recover scores.** Score drops after a dataset update are expected — harder tests expose real gaps. Optimize the agent for new failure patterns; do not shrink the test suite. + +## Related Skills + +| User Intent | Skill | +|-------------|-------| +| "Run an evaluation" / "Optimize my agent" | [observe skill](../observe/observe.md) | +| "Search traces" / "Analyze failures" / "Latency analysis" | [trace skill](../trace/trace.md) | +| "Find eval scores for a response ID" / "Link eval results to traces" | [trace skill → Eval Correlation](../trace/references/eval-correlation.md) (in `agent/trace/references/`) | +| "Deploy my agent" | [deploy skill](../deploy/deploy.md) | +| "Debug container issues" | [troubleshoot skill](../troubleshoot/troubleshoot.md) | diff --git a/.github/plugins/azure-skills/skills/microsoft-foundry/agent/eval-datasets/references/dataset-comparison.md b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/eval-datasets/references/dataset-comparison.md new file mode 100644 index 00000000..ed5feca6 --- /dev/null +++ b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/eval-datasets/references/dataset-comparison.md @@ -0,0 +1,98 @@ +# Dataset Comparison — Experiment Framework & A/B Testing + +Run structured experiments that compare agent versions against the same dataset, and present results as leaderboards with per-evaluator breakdowns. + +## Experiment Structure + +An experiment consists of: +1. 
**One pinned dataset version** — ensures fair comparison
+2. **Multiple agent versions** — the variables being compared
+3. **Same evaluators** — applied consistently across all versions
+4. **Comparison results** — which version wins on each metric
+
+## Step 1 — Define the Experiment
+
+| Parameter | Value | Example |
+|-----------|-------|---------|
+| Dataset | Pinned version from `datasets/manifest.json` | `support-bot-traces-v3` (tag: `prod`) |
+| Baseline | Agent version to compare against | `v2` |
+| Treatment(s) | Agent version(s) to evaluate | `v3`, `v4` |
+| Evaluators | Same set for all runs | coherence, fluency, relevance, intent_resolution, task_adherence |
+
+## Step 2 — Run Evaluations
+
+For each agent version, run **`evaluation_agent_batch_eval_create`** with:
+- Same `evaluationId` (groups all runs for comparison)
+- Same `inputData` (from the pinned dataset)
+- Same `evaluatorNames`
+- Different `agentVersion`
+
+> **Important:** Use `evaluationId` (NOT `evalId`) to group runs. All versions must be in the same evaluation group for comparison to work.
+
+## Step 3 — Compare Results
+
+Use **`evaluation_comparison_create`** with the baseline and treatment runs:
+
+```json
+{
+  "insightRequest": {
+    "displayName": "Experiment: v2 vs v3 vs v4 on traces-v3",
+    "state": "NotStarted",
+    "request": {
+      "type": "EvaluationComparison",
+      "evalId": "<evaluation-id>",
+      "baselineRunId": "<baseline-run-id>",
+      "treatmentRunIds": ["<treatment-run-id-1>", "<treatment-run-id-2>"]
+    }
+  }
+}
+```
+
+## Step 4 — Leaderboard
+
+Present results as a leaderboard table:
+
+| Evaluator | v2 (baseline) | v3 | v4 | Best |
+|-----------|:---:|:---:|:---:|:---:|
+| Coherence | 3.5 | 4.1 | 4.0 | ✅ v3 |
+| Fluency | 4.2 | 4.4 | 4.5 | ✅ v4 |
+| Relevance | 3.0 | 3.8 | 3.6 | ✅ v3 |
+| Intent Resolution | 3.3 | 4.0 | 4.1 | ✅ v4 |
+| Task Adherence | 2.8 | 3.5 | 3.9 | ✅ v4 |
+| **Wins** | **0** | **2** | **3** | — |
+
+### Recommendation
+
+Based on the comparison:
+
+*"v4 wins on 3/5 evaluators (Fluency, Intent Resolution, Task Adherence). 
v3 wins on 2/5 (Coherence, Relevance). Recommend deploying v4 with additional prompt tuning to recover Relevance."* + +## Pairwise A/B Comparison + +For detailed pairwise analysis between exactly two versions: + +| Evaluator | Baseline (v2) | Treatment (v3) | Delta | p-value | Effect | +|-----------|:---:|:---:|:---:|:---:|:---:| +| Coherence | 3.5 ± 0.8 | 4.1 ± 0.6 | +0.6 | 0.02 | Improved | +| Fluency | 4.2 ± 0.5 | 4.4 ± 0.4 | +0.2 | 0.15 | Inconclusive | +| Relevance | 3.0 ± 1.1 | 3.8 ± 0.9 | +0.8 | 0.01 | Improved | + +> 💡 **Tip:** The `evaluation_comparison_create` result includes `pValue` and `treatmentEffect` fields. Use `pValue < 0.05` as the threshold for statistical significance. + +## Multi-Dataset Comparison + +Compare how the same agent version performs across different datasets: + +| Dataset | Coherence | Fluency | Relevance | Notes | +|---------|:---------:|:-------:|:---------:|-------| +| traces-v3 (prod) | 4.0 | 4.5 | 3.6 | Production-derived | +| synthetic-v2 | 4.3 | 4.6 | 4.1 | May overestimate quality | +| manual-v1 (curated) | 3.8 | 4.4 | 3.2 | Hardest test cases | + +> ⚠️ **Warning:** Be cautious comparing scores across different datasets. Differences may reflect dataset difficulty, not agent quality. Always compare agent versions on the same dataset. 
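The per-evaluator winners and win counts in the Step 4 leaderboard can be derived mechanically from mean scores. A minimal sketch, using the illustrative values from the table:

```python
from collections import Counter

# Mean evaluator scores per agent version (illustrative values from the leaderboard).
scores = {
    "coherence":         {"v2": 3.5, "v3": 4.1, "v4": 4.0},
    "fluency":           {"v2": 4.2, "v3": 4.4, "v4": 4.5},
    "relevance":         {"v2": 3.0, "v3": 3.8, "v4": 3.6},
    "intent_resolution": {"v2": 3.3, "v3": 4.0, "v4": 4.1},
    "task_adherence":    {"v2": 2.8, "v3": 3.5, "v4": 3.9},
}

wins = Counter()
for evaluator, by_version in scores.items():
    best = max(by_version, key=by_version.get)  # version with the highest mean score
    wins[best] += 1

print(wins.most_common())  # [('v4', 3), ('v3', 2)]
```

Tie-breaking is left out for brevity; in a real report, ties should be shown as shared wins rather than resolved arbitrarily.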
+
+## Next Steps
+
+- **Track trends over time** → [Eval Trending](eval-trending.md)
+- **Check for regressions** → [Eval Regression](eval-regression.md)
+- **Audit full lineage** → [Eval Lineage](eval-lineage.md)
diff --git a/.github/plugins/azure-skills/skills/microsoft-foundry/agent/eval-datasets/references/dataset-curation.md b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/eval-datasets/references/dataset-curation.md
new file mode 100644
index 00000000..da1d76d2
--- /dev/null
+++ b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/eval-datasets/references/dataset-curation.md
@@ -0,0 +1,102 @@
+# Dataset Curation — Human-in-the-Loop Review
+
+Review, annotate, and approve harvested trace candidates before including them in evaluation datasets. This ensures dataset quality by adding a human review gate between raw trace extraction and finalized test cases.
+
+## Workflow Overview
+
+```
+Raw Traces (from KQL harvest)
+  │
+  ▼
+[1] Candidate File (unreviewed)
+  │
+  ▼
+[2] Human Review (approve/edit/reject each)
+  │
+  ▼
+[3] Approved Dataset (versioned, ready for eval)
+```
+
+## Step 1 — Generate Candidate File
+
+After running a [trace harvest](trace-to-dataset.md), save candidates with a `status` field:
+
+```
+datasets/<agent>-candidates-<date>.jsonl
+```
+
+Each line includes a review status:
+
+```json
+{"query": "How do I reset my password?", "response": "...", "status": "pending", "metadata": {"source": "trace", "conversationId": "conv-abc-123", "harvestRule": "error", "errorType": "TimeoutError", "duration": 12300}}
+{"query": "What's the refund policy?", "response": "...", "status": "pending", "metadata": {"source": "trace", "conversationId": "conv-def-456", "harvestRule": "latency", "duration": 8700}}
+```
+
+## Step 2 — Present for Review
+
+Show candidates in a review table:
+
+| # | Status | Query (preview) | Source | Error | Duration | Eval Score |
+|---|--------|----------------|--------|-------|----------|------------|
+| 1 | ⏳ pending | 
"How do I reset my..." | error harvest | TimeoutError | 12.3s | — | +| 2 | ⏳ pending | "What's the refund..." | latency harvest | — | 8.7s | — | +| 3 | ⏳ pending | "Can you help me..." | low-eval harvest | — | 0.4s | 2.0 | + +### Review Actions + +For each candidate, the user can: + +| Action | Result | +|--------|--------| +| **Approve** | Include in dataset as-is | +| **Approve + Edit** | Include with modified query/response/ground_truth | +| **Add Ground Truth** | Approve and add the expected correct answer | +| **Reject** | Exclude from dataset | +| **Flag** | Mark for later review | + +### Batch Operations + +- *"Approve all"* — include all pending candidates +- *"Approve all errors"* — include all candidates from error harvest +- *"Reject duplicates"* — exclude candidates with similar queries to existing dataset entries +- *"Approve #1, #3, #5; reject #2, #4"* — selective approval by number + +## Step 3 — Finalize Dataset + +After review, filter approved candidates and save to a versioned dataset: + +1. Read `datasets/manifest.json` to find the latest version number +2. Filter candidates where `status == "approved"` +3. Remove the `status` field from the output +4. Save to `datasets/--v.jsonl` +5. Update `datasets/manifest.json` with metadata + +### Update Candidate Status + +Mark the candidate file with final statuses: + +```json +{"query": "How do I reset my password?", "status": "approved", "ground_truth": "Navigate to Settings > Security > Reset Password", "metadata": {...}} +{"query": "What's the refund policy?", "status": "rejected", "rejectReason": "duplicate of existing test case", "metadata": {...}} +{"query": "Can you help me...", "status": "approved", "metadata": {...}} +``` + +> 💡 **Tip:** Keep candidate files as an audit trail. They document what was reviewed, when, and why items were accepted or rejected. 
+ +## Quality Checks + +Before finalizing, verify dataset quality: + +| Check | Criteria | +|-------|----------| +| **No duplicates** | Ensure no query appears in both the new dataset and existing datasets | +| **Balanced categories** | Verify reasonable distribution across categories (not all edge-cases) | +| **Ground truth coverage** | Flag examples without ground_truth that may benefit from one | +| **Minimum size** | Warn if dataset has fewer than 20 examples (may not be statistically meaningful) | +| **Safety coverage** | Ensure safety-related test cases are included if the agent handles sensitive topics | + +## Next Steps + +- **Version the approved dataset** → [Dataset Versioning](dataset-versioning.md) +- **Organize into splits** → [Dataset Organization](dataset-organization.md) +- **Run evaluation** → [observe skill Step 2](../../observe/references/evaluate-step.md) diff --git a/.github/plugins/azure-skills/skills/microsoft-foundry/agent/eval-datasets/references/dataset-organization.md b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/eval-datasets/references/dataset-organization.md new file mode 100644 index 00000000..1e627521 --- /dev/null +++ b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/eval-datasets/references/dataset-organization.md @@ -0,0 +1,112 @@ +# Dataset Organization — Metadata, Splits, and Filtered Evaluation + +Organize datasets using metadata fields, create train/validation/test splits, and run targeted evaluations on dataset subsets. This addresses the need for hierarchical dataset organization without requiring rigid container structures. 
+ +## Metadata Schema + +Add metadata to each JSONL example to enable filtering and organization: + +| Field | Values | Purpose | +|-------|--------|---------| +| `category` | `edge-case`, `regression`, `happy-path`, `multi-turn`, `safety` | Test case classification | +| `source` | `trace`, `synthetic`, `manual`, `feedback` | How the example was created | +| `split` | `train`, `val`, `test` | Dataset split assignment | +| `priority` | `P0`, `P1`, `P2` | Severity/importance ranking | +| `harvestRule` | `error`, `latency`, `low-eval`, `combined` | Which harvest template captured it | +| `agentVersion` | `"1"`, `"2"`, etc. | Agent version when trace was captured | + +### Example JSONL with Metadata + +```json +{"query": "Reset my password", "ground_truth": "Navigate to Settings > Security > Reset Password", "metadata": {"category": "happy-path", "source": "manual", "split": "test", "priority": "P0"}} +{"query": "What happens if I delete my account while a refund is pending?", "metadata": {"category": "edge-case", "source": "trace", "split": "test", "priority": "P1", "harvestRule": "error"}} +{"query": "I want to harm myself", "ground_truth": "I'm concerned about your safety. Please contact...", "metadata": {"category": "safety", "source": "manual", "split": "test", "priority": "P0"}} +``` + +## Creating Splits + +### Automatic Split Assignment + +When creating a new dataset, assign splits based on rules: + +| Rule | Split | Rationale | +|------|-------|-----------| +| First 70% of examples | `train` | Bulk of data for development | +| Next 15% of examples | `val` | Validation during optimization | +| Final 15% of examples | `test` | Held-out for final evaluation | +| All `priority: P0` examples | `test` | Critical cases always in test | +| All `category: safety` examples | `test` | Safety always evaluated | + +### Manual Split Assignment + +Users can assign splits during [curation](dataset-curation.md) or by editing the JSONL metadata directly. 
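The automatic split-assignment rules above can be sketched as a small helper — the 70/15/15 cutoffs and the P0/safety overrides follow the rules table, while the example rows are assumptions:

```python
def assign_split(index, total, example):
    """Assign a split: P0 and safety cases always go to test;
    otherwise split 70/15/15 by position in the dataset."""
    meta = example.setdefault("metadata", {})
    if meta.get("priority") == "P0" or meta.get("category") == "safety":
        meta["split"] = "test"
    elif index < 0.70 * total:
        meta["split"] = "train"
    elif index < 0.85 * total:
        meta["split"] = "val"
    else:
        meta["split"] = "test"
    return example

examples = [
    {"query": "Reset my password", "metadata": {"priority": "P0"}},
    {"query": "Refund policy?", "metadata": {}},
    {"query": "Delete my account", "metadata": {}},
    {"query": "I want to harm myself", "metadata": {"category": "safety"}},
]
examples = [assign_split(i, len(examples), e) for i, e in enumerate(examples)]
```

Positional splitting assumes rows are not ordered by difficulty; shuffle first if the harvest produced a sorted file.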
+ +## Filtered Evaluation Runs + +Run evaluations on specific subsets of a dataset by filtering JSONL before passing to the evaluator. + +### Filter by Split + +```python +import json + +# Read full dataset +with open("datasets/support-bot-traces-v3.jsonl") as f: + examples = [json.loads(line) for line in f] + +# Filter to test split only +test_examples = [e for e in examples if e.get("metadata", {}).get("split") == "test"] + +# Pass test_examples as inputData to evaluation_agent_batch_eval_create +``` + +### Filter by Category + +```python +# Only edge cases +edge_cases = [e for e in examples if e.get("metadata", {}).get("category") == "edge-case"] + +# Only safety test cases +safety_cases = [e for e in examples if e.get("metadata", {}).get("category") == "safety"] + +# Only P0 critical cases +p0_cases = [e for e in examples if e.get("metadata", {}).get("priority") == "P0"] +``` + +### Filter by Source + +```python +# Only production trace-derived cases (most representative) +trace_cases = [e for e in examples if e.get("metadata", {}).get("source") == "trace"] + +# Only manually curated cases (highest quality ground truth) +manual_cases = [e for e in examples if e.get("metadata", {}).get("source") == "manual"] +``` + +## Dataset Statistics + +Generate summary statistics to understand dataset composition: + +```python +from collections import Counter + +categories = Counter(e.get("metadata", {}).get("category", "unknown") for e in examples) +sources = Counter(e.get("metadata", {}).get("source", "unknown") for e in examples) +splits = Counter(e.get("metadata", {}).get("split", "unassigned") for e in examples) +priorities = Counter(e.get("metadata", {}).get("priority", "none") for e in examples) +``` + +Present as a table: + +| Dimension | Values | Count | +|-----------|--------|-------| +| **Category** | happy-path: 20, edge-case: 15, regression: 8, safety: 5, multi-turn: 10 | 58 total | +| **Source** | trace: 30, synthetic: 18, manual: 10 | 58 total | +| **Split** 
| train: 40, val: 9, test: 9 | 58 total | +| **Priority** | P0: 12, P1: 25, P2: 21 | 58 total | + +## Next Steps + +- **Run targeted evaluation** → [observe skill Step 2](../../observe/references/evaluate-step.md) (pass filtered `inputData`) +- **Compare splits** → [Dataset Comparison](dataset-comparison.md) +- **Track lineage** → [Eval Lineage](eval-lineage.md) diff --git a/.github/plugins/azure-skills/skills/microsoft-foundry/agent/eval-datasets/references/dataset-versioning.md b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/eval-datasets/references/dataset-versioning.md new file mode 100644 index 00000000..f0fa5f4e --- /dev/null +++ b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/eval-datasets/references/dataset-versioning.md @@ -0,0 +1,156 @@ +# Dataset Versioning — Version Management & Tagging + +Manage dataset versions with naming conventions, tagging, and version pinning for reproducible evaluations. This workflow formalizes dataset lifecycle management using existing MCP tools and local conventions. 
+
+## Naming Convention
+
+Use the pattern `<agent>-<source>-v<N>`:
+
+| Component | Values | Example |
+|-----------|--------|---------|
+| `<agent>` | Agent name from `.env` | `support-bot` |
+| `<source>` | `traces`, `synthetic`, `manual`, `combined` | `traces` |
+| `v<N>` | Incremental version number | `v3` |
+
+**Full examples:**
+- `support-bot-traces-v1` — first dataset from trace harvesting
+- `support-bot-synthetic-v2` — second synthetic dataset
+- `support-bot-combined-v5` — fifth dataset combining traces + manual examples
+
+## Tagging Conventions
+
+Tags are stored in `datasets/manifest.json` alongside dataset metadata:
+
+| Tag | Meaning | When to Apply |
+|-----|---------|---------------|
+| `baseline` | Reference dataset for comparison | When establishing a new evaluation baseline |
+| `prod` | Dataset used for current production evaluation | After successful deployment |
+| `canary` | Dataset for canary/staging evaluation | During staged rollout |
+| `regression-<date>` | Dataset that caught a regression | When a regression is detected |
+| `deprecated` | Dataset no longer in active use | When replaced by a newer version |
+
+## Version Pinning
+
+Pin evaluations to a specific dataset version to ensure reproducible, comparable results:
+
+### Local Pinning (JSONL Datasets)
+
+When using local JSONL files, reference the exact filename in evaluation runs:
+
+```
+datasets/support-bot-traces-v3.jsonl   ← pinned by filename
+```
+
+Pass the contents via the `inputData` parameter in **`evaluation_agent_batch_eval_create`**.
+
+### ~~Server-Side Pinning~~ (Not Available)
+
+> ⚠️ **Dataset upload MCP tools are not yet ready.** Skip `evaluation_dataset_create` (uploads) for now. You may use `evaluation_dataset_get` for read-only inspection of any existing server-side datasets, but do **not** rely on them for version pinning — use local JSONL files and pass data via `inputData` when running evaluations. 
+
+## Manifest File
+
+Track all dataset versions, tags, and lineage in `datasets/manifest.json`:
+
+```json
+{
+  "datasets": [
+    {
+      "name": "support-bot-traces-v1",
+      "file": "support-bot-traces-v1.jsonl",
+      "version": "1",
+      "tag": "deprecated",
+      "source": "trace-harvest",
+      "harvestRule": "error",
+      "timeRange": "2025-01-01 to 2025-01-07",
+      "exampleCount": 32,
+      "createdAt": "2025-01-08T10:00:00Z",
+      "evalRunIds": ["run-abc-123"]
+    },
+    {
+      "name": "support-bot-traces-v2",
+      "file": "support-bot-traces-v2.jsonl",
+      "version": "2",
+      "tag": "baseline",
+      "source": "trace-harvest",
+      "harvestRule": "error+latency",
+      "timeRange": "2025-01-15 to 2025-01-21",
+      "exampleCount": 47,
+      "createdAt": "2025-01-22T10:00:00Z",
+      "evalRunIds": ["run-def-456", "run-ghi-789"]
+    },
+    {
+      "name": "support-bot-traces-v3",
+      "file": "support-bot-traces-v3.jsonl",
+      "version": "3",
+      "tag": "prod",
+      "source": "trace-harvest",
+      "harvestRule": "error+latency+low-eval",
+      "timeRange": "2025-02-01 to 2025-02-07",
+      "exampleCount": 63,
+      "createdAt": "2025-02-08T10:00:00Z",
+      "evalRunIds": []
+    }
+  ]
+}
+```
+
+## Creating a New Version
+
+1. **Check existing versions**: Read `datasets/manifest.json` to find the latest version number
+2. **Increment version**: Use `v<N+1>` as the new version
+3. **Create dataset**: Via [Trace-to-Dataset](trace-to-dataset.md) or manual JSONL creation
+4. **Update manifest**: Add the new entry with metadata
+5. **Tag appropriately**: Apply `baseline`, `prod`, or other tags as needed
+6. **Deprecate old**: Optionally mark previous versions as `deprecated`
+
+> ⚠️ **DO NOT stop here.** After creating a new dataset version, continue to the Dataset Update Loop below. 
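Steps 1–2 of version creation (check the manifest, increment) can be sketched against the manifest shape shown earlier; the agent/source names and default tag are assumptions:

```python
import json
from pathlib import Path

manifest_path = Path("datasets/manifest.json")
manifest = (
    json.loads(manifest_path.read_text()) if manifest_path.exists() else {"datasets": []}
)

# Next version = highest existing version + 1 (v1 when the manifest is empty).
versions = [int(d["version"]) for d in manifest["datasets"]]
next_version = max(versions, default=0) + 1

entry = {
    "name": f"support-bot-traces-v{next_version}",   # <agent>-<source>-v<N>, names assumed
    "file": f"support-bot-traces-v{next_version}.jsonl",
    "version": str(next_version),
    "tag": "baseline",
    "evalRunIds": [],
}
manifest["datasets"].append(entry)
```

Writing the updated manifest back to disk is deliberately left out here — per the behavioral rules, confirm with the user before overwriting an existing version entry.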
+ +## Dataset Update Loop — Eval → Analyze → Optimize → Re-Eval + +When a dataset is updated (new rows, better coverage, new failure modes), run this loop to validate the agent against the harder test suite: + +``` +[1] Eval with new dataset (v2) using same agent version + │ + ▼ +[2] Compare: eval on v1 vs eval on v2 (same agent, different datasets) + │ + ▼ +[3] Analyze score changes — expect some drops (harder tests ≠ worse agent) + │ + ▼ +[4] Optimize agent prompt based on NEW failure patterns only + │ + ▼ +[5] Re-eval optimized agent on v2 dataset → compare to pre-optimization + │ + ▼ +[6] If satisfied → tag v2 as `prod`, archive v1 +``` + +### ⛔ Guardrails for This Loop + +- **Never remove dataset rows to recover scores.** If eval scores drop after a dataset update, the dataset is likely exposing real gaps. Removing hard cases defeats the purpose. +- **Never weaken evaluators to recover scores.** Do not lower thresholds, remove evaluators, or switch to easier scoring when scores drop on an expanded dataset. +- **Distinguish dataset difficulty from agent regression.** A score drop on a harder dataset is expected and healthy — it means test coverage improved. Only flag as regression when the same dataset + same evaluators produce worse scores on a new agent version. +- **Optimize for NEW failure patterns only.** When optimizing the agent prompt after a dataset update, target the newly added test cases. Do not re-optimize for cases that were already passing. 
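The third guardrail, distinguishing dataset difficulty from agent regression, can be encoded as a simple decision rule; a sketch (the labels and logic are an illustrative convention, not a Foundry API):

```python
def classify_score_drop(delta_pct: float, same_dataset: bool, same_evaluators: bool) -> str:
    """Classify a score change between two eval runs.

    A drop only counts as a regression when the dataset AND the
    evaluators are unchanged; a drop on a newer, harder dataset is
    expected and means test coverage improved.
    """
    if delta_pct >= 0:
        return "improved-or-stable"
    if same_dataset and same_evaluators:
        return "regression"       # same tests, worse scores: the agent got worse
    return "expected-difficulty"  # harder tests or new scoring: not an agent regression

# Agent v4 scored 10% lower than v3 on the SAME dataset: real regression.
print(classify_score_drop(-10.0, same_dataset=True, same_evaluators=True))   # regression
# Same agent scored lower on the NEW, harder v2 dataset: not a regression.
print(classify_score_drop(-10.0, same_dataset=False, same_evaluators=True))  # expected-difficulty
```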
+ +## Comparing Versions + +To understand how a dataset evolved between versions: + +```bash +# Count examples per version +wc -l datasets/support-bot-traces-v*.jsonl + +# Diff example queries between versions +jq -r '.query' datasets/support-bot-traces-v2.jsonl | sort > /tmp/v2-queries.txt +jq -r '.query' datasets/support-bot-traces-v3.jsonl | sort > /tmp/v3-queries.txt +diff /tmp/v2-queries.txt /tmp/v3-queries.txt +``` + +## Next Steps + +- **Organize into splits** → [Dataset Organization](dataset-organization.md) +- **Run evaluation with pinned version** → [observe skill Step 2](../../observe/references/evaluate-step.md) +- **Track lineage** → [Eval Lineage](eval-lineage.md) diff --git a/.github/plugins/azure-skills/skills/microsoft-foundry/agent/eval-datasets/references/eval-lineage.md b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/eval-datasets/references/eval-lineage.md new file mode 100644 index 00000000..0c6b56bc --- /dev/null +++ b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/eval-datasets/references/eval-lineage.md @@ -0,0 +1,125 @@ +# Eval Lineage — Full Traceability from Production to Deployment + +Track the complete chain from production traces through dataset creation, evaluation runs, comparisons, and deployment decisions. Enables "why was this deployed?" audit queries and compliance reporting. 
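The audit queries in this document all reduce to walking `datasets/manifest.json`. A sketch of the "why was this deployed?" query, assuming the lineage manifest layout used in this skill (the sample IDs below are illustrative):

```python
def why_deployed(manifest: dict, agent_version: str) -> dict:
    """Walk the lineage manifest to justify a deployment of `agent_version`.

    Returns the deployment record, the eval runs for that version, and the
    comparisons whose treatment runs informed the decision.
    """
    for dataset in manifest["datasets"]:
        for dep in dataset.get("deployments", []):
            if dep["agentVersion"] != agent_version:
                continue
            runs = [r for r in dataset.get("evalRuns", []) if r["agentVersion"] == agent_version]
            run_ids = {r["runId"] for r in runs}
            comparisons = [
                c for c in dataset.get("comparisons", [])
                if run_ids & set(c.get("treatmentRunIds", []))
            ]
            return {"dataset": dataset["name"], "deployment": dep,
                    "evalRuns": runs, "comparisons": comparisons}
    raise LookupError(f"agent version {agent_version} was never deployed")

# Illustrative manifest fragment.
manifest = {"datasets": [{
    "name": "support-bot-traces-v3",
    "evalRuns": [
        {"evalId": "eval-group-001", "runId": "run-abc-123", "agentVersion": "3"},
        {"evalId": "eval-group-001", "runId": "run-def-456", "agentVersion": "4"},
    ],
    "comparisons": [{"insightId": "insight-xyz-789", "baselineRunId": "run-abc-123",
                     "treatmentRunIds": ["run-def-456"], "result": "v4 improved on 3/5 metrics"}],
    "deployments": [{"agentVersion": "4", "reason": "v4 improved coherence +25%, relevance +10% vs v3"}],
}]}
answer = why_deployed(manifest, "4")
print(answer["comparisons"][0]["result"])  # v4 improved on 3/5 metrics
```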
+ +## Lineage Chain + +``` +Production Trace (App Insights) + │ conversationId, responseId + ▼ +Dataset Version (datasets/*.jsonl) + │ metadata.conversationId, metadata.harvestRule + ▼ +Evaluation Run (evaluation_agent_batch_eval_create) + │ evaluationId, evalRunId + ▼ +Comparison (evaluation_comparison_create) + │ insightId, baselineRunId, treatmentRunIds + ▼ +Deployment Decision (agent_update + agent_container_control) + │ agentVersion + ▼ +Production Trace (cycle repeats) +``` + +## Lineage Manifest + +Track lineage in `datasets/manifest.json`: + +```json +{ + "datasets": [ + { + "name": "support-bot-traces-v3", + "file": "support-bot-traces-v3.jsonl", + "version": "3", + "tag": "prod", + "source": "trace-harvest", + "harvestRule": "error+latency", + "timeRange": "2025-02-01 to 2025-02-07", + "exampleCount": 63, + "createdAt": "2025-02-08T10:00:00Z", + "evalRuns": [ + { + "evalId": "eval-group-001", + "runId": "run-abc-123", + "agentVersion": "3", + "date": "2025-02-08T12:00:00Z", + "status": "completed" + }, + { + "evalId": "eval-group-001", + "runId": "run-def-456", + "agentVersion": "4", + "date": "2025-02-10T09:00:00Z", + "status": "completed" + } + ], + "comparisons": [ + { + "insightId": "insight-xyz-789", + "baselineRunId": "run-abc-123", + "treatmentRunIds": ["run-def-456"], + "result": "v4 improved on 3/5 metrics", + "date": "2025-02-10T10:00:00Z" + } + ], + "deployments": [ + { + "agentVersion": "4", + "deployedAt": "2025-02-10T14:00:00Z", + "reason": "v4 improved coherence +25%, relevance +10% vs v3" + } + ] + } + ] +} +``` + +## Audit Queries + +### "Why was version X deployed?" + +1. Read `datasets/manifest.json` +2. Find entries where `deployments[].agentVersion == X` +3. Show the comparison that justified the deployment +4. Show the dataset and eval runs that informed the comparison + +### "What traces led to this dataset?" + +1. Read the dataset JSONL file +2. Extract `metadata.conversationId` from each example +3. 
Look up each conversation in App Insights using the [trace skill](../../trace/trace.md)
+
+### "What evaluation history does this agent have?"
+
+1. Use **`evaluation_get`** to list all evaluation groups
+2. For each group, list runs with `isRequestForRuns=true`
+3. Build the timeline from [Eval Trending](eval-trending.md)
+4. Show comparisons from **`evaluation_comparison_get`**
+
+### "Did this dataset version catch any regressions?"
+
+1. Find the dataset version in the manifest
+2. Check `evalRuns` for runs that used this dataset
+3. Check `comparisons` for any regression results
+4. Cross-reference with `tag == "regression-<id>"` entries
+
+## Maintaining Lineage
+
+Update `datasets/manifest.json` at each step:
+
+| Event | Fields to Update |
+|-------|-----------------|
+| Dataset created | Add new entry with `name`, `version`, `source`, `exampleCount` |
+| Evaluation run | Append to `evalRuns[]` with `evalId`, `runId`, `agentVersion` |
+| Comparison | Append to `comparisons[]` with `insightId`, `result` |
+| Deployment | Append to `deployments[]` with `agentVersion`, `reason` |
+| Tag change | Update `tag` field |
+
+## Next Steps
+
+- **View metric trends** → [Eval Trending](eval-trending.md)
+- **Check for regressions** → [Eval Regression](eval-regression.md)
+- **Harvest new traces** → [Trace-to-Dataset](trace-to-dataset.md) (start the next cycle)
diff --git a/.github/plugins/azure-skills/skills/microsoft-foundry/agent/eval-datasets/references/eval-regression.md b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/eval-datasets/references/eval-regression.md
new file mode 100644
index 00000000..c9377de2
--- /dev/null
+++ b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/eval-datasets/references/eval-regression.md
@@ -0,0 +1,121 @@
+# Eval Regression — Automated Regression Detection
+
+Automatically detect when evaluation metrics degrade between agent versions. 
Compare each evaluation run against the baseline and generate pass/fail verdicts with actionable recommendations.
+
+## Prerequisites
+
+- At least 2 evaluation runs in the same evaluation group
+- Baseline run identified (either the first run or the one tagged as `baseline`)
+
+## Step 1 — Identify Baseline and Treatment
+
+### Automatic Baseline Selection
+
+1. Read `datasets/manifest.json` and find the dataset tagged `baseline`.
+2. If the baseline dataset entry includes a stored `baselineRunId` (or mapping to one or more `evalRunIds`), use that `baselineRunId` as the baseline run.
+3. If no explicit `baselineRunId` is recorded, select the first (oldest) run in the evaluation group as the baseline.
+
+### Treatment Selection
+
+The latest (most recent) run in the evaluation group is the treatment.
+
+## Step 2 — Run Comparison
+
+Use **`evaluation_comparison_create`** to compare baseline vs treatment:
+
+> **Critical:** `displayName` is **required** in the `insightRequest`. Despite the MCP tool schema showing it as optional, the API rejects requests without it.
+
+```json
+{
+  "insightRequest": {
+    "displayName": "Regression Check - v1 vs v4",
+    "state": "NotStarted",
+    "request": {
+      "type": "EvaluationComparison",
+      "evalId": "<eval-id>",
+      "baselineRunId": "<baseline-run-id>",
+      "treatmentRunIds": ["<treatment-run-id>"]
+    }
+  }
+}
+```
+
+Retrieve results with **`evaluation_comparison_get`** using the returned `insightId`. 
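Because the API rejects payloads without `displayName`, it is worth guarding for it when constructing the request client-side. A sketch of building the `insightRequest` body (plain dictionary construction; how the MCP tool is actually invoked is environment-specific):

```python
def build_comparison_request(display_name: str, eval_id: str,
                             baseline_run_id: str, treatment_run_ids: list[str]) -> dict:
    """Build the insightRequest payload for evaluation_comparison_create.

    Guards against the known pitfall: the API rejects requests whose
    displayName is missing or empty, even though the schema marks it optional.
    """
    if not display_name:
        raise ValueError("displayName is required: the API rejects requests without it")
    return {
        "insightRequest": {
            "displayName": display_name,
            "state": "NotStarted",
            "request": {
                "type": "EvaluationComparison",
                "evalId": eval_id,
                "baselineRunId": baseline_run_id,
                "treatmentRunIds": treatment_run_ids,
            },
        }
    }

payload = build_comparison_request("Regression Check - v1 vs v4",
                                   "eval-group-001", "run-abc-123", ["run-def-456"])
```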
+
+## Step 3 — Regression Verdicts
+
+For each evaluator in the comparison results, apply regression thresholds:
+
+| Treatment Effect | Delta | Verdict | Action |
+|-----------------|-------|---------|--------|
+| `Improved` | > +2% | ✅ PASS | No action needed |
+| `Changed` | within ±2% | ⚠️ NEUTRAL | Monitor, no immediate action |
+| `Degraded` | < -2% | 🔴 REGRESSION | Investigate and remediate |
+| `Inconclusive` | — | ❓ INCONCLUSIVE | Increase sample size and re-run |
+| `TooFewSamples` | — | ❓ INSUFFICIENT DATA | Need more test cases (≥30 recommended) |
+
+### Example Regression Report
+
+```
+╔═══════════════════════════════════════════════════════════════╗
+║ REGRESSION REPORT: v1 (baseline) → v4 ║
+╠═══════════════════════════════════════════════════════════════╣
+║ Evaluator │ Baseline │ Treatment │ Delta │ Verdict ║
+╠════════════════════╪══════════╪═══════════╪════════╪═════════╣
+║ Coherence │ 3.2 │ 4.0 │ +0.8 │ ✅ PASS ║
+║ Fluency │ 4.1 │ 4.5 │ +0.4 │ ✅ PASS ║
+║ Relevance │ 2.8 │ 3.6 │ +0.8 │ ✅ PASS ║
+║ Intent Resolution │ 3.0 │ 4.1 │ +1.1 │ ✅ PASS ║
+║ Task Adherence │ 2.5 │ 3.9 │ +1.4 │ ✅ PASS ║
+║ Safety │ 0.95 │ 0.98 │ +0.03 │ ✅ PASS ║
+╠═══════════════════════════════════════════════════════════════╣
+║ OVERALL: ✅ ALL EVALUATORS PASSED — Safe to deploy ║
+╚═══════════════════════════════════════════════════════════════╝
+```
+
+### Example with Regression
+
+```
+╔═══════════════════════════════════════════════════════════════╗
+║ REGRESSION REPORT: v3 → v4 ║
+╠═══════════════════════════════════════════════════════════════╣
+║ Evaluator │ v3 │ v4 │ Delta │ Verdict ║
+╠════════════════════╪══════════╪═══════════╪════════╪═════════╣
+║ Coherence │ 4.1 │ 4.0 │ -0.1 │ ⚠️ NEUT║
+║ Fluency │ 4.4 │ 4.5 │ +0.1 │ ✅ PASS ║
+║ Relevance │ 4.0 │ 3.6 │ -0.4 │ 🔴 REGR║
+║ Intent Resolution │ 4.2 │ 4.1 │ -0.1 │ ⚠️ NEUT║
+║ Task Adherence │ 3.8 │ 3.9 │ +0.1 │ ✅ PASS ║
+║ Safety │ 0.96 │ 0.98 │ +0.02 │ ✅ PASS ║ 
+╠═══════════════════════════════════════════════════════════════╣ +║ OVERALL: 🔴 REGRESSION DETECTED on Relevance (-10%) ║ +║ RECOMMENDATION: Do NOT deploy v4. Investigate relevance drop.║ +╚═══════════════════════════════════════════════════════════════╝ +``` + +## Step 4 — Remediation Recommendations + +When regression is detected, provide actionable guidance: + +| Regression Type | Likely Cause | Recommended Action | +|----------------|-------------|-------------------| +| Relevance drop | Prompt changes reduced focus on user query | Review prompt diff, restore relevance instructions | +| Coherence drop | Added conflicting instructions | Simplify prompt, use `prompt_optimize` | +| Safety regression | Removed safety guardrails | Restore safety instructions, add safety test cases | +| Task adherence drop | Tool configuration changed | Verify tool definitions, check for missing tools | +| Across-the-board drop | Dataset drift or model change | Check if evaluation dataset changed, verify model deployment | + +## CI/CD Integration + +Include regression checks in automated pipelines. See [observe skill CI/CD](../../observe/references/cicd-monitoring.md) for GitHub Actions workflow templates that: + +1. Run batch evaluation after every deployment +2. Compare against baseline +3. Block deployment if any evaluator shows > 5% regression +4. 
Alert team via GitHub issue or Slack webhook + +## Next Steps + +- **View full trend history** → [Eval Trending](eval-trending.md) +- **Optimize to fix regression** → [observe skill Step 4](../../observe/references/optimize-deploy.md) +- **Roll back if critical** → [deploy skill](../../deploy/deploy.md) diff --git a/.github/plugins/azure-skills/skills/microsoft-foundry/agent/eval-datasets/references/eval-trending.md b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/eval-datasets/references/eval-trending.md new file mode 100644 index 00000000..6ea2d45c --- /dev/null +++ b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/eval-datasets/references/eval-trending.md @@ -0,0 +1,91 @@ +# Eval Trending — Metrics Over Time + +Track evaluation metrics across multiple runs and versions to visualize improvement trends and detect regressions. This addresses the gap of understanding how agent quality changes over time. + +## Prerequisites + +- At least 2 evaluation runs in the same evaluation group (same `evaluationId`) +- Project endpoint available in `.env` + +## Step 1 — Retrieve Evaluation History + +Use **`evaluation_get`** to list all evaluation groups: + +| Parameter | Required | Description | +|-----------|----------|-------------| +| `projectEndpoint` | ✅ | Azure AI Project endpoint | +| `isRequestForRuns` | | `false` (default) to list evaluation groups | + +Then retrieve all runs within the target evaluation group: + +| Parameter | Required | Description | +|-----------|----------|-------------| +| `projectEndpoint` | ✅ | Azure AI Project endpoint | +| `evalId` | ✅ | Evaluation group ID | +| `isRequestForRuns` | ✅ | `true` to list runs | + +## Step 2 — Build Metrics Timeline + +For each run, extract per-evaluator scores and build a timeline: + +| Run | Agent Version | Date | Coherence | Fluency | Relevance | Intent Resolution | Task Adherence | Safety | 
+|-----|--------------|------|-----------|---------|-----------|-------------------|----------------|--------| +| run-001 | v1 | 2025-01-15 | 3.2 | 4.1 | 2.8 | 3.0 | 2.5 | 0.95 | +| run-002 | v2 | 2025-01-22 | 3.8 | 4.3 | 3.5 | 3.7 | 3.2 | 0.97 | +| run-003 | v3 | 2025-02-01 | 4.1 | 4.4 | 4.0 | 4.2 | 3.8 | 0.96 | +| run-004 | v4 | 2025-02-08 | 4.0 | 4.5 | 3.6 | 4.1 | 3.9 | 0.98 | + +## Step 3 — Trend Analysis + +Calculate trends for each evaluator: + +| Evaluator | v1 → v4 Change | Trend | Status | +|-----------|----------------|-------|--------| +| Coherence | +0.8 (+25%) | ↑ Improving | ✅ | +| Fluency | +0.4 (+10%) | ↑ Improving | ✅ | +| Relevance | +0.8 (+29%) | ↑ Improving (dip at v4) | ⚠️ | +| Intent Resolution | +1.1 (+37%) | ↑ Improving | ✅ | +| Task Adherence | +1.4 (+56%) | ↑ Improving | ✅ | +| Safety | +0.03 (+3%) | → Stable | ✅ | + +### Detecting Regressions + +Flag any evaluator where the latest run scored **lower** than the previous run: + +| Evaluator | Previous (v3) | Latest (v4) | Delta | Alert | +|-----------|--------------|-------------|-------|-------| +| Relevance | 4.0 | 3.6 | -0.4 (-10%) | ⚠️ **REGRESSION** | + +> ⚠️ **Regression detected:** Relevance dropped 10% from v3 to v4. Investigate prompt changes or dataset drift. See [Eval Regression](eval-regression.md) for automated analysis. + +### Trend Visualization (Text-based) + +``` +Coherence ████████████████████████████████░░░░░░ 4.0/5.0 ↑ +25% +Fluency █████████████████████████████████████░░ 4.5/5.0 ↑ +10% +Relevance ████████████████████████████░░░░░░░░░░ 3.6/5.0 ↑ +29% ⚠️ dip +Intent Res. █████████████████████████████████░░░░░░ 4.1/5.0 ↑ +37% +Task Adh. ████████████████████████████████░░░░░░░ 3.9/5.0 ↑ +56% +Safety ████████████████████████████████████████ 0.98 → Stable +``` + +## Step 4 — Cross-Version Summary + +Present an executive summary: + +*"Over 4 agent versions (v1→v4), your agent has improved significantly across all quality metrics. The biggest gain is Task Adherence (+56%). 
However, Relevance showed a 10% regression from v3 to v4 — recommend investigating recent prompt changes. Safety remains stable at 98%."* + +## Recommended Thresholds + +| Severity | Threshold | Action | +|----------|-----------|--------| +| ✅ Healthy | ≤ 2% drop from previous run | No action needed | +| ⚠️ Warning | 2–5% drop from previous run | Review recent changes | +| 🔴 Regression | > 5% drop from previous run | Block deployment, investigate | +| 🔴 Critical | Below baseline (v1) on any metric | Rollback to last known good version | + +## Next Steps + +- **Investigate regression** → [Eval Regression](eval-regression.md) +- **Compare specific versions** → [Dataset Comparison](dataset-comparison.md) +- **Set up automated monitoring** → [observe skill CI/CD](../../observe/references/cicd-monitoring.md) diff --git a/.github/plugins/azure-skills/skills/microsoft-foundry/agent/eval-datasets/references/mcp-gap-analysis.md b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/eval-datasets/references/mcp-gap-analysis.md new file mode 100644 index 00000000..8b425e81 --- /dev/null +++ b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/eval-datasets/references/mcp-gap-analysis.md @@ -0,0 +1,133 @@ +# MCP Tool Gap Analysis — Foundry Platform Roadmap Recommendations + +This document identifies MCP tool capabilities that would significantly enhance the evaluation dataset experience but are **not currently available** in the `foundry-mcp` server. These are recommendations for the platform team to close competitive gaps with LangSmith. + +## Current MCP Tool Coverage + +| Tool | Status | Gap | +|------|--------|-----| +| `evaluation_dataset_create` | ⚠️ Not practical | Requires Blob Storage SAS URL upload — no file upload path from agent. 
Use local JSONL + `inputData` instead | +| `evaluation_dataset_get` | ✅ Available | Cannot list all versions of a dataset; only gets by name+version | +| `evaluation_agent_batch_eval_create` | ✅ Available | Full-featured | +| `evaluation_dataset_batch_eval_create` | ✅ Available | Full-featured | +| `evaluation_get` | ✅ Available | Cannot filter runs by dataset version | +| `evaluation_comparison_create` | ✅ Available | No trend analysis; only pairwise comparison | +| `evaluation_comparison_get` | ✅ Available | Full-featured | +| `evaluator_catalog_*` | ✅ Available | No version history or audit trail | + +## Requested New MCP Tools + +### Priority 1: Critical (Blocks competitive parity with LangSmith) + +#### `dataset_version_list` +**Purpose:** List all versions of a named dataset. + +| Parameter | Type | Description | +|-----------|------|-------------| +| `projectEndpoint` | string (required) | Azure AI Project endpoint | +| `datasetName` | string (required) | Dataset name | + +**Why needed:** Currently, `evaluation_dataset_get` requires both name and version. There is no way to discover what versions exist for a given dataset. Users must track versions externally (our manifest.json workaround). + +**LangSmith equivalent:** Automatic version history with read-only historical access. + +#### `dataset_from_traces` +**Purpose:** Server-side extraction of App Insights traces into a dataset, with filtering and schema transformation. 
+ +| Parameter | Type | Description | +|-----------|------|-------------| +| `projectEndpoint` | string (required) | Azure AI Project endpoint | +| `appInsightsResourceId` | string (required) | App Insights ARM resource ID | +| `filterQuery` | string (required) | KQL filter expression | +| `timeRange` | string (required) | Time range (e.g., "7d", "30d") | +| `datasetName` | string (optional) | Target dataset name | +| `datasetVersion` | string (optional) | Target version | +| `sampleSize` | integer (optional) | Max number of traces to extract | + +**Why needed:** Currently, trace-to-dataset requires client-side KQL execution, result parsing, schema transformation, and upload. A server-side tool would dramatically simplify the workflow and enable automation. + +**LangSmith equivalent:** Run rules with automatic trace-to-dataset routing. + +### Priority 2: High (Differentiating features) + +#### `evaluation_trend_get` +**Purpose:** Retrieve time-series metrics across all runs in an evaluation group. + +| Parameter | Type | Description | +|-----------|------|-------------| +| `projectEndpoint` | string (required) | Azure AI Project endpoint | +| `evalId` | string (required) | Evaluation group ID | +| `evaluatorNames` | string[] (optional) | Filter to specific evaluators | + +**Returns:** Array of `{ runId, agentVersion, date, metrics: { evaluatorName: { average, stddev, passRate } } }`. + +**Why needed:** Currently requires multiple `evaluation_get` calls and client-side aggregation. A dedicated tool would enable trend dashboards and regression detection in a single call. + +**LangSmith equivalent:** Evaluation dashboard with historical metrics and trend analysis. + +#### `dataset_tag_manage` +**Purpose:** Add, remove, or list tags on dataset versions. 
+ +| Parameter | Type | Description | +|-----------|------|-------------| +| `projectEndpoint` | string (required) | Azure AI Project endpoint | +| `datasetName` | string (required) | Dataset name | +| `datasetVersion` | string (required) | Dataset version | +| `action` | string (required) | `add`, `remove`, `list` | +| `tag` | string (optional) | Tag to add/remove (e.g., `prod`, `baseline`) | + +**Why needed:** Tags enable version pinning semantics (e.g., "evaluate against the `prod` dataset"). Currently requires external tracking via manifest.json. + +**LangSmith equivalent:** Built-in dataset tagging with programmatic SDK access. + +### Priority 3: Medium (Nice-to-have for competitive advantage) + +#### `dataset_split_manage` +**Purpose:** Create and manage train/validation/test splits within a dataset. + +**Why needed:** Enables targeted evaluation on specific dataset subsets without creating separate datasets. Currently requires client-side JSONL filtering. + +#### `annotation_queue_create` / `annotation_queue_get` +**Purpose:** Server-side human review queues for trace candidates before dataset inclusion. + +**Why needed:** Enables multi-user review workflows. Currently, curation is a single-user, local-file process. + +**LangSmith equivalent:** Annotation queues with multi-user review, approval workflows, and queue management. + +#### `evaluation_regression_check` +**Purpose:** Automated regression detection with configurable thresholds. + +| Parameter | Type | Description | +|-----------|------|-------------| +| `projectEndpoint` | string (required) | Azure AI Project endpoint | +| `evalId` | string (required) | Evaluation group ID | +| `baselineRunId` | string (required) | Baseline run ID | +| `treatmentRunId` | string (required) | Treatment run ID | +| `regressionThreshold` | number (optional) | Percent drop that triggers regression (default: 5%) | + +**Why needed:** Currently requires comparison + client-side threshold logic. 
A dedicated tool could integrate with CI/CD pipelines directly. + +## Impact Assessment + +| Requested Tool | Impact on CX Feedback | Effort Estimate | +|---------------|----------------------|-----------------| +| `dataset_version_list` | Directly addresses "organizing datasets" feedback | Low | +| `dataset_from_traces` | Directly addresses "creating datasets from traces" feedback | High | +| `evaluation_trend_get` | Directly addresses "comparing runs and metrics over time" feedback | Medium | +| `dataset_tag_manage` | Supports "hierarchical containers" feedback (via tags) | Low | +| `dataset_split_manage` | Supports "hierarchical containers" feedback (via splits) | Medium | +| `annotation_queue_*` | Enhances trace-to-dataset quality | High | +| `evaluation_regression_check` | Enables CI/CD regression gates | Medium | + +## Interim Workarounds + +Until these MCP tools are available, the [eval-datasets skill](../eval-datasets.md) provides client-side workarounds: + +| Gap | Workaround | +|-----|-----------| +| No version listing | `datasets/manifest.json` tracks all versions locally | +| No trace-to-dataset | KQL harvest templates + local schema transform | +| No trend analysis | Multiple `evaluation_get` calls + client-side aggregation | +| No tagging | Tags stored in manifest.json | +| No annotation queues | Local candidate files with status tracking | +| No regression check | Comparison results + threshold logic in skill | diff --git a/.github/plugins/azure-skills/skills/microsoft-foundry/agent/eval-datasets/references/trace-to-dataset.md b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/eval-datasets/references/trace-to-dataset.md new file mode 100644 index 00000000..07c1907f --- /dev/null +++ b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/eval-datasets/references/trace-to-dataset.md @@ -0,0 +1,319 @@ +# Trace-to-Dataset Pipeline — Harvest Production Traces as Test Cases + +Extract production traces from App Insights using KQL, 
transform them into evaluation dataset format, and persist as versioned datasets. This is the core workflow for turning real-world agent failures into reproducible test cases. + +## ⛔ Do NOT + +- Do NOT upload datasets to blob storage or call `evaluation_dataset_create` — this MCP tool is not ready. +- Do NOT generate SAS URLs. Local JSONL + `inputData` is the only supported path. +- Do NOT use `parse_json(customDimensions)` — `customDimensions` is already a `dynamic` column in App Insights KQL. Access properties directly: `customDimensions["gen_ai.response.id"]`. + +## Related References + +- [Eval Correlation](../../trace/references/eval-correlation.md) (in `agent/trace/references/`) — look up eval scores by response/conversation ID via `customEvents` +- [KQL Templates](../../trace/references/kql-templates.md) (in `agent/trace/references/`) — general trace query patterns and attribute mappings + +## Prerequisites + +- App Insights resource resolved (see [trace skill](../../trace/trace.md) Before Starting) +- Agent name and project endpoint available in `.env` +- Time range confirmed with user (default: last 7 days) + +> 💡 **Run all KQL queries** using **`monitor_resource_log_query`** (Azure MCP tool) against the App Insights resource. This is preferred over delegating to the `azure-kusto` skill. + +> ⚠️ **Always pass `subscription` explicitly** to Azure MCP tools — they don't extract it from resource IDs. + +## Overview + +``` +App Insights traces + │ + ▼ +[1] KQL Harvest Query (filter by error/latency/eval score) + │ + ▼ +[2] Schema Transform (trace → JSONL format) + │ + ▼ +[3] Human Review (show candidates, let user approve/edit/reject) + │ + ▼ +[4] Persist Dataset (local JSONL files) +``` + +## Key Concept: Linking Evaluation Results to Traces + +> 💡 **Evaluation results live in `customEvents`, not in `dependencies`.** Foundry writes eval scores to App Insights as `customEvents` with `name == "gen_ai.evaluation.result"`. 
Agent traces (spans) live in `dependencies`. The link between them is **`gen_ai.response.id`** — this field appears on both tables. + +| Table | Contains | Join Key | +|-------|----------|----------| +| `dependencies` | Agent traces (spans, tool calls, LLM calls) | `customDimensions["gen_ai.response.id"]` | +| `customEvents` | Evaluation results (scores, labels, explanations) | `customDimensions["gen_ai.response.id"]` | + +**To harvest traces with eval scores**, join `customEvents` → `dependencies` on `responseId`. The [Low-Eval Harvest](#low-eval-harvest--traces-with-poor-evaluation-scores) template below shows this pattern. For standalone eval lookups, see [Eval Correlation](../../trace/references/eval-correlation.md) (in `agent/trace/references/`). + +## Step 1 — Choose a Harvest Template + +Select the appropriate KQL template based on user intent. These templates mirror common LangSmith "run rules" but offer more power through KQL's query language. + +> ⚠️ **Hosted agents:** The Foundry agent name (e.g., `hosted-agent-022-001`) only appears on `requests`, NOT on `dependencies`. For hosted agents, use the [Hosted Agent Harvest](#hosted-agent-harvest) template which joins via `requests.id` → `dependencies.operation_ParentId`. The templates below work directly for **prompt agents** where `gen_ai.agent.name` on `dependencies` matches the Foundry name. + +### Error Harvest — Failed Traces + +Captures all traces where the agent returned errors. Equivalent to LangSmith's `eq(error, True)` run rule. 
+
+```kql
+dependencies
+| where timestamp > ago(7d)
+| where success == false
+| where isnotempty(customDimensions["gen_ai.operation.name"])
+| where customDimensions["gen_ai.agent.name"] == "<agent-name>"
+| extend
+    conversationId = tostring(customDimensions["gen_ai.conversation.id"]),
+    responseId = tostring(customDimensions["gen_ai.response.id"]),
+    operation = tostring(customDimensions["gen_ai.operation.name"]),
+    model = tostring(customDimensions["gen_ai.request.model"]),
+    errorType = tostring(customDimensions["error.type"]),
+    inputTokens = toint(customDimensions["gen_ai.usage.input_tokens"]),
+    outputTokens = toint(customDimensions["gen_ai.usage.output_tokens"])
+| summarize
+    errorCount = count(),
+    errors = make_set(errorType, 5),
+    firstSeen = min(timestamp),
+    lastSeen = max(timestamp)
+    by conversationId, responseId, operation, model
+| order by lastSeen desc
+| take 100
+```
+
+### Low-Eval Harvest — Traces with Poor Evaluation Scores
+
+Captures traces where evaluator scores fell below a threshold. Equivalent to LangSmith's `and(eq(feedback_key, "quality"), lt(feedback_score, 0.3))` run rule. 
+
+```kql
+let lowEvalResponses = customEvents
+| where timestamp > ago(7d)
+| where name == "gen_ai.evaluation.result"
+| extend
+    score = todouble(customDimensions["gen_ai.evaluation.score.value"]),
+    evalName = tostring(customDimensions["gen_ai.evaluation.name"]),
+    responseId = tostring(customDimensions["gen_ai.response.id"]),
+    conversationId = tostring(customDimensions["gen_ai.conversation.id"])
+| where score < <threshold>
+| project responseId, conversationId, evalName, score;
+lowEvalResponses
+| join kind=inner (
+    dependencies
+    | where timestamp > ago(7d)
+    | where isnotempty(customDimensions["gen_ai.response.id"])
+    | extend responseId = tostring(customDimensions["gen_ai.response.id"])
+) on responseId
+| extend
+    operation = tostring(customDimensions["gen_ai.operation.name"]),
+    model = tostring(customDimensions["gen_ai.request.model"]),
+    inputTokens = toint(customDimensions["gen_ai.usage.input_tokens"]),
+    outputTokens = toint(customDimensions["gen_ai.usage.output_tokens"])
+| project timestamp, conversationId, responseId, evalName, score, operation, model, duration
+| order by score asc
+| take 100
+```
+
+> 💡 **Tip:** Replace `<threshold>` with the pass threshold from your evaluator config. Common values: `3.0` for 1–5 ordinal scales, `0.5` for 0–1 continuous scales.
+
+### Latency Harvest — Slow Responses
+
+Captures traces where response latency exceeds a threshold. Equivalent to LangSmith's `gt(latency, 5000)` run rule. 
+
+```kql
+dependencies
+| where timestamp > ago(7d)
+| where duration > <threshold-ms>
+| where isnotempty(customDimensions["gen_ai.operation.name"])
+| where customDimensions["gen_ai.agent.name"] == "<agent-name>"
+| extend
+    conversationId = tostring(customDimensions["gen_ai.conversation.id"]),
+    responseId = tostring(customDimensions["gen_ai.response.id"]),
+    operation = tostring(customDimensions["gen_ai.operation.name"]),
+    model = tostring(customDimensions["gen_ai.request.model"]),
+    inputTokens = toint(customDimensions["gen_ai.usage.input_tokens"]),
+    outputTokens = toint(customDimensions["gen_ai.usage.output_tokens"])
+| summarize
+    avgDuration = avg(duration),
+    maxDuration = max(duration),
+    spanCount = count()
+    by conversationId, responseId, operation, model
+| order by maxDuration desc
+| take 100
+```
+
+> 💡 **Tip:** Replace `<threshold-ms>` with the latency threshold in milliseconds. Common values: `5000` (5s), `10000` (10s), `30000` (30s).
+
+### Combined Harvest — Multi-Criteria Filter
+
+Combines multiple filters in a single query. Equivalent to LangSmith's compound rule: `and(gt(latency, 2000), eq(error, true), has(tags, "prod"))`. 
+
+```kql
+dependencies
+| where timestamp > ago(7d)
+| where customDimensions["gen_ai.agent.name"] == "<agent-name>"
+| where isnotempty(customDimensions["gen_ai.operation.name"])
+| where success == false or duration > <threshold-ms>
+| extend
+    conversationId = tostring(customDimensions["gen_ai.conversation.id"]),
+    responseId = tostring(customDimensions["gen_ai.response.id"]),
+    operation = tostring(customDimensions["gen_ai.operation.name"]),
+    model = tostring(customDimensions["gen_ai.request.model"]),
+    errorType = tostring(customDimensions["error.type"]),
+    inputTokens = toint(customDimensions["gen_ai.usage.input_tokens"]),
+    outputTokens = toint(customDimensions["gen_ai.usage.output_tokens"])
+| summarize
+    errorCount = countif(success == false),
+    avgDuration = avg(duration),
+    maxDuration = max(duration),
+    spanCount = count()
+    by conversationId, responseId, operation, model
+| order by errorCount desc, maxDuration desc
+| take 100
+```
+
+### Sampling — Control Dataset Size
+
+Add `| sample <n>` or `| take <n>` to any harvest query to control the number of traces extracted. Equivalent to LangSmith's `sampling_rate` parameter.
+
+```kql
+// Random sample of 50 traces from the harvest
+... | sample 50
+
+// Top 50 most recent traces
+... | order by timestamp desc | take 50
+
+// Stratified sample: 20 errors + 20 slow + 10 low-eval
+// Run each harvest separately and combine
+```
+
+### Hosted Agent Harvest — Two-Step Join Pattern
+
+For hosted agents, the Foundry agent name lives on `requests`, not `dependencies`. 
Use this two-step pattern:
+
+```kql
+let reqIds = requests
+| where timestamp > ago(7d)
+| where customDimensions["gen_ai.agent.name"] == "<FOUNDRY_AGENT_NAME>"
+| distinct id;
+dependencies
+| where timestamp > ago(7d)
+| where operation_ParentId in (reqIds)
+| where customDimensions["gen_ai.operation.name"] == "invoke_agent"
+| extend
+    conversationId = tostring(customDimensions["gen_ai.conversation.id"]),
+    responseId = tostring(customDimensions["gen_ai.response.id"]),
+    operation = tostring(customDimensions["gen_ai.operation.name"]),
+    model = tostring(customDimensions["gen_ai.request.model"]),
+    inputTokens = toint(customDimensions["gen_ai.usage.input_tokens"]),
+    outputTokens = toint(customDimensions["gen_ai.usage.output_tokens"])
+| project timestamp, duration, success, conversationId, responseId, operation, model, inputTokens, outputTokens
+| order by timestamp desc
+| take 100
+```
+
+> 💡 **When to use this pattern:** If the direct `dependencies` filter by `gen_ai.agent.name` returns no results, the agent is likely a hosted agent where `gen_ai.agent.name` on `dependencies` holds the code-level class name (e.g., `BingSearchAgent`), not the Foundry name. Switch to this `requests` → `dependencies` join.
+
+## Step 2 — Schema Transform
+
+Transform harvested traces into JSONL dataset format. 
Each line in the JSONL file must contain:
+
+| Field | Required | Source |
+|-------|----------|--------|
+| `query` | ✅ | User input — extract from `gen_ai.input.messages` on `invoke_agent` dependency spans |
+| `response` | Optional | Agent output — extract from `gen_ai.output.messages` on `invoke_agent` dependency spans |
+| `context` | Optional | Tool results or retrieved documents from the trace |
+| `ground_truth` | Optional | Expected correct answer (add during curation) |
+| `metadata` | Optional | Source info: `{"source": "trace", "conversationId": "...", "harvestRule": "error"}` |
+
+### Extracting Input/Output from Traces
+
+The full input/output content lives on `invoke_agent` dependency spans in `gen_ai.input.messages` and `gen_ai.output.messages`. These contain complete message arrays:
+
+```json
+// gen_ai.input.messages structure:
+[{"role": "user", "parts": [{"type": "text", "content": "How do I reset my password?"}]}]
+
+// gen_ai.output.messages structure:
+[{"role": "assistant", "parts": [{"type": "text", "content": "To reset your password..."}]}]
+```
+
+Query to extract input/output for a specific conversation:
+
+```kql
+dependencies
+| where customDimensions["gen_ai.conversation.id"] == "<CONVERSATION_ID>"
+| where customDimensions["gen_ai.operation.name"] in ("invoke_agent", "execute_agent", "chat", "create_response")
+| extend
+    responseId = tostring(customDimensions["gen_ai.response.id"]),
+    operation = tostring(customDimensions["gen_ai.operation.name"]),
+    inputMessages = tostring(customDimensions["gen_ai.input.messages"]),
+    outputMessages = tostring(customDimensions["gen_ai.output.messages"])
+| order by timestamp asc
+| take 10
+```
+
+Extract the `query` from the last user-role entry in `gen_ai.input.messages` and the `response` from `gen_ai.output.messages`. 
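This transform can be sketched in Python; `trace_to_candidate` is a hypothetical helper, and the message shapes follow the structures shown above:

```python
import json

def trace_to_candidate(input_messages: str, output_messages: str,
                       conversation_id: str, harvest_rule: str) -> dict:
    """Map one invoke_agent span's message arrays to a candidate dataset row."""
    inputs = json.loads(input_messages)
    outputs = json.loads(output_messages)

    def text_of(msg):
        # Concatenate the text parts of a single message.
        return " ".join(p["content"] for p in msg.get("parts", [])
                        if p.get("type") == "text")

    # query = last user-role entry; response = first assistant-role entry.
    query = next((text_of(m) for m in reversed(inputs) if m["role"] == "user"), "")
    response = next((text_of(m) for m in outputs if m["role"] == "assistant"), "")
    return {"query": query, "response": response,
            "metadata": {"source": "trace", "conversationId": conversation_id,
                         "harvestRule": harvest_rule}}

row = trace_to_candidate(
    '[{"role": "user", "parts": [{"type": "text", "content": "How do I reset my password?"}]}]',
    '[{"role": "assistant", "parts": [{"type": "text", "content": "To reset your password..."}]}]',
    "conv-abc-123", "error")
print(json.dumps(row))
```

Writing one such `row` per line produces the JSONL format used in the next steps.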
Save extracted data to a local JSONL file:
+
+```
+datasets/<agent>-traces-candidates-<date>.jsonl
+```
+
+## Step 3 — Human Review (Curation)
+
+> ⚠️ **MANDATORY:** Never auto-commit harvested traces to a dataset. Always show candidates to the user first.
+
+Present the harvested candidates as a table:
+
+| # | Conversation ID | Error Type | Duration | Eval Score | Query (preview) |
+|---|----------------|------------|----------|------------|----------------|
+| 1 | conv-abc-123 | TimeoutError | 12.3s | 2.0 | "How do I reset my..." |
+| 2 | conv-def-456 | None | 8.7s | 1.5 | "What's the status of..." |
+| 3 | conv-ghi-789 | ValidationError | 0.4s | 3.0 | "Can you help me with..." |
+
+Ask the user:
+- *"Which candidates should I include in the dataset? (all / select by number / filter by criteria)"*
+- *"Would you like to add ground_truth reference answers for any of these?"*
+- *"What should I name this dataset version?"*
+
+## Step 4 — Persist Dataset (Local JSONL)
+
+Save approved candidates to `datasets/<agent>-<dataset-name>-v<version>.jsonl`:
+
+```json
+{"query": "How do I reset my password?", "context": "User account management", "metadata": {"source": "trace", "conversationId": "conv-abc-123", "harvestRule": "error"}}
+{"query": "What's the status of my order?", "response": "...", "ground_truth": "Order #12345 shipped on...", "metadata": {"source": "trace", "conversationId": "conv-def-456", "harvestRule": "latency"}}
+```
+
+### Update Manifest
+
+After persisting, update `datasets/manifest.json` with lineage information:
+
+```json
+{
+  "datasets": [
+    {
+      "name": "support-bot-traces-v3",
+      "file": "support-bot-traces-v3.jsonl",
+      "version": "3",
+      "source": "trace-harvest",
+      "harvestRule": "error+latency",
+      "timeRange": "2025-02-01 to 2025-02-07",
+      "exampleCount": 47,
+      "createdAt": "2025-02-08T10:00:00Z",
+      "reviewedBy": "user"
+    }
+  ]
+}
+```
+
+## Next Steps
+
+After creating a dataset:
+- **Run evaluation** → [observe skill Step 2](../../observe/references/evaluate-step.md)
+- **Version and tag** → [Dataset Versioning](dataset-versioning.md)
+- **Organize into splits** → [Dataset Organization](dataset-organization.md)
diff --git a/.github/plugins/azure-skills/skills/microsoft-foundry/agent/invoke/invoke.md b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/invoke/invoke.md
new file mode 100644
index 00000000..0436dd2d
--- /dev/null
+++ b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/invoke/invoke.md
@@ -0,0 +1,98 @@
+# Invoke Foundry Agent
+
+Invoke and test deployed agents in Azure AI Foundry with single-turn and multi-turn conversations.
+
+## Quick Reference
+
+| Property | Value |
+|----------|-------|
+| Agent types | Prompt (LLM-based), Hosted (ACA-based), Hosted (vNext) |
+| MCP server | `foundry-mcp` |
+| Key MCP tools | `agent_invoke`, `agent_container_status_get`, `agent_get` |
+| Conversation support | Single-turn and multi-turn (via `conversationId`) |
+| Session support | Sticky sessions for vNext hosted agents (via client-generated `sessionId`) |
+
+## When to Use This Skill
+
+- Send a test message to a deployed agent
+- Have multi-turn conversations with an agent
+- Test a prompt agent immediately after creation
+- Test a hosted agent after its container is running
+- Verify an agent responds correctly to specific inputs
+
+## MCP Tools
+
+| Tool | Description | Parameters |
+|------|-------------|------------|
+| `agent_invoke` | Send a message to an agent and get a response | `projectEndpoint`, `agentName`, `inputText` (required); `agentVersion`, `conversationId`, `containerEndpoint` (optional); `sessionId` (required for vNext hosted agents) |
+| `agent_container_status_get` | Check container running status (hosted agents) | `projectEndpoint`, `agentName` (required); `agentVersion` |
+| `agent_get` | Get agent details to verify existence and type | `projectEndpoint` (required), `agentName` (optional) |
+
+## Workflow
+
+### Step 1: Verify Agent Readiness
+
+Delegate the readiness check to a sub-agent. 
Provide the project endpoint and agent name, and instruct it to:
+
+**Prompt agents** → Use `agent_get` to verify the agent exists.
+
+**Hosted agents (ACA)** → Use `agent_container_status_get` to check:
+- Status `Running` ✅ → Proceed to Step 2
+- Status `Starting` → Wait and re-check
+- Status `Stopped` or `Failed` ❌ → Warn the user and suggest using the deploy skill to start the container
+
+**Hosted agents (vNext)** → Ready immediately after deployment (no container status check needed)
+
+### Step 2: Invoke Agent
+
+Use the project endpoint and agent name from the project context (see Common: Project Context Resolution). Ask the user only for values not already resolved.
+
+Use `agent_invoke` to send a message:
+- `projectEndpoint` — AI Foundry project endpoint
+- `agentName` — Name of the agent to invoke
+- `inputText` — The message to send
+
+**Optional parameters:**
+- `agentVersion` — Target a specific agent version
+- `sessionId` — Required for vNext hosted agents; include it to maintain sticky sessions with the same compute resource
+
+#### Session Support for vNext Hosted Agents
+In vNext hosted agents, the invoke endpoint accepts a 25-character alphanumeric `sessionId` parameter. Sessions are **sticky**: they route the request to the same underlying compute resource, so the agent can reuse state stored on that instance's filesystem across multiple turns.
+
+Rules:
+1. You MUST generate a unique `sessionId` before making the first `agent_invoke` call.
+2. If you have a session ID, you MUST include it in every subsequent `agent_invoke` call for that conversation.
+3. When the user explicitly requests a new session, create a new `sessionId` and use it for the rest of the `agent_invoke` calls.
+
+This is different from `conversationId`, which tracks conversation history — `sessionId` controls which compute instance handles the request. 
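A minimal sketch of generating a compliant session ID on the client side (the helper name is illustrative; any unique 25-character alphanumeric string works):

```python
import secrets
import string

def new_session_id(length: int = 25) -> str:
    """Generate an alphanumeric session ID for sticky vNext routing."""
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))

session_id = new_session_id()
print(session_id)  # 25 alphanumeric characters, e.g. "k3F9xQ..."
```

Generate this once per conversation and pass it on every `agent_invoke` call.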
+ +### Step 3: Multi-Turn Conversations + +For follow-up messages, pass the `conversationId` from the previous response to `agent_invoke`. This maintains conversation context across turns. + +Each invocation with the same `conversationId` continues the existing conversation thread. + +## Agent Type Differences + +| Behavior | Prompt Agent | Hosted Agent | +|----------|-------------|--------------| +| Readiness | Immediate after creation | Requires running container | +| Pre-check | `agent_get` to verify exists | `agent_container_status_get` for `Running` status | +| Routing | Automatic | Optional `containerEndpoint` parameter | +| Multi-turn | ✅ via `conversationId` | ✅ via `conversationId` | + +## Error Handling + +| Error | Cause | Resolution | +|-------|-------|------------| +| Agent not found | Invalid agent name or project endpoint | Use `agent_get` to list available agents and verify name | +| Container not running | Hosted agent container is stopped or failed | Use deploy skill to start the container with `agent_container_control` | +| Invocation failed | Model error, timeout, or invalid input | Check agent logs, verify model deployment is active, retry with simpler input | +| Conversation ID invalid | Stale or non-existent conversation | Start a new conversation without `conversationId` | +| Rate limit exceeded | Too many requests | Implement backoff and retry, or wait before sending next message | + +## Additional Resources + +- [Foundry Hosted Agents](https://learn.microsoft.com/en-us/azure/ai-foundry/agents/concepts/hosted-agents?view=foundry) +- [Foundry Agent Runtime Components](https://learn.microsoft.com/en-us/azure/ai-foundry/agents/concepts/runtime-components?view=foundry) +- [Foundry Samples](https://github.com/azure-ai-foundry/foundry-samples) diff --git a/.github/plugins/azure-skills/skills/microsoft-foundry/agent/observe/observe.md b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/observe/observe.md new file mode 100644 index 
00000000..c29f4ac8 --- /dev/null +++ b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/observe/observe.md @@ -0,0 +1,70 @@ +# Agent Observability Loop + +Orchestrate the full eval-driven optimization cycle for a Foundry agent. This skill manages the **multi-step workflow** — auto-creating evaluators, generating test datasets, running batch evals, clustering failures, optimizing prompts, redeploying, and comparing versions. Use this skill instead of calling individual foundry-mcp evaluation tools manually. + +## When to Use This Skill + +USE FOR: evaluate my agent, run an eval, test my agent, check agent quality, run batch evaluation, analyze eval results, why did my eval fail, cluster failures, improve agent quality, optimize agent prompt, compare agent versions, re-evaluate after changes, set up CI/CD evals, agent monitoring, eval-driven optimization. + +> ⚠️ **DO NOT manually call** `evaluation_agent_batch_eval_create`, `evaluator_catalog_create`, `evaluation_comparison_create`, or `prompt_optimize` **without reading this skill first.** This skill defines required pre-checks, artifact persistence, and multi-step orchestration that the raw tools do not enforce. 
+ +## Quick Reference + +| Property | Value | +|----------|-------| +| MCP server | `foundry-mcp` | +| Key MCP tools | `evaluation_agent_batch_eval_create`, `evaluator_catalog_create`, `evaluation_comparison_create`, `prompt_optimize`, `agent_update` | +| Prerequisite | Agent deployed and running (use [deploy skill](../deploy/deploy.md)) | + +## Entry Points + +| User Intent | Start At | +|-------------|----------| +| "Deploy and evaluate my agent" | [Step 1: Auto-Setup Evaluators](references/deploy-and-setup.md) (deploy first via [deploy skill](../deploy/deploy.md)) | +| "Agent just deployed" / "Set up evaluation" | [Step 1: Auto-Setup Evaluators](references/deploy-and-setup.md) (skip deploy, run auto-create) | +| "Evaluate my agent" / "Run an eval" | [Step 1: Auto-Setup Evaluators](references/deploy-and-setup.md) first if `evaluators/` is empty, then [Step 2: Evaluate](references/evaluate-step.md) | +| "Why did my eval fail?" / "Analyze results" | [Step 3: Analyze](references/analyze-results.md) | +| "Improve my agent" / "Optimize prompt" | [Step 4: Optimize](references/optimize-deploy.md) | +| "Compare agent versions" | [Step 5: Compare](references/compare-iterate.md) | +| "Set up CI/CD evals" | [Step 6: CI/CD](references/cicd-monitoring.md) | + +> ⚠️ **Important:** Before running any evaluation (Step 2), always check if evaluators and test datasets exist in `evaluators/` and `datasets/`. If they don't, route through [Step 1: Auto-Setup](references/deploy-and-setup.md) first — even if the user only asked to "evaluate." + +## Before Starting — Detect Current State + +1. Check `.env` for `AZURE_AI_PROJECT_ENDPOINT` and `AZURE_AI_AGENT_NAME` +2. Use `agent_get` and `agent_container_status_get` to verify the agent exists and is running +3. Use `evaluation_get` to check for existing eval runs +4. Jump to the appropriate entry point + +## Loop Overview + +``` +1. 
Auto-setup evaluators & local test dataset + → ask: "Run an evaluation to identify optimization opportunities?" +2. Evaluate (batch eval run) +3. Download & cluster failures +4. Pick a category to optimize +5. Optimize prompt +6. Deploy new version (after user sign-off) +7. Re-evaluate (same eval group) +8. Compare versions → decide which to keep +9. Loop to next category or finish +10. Prompt: enable CI/CD evals & continuous production monitoring +``` + +## Behavioral Rules + +1. **Auto-poll in background.** After creating eval runs or starting containers, poll in a background terminal. Only surface the final result. +2. **Confirm before changes.** Show diff/summary before modifying agent code or deploying. Wait for sign-off. +3. **Prompt for next steps.** After each step, present options. Never assume the path forward. +4. **Write scripts to files.** Python scripts go in `scripts/` — no inline code blocks. +5. **Persist eval artifacts.** Save to `evaluators/`, `datasets/`, and `results/` for version tracking (see [deploy-and-setup](references/deploy-and-setup.md) for structure). 
+
+## Related Skills
+
+| User Intent | Skill |
+|-------------|-------|
+| "Analyze production traces" / "Search conversations" / "Find errors in App Insights" | [trace skill](../trace/trace.md) |
+| "Debug container issues" / "Container logs" | [troubleshoot skill](../troubleshoot/troubleshoot.md) |
+| "Deploy or redeploy agent" | [deploy skill](../deploy/deploy.md) |
diff --git a/.github/plugins/azure-skills/skills/microsoft-foundry/agent/observe/references/analyze-results.md b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/observe/references/analyze-results.md
new file mode 100644
index 00000000..e5f61f06
--- /dev/null
+++ b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/observe/references/analyze-results.md
@@ -0,0 +1,49 @@
+# Steps 3–5 — Download Results, Cluster Failures, Dive Into Category
+
+## Step 3 — Download Results
+
+`evaluation_get` returns run metadata but **not** full per-row output. Write a Python script (save to `scripts/`) to download detailed results:
+
+1. Initialize `AIProjectClient` with project endpoint and `DefaultAzureCredential`
+2. Get OpenAI client via `project_client.get_openai_client()`
+3. Call `openai_client.evals.runs.output_items.list(eval_id=..., run_id=...)`
+4. Serialize each item with `item.model_dump()` and save to `results/<run-id>/<item-id>.json` (use `default=str` for non-serializable fields)
+5. Print summary: total items, passed, failed, errored counts
+
+> ⚠️ **Data structure gotcha:** Query/response data lives in `datasource_item.query` and `datasource_item['sample.output_text']`, **not** in `sample.input`/`sample.output` (which are empty arrays). Parse `datasource_item` fields when extracting queries and responses for analysis.
+
+> SDK setup: `pip install azure-ai-projects azure-identity openai`
+
+## Step 4 — Cluster Failures by Root Cause
+
+Analyze every row in the results. 
Group failures into clusters: + +| Cluster | Description | +|---------|-------------| +| Incorrect / hallucinated answer | Agent gave a wrong or fabricated response | +| Incomplete answer | Agent missed key parts | +| Tool call failure | Agent failed to invoke or misused a tool | +| Safety / content violation | Flagged by safety evaluators | +| Runtime error | Agent crashed or returned an error | +| Off-topic / refusal | Agent refused or went off-topic | + +Produce a **prioritized action table**: + +| Priority | Cluster | Suggested Action | +|----------|---------|------------------| +| P0 | Runtime errors | Check container logs | +| P1 | Incorrect answers | Optimize prompt ([Step 6](optimize-deploy.md)) | +| P2 | Incomplete answers | Optimize prompt ([Step 6](optimize-deploy.md)) | +| P3 | Tool call failures | Fix tool definitions or instructions | +| P4 | Safety violations | Add guardrails to instructions | +| P5 | Off-topic / refusal | Clarify scope in instructions | + +**Rule:** Runtime errors first (P0), then by count × severity. + +## Step 5 — Dive Into Category + +When the user wants to inspect a specific cluster, display the individual rows: input query, the agent's original response, evaluator scores, and failure reason. Let the user confirm which category to optimize. + +## Next Steps + +After clustering → proceed to [Step 6: Optimize Prompt](optimize-deploy.md). 
diff --git a/.github/plugins/azure-skills/skills/microsoft-foundry/agent/observe/references/cicd-monitoring.md b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/observe/references/cicd-monitoring.md new file mode 100644 index 00000000..0fc85689 --- /dev/null +++ b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/observe/references/cicd-monitoring.md @@ -0,0 +1,35 @@ +# Step 11 — Enable CI/CD Evals & Continuous Monitoring + +After confirming the final agent version, prompt with two options: + +## Option 1 — CI/CD Evaluations + +*"Would you like to add automated evaluations to your CI/CD pipeline so every deployment is evaluated before going live?"* + +If yes, generate a GitHub Actions workflow (e.g., `.github/workflows/agent-eval.yml`) that: + +1. Triggers on push to `main` or on pull request +2. Reads evaluator definitions from `evaluators/` and test datasets from `datasets/` +3. Runs `evaluation_agent_batch_eval_create` against the newly deployed agent version +4. Fails the workflow if any evaluator score falls below configured thresholds +5. Posts a summary as a PR comment or workflow annotation + +Use repository secrets for `AZURE_AI_PROJECT_ENDPOINT` and Azure credentials. Confirm the workflow file with the user before committing. + +## Option 2 — Continuous Production Monitoring + +*"Would you like to set up continuous evaluations to monitor your agent's quality in production?"* + +If yes, generate a scheduled GitHub Actions workflow (e.g., `.github/workflows/agent-eval-scheduled.yml`) that: + +1. Runs on a cron schedule (ask user preference: daily, weekly, etc.) +2. Evaluates the current production agent version using stored evaluators and datasets +3. Saves results to `results/` +4. Opens a GitHub issue or sends a notification if any score degrades below thresholds + +The user may choose one, both, or neither. 
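For either workflow, the threshold check can live in a small script committed to `scripts/` and called from the workflow. A minimal sketch of such a gate, assuming per-evaluator average scores have already been computed from downloaded results (evaluator names and floor values are illustrative):

```python
# Illustrative thresholds; tune per project and per evaluator scale.
THRESHOLDS = {"task_adherence": 4.0, "relevance": 3.5}

def gate(avg_scores: dict) -> list:
    """Return the evaluators whose average score fell below its threshold."""
    return [name for name, floor in THRESHOLDS.items()
            if avg_scores.get(name, 0.0) < floor]

# Example: relevance dips below its floor, so the gate reports it.
failing = gate({"task_adherence": 4.3, "relevance": 3.1})
print("FAIL" if failing else "PASS", failing)  # FAIL ['relevance']
```

In CI, the script would exit non-zero when `failing` is non-empty so the workflow step fails; in the scheduled workflow, the same result would trigger the issue or notification.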
+ +## Reference + +- [Azure AI Foundry Cloud Evaluation](https://learn.microsoft.com/en-us/azure/ai-foundry/how-to/develop/cloud-evaluation) +- [Hosted Agents](https://learn.microsoft.com/en-us/azure/ai-foundry/agents/concepts/hosted-agents) diff --git a/.github/plugins/azure-skills/skills/microsoft-foundry/agent/observe/references/compare-iterate.md b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/observe/references/compare-iterate.md new file mode 100644 index 00000000..42813830 --- /dev/null +++ b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/observe/references/compare-iterate.md @@ -0,0 +1,48 @@ +# Steps 8–10 — Re-Evaluate, Compare Versions, Iterate + +## Step 8 — Re-Evaluate + +Use **`evaluation_agent_batch_eval_create`** with the **same `evaluationId`** as the baseline run. This places both runs in the same eval group for comparison. Use the same local test dataset (from `datasets/`) and evaluators. Update `agentVersion` to the new version. + +Auto-poll for completion in a background terminal (same as [Step 2](evaluate-step.md)). + +## Step 9 — Compare Versions + +> **Critical:** `displayName` is **required** in the `insightRequest`. Despite the MCP tool schema showing `displayName` as optional (`type: ["string", "null"]`), the API will reject requests without it with a BadRequest error. `state` must be `"NotStarted"`. + +### Required Parameters for `evaluation_comparison_create` + +| Parameter | Required | Description | +|-----------|----------|-------------| +| `insightRequest.displayName` | ✅ | Human-readable name. 
**Omitting causes BadRequest.** |
+| `insightRequest.state` | ✅ | Must be `"NotStarted"` |
+| `insightRequest.request.evalId` | ✅ | Eval group ID containing both runs |
+| `insightRequest.request.baselineRunId` | ✅ | Run ID of the baseline |
+| `insightRequest.request.treatmentRunIds` | ✅ | Array of treatment run IDs |
+
+Use **`evaluation_comparison_create`** with a nested `insightRequest`:
+
+```json
+{
+  "insightRequest": {
+    "displayName": "V1 vs V2 Comparison",
+    "state": "NotStarted",
+    "request": {
+      "type": "EvaluationComparison",
+      "evalId": "<EVAL_GROUP_ID>",
+      "baselineRunId": "<BASELINE_RUN_ID>",
+      "treatmentRunIds": ["<TREATMENT_RUN_ID>"]
+    }
+  }
+}
+```
+
+> **Important:** Both runs must be in the **same eval group** (same `evaluationId` in Steps 2 and 8).
+
+Then use **`evaluation_comparison_get`** (with the returned `insightId`) to retrieve comparison results. Present a summary showing which version performed better per evaluator, and recommend which version to keep.
+
+## Step 10 — Iterate or Finish
+
+If more categories remain in the prioritized action table (from [Step 4](analyze-results.md)), loop back to **Step 5** (dive into next category) → **Step 6** (optimize) → **Step 7** (deploy) → **Step 8** (re-evaluate) → **Step 9** (compare).
+
+Otherwise, confirm the final agent version with the user, then prompt for [CI/CD evals & monitoring](cicd-monitoring.md).
diff --git a/.github/plugins/azure-skills/skills/microsoft-foundry/agent/observe/references/deploy-and-setup.md b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/observe/references/deploy-and-setup.md
new file mode 100644
index 00000000..47b2cbac
--- /dev/null
+++ b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/observe/references/deploy-and-setup.md
@@ -0,0 +1,67 @@
+# Step 1 — Auto-Setup Evaluators & Dataset
+
+> **This step runs automatically after deployment.** If the agent was deployed via the [deploy skill](../../deploy/deploy.md), evaluators and a test dataset may already be configured. 
Check `evaluators/` and `datasets/` for existing artifacts before re-creating. +> +> If the agent is **not yet deployed**, follow the [deploy skill](../../deploy/deploy.md) first. It handles project detection, Dockerfile generation, ACR build, agent creation, container startup, **and** auto-creates evaluators & dataset after a successful deployment. + +## Auto-Create Evaluators & Dataset + +> **This step is fully automatic.** After deployment, immediately prepare evaluators and a local test dataset without waiting for the user to request it. + +### 1. Read Agent Instructions + +Use **`agent_get`** (or local `agent.yaml`) to understand the agent's purpose and capabilities. + +### 2. Select Evaluators + +Combine **built-in, custom, and safety evaluators**: + +| Category | Evaluators | +|----------|-----------| +| **Quality (built-in)** | intent_resolution, task_adherence, coherence, fluency, relevance | +| **Safety (include ≥2)** | violence, self_harm, hate_unfairness, sexual, indirect_attack | +| **Custom (create 1–2)** | Domain-specific via `evaluator_catalog_create` (see below) | + +### 3. Create Custom Evaluators + +Use **`evaluator_catalog_create`** with: + +| Parameter | Required | Description | +|-----------|----------|-------------| +| `projectEndpoint` | ✅ | Azure AI Project endpoint | +| `name` | ✅ | e.g., `domain_accuracy`, `citation_quality` | +| `category` | ✅ | `quality`, `safety`, or `agents` | +| `scoringType` | ✅ | `ordinal`, `continuous`, or `boolean` | +| `promptText` | ✅* | Template with `{{query}}`, `{{response}}` placeholders | +| `minScore` / `maxScore` | | Default: 1 / 5 | +| `passThreshold` | | Scores ≥ this value pass | + +> **LLM-judge tip:** Include in the evaluator prompt: *"Do NOT penalize the response for mentioning dates or events beyond your training cutoff. The agent has real-time access."* + +### 4. Identify LLM-Judge Deployment + +Use **`model_deployment_get`** to find a suitable model (e.g., `gpt-4o`) for quality evaluators. 
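Before registering a custom evaluator, it can help to render its `promptText` locally to catch unfilled placeholders. A minimal sketch, with an illustrative prompt:

```python
import re

def render(template: str, values: dict) -> str:
    """Substitute {{name}} placeholders; a missing value raises KeyError."""
    return re.sub(r"\{\{(\w+)\}\}", lambda m: values[m.group(1)], template)

# Illustrative custom-evaluator prompt (not a built-in evaluator).
prompt_text = (
    "Rate domain accuracy from 1 to 5.\n"
    "Question: {{query}}\nAnswer: {{response}}\n"
    "Do NOT penalize the response for mentioning dates or events "
    "beyond your training cutoff. The agent has real-time access."
)
print(render(prompt_text, {"query": "What is Foundry?",
                           "response": "An Azure AI platform."}))
```

The same `{{query}}`/`{{response}}` placeholders are what the evaluation service fills per dataset row at run time.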
+
+### 5. Generate Local Test Dataset
+
+Use the identified LLM deployment to generate realistic test queries based on the agent's instructions and tool capabilities. Save to `datasets/<agent>-test.jsonl` with each line containing at minimum a `query` field (optionally `context`, `ground_truth`).
+
+### 6. Persist Artifacts
+
+```
+evaluators/        # custom evaluator definitions
+  <name>.yaml      # prompt text, scoring type, thresholds
+datasets/          # locally generated input datasets
+  *.jsonl          # test queries
+results/           # evaluation run outputs (populated later)
+  <run-id>/
+    <item-id>.json
+```
+
+Save evaluator definitions to `evaluators/<name>.yaml` and test data to `datasets/*.jsonl`.
+
+### 7. Prompt User
+
+*"Your agent is deployed and running. Evaluators and a local test dataset have been auto-configured. Would you like to run an evaluation to identify optimization opportunities?"*
+
+If yes → proceed to [Step 2: Evaluate](evaluate-step.md). If no → stop.
diff --git a/.github/plugins/azure-skills/skills/microsoft-foundry/agent/observe/references/evaluate-step.md b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/observe/references/evaluate-step.md
new file mode 100644
index 00000000..23148083
--- /dev/null
+++ b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/observe/references/evaluate-step.md
@@ -0,0 +1,51 @@
+# Step 2 — Create Batch Evaluation
+
+## Prerequisites
+
+- Agent deployed and running
+- Evaluators configured (from [Step 1](deploy-and-setup.md) or `evaluators/` folder)
+- Local test dataset available (from `datasets/`)
+
+## Run Evaluation
+
+Use **`evaluation_agent_batch_eval_create`** to run evaluators against the agent.
+
+### Required Parameters
+
+| Parameter | Description |
+|-----------|-------------|
+| `projectEndpoint` | Azure AI Project endpoint |
+| `agentName` | Agent name |
+| `agentVersion` | Agent version (string, e.g. 
`"1"`) | +| `evaluatorNames` | Array of evaluator names (NOT `evaluators`) | + +### Test Data Options + +**Preferred — local dataset:** Read JSONL from `datasets/` and pass via `inputData` (array of objects with `query` and optionally `context`, `ground_truth`). Provides reproducibility, version control, and reviewability. Always use this when `datasets/` contains files. + +**Fallback only — server-side synthetic data:** Set `generateSyntheticData=true` AND provide `generationModelDeploymentName`. Only use when no local dataset exists and the user explicitly requests it. Optionally set `samplesCount` (default 50) and `generationPrompt` with the agent's instructions. + +### Additional Parameters + +| Parameter | When Needed | +|-----------|-------------| +| `deploymentName` | Required for quality evaluators (the LLM-judge model) | +| `evaluationId` | Pass existing eval group ID to group runs for comparison | +| `evaluationName` | Name for a new evaluation group | + +> **Important:** Use `evaluationId` (NOT `evalId`) to group runs. + +## Auto-Poll for Completion + +Immediately after creating the run, poll **`evaluation_get`** in a **background terminal** until completion. Use `evalId` + `isRequestForRuns=true`. The run ID parameter is `evalRunId` (NOT `runId`). + +Only surface the final result when status reaches `completed`, `failed`, or `cancelled`. + +## Next Steps + +When evaluation completes → proceed to [Step 3: Analyze Results](analyze-results.md). 
+ +## Reference + +- [Azure AI Foundry Cloud Evaluation](https://learn.microsoft.com/en-us/azure/ai-foundry/how-to/develop/cloud-evaluation) +- [Built-in Evaluators](https://learn.microsoft.com/en-us/azure/foundry/concepts/built-in-evaluators) diff --git a/.github/plugins/azure-skills/skills/microsoft-foundry/agent/observe/references/optimize-deploy.md b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/observe/references/optimize-deploy.md new file mode 100644 index 00000000..32d5c062 --- /dev/null +++ b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/observe/references/optimize-deploy.md @@ -0,0 +1,32 @@ +# Steps 6–7 — Optimize Prompt & Deploy New Version + +## Step 6 — Optimize Prompt + +> ⛔ **Guardrail:** When optimizing after a dataset update, do NOT remove dataset rows or weaken evaluators to recover scores. Score drops on a harder dataset are expected — they mean test coverage improved, not that the agent regressed. Optimize for NEW failure patterns only. + +Use **`prompt_optimize`** with: + +| Parameter | Required | Description | +|-----------|----------|-------------| +| `developerMessage` | ✅ | Agent's current system prompt / instructions | +| `deploymentName` | ✅ | Model for optimization (e.g., `gpt-4o-mini`) | +| `projectEndpoint` or `foundryAccountResourceId` | ✅ | At least one required | +| `requestedChanges` | | Concise improvement suggestions from cluster analysis | + +**Example `requestedChanges`:** *"Be more specific when answering geography questions"*, *"Always cite sources when providing factual claims"* + +> Use the optimized prompt returned by the tool. Do NOT manually rewrite. + +## Step 7 — Deploy New Version + +> **Always confirm before deploying.** Show the user a diff or summary of prompt changes and wait for explicit sign-off. + +After approval: + +1. Use **`agent_update`** to create a new agent version with the optimized prompt +2. Start the container with **`agent_container_control`** (action: `start`) +3. 
Poll **`agent_container_status_get`** in a **background terminal** until status is `Running` + +## Next Steps + +When the new version is running → proceed to [Step 8: Re-Evaluate](compare-iterate.md). diff --git a/.github/plugins/azure-skills/skills/microsoft-foundry/agent/trace/references/analyze-failures.md b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/trace/references/analyze-failures.md new file mode 100644 index 00000000..fb04343e --- /dev/null +++ b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/trace/references/analyze-failures.md @@ -0,0 +1,109 @@ +# Analyze Failures — Find and Cluster Failing Traces + +Identify failing agent traces, group them by root cause, and produce a prioritized action table. + +## Step 1 — Find Failing Traces + +> ⚠️ **Hosted agents:** `gen_ai.agent.name` on `dependencies` holds the **code-level class name** (e.g., `BingSearchAgent`), NOT the Foundry agent name. To filter by Foundry name, use the [Hosted Agent Variant](#hosted-agent-variant--failures) below. 
+ +```kql +dependencies +| where timestamp > ago(24h) +| where success == false or toint(resultCode) >= 400 +| extend + operation = tostring(customDimensions["gen_ai.operation.name"]), + errorType = tostring(customDimensions["error.type"]), + model = tostring(customDimensions["gen_ai.request.model"]), + agentName = tostring(customDimensions["gen_ai.agent.name"]), + conversationId = tostring(customDimensions["gen_ai.conversation.id"]) +| project timestamp, name, duration, resultCode, errorType, operation, model, + agentName, conversationId, operation_Id, id +| order by timestamp desc +| take 100 +``` + +## Step 2 — Cluster by Error Type + +```kql +dependencies +| where timestamp > ago(24h) +| where success == false or toint(resultCode) >= 400 +| extend + errorType = tostring(customDimensions["error.type"]), + operation = tostring(customDimensions["gen_ai.operation.name"]) +| summarize + count = count(), + firstSeen = min(timestamp), + lastSeen = max(timestamp), + avgDuration = avg(duration), + sampleOperationId = take_any(operation_Id) + by errorType, operation, resultCode +| order by count desc +``` + +## Step 3 — Prioritized Action Table + +Present results as: + +| Priority | Error Type | Operation | Count | Result Code | Suggested Action | +|----------|-----------|-----------|-------|-------------|-----------------| +| P0 | timeout | invoke_agent | 15 | 504 | Check agent container health, increase timeout | +| P1 | rate_limited | chat | 8 | 429 | Check quota, add retry logic | +| P2 | content_filter | chat | 5 | 400 | Review prompt for policy violations | +| P3 | tool_error | execute_tool | 3 | 500 | Check tool implementation and permissions | + +**Prioritization:** P0 = highest count or most severe (5xx), then by count × recency. 
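The P0–P3 assignment above can be sketched in code. A minimal illustration (the weighting of severity, then count, then recency is taken from the rule of thumb above and is illustrative, not a fixed spec):

```python
from datetime import datetime, timezone

def prioritize(clusters):
    """Rank failure clusters: 5xx severity first, then count, then recency.

    Each cluster is a dict with errorType, resultCode, count, lastSeen
    (a timezone-aware datetime). The exact weighting is illustrative.
    """
    now = datetime.now(timezone.utc)

    def key(cluster):
        severe = 1 if 500 <= int(cluster["resultCode"]) <= 599 else 0
        # More recent clusters score higher; guard against zero age.
        recency = 1.0 / max((now - cluster["lastSeen"]).total_seconds(), 1.0)
        return (severe, cluster["count"], recency)

    ranked = sorted(clusters, key=key, reverse=True)
    return [dict(c, priority=f"P{i}") for i, c in enumerate(ranked)]
```

Feed it the rows from the Step 2 cluster query and render the result as the action table above.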
+
+## Step 4 — Drill Into Specific Failure
+
+When the user selects a cluster, show individual failing traces:
+
+```kql
+dependencies
+| where timestamp > ago(24h)
+| where success == false
+| where customDimensions["error.type"] == "<ERROR_TYPE>"
+| where customDimensions["gen_ai.operation.name"] == "<OPERATION_NAME>"
+| project timestamp, name, duration, resultCode,
+    conversationId = tostring(customDimensions["gen_ai.conversation.id"]),
+    responseId = tostring(customDimensions["gen_ai.response.id"]),
+    operation_Id
+| order by timestamp desc
+| take 20
+```
+
+Also check the `exceptions` table for stack traces:
+
+```kql
+exceptions
+| where timestamp > ago(24h)
+| where operation_Id in ("<OPERATION_ID_1>", "<OPERATION_ID_2>")
+| project timestamp, type, message, outerMessage, details, operation_Id
+| order by timestamp desc
+```
+
+Offer to view the full conversation for any trace via [Conversation Detail](conversation-detail.md).
+
+## Hosted Agent Variant — Failures
+
+For hosted agents, the Foundry agent name lives on `requests`, not `dependencies`. Use a two-step join:
+
+```kql
+let reqIds = requests
+| where timestamp > ago(24h)
+| where customDimensions["gen_ai.agent.name"] == "<FOUNDRY_AGENT_NAME>"
+| distinct id;
+dependencies
+| where timestamp > ago(24h)
+| where operation_ParentId in (reqIds)
+| where success == false or toint(resultCode) >= 400
+| extend
+    operation = tostring(customDimensions["gen_ai.operation.name"]),
+    errorType = tostring(customDimensions["error.type"]),
+    model = tostring(customDimensions["gen_ai.request.model"]),
+    conversationId = tostring(customDimensions["gen_ai.conversation.id"])
+| project timestamp, name, duration, resultCode, errorType, operation, model,
+    conversationId, operation_ParentId, operation_Id
+| order by timestamp desc
+| take 100
+```
diff --git a/.github/plugins/azure-skills/skills/microsoft-foundry/agent/trace/references/analyze-latency.md b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/trace/references/analyze-latency.md
new file mode 100644
index 00000000..a4bdbb56
--- /dev/null
+++ b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/trace/references/analyze-latency.md
@@ -0,0 +1,116 @@
+# Analyze Latency — Find and Diagnose Slow Traces
+
+Identify slow agent traces, find bottleneck spans, and correlate with token usage.
+
+## Step 1 — Find Slow Conversations
+
+> ⚠️ **Hosted agents:** `gen_ai.agent.name` on `dependencies` holds the **code-level class name** (e.g., `BingSearchAgent`), NOT the Foundry agent name. To scope by Foundry name, use the [Hosted Agent Variant](#hosted-agent-variant--latency) below.
+
+```kql
+dependencies
+| where timestamp > ago(24h)
+| where customDimensions["gen_ai.operation.name"] == "invoke_agent"
+| project timestamp, duration, success,
+    agentName = tostring(customDimensions["gen_ai.agent.name"]),
+    conversationId = tostring(customDimensions["gen_ai.conversation.id"]),
+    operation_Id
+| summarize
+    totalDuration = sum(duration),
+    spanCount = count(),
+    hasErrors = countif(success == false) > 0
+    by conversationId, operation_Id
+| where totalDuration > 5000
+| order by totalDuration desc
+| take 50
+```
+
+> **Default threshold:** 5 seconds. Ask the user for their latency threshold if not specified.
+
+## Step 2 — Latency Distribution (P50/P95/P99)
+
+```kql
+dependencies
+| where timestamp > ago(24h)
+| where customDimensions["gen_ai.operation.name"] in ("chat", "invoke_agent")
+| summarize
+    p50 = percentile(duration, 50),
+    p95 = percentile(duration, 95),
+    p99 = percentile(duration, 99),
+    avg = avg(duration),
+    count = count()
+    by operation = tostring(customDimensions["gen_ai.operation.name"]),
+       model = tostring(customDimensions["gen_ai.request.model"])
+| order by p95 desc
+```
+
+Present as:
+
+| Operation | Model | P50 (ms) | P95 (ms) | P99 (ms) | Avg (ms) | Count |
+|-----------|-------|---------|---------|---------|---------|-------|
+
+## Step 3 — Bottleneck Breakdown
+
+For a specific slow conversation, break down time spent per span type:
+
+```kql
+dependencies
+| where operation_Id == "<OPERATION_ID>"
+| extend operation = tostring(customDimensions["gen_ai.operation.name"])
+| summarize
+    totalDuration = sum(duration),
+    spanCount = count(),
+    avgDuration = avg(duration)
+    by operation, name
+| order by totalDuration desc
+```
+
+Common bottleneck patterns:
+- **`chat` spans dominate** → LLM inference is slow (consider smaller model or caching)
+- **`execute_tool` spans dominate** → Tool execution is slow (optimize tool implementation)
+- **`invoke_agent` has long gaps** → Orchestration overhead (check agent framework)
+
+## Step 4 — Token Usage vs Latency Correlation
+
+```kql
+dependencies
+| where timestamp > ago(24h)
+| where customDimensions["gen_ai.operation.name"] == "chat"
+| extend
+    inputTokens = toint(customDimensions["gen_ai.usage.input_tokens"]),
+    outputTokens = toint(customDimensions["gen_ai.usage.output_tokens"])
+| where isnotempty(inputTokens)
+| project duration, inputTokens, outputTokens,
+    model = tostring(customDimensions["gen_ai.request.model"]),
+    operation_Id
+| order by duration desc
+| take 100
+```
+
+High token counts often correlate with high latency. If confirmed, suggest:
+- Reduce system prompt length
+- Limit conversation history window
+- Use a faster model for simpler queries
+
+## Hosted Agent Variant — Latency
+
+For hosted agents, scope by Foundry agent name via `requests`, then join to `dependencies`:
+
+```kql
+let reqIds = requests
+| where timestamp > ago(24h)
+| where customDimensions["gen_ai.agent.name"] == "<FOUNDRY_AGENT_NAME>"
+| distinct id;
+dependencies
+| where timestamp > ago(24h)
+| where operation_ParentId in (reqIds)
+| where customDimensions["gen_ai.operation.name"] in ("chat", "invoke_agent")
+| summarize
+    p50 = percentile(duration, 50),
+    p95 = percentile(duration, 95),
+    p99 = percentile(duration, 99),
+    avg = avg(duration),
+    count = count()
+    by operation = tostring(customDimensions["gen_ai.operation.name"]),
+       model = tostring(customDimensions["gen_ai.request.model"])
+| order by p95 desc
+```
diff --git a/.github/plugins/azure-skills/skills/microsoft-foundry/agent/trace/references/conversation-detail.md b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/trace/references/conversation-detail.md
new file mode 100644
index 00000000..42aa4b5a
--- /dev/null
+++ b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/trace/references/conversation-detail.md
@@ -0,0 +1,98 @@
+# Conversation Detail — Reconstruct Full Span Tree
+
+Reconstruct the complete span tree for a single conversation to see exactly what happened: every LLM call, tool
execution, and agent invocation with timing, tokens, and errors.
+
+## Step 1 — Fetch All Spans for a Conversation
+
+Use `operation_Id` (trace ID) to get all spans in a single request:
+
+```kql
+dependencies
+| where operation_Id == "<OPERATION_ID>"
+| project timestamp, name, duration, resultCode, success,
+    spanId = id,
+    parentSpanId = operation_ParentId,
+    operation = tostring(customDimensions["gen_ai.operation.name"]),
+    model = tostring(customDimensions["gen_ai.request.model"]),
+    responseModel = tostring(customDimensions["gen_ai.response.model"]),
+    inputTokens = toint(customDimensions["gen_ai.usage.input_tokens"]),
+    outputTokens = toint(customDimensions["gen_ai.usage.output_tokens"]),
+    responseId = tostring(customDimensions["gen_ai.response.id"]),
+    finishReason = tostring(customDimensions["gen_ai.response.finish_reasons"]),
+    errorType = tostring(customDimensions["error.type"]),
+    toolName = tostring(customDimensions["gen_ai.tool.name"]),
+    toolCallId = tostring(customDimensions["gen_ai.tool.call.id"])
+| order by timestamp asc
+```
+
+Also fetch the parent request:
+
+```kql
+requests
+| where operation_Id == "<OPERATION_ID>"
+| project timestamp, name, duration, resultCode, success, id, operation_ParentId
+```
+
+## Step 2 — Build Span Tree
+
+Use `spanId` and `parentSpanId` to reconstruct the hierarchy:
+
+```
+invoke_agent (root) ─── 4200ms
+├── chat (LLM call #1) ─── 1800ms, gpt-4o, 450→120 tokens
+│   └── [output: "Let me check the weather..."]
+├── execute_tool (get_weather) [tool: remote_functions.weather_api] ─── 200ms
+│   └── [result: "rainy, 57°F"]
+├── chat (LLM call #2) ─── 1500ms, gpt-4o, 620→85 tokens
+│   └── [output: "The weather in Paris is rainy, 57°F"]
+└── [total: 450+620=1070 input, 120+85=205 output tokens]
+```
+
+Present as an indented tree with:
+- **Operation type** and name
+- **Duration** (highlight if > P95 for that operation type)
+- **Model** and token counts (for chat operations)
+- **Error type** and result code (if failed, highlight in red)
+- **Finish reason** (stop, length, content_filter, tool_calls)
+
+## Step 3 — Extract Conversation Content from invoke_agent Spans
+
+The full input/output content lives on `invoke_agent` dependency spans in `gen_ai.input.messages` and `gen_ai.output.messages`. These JSON arrays contain the complete conversation (system prompt, user query, assistant response):
+
+```kql
+dependencies
+| where operation_Id == "<OPERATION_ID>"
+| where customDimensions["gen_ai.operation.name"] == "invoke_agent"
+| project timestamp,
+    inputMessages = tostring(customDimensions["gen_ai.input.messages"]),
+    outputMessages = tostring(customDimensions["gen_ai.output.messages"])
+| order by timestamp asc
+```
+
+Message structure: `[{"role": "user", "parts": [{"type": "text", "content": "..."}]}]`
+
+Also check the `traces` table for additional GenAI log events:
+
+```kql
+traces
+| where operation_Id == "<OPERATION_ID>"
+| where message contains "gen_ai"
+| project timestamp, message, customDimensions
+| order by timestamp asc
+```
+
+## Step 4 — Check for Exceptions
+
+```kql
+exceptions
+| where operation_Id == "<OPERATION_ID>"
+| project timestamp, type, message, outerMessage,
+    details = parse_json(details)
+| order by timestamp asc
+```
+
+Present exceptions inline in the span tree at their position in the timeline.
+
+## Step 5 — Fetch Evaluation Results
+
+See [Eval Correlation](eval-correlation.md) for the full workflow to look up evaluation scores by response ID or conversation ID. Use `gen_ai.response.id` values from Step 1 spans to correlate.
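Rebuilding the hierarchy from `spanId`/`parentSpanId` pairs (Step 2) is plain parent-pointer bookkeeping. A self-contained sketch, assuming rows shaped like the Step 1 projection (no real SDK calls involved):

```python
def build_span_tree(spans):
    """Index spans by spanId, then attach each span to its parent.

    Spans whose parentSpanId is unknown (e.g., children of the top-level
    request, which lives in the `requests` table) become roots.
    """
    by_id = {s["spanId"]: dict(s, children=[]) for s in spans}
    roots = []
    for node in by_id.values():
        parent = by_id.get(node["parentSpanId"])
        (parent["children"] if parent else roots).append(node)
    return roots

def render(node, depth=0):
    """One indented line per span, children in timestamp order."""
    lines = [f'{"  " * depth}{node["name"]} ({node["duration"]}ms)']
    for child in sorted(node["children"], key=lambda c: c["timestamp"]):
        lines.extend(render(child, depth + 1))
    return lines
```

Annotating each line with model, tokens, and error type (as described above) is a straightforward extension of `render`.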
diff --git a/.github/plugins/azure-skills/skills/microsoft-foundry/agent/trace/references/eval-correlation.md b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/trace/references/eval-correlation.md
new file mode 100644
index 00000000..ef2430cc
--- /dev/null
+++ b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/trace/references/eval-correlation.md
@@ -0,0 +1,57 @@
+# Eval Correlation — Find Evaluation Results by Response or Conversation ID
+
+Look up evaluation scores for a specific agent response using App Insights.
+
+> **IMPORTANT:** The Foundry evaluation API does NOT support querying by response ID or conversation ID. App Insights `customEvents` is the ONLY way to correlate eval scores to specific responses. Always use this KQL approach when the user asks for eval results for a specific response or conversation.
+
+## Prerequisites
+
+- App Insights resource resolved (see [trace.md](../trace.md) Before Starting)
+- A response ID (`gen_ai.response.id`) or conversation ID (`gen_ai.conversation.id`) from a previous trace query
+
+## Search by Response ID
+
+```kql
+customEvents
+| where timestamp > ago(30d)
+| where name == "gen_ai.evaluation.result"
+| where customDimensions["gen_ai.response.id"] == "<RESPONSE_ID>"
+| extend
+    evalName = tostring(customDimensions["gen_ai.evaluation.name"]),
+    score = todouble(customDimensions["gen_ai.evaluation.score.value"]),
+    label = tostring(customDimensions["gen_ai.evaluation.score.label"]),
+    explanation = tostring(customDimensions["gen_ai.evaluation.explanation"]),
+    responseId = tostring(customDimensions["gen_ai.response.id"]),
+    conversationId = tostring(customDimensions["gen_ai.conversation.id"])
+| project timestamp, evalName, score, label, explanation, responseId, conversationId
+| order by evalName asc
+```
+
+## Search by Conversation ID
+
+```kql
+customEvents
+| where timestamp > ago(30d)
+| where name == "gen_ai.evaluation.result"
+| where customDimensions["gen_ai.conversation.id"] == "<CONVERSATION_ID>"
+| extend
+    evalName = tostring(customDimensions["gen_ai.evaluation.name"]),
+    score = todouble(customDimensions["gen_ai.evaluation.score.value"]),
+    label = tostring(customDimensions["gen_ai.evaluation.score.label"]),
+    explanation = tostring(customDimensions["gen_ai.evaluation.explanation"]),
+    responseId = tostring(customDimensions["gen_ai.response.id"])
+| project timestamp, evalName, score, label, explanation, responseId
+| order by responseId asc, evalName asc
+```
+
+## Present Results
+
+Show eval scores as a table:
+
+| Evaluator | Score | Label | Explanation |
+|-----------|-------|-------|-------------|
+| coherence | 5.0 | pass | Response is well-structured... |
+| fluency | 4.0 | pass | Natural language flow... |
+| relevance | 2.0 | fail | Response doesn't address... |
+
+When showing alongside a span tree (see [Conversation Detail](conversation-detail.md)), attach eval scores to the span whose `gen_ai.response.id` matches.
diff --git a/.github/plugins/azure-skills/skills/microsoft-foundry/agent/trace/references/kql-templates.md b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/trace/references/kql-templates.md
new file mode 100644
index 00000000..9dc9a737
--- /dev/null
+++ b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/trace/references/kql-templates.md
@@ -0,0 +1,203 @@
+# KQL Templates — GenAI Trace Query Reference
+
+Ready-to-use KQL templates for querying GenAI OpenTelemetry traces in Application Insights.
+ +**Table of Contents:** [App Insights Table Mapping](#app-insights-table-mapping) · [Key GenAI OTel Attributes](#key-genai-otel-attributes) · [Span Correlation](#span-correlation) · [Hosted Agent Attributes](#hosted-agent-attributes) · [Response ID Formats](#response-id-formats) · [Common Query Templates](#common-query-templates) · [OTel Reference Links](#otel-reference-links) + +## App Insights Table Mapping + +| App Insights Table | GenAI Data | +|-------------------|------------| +| `dependencies` | GenAI spans: LLM inference (`chat`), tool execution (`execute_tool`), agent invocation (`invoke_agent`) | +| `requests` | Incoming HTTP requests to the agent endpoint. For hosted agents, also carries `gen_ai.agent.name` (Foundry name) and `azure.ai.agentserver.*` attributes — **preferred entry point** for agent-name filtering | +| `customEvents` | GenAI evaluation results (`gen_ai.evaluation.result`) — scores, labels, explanations | +| `traces` | Log events, including GenAI events (input/output messages) | +| `exceptions` | Error details with stack traces | + +## Key GenAI OTel Attributes + +Stored in `customDimensions` on `dependencies` spans: + +| Attribute | Description | Example | +|-----------|-------------|---------| +| `gen_ai.operation.name` | Operation type | `chat`, `invoke_agent`, `execute_tool`, `create_agent` | +| `gen_ai.conversation.id` | Conversation/session ID | `conv_5j66UpCpwteGg4YSxUnt7lPY` | +| `gen_ai.response.id` | Response ID | `chatcmpl-123` | +| `gen_ai.agent.name` | Agent name | `my-support-agent` | +| `gen_ai.agent.id` | Agent unique ID | `asst_abc123` | +| `gen_ai.request.model` | Requested model | `gpt-4o` | +| `gen_ai.response.model` | Actual model used | `gpt-4o-2024-05-13` | +| `gen_ai.usage.input_tokens` | Input token count | `450` | +| `gen_ai.usage.output_tokens` | Output token count | `120` | +| `gen_ai.response.finish_reasons` | Stop reasons | `["stop"]`, `["tool_calls"]` | +| `error.type` | Error classification | `timeout`, 
`rate_limited`, `content_filter` | +| `gen_ai.provider.name` | Provider | `azure.ai.openai`, `openai` | +| `gen_ai.input.messages` | Full input messages (JSON array) — on `invoke_agent` spans | `[{"role":"user","parts":[{"type":"text","content":"..."}]}]` | +| `gen_ai.output.messages` | Full output messages (JSON array) — on `invoke_agent` spans | `[{"role":"assistant","parts":[{"type":"text","content":"..."}]}]` | + +Stored in `customDimensions` on `customEvents` (name == `gen_ai.evaluation.result`): + +| Attribute | Description | Example | +|-----------|-------------|---------| +| `gen_ai.evaluation.name` | Evaluator name | `Relevance`, `IntentResolution` | +| `gen_ai.evaluation.score.value` | Numeric score | `4.0` | +| `gen_ai.evaluation.score.label` | Human-readable label | `pass`, `fail`, `relevant` | +| `gen_ai.evaluation.explanation` | Free-form explanation | `"Response lacks detail..."` | +| `gen_ai.response.id` | Correlates to the evaluated span | `chatcmpl-123` | +| `gen_ai.conversation.id` | Correlates to conversation | `conv_5j66...` | + +> **Correlation:** Eval results do NOT link via id-parentId. Use `gen_ai.conversation.id` and/or `gen_ai.response.id` to join with `dependencies` spans. + +## Span Correlation + +| Field | Purpose | +|-------|---------| +| `operation_Id` | Trace ID — groups all spans in one request | +| `id` | Span ID — unique identifier for this span | +| `operation_ParentId` | Parent span ID — use with `id` to build span trees | + +### Parent-Child Join (requests → dependencies) + +Use `operation_ParentId` to find child dependency spans from a parent request. 
This is critical for hosted agents where the Foundry agent name only lives on the parent `requests` span:
+
+```kql
+let reqIds = requests
+| where timestamp > ago(7d)
+| where customDimensions["gen_ai.agent.name"] == "<FOUNDRY_AGENT_NAME>"
+| distinct id;
+dependencies
+| where timestamp > ago(7d)
+| where operation_ParentId in (reqIds)
+| extend
+    operation = tostring(customDimensions["gen_ai.operation.name"]),
+    model = tostring(customDimensions["gen_ai.request.model"]),
+    conversationId = tostring(customDimensions["gen_ai.conversation.id"])
+| project timestamp, duration, success, operation, model, conversationId, operation_ParentId
+| order by timestamp desc
+```
+
+## Hosted Agent Attributes
+
+Stored in `customDimensions` on **both `requests` and `traces`** tables (NOT on `dependencies` spans):
+
+| Attribute | Description | Example |
+|-----------|-------------|---------|
+| `azure.ai.agentserver.agent_name` | Hosted agent name | `hosted-agent-022-001` |
+| `azure.ai.agentserver.agent_id` | Internal agent ID | `code-asst-xmwokux85uqc7fodxejaxa` |
+| `azure.ai.agentserver.conversation_id` | Conversation ID | `conv_d7ab624de92d...` |
+| `azure.ai.agentserver.response_id` | Response ID (caresp format) | `caresp_d7ab624de92d...` |
+
+> **Important:** Use `requests` as the preferred entry point for agent-name filtering — it has both `azure.ai.agentserver.agent_name` and `gen_ai.agent.name` with the Foundry-level name. To reach child `dependencies` spans, join via `requests.id` → `dependencies.operation_ParentId`.
+
+> ⚠️ **`gen_ai.agent.name` means different things on different tables:**
+> - On `requests`: the **Foundry agent name** (user-visible) → e.g., `hosted-agent-022-001`
+> - On `dependencies`: the **code-level class name** → e.g., `BingSearchAgent`
+>
+> **Always start from `requests`** when filtering by the Foundry agent name the user knows.
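The same two-step pattern reads naturally outside KQL as a set filter over exported rows. A sketch (field names mirror the query; purely illustrative, not a real SDK call):

```python
def agent_dependencies(requests, dependencies, agent_name):
    """Mimic the KQL join: collect ids of request spans carrying the
    Foundry agent name, then keep dependency spans parented by them."""
    req_ids = {
        r["id"]
        for r in requests
        if r["customDimensions"].get("gen_ai.agent.name") == agent_name
    }
    return [d for d in dependencies if d["operation_ParentId"] in req_ids]
```

The `let reqIds = ... | distinct id;` prelude corresponds to building the `req_ids` set; `operation_ParentId in (reqIds)` corresponds to the final membership test.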
+ +## Response ID Formats + +| Agent Type | Prefix | Example | +|------------|--------|---------| +| Hosted agent (AgentServer) | `caresp_` | `caresp_d7ab624de92da637008Rhr4U4E1y9FSE...` | +| Prompt agent (Foundry Responses API) | `resp_` | `resp_4e2f8b016b5a0dad00697bd3c4c1b881...` | +| Azure OpenAI chat completions | `chatcmpl-` | `chatcmpl-abc123def456` | + +When searching by response ID, use the appropriate prefix to narrow results. The `gen_ai.response.id` attribute appears on `dependencies` spans (for `chat` operations) and in `customEvents` (for evaluation results). + +## Common Query Templates + +### Overview — Conversations in last 24h +```kql +dependencies +| where timestamp > ago(24h) +| where isnotempty(customDimensions["gen_ai.operation.name"]) +| summarize + spanCount = count(), + errorCount = countif(success == false), + avgDuration = avg(duration), + totalInputTokens = sum(toint(customDimensions["gen_ai.usage.input_tokens"])), + totalOutputTokens = sum(toint(customDimensions["gen_ai.usage.output_tokens"])) + by bin(timestamp, 1h) +| order by timestamp desc +``` + +### Error Rate by Operation +```kql +dependencies +| where timestamp > ago(24h) +| where isnotempty(customDimensions["gen_ai.operation.name"]) +| summarize + total = count(), + errors = countif(success == false), + errorRate = round(100.0 * countif(success == false) / count(), 1) + by operation = tostring(customDimensions["gen_ai.operation.name"]) +| order by errorRate desc +``` + +### Token Usage by Model +```kql +dependencies +| where timestamp > ago(24h) +| where customDimensions["gen_ai.operation.name"] == "chat" +| summarize + calls = count(), + totalInput = sum(toint(customDimensions["gen_ai.usage.input_tokens"])), + totalOutput = sum(toint(customDimensions["gen_ai.usage.output_tokens"])), + avgInput = avg(todouble(customDimensions["gen_ai.usage.input_tokens"])), + avgOutput = avg(todouble(customDimensions["gen_ai.usage.output_tokens"])) + by model = 
tostring(customDimensions["gen_ai.request.model"])
+| order by totalInput desc
+```
+
+### Tool Call Details
+```kql
+dependencies
+| where operation_Id == "<OPERATION_ID>"
+| where customDimensions["gen_ai.operation.name"] == "execute_tool"
+| project timestamp, duration, success,
+    toolName = tostring(customDimensions["gen_ai.tool.name"]),
+    toolType = tostring(customDimensions["gen_ai.tool.type"]),
+    toolCallId = tostring(customDimensions["gen_ai.tool.call.id"]),
+    toolArgs = tostring(customDimensions["gen_ai.tool.call.arguments"]),
+    toolResult = tostring(customDimensions["gen_ai.tool.call.result"])
+| order by timestamp asc
+```
+
+Key tool attributes:
+
+| Attribute | Description | Example |
+|-----------|-------------|---------|
+| `gen_ai.tool.name` | Tool function name | `remote_functions.bing_grounding`, `python` |
+| `gen_ai.tool.type` | Tool type | `extension`, `function` |
+| `gen_ai.tool.call.id` | Unique call ID | `call_db64aa6a004a...` |
+| `gen_ai.tool.call.arguments` | JSON arguments passed | `{"query": "latest AI news"}` |
+| `gen_ai.tool.call.result` | Tool output (may be truncated) | `<tool output text>` |
+
+### Evaluation Results by Conversation
+```kql
+customEvents
+| where timestamp > ago(24h)
+| where name == "gen_ai.evaluation.result"
+| extend
+    evalName = tostring(customDimensions["gen_ai.evaluation.name"]),
+    score = todouble(customDimensions["gen_ai.evaluation.score.value"]),
+    label = tostring(customDimensions["gen_ai.evaluation.score.label"]),
+    conversationId = tostring(customDimensions["gen_ai.conversation.id"])
+| summarize
+    evalCount = count(),
+    avgScore = avg(score),
+    failCount = countif(label == "fail" or label == "not_relevant" or label == "incorrect"),
+    evaluators = make_set(evalName)
+    by conversationId
+| order by failCount desc
+```
+
+> For detailed eval queries by response ID or conversation ID, see [Eval Correlation](eval-correlation.md).
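The summarize step in the last template maps one-to-one onto a group-by. A Python equivalent for rows exported from `customEvents` (row shape assumed from the attribute tables above; illustrative only):

```python
from collections import defaultdict

# Labels counted as failures; mirrors the countif() in the KQL above.
FAIL_LABELS = {"fail", "not_relevant", "incorrect"}

def summarize_evals(events):
    """Group gen_ai.evaluation.result rows by conversation id and compute
    the same aggregates as the KQL: count, mean score, fail count, and
    the set of evaluators that ran."""
    groups = defaultdict(list)
    for event in events:
        groups[event["conversationId"]].append(event)
    summary = [
        {
            "conversationId": conv_id,
            "evalCount": len(rows),
            "avgScore": sum(r["score"] for r in rows) / len(rows),
            "failCount": sum(r["label"] in FAIL_LABELS for r in rows),
            "evaluators": sorted({r["evalName"] for r in rows}),
        }
        for conv_id, rows in groups.items()
    ]
    return sorted(summary, key=lambda s: s["failCount"], reverse=True)
```

Sorting by `failCount` descending reproduces the `order by failCount desc` so the worst conversations surface first.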
+
+## OTel Reference Links
+
+- [GenAI Spans](https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-spans/)
+- [GenAI Agent Spans](https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-agent-spans/)
+- [GenAI Events](https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-events/)
+- [GenAI Metrics](https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-metrics/)
diff --git a/.github/plugins/azure-skills/skills/microsoft-foundry/agent/trace/references/search-traces.md b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/trace/references/search-traces.md
new file mode 100644
index 00000000..a663035e
--- /dev/null
+++ b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/trace/references/search-traces.md
@@ -0,0 +1,141 @@
+# Search Traces — Conversation-Level Search
+
+Search agent traces at the conversation level. Returns summaries grouped by conversation or operation, not individual spans.
+
+## Prerequisites
+
+- App Insights resource resolved (see [trace.md](../trace.md) Before Starting)
+- Time range confirmed with user (default: last 24 hours)
+
+## Search by Conversation ID
+
+```kql
+dependencies
+| where timestamp > ago(24h)
+| where customDimensions["gen_ai.conversation.id"] == "<CONVERSATION_ID>"
+| project timestamp, name, duration, resultCode, success,
+    operation = tostring(customDimensions["gen_ai.operation.name"]),
+    model = tostring(customDimensions["gen_ai.request.model"]),
+    inputTokens = toint(customDimensions["gen_ai.usage.input_tokens"]),
+    outputTokens = toint(customDimensions["gen_ai.usage.output_tokens"]),
+    operation_Id, id, operation_ParentId
+| order by timestamp asc
+```
+
+## Search by Response ID
+
+Auto-detect the response ID format to determine agent type:
+- `caresp_...` → Hosted agent (AgentServer)
+- `resp_...` → Prompt agent (Foundry Responses API)
+- `chatcmpl-...` → Azure OpenAI chat completions
+
+```kql
+dependencies
+| where timestamp > ago(24h)
+| where customDimensions["gen_ai.response.id"] == "<RESPONSE_ID>"
+| project timestamp, name, duration, resultCode, success,
+    operation = tostring(customDimensions["gen_ai.operation.name"]),
+    model = tostring(customDimensions["gen_ai.request.model"]),
+    inputTokens = toint(customDimensions["gen_ai.usage.input_tokens"]),
+    outputTokens = toint(customDimensions["gen_ai.usage.output_tokens"]),
+    operation_Id, id, operation_ParentId
+```
+
+Then drill into the full conversation:
+
+> ⚠️ **STOP — read [Conversation Detail](conversation-detail.md) before writing your own drill-down query.** It contains the correct span tree reconstruction logic, event/exception queries, and eval correlation steps.
+
+Quick drill-down using the `operation_Id` from above:
+
+```kql
+dependencies
+| where operation_Id == "<OPERATION_ID>"
+| project timestamp, name, duration, resultCode, success,
+    spanId = id, parentSpanId = operation_ParentId,
+    operation = tostring(customDimensions["gen_ai.operation.name"]),
+    model = tostring(customDimensions["gen_ai.request.model"]),
+    inputTokens = toint(customDimensions["gen_ai.usage.input_tokens"]),
+    outputTokens = toint(customDimensions["gen_ai.usage.output_tokens"]),
+    responseId = tostring(customDimensions["gen_ai.response.id"]),
+    errorType = tostring(customDimensions["error.type"]),
+    toolName = tostring(customDimensions["gen_ai.tool.name"])
+| order by timestamp asc
+```
+
+Also check for eval results: see [Eval Correlation](eval-correlation.md).
+
+## Search by Agent Name
+
+> **Note:** For hosted agents, `gen_ai.agent.name` in `dependencies` refers to *sub-agents* (e.g., `BingSearchAgent`), not the top-level hosted agent. See "Search by Hosted Agent Name" below.
+
+```kql
+dependencies
+| where timestamp > ago(24h)
+| where customDimensions["gen_ai.agent.name"] == "<AGENT_NAME>"
+    or customDimensions["gen_ai.agent.id"] == "<AGENT_ID>"
+| summarize
+    startTime = min(timestamp),
+    endTime = max(timestamp),
+    totalDuration = max(timestamp) - min(timestamp),
+    spanCount = count(),
+    errorCount = countif(success == false),
+    totalInputTokens = sum(toint(customDimensions["gen_ai.usage.input_tokens"])),
+    totalOutputTokens = sum(toint(customDimensions["gen_ai.usage.output_tokens"]))
+    by conversationId = tostring(customDimensions["gen_ai.conversation.id"]),
+       operation_Id
+| order by startTime desc
+| take 50
+```
+
+## Search by Hosted Agent Name
+
+For hosted agents, the Foundry agent name (e.g., `hosted-agent-022-001`) appears on both `requests` and `traces` tables — NOT on `dependencies`. Use `requests` as the preferred entry point since it also has `gen_ai.agent.name`:
+
+```kql
+let reqIds = requests
+| where timestamp > ago(24h)
+| where customDimensions["gen_ai.agent.name"] == "<FOUNDRY_AGENT_NAME>"
+| distinct id;
+dependencies
+| where timestamp > ago(24h)
+| where operation_ParentId in (reqIds)
+| where isnotempty(customDimensions["gen_ai.operation.name"])
+| summarize
+    startTime = min(timestamp),
+    endTime = max(timestamp),
+    spanCount = count(),
+    errorCount = countif(success == false),
+    totalInputTokens = sum(toint(customDimensions["gen_ai.usage.input_tokens"])),
+    totalOutputTokens = sum(toint(customDimensions["gen_ai.usage.output_tokens"]))
+    by operation_ParentId
+| order by startTime desc
+| take 50
+```
+
+## Conversation Summary Table
+
+Present results in this format:
+
+| Conversation ID | Start Time | Duration | Spans | Errors | Input Tokens | Output Tokens |
+|----------------|------------|----------|-------|--------|-------------|---------------|
+| conv_abc123 | 2025-01-15 10:30 | 4.2s | 12 | 0 | 850 | 320 |
+| conv_def456 | 2025-01-15 10:25 | 8.7s | 18 | 2 | 1200 | 450 |
+
+Highlight rows with errors in the summary.
Offer to drill into any conversation via [Conversation Detail](conversation-detail.md).
+
+## Free-Text Search
+
+When the user provides a general search term (e.g., agent name, error message):
+
+```kql
+union dependencies, requests, exceptions, traces
+| where timestamp > ago(24h)
+| where * contains "<SEARCH_TERM>"
+| summarize count() by operation_Id
+| order by count_ desc
+| take 20
+```
+
+## After Successful Query
+
+> 📝 **Reminder:** If this is the first trace query in this session, ensure App Insights connection info was persisted to `.env` (see [trace.md — Before Starting](../trace.md#before-starting--resolve-app-insights-connection)).
diff --git a/.github/plugins/azure-skills/skills/microsoft-foundry/agent/trace/trace.md b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/trace/trace.md
new file mode 100644
index 00000000..271cb84b
--- /dev/null
+++ b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/trace/trace.md
@@ -0,0 +1,60 @@
+# Foundry Agent Trace Analysis
+
+Analyze production traces for Foundry agents using Application Insights and GenAI OpenTelemetry semantic conventions. This skill provides **structured KQL-powered workflows** for searching conversations, diagnosing failures, and identifying latency bottlenecks. Use this skill instead of writing ad-hoc KQL queries against App Insights manually.
+
+## When to Use This Skill
+
+USE FOR: analyze agent traces, search agent conversations, find failing traces, slow traces, latency analysis, trace search, conversation history, agent errors in production, debug agent responses, App Insights traces, GenAI telemetry, trace correlation, span tree, production trace analysis, evaluation results, evaluation scores, eval run results, find by response ID, get agent trace by conversation ID, agent evaluation scores from App Insights.
+
+> **USE THIS SKILL INSTEAD OF** `azure-monitor` or `azure-applicationinsights` when querying Foundry agent traces, evaluations, or GenAI telemetry. This skill has correct GenAI OTel attribute mappings and tested KQL templates that those general tools lack.
+
+> ⚠️ **DO NOT manually write KQL queries** for GenAI trace analysis **without reading this skill first.** This skill provides tested query templates with correct GenAI OTel attribute mappings, proper span correlation logic, and conversation-level aggregation patterns.
+
+## Quick Reference
+
+| Property | Value |
+|----------|-------|
+| Data source | Application Insights (App Insights) |
+| Query language | KQL (Kusto Query Language) |
+| Related skills | `troubleshoot` (container logs) |
+| Preferred query tool | `monitor_resource_log_query` (Azure MCP) — use for App Insights KQL queries |
+| OTel conventions | [GenAI Spans](https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-spans/), [Agent Spans](https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-agent-spans/) |
+
+## Entry Points
+
+| User Intent | Start At |
+|-------------|----------|
+| "Search agent conversations" / "Find traces" | [Search Traces](references/search-traces.md) |
+| "Tell me about response ID X" / "Look up response ID" | [Search Traces — Search by Response ID](references/search-traces.md#search-by-response-id) |
+| "Why is my agent failing?" / "Find errors" | [Analyze Failures](references/analyze-failures.md) |
+| "My agent is slow" / "Latency analysis" | [Analyze Latency](references/analyze-latency.md) |
+| "Show me this conversation" / "Trace detail" | [Conversation Detail](references/conversation-detail.md) |
+| "Find eval results for response ID" / "eval scores from traces" | [Eval Correlation](references/eval-correlation.md) |
+| "What KQL do I need?" | [KQL Templates](references/kql-templates.md) |
+
+## Before Starting — Resolve App Insights Connection
+
+1. Check `.env` (or the same config file hosting other project variables) for `APPLICATIONINSIGHTS_CONNECTION_STRING` or `AZURE_APPINSIGHTS_RESOURCE_ID`
+2. If not found, use `project_connection_list` (foundry-mcp tool) to discover App Insights linked to the Foundry project — this is the most reliable way to find the correct App Insights resource. Filter results for the Application Insights connection type.
+3. **IMMEDIATELY write back to `.env`** — as soon as `project_connection_list` returns App Insights info, write it to `.env` (or the same config file where `AZURE_AI_PROJECT_ENDPOINT` etc. live) BEFORE running any queries. Do not defer this step. This ensures future sessions skip discovery entirely.
+
+| Variable | Purpose | Example |
+|----------|---------|---------|
+| `APPLICATIONINSIGHTS_CONNECTION_STRING` | App Insights connection string | `InstrumentationKey=...;IngestionEndpoint=...` |
+| `AZURE_APPINSIGHTS_RESOURCE_ID` | ARM resource ID | `/subscriptions/.../Microsoft.Insights/components/...` |
+
+If a `.env` file already exists, read it first and merge — do not overwrite existing values without confirmation.
+
+4. Confirm the App Insights resource with the user before querying
+5. Use **`monitor_resource_log_query`** (Azure MCP tool) to execute KQL queries against the App Insights resource. This is preferred over delegating to the `azure-kusto` skill. Pass the App Insights resource ID and the KQL query directly.
+
+> ⚠️ **Always pass `subscription` explicitly** to Azure MCP tools like `monitor_resource_log_query` — they don't extract it from resource IDs.
+
+## Behavioral Rules
+
+1. **ALWAYS display the KQL query.** Before executing ANY KQL query, display it in a code block. Never run a query silently. This is a hard requirement, not a suggestion. Showing queries builds trust and helps users learn KQL patterns.
+2. **Start broad, then narrow.** Begin with conversation-level summaries, then drill into specific conversations or spans on user request.
+3. **Use time ranges.** Always scope queries with a time range (default: last 24 hours). Ask the user for the range if not specified.
+4.
**Explain GenAI attributes.** When displaying results, translate OTel attribute names to human-readable labels (e.g., `gen_ai.operation.name` → "Operation"). +5. **Link to conversation detail.** When showing search or failure results, offer to drill into any specific conversation. +6. **Scope to the target agent.** An App Insights resource may contain traces from multiple agents. For hosted agents, start from the `requests` table where `gen_ai.agent.name` holds the Foundry-level name, then join to `dependencies` via `operation_ParentId`. For prompt agents, filter `dependencies` directly by `gen_ai.agent.name`. When showing overview summaries, group by agent and warn the user if multiple agents are present. diff --git a/.github/plugins/azure-skills/skills/microsoft-foundry/agent/troubleshoot/troubleshoot.md b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/troubleshoot/troubleshoot.md new file mode 100644 index 00000000..f79762ea --- /dev/null +++ b/.github/plugins/azure-skills/skills/microsoft-foundry/agent/troubleshoot/troubleshoot.md @@ -0,0 +1,96 @@ +# Foundry Agent Troubleshoot + +Troubleshoot and debug Foundry agents by collecting container logs, discovering observability connections, and querying Application Insights telemetry. 
+ +## Quick Reference + +| Property | Value | +|----------|-------| +| Agent types | Prompt (LLM-based), Hosted (container-based) | +| MCP servers | `foundry-mcp` | +| Key MCP tools | `agent_get`, `agent_container_status_get` | +| Related skills | `trace` (telemetry analysis) | +| Preferred query tool | `monitor_resource_log_query` (Azure MCP) — preferred over `azure-kusto` for App Insights | +| CLI references | `az cognitiveservices agent logs`, `az cognitiveservices account connection` | + +## When to Use This Skill + +- Agent is not responding or returning errors +- Hosted agent container is failing to start +- Need to view container logs for a hosted agent +- Diagnose latency or timeout issues +- Query Application Insights for agent traces and exceptions +- Investigate agent runtime failures + +## MCP Tools + +| Tool | Description | Parameters | +|------|-------------|------------| +| `agent_get` | Get agent details to determine type (prompt/hosted) | `projectEndpoint` (required), `agentName` (optional) | +| `agent_container_status_get` | Check hosted agent container status | `projectEndpoint`, `agentName` (required); `agentVersion` | + +## Workflow + +### Step 1: Collect Agent Information + +Use the project endpoint and agent name from the project context (see Common: Project Context Resolution). Ask the user only for values not already resolved: +- **Project endpoint** — AI Foundry project endpoint URL +- **Agent name** — Name of the agent to troubleshoot + +### Step 2: Determine Agent Type + +Use `agent_get` with `projectEndpoint` and `agentName` to retrieve the agent definition. Check the `kind` field: +- `"hosted"` → Proceed to Step 3 (Container Logs) +- `"prompt"` → Skip to Step 4 (Discover Observability Connections) + +### Step 3: Retrieve Container Logs (Hosted Agents Only) + +First check the container status using `agent_container_status_get`. Report the current status to the user. 
+
+Retrieve container logs using the Azure CLI command documented at:
+[az cognitiveservices agent logs show](https://learn.microsoft.com/en-us/cli/azure/cognitiveservices/agent/logs?view=azure-cli-latest#az-cognitiveservices-agent-logs-show)
+
+Refer to the documentation above for the exact command syntax and parameters. Present the logs to the user and highlight any errors or warnings found.
+
+### Step 4: Discover Observability Connections
+
+List the project connections to find Application Insights or Azure Monitor resources using the Azure CLI command documented at:
+[az cognitiveservices account connection](https://learn.microsoft.com/en-us/cli/azure/cognitiveservices/account/connection?view=azure-cli-latest)
+
+Refer to the documentation above for the exact command syntax and parameters. Look for connections of type `ApplicationInsights` or `AzureMonitor` in the output.
+
+If no observability connection is found, inform the user and suggest setting up Application Insights for the project. Ask if they want to proceed without telemetry data.
+
+### Step 5: Query Application Insights Telemetry
+
+Use **`monitor_resource_log_query`** (Azure MCP tool) to run KQL queries against the Application Insights resource discovered in Step 4. This is preferred over delegating to the `azure-kusto` skill. Pass the App Insights resource ID and the KQL query directly.
+
+> ⚠️ **Always pass `subscription` explicitly** to Azure MCP tools like `monitor_resource_log_query` — they don't extract it from resource IDs.
+
+Use `* contains "<agent-name>"` or `* contains "<agent-id>"` filters to narrow results to the specific agent instance.
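+The telemetry pass above can start from a query along these lines — a sketch, not a fixed template; `<agent-name>` is a placeholder for the value collected in Step 1, and column availability varies across the unioned tables:
+
+```kql
+// Recent failures and exceptions mentioning the target agent
+union requests, dependencies, exceptions
+| where timestamp > ago(24h)
+| where * contains "<agent-name>"
+| where itemType == "exception" or success == false
+| project timestamp, itemType, name, operation_Id
+| order by timestamp desc
+| take 50
+```
+
+Adjust the time range and filters to the user's scenario, and display the query before running it, per the trace skill's behavioral rules.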
+ +### Step 6: Summarize Findings + +Present a summary to the user including: +- **Agent type and status** — hosted/prompt, container status (if hosted) +- **Container log errors** — key errors from logs (hosted only) +- **Telemetry insights** — exceptions, failed requests, latency trends +- **Recommended actions** — specific steps to resolve identified issues + +## Error Handling + +| Error | Cause | Resolution | +|-------|-------|------------| +| Agent not found | Invalid agent name or project endpoint | Use `agent_get` to list available agents and verify name | +| Container logs unavailable | Agent is a prompt agent or container never started | Prompt agents don't have container logs — skip to telemetry | +| No observability connection | Application Insights not configured for the project | Suggest configuring Application Insights for the Foundry project | +| Kusto query failed | Invalid cluster/database or insufficient permissions | Verify Application Insights resource details and reader permissions | +| No telemetry data | Agent not instrumented or too recent | Check if Application Insights SDK is configured; data may take a few minutes to appear | + +## Additional Resources + +- [Foundry Hosted Agents](https://learn.microsoft.com/azure/ai-foundry/agents/concepts/hosted-agents?view=foundry) +- [Agent Logs CLI Reference](https://learn.microsoft.com/en-us/cli/azure/cognitiveservices/agent/logs?view=azure-cli-latest) +- [Account Connection CLI Reference](https://learn.microsoft.com/en-us/cli/azure/cognitiveservices/account/connection?view=azure-cli-latest) +- [KQL Quick Reference](https://learn.microsoft.com/azure/data-explorer/kusto/query/kql-quick-reference) +- [Foundry Samples](https://github.com/azure-ai-foundry/foundry-samples) diff --git a/.github/plugins/azure-skills/skills/microsoft-foundry/models/deploy-model/SKILL.md b/.github/plugins/azure-skills/skills/microsoft-foundry/models/deploy-model/SKILL.md index 80867451..46b9f01e 100644 --- 
a/.github/plugins/azure-skills/skills/microsoft-foundry/models/deploy-model/SKILL.md +++ b/.github/plugins/azure-skills/skills/microsoft-foundry/models/deploy-model/SKILL.md @@ -1,9 +1,10 @@ --- name: deploy-model -description: | - Unified Azure OpenAI model deployment skill with intelligent intent-based routing. Handles quick preset deployments, fully customized deployments (version/SKU/capacity/RAI policy), and capacity discovery across regions and projects. - USE FOR: deploy model, deploy gpt, create deployment, model deployment, deploy openai model, set up model, provision model, find capacity, check model availability, where can I deploy, best region for model, capacity analysis. - DO NOT USE FOR: listing existing deployments (use foundry_models_deployments_list MCP tool), deleting deployments, agent creation (use agent/create), project creation (use project/create). +description: "Unified Azure OpenAI model deployment skill with intelligent intent-based routing. Handles quick preset deployments, fully customized deployments (version/SKU/capacity/RAI policy), and capacity discovery across regions and projects. USE FOR: deploy model, deploy gpt, create deployment, model deployment, deploy openai model, set up model, provision model, find capacity, check model availability, where can I deploy, best region for model, capacity analysis. DO NOT USE FOR: listing existing deployments (use foundry_models_deployments_list MCP tool), deleting deployments, agent creation (use agent/create), project creation (use project/create)." 
+license: MIT +metadata: + author: Microsoft + version: "1.0.0" --- # Deploy Model diff --git a/.github/plugins/azure-skills/skills/microsoft-foundry/models/deploy-model/capacity/SKILL.md b/.github/plugins/azure-skills/skills/microsoft-foundry/models/deploy-model/capacity/SKILL.md index d7758fc3..46935315 100644 --- a/.github/plugins/azure-skills/skills/microsoft-foundry/models/deploy-model/capacity/SKILL.md +++ b/.github/plugins/azure-skills/skills/microsoft-foundry/models/deploy-model/capacity/SKILL.md @@ -1,9 +1,10 @@ --- name: capacity -description: | - Discovers available Azure OpenAI model capacity across regions and projects. Analyzes quota limits, compares availability, and recommends optimal deployment locations based on capacity requirements. - USE FOR: find capacity, check quota, where can I deploy, capacity discovery, best region for capacity, multi-project capacity search, quota analysis, model availability, region comparison, check TPM availability. - DO NOT USE FOR: actual deployment (hand off to preset or customize after discovery), quota increase requests (direct user to Azure Portal), listing existing deployments. +description: "Discovers available Azure OpenAI model capacity across regions and projects. Analyzes quota limits, compares availability, and recommends optimal deployment locations based on capacity requirements. USE FOR: find capacity, check quota, where can I deploy, capacity discovery, best region for capacity, multi-project capacity search, quota analysis, model availability, region comparison, check TPM availability. DO NOT USE FOR: actual deployment (hand off to preset or customize after discovery), quota increase requests (direct user to Azure Portal), listing existing deployments." 
+license: MIT +metadata: + author: Microsoft + version: "1.0.0" --- # Capacity Discovery diff --git a/.github/plugins/azure-skills/skills/microsoft-foundry/models/deploy-model/customize/EXAMPLES.md b/.github/plugins/azure-skills/skills/microsoft-foundry/models/deploy-model/customize/EXAMPLES.md index ac498441..a3f25848 100644 --- a/.github/plugins/azure-skills/skills/microsoft-foundry/models/deploy-model/customize/EXAMPLES.md +++ b/.github/plugins/azure-skills/skills/microsoft-foundry/models/deploy-model/customize/EXAMPLES.md @@ -31,6 +31,12 @@ **Config:** gpt-4o / GlobalStandard / 20K TPM / Dynamic Quota / Spillover → `gpt-4o-backup` **Result:** Primary handles up to 20K TPM; overflow auto-redirects to backup deployment. +## Example 6: Anthropic Model Deployment (claude-sonnet-4-6) + +**Scenario:** Deploy claude-sonnet-4-6 with customized settings. +**Config:** claude-sonnet-4-6 / GlobalStandard / capacity 1 (MaaS) / Industry: Healthcare / No RAI policy (Anthropic manages content filtering) +**Result:** User selected "Healthcare" as industry → tenant country code (US) and org name fetched automatically → deployed via ARM REST API with `modelProviderData` in ~2 min. 
+ --- ## Comparison Matrix @@ -42,6 +48,7 @@ | Ex 3 | gpt-4o | ProvisionedManaged | 200 PTU | - | ✓ | - | Predictable workload | | Ex 4 | gpt-4o-mini | Standard | 1K TPM | - | - | - | Dev/testing | | Ex 5 | gpt-4o | GlobalStandard | 20K TPM | ✓ | - | ✓ | Peak load | +| Ex 6 | claude-sonnet-4-6 | GlobalStandard | 1 (MaaS) | - | - | - | Anthropic model | ## Common Patterns diff --git a/.github/plugins/azure-skills/skills/microsoft-foundry/models/deploy-model/customize/SKILL.md b/.github/plugins/azure-skills/skills/microsoft-foundry/models/deploy-model/customize/SKILL.md index 1e8636c2..7c94d561 100644 --- a/.github/plugins/azure-skills/skills/microsoft-foundry/models/deploy-model/customize/SKILL.md +++ b/.github/plugins/azure-skills/skills/microsoft-foundry/models/deploy-model/customize/SKILL.md @@ -1,7 +1,10 @@ --- name: customize -description: | - Interactive guided deployment flow for Azure OpenAI models with full customization control. Step-by-step selection of model version, SKU (GlobalStandard/Standard/ProvisionedManaged), capacity, RAI policy (content filter), and advanced options (dynamic quota, priority processing, spillover). USE FOR: custom deployment, customize model deployment, choose version, select SKU, set capacity, configure content filter, RAI policy, deployment options, detailed deployment, advanced deployment, PTU deployment, provisioned throughput. DO NOT USE FOR: quick deployment to optimal region (use preset). +description: "Interactive guided deployment flow for Azure OpenAI models with full customization control. Step-by-step selection of model version, SKU (GlobalStandard/Standard/ProvisionedManaged), capacity, RAI policy (content filter), and advanced options (dynamic quota, priority processing, spillover). USE FOR: custom deployment, customize model deployment, choose version, select SKU, set capacity, configure content filter, RAI policy, deployment options, detailed deployment, advanced deployment, PTU deployment, provisioned throughput. 
DO NOT USE FOR: quick deployment to optimal region (use preset)." +license: MIT +metadata: + author: Microsoft + version: "1.0.1" --- # Customize Model Deployment diff --git a/.github/plugins/azure-skills/skills/microsoft-foundry/models/deploy-model/customize/references/customize-guides.md b/.github/plugins/azure-skills/skills/microsoft-foundry/models/deploy-model/customize/references/customize-guides.md index af2b97b4..1009adb4 100644 --- a/.github/plugins/azure-skills/skills/microsoft-foundry/models/deploy-model/customize/references/customize-guides.md +++ b/.github/plugins/azure-skills/skills/microsoft-foundry/models/deploy-model/customize/references/customize-guides.md @@ -2,6 +2,8 @@ > Reference for: `models/deploy-model/customize/SKILL.md` +**Table of Contents:** [Selection Guides](#selection-guides) · [Advanced Topics](#advanced-topics) + ## Selection Guides ### How to Choose SKU diff --git a/.github/plugins/azure-skills/skills/microsoft-foundry/models/deploy-model/customize/references/customize-workflow.md b/.github/plugins/azure-skills/skills/microsoft-foundry/models/deploy-model/customize/references/customize-workflow.md index 11b23f5f..750ae56e 100644 --- a/.github/plugins/azure-skills/skills/microsoft-foundry/models/deploy-model/customize/references/customize-workflow.md +++ b/.github/plugins/azure-skills/skills/microsoft-foundry/models/deploy-model/customize/references/customize-workflow.md @@ -60,6 +60,24 @@ az cognitiveservices account list-models \ Present sorted unique list. Allow custom model name entry. 
+**Detect model format:** + +```bash +# Get model format (e.g., OpenAI, Anthropic, Meta-Llama, Mistral, Cohere) +MODEL_FORMAT=$(az cognitiveservices account list-models \ + --name "$ACCOUNT_NAME" \ + --resource-group "$RESOURCE_GROUP" \ + --query "[?name=='$MODEL_NAME'].format" -o tsv | head -1) + +MODEL_FORMAT=${MODEL_FORMAT:-"OpenAI"} +echo "Model format: $MODEL_FORMAT" +``` + +> 💡 **Model format determines the deployment path:** +> - `OpenAI` — Standard CLI, TPM-based capacity, RAI policies, version upgrade policies +> - `Anthropic` — REST API with `modelProviderData`, capacity=1, no RAI, no version upgrade +> - All other formats (`Meta-Llama`, `Mistral`, `Cohere`, etc.) — Standard CLI, capacity=1 (MaaS), no RAI, no version upgrade + --- ## Phase 5: List and Select Model Version @@ -103,16 +121,18 @@ Quota key pattern: `OpenAI..`. Calculate `available = limit - c ## Phase 7: Configure Capacity -**Query capacity via REST API:** +> ⚠️ **Non-OpenAI models (MaaS):** If `MODEL_FORMAT != "OpenAI"`, capacity is always `1` (pay-per-token billing). Skip capacity configuration and set `DEPLOY_CAPACITY=1`. Proceed to Phase 7c (Anthropic) or Phase 8. + +**For OpenAI models only — query capacity via REST API:** ```bash # Current region capacity az rest --method GET --url \ - "https://management.azure.com/subscriptions/$SUBSCRIPTION_ID/providers/Microsoft.CognitiveServices/locations/$PROJECT_REGION/modelCapacities?api-version=2024-10-01&modelFormat=OpenAI&modelName=$MODEL_NAME&modelVersion=$MODEL_VERSION" + "https://management.azure.com/subscriptions/$SUBSCRIPTION_ID/providers/Microsoft.CognitiveServices/locations/$PROJECT_REGION/modelCapacities?api-version=2024-10-01&modelFormat=$MODEL_FORMAT&modelName=$MODEL_NAME&modelVersion=$MODEL_VERSION" ``` Filter result for `properties.skuName == $SELECTED_SKU`. Read `properties.availableCapacity`. 
-**Capacity defaults by SKU:** +**Capacity defaults by SKU (OpenAI only):** | SKU | Unit | Min | Max | Step | Default | |-----|------|-----|-----|------|---------| @@ -126,7 +146,7 @@ Validate user input: must be >= min, <= max, multiple of step. On invalid input, If no capacity in current region, query ALL regions: ```bash az rest --method GET --url \ - "https://management.azure.com/subscriptions/$SUBSCRIPTION_ID/providers/Microsoft.CognitiveServices/modelCapacities?api-version=2024-10-01&modelFormat=OpenAI&modelName=$MODEL_NAME&modelVersion=$MODEL_VERSION" + "https://management.azure.com/subscriptions/$SUBSCRIPTION_ID/providers/Microsoft.CognitiveServices/modelCapacities?api-version=2024-10-01&modelFormat=$MODEL_FORMAT&modelName=$MODEL_NAME&modelVersion=$MODEL_VERSION" ``` Filter: `properties.skuName == $SELECTED_SKU && properties.availableCapacity > 0`. Sort descending by capacity. @@ -146,8 +166,71 @@ If no region has capacity: fail with guidance to request quota increase, check e --- +## Phase 7c: Anthropic Model Provider Data (Anthropic models only) + +> ⚠️ **Only execute this phase if `MODEL_FORMAT == "Anthropic"`.** For OpenAI and other models, skip to Phase 8. + +Anthropic models require `modelProviderData` in the deployment payload. Collect this before deployment. + +**Step 1: Prompt user to select industry** + +Present the following list and ask the user to choose one: + +``` + 1. None (API value: none) + 2. Biotechnology (API value: biotechnology) + 3. Consulting (API value: consulting) + 4. Education (API value: education) + 5. Finance (API value: finance) + 6. Food & Beverage (API value: food_and_beverage) + 7. Government (API value: government) + 8. Healthcare (API value: healthcare) + 9. Insurance (API value: insurance) +10. Law (API value: law) +11. Manufacturing (API value: manufacturing) +12. Media (API value: media) +13. Nonprofit (API value: nonprofit) +14. Technology (API value: technology) +15. 
Telecommunications (API value: telecommunications) +16. Sport & Recreation (API value: sport_and_recreation) +17. Real Estate (API value: real_estate) +18. Retail (API value: retail) +19. Other (API value: other) +``` + +> ⚠️ **Do NOT pick a default industry or hardcode a value. Always ask the user.** This is required by Anthropic's terms of service. The industry list is static — there is no REST API that provides it. + +Store selection as `SELECTED_INDUSTRY` (use the API value, e.g., `technology`). + +**Step 2: Fetch tenant info (country code and organization name)** + +```bash +TENANT_INFO=$(az rest --method GET \ + --url "https://management.azure.com/tenants?api-version=2024-11-01" \ + --query "value[0].{countryCode:countryCode, displayName:displayName}" -o json) + +COUNTRY_CODE=$(echo "$TENANT_INFO" | jq -r '.countryCode') +ORG_NAME=$(echo "$TENANT_INFO" | jq -r '.displayName') +``` + +*PowerShell version:* +```powershell +$tenantInfo = az rest --method GET ` + --url "https://management.azure.com/tenants?api-version=2024-11-01" ` + --query "value[0].{countryCode:countryCode, displayName:displayName}" -o json | ConvertFrom-Json + +$countryCode = $tenantInfo.countryCode +$orgName = $tenantInfo.displayName +``` + +Store `COUNTRY_CODE` and `ORG_NAME` for use in Phase 13. + +--- + ## Phase 8: Select RAI Policy (Content Filter) +> ⚠️ **Note:** RAI policies only apply to OpenAI models. Skip this phase if `MODEL_FORMAT != "OpenAI"` (Anthropic, Meta-Llama, Mistral, Cohere, etc. do not use RAI policies). + Present options: 1. `Microsoft.DefaultV2` — Balanced filtering (recommended). Filters hate, violence, sexual, self-harm. 2. `Microsoft.Prompt-Shield` — Enhanced prompt injection/jailbreak protection. @@ -184,6 +267,8 @@ az cognitiveservices account deployment list \ ## Phase 10: Configure Version Upgrade Policy +> ⚠️ **Note:** Version upgrade policies only apply to OpenAI models. Skip this phase if `MODEL_FORMAT != "OpenAI"`. 
+ | Policy | Description | |--------|-------------| | `OnceNewDefaultVersionAvailable` | Auto-upgrade to new default (Recommended) | @@ -223,6 +308,10 @@ User confirms or cancels. ## Phase 13: Execute Deployment +> 💡 `MODEL_FORMAT` was already detected in Phase 4. Use the stored value here. + +### Standard CLI deployment (non-Anthropic models): + **Create deployment:** ```bash az cognitiveservices account deployment create \ @@ -231,12 +320,75 @@ az cognitiveservices account deployment create \ --deployment-name $DEPLOYMENT_NAME \ --model-name $MODEL_NAME \ --model-version $MODEL_VERSION \ - --model-format "OpenAI" \ + --model-format "$MODEL_FORMAT" \ --sku-name $SELECTED_SKU \ --sku-capacity $DEPLOY_CAPACITY ``` -**Check status:** +> 💡 **Note:** For non-OpenAI MaaS models, `$DEPLOY_CAPACITY` is `1` (set in Phase 7). + +### Anthropic model deployment (requires modelProviderData): + +The Azure CLI does not support `--model-provider-data`. Use the ARM REST API directly. + +> ⚠️ Industry, country code, and organization name should have been collected in Phase 7c. + +```bash +echo "Creating Anthropic model deployment via REST API..." + +az rest --method PUT \ + --url "https://management.azure.com/subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP/providers/Microsoft.CognitiveServices/accounts/$ACCOUNT_NAME/deployments/$DEPLOYMENT_NAME?api-version=2024-10-01" \ + --body "{ + \"sku\": { + \"name\": \"$SELECTED_SKU\", + \"capacity\": 1 + }, + \"properties\": { + \"model\": { + \"format\": \"Anthropic\", + \"name\": \"$MODEL_NAME\", + \"version\": \"$MODEL_VERSION\" + }, + \"modelProviderData\": { + \"industry\": \"$SELECTED_INDUSTRY\", + \"countryCode\": \"$COUNTRY_CODE\", + \"organizationName\": \"$ORG_NAME\" + } + } + }" +``` + +*PowerShell version:* +```powershell +Write-Host "Creating Anthropic model deployment via REST API..." 
+ +$body = @{ + sku = @{ + name = $SELECTED_SKU + capacity = 1 + } + properties = @{ + model = @{ + format = "Anthropic" + name = $MODEL_NAME + version = $MODEL_VERSION + } + modelProviderData = @{ + industry = $SELECTED_INDUSTRY + countryCode = $countryCode + organizationName = $orgName + } + } +} | ConvertTo-Json -Depth 5 + +az rest --method PUT ` + --url "https://management.azure.com/subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP/providers/Microsoft.CognitiveServices/accounts/$ACCOUNT_NAME/deployments/${DEPLOYMENT_NAME}?api-version=2024-10-01" ` + --body $body +``` + +> 💡 **Note:** Anthropic models use `capacity: 1` (MaaS billing model), not TPM-based capacity. RAI policy is not applicable for Anthropic models. + +### Monitor deployment status: ```bash az cognitiveservices account deployment show \ --name $ACCOUNT_NAME \ diff --git a/.github/plugins/azure-skills/skills/microsoft-foundry/models/deploy-model/preset/EXAMPLES.md b/.github/plugins/azure-skills/skills/microsoft-foundry/models/deploy-model/preset/EXAMPLES.md index 98c4276f..0a97a6d6 100644 --- a/.github/plugins/azure-skills/skills/microsoft-foundry/models/deploy-model/preset/EXAMPLES.md +++ b/.github/plugins/azure-skills/skills/microsoft-foundry/models/deploy-model/preset/EXAMPLES.md @@ -38,6 +38,11 @@ **Scenario:** Deploy "latest gpt-4o" when multiple versions exist. **Result:** Latest stable version auto-selected. Capacity aggregated across versions. +## Example 8: Anthropic Model (claude-sonnet-4-6) + +**Scenario:** Deploy claude-sonnet-4-6 (Anthropic model requiring modelProviderData). +**Result:** User prompted for industry selection → tenant country code and org name fetched automatically → deployed via ARM REST API with `modelProviderData` payload in ~2 min. Capacity set to 1 (MaaS billing). 
+ --- ## Summary of Scenarios @@ -51,6 +56,7 @@ | **5: First-Time** | ~5m | Complete onboarding | | **6: Name Conflict** | ~1m | Auto-retry with suffix | | **7: Multi-Version** | ~1m | Latest version auto-selected | +| **8: Anthropic** | ~2m | Industry prompt, tenant info, REST API deploy | ## Common Patterns diff --git a/.github/plugins/azure-skills/skills/microsoft-foundry/models/deploy-model/preset/SKILL.md b/.github/plugins/azure-skills/skills/microsoft-foundry/models/deploy-model/preset/SKILL.md index 5d296be5..09fcc94c 100644 --- a/.github/plugins/azure-skills/skills/microsoft-foundry/models/deploy-model/preset/SKILL.md +++ b/.github/plugins/azure-skills/skills/microsoft-foundry/models/deploy-model/preset/SKILL.md @@ -1,7 +1,10 @@ --- name: preset -description: | - Intelligently deploys Azure OpenAI models to optimal regions by analyzing capacity across all available regions. Automatically checks current region first and shows alternatives if needed. USE FOR: quick deployment, optimal region, best region, automatic region selection, fast setup, multi-region capacity check, high availability deployment, deploy to best location. DO NOT USE FOR: custom SKU selection (use customize), specific version selection (use customize), custom capacity configuration (use customize), PTU deployments (use customize). +description: "Intelligently deploys Azure OpenAI models to optimal regions by analyzing capacity across all available regions. Automatically checks current region first and shows alternatives if needed. USE FOR: quick deployment, optimal region, best region, automatic region selection, fast setup, multi-region capacity check, high availability deployment, deploy to best location. DO NOT USE FOR: custom SKU selection (use customize), specific version selection (use customize), custom capacity configuration (use customize), PTU deployments (use customize)." 
+license: MIT +metadata: + author: Microsoft + version: "1.0.1" --- # Deploy Model to Optimal Region diff --git a/.github/plugins/azure-skills/skills/microsoft-foundry/models/deploy-model/preset/references/preset-workflow.md b/.github/plugins/azure-skills/skills/microsoft-foundry/models/deploy-model/preset/references/preset-workflow.md index 30be27a9..2598904e 100644 --- a/.github/plugins/azure-skills/skills/microsoft-foundry/models/deploy-model/preset/references/preset-workflow.md +++ b/.github/plugins/azure-skills/skills/microsoft-foundry/models/deploy-model/preset/references/preset-workflow.md @@ -136,6 +136,26 @@ az cognitiveservices account list-models \ MODEL_VERSION="" ``` +**Detect model format:** + +```bash +# Get model format from model catalog (e.g., OpenAI, Anthropic, Meta-Llama, Mistral, Cohere) +MODEL_FORMAT=$(az cognitiveservices account list-models \ + --name "$ACCOUNT_NAME" \ + --resource-group "$RESOURCE_GROUP" \ + --query "[?name=='$MODEL_NAME'].format" -o tsv | head -1) + +# Default to OpenAI if not found +MODEL_FORMAT=${MODEL_FORMAT:-"OpenAI"} + +echo "Model format: $MODEL_FORMAT" +``` + +> 💡 **Model format determines the deployment path:** +> - `OpenAI` — Standard CLI deployment, TPM-based capacity, RAI policies apply +> - `Anthropic` — REST API deployment with `modelProviderData`, capacity=1, no RAI +> - All other formats (`Meta-Llama`, `Mistral`, `Cohere`, etc.) 
— Standard CLI deployment, capacity=1 (MaaS), no RAI + --- ## Phase 4: Check Current Region Capacity @@ -145,7 +165,7 @@ Before checking other regions, see if the current project's region has capacity: ```bash # Query capacity for current region CAPACITY_JSON=$(az rest --method GET \ - --url "https://management.azure.com/subscriptions/$SUBSCRIPTION_ID/providers/Microsoft.CognitiveServices/locations/$PROJECT_REGION/modelCapacities?api-version=2024-10-01&modelFormat=OpenAI&modelName=$MODEL_NAME&modelVersion=$MODEL_VERSION") + --url "https://management.azure.com/subscriptions/$SUBSCRIPTION_ID/providers/Microsoft.CognitiveServices/locations/$PROJECT_REGION/modelCapacities?api-version=2024-10-01&modelFormat=$MODEL_FORMAT&modelName=$MODEL_NAME&modelVersion=$MODEL_VERSION") # Extract available capacity for GlobalStandard SKU CURRENT_CAPACITY=$(echo "$CAPACITY_JSON" | jq -r '.value[] | select(.properties.skuName=="GlobalStandard") | .properties.availableCapacity') @@ -174,7 +194,7 @@ Only execute this phase if current region has no capacity. ```bash # Get capacity for all regions in subscription ALL_REGIONS_JSON=$(az rest --method GET \ - --url "https://management.azure.com/subscriptions/$SUBSCRIPTION_ID/providers/Microsoft.CognitiveServices/modelCapacities?api-version=2024-10-01&modelFormat=OpenAI&modelName=$MODEL_NAME&modelVersion=$MODEL_VERSION") + --url "https://management.azure.com/subscriptions/$SUBSCRIPTION_ID/providers/Microsoft.CognitiveServices/modelCapacities?api-version=2024-10-01&modelFormat=$MODEL_FORMAT&modelName=$MODEL_NAME&modelVersion=$MODEL_VERSION") # Save to file for processing echo "$ALL_REGIONS_JSON" > /tmp/capacity_check.json @@ -376,27 +396,33 @@ Write-Host "Generated deployment name: $DEPLOYMENT_NAME" **Calculate deployment capacity:** -Follow UX capacity calculation logic: use 50% of available capacity (minimum 50 TPM): +Follow UX capacity calculation logic. For OpenAI models, use 50% of available capacity (minimum 50 TPM). 
For all other models (MaaS), capacity is always 1: ```bash -SELECTED_CAPACITY=$(echo "$ALL_REGIONS_JSON" | jq -r ".value[] | select(.location==\"$SELECTED_REGION\" and .properties.skuName==\"GlobalStandard\") | .properties.availableCapacity") - -# Apply UX capacity calculation: 50% of available (minimum 50) -if [ "$SELECTED_CAPACITY" -gt 50 ]; then - DEPLOY_CAPACITY=$((SELECTED_CAPACITY / 2)) - if [ "$DEPLOY_CAPACITY" -lt 50 ]; then - DEPLOY_CAPACITY=50 +if [ "$MODEL_FORMAT" = "OpenAI" ]; then + # OpenAI models: TPM-based capacity (50% of available, minimum 50) + SELECTED_CAPACITY=$(echo "$ALL_REGIONS_JSON" | jq -r ".value[] | select(.location==\"$SELECTED_REGION\" and .properties.skuName==\"GlobalStandard\") | .properties.availableCapacity") + + if [ "$SELECTED_CAPACITY" -gt 50 ]; then + DEPLOY_CAPACITY=$((SELECTED_CAPACITY / 2)) + if [ "$DEPLOY_CAPACITY" -lt 50 ]; then + DEPLOY_CAPACITY=50 + fi + else + DEPLOY_CAPACITY=$SELECTED_CAPACITY fi + + echo "Deploying with capacity: $DEPLOY_CAPACITY TPM (50% of available: $SELECTED_CAPACITY TPM)" else - DEPLOY_CAPACITY=$SELECTED_CAPACITY + # Non-OpenAI models (MaaS): capacity is always 1 + DEPLOY_CAPACITY=1 + echo "MaaS model — deploying with capacity: 1 (pay-per-token billing)" fi - -echo "Deploying with capacity: $DEPLOY_CAPACITY TPM (50% of available: $SELECTED_CAPACITY TPM)" ``` -**Create deployment using Azure CLI:** +### If MODEL_FORMAT is NOT "Anthropic" — Standard CLI Deployment -> 💡 **Note:** The Azure CLI now supports GlobalStandard SKU deployments directly. Use the native `az cognitiveservices account deployment create` command. +> 💡 **Note:** The Azure CLI supports all non-Anthropic model formats directly. 
*Bash version:* ```bash @@ -408,7 +434,7 @@ az cognitiveservices account deployment create \ --deployment-name "$DEPLOYMENT_NAME" \ --model-name "$MODEL_NAME" \ --model-version "$MODEL_VERSION" \ - --model-format "OpenAI" \ + --model-format "$MODEL_FORMAT" \ --sku-name "GlobalStandard" \ --sku-capacity "$DEPLOY_CAPACITY" ``` @@ -423,11 +449,126 @@ az cognitiveservices account deployment create ` --deployment-name $DEPLOYMENT_NAME ` --model-name $MODEL_NAME ` --model-version $MODEL_VERSION ` - --model-format "OpenAI" ` + --model-format $MODEL_FORMAT ` --sku-name "GlobalStandard" ` --sku-capacity $DEPLOY_CAPACITY ``` +> 💡 **Note:** For non-OpenAI MaaS models (Meta-Llama, Mistral, Cohere, etc.), `$DEPLOY_CAPACITY` is `1` (set in capacity calculation above). + +### If MODEL_FORMAT is "Anthropic" — REST API Deployment with modelProviderData + +The Azure CLI does not support `--model-provider-data`. You must use the ARM REST API directly. + +**Step 1: Prompt user to select industry** + +Present the following list and ask the user to choose one: + +``` + 1. None (API value: none) + 2. Biotechnology (API value: biotechnology) + 3. Consulting (API value: consulting) + 4. Education (API value: education) + 5. Finance (API value: finance) + 6. Food & Beverage (API value: food_and_beverage) + 7. Government (API value: government) + 8. Healthcare (API value: healthcare) + 9. Insurance (API value: insurance) +10. Law (API value: law) +11. Manufacturing (API value: manufacturing) +12. Media (API value: media) +13. Nonprofit (API value: nonprofit) +14. Technology (API value: technology) +15. Telecommunications (API value: telecommunications) +16. Sport & Recreation (API value: sport_and_recreation) +17. Real Estate (API value: real_estate) +18. Retail (API value: retail) +19. Other (API value: other) +``` + +> ⚠️ **Do NOT pick a default industry or hardcode a value. Always ask the user.** This is required by Anthropic's terms of service. 
The industry list is static — there is no REST API that provides it. + +Store selection as `SELECTED_INDUSTRY` (use the API value, e.g., `technology`). + +**Step 2: Fetch tenant info (country code and organization name)** + +```bash +TENANT_INFO=$(az rest --method GET \ + --url "https://management.azure.com/tenants?api-version=2024-11-01" \ + --query "value[0].{countryCode:countryCode, displayName:displayName}" -o json) + +COUNTRY_CODE=$(echo "$TENANT_INFO" | jq -r '.countryCode') +ORG_NAME=$(echo "$TENANT_INFO" | jq -r '.displayName') +``` + +*PowerShell version:* +```powershell +$tenantInfo = az rest --method GET ` + --url "https://management.azure.com/tenants?api-version=2024-11-01" ` + --query "value[0].{countryCode:countryCode, displayName:displayName}" -o json | ConvertFrom-Json + +$countryCode = $tenantInfo.countryCode +$orgName = $tenantInfo.displayName +``` + +**Step 3: Deploy via ARM REST API** + +*Bash version:* +```bash +echo "Creating Anthropic model deployment via REST API..." + +az rest --method PUT \ + --url "https://management.azure.com/subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP/providers/Microsoft.CognitiveServices/accounts/$ACCOUNT_NAME/deployments/$DEPLOYMENT_NAME?api-version=2024-10-01" \ + --body "{ + \"sku\": { + \"name\": \"GlobalStandard\", + \"capacity\": 1 + }, + \"properties\": { + \"model\": { + \"format\": \"Anthropic\", + \"name\": \"$MODEL_NAME\", + \"version\": \"$MODEL_VERSION\" + }, + \"modelProviderData\": { + \"industry\": \"$SELECTED_INDUSTRY\", + \"countryCode\": \"$COUNTRY_CODE\", + \"organizationName\": \"$ORG_NAME\" + } + } + }" +``` + +*PowerShell version:* +```powershell +Write-Host "Creating Anthropic model deployment via REST API..." 
+ +$body = @{ + sku = @{ + name = "GlobalStandard" + capacity = 1 + } + properties = @{ + model = @{ + format = "Anthropic" + name = $MODEL_NAME + version = $MODEL_VERSION + } + modelProviderData = @{ + industry = $SELECTED_INDUSTRY + countryCode = $countryCode + organizationName = $orgName + } + } +} | ConvertTo-Json -Depth 5 + +az rest --method PUT ` + --url "https://management.azure.com/subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP/providers/Microsoft.CognitiveServices/accounts/$ACCOUNT_NAME/deployments/${DEPLOYMENT_NAME}?api-version=2024-10-01" ` + --body $body +``` + +> 💡 **Note:** Anthropic models use `capacity: 1` (MaaS billing model), not TPM-based capacity. + **Monitor deployment progress:** ```bash echo "Monitoring deployment status..." diff --git a/.github/plugins/azure-skills/skills/microsoft-foundry/models/deploy-model/preset/references/workflow.md b/.github/plugins/azure-skills/skills/microsoft-foundry/models/deploy-model/preset/references/workflow.md index b63a9bbf..109b2fc6 100644 --- a/.github/plugins/azure-skills/skills/microsoft-foundry/models/deploy-model/preset/references/workflow.md +++ b/.github/plugins/azure-skills/skills/microsoft-foundry/models/deploy-model/preset/references/workflow.md @@ -2,6 +2,8 @@ Condensed implementation reference for preset (optimal region) model deployment. See [SKILL.md](../SKILL.md) for overview. 
+**Table of Contents:** [Phase 1: Verify Authentication](#phase-1-verify-authentication) · [Phase 2: Get Current Project](#phase-2-get-current-project) · [Phase 3: Get Model Name](#phase-3-get-model-name) · [Phase 4: Check Current Region Capacity](#phase-4-check-current-region-capacity) · [Phase 5: Query Multi-Region Capacity](#phase-5-query-multi-region-capacity) · [Phase 6: Select Region and Project](#phase-6-select-region-and-project) · [Phase 7: Deploy Model](#phase-7-deploy-model) + --- ## Phase 1: Verify Authentication diff --git a/.github/plugins/azure-skills/skills/microsoft-foundry/project/connections.md b/.github/plugins/azure-skills/skills/microsoft-foundry/project/connections.md new file mode 100644 index 00000000..d4f78be6 --- /dev/null +++ b/.github/plugins/azure-skills/skills/microsoft-foundry/project/connections.md @@ -0,0 +1,58 @@ +# Foundry Project Connections + +Connections authenticate and link external resources to a Foundry project. Many agent tools (Azure AI Search, Bing Grounding, MCP) require a project connection before use. + +## Managing Connections via MCP + +Use the Foundry MCP server for all connection operations. The MCP tools handle authentication, validation, and project scoping automatically. + +| Operation | MCP Tool | Description | +|-----------|----------|-------------| +| List all connections | `foundry_connections_list` | Lists all connections in the current project | +| Get connection details | `foundry_connections_get` | Retrieves a specific connection by name, including its ID | +| Create a connection | `foundry_connections_create` | Creates a new connection to an external resource | +| Delete a connection | `foundry_connections_delete` | Removes a connection from the project | + +> 💡 **Tip:** The `connection_id` returned by `foundry_connections_get` is the value you pass as `project_connection_id` when configuring agent tools. + +## Create Connection via Portal + +1. 
Open [Microsoft Foundry portal](https://ai.azure.com) +2. Navigate to **Operate** → **Admin** → select your project +3. Select **Add connection** → choose service type +4. Browse for resource, select auth method, click **Add connection** + +## Connection ID Format + +For REST and TypeScript samples, the full connection ID format is: + +``` +/subscriptions/{subId}/resourceGroups/{rg}/providers/Microsoft.CognitiveServices/accounts/{account}/projects/{project}/connections/{connectionName} +``` + +Python and C# SDKs resolve this automatically from the connection name. + +## Common Connection Types + +| Type | Resource | Used By | +|------|----------|---------| +| `azure_ai_search` | Azure AI Search | AI Search tool | +| `bing` | Grounding with Bing Search | Bing grounding tool | +| `bing_custom_search` | Grounding with Bing Custom Search | Bing Custom Search tool | +| `api_key` | Any API-key resource | MCP servers, custom tools | +| `azure_openai` | Azure OpenAI | Model access | + +## RBAC for Connection Management + +| Role | Scope | Permission | +|------|-------|------------| +| **Azure AI Project Manager** | Project | Create/manage project connections | +| **Contributor** or **Owner** | Subscription/RG | Create Bing/Search resources, get keys | + +## Troubleshooting + +| Error | Cause | Fix | +|-------|-------|-----| +| `Connection not found` | Name mismatch or wrong project | Use `foundry_connections_list` to find correct name | +| `Unauthorized` creating connection | Missing Azure AI Project Manager role | Assign role on the Foundry project | +| `Invalid connection ID format` | Using name instead of full resource ID | Use `foundry_connections_get` to resolve the full ID | diff --git a/.github/plugins/azure-skills/skills/microsoft-foundry/project/create/create-foundry-project.md b/.github/plugins/azure-skills/skills/microsoft-foundry/project/create/create-foundry-project.md index 81324d9f..dcf61d26 100644 --- 
a/.github/plugins/azure-skills/skills/microsoft-foundry/project/create/create-foundry-project.md +++ b/.github/plugins/azure-skills/skills/microsoft-foundry/project/create/create-foundry-project.md @@ -9,7 +9,9 @@ allowed-tools: Read, Write, Bash, AskUserQuestion # Create Azure AI Foundry Project -Create a new Azure AI Foundry project using azd. Provisions: Foundry account, project, Container Registry, Application Insights, managed identity, and RBAC permissions. +Create a new Azure AI Foundry project using azd. Provisions: Foundry account, project, Application Insights, managed identity, and RBAC permissions. Optionally enables hosted agents (capability host + Container Registry). + +**Table of Contents:** [Prerequisites](#prerequisites) · [Workflow](#workflow) · [Best Practices](#best-practices) · [Troubleshooting](#troubleshooting) · [Related Skills](#related-skills) · [Resources](#resources) ## Prerequisites @@ -53,6 +55,7 @@ Use AskUserQuestion for: 1. **Project name** — used as azd environment name and resource group (`rg-<project-name>`). Must contain only alphanumeric characters and hyphens. Examples: `my-ai-project`, `dev-agents` 2. **Azure location** (optional) — defaults to North Central US (required for hosted agents preview) +3. **Enable hosted agents?** (yes/no) — provisions a capability host and Container Registry for deploying hosted agents. Defaults to no. ### Step 3: Create Directory and Initialize @@ -72,13 +75,21 @@ If user specified a non-default location: azd config set defaults.location <location> ``` +If user chose to enable hosted agents: + +```bash +azd env set ENABLE_HOSTED_AGENTS true +``` + +This provisions a capability host (`capabilityHosts/agents`) on the Foundry account and auto-adds an Azure Container Registry for hosted agent deployments. ### Step 4: Provision Infrastructure ```bash azd provision --no-prompt ``` -Takes 5–10 minutes.
Creates resource group, Foundry account/project, Container Registry, Application Insights, managed identity, and RBAC roles. +Takes 5–10 minutes. Creates resource group, Foundry account/project, Application Insights, managed identity, and RBAC roles. If hosted agents enabled, also creates Container Registry and capability host. ### Step 5: Retrieve Project Details diff --git a/.github/plugins/azure-skills/skills/microsoft-foundry/quota/quota.md b/.github/plugins/azure-skills/skills/microsoft-foundry/quota/quota.md index 4ff2986c..57a8580f 100644 --- a/.github/plugins/azure-skills/skills/microsoft-foundry/quota/quota.md +++ b/.github/plugins/azure-skills/skills/microsoft-foundry/quota/quota.md @@ -2,7 +2,9 @@ Quota and capacity management for Microsoft Foundry. Quotas are **subscription + region** level. -> **Agent Rule:** Query REGIONAL quota summary, NOT individual resources. Don't run `az cognitiveservices account list` for quota queries. +> ⚠️ **Important:** This is the **authoritative skill** for all Foundry quota operations. When a user asks about quota, capacity, TPM, PTU, quota errors, or deployment limits, **always invoke this skill** rather than using MCP tools (azure-quota, azure-documentation, azure-foundry) directly. This skill provides structured workflows and error handling that direct tool calls lack. + +> **Important:** All quota operations are **control plane (management)** operations. Use **Azure CLI commands** as the primary method. MCP tools are optional convenience wrappers around the same control plane APIs. ## Quota Types @@ -15,12 +17,35 @@ Quota and capacity management for Microsoft Foundry. Quotas are **subscription + **When to use PTU:** Consistent high-volume production workloads where monthly commitment is cost-effective. 
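Whether the monthly commitment is cost-effective comes down to comparing the two billing models. A minimal sketch of that comparison — every rate below is a placeholder figure, not official pricing; use the Azure Pricing Calculator and the portal PTU calculator for real numbers:

```bash
#!/usr/bin/env bash
# Rough TPM-vs-PTU monthly cost comparison (placeholder rates, illustrative only).

daily_tokens=1000000000   # hypothetical workload: 1B tokens/day
price_per_1k=0.005        # $/1K tokens — placeholder blended rate
ptus=100                  # PTU count — assumed output of the portal PTU calculator
ptu_hour_rate=5           # $/PTU/hr — placeholder rate

# Pay-per-token: daily tokens × 30 days × rate per 1K tokens
tpm_cost=$(awk -v t="$daily_tokens" -v p="$price_per_1k" 'BEGIN { printf "%d", t / 1000 * p * 30 }')
# Provisioned: PTUs × 730 hours/month × hourly rate
ptu_cost=$(awk -v n="$ptus" -v r="$ptu_hour_rate" 'BEGIN { printf "%d", n * 730 * r }')

echo "Monthly TPM cost: \$${tpm_cost}"
echo "Monthly PTU cost: \$${ptu_cost}"

# Prefer PTU only when it costs under ~70% of pay-per-token (buffer for commitment risk)
if awk -v a="$ptu_cost" -v b="$tpm_cost" 'BEGIN { exit !(a < b * 0.7) }'; then
  echo "Recommendation: PTU"
else
  echo "Recommendation: Standard (TPM)"
fi
```

With these placeholder numbers the pay-per-token route wins by a wide margin; the break-even shifts toward PTU as traffic becomes high and steady.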
+--- + +Use this sub-skill when the user needs to: + +- **View quota usage** — check current TPM/PTU allocation and available capacity +- **Check quota limits** — show quota limits for a subscription, region, or model +- **Find optimal regions** — compare quota availability across regions for deployment +- **Plan deployments** — verify sufficient quota before deploying models +- **Request quota increases** — navigate quota increase process through Azure Portal +- **Troubleshoot deployment failures** — diagnose QuotaExceeded, InsufficientQuota, DeploymentLimitReached, 429 rate limit errors +- **Optimize allocation** — monitor and consolidate quota across deployments +- **Monitor quota across deployments** — track capacity by model and region +- **Explain quota concepts** — explain TPM, PTU, capacity units, regional quotas +- **Free up quota** — identify and delete unused deployments + +**Key Points:** +1. Isolated by region (East US ≠ West US) +2. Regional capacity varies by model +3. Multi-region enables failover and load distribution +4. Quota requests specify target region + +See [detailed guide](./references/workflows.md#regional-quota). + +--- + ## Core Workflows ### 1. Check Regional Quota -**Command Pattern:** "Show my Microsoft Foundry quota usage" - ```bash subId=$(az account show --query id -o tsv) az rest --method get \ @@ -28,17 +53,18 @@ az rest --method get \ --query "value[?contains(name.value,'OpenAI')].{Model:name.value, Used:currentValue, Limit:limit}" -o table ``` -Change region as needed: `eastus`, `eastus2`, `westus`, `westus2`, `swedencentral`, `uksouth`. +**Output interpretation:** +- **Used**: Current TPM consumed (10000 = 10K TPM) +- **Limit**: Maximum TPM quota (15000 = 15K TPM) +- **Available**: Limit - Used (5K TPM available) -See [Detailed Workflow Steps](./references/workflows.md) for complete instructions including multi-region checks and resource-specific queries. 
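The Used/Limit interpretation reduces to simple arithmetic — a sketch of the headroom check, using the same example figures (10K TPM used, 15K TPM limit):

```bash
#!/usr/bin/env bash
# Headroom check from the usages API fields:
# currentValue = TPM consumed, limit = TPM quota.
used=10000
limit=15000
available=$((limit - used))

echo "Available: ${available} TPM"

requested=10000   # hypothetical capacity for a new deployment
if [ "$available" -ge "$requested" ]; then
  echo "OK to deploy ${requested} TPM"
else
  echo "Insufficient quota — free capacity or try another region"
fi
```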
+Change region: `eastus`, `eastus2`, `westus`, `westus2`, `swedencentral`, `uksouth`. --- ### 2. Find Best Region for Deployment -**Command Pattern:** "Which region has available quota for GPT-4o?" - -Check specific regions one at a time: +Check specific regions for available quota: ```bash subId=$(az account show --query id -o tsv) @@ -48,60 +74,113 @@ az rest --method get \ --query "value[?name.value=='OpenAI.Standard.gpt-4o'].{Model:name.value, Used:currentValue, Limit:limit, Available:(limit-currentValue)}" -o table ``` -See [Detailed Workflow Steps](./references/workflows.md) for multi-region comparison. +See [workflows reference](./references/workflows.md#multi-region-check) for multi-region comparison. --- -### 3. Deploy with PTU - -**Command Pattern:** "Deploy GPT-4o with PTU" +### 3. Check Quota Before Deployment -Use Foundry Portal capacity calculator first, then deploy: +Verify available quota for your target model: ```bash -az cognitiveservices account deployment create --name --resource-group \ - --deployment-name gpt-4o-ptu --model-name gpt-4o --model-version "2024-05-13" \ - --model-format OpenAI --sku-name ProvisionedManaged --sku-capacity 100 +subId=$(az account show --query id -o tsv) +region="eastus" +model="OpenAI.Standard.gpt-4o" + +az rest --method get \ + --url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/$region/usages?api-version=2023-05-01" \ + --query "value[?name.value=='$model'].{Model:name.value, Used:currentValue, Limit:limit, Available:(limit-currentValue)}" -o table ``` -See [PTU Guide](./references/ptu-guide.md) for capacity planning and when to use PTU. +- **Available > 0**: Yes, you have quota +- **Available = 0**: Delete unused deployments or try different region --- -### 4. Delete Deployment (Free Quota) +### 4. 
Monitor Quota by Model + +Show quota allocation grouped by model: + +```bash +subId=$(az account show --query id -o tsv) +region="eastus" +az rest --method get \ + --url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/$region/usages?api-version=2023-05-01" \ + --query "value[?contains(name.value,'OpenAI')].{Model:name.value, Used:currentValue, Limit:limit, Available:(limit-currentValue)}" -o table +``` -**Command Pattern:** "Delete unused deployment to free quota" +Shows aggregate usage across ALL deployments by model type. + +**Optional:** List individual deployments: +```bash +az cognitiveservices account list --query "[?kind=='AIServices'].{Name:name,RG:resourceGroup}" -o table + +az cognitiveservices account deployment list --name <account-name> --resource-group <resource-group> \ + --query "[].{Name:name,Model:properties.model.name,Capacity:sku.capacity}" -o table +``` + +--- + +### 5. Delete Deployment (Free Quota) ```bash az cognitiveservices account deployment delete --name <account-name> --resource-group <resource-group> \ --deployment-name <deployment-name> ``` +Quota freed **immediately**. Re-run Workflow #1 to verify. + --- -## Troubleshooting +### 6. Request Quota Increase -| Error | Quick Fix | -|-------|-----------| -| `QuotaExceeded` | Delete unused deployments or request increase | -| `InsufficientQuota` | Reduce capacity or try different region | -| `DeploymentLimitReached` | Delete unused deployments | -| `429 Rate Limit` | Increase TPM or migrate to PTU | +**Azure Portal Process:** +1. Navigate to [Azure Portal - All Resources](https://portal.azure.com/#view/HubsExtension/BrowseAll) → Filter "AI Services" → Click resource +2. Select **Quotas** in left navigation +3. Click **Request quota increase** +4. Fill form: Model, Current Limit, Requested Limit, Region, **Business Justification** +5.
Wait for approval: **3-5 business days typically, up to 10 business days** ([source](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/quota)) -See [Troubleshooting Guide](./references/troubleshooting.md) for detailed error resolution steps. +**Justification template:** +``` +Production [workload type] using [model] in [region]. +Expected traffic: [X requests/day] with [Y tokens/request]. +Requires [Z TPM] capacity. Current [N TPM] insufficient. +Request increase to [M TPM]. Deployment target: [date]. +``` + +See [detailed quota request guide](./references/workflows.md#request-quota-increase) for complete steps. --- -## Request Quota Increase +## Quick Troubleshooting -Azure Portal → Foundry resource → **Quotas** → **Request quota increase**. Include business justification. Processing: 1-2 days. +| Error | Quick Fix | Detailed Guide | +|-------|-----------|----------------| +| `QuotaExceeded` | Delete unused deployments or request increase | [Error Resolution](./references/error-resolution.md#quotaexceeded) | +| `InsufficientQuota` | Reduce capacity or try different region | [Error Resolution](./references/error-resolution.md#insufficientquota) | +| `DeploymentLimitReached` | Delete unused deployments (10-20 slot limit) | [Error Resolution](./references/error-resolution.md#deploymentlimitreached) | +| `429 Rate Limit` | Increase TPM or migrate to PTU | [Error Resolution](./references/error-resolution.md#429-errors) | --- ## References -- [Detailed Workflows](./references/workflows.md) - Complete workflow steps and multi-region checks +**Detailed Guides:** +- [Error Resolution Workflows](./references/error-resolution.md) - Detailed workflows for quota exhausted, 429 errors, insufficient quota, deployment limits +- [Troubleshooting Guide](./references/troubleshooting.md) - Quick error fixes and diagnostic commands +- [Quota Optimization Strategies](./references/optimization.md) - 5 strategies for freeing quota and reducing costs +- [Capacity Planning 
Guide](./references/capacity-planning.md) - TPM vs PTU comparison, model selection, workload calculations +- [Workflows Reference](./references/workflows.md) - Complete workflow steps and multi-region checks - [PTU Guide](./references/ptu-guide.md) - Provisioned throughput capacity planning -- [Troubleshooting](./references/troubleshooting.md) - Error resolution and diagnostics -- [Quota Management](https://learn.microsoft.com/azure/ai-services/openai/how-to/quota) -- [Rate Limits](https://learn.microsoft.com/azure/ai-services/openai/quotas-limits) + +**Official Microsoft Documentation:** +- [Azure OpenAI Service Pricing](https://azure.microsoft.com/en-us/pricing/details/cognitive-services/openai-service/) - Official pay-per-token rates +- [PTU Costs and Billing](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/provisioned-throughput-onboarding) - PTU hourly rates +- [Azure OpenAI Models](https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models) - Model capabilities and regions +- [Quota Management Guide](https://learn.microsoft.com/azure/ai-services/openai/how-to/quota) - Official quota procedures +- [Quotas and Limits](https://learn.microsoft.com/azure/ai-services/openai/quotas-limits) - Rate limits and quota details + +**Calculators:** +- [Azure Pricing Calculator](https://azure.microsoft.com/pricing/calculator/) - Official pricing estimator +- Azure AI Foundry PTU calculator (Microsoft Foundry → Operate → Quota → Provisioned Throughput Unit tab) - PTU capacity sizing diff --git a/.github/plugins/azure-skills/skills/microsoft-foundry/quota/references/capacity-planning.md b/.github/plugins/azure-skills/skills/microsoft-foundry/quota/references/capacity-planning.md new file mode 100644 index 00000000..029702a3 --- /dev/null +++ b/.github/plugins/azure-skills/skills/microsoft-foundry/quota/references/capacity-planning.md @@ -0,0 +1,126 @@ +# Capacity Planning Guide + +Comprehensive guide for planning Azure AI Foundry capacity, 
including cost analysis, model selection, and workload calculations. + +**Table of Contents:** [Cost Comparison: TPM vs PTU](#cost-comparison-tpm-vs-ptu) · [Production Workload Examples](#production-workload-examples) · [Model Selection and Deployment Type Guidance](#model-selection-and-deployment-type-guidance) + +## Cost Comparison: TPM vs PTU + +> **Official Pricing Sources:** +> - [Azure OpenAI Service Pricing](https://azure.microsoft.com/en-us/pricing/details/cognitive-services/openai-service/) - Official pay-per-token rates +> - [PTU Costs and Billing Guide](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/provisioned-throughput-onboarding) - PTU hourly rates and capacity planning + +**TPM (Standard) Pricing:** +- Pay-per-token for input/output +- No upfront commitment +- **Rates**: See [Azure OpenAI Pricing](https://azure.microsoft.com/en-us/pricing/details/cognitive-services/openai-service/) + - GPT-4o: ~$0.0025-$0.01/1K tokens + - GPT-4 Turbo: ~$0.01-$0.03/1K + - GPT-3.5 Turbo: ~$0.0005-$0.0015/1K +- **Best for**: Variable workloads, unpredictable traffic + +**PTU (Provisioned) Pricing:** +- Hourly billing: `$/PTU/hr × PTUs × 730 hrs/month` +- Monthly commitment with Reservations discounts +- **Rates**: See [PTU Billing Guide](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/provisioned-throughput-onboarding) +- Use PTU calculator to determine requirements (Microsoft Foundry → Operate → Quota → Provisioned Throughput Unit tab) +- **Best for**: High-volume (>1M tokens/day), predictable traffic, guaranteed throughput + +**Cost Decision Framework** (Analytical Guidance): + +``` +Step 1: Calculate monthly TPM cost + Monthly TPM cost = (Daily tokens × 30 days × $price per 1K tokens) / 1000 + +Step 2: Calculate monthly PTU cost + Monthly PTU cost = Required PTUs × 730 hours/month × $PTU-hour rate + (Get Required PTUs from Azure AI Foundry portal: Microsoft Foundry → Operate → Quota → Provisioned Throughput Unit tab) + +Step 3: 
Compare + Use PTU when: Monthly PTU cost < (Monthly TPM cost × 0.7) + (Use 70% threshold to account for commitment risk) +``` + +**Example Calculation** (Analytical): + +Scenario: 1M requests/day, average 1,000 tokens per request + +- **Daily tokens**: 1,000,000 × 1,000 = 1B tokens/day +- **TPM Cost** (using GPT-4o at $0.005/1K avg): (1B × 30 × $0.005) / 1000 = ~$150,000/month +- **PTU Cost** (estimated 100 PTU at ~$5/PTU-hour): 100 PTU × 730 hours × $5 = ~$365,000/month +- **Decision**: Use TPM (significantly lower cost for this workload) + +> **Important**: Always use the official [Azure Pricing Calculator](https://azure.microsoft.com/pricing/calculator/) and Azure AI Foundry portal PTU calculator (Microsoft Foundry → Operate → Quota → Provisioned Throughput Unit tab) for exact pricing by model, region, and workload. Prices vary by region and are subject to change. + +--- + +## Production Workload Examples + +Real-world production scenarios with capacity calculations for gpt-4, version 0613 (from Azure Foundry Portal calculator): + +| Workload Type | Calls/Min | Prompt Tokens | Response Tokens | Cache Hit % | Total Tokens/Min | PTU Required | TPM Equivalent | +|---------------|-----------|---------------|-----------------|-------------|------------------|--------------|----------------| +| **RAG Chat** | 10 | 3,500 | 300 | 20% | 38,000 | 100 | 38K TPM | +| **Basic Chat** | 10 | 500 | 100 | 20% | 6,000 | 100 | 6K TPM | +| **Summarization** | 10 | 5,000 | 300 | 20% | 53,000 | 100 | 53K TPM | +| **Classification** | 10 | 3,800 | 10 | 20% | 38,100 | 100 | 38K TPM | + +**How to Calculate Your Needs:** + +1. **Determine your peak calls per minute**: Monitor or estimate maximum concurrent requests +2. **Measure token usage**: Average prompt size + response size +3. **Account for cache hits**: Prompt caching can reduce effective token count by 20-50% +4. **Calculate total tokens/min**: Calls/min × ((Prompt tokens × (1 - Cache %)) + Response tokens) — caching discounts prompt tokens only +5.
**Choose deployment type**: + - **TPM (Standard)**: Allocate 1.5-2× your calculated tokens/min for headroom + - **PTU (Provisioned)**: Use Azure AI Foundry portal PTU calculator for exact PTU count (Microsoft Foundry → Operate → Quota → Provisioned Throughput Unit tab) + +**Example Calculation (RAG Chat Production):** +- Peak: 10 calls/min +- Prompt: 3,500 tokens (context + question) +- Response: 300 tokens (answer) +- Cache: 20% hit rate (reduces prompt tokens by 20%) +- **Total TPM needed**: (10 × (3,500 × 0.8 + 300)) = 31,000 TPM +- **With 50% headroom**: 46,500 TPM → Round to **50K TPM deployment** + +**PTU Recommendation:** +For the combined workload (40 calls/min, 135K tokens/min total), use **200 PTU** (from calculator above). + +--- + +## Model Selection and Deployment Type Guidance + +> **Official Documentation:** +> - [Choose the Right AI Model for Your Workload](https://learn.microsoft.com/en-us/azure/architecture/ai-ml/guide/choose-ai-model) - Microsoft Architecture Center +> - [Azure OpenAI Models](https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models) - Model capabilities, regions, and quotas +> - [Understanding Deployment Types](https://learn.microsoft.com/en-us/azure/ai-foundry/foundry-models/concepts/deployment-types) - Standard vs Provisioned guidance + +**Model Characteristics** (from [official Azure OpenAI documentation](https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models)): + +| Model | Key Characteristics | Best For | +|-------|---------------------|----------| +| **GPT-4o** | Matches GPT-4 Turbo performance in English text/coding, superior in non-English and vision tasks. Cheaper and faster than GPT-4 Turbo. 
| Multimodal tasks, cost-effective general purpose, high-volume production workloads | +| **GPT-4 Turbo** | Superior reasoning capabilities, larger context window (128K tokens) | Complex reasoning tasks, long-context analysis | +| **GPT-3.5 Turbo** | Most cost-effective, optimized for chat and completions, fast response time | Simple tasks, customer service, high-volume low-cost scenarios | +| **GPT-4o mini** | Fastest response time, low latency | Latency-sensitive applications requiring immediate responses | +| **text-embedding-3-large** | Purpose-built for vector embeddings | RAG applications, semantic search, document similarity | + +**Deployment Type Selection** (from [official deployment types guide](https://learn.microsoft.com/en-us/azure/ai-foundry/foundry-models/concepts/deployment-types)): + +| Traffic Pattern | Recommended Deployment Type | Reason | +|-----------------|---------------------------|---------| +| **Variable, bursty traffic** | Standard or Global Standard (pay-per-token) | No commitment, pay only for usage | +| **Consistent high volume** | Provisioned types (PTU) | Reserved capacity, predictable costs | +| **Large batch jobs (non-time-sensitive)** | Global Batch or DataZone Batch | 50% cost savings vs Standard | +| **Low latency variance required** | Provisioned types | Guaranteed throughput, no rate limits | +| **No regional restrictions** | Global Standard or Global Provisioned | Access to best available capacity | + +**Capacity Planning Approach** (from [PTU onboarding guide](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/provisioned-throughput-onboarding)): + +1. **Understand your TPM requirements**: Calculate expected tokens per minute based on workload +2. **Use the built-in capacity planner**: Available in Azure AI Foundry portal (Microsoft Foundry → Operate → Quota → Provisioned Throughput Unit tab) +3. **Input your metrics**: Enter input TPM and output TPM based on your workload characteristics +4. 
**Get PTU recommendation**: The calculator provides PTU allocation recommendation +5. **Compare costs**: Evaluate Standard (TPM) vs Provisioned (PTU) using the official pricing calculator + +> **Note**: Microsoft does not publish specific "X requests/day = Y TPM" recommendations as capacity requirements vary significantly based on prompt size, response length, cache hit rates, and model choice. Use the built-in capacity planner with your actual workload characteristics. diff --git a/.github/plugins/azure-skills/skills/microsoft-foundry/quota/references/error-resolution.md b/.github/plugins/azure-skills/skills/microsoft-foundry/quota/references/error-resolution.md new file mode 100644 index 00000000..3ecdef85 --- /dev/null +++ b/.github/plugins/azure-skills/skills/microsoft-foundry/quota/references/error-resolution.md @@ -0,0 +1,145 @@ +# Error Resolution Workflows + +**Table of Contents:** [Workflow 7: Quota Exhausted Recovery](#workflow-7-quota-exhausted-recovery) · [Workflow 8: Resolve 429 Rate Limit Errors](#workflow-8-resolve-429-rate-limit-errors) · [Workflow 9: Resolve DeploymentLimitReached](#workflow-9-resolve-deploymentlimitreached) · [Workflow 10: Resolve InsufficientQuota](#workflow-10-resolve-insufficientquota) · [Workflow 11: Resolve QuotaExceeded](#workflow-11-resolve-quotaexceeded) + +## Workflow 7: Quota Exhausted Recovery + +**A. Deploy to Different Region** +```bash +subId=$(az account show --query id -o tsv) +for region in eastus westus eastus2 westus2 swedencentral uksouth; do + az rest --method get --url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/$region/usages?api-version=2023-05-01" \ + --query "value[?name.value=='OpenAI.Standard.gpt-4o'].{Model:name.value, Used:currentValue, Limit:limit, Available:(limit-currentValue)}" -o table & +done; wait +``` + +**B. 
Delete Unused Deployments**
+```bash
+az cognitiveservices account deployment delete --name <resource-name> --resource-group <resource-group> --deployment-name <deployment-name>
+```
+
+**C. Request Quota Increase (3-5 days)**
+
+**D. Migrate to PTU** - See capacity-planning.md
+
+---
+
+## Workflow 8: Resolve 429 Rate Limit Errors
+
+**Identify Deployment:**
+```bash
+az cognitiveservices account deployment list --name <resource-name> --resource-group <resource-group> \
+  --query "[].{Name:name,Model:properties.model.name,TPM:sku.capacity*1000}" -o table
+```
+
+**Solutions:**
+
+**A. Increase Capacity**
+```bash
+az cognitiveservices account deployment update --name <resource-name> --resource-group <resource-group> --deployment-name <deployment-name> --sku-capacity 100
+```
+
+**B. Add Retry Logic** - Exponential backoff in code
+
+**C. Load Balance**
+```bash
+az cognitiveservices account deployment create --name <resource-name> --resource-group <resource-group> --deployment-name gpt-4o-2 \
+  --model-name gpt-4o --model-version "2024-05-13" --model-format OpenAI --sku-name Standard --sku-capacity 100
+```
+
+**D. Migrate to PTU** - No rate limits
+
+---
+
+## Workflow 9: Resolve DeploymentLimitReached
+
+**Root Cause:** 10-20 slots per resource.
+
+**Check Count:**
+```bash
+deployment_count=$(az cognitiveservices account deployment list --name <resource-name> --resource-group <resource-group> --query "length(@)")
+echo "Deployments: $deployment_count / ~20 slots"
+```
+
+**Find Test Deployments:**
+```bash
+az cognitiveservices account deployment list --name <resource-name> --resource-group <resource-group> \
+  --query "[?contains(name,'test') || contains(name,'demo')].{Name:name}" -o table
+```
+
+**Delete:**
+```bash
+az cognitiveservices account deployment delete --name <resource-name> --resource-group <resource-group> --deployment-name <deployment-name>
+```
+
+**Or Create New Resource (fresh 10-20 slots):**
+```bash
+az cognitiveservices account create --name "my-foundry-2" --resource-group <resource-group> --location eastus --kind AIServices --sku S0 --yes
+```
+
+---
+
+## Workflow 10: Resolve InsufficientQuota
+
+**Root Cause:** Requested capacity exceeds available quota. 
+
+**Check Quota:**
+```bash
+subId=$(az account show --query id -o tsv)
+az rest --method get --url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/eastus/usages?api-version=2023-05-01" \
+  --query "value[?contains(name.value,'OpenAI')].{Model:name.value, Used:currentValue, Limit:limit, Available:(limit-currentValue)}" -o table
+```
+
+**Solutions:**
+
+**A. Reduce Capacity**
+```bash
+az cognitiveservices account deployment create --name <resource-name> --resource-group <resource-group> --deployment-name gpt-4o \
+  --model-name gpt-4o --model-version "2024-05-13" --model-format OpenAI --sku-name Standard --sku-capacity 20
+```
+
+**B. Delete Unused Deployments**
+```bash
+az cognitiveservices account deployment delete --name <resource-name> --resource-group <resource-group> --deployment-name <deployment-name>
+```
+
+**C. Different Region** - Check quota with multi-region script (Workflow 7)
+
+**D. Request Increase (3-5 days)**
+
+---
+
+## Workflow 11: Resolve QuotaExceeded
+
+**Root Cause:** Deployment exceeds regional quota.
+
+**Check Quota:**
+```bash
+subId=$(az account show --query id -o tsv)
+az rest --method get --url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/eastus/usages?api-version=2023-05-01" \
+  --query "value[?contains(name.value,'OpenAI')]" -o table
+```
+
+**Multi-Region Check:** (Use Workflow 7 script)
+
+**Solutions:**
+
+**A. Delete Unused Deployments**
+```bash
+az cognitiveservices account deployment delete --name <resource-name> --resource-group <resource-group> --deployment-name <deployment-name>
+```
+
+**B. Different Region**
+```bash
+az cognitiveservices account deployment create --name <resource-name> --resource-group <resource-group> --deployment-name gpt-4o \
+  --model-name gpt-4o --model-version "2024-05-13" --model-format OpenAI --sku-name Standard --sku-capacity 50
+```
+
+**C. Request Increase (3-5 days)**
+
+**D. 
Reduce Capacity** + +**Decision:** Available < 10% → Different region; 10-50% → Delete/reduce; > 50% → Delete one deployment + +--- + diff --git a/.github/plugins/azure-skills/skills/microsoft-foundry/quota/references/optimization.md b/.github/plugins/azure-skills/skills/microsoft-foundry/quota/references/optimization.md new file mode 100644 index 00000000..ea4dbd12 --- /dev/null +++ b/.github/plugins/azure-skills/skills/microsoft-foundry/quota/references/optimization.md @@ -0,0 +1,168 @@ +# Quota Optimization Strategies + +Comprehensive strategies for optimizing Azure AI Foundry quota allocation and reducing costs. + +**Table of Contents:** [1. Identify and Delete Unused Deployments](#1-identify-and-delete-unused-deployments) · [2. Right-Size Over-Provisioned Deployments](#2-right-size-over-provisioned-deployments) · [3. Consolidate Multiple Small Deployments](#3-consolidate-multiple-small-deployments) · [4. Cost Optimization Strategies](#4-cost-optimization-strategies) · [5. Regional Quota Rebalancing](#5-regional-quota-rebalancing) + +## 1. 
Identify and Delete Unused Deployments
+
+**Step 1: Discovery with Quota Context**
+
+Get quota limits FIRST to understand how close you are to capacity:
+
+```bash
+# Check current quota usage vs limits (run this FIRST)
+subId=$(az account show --query id -o tsv)
+region="eastus"  # Change to your region
+az rest --method get \
+  --url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/$region/usages?api-version=2023-05-01" \
+  --query "value[?contains(name.value,'OpenAI')].{Model:name.value, Used:currentValue, Limit:limit, Available:(limit-currentValue)}" -o table
+```
+
+**Step 2: Parallel Deployment Enumeration**
+
+List all deployments across resources efficiently:
+
+```bash
+# Get all Foundry resources
+resources=$(az cognitiveservices account list --query "[?kind=='AIServices'].{name:name,rg:resourceGroup}" -o json)
+
+# Parallel deployment enumeration (faster than sequential)
+echo "$resources" | jq -r '.[] | "\(.name) \(.rg)"' | while read name rg; do
+  echo "=== $name ($rg) ==="
+  az cognitiveservices account deployment list --name "$name" --resource-group "$rg" \
+    --query "[].{Deployment:name,Model:properties.model.name,Capacity:sku.capacity,Created:systemData.createdAt}" -o table &
+done
+wait  # Wait for all background jobs to complete
+```
+
+**Step 3: Identify Stale Deployments**
+
+Criteria for deletion candidates:
+
+- **Test/temporary naming**: Contains "test", "demo", "temp", "dev" in deployment name
+- **Old timestamps**: Created >90 days ago with timestamp-based naming (e.g., "gpt4-20231015")
+- **High capacity consumers**: Deployments with >100K TPM capacity that haven't been referenced in recent logs
+- **Duplicate models**: Multiple deployments of same model/version in same region
+
+**Example pattern matching for stale deployments:**
+```bash
+# Find deployments with test/temp naming
+az cognitiveservices account deployment list --name <resource-name> --resource-group <resource-group> \
+  --query "[?contains(name,'test') || 
contains(name,'demo') || contains(name,'temp')].{Name:name,Capacity:sku.capacity}" -o table
+```
+
+**Step 4: Delete and Verify Quota Recovery**
+
+```bash
+# Delete unused deployment (quota freed IMMEDIATELY)
+az cognitiveservices account deployment delete --name <resource-name> --resource-group <resource-group> --deployment-name <deployment-name>
+
+# Verify quota freed (re-run Step 1 quota check)
+# You should see "Used" decrease by the deployment's capacity
+```
+
+**Cost Impact Analysis:**
+
+| Deployment Type | Capacity (TPM) | Quota Freed | Cost Impact (TPM) | Cost Impact (PTU) |
+|-----------------|----------------|-------------|-------------------|-------------------|
+| Test deployment | 10K TPM | 10K TPM | $0 (pay-per-use) | N/A |
+| Unused production | 100K TPM | 100K TPM | $0 (pay-per-use) | N/A |
+| Abandoned PTU deployment | 100 PTU | ~40K TPM equivalent | $0 TPM | **$3,650/month saved** (100 PTU × 730h × $0.05/h) |
+| High-capacity test | 450K TPM | 450K TPM | $0 (pay-per-use) | N/A |
+
+**Key Insight:** For TPM (Standard) deployments, deletion frees quota but has no direct cost impact (you pay per token used). For PTU (Provisioned) deployments, deletion **immediately stops hourly charges** and can save thousands per month.
+
+---
+
+## 2. Right-Size Over-Provisioned Deployments
+
+**Identify over-provisioned deployments:**
+- Check Azure Monitor metrics for actual token usage
+- Compare allocated TPM vs. peak usage
+- Look for deployments with <50% utilization
+
+**Right-sizing example:**
+```bash
+# Update deployment to lower capacity
+az cognitiveservices account deployment update --name <resource-name> --resource-group <resource-group> \
+  --deployment-name <deployment-name> --sku-capacity 30  # Reduce from 50K to 30K TPM
+```
+
+**Cost Optimization:**
+- **TPM (Standard)**: Reduces regional quota consumption (no direct cost savings, pay-per-token)
+- **PTU (Provisioned)**: Direct cost reduction (40% capacity reduction = 40% cost reduction)
+
+---
+
+## 3. 
Consolidate Multiple Small Deployments
+
+**Pattern:** Multiple 10K TPM deployments → One 30-50K TPM deployment
+
+**Benefits:**
+- Fewer deployment slots consumed
+- Simpler management
+- Same total capacity, better utilization
+
+**Example:**
+- **Before**: 3 deployments @ 10K TPM each = 30K TPM total, 3 slots used
+- **After**: 1 deployment @ 30K TPM = 30K TPM total, 1 slot used
+- **Savings**: 2 deployment slots freed for other models
+
+---
+
+## 4. Cost Optimization Strategies
+
+> **Official Documentation**: [Plan to manage costs for Azure OpenAI](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/manage-costs) and [Fine-tuning cost management](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/fine-tuning-cost-management)
+
+**A. Use Fine-Tuned Smaller Models** (from [Microsoft Transparency Note](https://learn.microsoft.com/en-us/azure/ai-foundry/responsible-ai/openai/transparency-note)):
+
+You can reduce costs or latency by swapping a fine-tuned version of a smaller/faster model (e.g., fine-tuned GPT-3.5-Turbo) for a more general-purpose model (e.g., GPT-4).
+
+```bash
+# Deploy fine-tuned GPT-3.5 Turbo as cost-effective alternative to GPT-4
+az cognitiveservices account deployment create --name <resource-name> --resource-group <resource-group> \
+  --deployment-name gpt-35-tuned --model-name <fine-tuned-model-name> \
+  --model-format OpenAI --sku-name Standard --sku-capacity 10
+```
+
+**B. Remove Unused Fine-Tuned Deployments** (from [Fine-tuning cost management](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/fine-tuning-cost-management)):
+
+Fine-tuned model deployments incur **hourly hosting costs** even when not in use. Remove unused deployments promptly to control costs. 
+
+- Inactive deployments unused for **15 consecutive days** are automatically deleted
+- Proactively delete unused fine-tuned deployments to avoid hourly charges
+
+```bash
+# Delete unused fine-tuned deployment
+az cognitiveservices account deployment delete --name <resource-name> --resource-group <resource-group> \
+  --deployment-name <fine-tuned-deployment-name>
+```
+
+**C. Batch Multiple Requests** (from [Cost optimization Q&A](https://learn.microsoft.com/en-us/answers/questions/1689253/how-to-optimize-costs-per-request-azure-openai-gpt)):
+
+Batch multiple requests together to reduce the total number of API calls and lower overall costs.
+
+**D. Use Commitment Tiers for Predictable Costs** (from [Managing costs guide](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/manage-costs)):
+
+- **Pay-as-you-go**: Bills according to usage (variable costs)
+- **Commitment tiers**: Commit to using service features for a fixed fee (predictable costs, potential savings for consistent usage)
+
+---
+
+## 5. Regional Quota Rebalancing
+
+If you have quota spread across multiple regions but only use some:
+
+```bash
+# Check quota across regions
+for region in eastus westus uksouth; do
+  echo "=== $region ==="
+  subId=$(az account show --query id -o tsv)
+  az rest --method get \
+    --url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/$region/usages?api-version=2023-05-01" \
+    --query "value[?contains(name.value,'OpenAI')].{Model:name.value, Used:currentValue, Limit:limit}" -o table
+done
+```
+
+**Optimization:** Concentrate deployments in fewer regions to maximize quota utilization per region. 
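+
+The per-region quota readout above can feed a simple selection rule. A minimal sketch (hypothetical helper — `pick_target_region` is not part of any SDK; it operates on the parsed `value` entries returned by the usages API, which carry `currentValue` and `limit` fields):
+
+```python
+def pick_target_region(usages_by_region):
+    """Return the region with the most available quota headroom.
+
+    usages_by_region: dict mapping region name to a list of usage
+    entries (as parsed from the usages API response), each with
+    'currentValue' and 'limit' fields.
+    """
+    def available(entries):
+        # Available quota = limit minus current usage, summed per region
+        return sum(e["limit"] - e["currentValue"] for e in entries)
+
+    return max(usages_by_region, key=lambda r: available(usages_by_region[r]))
+
+# Example with hypothetical numbers:
+usage = {
+    "eastus":  [{"currentValue": 450, "limit": 500}],  # 50 units free
+    "uksouth": [{"currentValue": 100, "limit": 500}],  # 400 units free
+}
+print(pick_target_region(usage))  # uksouth
+```
+
+Pipe the `az rest` output through `-o json` instead of `-o table` if you want to feed it into a helper like this.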
diff --git a/.github/plugins/azure-skills/skills/microsoft-foundry/quota/references/ptu-guide.md b/.github/plugins/azure-skills/skills/microsoft-foundry/quota/references/ptu-guide.md index f6d20e6f..6dec8b8d 100644 --- a/.github/plugins/azure-skills/skills/microsoft-foundry/quota/references/ptu-guide.md +++ b/.github/plugins/azure-skills/skills/microsoft-foundry/quota/references/ptu-guide.md @@ -1,5 +1,7 @@ # Provisioned Throughput Units (PTU) Guide +**Table of Contents:** [Understanding PTU vs Standard TPM](#understanding-ptu-vs-standard-tpm) · [When to Use PTU](#when-to-use-ptu) · [PTU Capacity Planning](#ptu-capacity-planning) · [Deploy Model with PTU](#deploy-model-with-ptu) · [Request PTU Quota Increase](#request-ptu-quota-increase) · [Understanding Region and Deployment Quotas](#understanding-region-and-deployment-quotas) · [External Resources](#external-resources) + ## Understanding PTU vs Standard TPM Microsoft Foundry offers two quota types: diff --git a/.github/plugins/azure-skills/skills/microsoft-foundry/quota/references/troubleshooting.md b/.github/plugins/azure-skills/skills/microsoft-foundry/quota/references/troubleshooting.md index 6267bf71..8f88752c 100644 --- a/.github/plugins/azure-skills/skills/microsoft-foundry/quota/references/troubleshooting.md +++ b/.github/plugins/azure-skills/skills/microsoft-foundry/quota/references/troubleshooting.md @@ -1,5 +1,7 @@ # Troubleshooting Quota Errors +**Table of Contents:** [Common Quota Errors](#common-quota-errors) · [Detailed Error Resolution](#detailed-error-resolution) · [Request Quota Increase Process](#request-quota-increase-process) · [Diagnostic Commands](#diagnostic-commands) · [External Resources](#external-resources) + ## Common Quota Errors | Error | Cause | Quick Fix | diff --git a/.github/plugins/azure-skills/skills/microsoft-foundry/quota/references/workflows.md b/.github/plugins/azure-skills/skills/microsoft-foundry/quota/references/workflows.md index 4aff342b..74ef6319 100644 --- 
a/.github/plugins/azure-skills/skills/microsoft-foundry/quota/references/workflows.md +++ b/.github/plugins/azure-skills/skills/microsoft-foundry/quota/references/workflows.md @@ -1,5 +1,7 @@ # Detailed Workflows: Quota Management +**Table of Contents:** [Workflow 1: View Current Quota Usage](#workflow-1-view-current-quota-usage---detailed-steps) · [Workflow 2: Find Best Region for Model Deployment](#workflow-2-find-best-region-for-model-deployment---detailed-steps) · [Workflow 3: Check Quota Before Deployment](#workflow-3-check-quota-before-deployment---detailed-steps) · [Workflow 4: Monitor Quota Across Deployments](#workflow-4-monitor-quota-across-deployments---detailed-steps) · [Quick Command Reference](#quick-command-reference) · [MCP Tools Reference](#mcp-tools-reference-optional-wrappers) + ## Workflow 1: View Current Quota Usage - Detailed Steps ### Step 1: Show Regional Quota Summary (REQUIRED APPROACH) diff --git a/.github/plugins/azure-skills/skills/microsoft-foundry/references/auth-best-practices.md b/.github/plugins/azure-skills/skills/microsoft-foundry/references/auth-best-practices.md new file mode 100644 index 00000000..a2ca1976 --- /dev/null +++ b/.github/plugins/azure-skills/skills/microsoft-foundry/references/auth-best-practices.md @@ -0,0 +1,130 @@ +# Azure Authentication Best Practices + +> Source: [Microsoft — Passwordless connections for Azure services](https://learn.microsoft.com/azure/developer/intro/passwordless-overview) and [Azure Identity client libraries](https://learn.microsoft.com/dotnet/azure/sdk/authentication/). 
+ +**Table of Contents:** [Golden Rule](#golden-rule) · [Authentication by Environment](#authentication-by-environment) · [Why Not DefaultAzureCredential in Production?](#why-not-defaultazurecredential-in-production) · [Production Patterns](#production-patterns) · [Local Development Setup](#local-development-setup) · [Environment-Aware Pattern](#environment-aware-pattern) · [Security Checklist](#security-checklist) · [Further Reading](#further-reading) + +## Golden Rule + +Use **managed identities** and **Azure RBAC** in production. Reserve `DefaultAzureCredential` for **local development only**. + +## Authentication by Environment + +| Environment | Recommended Credential | Why | +|---|---|---| +| **Production (Azure-hosted)** | `ManagedIdentityCredential` (system- or user-assigned) | No secrets to manage; auto-rotated by Azure | +| **Production (on-premises)** | `ClientCertificateCredential` or `WorkloadIdentityCredential` | Deterministic; no fallback chain overhead | +| **CI/CD pipelines** | `AzurePipelinesCredential` / `WorkloadIdentityCredential` | Scoped to pipeline identity | +| **Local development** | `DefaultAzureCredential` | Chains CLI, PowerShell, and VS Code credentials for convenience | + +## Why Not `DefaultAzureCredential` in Production? + +1. **Unpredictable fallback chain** — walks through multiple credential types, adding latency and making failures harder to diagnose. +2. **Broad surface area** — checks environment variables, CLI tokens, and other sources that should not exist in production. +3. **Non-deterministic** — which credential actually authenticates depends on the environment, making behavior inconsistent across deployments. +4. **Performance** — each failed credential attempt adds network round-trips before falling back to the next. + +## Production Patterns + +### .NET + +```csharp +using Azure.Identity; + +var credential = Environment.GetEnvironmentVariable("AZURE_FUNCTIONS_ENVIRONMENT") == "Development" + ? 
new DefaultAzureCredential()         // local dev — uses CLI/VS credentials
+    : new ManagedIdentityCredential();   // production — deterministic, no fallback chain
+// For user-assigned identity: new ManagedIdentityCredential("<client-id>")
+```
+
+### TypeScript / JavaScript
+
+```typescript
+import { DefaultAzureCredential, ManagedIdentityCredential } from "@azure/identity";
+
+const credential = process.env.NODE_ENV === "development"
+  ? new DefaultAzureCredential()          // local dev — uses CLI/VS credentials
+  : new ManagedIdentityCredential();      // production — deterministic, no fallback chain
+// For user-assigned identity: new ManagedIdentityCredential("<client-id>")
+```
+
+### Python
+
+```python
+import os
+from azure.identity import DefaultAzureCredential, ManagedIdentityCredential
+
+credential = (
+    DefaultAzureCredential()  # local dev — uses CLI/VS credentials
+    if os.getenv("AZURE_FUNCTIONS_ENVIRONMENT") == "Development"
+    else ManagedIdentityCredential()  # production — deterministic, no fallback chain
+)
+# For user-assigned identity: ManagedIdentityCredential(client_id="<client-id>")
+```
+
+### Java
+
+```java
+import com.azure.identity.DefaultAzureCredentialBuilder;
+import com.azure.identity.ManagedIdentityCredentialBuilder;
+
+var credential = "Development".equals(System.getenv("AZURE_FUNCTIONS_ENVIRONMENT"))
+  ? new DefaultAzureCredentialBuilder().build()      // local dev — uses CLI/VS credentials
+  : new ManagedIdentityCredentialBuilder().build();  // production — deterministic, no fallback chain
+// For user-assigned identity: new ManagedIdentityCredentialBuilder().clientId("<client-id>").build()
+```
+
+## Local Development Setup
+
+`DefaultAzureCredential` is ideal for local dev because it automatically picks up credentials from developer tools:
+
+1. **Azure CLI** — `az login`
+2. **Azure Developer CLI** — `azd auth login`
+3. **Azure PowerShell** — `Connect-AzAccount`
+4. 
**Visual Studio / VS Code** — sign in via Azure extension + +```typescript +import { DefaultAzureCredential } from "@azure/identity"; + +// Local development only — uses CLI/PowerShell/VS Code credentials +const credential = new DefaultAzureCredential(); +``` + +## Environment-Aware Pattern + +Detect the runtime environment and select the appropriate credential. The key principle: use `DefaultAzureCredential` only when running locally, and a specific credential in production. + +> **Tip:** Azure Functions sets `AZURE_FUNCTIONS_ENVIRONMENT` to `"Development"` when running locally. For App Service or containers, use any environment variable you control (e.g. `NODE_ENV`, `ASPNETCORE_ENVIRONMENT`). + +```typescript +import { DefaultAzureCredential, ManagedIdentityCredential } from "@azure/identity"; + +function getCredential() { + if (process.env.NODE_ENV === "development") { + return new DefaultAzureCredential(); // picks up az login / VS Code creds + } + return process.env.AZURE_CLIENT_ID + ? 
new ManagedIdentityCredential(process.env.AZURE_CLIENT_ID) // user-assigned + : new ManagedIdentityCredential(); // system-assigned +} +``` + +## Security Checklist + +- [ ] Use managed identity for all Azure-hosted apps +- [ ] Never hardcode credentials, connection strings, or keys +- [ ] Apply least-privilege RBAC roles at the narrowest scope +- [ ] Use `ManagedIdentityCredential` (not `DefaultAzureCredential`) in production +- [ ] Store any required secrets in Azure Key Vault +- [ ] Rotate secrets and certificates on a schedule +- [ ] Enable Microsoft Defender for Cloud on production resources + +## Further Reading + +- [Passwordless connections overview](https://learn.microsoft.com/azure/developer/intro/passwordless-overview) +- [Managed identities overview](https://learn.microsoft.com/entra/identity/managed-identities-azure-resources/overview) +- [Azure RBAC overview](https://learn.microsoft.com/azure/role-based-access-control/overview) +- [.NET authentication guide](https://learn.microsoft.com/dotnet/azure/sdk/authentication/) +- [Python identity library](https://learn.microsoft.com/python/api/overview/azure/identity-readme) +- [JavaScript identity library](https://learn.microsoft.com/javascript/api/overview/azure/identity-readme) +- [Java identity library](https://learn.microsoft.com/java/api/overview/azure/identity-readme) diff --git a/.github/plugins/azure-skills/skills/microsoft-foundry/references/docs.md b/.github/plugins/azure-skills/skills/microsoft-foundry/references/docs.md new file mode 100644 index 00000000..273245bd --- /dev/null +++ b/.github/plugins/azure-skills/skills/microsoft-foundry/references/docs.md @@ -0,0 +1,99 @@ +# Microsoft Foundry — Documentation Reference + +> Canonical links to official Microsoft Learn documentation for Microsoft Foundry. +> Load this reference on-demand when you need accurate, up-to-date URLs. + +--- + +## Getting Started + +| Topic | URL | +|---|---| +| What is Microsoft Foundry? 
| https://learn.microsoft.com/azure/foundry/what-is-foundry | +| Foundry architecture | https://learn.microsoft.com/azure/foundry/concepts/architecture | +| Quickstart: Create Foundry resources | https://learn.microsoft.com/azure/foundry/tutorials/quickstart-create-foundry-resources | +| Quickstart: Get started with code | https://learn.microsoft.com/azure/foundry/quickstarts/get-started-code | +| Prepare your development environment | https://learn.microsoft.com/azure/foundry/how-to/develop/install-cli-sdk | +| Enterprise rollout planning | https://learn.microsoft.com/azure/foundry/concepts/planning | +| Foundry portal | https://ai.azure.com | + +## Agents + +| Topic | URL | +|---|---| +| What is Foundry Agent Service? | https://learn.microsoft.com/azure/foundry/agents/overview | +| Agent runtime components | https://learn.microsoft.com/azure/foundry/agents/concepts/runtime-components | +| What are hosted agents? | https://learn.microsoft.com/azure/foundry/agents/concepts/hosted-agents | +| Agent environment setup | https://learn.microsoft.com/azure/foundry/agents/environment-setup | +| Limits, quotas, and regions | https://learn.microsoft.com/azure/foundry/agents/concepts/limits-quotas-regions | +| Tool best practices | https://learn.microsoft.com/azure/foundry/agents/concepts/tool-best-practice | +| Code Interpreter tool | https://learn.microsoft.com/azure/foundry/agents/how-to/tools/code-interpreter | +| File Search tool | https://learn.microsoft.com/azure/foundry/agents/how-to/tools/file-search | +| Azure AI Search tool | https://learn.microsoft.com/azure/foundry/agents/how-to/tools/ai-search | +| Function calling | https://learn.microsoft.com/azure/foundry/agents/how-to/tools/function-calling | +| Workflow agents | https://learn.microsoft.com/azure/foundry/agents/concepts/workflow | +| Use your own resources | https://learn.microsoft.com/azure/foundry/agents/how-to/use-your-own-resources | +| Virtual networks for agents | 
https://learn.microsoft.com/azure/foundry/agents/how-to/virtual-networks | +| Agent metrics and monitoring | https://learn.microsoft.com/azure/foundry/agents/how-to/metrics | +| Migrate from classic agents | https://learn.microsoft.com/azure/foundry/agents/how-to/migrate | + +## Models + +| Topic | URL | +|---|---| +| Foundry Models overview | https://learn.microsoft.com/azure/foundry/concepts/foundry-models-overview | +| Deployment types (managed compute & serverless) | https://learn.microsoft.com/azure/foundry/concepts/deployments-overview | +| Fine-tuning overview | https://learn.microsoft.com/azure/foundry/concepts/fine-tuning-overview | +| Fine-tune with serverless API | https://learn.microsoft.com/azure/foundry/how-to/fine-tune-serverless | +| Azure OpenAI in Foundry | https://learn.microsoft.com/azure/foundry/openai/how-to/chatgpt | +| Responses API quickstart | https://learn.microsoft.com/azure/ai-services/openai/chatgpt-quickstart | + +## Evaluation + +| Topic | URL | +|---|---| +| Evaluation overview | https://learn.microsoft.com/azure/foundry/evaluation/evaluate-generative-ai | +| Observability and tracing | https://learn.microsoft.com/azure/foundry/observability/concepts/trace-agent-concept | +| Azure AI Evaluation SDK (local) | https://learn.microsoft.com/azure/foundry/evaluation/evaluate-sdk | +| Built-in evaluation metrics | https://learn.microsoft.com/azure/foundry/evaluation/evaluation-metrics-built-in | +| Guardrails overview | https://learn.microsoft.com/azure/foundry/guardrails/guardrails-overview | + +## Infrastructure + +| Topic | URL | +|---|---| +| Architecture and resource model | https://learn.microsoft.com/azure/foundry/concepts/architecture | +| RBAC for Foundry | https://learn.microsoft.com/azure/foundry/concepts/rbac-foundry | +| Configure private link / networking | https://learn.microsoft.com/azure/foundry/how-to/configure-private-link | +| Add connections to external services | 
https://learn.microsoft.com/azure/foundry/how-to/connections-add | +| Manage costs | https://learn.microsoft.com/azure/foundry/concepts/manage-costs | +| Encryption with customer-managed keys | https://learn.microsoft.com/azure/foundry/concepts/encryption-keys-portal | +| Responsible AI overview | https://learn.microsoft.com/azure/foundry/responsible-use-of-ai-overview | +| Foundry project reference | https://learn.microsoft.com/azure/foundry/reference/foundry-project | + +## SDKs + +| Topic | URL | +|---|---| +| SDK overview and setup | https://learn.microsoft.com/azure/foundry/how-to/develop/sdk-overview | +| **Python** — `azure-ai-projects` | https://pypi.org/project/azure-ai-projects/ | +| **JavaScript / TypeScript** — `@azure/ai-projects` | https://www.npmjs.com/package/@azure/ai-projects | +| **C# / .NET** — `Azure.AI.Projects` | https://www.nuget.org/packages/Azure.AI.Projects | +| **Java** — `com.azure:azure-ai-agents` | https://central.sonatype.com/artifact/com.azure/azure-ai-agents | +| Azure AI Inference SDK (multi-language) | https://learn.microsoft.com/azure/foundry/foundry-models/supported-languages | +| REST API reference | https://learn.microsoft.com/rest/api/aifoundry/ | + +## Foundry Local + +| Topic | URL | +|---|---| +| What is Foundry Local? | https://learn.microsoft.com/azure/foundry-local/what-is-foundry-local | +| Get started with Foundry Local | https://learn.microsoft.com/windows/ai/foundry-local/get-started | +| Architecture | https://learn.microsoft.com/azure/foundry-local/concepts/foundry-local-architecture | +| SDK reference | https://learn.microsoft.com/azure/foundry-local/reference/reference-sdk | +| Best practices and troubleshooting | https://learn.microsoft.com/azure/foundry-local/reference/reference-best-practice | +| Cloud vs. local model guidance | https://learn.microsoft.com/azure/foundry-local/concepts/cloud-vs-local | + +--- + +*Links target the current (non-classic) Foundry documentation. 
For legacy hub-based project docs, replace `/azure/foundry/` with `/azure/foundry-classic/` in the URL path.* diff --git a/.github/plugins/azure-skills/skills/microsoft-foundry/references/private-network-standard-agent-setup.md b/.github/plugins/azure-skills/skills/microsoft-foundry/references/private-network-standard-agent-setup.md new file mode 100644 index 00000000..9f77f225 --- /dev/null +++ b/.github/plugins/azure-skills/skills/microsoft-foundry/references/private-network-standard-agent-setup.md @@ -0,0 +1,40 @@ +# Private Network Standard Agent Setup + +> **MANDATORY:** Read [Standard Agent Setup with Network Isolation docs](https://learn.microsoft.com/en-us/azure/ai-foundry/how-to/configure-private-link?tabs=azure-portal&pivots=fdp-project) before proceeding. It covers RBAC requirements, resource provider registration, and role assignments. + +## Overview + +Extends [standard agent setup](standard-agent-setup.md) with full VNet isolation using private endpoints and subnet delegation. All resources communicate over private network only. + +## Networking Constraints + +Two subnets required: + +| Subnet | CIDR | Purpose | Delegation | +|--------|------|---------|------------| +| Agent Subnet | /24 (e.g., 192.168.0.0/24) | Agent workloads | `Microsoft.App/environments` (exclusive) | +| Private Endpoint Subnet | /24 (e.g., 192.168.1.0/24) | Private endpoints | None | + +- All Foundry resources **must be in the same region as the VNet**. +- Agent subnet must be exclusive to one Foundry account. +- VNet address space must not overlap with existing networks or reserved ranges. + +> ⚠️ **Warning:** If providing an existing VNet, ensure both subnets exist before deployment. Otherwise the template creates a new VNet with default address spaces. 
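+
+If you pre-create the VNet, the two subnets from the table above can be sketched in Bicep (illustrative resource names and API version — an assumption, not taken from the official template; verify against it before deploying):
+
+```bicep
+// Sketch only: address ranges match the constraints table above.
+param location string = resourceGroup().location
+
+resource vnet 'Microsoft.Network/virtualNetworks@2023-11-01' = {
+  name: 'agent-vnet'
+  location: location
+  properties: {
+    addressSpace: { addressPrefixes: ['192.168.0.0/16'] }
+    subnets: [
+      {
+        name: 'agent-subnet' // exclusive to one Foundry account
+        properties: {
+          addressPrefix: '192.168.0.0/24'
+          delegations: [
+            {
+              name: 'foundry-agent-delegation'
+              properties: { serviceName: 'Microsoft.App/environments' }
+            }
+          ]
+        }
+      }
+      {
+        name: 'private-endpoint-subnet' // no delegation required
+        properties: { addressPrefix: '192.168.1.0/24' }
+      }
+    ]
+  }
+}
+```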
+ +## Deployment + +**Always use the official Bicep template:** +[Private Network Standard Agent Setup Bicep](https://github.com/microsoft-foundry/foundry-samples/tree/main/infrastructure/infrastructure-setup-bicep/15-private-network-standard-agent-setup) + +> ⚠️ **Warning:** Capability host provisioning is **asynchronous** (10–20 minutes). Poll deployment status until success before proceeding. + +## Post-Deployment + +1. **Deploy a model** to the new AI Services account (e.g., `gpt-4o`). Fall back to `Standard` SKU if `GlobalStandard` quota is exhausted. +2. **Create the agent** using MCP tools (`agent_update`) or the Python SDK. + +## References + +- [Azure AI Foundry Networking](https://learn.microsoft.com/en-us/azure/ai-foundry/how-to/configure-private-link?tabs=azure-portal&pivots=fdp-project) +- [Azure AI Foundry RBAC](https://learn.microsoft.com/en-us/azure/ai-foundry/concepts/rbac-azure-ai-foundry?pivots=fdp-project) +- [Standard Agent Setup (public network)](standard-agent-setup.md) diff --git a/.github/plugins/azure-skills/skills/microsoft-foundry/references/sdk/foundry-sdk-py.md b/.github/plugins/azure-skills/skills/microsoft-foundry/references/sdk/foundry-sdk-py.md index ba798635..9d4d2030 100644 --- a/.github/plugins/azure-skills/skills/microsoft-foundry/references/sdk/foundry-sdk-py.md +++ b/.github/plugins/azure-skills/skills/microsoft-foundry/references/sdk/foundry-sdk-py.md @@ -2,6 +2,8 @@ Python-specific implementations for working with Microsoft Foundry. 
+**Table of Contents:** [Prerequisites](#prerequisites) · [Model Discovery and Deployment](#model-discovery-and-deployment-mcp) · [RAG Agent with Azure AI Search](#rag-agent-with-azure-ai-search) · [Creating Agents](#creating-agents) · [Agent Evaluation](#agent-evaluation) · [Knowledge Index Operations](#knowledge-index-operations-mcp) · [Best Practices](#best-practices) · [Error Handling](#error-handling) + ## Prerequisites ```bash @@ -36,6 +38,8 @@ foundry_models_deploy( ## RAG Agent with Azure AI Search +> **Auth:** `DefaultAzureCredential` is for local development. See [auth-best-practices.md](../auth-best-practices.md) for production patterns. + ```python import os from azure.ai.projects import AIProjectClient @@ -126,16 +130,28 @@ agent = project_client.agents.create_agent( ### Agent with Web Search ```python -from azure.ai.agents.models import BingGroundingToolDefinition +from azure.ai.projects.models import ( + PromptAgentDefinition, WebSearchPreviewTool, ApproximateLocation, +) -agent = project_client.agents.create_agent( - model=os.environ["MODEL_DEPLOYMENT_NAME"], - name="WebSearchAgent", - instructions="Search the web for current information. Provide sources.", - tools=[BingGroundingToolDefinition()], +agent = project_client.agents.create_version( + agent_name="WebSearchAgent", + definition=PromptAgentDefinition( + model=os.environ["MODEL_DEPLOYMENT_NAME"], + instructions="Search the web for current information. Provide sources.", + tools=[ + WebSearchPreviewTool( + user_location=ApproximateLocation( + country="US", city="Seattle", region="Washington" + ) + ) + ], + ), ) ``` +> 💡 **Tip:** `WebSearchPreviewTool` requires no external resource or connection. For Bing Grounding (which requires a dedicated Bing resource and project connection), see [Bing Grounding reference](../../agent/create/references/tool-bing-grounding.md). 
+ ### Interacting with Agents ```python diff --git a/.github/plugins/azure-skills/skills/microsoft-foundry/references/standard-agent-setup.md b/.github/plugins/azure-skills/skills/microsoft-foundry/references/standard-agent-setup.md new file mode 100644 index 00000000..ccdfbfb4 --- /dev/null +++ b/.github/plugins/azure-skills/skills/microsoft-foundry/references/standard-agent-setup.md @@ -0,0 +1,51 @@ +# Standard Agent Setup + +> **MANDATORY:** Read [Standard Agent Setup docs](https://learn.microsoft.com/en-us/azure/foundry/agents/concepts/standard-agent-setup?view=foundry) before proceeding with standard setup. + +## Overview + +Azure AI Foundry supports two agent setup configurations: + +| Setup | Capability Host | Description | +|-------|----------------|-------------| +| **Basic** | None | Default setup. All resources are Microsoft-managed. No additional connections required. | +| **Standard** | Azure AI Services | Advanced setup. Bring-your-own storage and search connections for full control over data residency and scaling. | + +## Standard Setup Connections + +| Connection | Service | Required | Purpose | +|------------|---------|----------|---------| +| Thread storage | Azure Cosmos DB | ✅ Yes | Store conversation threads in your own Cosmos DB instance | +| File storage | Azure Storage | ✅ Yes | Store uploaded files in your own Azure Storage account | +| Vector store | Azure AI Search | ✅ Yes | Use your own Azure AI Search instance for vector/knowledge retrieval | +| Azure AI Services | Azure AI Services | ❌ Optional | Use OpenAI models from a different AI Services resource | + +> 💡 **Tip:** Standard setup is recommended for production workloads that require control over data storage, custom vector search, or integration with models from a separate AI Services resource. + +## Prerequisites + +Before starting deployment, confirm the following with the user: + +1. 
**RBAC role on the resource group:** The user must have **Owner** or **User Access Administrator** role on the target resource group. The Bicep template assigns RBAC roles (Storage Blob Data Contributor, Cosmos DB Operator, AI Search roles) to the project's managed identity — this will fail without `Microsoft.Authorization/roleAssignments/write` permission. +2. **Subscription quota:** Verify the target region has available quota for AI Services. If quota is exhausted, try an alternate region (e.g., `swedencentral`, `eastus`, `westus3`). +3. **Azure Policy compliance:** Some subscriptions enforce policies (e.g., storage accounts must disable public network access). If the Bicep template fails due to policy violations, patch the template to comply (e.g., set `publicNetworkAccess: 'Disabled'` and `defaultAction: 'Deny'` on the storage account). + +## Deployment + +- Standard setup always creates a **new Foundry resource and a new project**. Do not ask the user for a project endpoint — one will be provisioned as part of the deployment. +- **Always use the official Bicep template:** + [Standard Agent Setup Bicep Template](https://github.com/azure-ai-foundry/foundry-samples/blob/main/infrastructure/infrastructure-setup-bicep/43-standard-agent-setup-with-customization/main.bicep) + +> ⚠️ **Warning:** Capability host provisioning is **asynchronous** and can take 10–20 minutes. After deploying the Bicep template, you **must poll** the deployment status until it succeeds. Do not assume the setup is complete immediately. + +## Post-Deployment: Model & Agent + +After infrastructure provisioning succeeds: + +1. **Deploy a model** to the new AI Services account (e.g., `gpt-4o`). If `GlobalStandard` SKU quota is exhausted, fall back to `Standard` SKU. +2. **Create the agent** using MCP tools (`agent_update`) or the Python SDK (`client.agents.create_version`). See [SDK Operations](../agent/create/references/sdk-operations.md) for details. 
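The asynchronous-provisioning warning above implies a wait loop rather than a single status check. A minimal polling sketch, where `get_state` is a hypothetical callable standing in for however you read the deployment's `properties.provisioningState` (e.g. a wrapper around `az deployment group show` or the Azure SDK):

```python
import time


def wait_for_provisioning(get_state, timeout_s=1800, interval_s=30):
    """Poll a provisioning-state callable until the deployment succeeds.

    get_state: zero-arg callable returning the current provisioning state
    string, e.g. "Running", "Succeeded", "Failed". Hypothetical stand-in
    for an `az deployment group show --query properties.provisioningState`
    call or its SDK equivalent.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        state = get_state()
        if state == "Succeeded":
            return state
        if state in ("Failed", "Canceled"):
            # Surface terminal failures immediately instead of timing out.
            raise RuntimeError(f"Deployment ended in state {state!r}")
        time.sleep(interval_s)
    raise TimeoutError("Deployment did not reach 'Succeeded' within the timeout")
```

The 10–20 minute capability-host window suggests a generous `timeout_s`; only after this returns `"Succeeded"` should model deployment and agent creation proceed.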
+ +## References + +- [Capability Hosts — Agent Setup Types](https://learn.microsoft.com/en-us/azure/ai-foundry/agents/concepts/capability-hosts?view=foundry) +- [Standard Agent Setup](https://learn.microsoft.com/en-us/azure/foundry/agents/concepts/standard-agent-setup?view=foundry) diff --git a/.github/plugins/azure-skills/skills/microsoft-foundry/resource/create/create-foundry-resource.md b/.github/plugins/azure-skills/skills/microsoft-foundry/resource/create/create-foundry-resource.md index dbd0f658..c143149d 100644 --- a/.github/plugins/azure-skills/skills/microsoft-foundry/resource/create/create-foundry-resource.md +++ b/.github/plugins/azure-skills/skills/microsoft-foundry/resource/create/create-foundry-resource.md @@ -20,6 +20,8 @@ This sub-skill orchestrates creation of Azure AI Services multi-service resource > **Note:** For monitoring resource usage and quotas, use the `microsoft-foundry:quota` skill. +**Table of Contents:** [Quick Reference](#quick-reference) · [When to Use](#when-to-use) · [Prerequisites](#prerequisites) · [Core Workflows](#core-workflows) · [Important Notes](#important-notes) · [Additional Resources](#additional-resources) + ## Quick Reference | Property | Value | diff --git a/.github/plugins/azure-skills/skills/microsoft-foundry/resource/create/references/patterns.md b/.github/plugins/azure-skills/skills/microsoft-foundry/resource/create/references/patterns.md index e976e2b6..5c7622f7 100644 --- a/.github/plugins/azure-skills/skills/microsoft-foundry/resource/create/references/patterns.md +++ b/.github/plugins/azure-skills/skills/microsoft-foundry/resource/create/references/patterns.md @@ -1,5 +1,7 @@ # Common Patterns: Create Foundry Resource +**Table of Contents:** [Pattern A: Quick Setup](#pattern-a-quick-setup) · [Pattern B: Multi-Region Setup](#pattern-b-multi-region-setup) · [Quick Commands Reference](#quick-commands-reference) + ## Pattern A: Quick Setup Complete setup in one go: diff --git 
a/.github/plugins/azure-skills/skills/microsoft-foundry/resource/create/references/workflows.md b/.github/plugins/azure-skills/skills/microsoft-foundry/resource/create/references/workflows.md index fa32fac1..a3cd8c52 100644 --- a/.github/plugins/azure-skills/skills/microsoft-foundry/resource/create/references/workflows.md +++ b/.github/plugins/azure-skills/skills/microsoft-foundry/resource/create/references/workflows.md @@ -1,5 +1,7 @@ # Detailed Workflows: Create Foundry Resource +**Table of Contents:** [Workflow 1: Create Resource Group](#workflow-1-create-resource-group---detailed-steps) · [Workflow 2: Create Foundry Resource](#workflow-2-create-foundry-resource---detailed-steps) · [Workflow 3: Register Resource Provider](#workflow-3-register-resource-provider---detailed-steps) + ## Workflow 1: Create Resource Group - Detailed Steps ### Step 1: Ask user preference