diff --git a/docs/en/Components/AgentSkills.md b/docs/en/Components/AgentSkills.md new file mode 100644 index 000000000..5459636c7 --- /dev/null +++ b/docs/en/Components/AgentSkills.md @@ -0,0 +1,516 @@ +--- +slug: AgentSkills +title: Agent Skills +description: Ms-Agent Agent Skills Module - A skill discovery, analysis, and execution framework based on the Anthropic Agent Skills protocol. +--- + +# Agent Skills + +The MS-Agent Skill Module is a powerful, extensible skill execution framework that enables LLM agents to automatically discover, analyze, and execute domain-specific skills for complex task completion. It is an implementation of the [Anthropic Agent Skills](https://docs.claude.com/en/docs/agents-and-tools/agent-skills) protocol. + +With the Skill Module, agents can handle complex tasks such as: +- "Generate a PDF report for Q4 sales data" +- "Create a presentation about AI trends with charts" +- "Convert this document to PPTX format with custom themes" + +## Key Features + +- **Intelligent Skill Retrieval**: Hybrid search combining FAISS dense retrieval with BM25 sparse retrieval, plus LLM-based relevance filtering +- **DAG Execution Engine**: Builds execution DAGs based on skill dependencies, supports parallel execution of independent skills, and automatically passes inputs/outputs between skills +- **Progressive Skill Analysis**: Two-phase analysis (plan first, then load resources), incrementally loads scripts/references/resources on demand, optimizing context window usage +- **Secure Execution Environment**: Supports isolated execution via [ms-enclave](https://github.com/modelscope/ms-enclave) Docker sandboxes, or controlled local execution +- **Self-Reflection & Retry**: LLM-based error analysis, automatic code fixes, and configurable retry attempts +- **Standard Protocol Compatibility**: Fully compatible with the [Anthropic Skills](https://github.com/anthropics/skills) protocol + +## Architecture + +### High-Level Architecture + +``` 
+┌─────────────────────────────────────────────────────────┐ +│ LLMAgent │ +│ ┌───────────────────────────────────────────────────┐ │ +│ │ AutoSkills │ │ +│ │ ┌──────────┐ ┌──────────┐ ┌────────────────┐ │ │ +│ │ │ Loader │ │ Retriever│ │ SkillAnalyzer │ │ │ +│ │ │ │ │ (Hybrid) │ │ (Progressive) │ │ │ +│ │ └────┬─────┘ └────┬─────┘ └───────┬────────┘ │ │ +│ │ │ │ │ │ │ +│ │ ▼ ▼ ▼ │ │ +│ │ ┌───────────────────────────────────────────────┐│ │ +│ │ │ DAGExecutor ││ │ +│ │ │ ┌────────┐ ┌────────┐ ┌────────┐ ││ │ +│ │ │ │Skill 1 │→ │Skill 2 │→ │Skill N │ ││ │ +│ │ │ └───┬────┘ └───┬────┘ └───┬────┘ ││ │ +│ │ │ └────────────┴──────────┘ ││ │ +│ │ │ ↓ ││ │ +│ │ │ SkillContainer (Execution) ││ │ +│ │ └───────────────────────────────────────────────┘│ │ +│ └───────────────────────────────────────────────────┘ │ +└─────────────────────────────────────────────────────────┘ +``` + +### Execution Flow + +``` +User Query + │ + ▼ +┌─────────────────┐ +│ Query Analysis │ ─── Is this a skill-related query? +└────────┬────────┘ + │ Yes + ▼ +┌─────────────────┐ +│ Skill Retrieval │ ─── Hybrid search (FAISS + BM25) +└────────┬────────┘ + │ + ▼ +┌─────────────────┐ +│ Skill Filtering │ ─── LLM-based relevance filtering +└────────┬────────┘ + │ + ▼ +┌─────────────────┐ +│ DAG Building │ ─── Build dependency graph +└────────┬────────┘ + │ + ▼ +┌─────────────────┐ +│ Progressive │ ─── Plan → Load → Execute +│ Execution │ +└────────┬────────┘ + │ + ▼ +┌─────────────────┐ +│ Result │ ─── Merge outputs, format response +│ Aggregation │ +└─────────────────┘ +``` + +The skill module implements a multi-level progressive context loading mechanism: + +1. **Level 1 (Metadata)**: Loads only skill metadata (name, description) for semantic search +2. **Level 2 (Retrieval)**: Retrieves relevant skills and loads the full SKILL.md +3. **Level 3 (Resources)**: Further loads required reference materials and resource files +4. 
**Level 4 (Analysis|Planning|Execution)**: Analyzes skill context, autonomously creates plans and task lists, loads required resources and runs related scripts + +## Skill Directory Structure + +Each skill is a self-contained directory: + +``` +skill-name/ +├── SKILL.md # Required: Main documentation and instructions +├── META.yaml # Optional: Metadata (name, description, version, tags) +├── scripts/ # Optional: Executable scripts +│ ├── main.py +│ ├── utils.py +│ └── run.sh +├── references/ # Optional: Reference documents +│ ├── api_docs.md +│ └── examples.json +├── resources/ # Optional: Assets and resources +│ ├── template.html +│ ├── fonts/ +│ └── images/ +└── requirements.txt # Optional: Python dependencies +``` + +### SKILL.md Format + +```markdown +# Skill Name + +Brief description of what this skill does. + +## Capabilities + +- Capability 1 +- Capability 2 + +## Usage + +Instructions for using this skill... + +## Parameters + +| Parameter | Type | Description | +|-----------|------|-------------| +| input | str | Input data | +| format | str | Output format | + +## Examples + +Example usage scenarios... +``` + +### META.yaml Format + +```yaml +name: "PDF Generator" +description: "Generates professional PDF documents from markdown or data" +version: "1.0.0" +author: "Your Name" +tags: + - document + - pdf + - report +``` + +## Quick Start + +### Prerequisites + +- Python 3.9+ +- Docker (for sandbox execution, optional) +- FAISS (for skill retrieval) + +### Installation + +```bash +pip install 'ms-agent>=1.4.0' + +# Or install from source +git clone https://github.com/modelscope/ms-agent.git +cd ms-agent +pip install -e . 
+``` + +### Method 1: Using LLMAgent Configuration + +```python +import asyncio +from omegaconf import DictConfig +from ms_agent.agent import LLMAgent + +config = DictConfig({ + 'llm': { + 'service': 'openai', + 'model': 'gpt-4', + 'openai_api_key': 'your-api-key', + 'openai_base_url': 'https://api.openai.com/v1' + }, + 'skills': { + 'path': '/path/to/skills', + 'auto_execute': True, + 'work_dir': '/path/to/workspace', + 'use_sandbox': False, + } +}) + +agent = LLMAgent(config, tag='skill-agent') + +async def main(): + result = await agent.run('Generate a mock PDF report about AI trends') + print(result) + +asyncio.run(main()) +``` + +### Method 2: Using AutoSkills Directly + +```python +import asyncio +from ms_agent.skill import AutoSkills +from ms_agent.llm import LLM +from omegaconf import DictConfig + +llm_config = DictConfig({ + 'llm': { + 'service': 'openai', + 'model': 'gpt-4', + 'openai_api_key': 'your-api-key', + 'openai_base_url': 'https://api.openai.com/v1' + } +}) +llm = LLM.from_config(llm_config) + +auto_skills = AutoSkills( + skills='/path/to/skills', + llm=llm, + work_dir='/path/to/workspace', + use_sandbox=False, +) + +async def main(): + result = await auto_skills.run( + query='Generate a mock PDF report about AI trends' + ) + print(f"Result: {result.execution_result}") + +asyncio.run(main()) +``` + +**Parameter Reference:** + +- `skills`: Skill source, supporting the following formats: + - Single path or list of paths to local skill directories + - Single or multiple ModelScope skill repository IDs, e.g. 
`ms-agent/claude_skills` (see [ModelScope Hub](https://modelscope.cn/models/ms-agent/claude_skills)) + - Format: `owner/skill_name` or `owner/skill_name/subfolder` + - Single or multiple `SkillSchema` objects +- `work_dir`: Working directory for skill execution outputs +- `use_sandbox`: Whether to use Docker sandbox for execution (default `True`); set to `False` for local execution with security checks +- `auto_execute`: Whether to automatically execute skills (default `True`) + +## Configuration + +Configure the skill module in `agent.yaml`: + +```yaml +skills: + # Required: Path to skills directory or ModelScope repo ID + path: /path/to/skills + + # Optional: Whether to enable retriever (auto-detect based on skill count if omitted) + enable_retrieve: + + # Optional: Retriever arguments + retrieve_args: + top_k: 3 + min_score: 0.8 + + # Optional: Maximum candidate skills to consider (default: 10) + max_candidate_skills: 10 + + # Optional: Maximum retry attempts (default: 3) + max_retries: 3 + + # Optional: Working directory for outputs + work_dir: /path/to/workspace + + # Optional: Use Docker sandbox for execution (default: True) + use_sandbox: false + + # Optional: Auto-execute skills (default: True) + auto_execute: true +``` + +For general YAML configuration details, see [Config & Parameters](./Config). 
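The `skills:` options above can be read as a required `path` plus a set of optional keys that fall back to documented defaults. The sketch below illustrates that merge using only the defaults stated in this section. `resolve_skills_config` and `DOCUMENTED_DEFAULTS` are hypothetical names introduced here for illustration; they are not part of ms-agent, and the framework's actual resolution logic may differ.

```python
from typing import Any, Dict

# Defaults as documented in the Configuration section.
# Illustrative only: this table and helper are NOT part of ms-agent.
DOCUMENTED_DEFAULTS: Dict[str, Any] = {
    'enable_retrieve': None,       # None -> auto-detect based on skill count
    'retrieve_args': None,         # e.g. {'top_k': 3, 'min_score': 0.8}
    'max_candidate_skills': 10,
    'max_retries': 3,
    'work_dir': None,
    'use_sandbox': True,
    'auto_execute': True,
}


def resolve_skills_config(skills_section: Dict[str, Any]) -> Dict[str, Any]:
    """Merge a parsed `skills:` section with the documented defaults."""
    # `path` is the only required key.
    if not skills_section.get('path'):
        raise ValueError('`skills.path` is required')
    # Reject keys that the documented configuration does not list.
    unknown = set(skills_section) - set(DOCUMENTED_DEFAULTS) - {'path'}
    if unknown:
        raise ValueError(f'Unknown skills options: {sorted(unknown)}')
    resolved = dict(DOCUMENTED_DEFAULTS)
    resolved.update(skills_section)
    return resolved


resolved = resolve_skills_config({
    'path': '/path/to/skills',
    'retrieve_args': {'top_k': 3, 'min_score': 0.8},
    'use_sandbox': False,
})
print(resolved['use_sandbox'], resolved['max_retries'])
```

Here only `use_sandbox` and `retrieve_args` are overridden, so `max_retries`, `max_candidate_skills`, and `auto_execute` keep their documented defaults. Note that the sample YAML sets `use_sandbox: false` even though the default is `True`, which is why the override is explicit in the example.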
+ +## Core Components + +| Component | Description | +|-----------|-------------| +| `AutoSkills` | Main entry point for skill execution, coordinating retrieval, analysis and execution | +| `SkillContainer` | Secure skill execution environment (sandbox or local) | +| `SkillAnalyzer` | Progressive skill analyzer with incremental resource loading | +| `DAGExecutor` | DAG executor with dependency management | +| `SkillLoader` | Skill loading and management | +| `Retriever` | Finds relevant skills using semantic search | +| `SkillSchema` | Skill schema definition | + +### AutoSkills + +```python +class AutoSkills: + def __init__( + self, + skills: Union[str, List[str], List[SkillSchema]], + llm: LLM, + enable_retrieve: Optional[bool] = None, + retrieve_args: Dict[str, Any] = None, + max_candidate_skills: int = 10, + max_retries: int = 3, + work_dir: Optional[str] = None, + use_sandbox: bool = True, + ): ... + + async def run(self, query: str, ...) -> SkillDAGResult: + """Execute skills for a query.""" + + async def get_skill_dag(self, query: str) -> SkillDAGResult: + """Get skill DAG without executing.""" +``` + +### SkillContainer + +```python +class SkillContainer: + def __init__( + self, + workspace_dir: Optional[Path] = None, + use_sandbox: bool = True, + timeout: int = 300, + memory_limit: str = "2g", + enable_security_check: bool = True, + ): ... + + async def execute_python_code(self, code: str, ...) -> ExecutionOutput: + """Execute Python code.""" + + async def execute_shell(self, command: str, ...) -> ExecutionOutput: + """Execute shell command.""" +``` + +### SkillAnalyzer + +```python +class SkillAnalyzer: + def __init__(self, llm: LLM): ... 
+ + def analyze_skill_plan(self, skill: SkillSchema, query: str) -> SkillContext: + """Phase 1: Analyze skill and create execution plan.""" + + def load_skill_resources(self, context: SkillContext) -> SkillContext: + """Phase 2: Load resources based on plan.""" + + def generate_execution_commands(self, context: SkillContext) -> List[Dict]: + """Phase 3: Generate execution commands.""" +``` + +### DAGExecutor + +```python +class DAGExecutor: + def __init__( + self, + container: SkillContainer, + skills: Dict[str, SkillSchema], + llm: LLM = None, + enable_progressive_analysis: bool = True, + enable_self_reflection: bool = True, + max_retries: int = 3, + ): ... + + async def execute( + self, + dag: Dict[str, List[str]], + execution_order: List[Union[str, List[str]]], + stop_on_failure: bool = True, + query: str = '', + ) -> DAGExecutionResult: + """Execute the skill DAG.""" +``` + +## Usage Examples + +### Example 1: PDF Report Generation + +```python +import asyncio +from ms_agent.skill import AutoSkills +from ms_agent.llm import LLM + +# Assumes `config` is a DictConfig with an `llm` section, such as `llm_config` from Quick Start Method 2. +async def generate_pdf_report(): + llm = LLM.from_config(config) + auto_skills = AutoSkills( + skills='/path/to/skills', + llm=llm, + work_dir='/tmp/reports' + ) + + result = await auto_skills.run( + query='Generate a PDF report analyzing Q4 2024 sales data with charts' + ) + + if result.execution_result and result.execution_result.success: + for skill_id, skill_result in result.execution_result.results.items(): + if skill_result.output.output_files: + print(f"Generated files: {skill_result.output.output_files}") + +asyncio.run(generate_pdf_report()) +``` + +### Example 2: Multi-Skill Pipeline + +```python +# Assumes `asyncio` and `AutoSkills` are imported as in Example 1, and `llm` is an LLM instance created with `LLM.from_config(...)`. +async def create_presentation(): + auto_skills = AutoSkills( + skills='/path/to/skills', + llm=llm, + work_dir='/tmp/presentation' + ) + + # This query might trigger multiple skills working together: + # 1. data-analysis skill to process data + # 2. chart-generator skill to create visualizations + # 3. 
pptx skill to create the presentation + result = await auto_skills.run( + query='Create a presentation about AI market trends with data visualizations' + ) + + print(f"Execution order: {result.execution_order}") + + for skill_id in result.execution_order: + if isinstance(skill_id, str): + context = auto_skills.get_skill_context(skill_id) + if context and context.plan: + print(f"{skill_id}: {context.plan.plan_summary}") + +asyncio.run(create_presentation()) +``` + +### Example 3: Custom Input Execution + +```python +from ms_agent.skill.container import ExecutionInput + +async def execute_with_custom_input(): + auto_skills = AutoSkills( + skills='/path/to/skills', + llm=llm, + work_dir='/tmp/custom' + ) + + dag_result = await auto_skills.get_skill_dag( + query='Convert my document to PDF' + ) + + custom_input = ExecutionInput( + input_files={'document.md': '/path/to/my/document.md'}, + env_vars={'OUTPUT_FORMAT': 'A4', 'MARGINS': '1in'} + ) + + exec_result = await auto_skills.execute_dag( + dag_result=dag_result, + execution_input=custom_input, + query='Convert my document to PDF' + ) + + print(f"Success: {exec_result.success}") + +asyncio.run(execute_with_custom_input()) +``` + +## Security + +### Sandbox Execution (Recommended) + +When `use_sandbox=True`, skills run in isolated Docker containers with: +- Network isolation (configurable) +- Filesystem isolation (only workspace directory mounted) +- Resource limits (memory, CPU) +- No access to host system +- Automatic installation of skill-declared dependencies + +### Local Execution + +When `use_sandbox=False`, security is enforced through: +- Pattern-based scanning for dangerous code +- Restricted file system access +- Environment variable sanitization + +> Make sure you trust the skill scripts before executing them to avoid potential security risks. For local execution, ensure all required dependencies are installed in your Python environment. + +## Creating Custom Skills + +1. 
Create a new subdirectory under your skills path +2. Add a `SKILL.md` file with documentation and instructions +3. Optionally add a `META.yaml` file with metadata (name, description, version, tags) +4. Add scripts, references, and resources as needed +5. Test with `AutoSkills.get_skill_dag()` to verify that the skill is retrieved correctly + +### Best Practices + +- Write a clear, comprehensive `SKILL.md` that fully describes the skill's capabilities, usage, and parameters +- Explicitly declare all dependencies in `requirements.txt` +- Keep skills self-contained by packaging all necessary resources within the directory +- Handle errors gracefully in scripts +- Use the `SKILL_OUTPUT_DIR` environment variable to specify output directories + +## References + +- [Anthropic Agent Skills Documentation](https://docs.claude.com/en/docs/agents-and-tools/agent-skills) +- [Anthropic Skills GitHub Repository](https://github.com/anthropics/skills) +- [MS-Agent Skill Examples](https://modelscope.cn/models/ms-agent/skill_examples) diff --git a/docs/en/Components/Config.md b/docs/en/Components/Config.md index e65c1f26e..06f4d130a 100644 --- a/docs/en/Components/Config.md +++ b/docs/en/Components/Config.md @@ -106,6 +106,24 @@ tools: For the complete list of supported tools and custom tools, please refer to [here](./Tools.md) +## Skills Configuration + +> Optional, used when enabling Agent Skills + +```yaml +skills: + # Path to skills directory or ModelScope repo ID + path: /path/to/skills + # Whether to auto-execute skills (default: True) + auto_execute: true + # Working directory for outputs + work_dir: /path/to/workspace + # Whether to use Docker sandbox for execution (default: True) + use_sandbox: false +``` + +For the complete skill module documentation (including architecture, directory structure, API reference, and security mechanisms), see [Agent Skills](./AgentSkills). 
+ ## Memory Compression Configuration > Optional, for context management in long conversations diff --git a/docs/en/Components/MultimodalSupport.md b/docs/en/Components/MultimodalSupport.md new file mode 100644 index 000000000..494a48b65 --- /dev/null +++ b/docs/en/Components/MultimodalSupport.md @@ -0,0 +1,299 @@ +--- +slug: MultimodalSupport +title: Multimodal Support +description: Ms-Agent multimodal conversation guide - image understanding and analysis configuration and usage. +--- + +# Multimodal Support + +This document describes how to use ms-agent for multimodal conversations, including image understanding and analysis capabilities. + +## Overview + +ms-agent supports multimodal models such as Alibaba Cloud's `qwen3.5-plus`. Multimodal models can: +- Analyze image content +- Recognize objects, scenes, and text in images +- Engage in conversations based on image content + +## Prerequisites + +### 1. Install Dependencies + +Ensure the required packages are installed: + +```bash +pip install openai +``` + +### 2. Configure API Key + +(Using qwen3.5-plus as an example) Obtain a DashScope API Key and set the environment variable: + +```bash +export DASHSCOPE_API_KEY='your-dashscope-api-key' +``` + +Or set `dashscope_api_key` directly in the configuration file. + +## Configure Multimodal Models + +Multimodal functionality depends on two factors: +1. **Choose a model that supports multimodal input** (e.g. `qwen3.5-plus`) +2. **Use the correct message format** (containing `image_url` blocks) + +You can dynamically modify the model configuration in code on top of an existing config: + +```python +from ms_agent.config import Config +from ms_agent import LLMAgent +import os + +# Use an existing configuration file (e.g. 
ms_agent/agent/agent.yaml) +config = Config.from_task('ms_agent/agent/agent.yaml') + +# Override configuration for multimodal model +config.llm.model = 'qwen3.5-plus' +config.llm.service = 'dashscope' +config.llm.dashscope_api_key = os.environ.get('DASHSCOPE_API_KEY', '') +config.llm.modelscope_base_url = 'https://dashscope.aliyuncs.com/compatible-mode/v1' + +# Create LLMAgent +agent = LLMAgent(config=config) +``` + +## Using LLMAgent for Multimodal Conversations + +Using `LLMAgent` for multimodal conversations is recommended, as it provides more complete features including memory management, tool calling, and callback support. + +### Basic Usage + +```python +import asyncio +import os +from ms_agent import LLMAgent +from ms_agent.config import Config +from ms_agent.llm.utils import Message + +async def multimodal_chat(): + # Create configuration + config = Config.from_task('ms_agent/agent/agent.yaml') + config.llm.model = 'qwen3.5-plus' + config.llm.service = 'dashscope' + config.llm.dashscope_api_key = os.environ.get('DASHSCOPE_API_KEY', '') + config.llm.modelscope_base_url = 'https://dashscope.aliyuncs.com/compatible-mode/v1' + + # Create LLMAgent + agent = LLMAgent(config=config) + + # Build multimodal message + multimodal_content = [ + {"type": "text", "text": "Please describe this image."}, + {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}} + ] + + # Call the agent + response = await agent.run(messages=[Message(role="user", content=multimodal_content)]) + print(response[-1].content) + +asyncio.run(multimodal_chat()) +``` + +### Non-Stream Mode + +```python +# Disable stream in configuration +config.generation_config.stream = False + +agent = LLMAgent(config=config) + +multimodal_content = [ + {"type": "text", "text": "Please describe this image."}, + {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}} +] + +# Non-stream mode: returns complete response directly +response = await 
agent.run(messages=[Message(role="user", content=multimodal_content)]) +print(f"[Response] {response[-1].content}") +print(f"[Token Usage] Input: {response[-1].prompt_tokens}, Output: {response[-1].completion_tokens}") +``` + +### Stream Mode + +```python +# Enable stream in configuration +config.generation_config.stream = True + +agent = LLMAgent(config=config) + +multimodal_content = [ + {"type": "text", "text": "Please describe this image."}, + {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}} +] + +# Stream mode: returns a generator +generator = await agent.run( + messages=[Message(role="user", content=multimodal_content)], + stream=True +) + +full_response = "" +async for response_chunk in generator: + if response_chunk and len(response_chunk) > 0: + last_msg = response_chunk[-1] + if last_msg.content: + # Stream output of new content + print(last_msg.content[len(full_response):], end='', flush=True) + full_response = last_msg.content + +print(f"\n[Full Response] {full_response}") +``` + +### Multi-Turn Conversations + +LLMAgent supports multi-turn conversations, allowing you to mix images and text: + +```python +agent = LLMAgent(config=config, tag="multimodal_conversation") + +# Turn 1: Send an image +multimodal_content = [ + {"type": "text", "text": "How many people are in this image?"}, + {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}} +] + +messages = [Message(role="user", content=multimodal_content)] +response = await agent.run(messages=messages) +print(f"[Turn 1 Response] {response[-1].content}") + +# Turn 2: Follow-up question (text only, preserving context) +messages = response # Use previous response as context +messages.append(Message(role="user", content="What are they doing?")) +response = await agent.run(messages=messages) +print(f"[Turn 2 Response] {response[-1].content}") +``` + +## Multimodal Message Format + +ms-agent uses the OpenAI-compatible multimodal message format. 
Images can be provided in three ways: + +### 1. Image URL + +```python +from ms_agent.llm.utils import Message + +multimodal_content = [ + {"type": "text", "text": "Please describe this image."}, + {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}} +] + +messages = [ + Message(role="user", content=multimodal_content) +] + +response = llm.generate(messages=messages) +``` + +### 2. Base64 Encoding + +```python +import base64 + +# Read and encode the image +with open('image.jpg', 'rb') as f: + image_data = base64.b64encode(f.read()).decode('utf-8') + +multimodal_content = [ + {"type": "text", "text": "What is this?"}, + { + "type": "image_url", + "image_url": { + "url": f"data:image/jpeg;base64,{image_data}" + } + } +] + +messages = [Message(role="user", content=multimodal_content)] +response = llm.generate(messages=messages) +``` + +### 3. Local File Path + +```python +import base64 +import os + +image_path = 'path/to/image.png' + +# Get MIME type +ext = os.path.splitext(image_path)[1].lower() +mime_type = { + '.png': 'image/png', + '.jpg': 'image/jpeg', + '.jpeg': 'image/jpeg', + '.gif': 'image/gif', + '.webp': 'image/webp' +}.get(ext, 'image/png') + +# Read and encode +with open(image_path, 'rb') as f: + image_data = base64.b64encode(f.read()).decode('utf-8') + +multimodal_content = [ + {"type": "text", "text": "Describe this image."}, + { + "type": "image_url", + "image_url": { + "url": f"data:{mime_type};base64,{image_data}" + } + } +] + +messages = [Message(role="user", content=multimodal_content)] +response = llm.generate(messages=messages) +``` + +## Running Examples + +### Running the Agent Example + +```bash +# Run the complete test suite (including stream and non-stream modes) +python examples/agent/test_llm_agent_multimodal.py +``` + +## FAQ + +### Q: Are there image size limits? 
+ +A: Yes, different models have different limits: +- qwen3.5-plus: Recommended image size under 4MB +- Recommended resolution not exceeding 2048x2048 + +### Q: What image formats are supported? + +A: Commonly supported formats: +- JPEG / JPG +- PNG +- GIF +- WebP + +### Q: Can I send multiple images at once? + +A: Yes, you can add multiple `image_url` blocks in a single message: + +```python +multimodal_content = [ + {"type": "text", "text": "Compare these two images."}, + {"type": "image_url", "image_url": {"url": "https://example.com/img1.jpg"}}, + {"type": "image_url", "image_url": {"url": "https://example.com/img2.jpg"}} +] +``` + +### Q: Is streaming output supported? + +A: Yes, multimodal conversations support streaming output. Set `stream: true`: + +```python +config.generation_config.stream = True +response = llm.generate(messages=messages, stream=True) +``` diff --git a/docs/en/index.rst b/docs/en/index.rst index e9ae01838..c734f9288 100644 --- a/docs/en/index.rst +++ b/docs/en/index.rst @@ -20,14 +20,16 @@ MS-Agent DOCUMENTATION Components/LLMAgent Components/Workflow Components/SupportedModels + Components/MultimodalSupport Components/Tools + Components/AgentSkills Components/ContributorGuide .. 
toctree:: :maxdepth: 2 :caption: 📁 Projects - Projects/AgentSkills + Projects/CodeGenesis Projects/DeepResearch Projects/FinResearch Projects/VideoGeneration diff --git a/docs/zh/Components/agent-skills.md b/docs/zh/Components/agent-skills.md new file mode 100644 index 000000000..661ad722b --- /dev/null +++ b/docs/zh/Components/agent-skills.md @@ -0,0 +1,514 @@ +--- +slug: agent-skills +title: 智能体技能 +description: Ms-Agent 智能体技能模块:基于 Anthropic Agent Skills 协议的技能发现、分析与执行框架。 +--- + +# 智能体技能 (Agent Skills) + +MS-Agent 技能模块是一个强大的、可扩展的技能执行框架,支持 LLM Agent 自动发现、分析并执行特定领域的技能,以完成复杂任务。该模块是 [Anthropic Agent Skills](https://docs.claude.com/en/docs/agents-and-tools/agent-skills) 协议的实现。 + +通过技能模块,Agent 可以处理如下复杂任务: +- "生成 Q4 销售数据的 PDF 报告" +- "创建关于 AI 趋势的演示文稿并附带图表" +- "将文档转换为 PPTX 格式并应用自定义主题" + +## 核心特性 + +- **智能技能检索**:结合 FAISS 密集检索与 BM25 稀疏检索的混合搜索,并通过 LLM 进行相关性过滤 +- **DAG 执行引擎**:基于依赖关系构建执行 DAG,支持独立技能并行执行,自动在技能之间传递输入/输出 +- **渐进式技能分析**:两阶段分析(先规划、再加载资源),按需增量加载脚本/引用/资源,优化上下文窗口使用 +- **安全执行环境**:支持通过 [ms-enclave](https://github.com/modelscope/ms-enclave) Docker 沙箱隔离执行,或在受控的本地环境中执行 +- **自反思与重试**:基于 LLM 的错误分析,自动修复代码并可配置重试次数 +- **标准协议兼容**:完全兼容 [Anthropic Skills](https://github.com/anthropics/skills) 协议 + +## 架构 + +### 整体架构 + +``` +┌─────────────────────────────────────────────────────────┐ +│ LLMAgent │ +│ ┌───────────────────────────────────────────────────┐ │ +│ │ AutoSkills │ │ +│ │ ┌──────────┐ ┌──────────┐ ┌────────────────┐ │ │ +│ │ │ Loader │ │ Retriever│ │ SkillAnalyzer │ │ │ +│ │ │ │ │ (Hybrid) │ │ (Progressive) │ │ │ +│ │ └────┬─────┘ └────┬─────┘ └───────┬────────┘ │ │ +│ │ │ │ │ │ │ +│ │ ▼ ▼ ▼ │ │ +│ │ ┌───────────────────────────────────────────────┐│ │ +│ │ │ DAGExecutor ││ │ +│ │ │ ┌────────┐ ┌────────┐ ┌────────┐ ││ │ +│ │ │ │Skill 1 │→ │Skill 2 │→ │Skill N │ ││ │ +│ │ │ └───┬────┘ └───┬────┘ └───┬────┘ ││ │ +│ │ │ └────────────┴──────────┘ ││ │ +│ │ │ ↓ ││ │ +│ │ │ SkillContainer (执行) ││ │ +│ │ └───────────────────────────────────────────────┘│ │ +│ 
└───────────────────────────────────────────────────┘ │ +└─────────────────────────────────────────────────────────┘ +``` + +### 执行流程 + +``` +用户请求 + │ + ▼ +┌────────────────┐ +│ 查询分析 │ ─── 是否为技能相关请求? +└───────┬────────┘ + │ 是 + ▼ +┌────────────────┐ +│ 技能检索 │ ─── 混合搜索 (FAISS + BM25) +└───────┬────────┘ + │ + ▼ +┌────────────────┐ +│ 技能过滤 │ ─── 基于 LLM 的相关性过滤 +└───────┬────────┘ + │ + ▼ +┌────────────────┐ +│ DAG 构建 │ ─── 构建依赖图 +└───────┬────────┘ + │ + ▼ +┌────────────────┐ +│ 渐进式执行 │ ─── 规划 → 加载 → 执行 +└───────┬────────┘ + │ + ▼ +┌────────────────┐ +│ 结果聚合 │ ─── 合并输出,格式化响应 +└────────────────┘ +``` + +技能模块实现了多层次渐进式上下文加载机制: + +1. **Level 1 (Metadata)**:仅加载技能元数据(名称、描述)以进行语义搜索 +2. **Level 2 (Retrieval)**:检索相关技能并加载 SKILL.md 全文 +3. **Level 3 (Resources)**:进一步加载技能所需的参考资料和资源文件 +4. **Level 4 (Analysis|Planning|Execution)**:分析技能上下文,自主制定计划,加载所需资源并运行相关脚本 + +## 技能目录结构 + +每个技能是一个自包含的目录: + +``` +skill-name/ +├── SKILL.md # 必须: 主文档和指令 +├── META.yaml # 可选: 元数据(名称、描述、版本、标签) +├── scripts/ # 可选: 可执行脚本 +│ ├── main.py +│ ├── utils.py +│ └── run.sh +├── references/ # 可选: 参考文档 +│ ├── api_docs.md +│ └── examples.json +├── resources/ # 可选: 静态资源 +│ ├── template.html +│ ├── fonts/ +│ └── images/ +└── requirements.txt # 可选: Python 依赖 +``` + +### SKILL.md 格式 + +```markdown +# 技能名称 + +技能功能简述。 + +## Capabilities + +- 功能 1 +- 功能 2 + +## Usage + +使用说明... + +## Parameters + +| Parameter | Type | Description | +|-----------|------|-------------| +| input | str | 输入数据 | +| format | str | 输出格式 | + +## Examples + +使用示例... +``` + +### META.yaml 格式 + +```yaml +name: "PDF Generator" +description: "Generates professional PDF documents from markdown or data" +version: "1.0.0" +author: "Your Name" +tags: + - document + - pdf + - report +``` + +## 快速开始 + +### 前提条件 + +- Python 3.9+ +- Docker(用于沙箱执行,可选) +- FAISS(用于技能检索) + +### 安装 + +```bash +pip install 'ms-agent>=1.4.0' + +# 或从源码安装 +git clone https://github.com/modelscope/ms-agent.git +cd ms-agent +pip install -e . 
+``` + +### 方式一:通过 LLMAgent 配置使用 + +```python +import asyncio +from omegaconf import DictConfig +from ms_agent.agent import LLMAgent + +config = DictConfig({ + 'llm': { + 'service': 'openai', + 'model': 'gpt-4', + 'openai_api_key': 'your-api-key', + 'openai_base_url': 'https://api.openai.com/v1' + }, + 'skills': { + 'path': '/path/to/skills', + 'auto_execute': True, + 'work_dir': '/path/to/workspace', + 'use_sandbox': False, + } +}) + +agent = LLMAgent(config, tag='skill-agent') + +async def main(): + result = await agent.run('Generate a mock PDF report about AI trends') + print(result) + +asyncio.run(main()) +``` + +### 方式二:直接使用 AutoSkills + +```python +import asyncio +from ms_agent.skill import AutoSkills +from ms_agent.llm import LLM +from omegaconf import DictConfig + +llm_config = DictConfig({ + 'llm': { + 'service': 'openai', + 'model': 'gpt-4', + 'openai_api_key': 'your-api-key', + 'openai_base_url': 'https://api.openai.com/v1' + } +}) +llm = LLM.from_config(llm_config) + +auto_skills = AutoSkills( + skills='/path/to/skills', + llm=llm, + work_dir='/path/to/workspace', + use_sandbox=False, +) + +async def main(): + result = await auto_skills.run( + query='Generate a mock PDF report about AI trends' + ) + print(f"Result: {result.execution_result}") + +asyncio.run(main()) +``` + +**参数说明:** + +- `skills`:技能来源,支持以下格式: + - 单个或多个本地技能目录路径 + - 单个或多个 ModelScope 技能仓库 ID,例如 `ms-agent/claude_skills`(参考 [ModelScope Hub](https://modelscope.cn/models/ms-agent/claude_skills)) + - 格式 `owner/skill_name` 或 `owner/skill_name/subfolder` + - 单个或多个 `SkillSchema` 对象 +- `work_dir`:技能执行输出的工作目录 +- `use_sandbox`:是否使用 Docker 沙箱执行(默认 `True`),设为 `False` 则在本地执行并启用安全检查 +- `auto_execute`:是否自动执行技能(默认 `True`) + +## 配置 + +在 `agent.yaml` 中配置技能模块: + +```yaml +skills: + # 必须: 技能目录路径或 ModelScope 仓库 ID + path: /path/to/skills + + # 可选: 是否启用检索器(默认根据技能数量自动判断) + enable_retrieve: + + # 可选: 检索器参数 + retrieve_args: + top_k: 3 + min_score: 0.8 + + # 可选: 最大候选技能数量(默认 10) + max_candidate_skills: 10 + + # 可选: 
最大重试次数(默认 3) + max_retries: 3 + + # 可选: 工作目录 + work_dir: /path/to/workspace + + # 可选: 是否使用 Docker 沙箱执行(默认 True) + use_sandbox: false + + # 可选: 是否自动执行技能(默认 True) + auto_execute: true +``` + +更多 YAML 配置的一般性说明,请参考 [配置与参数](./config)。 + +## 核心组件 + +| 组件 | 描述 | +|------|------| +| `AutoSkills` | 技能执行主入口,协调检索、分析和执行 | +| `SkillContainer` | 安全的技能执行环境(沙箱或本地) | +| `SkillAnalyzer` | 渐进式技能分析器,支持增量资源加载 | +| `DAGExecutor` | 基于依赖管理的 DAG 执行器 | +| `SkillLoader` | 技能加载与管理 | +| `Retriever` | 使用语义搜索查找相关技能 | +| `SkillSchema` | 技能 Schema 定义 | + +### AutoSkills + +```python +class AutoSkills: + def __init__( + self, + skills: Union[str, List[str], List[SkillSchema]], + llm: LLM, + enable_retrieve: Optional[bool] = None, + retrieve_args: Dict[str, Any] = None, + max_candidate_skills: int = 10, + max_retries: int = 3, + work_dir: Optional[str] = None, + use_sandbox: bool = True, + ): ... + + async def run(self, query: str, ...) -> SkillDAGResult: + """执行技能。""" + + async def get_skill_dag(self, query: str) -> SkillDAGResult: + """获取技能 DAG(不执行)。""" +``` + +### SkillContainer + +```python +class SkillContainer: + def __init__( + self, + workspace_dir: Optional[Path] = None, + use_sandbox: bool = True, + timeout: int = 300, + memory_limit: str = "2g", + enable_security_check: bool = True, + ): ... + + async def execute_python_code(self, code: str, ...) -> ExecutionOutput: + """执行 Python 代码。""" + + async def execute_shell(self, command: str, ...) -> ExecutionOutput: + """执行 Shell 命令。""" +``` + +### SkillAnalyzer + +```python +class SkillAnalyzer: + def __init__(self, llm: LLM): ... 
+ + def analyze_skill_plan(self, skill: SkillSchema, query: str) -> SkillContext: + """Phase 1: Analyze the skill and create an execution plan.""" + + def load_skill_resources(self, context: SkillContext) -> SkillContext: + """Phase 2: Load resources according to the plan.""" + + def generate_execution_commands(self, context: SkillContext) -> List[Dict]: + """Phase 3: Generate execution commands.""" +``` + +### DAGExecutor + +```python +class DAGExecutor: + def __init__( + self, + container: SkillContainer, + skills: Dict[str, SkillSchema], + llm: LLM = None, + enable_progressive_analysis: bool = True, + enable_self_reflection: bool = True, + max_retries: int = 3, + ): ... + + async def execute( + self, + dag: Dict[str, List[str]], + execution_order: List[Union[str, List[str]]], + stop_on_failure: bool = True, + query: str = '', + ) -> DAGExecutionResult: + """Execute the skill DAG.""" +``` + +## Usage Examples + +### Example 1: PDF report generation + +```python +import asyncio +from ms_agent.skill import AutoSkills +from ms_agent.llm import LLM + +async def generate_pdf_report(): + # `config` is a DictConfig with an `llm` section, as in the earlier AutoSkills example + llm = LLM.from_config(config) + auto_skills = AutoSkills( + skills='/path/to/skills', + llm=llm, + work_dir='/tmp/reports' + ) + + result = await auto_skills.run( + query='Generate a PDF report analyzing Q4 2024 sales data with charts' + ) + + if result.execution_result and result.execution_result.success: + for skill_id, skill_result in result.execution_result.results.items(): + if skill_result.output.output_files: + print(f"Generated files: {skill_result.output.output_files}") + +asyncio.run(generate_pdf_report()) +``` + +### Example 2: Multi-skill pipeline + +```python +# Assumes the imports from the first example and an `llm` instance created with LLM.from_config(...) +async def create_presentation(): + auto_skills = AutoSkills( + skills='/path/to/skills', + llm=llm, + work_dir='/tmp/presentation' + ) + + # This request may trigger several skills working together: + # 1. the data-analysis skill processes the data + # 2. the chart-generator skill creates the visualizations + # 3. 
the pptx skill generates the presentation + result = await auto_skills.run( + query='Create a presentation about AI market trends with data visualizations' + ) + + print(f"Execution order: {result.execution_order}") + + for skill_id in result.execution_order: + if isinstance(skill_id, str): + context = auto_skills.get_skill_context(skill_id) + if context and context.plan: + print(f"{skill_id}: {context.plan.plan_summary}") + +asyncio.run(create_presentation()) +``` + +### Example 3: Execution with custom input + +```python +import asyncio +from ms_agent.skill import AutoSkills +from ms_agent.skill.container import ExecutionInput + +# Assumes an `llm` instance created with LLM.from_config(...), as in the first example +async def execute_with_custom_input(): + auto_skills = AutoSkills( + skills='/path/to/skills', + llm=llm, + work_dir='/tmp/custom' + ) + + dag_result = await auto_skills.get_skill_dag( + query='Convert my document to PDF' + ) + + custom_input = ExecutionInput( + input_files={'document.md': '/path/to/my/document.md'}, + env_vars={'OUTPUT_FORMAT': 'A4', 'MARGINS': '1in'} + ) + + exec_result = await auto_skills.execute_dag( + dag_result=dag_result, + execution_input=custom_input, + query='Convert my document to PDF' + ) + + print(f"Success: {exec_result.success}") + +asyncio.run(execute_with_custom_input()) +``` + +## Security + +### Sandbox execution (recommended) + +When `use_sandbox=True`, skills run in isolated Docker containers: +- Network isolation (configurable) +- Filesystem isolation (only the working directory is mounted) +- Resource limits (memory, CPU) +- No access to the host system +- Automatic installation of the dependencies declared by the skill + +### Local execution + +When `use_sandbox=False`, security is enforced through: +- Pattern-based scanning for dangerous code +- Restricted filesystem access +- Environment variable sanitization + +> Make sure you trust the skill scripts being executed, to avoid potential security risks. For local execution, make sure all dependencies required by the scripts are installed in the Python environment. + +## Creating Custom Skills + +1. Create a new subdirectory under the skills directory +2. Add a `SKILL.md` file containing documentation and instructions +3. Add a `META.yaml` file containing metadata +4. Add scripts, references, and resources as needed +5. 
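Use `AutoSkills.get_skill_dag()` to verify that the skill can be retrieved correctly + +### Best Practices + +- Write a clear and complete `SKILL.md` that fully describes the skill's functionality, usage, and parameters +- Declare all dependencies explicitly in `requirements.txt` +- Keep skills self-contained by bundling all required resources inside the directory +- Handle errors properly in scripts +- Use the `SKILL_OUTPUT_DIR` environment variable to specify the output directory + +## References + +- [Anthropic Agent Skills official documentation](https://docs.claude.com/en/docs/agents-and-tools/agent-skills) +- [Anthropic Skills GitHub repository](https://github.com/anthropics/skills) +- [MS-Agent Skill examples](https://modelscope.cn/models/ms-agent/skill_examples) diff --git a/docs/zh/Components/config.md b/docs/zh/Components/config.md index 363eca253..f3dd74203 100644 --- a/docs/zh/Components/config.md +++ b/docs/zh/Components/config.md @@ -106,6 +106,24 @@ tools: For the full list of supported tools, and for custom tools, see [here](./tools) +## Skill Configuration + +> Optional; used when Agent Skills is enabled + +```yaml +skills: + # Skill directory path or ModelScope repository ID + path: /path/to/skills + # Whether to execute skills automatically (default True) + auto_execute: true + # Working directory + work_dir: /path/to/workspace + # Whether to execute in a Docker sandbox (default True) + use_sandbox: false +``` + +For the full description of the skill module (including architecture, directory structure, API reference, and security mechanisms), see [Agent Skills](./agent-skills). + ## Memory Compression Configuration > Optional; used for context management in long conversations diff --git "a/docs/zh/Components/\345\267\245\345\205\267.md" "b/docs/zh/Components/\345\267\245\345\205\267.md" deleted file mode 100644 index 0a187f809..000000000 --- "a/docs/zh/Components/\345\267\245\345\205\267.md" +++ /dev/null @@ -1,217 +0,0 @@ -# Tools - -## Tool List - -MS-Agent supports many built-in tools: - -### split_task - -Task-splitting tool. The LLM can use this tool to split a complex task into several subtasks, each with its own system and query fields. By default, a subtask's yaml configuration is inherited from the parent task. - -#### split_to_sub_task - -Use this method to start multiple subtasks. - -Parameters: - -- tasks: ``List[Dict[str, str]]``, the list length equals the number of subtasks; each subtask is a Dict with the two keys system and query - -### file_system - -A basic tool for creating, reading, updating, and deleting local files. The tool reads the `output` field of the yaml configuration (by default, the `output` folder under the current folder), and all operations use the directory specified by output as the root. - -#### create_directory - -Creates a folder - -Parameters: - -- path: `str`, the directory to create, relative to the `output` field of the yaml configuration. - -#### write_file - -Writes to a specific file. - -Parameters: - -- path: `str`, the file to write; the directory is relative to the `output` field of the yaml configuration. -- content: `str`: the content to write. - -#### read_file - -Reads the content of a file - -Parameters: - -- path: `str`, the file to read; the directory is relative to the `output` field of the yaml configuration. - -#### list_files 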
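- -Lists the files in a directory - -Parameters: - -- path: `str`, a directory relative to `output` in the yaml configuration. If empty, lists all files under the root directory. - -### code_execution - -Code execution tool. It runs code in a sandbox environment, and the sandbox can be set up over HTTP or locally. It mainly supports two environment types, docker and docker-notebook, which suit stateless code execution and code execution that must keep context state within a conversation, respectively. -The tool is built on ms-enclave and depends on a local docker environment. If the code needs many dependencies, build an image containing them in advance. Once the base image is ready, complete the basic configuration for launching the container, such as the execution environment type, the available tools, and the directories the container should mount. The default mount directory is based on the `output` field of the yaml configuration and is mounted at `/data` inside the sandbox. - -#### notebook_executor - -This method is specific to the docker-notebook sandbox type. It executes code in a notebook while keeping the context within the conversation, and also supports shell commands for operations inside the sandbox. - -- code: `str`, the code to execute. -- description: `str`, a description of what the code does. - -#### python_executor - -This method is specific to the docker sandbox type. It runs code with the local code interpreter inside the sandbox. - -- code: `str`, the code to execute. -- description: `str`, a description of what the code does. - -#### shell_executor - -This method is specific to the docker sandbox type. It executes shell commands with bash inside the sandbox and supports basic shell operations such as ls, cd, mkdir, and rm. - -- command: `str`, the shell command to execute. - -#### file_operation - -This method is specific to the docker sandbox type. It performs basic file operations inside the sandbox, including create, read, write, delete, list, and existence checks. - -- operation: `str`, the type of file operation to perform; one of 'create', 'read', 'write', 'delete', 'list', 'exists'. - -#### reset_sandbox - -Restarts the sandbox environment when it crashes, for example resetting all state when the variable state inside a notebook becomes corrupted. - -#### get_sandbox_info - -Gets the current sandbox status and environment information. - -### MCP tools - -External MCP tools are supported: write the configuration the MCP tool needs into the field, and be sure to set `mcp: true`. - -```yaml - amap-maps: - mcp: true - type: sse - url: https://mcp.api-inference.modelscope.net/xxx/sse - exclude: - - map_geo -``` - -## Custom Tools - -### Passing mcp.json - -This approach passes in a list of MCP tools. It has the same effect as configuring the tools field in yaml. - -```shell -ms-agent run --config xxx/xxx --mcp_server_file ./mcp.json -``` - -### Configuring the yaml file - -Additional tools can be added under tools in the yaml. See [Config & Parameters](./配置与参数.md#工具配置) - -### Writing a new tool - -```python -from ms_agent.llm.utils import Tool -from ms_agent.tools.base import ToolBase - - -# The class can be renamed -class CustomTool(ToolBase): - """A file system operation tool - - TODO: This tool now is a simple implementation, sandbox or mcp TBD. - """ - - def __init__(self, config): - super(CustomTool, self).__init__(config) - self.exclude_func(getattr(config.tools, 'custom_tool', None)) - ... - - async def connect(self): - ... - - async def cleanup(self): - ... 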
- - async def get_tools(self): - tools = { - 'custom_tool': [ - Tool( - tool_name='foo', - server_name='custom_tool', - description='foo function', - parameters={ - 'type': 'object', - 'properties': { - 'foo_arg1': { - 'type': 'string', - 'description': 'This is the only argument needed by foo, it\'s used to ...', - } - }, - 'required': ['foo_arg1'], - 'additionalProperties': False - }), - Tool( - tool_name='bar', - server_name='custom_tool', - description='bar function', - parameters={ - 'type': 'object', - 'properties': { - 'bar_arg1': { - 'type': 'string', - 'description': 'This is the only argument needed by bar, it\'s used to ...', - }, - }, - 'required': ['bar_arg1'], - 'additionalProperties': False - }), - ] - } - return tools - - async def foo(self, foo_arg1) -> str: - ... - - async def bar(self, bar_arg1) -> str: - ... -``` - -Save the file in a directory relative to `agent.yaml`, e.g. `tools/custom_tool.py`. - -```text -agent.yaml -tools - |--custom_tool.py -``` - -Then configure it in `agent.yaml` as follows: - -```yaml - -tools: - tool1: - mcp: true - # other configuration - - tool2: - mcp: false - # other configuration - - # the newly registered tool goes here - plugins: - - tools/custom_tool -``` - -We have a [simple example](https://www.modelscope.cn/models/ms-agent/simple_tool_plugin) that you can modify as a starting point. diff --git a/docs/zh/Projects/agent-skills.md b/docs/zh/Projects/agent-skills.md deleted file mode 100644 index 9985cb75e..000000000 --- a/docs/zh/Projects/agent-skills.md +++ /dev/null @@ -1,235 +0,0 @@ ---- -slug: agent-skills -title: Agent Skills -description: Ms-Agent Agent Skills - modular packaging of domain knowledge that improves agent performance on specific tasks, fully compatible with the Anthropic Agent Skills protocol. ---- - -# Agent Skills - - -## 1. Background and Motivation - -- **The evolving needs of general-purpose agents** - - As model capabilities improve, agents can now interact with full computing environments (such as code execution and file systems) to carry out complex cross-domain tasks - - More capable agents need a modular, extensible, and portable way to inject domain expertise - - -- **Skills as knowledge packaging** - - Package human procedural knowledge into reusable, composable "skills", with no need to rebuild a custom agent for every scenario - - Skills are loaded dynamically as structured folders (containing instructions, scripts, and resources), so the agent performs better on specific tasks - - -- **Flexibility and adaptability** - - Building a skill is like writing an onboarding guide: it lowers the barrier to specialization and improves the agent's flexibility and adaptability - - -
- -For more about `Agent Skills`, see [Anthropic Agent Skills](https://docs.claude.com/en/docs/agents-and-tools/agent-skills/overview) - -
- - -## 2. What Are Agent Skills? - - -### 1) Architecture - -- Agent skill architecture - -![Skill-Architecture](../../resources/skill_architecture.png) - - -- Folder structure -``` -skill-name/ -├── SKILL.md # Main skill definition (Required) -├── reference.md # Detailed reference material (Optional) -├── LICENSE.txt # License information (Optional) -├── resources/ # Additional resources (Optional) -│ ├── template.xlsx # Example files -│ └── data.json # Data files -└── scripts/ # Executable scripts (Optional) - ├── main.py # Main implementation - └── helper.py # Helper functions -``` - -### 2) SKILL.md file format - -The `SKILL.md` file uses YAML front matter to define metadata, followed by Markdown content with the detailed description. - -![Skill-MD-File](../../resources/skill_md_file.png) - -💡 Notes: - - The `name` and `description` fields are required. - - The body of `SKILL.md` should give a comprehensive description of the skill, including its functionality, usage instructions, references, resources, and examples. - -[SKILL.md example](https://github.com/anthropics/skills/blob/main/document-skills/pdf/SKILL.md) - - -### 3) Bundling additional content - - -Additional files can be included alongside `SKILL.md` to extend the skill, for example: -- References (e.g. `reference.md` and `forms.md`) - -![Skill-Additional-Content](../../resources/skill_additional_content.png) - -- Scripts - -![Skill-Additional-Scripts](../../resources/skill_additional_scripts.png) - - -### 4) Skills and context - -- Setting a token limit for skill files is recommended, so that they load efficiently within the context window - -![Skill-Files-Limitation](../../resources/skill_files_limitation.png) - -![Skill-Context-Window](../../resources/skill_context_window.png) - - -
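- - -## 3. Skill Implementation - -### 1) Overview - -The **AgentSkills** module of the **MS-Agent** framework is an implementation (beta) of the [Anthropic-Agent-Skills](https://docs.claude.com/en/docs/agents-and-tools/agent-skills) protocol. - -`Agent Skills` implements a multi-level progressive context loading mechanism that efficiently manages skill discovery and execution: - -1. **Level 1 (Metadata)**: Loads only skill metadata (name, description) for semantic search -2. **Level 2 (Retrieval)**: Retrieves relevant skills and loads the full SKILL.md -3. **Level 3 (Resources)**: Further loads the reference materials and resource files the skill requires -4. **Level 4 (Analysis|Planning|Execution)**: Analyzes the skill context, autonomously creates a plan and task list, loads the required resources, and runs the relevant scripts - -This approach minimizes resource consumption while still providing full skill capabilities. - - -* Core components - -| Component | Description | -|------------------|------------------| -| `AgentSkill` | Main workflow | -| `SkillLoader` | Loads and manages skills | -| `Retriever` | Finds relevant skills using semantic search | -| `SkillContext` | Skill context management | -| `ScriptExecutor` | Skill execution module | -| `SkillSchema` | Skill schema definition | - -### 2) Key features - -- 📜 **Standard skill protocol**: Fully compatible with the [Anthropic Skills](https://github.com/anthropics/skills) protocol -- 🧠 **Heuristic context loading**: Loads only the necessary context on demand (such as `References`, `Resources`, and `Scripts`) -- 🤖 **Autonomous execution**: Based on the skill definition, the agent autonomously analyzes, plans, and decides which scripts and resources to invoke -- 🔍 **Skill management**: Supports batch loading of skills, and automatically retrieves and discovers relevant skills based on user input -- 🛡️ **Code execution environment**: Run code directly on the local machine, or in the secure sandbox provided by [**ms-enclave**](https://github.com/modelscope/ms-enclave) (automatic dependency installation and environment isolation) -- 📁 **Multiple file types**: Supports documents, scripts, resource files, and other types -- 🧩 **Extensible design**: The skill data structures are modular, with implementations such as `SkillSchema` and `SkillContext` that are easy to extend and customize - - -### 3) Installation - -* Install from PyPI -```bash -pip install 'ms-agent>=1.4.0' -``` - -* Install from Source -```bash -git clone git@github.com:modelscope/ms-agent.git -cd ms-agent -pip install -e . -``` - -* Configuration -```bash -export OPENAI_API_KEY="your-api-key" -export OPENAI_BASE_URL="your-base-url" -``` - - -### 4) Usage - -> Below is an example that implements `flow-field particle art generation` - -```python -import os -from ms_agent.agent import create_agent_skill - - -def main(): - """ - Main function to create and run an agent with skills. 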
- """ - work_dir: str = './temp_workspace' - # Refer to `https://github.com/modelscope/ms-agent/tree/main/projects/agent_skills/skills` - skill_id_or_dir: str = './skills' - use_sandbox: bool = True - - ## Configuration for ModelScope API-Inference, or set your own model with OpenAI API compatible format - ## Free LLM API inference calls for ModelScope users, refer to [ModelScope API-Inference](https://modelscope.cn/docs/model-service/API-Inference/intro) - model: str = 'Qwen/Qwen3-235B-A22B-Instruct-2507' - api_key: str = 'xx-xx' # For ModelScope users, refer to `https://modelscope.cn/my/myaccesstoken` to get your access token - base_url: str = 'https://api-inference.modelscope.cn/v1/' - - agent = create_agent_skill( - # Use a skill from ModelScope Hub by its ID. A list of IDs is also supported. e.g. `ms-agent/skill_examples` - # To use local skills, provide the path to the directory, e.g., skills='./skills' - # For more details on skill IDs, see: https://modelscope.cn/models/ms-agent/skill_examples - skills=skill_id_or_dir, - model=model, - api_key=os.getenv('OPENAI_API_KEY', api_key), - base_url=os.getenv('OPENAI_BASE_URL', base_url), - stream=True, - # Note: Make sure the `Docker Daemon` is running if use_sandbox=True - use_sandbox=use_sandbox, - work_dir=work_dir, - ) - - user_query: str = ('Create generative art using p5.js with seeded randomness, flow fields, and particle systems, ' - 'please fill in the details and provide the complete code based on the templates.') - - response = agent.run(query=user_query) - print(f'\n\n** Agent skill results: {response}\n') - - -if __name__ == '__main__': - - main() -``` - -- skill_id_or_dir: accepts a local skill directory path, or a skill ID to load from ModelScope Hub. - - skill_id_or_dir (str): examples: 'path/to/skill-directory', 'ms-agent/skill_examples', 'ms-agent/skill_examples/pdf' (format: `owner/skill_name` or `owner/skill_name/subfolder`) - - See [AgentSkillExamples](https://modelscope.cn/models/ms-agent/skill_examples) - - -* Local execution - - If 
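`use_sandbox=False`, skill scripts are executed directly in the local environment - - Make sure you trust the skill scripts, to avoid potential security risks - - Make sure all dependencies required by the scripts are installed in the local Python environment - -* Sandbox execution - - If `use_sandbox=True`, skill scripts are executed in an isolated Docker container via [**ms-enclave**](https://github.com/modelscope/ms-enclave) - - This provides a secure execution environment that effectively prevents potential harm to the host system - - Make sure Docker is installed on your machine and the Docker service (Docker Daemon) is running - - The sandbox automatically installs the dependencies declared by the skill; no manual configuration is needed - -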
- -**Result** - -![Flow Field Particles](../../resources/skill_algorithmic_art_result.gif) - - -
- - -## References - -* Anthropic Agent Skills official documentation: https://docs.claude.com/en/docs/agents-and-tools/agent-skills -* Anthropic Skills GitHub repository: https://github.com/anthropics/skills - -
diff --git a/docs/zh/index.rst b/docs/zh/index.rst index 0a9cc8b48..b05f7e52f 100644 --- a/docs/zh/index.rst +++ b/docs/zh/index.rst @@ -20,14 +20,16 @@ MS-Agent Official Documentation Components/llm-agent Components/workflow Components/supported-models + Components/multimodal-support Components/tools + Components/agent-skills Components/contributor-guide .. toctree:: :maxdepth: 2 :caption: 📁 Projects - Projects/agent-skills + Projects/code-genesis Projects/deep-research Projects/fin-research Projects/video-generation