Complete reference for all MCProbe command-line interface commands.
mcprobe [COMMAND] [OPTIONS] [ARGS]
MCProbe is built on Typer and supports standard help options:
- `--help` - Show help message for any command
- `--version` - Display MCProbe version
| Command | Description |
|---|---|
| run | Run test scenarios against an agent |
| validate | Validate scenario YAML files |
| generate-scenarios | Generate scenarios from MCP schemas |
| report | Generate reports from test results |
| trends | Show trend analysis for scenarios |
| flaky | Detect flaky (inconsistent) tests |
| stability-check | Check stability of a scenario |
| providers | List available LLM providers |
| serve | Start MCP server for AI assistant integration |
Execute test scenarios against the agent under test.
mcprobe run SCENARIO_PATH [OPTIONS]
Runs one or more test scenarios, using a synthetic user to converse with the agent and a judge to evaluate the results. Can execute a single scenario file or all scenarios in a directory.
| Argument | Type | Required | Description |
|---|---|---|---|
| `SCENARIO_PATH` | Path | Yes | Path to scenario YAML file or directory containing scenarios |
| Option | Short | Type | Default | Description |
|---|---|---|---|---|
| `--config` | `-c` | Path | None | Path to mcprobe.yaml configuration file (auto-discovers if not specified) |
| `--provider` | `-p` | string | `ollama` | LLM provider: 'ollama', 'openai', etc. (overrides config file) |
| `--model` | `-m` | string | `llama3.2` | Model name for LLM components (synthetic user and judge) (overrides config file) |
| `--base-url` | `-u` | string | None | Base URL for LLM API (overrides config file) |
| `--agent-type` | `-t` | string | `simple` | Agent type: 'simple' (LLM) or 'adk' (Gemini ADK with MCP) (can be set in config file via agent.type) |
| `--agent-factory` | `-f` | Path | None | Path to Python module with create_agent() function (required for 'adk' type) (can be set in config file via agent.factory) |
| `--verbose` | `-v` | flag | False | Enable verbose output including full conversation and detailed metrics |
Run a single scenario with default settings:
mcprobe run scenarios/weather-query.yaml
Run all scenarios in a directory:
mcprobe run scenarios/
Use a configuration file:
mcprobe run scenarios/ --config mcprobe.yaml
Use a specific provider and model:
mcprobe run scenarios/greeting.yaml --provider openai --model gpt-4
Use a specific model and base URL:
mcprobe run scenarios/greeting.yaml -m llama3.1 -u http://ollama-server:11434
Override config file with CLI arguments:
# Config file sets provider to ollama, but override to openai
mcprobe run scenarios/ -c mcprobe.yaml -p openai -m gpt-4
Run with verbose output:
mcprobe run scenarios/complex-query.yaml -v
Test an ADK agent with MCP tools (using CLI arguments):
mcprobe run scenarios/ -t adk -f my_agent_factory.py
Test an ADK agent with MCP tools (using config file - recommended):
# mcprobe.yaml
agent:
  type: adk
  factory: my_agent_factory.py
llm:
  provider: ollama
  model: llama3.2
mcprobe run scenarios/
Use environment variables in config:
# mcprobe.yaml contains: api_key: ${OPENAI_API_KEY}
export OPENAI_API_KEY=sk-your-key
mcprobe run scenarios/ --config mcprobe.yaml
The command displays:
- Number of scenarios found
- Agent type being tested
- For each scenario:
  - Scenario name and description
  - Pass/fail status with score
  - Judge reasoning
  - Suggestions for improvement (if any)
- Summary table with all results
In verbose mode, also shows:
- Complete conversation transcript
- Detailed tool call parameters
- Per-criterion correctness results
- Failure condition checks
- Quality metrics (clarifications, backtracks, etc.)
- Efficiency metrics (tokens, tool calls, turns)
- Structured MCP improvement suggestions
| Code | Meaning |
|---|---|
| 0 | All scenarios passed successfully |
| 1 | One or more scenarios failed, or command error |
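For scripting or CI, the exit codes above can be checked from a small wrapper. The following is a minimal sketch rather than part of MCProbe itself: `build_run_command`, `describe_exit_code`, and `run_scenarios` are hypothetical helper names, only flags documented in the options table are used, and `run_scenarios` assumes `mcprobe` is installed and on `PATH`.

```python
import subprocess
from typing import List, Optional


def build_run_command(
    scenario_path: str,
    config: Optional[str] = None,
    provider: Optional[str] = None,
    model: Optional[str] = None,
    verbose: bool = False,
) -> List[str]:
    """Assemble an argv list for `mcprobe run` using documented options only."""
    cmd = ["mcprobe", "run", scenario_path]
    if config:
        cmd += ["--config", config]
    if provider:
        cmd += ["--provider", provider]
    if model:
        cmd += ["--model", model]
    if verbose:
        cmd.append("--verbose")
    return cmd


def describe_exit_code(code: int) -> str:
    """Map the documented exit codes to a short message."""
    if code == 0:
        return "all scenarios passed"
    if code == 1:
        return "one or more scenarios failed, or command error"
    return f"undocumented exit code {code}"


def run_scenarios(scenario_path: str, **options) -> int:
    """Run mcprobe (assumed to be on PATH) and return its exit code."""
    completed = subprocess.run(build_run_command(scenario_path, **options))
    print(describe_exit_code(completed.returncode))
    return completed.returncode
```

Because exit code 1 covers both failed scenarios and command errors, a wrapper like this cannot tell the two apart from the code alone; the command output is needed for that.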
Validate scenario YAML files without running them.
mcprobe validate SCENARIO_PATH
Checks that scenario files are properly formatted and contain all required fields. Does not execute the scenarios or connect to any LLM services.
| Argument | Type | Required | Description |
|---|---|---|---|
| `SCENARIO_PATH` | Path | Yes | Path to scenario YAML file or directory to validate |
Validate a single scenario:
mcprobe validate scenarios/greeting.yaml
Validate all scenarios in a directory:
mcprobe validate scenarios/
On success:
Validated 5 scenario(s) successfully.
- Simple Greeting Test
- Weather Query Test
- Multi-Step Task
- Error Handling
- Tool Composition
On failure:
Validation failed: Missing required field 'synthetic_user' in scenario.yaml
| Code | Meaning |
|---|---|
| 0 | All scenarios are valid |
| 1 | One or more scenarios failed validation |
Generate test scenarios from MCP tool schemas.
mcprobe generate-scenarios --server SERVER_COMMAND [OPTIONS]
Connects to an MCP server, extracts tool schemas, and automatically generates test scenarios based on the specified complexity level. Uses an LLM to create realistic user personas, queries, and evaluation criteria.
| Option | Short | Type | Default | Description |
|---|---|---|---|---|
| `--server` | `-s` | string | Required | MCP server command (e.g., 'npx @example/weather-mcp') |
| `--output` | `-o` | Path | `./generated-scenarios` | Output directory for generated scenarios |
| `--complexity` | `-c` | string | `medium` | Complexity level: simple, medium, or complex |
| `--count` | `-n` | int | 10 | Number of scenarios to generate |
| `--model` | `-m` | string | `llama3.2` | Model for generation |
| `--base-url` | `-u` | string | `http://localhost:11434` | Base URL for Ollama API |
- simple: Single-tool scenarios with straightforward queries
- medium: Multi-step scenarios that may require 2-3 tool calls
- complex: Advanced scenarios with tool composition, error handling, and edge cases
Generate 10 medium-complexity scenarios:
mcprobe generate-scenarios -s "npx @modelcontextprotocol/server-weather" -o ./scenarios
Generate simple scenarios for testing:
mcprobe generate-scenarios -s "npx @example/my-mcp-server" -c simple -n 5
Generate complex scenarios with a specific model:
mcprobe generate-scenarios -s "npx @example/server" -c complex -n 20 -m llama3.1
Generate from a local server script:
mcprobe generate-scenarios -s "python my_mcp_server.py" -o ./test-scenarios
Example output:
Connecting to MCP server: npx @example/weather-mcp
Found 3 tool(s):
- get_current_weather: Get current weather for a location
- get_forecast: Get weather forecast
- search_locations: Search for locations by name
Generating 10 scenario(s) at medium complexity...
Generated 10 scenario(s)
Created: ./generated-scenarios/weather_query_london.yaml
Created: ./generated-scenarios/forecast_next_week.yaml
...
Scenarios written to ./generated-scenarios
| Code | Meaning |
|---|---|
| 0 | Scenarios generated successfully |
| 1 | Failed to connect to server or generate scenarios |
Generate a report from stored test results.
mcprobe report [OPTIONS]
Reads test results from the results directory and generates a report in the specified format (HTML, JSON, or JUnit XML).
| Option | Short | Type | Default | Description |
|---|---|---|---|---|
| `--results-dir` | `-d` | Path | `test-results` | Directory containing test results |
| `--output` | `-o` | Path | `report.html` | Output file path for the report |
| `--format` | `-f` | string | `html` | Report format: html, json, or junit |
| `--title` | `-t` | string | `MCProbe Test Report` | Title for the report |
| `--limit` | `-n` | int | 100 | Maximum number of results to include |
- html: Interactive HTML report with charts and detailed breakdowns
- json: Machine-readable JSON format for custom processing
- junit: JUnit XML format for CI/CD integration
Generate HTML report:
mcprobe report --format html --output report.html
Generate JUnit XML for CI:
mcprobe report --format junit --output test-results.xml --title "MCProbe CI Tests"
Generate JSON report with custom limit:
mcprobe report --format json --output results.json --limit 50
Use a different results directory:
mcprobe report -d ./my-results -f html -o my-report.html
Example output:
Found 42 test result(s)
Report generated: report.html
| Code | Meaning |
|---|---|
| 0 | Report generated successfully or no results found |
| 1 | Results directory not found or invalid format specified |
Show trend analysis for test scenarios.
mcprobe trends [OPTIONS]
Analyzes historical test results to detect trends in pass rates and scores over time. Helps identify regressions, improvements, and performance patterns.
| Option | Short | Type | Default | Description |
|---|---|---|---|---|
| `--scenario` | `-s` | string | None | Scenario name to analyze (all scenarios if not specified) |
| `--window` | `-w` | int | 10 | Number of recent runs to consider |
| `--results-dir` | `-d` | Path | `test-results` | Directory containing test results |
Show trends for all scenarios:
mcprobe trends
Analyze a specific scenario with a larger window:
mcprobe trends --scenario "Weather Query Test" --window 20
Use a custom results directory:
mcprobe trends -d ./historical-results -w 30
Example output:
Trend Analysis
┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ Scenario ┃ Runs ┃ Pass Rate ┃ Pass Trend ┃ Avg Score ┃ Score Trend ┃
┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ Weather Query Test │ 20 │ 95% │ ↑ │ 0.92 │ → │
│ Greeting Test │ 20 │ 100% │ → │ 0.98 │ ↑ │
│ Error Handling │ 15 │ 80% │ ↓ │ 0.75 │ ↓ │
└─────────────────────┴──────┴───────────┴────────────┴───────────┴─────────────┘
⚠ Detected Regressions:
[high] Error Handling: pass_rate dropped 15.0%
[medium] Error Handling: avg_score dropped 12.5%
- ↑ - Improving trend (statistically significant improvement)
- ↓ - Degrading trend (statistically significant degradation)
- → - Stable trend (no significant change)
| Code | Meaning |
|---|---|
| 0 | Trend analysis completed successfully or insufficient data |
| 1 | Results directory not found |
Detect flaky (inconsistent) test scenarios.
mcprobe flaky [OPTIONS]
Identifies scenarios with inconsistent pass/fail results or high score variance, indicating potential flakiness in the test or system under test.
| Option | Short | Type | Default | Description |
|---|---|---|---|---|
| `--min-runs` | `-n` | int | 5 | Minimum runs required for analysis |
| `--results-dir` | `-d` | Path | `test-results` | Directory containing test results |
| `--fail-on-flaky` | | flag | False | Exit with error code if flaky tests are detected |
Detect flaky scenarios:
mcprobe flaky
Require more runs for analysis:
mcprobe flaky --min-runs 10
Use in CI to fail builds with flaky tests:
mcprobe flaky --min-runs 5 --fail-on-flaky
When flaky tests are detected:
Flaky Scenarios
┏━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Scenario ┃ Pass Rate ┃ Runs ┃ Severity ┃ Reason ┃
┡━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Weather Query │ 60% │ 10 │ high │ Pass rate in flaky range │
│ Complex Workflow │ 90% │ 15 │ medium │ High score variance │
└────────────────────┴───────────┴──────┴──────────┴────────────────────────────┘
Found 2 flaky scenario(s)
When no flaky tests are detected:
No flaky scenarios detected.
MCProbe considers a scenario flaky if:
- Pass rate is between 20% and 80% (neither consistently passing nor failing)
- Score has high variance (standard deviation > 0.15)
- Results show inconsistent patterns over time
- high: Pass rate between 30% and 70%, or very high score variance
- medium: Pass rate between 20% and 30% or between 70% and 80%, moderate variance
- low: Borderline cases with slight inconsistency
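The thresholds above can be expressed as a small classifier. This is a minimal sketch, not MCProbe's actual implementation: the function name `classify_flakiness` is hypothetical, band edges are assumed to be inclusive, a population standard deviation is assumed, and variance-only cases are reported as "medium" to mirror the sample output. The third criterion (inconsistent patterns over time) and the "very high" and "low" tiers are not modeled because the text gives no thresholds for them.

```python
from statistics import pstdev
from typing import List, Optional, Tuple

FLAKY_LOW, FLAKY_HIGH = 0.20, 0.80  # flaky pass-rate band from the text
HIGH_LOW, HIGH_HIGH = 0.30, 0.70    # "high" severity band from the text
STD_DEV_LIMIT = 0.15                # score std-dev threshold from the text


def classify_flakiness(
    passed: List[bool], scores: List[float]
) -> Optional[Tuple[str, str]]:
    """Return (severity, reason) for a flaky scenario, or None if consistent.

    Both lists must be non-empty, with one entry per run.
    """
    pass_rate = sum(passed) / len(passed)
    std_dev = pstdev(scores)  # assumption: population standard deviation

    if HIGH_LOW <= pass_rate <= HIGH_HIGH:
        return ("high", "Pass rate in flaky range")
    if FLAKY_LOW <= pass_rate <= FLAKY_HIGH:
        return ("medium", "Pass rate in flaky range")
    if std_dev > STD_DEV_LIMIT:
        # Assumption: variance alone maps to "medium", as in the sample table.
        return ("medium", "High score variance")
    return None
```

For example, six passes out of ten runs gives a 60% pass rate and lands in the "high" band, while ten passes with tightly clustered scores returns `None`.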
| Code | Meaning |
|---|---|
| 0 | No flaky scenarios detected |
| 1 | Flaky scenarios detected (only when --fail-on-flaky is used) or results directory not found |
Check stability of a specific scenario.
mcprobe stability-check SCENARIO_NAME [OPTIONS]
Returns detailed stability metrics for a specified scenario, including pass rate, mean score, and score variance.
| Argument | Type | Required | Description |
|---|---|---|---|
| `SCENARIO_NAME` | string | Yes | Name of scenario to check |
| Option | Short | Type | Default | Description |
|---|---|---|---|---|
| `--min-runs` | `-n` | int | 5 | Minimum runs required for analysis |
| `--results-dir` | `-d` | Path | `test-results` | Directory containing test results |
Check stability of a scenario:
mcprobe stability-check "Weather Query Test"
Require more runs for analysis:
mcprobe stability-check "Complex Workflow" --min-runs 10
For a stable scenario:
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Stability Check: Weather Query Test ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
Run Count 15
Pass Rate 100%
Mean Score 0.95
Score Std Dev 0.023
✓ Scenario is stable
For an unstable scenario:
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Stability Check: Complex Workflow ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
Run Count 12
Pass Rate 67%
Mean Score 0.73
Score Std Dev 0.182
✗ Scenario is unstable
- Pass rate in flaky range (20-80%)
- High score variance (std dev > 0.15)
For insufficient data:
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Stability Check: New Test ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
Insufficient data for stability analysis
Run count: 3 (need at least 5)
| Code | Meaning |
|---|---|
| 0 | Check completed (regardless of stability result) |
| 1 | Results directory not found |
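The three outcomes shown in the sample outputs (stable, unstable, insufficient data) can be reproduced from raw run data. The sketch below is illustrative only and assumes details the text does not state: `stability_summary` is a hypothetical name, the standard deviation is the population form, and the 20-80% band is treated as inclusive.

```python
from statistics import mean, pstdev
from typing import Dict, List


def stability_summary(
    passed: List[bool], scores: List[float], min_runs: int = 5
) -> Dict[str, object]:
    """Summarize stability metrics, gating on the --min-runs threshold."""
    run_count = len(passed)
    if run_count < min_runs:
        # Mirrors the "Insufficient data" output: too few runs to analyze.
        return {"status": "insufficient data", "run_count": run_count, "min_runs": min_runs}

    pass_rate = sum(passed) / run_count
    std_dev = pstdev(scores)  # assumption: population standard deviation

    reasons = []
    if 0.20 <= pass_rate <= 0.80:
        reasons.append("Pass rate in flaky range (20-80%)")
    if std_dev > 0.15:
        reasons.append("High score variance (std dev > 0.15)")

    return {
        "status": "unstable" if reasons else "stable",
        "run_count": run_count,
        "pass_rate": pass_rate,
        "mean_score": mean(scores),
        "score_std_dev": std_dev,
        "reasons": reasons,
    }
```

Note that the command itself exits with 0 whether the scenario is stable or not, so a caller that wants to gate a build on stability would need to inspect the reported status rather than the exit code.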
List available LLM providers.
mcprobe providers
Displays all registered LLM providers that can be used with MCProbe. This command is currently informational only.
mcprobe providers
Example output:
Available Providers
┏━━━━━━━━━━┓
┃ Provider ┃
┡━━━━━━━━━━┩
│ ollama │
└──────────┘
| Code | Meaning |
|---|---|
| 0 | Always successful |
Start an MCP server for AI assistant integration.
mcprobe serve [OPTIONS]
Starts a Model Context Protocol (MCP) server that exposes MCProbe test results and control capabilities to AI assistants like Claude Code. The server runs on stdio transport, making it easy to configure as an MCP server in tools that support the protocol.
The server provides tools for:
- Discovery: List available scenarios and test results
- Inspection: View conversations, judgments, and suggestions
- Analysis: Get trend analysis for scenarios
- Control: Run test scenarios (requires config file)
| Option | Short | Type | Default | Description |
|---|---|---|---|---|
| `--results-dir` | `-r` | Path | `test-results` | Directory containing test results |
| `--scenarios-dir` | `-s` | Path | `.` | Directory containing scenario files |
| `--config` | `-c` | Path | None | Path to mcprobe.yaml config file (required for run_scenario tool) |
The server exposes the following tools to AI assistants:
Discovery:
| Tool | Description |
|---|---|
| `list_scenarios` | List available test scenario files |
| `list_results` | List recent test run results with optional filtering |
Inspection:
| Tool | Description |
|---|---|
| `get_result` | Get complete test run result by ID |
| `get_conversation` | Get formatted conversation transcript |
| `get_judgment` | Get judge evaluation with criteria results |
| `get_suggestions` | Get MCP improvement suggestions |
Analysis:
| Tool | Description |
|---|---|
| `get_trends` | Get trend analysis for a scenario |
| `get_latest` | Get the most recent test result |
Control:
| Tool | Description |
|---|---|
| `run_scenario` | Run a test scenario (requires --config) |
Start server for viewing results only:
mcprobe serve --results-dir ./test-results
Start server with scenario execution capability:
mcprobe serve -r ./test-results -s ./scenarios -c ./mcprobe.yaml
Add MCProbe to Claude Code using the CLI:
# Basic setup (results viewing only)
claude mcp add --transport stdio mcprobe -- mcprobe serve -r ./test-results -s ./scenarios
# With test execution enabled
claude mcp add --transport stdio mcprobe -- mcprobe serve -r ./test-results -s ./scenarios -c ./mcprobe.yaml
Or create .mcp.json in your project root:
{
  "mcpServers": {
    "mcprobe": {
      "type": "stdio",
      "command": "mcprobe",
      "args": ["serve", "-r", "./test-results", "-s", "./scenarios", "-c", "./mcprobe.yaml"]
    }
  }
}
Verify with `claude mcp list` and use `/mcp` within Claude Code to check status. You can then ask Claude to:
- "List the recent test results"
- "Show me the conversation from the last failed test"
- "What suggestions does the judge have for improving the MCP server?"
- "Run the weather-query scenario and show me the results"
See Claude Code Integration for detailed setup instructions.
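If the `.mcp.json` file is generated as part of project setup, it can be written programmatically. The following is a rough sketch that reproduces the JSON structure shown above; `build_mcp_config` and `write_mcp_config` are hypothetical helper names, not part of MCProbe or Claude Code.

```python
import json
from pathlib import Path
from typing import Optional


def build_mcp_config(
    results_dir: str, scenarios_dir: str, config: Optional[str] = None
) -> dict:
    """Build the .mcp.json structure for an `mcprobe serve` stdio server."""
    args = ["serve", "-r", results_dir, "-s", scenarios_dir]
    if config:
        # Without -c the server can only view results, not run scenarios.
        args += ["-c", config]
    return {
        "mcpServers": {
            "mcprobe": {"type": "stdio", "command": "mcprobe", "args": args}
        }
    }


def write_mcp_config(project_root: Path, **kwargs) -> Path:
    """Write .mcp.json into the project root and return its path."""
    target = project_root / ".mcp.json"
    payload = json.dumps(build_mcp_config(**kwargs), indent=2)
    target.write_text(payload + "\n", encoding="utf-8")
    return target
```

Calling `write_mcp_config(Path("."), results_dir="./test-results", scenarios_dir="./scenarios", config="./mcprobe.yaml")` would produce a file equivalent to the example above.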
| Code | Meaning |
|---|---|
| 0 | Server stopped normally |
| 1 | Configuration error or startup failure |
MCProbe supports environment variables in two ways:
Some LLM providers use environment variables for configuration:
| Variable | Provider | Description | Default |
|---|---|---|---|
| `OPENAI_API_KEY` | openai | OpenAI API key | Required for OpenAI |
| `OLLAMA_BASE_URL` | ollama | Ollama server URL | http://localhost:11434 |
Environment variables can be used in configuration files with ${VAR} or ${VAR:-default} syntax:
# mcprobe.yaml
llm:
  provider: ${LLM_PROVIDER:-ollama}
  model: ${LLM_MODEL:-llama3.2}
  api_key: ${OPENAI_API_KEY} # Required if not set
  base_url: ${OLLAMA_BASE_URL:-http://localhost:11434} # Optional with default
MCProbe supports YAML configuration files for centralized configuration.
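As a rough illustration of the `${VAR}` and `${VAR:-default}` substitution used in the example above, the sketch below expands such placeholders in a string. It is not MCProbe's implementation: `expand_env` is a hypothetical name, the `:-` form is assumed to follow shell semantics (the default applies when the variable is unset or empty), and raising an error for a missing variable without a default is also an assumption.

```python
import os
import re
from typing import Mapping, Optional

# Matches ${VAR} and ${VAR:-default}; the default may contain ':' and '/'.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def expand_env(text: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Replace ${VAR} and ${VAR:-default} placeholders in text."""
    source = os.environ if env is None else env

    def replace(match):
        name, default = match.group(1), match.group(2)
        value = source.get(name)
        if value:  # set and non-empty
            return value
        if default is not None:
            return default
        if value is not None:  # set but empty, and no default given
            return value
        # Assumption: a missing variable with no default is an error.
        raise KeyError(f"environment variable {name} is not set and has no default")

    return _PLACEHOLDER.sub(replace, text)
```

Passing an explicit mapping instead of reading `os.environ` keeps the function easy to test, for example `expand_env("${LLM_MODEL:-llama3.2}", {})`.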
MCProbe automatically discovers configuration files in this order:
- `mcprobe.yaml`
- `.mcprobe.yaml`
- `mcprobe.yml`
- `.mcprobe.yml`
Configuration files are discovered in:
- Explicit path via `--config` option
- Current working directory
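The discovery rules can be pictured as a short lookup. This is a sketch under assumptions, not MCProbe's code: `discover_config` is a hypothetical name, the candidate file names are checked in the order mcprobe.yaml, .mcprobe.yaml, mcprobe.yml, .mcprobe.yml, an explicit path always wins, and the explicit path is returned without checking that it exists.

```python
from pathlib import Path
from typing import Optional

# Candidate file names, in discovery order.
CANDIDATES = ("mcprobe.yaml", ".mcprobe.yaml", "mcprobe.yml", ".mcprobe.yml")


def discover_config(
    explicit: Optional[str] = None, cwd: Optional[Path] = None
) -> Optional[Path]:
    """Return the config file to use, or None if nothing is found."""
    if explicit is not None:
        # An explicit --config path takes priority over auto-discovery.
        return Path(explicit)
    base = Path.cwd() if cwd is None else cwd
    for name in CANDIDATES:
        candidate = base / name
        if candidate.is_file():
            return candidate
    return None
```

With both `mcprobe.yml` and `.mcprobe.yaml` present in the working directory, this lookup would pick `.mcprobe.yaml` because it appears earlier in the order.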
# Agent configuration (system under test)
agent:
  type: adk # or "simple"
  factory: my_agent_factory.py # required for ADK agents
# Shared LLM configuration (for judge and synthetic user)
llm:
  provider: ollama
  model: llama3.2
  base_url: http://localhost:11434
  temperature: 0.0
  max_tokens: 4096
# Component-specific overrides
judge:
  provider: openai
  model: gpt-4
  api_key: ${OPENAI_API_KEY}
synthetic_user:
  provider: ollama
  model: llama3.2
# Orchestrator settings
orchestrator:
  max_turns: 10
  turn_timeout_seconds: 30.0
  loop_detection_threshold: 3
# Results storage
results:
  save: true
  dir: test-results
Important: The `agent:` section configures the system being tested, while `llm:`, `judge:`, and `synthetic_user:` configure the MCProbe evaluation components. ADK agents use Gemini internally regardless of the `llm:` settings.
- CLI arguments (highest priority)
- Component-specific config (`judge:`, `synthetic_user:`)
- Shared LLM config (`llm:`)
- Environment variables
- Default values (lowest priority)
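The priority order above amounts to a first-match lookup across layers. The sketch below illustrates the idea only: `resolve_setting` and the dictionary-per-layer representation are hypothetical, and a value of `None` is assumed to mean "not set at this layer".

```python
from typing import Any, Mapping


def resolve_setting(
    key: str,
    cli: Mapping[str, Any],
    component: Mapping[str, Any],
    shared: Mapping[str, Any],
    env: Mapping[str, Any],
    defaults: Mapping[str, Any],
) -> Any:
    """Return the value from the highest-priority layer that sets the key."""
    # Layers are listed from highest to lowest priority.
    for layer in (cli, component, shared, env, defaults):
        value = layer.get(key)
        if value is not None:
            return value
    return None
```

For instance, with `judge: {model: gpt-4}` in the component layer and `llm: {model: llama3.2}` in the shared layer, the judge would resolve to `gpt-4` unless a CLI argument supplies a model.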
See Configuration Reference for complete documentation.
- Running Tests - Detailed guide for the `run` command
- Generating Scenarios - Scenario generation guide
- Analysis Commands - Trends, flaky detection, and reporting
- MCP Server - AI assistant integration guide
- Scenario Format - YAML scenario specification