
Add run_scenarios tool to MCP server for batch testing #58

Merged: richardkiene merged 1 commit into main from feature/run-multiple-scenarios on Jan 23, 2026

@richardkiene (Contributor)

Summary

Adds a new run_scenarios tool to the MCP server that accepts a list of scenario paths and runs them all sequentially, returning an aggregated summary.

Problem

Previously, Claude Code had to call run_scenario once per scenario, so testing a batch of scenarios required one tool call per scenario.

Solution

New run_scenarios tool that:

  • Accepts a list of scenario paths
  • Runs each scenario sequentially
  • Returns an aggregated summary with pass/fail counts and per-scenario results
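The three steps above can be sketched as a plain sequential loop followed by a count. This is a minimal sketch, not the PR's code: the `ScenarioRunResult` fields (`path`, `passed`, `score`) and the injected `execute` callable standing in for `_execute_scenario` are assumptions, since the description does not show their signatures.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ScenarioRunResult:
    # Hypothetical fields; the real dataclass may define others.
    path: str
    passed: bool
    score: float


def run_scenarios(
    paths: list[str],
    execute: Callable[[str], ScenarioRunResult],
) -> tuple[list[ScenarioRunResult], int, int]:
    """Run each scenario in order and return (results, passed, failed)."""
    results: list[ScenarioRunResult] = []
    for path in paths:
        # Sequential: each scenario finishes before the next one starts.
        results.append(execute(path))
    passed = sum(1 for r in results if r.passed)
    return results, passed, len(results) - passed
```

Because the loop does not catch exceptions, one failing `execute` call aborts the whole batch in this sketch; the PR does not say how the real tool handles that case.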

Example output:

## Test Run Summary
**2/3 passed** (1 failed)

### Results
- ✓ `scenario1.yaml`: PASSED (score: 0.95)
- ✓ `scenario2.yaml`: PASSED (score: 0.88)
- ✗ `scenario3.yaml`: FAILED (score: 0.42)
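A formatter that reproduces the example output above might look like the following. It is a sketch under assumptions: the hypothetical `format_summary` takes `(path, passed, score)` tuples rather than the PR's `ScenarioRunResult`, whose fields are not shown.

```python
def format_summary(results: list[tuple[str, bool, float]]) -> str:
    """Render (path, passed, score) tuples in the example's Markdown layout."""
    total = len(results)
    passed = sum(1 for _, ok, _ in results if ok)
    lines = [
        "## Test Run Summary",
        f"**{passed}/{total} passed** ({total - passed} failed)",
        "",
        "### Results",
    ]
    for path, ok, score in results:
        mark, status = ("✓", "PASSED") if ok else ("✗", "FAILED")
        # Two decimal places, matching the scores in the example.
        lines.append(f"- {mark} `{path}`: {status} (score: {score:.2f})")
    return "\n".join(lines)
```

Whether the real tool still prints `(0 failed)` when every scenario passes is not stated in the PR; this sketch always includes it.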

Changes

  • Add ScenarioRunResult dataclass for structured results
  • Extract _execute_scenario helper for reuse
  • Add run_scenarios tool with summary formatting
  • Add tests for new tool
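Extracting a shared helper lets the single-scenario tool and the batch tool use one code path that returns structured data, so the batch tool can count passes from fields instead of parsing formatted text. A minimal sketch: the `ScenarioRunResult` fields and the hypothetical `_FAKE_OUTCOMES` table standing in for the real scenario engine are assumptions, not taken from the PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScenarioRunResult:
    # Hypothetical fields; the PR does not list the real ones.
    path: str
    passed: bool
    score: float


# Hypothetical stand-in for the real scenario engine: path -> (passed, score).
_FAKE_OUTCOMES = {"scenario1.yaml": (True, 0.95), "scenario3.yaml": (False, 0.42)}


def _execute_scenario(path: str) -> ScenarioRunResult:
    """Shared helper: run one scenario and return structured data, not text."""
    passed, score = _FAKE_OUTCOMES[path]
    return ScenarioRunResult(path=path, passed=passed, score=score)


def run_scenario(path: str) -> str:
    """Single-scenario tool: a thin wrapper that formats one result."""
    r = _execute_scenario(path)
    return f"{r.path}: {'PASSED' if r.passed else 'FAILED'} (score: {r.score:.2f})"


def run_scenarios(paths: list[str]) -> str:
    """Batch tool: reuses the same helper, then aggregates from fields."""
    results = [_execute_scenario(p) for p in paths]
    passed = sum(r.passed for r in results)
    return f"{passed}/{len(results)} passed"
```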

Test plan

  • All 258 unit tests pass
  • Ruff linting passes
  • Mypy type checking passes

Adds a new `run_scenarios` tool that accepts a list of scenario paths
and runs them all, returning an aggregated summary with pass/fail
counts and per-scenario results. This is more efficient than calling
`run_scenario` repeatedly when testing multiple scenarios.

@richardkiene merged commit c6176a0 into main on Jan 23, 2026
3 checks passed
@richardkiene deleted the feature/run-multiple-scenarios branch on January 23, 2026 23:02