feat: stub agent driver for QA acceleration #33
Spec for a lightweight stub agent binary that implements the MCP bridge protocol with deterministic echo responses, enabling ~30 QA cases to run in sub-second time without real LLM backends. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Call out that the resumable_session_id exhaustive match needs a Stub arm
- Specify UI hiding: add to all_runtime_drivers, filter in handler
- Document that POST /api/agents allows stub without gating (acceptable)
- Note Cargo workspace introduction for crates/stub-agent
- Use distinct stub-a/b/c names to avoid collision with bot-a/b/c
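A minimal, hypothetical sketch of why the first bullet matters: Rust rejects a non-exhaustive `match`, so every `match` on the runtime enum without a wildcard arm stops compiling once `Stub` is added. Only `AgentRuntime::Stub`, `parse()`/`as_str()`, and the name `resumable_session_id` come from this PR; the `Claude`/`Codex` variants, their string values, and the function signature are assumptions for illustration.

```rust
/// Runtime tags. Only `Stub` is named in the PR; `Claude` and `Codex` are
/// placeholders for whatever runtimes already exist.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AgentRuntime {
    Claude,
    Codex,
    Stub,
}

impl AgentRuntime {
    fn as_str(self) -> &'static str {
        match self {
            AgentRuntime::Claude => "claude",
            AgentRuntime::Codex => "codex",
            AgentRuntime::Stub => "stub",
        }
    }

    fn parse(s: &str) -> Option<Self> {
        match s {
            "claude" => Some(AgentRuntime::Claude),
            "codex" => Some(AgentRuntime::Codex),
            "stub" => Some(AgentRuntime::Stub),
            _ => None,
        }
    }
}

/// Hypothetical signature. The match has no wildcard arm, so adding `Stub`
/// to the enum fails to compile until the `Stub` arm below is written.
fn resumable_session_id(runtime: AgentRuntime, stored: Option<&str>) -> Option<String> {
    match runtime {
        AgentRuntime::Claude | AgentRuntime::Codex => stored.map(str::to_string),
        // The stub keeps no conversation state, so there is never a session to resume.
        AgentRuntime::Stub => None,
    }
}

fn main() {
    for name in ["claude", "stub"] {
        let runtime = AgentRuntime::parse(name).expect("known runtime");
        println!("{} -> {:?}", runtime.as_str(), resumable_session_id(runtime, Some("sess-1")));
    }
}
```

Leaving out a wildcard arm is deliberate here: it turns "forgot to decide how the new runtime resumes" into a compile error rather than a silent default.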
9-task plan covering workspace setup, enum variant, driver impl, runtime filtering, MCP client binary, integration test, QA harness updates, and proof-of-concept spec wiring.
Replaces placeholder with full stub agent that:
- Parses --mcp-config to find the bridge command and spawns it as a child process
- Connects as an MCP client via rmcp over stdio pipes
- Loops: wait_for_message -> extract token -> send_message
- Emits JSON status lines to stdout for the agent manager
- Drains stdin in the background to prevent the buffer from filling up
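The "extract token" step of that loop can be sketched as below. This is a dependency-free approximation, not the crate's code: the real binary lists `regex` as a dependency, while this sketch uses plain string search. The `reply with "<token>"` trigger and the `stub-reply-{n}` fallback come from the PR summary; treating `n` as a per-message counter starting at 1, and matching the phrase case-sensitively, are assumptions.

```rust
/// Return the quoted token after `reply with "`, if any.
fn extract_token(body: &str) -> Option<&str> {
    const MARKER: &str = "reply with \"";
    let start = body.find(MARKER)? + MARKER.len();
    let rest = &body[start..];
    let end = rest.find('"')?;
    let token = &rest[..end];
    if token.is_empty() {
        None
    } else {
        Some(token)
    }
}

/// Deterministic reply: echo the token, else fall back to `stub-reply-{n}`.
fn reply_for(body: &str, n: u64) -> String {
    match extract_token(body) {
        Some(token) => token.to_string(),
        None => format!("stub-reply-{n}"),
    }
}

fn main() {
    // Simulated wait_for_message -> extract token -> send_message loop;
    // printing stands in for the real send_message tool call.
    let incoming = ["please reply with \"hello\"", "what's up?", "reply with \"OK-a\" now"];
    for (i, body) in incoming.iter().enumerate() {
        let reply = reply_for(body, i as u64 + 1);
        println!("send_message: {reply}");
    }
}
```

Because the reply depends only on the message text and a counter, QA assertions can match exact strings instead of polling for fuzzy LLM output.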
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Pull request overview
This PR introduces a deterministic “stub” agent runtime to accelerate QA by running a lightweight local agent process that replies via the MCP bridge without using real LLM backends.
Changes:
- Add a new Rust workspace member (`chorus-stub-agent`) plus a server-side `StubDriver` and `AgentRuntime::Stub`.
- Hide the stub runtime from runtime-status listing while keeping it available for API-created agents.
- Update Playwright QA helpers and MSG-002 to support `CHORUS_E2E_LLM=stub`, and document a `stub-trio` preset.
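A minimal sketch of the hiding rule in the second bullet, assuming drivers can be modeled as a plain list of runtime tags (the real `all_runtime_drivers()` likely returns driver objects, and `Claude` is a placeholder variant). The stub stays in the full registry, so an agent created through the API can still resolve it, while `list_statuses()` drops it before anything reaches the UI. `driver_registered` is a hypothetical name.

```rust
/// Runtime tags standing in for driver objects (simplification).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AgentRuntime {
    Claude,
    Stub,
}

/// Every driver the server can spawn, stub included.
fn all_runtime_drivers() -> Vec<AgentRuntime> {
    vec![AgentRuntime::Claude, AgentRuntime::Stub]
}

/// What the UI sees: the same list minus the stub.
fn list_statuses() -> Vec<AgentRuntime> {
    all_runtime_drivers()
        .into_iter()
        .filter(|runtime| *runtime != AgentRuntime::Stub)
        .collect()
}

/// Hypothetical lookup used when an agent is created via POST /api/agents:
/// it consults the full registry, so `stub` still resolves.
fn driver_registered(runtime: AgentRuntime) -> bool {
    all_runtime_drivers().contains(&runtime)
}

fn main() {
    println!("UI list: {:?}", list_statuses());
    println!("stub creatable via API: {}", driver_registered(AgentRuntime::Stub));
}
```

Filtering at the listing layer keeps a single source of truth for registered drivers; only the presentation path needs to know the stub is test-only.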
Reviewed changes
Copilot reviewed 13 out of 14 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `Cargo.toml` | Convert repo to a Cargo workspace to include the stub-agent crate. |
| `Cargo.lock` | Add dependencies required by the new stub-agent crate (and transitive adds). |
| `crates/stub-agent/Cargo.toml` | Define the `chorus-stub-agent` binary crate and its deps (rmcp, tokio, regex, uuid). |
| `crates/stub-agent/src/main.rs` | Implement MCP client loop that waits for messages and sends deterministic echo/fallback replies. |
| `src/store/agents.rs` | Add `AgentRuntime::Stub` and wire `parse()`/`as_str()`. |
| `src/agent/manager.rs` | Register the stub driver and ensure stub has no resumable session behavior. |
| `src/agent/runtime_status.rs` | Filter stub from `list_statuses()` output (keeps it hidden from UI runtime lists). |
| `src/agent/drivers/mod.rs` | Register the new stub driver module and include it in `all_runtime_drivers()`. |
| `src/agent/drivers/stub.rs` | Implement `StubDriver` spawn + stdout event parsing + model/status reporting. |
| `qa/cases/playwright/helpers/api.ts` | Add `ensureStubTrio()` and `agentNames()` helpers for stub mode. |
| `qa/cases/playwright/MSG-002.spec.ts` | Wire MSG-002 to use stub trio + mode-aware agent naming in stub mode. |
| `qa/QA_PRESETS.md` | Document the new `stub-trio` preset and `CHORUS_E2E_LLM=stub` usage. |
| `docs/superpowers/specs/2026-04-03-stub-agent-driver-design.md` | Add design spec for the stub agent driver/binary and QA integration. |
| `docs/superpowers/plans/2026-04-03-stub-agent-driver.md` | Add detailed implementation plan for the stub agent driver/binary and QA wiring. |
```rust
let line = line.trim();
if line.is_empty() || line.contains("No new messages.") {
    continue;
}
```
wait_for_message responses include a trailing "Reply instructions" section (see bridge formatting), and this loop currently processes every line. Lines like "Reply instructions:" won’t match the [target=...] header, so the stub will still attempt to generate a token and send an extra message (often to the fallback target). Filter the loop to only handle actual message header lines (e.g., lines starting with [target=), and ignore the instruction/footer lines entirely.
Suggested change:

```diff
  }
+ // Only handle actual message header lines, e.g. "[target=...]"
+ if !line.starts_with("[target=") {
+     continue;
+ }
```
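A sketch of the suggested filter applied to a whole `wait_for_message` response. The sample response is invented for illustration; the only format details taken from the review are the `[target=...]` header prefix, the trailing `Reply instructions:` footer, and the `No new messages.` placeholder. `message_lines` is a hypothetical helper name.

```rust
/// Keep only message header lines (those starting with "[target=") from a
/// wait_for_message response; footers like "Reply instructions:" and the
/// "No new messages." placeholder are dropped.
fn message_lines(response: &str) -> Vec<&str> {
    response
        .lines()
        .map(str::trim)
        .filter(|line| line.starts_with("[target="))
        .collect()
}

fn main() {
    // Invented sample; only the header prefix and footer label come from the review.
    let response = "[target=#all] @alice: reply with \"hello\"\n\nReply instructions:\n- call send_message with the target shown above\n";
    for line in message_lines(response) {
        println!("handle: {line}");
    }
}
```

With an allow-list on the header prefix, the separate `No new messages.` check becomes redundant, and any new footer text the bridge adds later is ignored by default.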
```rust
    continue;
}

let target = parse_target(line).unwrap_or_else(|| "#general".to_string());
```
Defaulting to #general when parse_target() fails is unsafe here because any unexpected/auxiliary line (including "Reply instructions" or future formatting changes) will cause the stub to send messages to #general. Prefer skipping lines where the target cannot be parsed, or fail the turn with an error event so the behavior is deterministic and doesn’t leak messages into unrelated channels.
Suggested change:

```diff
- let target = parse_target(line).unwrap_or_else(|| "#general".to_string());
+ let Some(target) = parse_target(line) else {
+     emit_error(&format!("Could not parse target from line: {line}"));
+     continue;
+ };
```
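A sketch of a `parse_target` that returns `Option`, which is what lets the suggested `let ... else` skip a bad line instead of redirecting it. The header grammar is an assumption: the target is taken to run from `[target=` up to the first `]` or whitespace. `emit_error` here only writes to stderr, whereas the real stub presumably emits a status event on stdout; `reply_targets` is a hypothetical helper.

```rust
/// Hypothetical header grammar: "[target=<target>]..." or "[target=<target> ...]".
fn parse_target(line: &str) -> Option<String> {
    let rest = line.strip_prefix("[target=")?;
    let end = rest.find(|c: char| c == ']' || c.is_whitespace())?;
    let target = &rest[..end];
    if target.is_empty() {
        None
    } else {
        Some(target.to_string())
    }
}

/// Stand-in for the stub's error event; here it only writes to stderr.
fn emit_error(msg: &str) {
    eprintln!("error: {msg}");
}

/// Collect reply targets, skipping (not redirecting) lines with no parseable target.
fn reply_targets(lines: &[&str]) -> Vec<String> {
    let mut targets = Vec::new();
    for &line in lines {
        let Some(target) = parse_target(line) else {
            emit_error(&format!("Could not parse target from line: {line}"));
            continue;
        };
        targets.push(target);
    }
    targets
}

fn main() {
    let lines = ["[target=#dev] @bob: hi", "Reply instructions:", "[target=#all] @alice: yo"];
    println!("{:?}", reply_targets(&lines));
}
```

`let ... else` needs Rust 1.65 or later; on older toolchains the same control flow is a `match` with `None => { emit_error(..); continue; }`.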
```rust
    .unwrap_or(200);

// Process initial prompt
emit_text(&format!("Processing prompt: {}", args.prompt));
```
This logs the entire --prompt content to the manager activity log. The prompt can be very large and may contain sensitive or noisy context (agent descriptions, unread summaries, etc.), which hurts log signal and could leak data. Consider emitting a short fixed message (or a truncated prompt preview) instead.
Suggested change:

```diff
- emit_text(&format!("Processing prompt: {}", args.prompt));
+ emit_text("Processing prompt");
```
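The comment also mentions a truncated preview as an alternative to a fixed message. A minimal sketch, assuming the caller picks a character budget (`prompt_preview` and the 80-char limit in `main` are hypothetical): it counts `char`s rather than bytes so it never slices through a multi-byte UTF-8 sequence, and it flattens newlines so the status stays on one line.

```rust
/// Build a one-line status with at most `max_chars` characters of the prompt.
/// Counting chars (not bytes) avoids slicing through a multi-byte UTF-8 sequence.
fn prompt_preview(prompt: &str, max_chars: usize) -> String {
    let mut chars = prompt.chars();
    // Take the head, then flatten newlines so the status stays on one line.
    let head: String = chars
        .by_ref()
        .take(max_chars)
        .collect::<String>()
        .replace('\n', " ");
    if chars.next().is_some() {
        format!("Processing prompt: {head}... ({} chars total)", prompt.chars().count())
    } else {
        format!("Processing prompt: {head}")
    }
}

fn main() {
    let prompt = "You are stub-a.\nUnread summary: 42 messages in #all";
    println!("{}", prompt_preview(prompt, 80));
    println!("{}", prompt_preview(prompt, 10));
}
```

The fixed message from the suggestion is simpler and leaks nothing; a short preview trades a little of that for easier debugging when several stub agents start at once.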
```rust
    .into_values()
    .next()
    .context("No MCP server entries in config")?;
Ok((entry.command, entry.args))
```
load_mcp_config() picks the first value from a HashMap (into_values().next()), which is non-deterministic if the config ever contains multiple MCP servers. Since the driver writes a chat entry, it’s safer to explicitly select mcpServers["chat"] (or otherwise choose deterministically) so the stub always launches the expected bridge command.
Suggested change:

```diff
- .into_values()
- .next()
- .context("No MCP server entries in config")?;
- Ok((entry.command, entry.args))
+ .get("chat")
+ .context("No MCP server entry named 'chat' in config")?;
+ Ok((entry.command.clone(), entry.args.clone()))
```
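A dependency-free sketch of the deterministic lookup. `McpServerEntry` mirrors the `command`/`args` fields used above, but the struct and function names are hypothetical, and a `String` error stands in for `anyhow::Context`. It also shows why the suggestion adds `.clone()`: `into_values()` consumed the map and yielded owned entries, whereas `get` only borrows, so the fields must be cloned (or the entry taken out with `HashMap::remove`) to return owned `String`s.

```rust
use std::collections::HashMap;

/// Mirrors the `command`/`args` fields read from an `mcpServers` entry.
#[derive(Debug, Clone)]
struct McpServerEntry {
    command: String,
    args: Vec<String>,
}

/// Deterministically select `mcpServers["chat"]` instead of whichever
/// entry a HashMap happens to yield first.
fn chat_bridge_command(
    servers: &HashMap<String, McpServerEntry>,
) -> Result<(String, Vec<String>), String> {
    let entry = servers
        .get("chat")
        .ok_or_else(|| "No MCP server entry named 'chat' in config".to_string())?;
    // `get` only borrows the entry, so clone the fields to return owned values.
    Ok((entry.command.clone(), entry.args.clone()))
}

fn main() {
    let mut servers = HashMap::new();
    // Hypothetical entries; the real bridge command comes from --mcp-config.
    servers.insert("other".to_string(), McpServerEntry { command: "other-server".into(), args: vec![] });
    servers.insert("chat".to_string(), McpServerEntry { command: "bridge-cmd".into(), args: vec!["--flag".into()] });
    println!("{:?}", chat_bridge_command(&servers));
}
```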
… #general fallback
- Select mcpServers["chat"] deterministically instead of HashMap::next()
- Ignore non-message lines; require a parseable target (no #general default)
- Emit a short processing status instead of the full --prompt

Made-with: Cursor
- default-members so cargo build produces chorus-stub-agent
- Silence dead_code on stub Args.prompt (CLI still requires --prompt)
- ensureStubTrio: retry POST /api/agents; clickComboboxOption for Radix
- ERR-001: mock POST /api/attachments; assert toast copy
- Stub-aware: CHN-001/003, TMT-001/002/003/005, MSG-001/004/005/011/012, REC-002
- createAgentViaUi: exact model option; createTeamQaEngViaUi uses agentNames()
- ACT-002: disambiguate activity rows; AGT-004: longer senderDeleted poll
- stub-agent: parse_content allows spaces in @sender (OS usernames)
- playwright: 600s default timeout when CHORUS_E2E_LLM=stub (fixtures + slow polls)
- openMembersPanel: no-op if panel already open (fixes CHN-003 after #all)
- AGT-004: ensureStubTrio + stub runtime; wait for #all agent reply before delete; skip Reasoning UI edits when stub (form has no combobox)
- MSG-004: longer stub poll window; CHN-002/003, REC-002, TMT-006 timeouts / Team Settings dialog
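The first bullet can be illustrated with a hypothetical `@sender: content` layout; the exact text format the bridge delivers is not shown in this PR, and this `parse_content` is a sketch rather than the crate's function. Splitting at the first `": "` instead of stopping at whitespace keeps OS usernames such as `Jane Doe` intact.

```rust
/// Hypothetical layout "@<sender>: <content>". Splitting at the first ": "
/// keeps senders with spaces (e.g. OS display names) intact.
fn parse_content(text: &str) -> Option<(&str, &str)> {
    let rest = text.strip_prefix('@')?;
    let (sender, content) = rest.split_once(": ")?;
    if sender.is_empty() {
        return None;
    }
    Some((sender, content))
}

fn main() {
    for text in ["@Jane Doe: reply with \"hello\"", "@stub-a: OK-a", "no sender here"] {
        println!("{:?}", parse_content(text));
    }
}
```

The trade-off: a sender name that itself contains `": "` would be cut short, which is acceptable for OS usernames but worth keeping in mind if display names become free-form.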
…eady wait
- MSG-001: require OK-a/b/c on messages from stub-a/b/c (human lines also contain tokens)
- REC-002: poll #all history using agent rows only; wait for thread anchor in UI
- waitForAppReady: 90s when CHORUS_E2E_LLM=stub
…fore thread
- MSG-001: wait until ≥3 agent senders and OK-a/b/c appear in agent bodies (not human lines); case-insensitive senderType; longer stub timeout
- gotoApp: retry once on stub when sidebar shell is slow
- REC-002: reload #all after history sees marks so thread anchor is in the DOM
- QA_PRESETS: correct TMT stub skip list; note chorus-stub-agent build, timeouts, CHORUS_WORKERS
- README: document CHORUS_E2E_LLM=stub, CHORUS_WORKERS, default per-worker server vs CHORUS_BASE_URL
Summary
- … (`crates/stub-agent/`) that echoes messages back through the MCP bridge with deterministic responses
- `AgentRuntime::Stub` driver registered server-side but hidden from the UI's create-agent modal
- `CHORUS_E2E_LLM=stub` mode enables ~30 QA cases to run in sub-second time without real LLM backends
- … `reply with "hello"` get echoed back; fallback to `stub-reply-{n}`
- `ensureStubTrio()`, `agentNames()` helpers, `stub-trio` preset, MSG-002 wired as proof-of-concept

Test plan
- `cargo build`: both `chorus` and `chorus-stub-agent` compile
- … `stub-reply-{n}` response
- … `/api/runtimes` hides `stub` from the UI
- `CHORUS_E2E_LLM=stub npx playwright test MSG-002.spec.ts`: end-to-end with Playwright

🤖 Generated with Claude Code