
feat: stub agent driver for QA acceleration#33

Open
Fullstop000 wants to merge 19 commits into main from claude/stub-agent-driver


@Fullstop000
Owner

Summary

  • Add a lightweight stub agent binary (crates/stub-agent/) that echoes messages back through the MCP bridge with deterministic responses
  • New AgentRuntime::Stub driver registered server-side but hidden from the UI's create-agent modal
  • CHORUS_E2E_LLM=stub mode enables ~30 QA cases to run in sub-second time without real LLM backends
  • Token extraction: a message such as reply with "hello" gets "hello" echoed back; any message without that pattern falls back to stub-reply-{n}
  • QA harness: ensureStubTrio(), agentNames() helpers, stub-trio preset, MSG-002 wired as proof-of-concept
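The token-extraction rule in the summary can be sketched with the standard library alone. This is a minimal sketch, assuming the echo pattern is literally reply with "<token>" and that the fallback counter starts at 1 per stub process; the crate itself lists regex as a dependency, and the names TokenSource and extract_token here are hypothetical.

```rust
/// Hypothetical helper that yields the reply token for each incoming message.
pub struct TokenSource {
    // Counter for fallback replies; assumed to start at 1 per stub process.
    next_fallback: u64,
}

impl TokenSource {
    pub fn new() -> Self {
        TokenSource { next_fallback: 1 }
    }

    /// Returns the quoted text after `reply with "`, or `stub-reply-{n}`
    /// when the message has no such (non-empty) quoted token.
    pub fn extract_token(&mut self, message: &str) -> String {
        const PATTERN: &str = "reply with \"";
        if let Some(start) = message.find(PATTERN) {
            // PATTERN is ASCII, so this offset is always a char boundary.
            let rest = &message[start + PATTERN.len()..];
            if let Some(end) = rest.find('"') {
                if end > 0 {
                    return rest[..end].to_string();
                }
            }
        }
        let token = format!("stub-reply-{}", self.next_fallback);
        self.next_fallback += 1;
        token
    }
}
```

Keeping the counter inside the stub makes successive fallback replies distinct within one run, so a spec can tell turns apart without any LLM involvement.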

Test plan

  • cargo build — both chorus and chorus-stub-agent compile
  • Integration smoke test: create stub agent via API, send DM with echo token, verify reply
  • Fallback token: messages without patterns get stub-reply-{n} response
  • /api/runtimes hides stub from the UI
  • CHORUS_E2E_LLM=stub npx playwright test MSG-002.spec.ts — end-to-end with Playwright

🤖 Generated with Claude Code

Fullstop000 and others added 11 commits April 3, 2026 00:10
Spec for a lightweight stub agent binary that implements the MCP bridge
protocol with deterministic echo responses, enabling ~30 QA cases to run
in sub-second time without real LLM backends.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Call out resumable_session_id exhaustive match needing Stub arm
- Specify UI hiding: add to all_runtime_drivers, filter in handler
- Document that POST /api/agents allows stub without gating (acceptable)
- Note Cargo workspace introduction for crates/stub-agent
- Use distinct stub-a/b/c names to avoid collision with bot-a/b/c

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
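The point about resumable_session_id needing a Stub arm follows from Rust's exhaustive matching: once the variant exists, every wildcard-free match over the enum must handle it. A minimal sketch, assuming placeholder variants Claude and Codex and a hypothetical resumable_session_id signature; the real AgentRuntime and manager code may differ.

```rust
/// Sketch of the runtime enum; `Claude` and `Codex` are placeholder variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AgentRuntime {
    Claude,
    Codex,
    Stub,
}

impl AgentRuntime {
    pub fn parse(s: &str) -> Option<Self> {
        match s {
            "claude" => Some(AgentRuntime::Claude),
            "codex" => Some(AgentRuntime::Codex),
            "stub" => Some(AgentRuntime::Stub),
            _ => None,
        }
    }

    pub fn as_str(self) -> &'static str {
        match self {
            AgentRuntime::Claude => "claude",
            AgentRuntime::Codex => "codex",
            AgentRuntime::Stub => "stub",
        }
    }
}

/// Hypothetical signature. With no wildcard arm, adding `Stub` to the enum
/// is a compile error here until the new variant is handled explicitly.
pub fn resumable_session_id(runtime: AgentRuntime, stored: Option<&str>) -> Option<String> {
    match runtime {
        AgentRuntime::Claude | AgentRuntime::Codex => stored.map(str::to_owned),
        // The stub keeps no session state, so there is nothing to resume.
        AgentRuntime::Stub => None,
    }
}
```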
9-task plan covering workspace setup, enum variant, driver impl,
runtime filtering, MCP client binary, integration test, QA harness
updates, and proof-of-concept spec wiring.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replaces placeholder with full stub agent that:
- Parses --mcp-config to find bridge command, spawns it as child process
- Connects as MCP client via rmcp over stdio pipes
- Loops: wait_for_message -> extract token -> send_message
- Emits JSON status lines to stdout for the agent manager
- Drains stdin in background to prevent buffer fill-up

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
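Two of the behaviors listed above, JSON status lines on stdout and a background stdin drain, can be illustrated with std only. This is a sketch under assumptions: the {"type":...,"text":...} line shape is hypothetical rather than the agent manager's actual schema, and a std thread stands in for whatever tokio task the stub binary uses.

```rust
use std::io::{self, Read, Write};
use std::thread::{self, JoinHandle};

/// Escapes a string for embedding inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Builds one status line (assumed shape, not the manager's real schema).
pub fn status_line(kind: &str, text: &str) -> String {
    format!("{{\"type\":\"{}\",\"text\":\"{}\"}}", json_escape(kind), json_escape(text))
}

/// Writes one status line and flushes so the reader sees it immediately.
pub fn emit(out: &mut impl Write, kind: &str, text: &str) -> io::Result<()> {
    writeln!(out, "{}", status_line(kind, text))?;
    out.flush()
}

/// Reads and discards all input on a background thread so the parent's
/// writes never block on a full pipe; yields the number of bytes drained.
pub fn spawn_drain<R: Read + Send + 'static>(mut input: R) -> JoinHandle<io::Result<u64>> {
    thread::spawn(move || io::copy(&mut input, &mut io::sink()))
}
```

In the binary this would be called as spawn_drain(io::stdin()) at startup and emit(&mut io::stdout(), ...) for each status update.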
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 2, 2026 17:00

Copilot AI left a comment


Pull request overview

This PR introduces a deterministic “stub” agent runtime to accelerate QA by running a lightweight local agent process that replies via the MCP bridge without using real LLM backends.

Changes:

  • Add a new Rust workspace member (chorus-stub-agent) plus a server-side StubDriver and AgentRuntime::Stub.
  • Hide the stub runtime from runtime-status listing while keeping it available for API-created agents.
  • Update Playwright QA helpers + MSG-002 to support CHORUS_E2E_LLM=stub, and document a stub-trio preset.
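Hiding the stub from the runtime list while keeping it usable through the API amounts to filtering only at the listing step. A minimal sketch, assuming a simplified RuntimeStatus record and a hypothetical HIDDEN_RUNTIMES list; the PR's actual list_statuses() may be structured differently.

```rust
/// Hypothetical status record; the real struct likely carries more fields.
#[derive(Debug, Clone, PartialEq)]
pub struct RuntimeStatus {
    pub runtime: &'static str,
    pub installed: bool,
}

/// Runtimes the server registers but does not advertise to the UI.
const HIDDEN_RUNTIMES: &[&str] = &["stub"];

/// Returns statuses for the UI, dropping hidden runtimes. The driver
/// registry is not touched, so stub agents created via the API still spawn.
pub fn list_statuses(all: &[RuntimeStatus]) -> Vec<RuntimeStatus> {
    all.iter()
        .filter(|s| !HIDDEN_RUNTIMES.contains(&s.runtime))
        .cloned()
        .collect()
}
```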

Reviewed changes

Copilot reviewed 13 out of 14 changed files in this pull request and generated 4 comments.

| File | Description |
|---|---|
| Cargo.toml | Convert repo to a Cargo workspace to include the stub-agent crate. |
| Cargo.lock | Add dependencies required by the new stub-agent crate (and transitive adds). |
| crates/stub-agent/Cargo.toml | Define the chorus-stub-agent binary crate and its deps (rmcp, tokio, regex, uuid). |
| crates/stub-agent/src/main.rs | Implement MCP client loop that waits for messages and sends deterministic echo/fallback replies. |
| src/store/agents.rs | Add AgentRuntime::Stub and wire parse()/as_str(). |
| src/agent/manager.rs | Register the stub driver and ensure stub has no resumable session behavior. |
| src/agent/runtime_status.rs | Filter stub from list_statuses() output (keeps it hidden from UI runtime lists). |
| src/agent/drivers/mod.rs | Register the new stub driver module and include it in all_runtime_drivers(). |
| src/agent/drivers/stub.rs | Implement StubDriver spawn + stdout event parsing + model/status reporting. |
| qa/cases/playwright/helpers/api.ts | Add ensureStubTrio() and agentNames() helpers for stub mode. |
| qa/cases/playwright/MSG-002.spec.ts | Wire MSG-002 to use stub trio + mode-aware agent naming in stub mode. |
| qa/QA_PRESETS.md | Document the new stub-trio preset and CHORUS_E2E_LLM=stub usage. |
| docs/superpowers/specs/2026-04-03-stub-agent-driver-design.md | Add design spec for the stub agent driver/binary and QA integration. |
| docs/superpowers/plans/2026-04-03-stub-agent-driver.md | Add detailed implementation plan for the stub agent driver/binary and QA wiring. |


```rust
let line = line.trim();
if line.is_empty() || line.contains("No new messages.") {
    continue;
}
```

Copilot AI Apr 2, 2026


wait_for_message responses include a trailing "Reply instructions" section (see bridge formatting), and this loop currently processes every line. Lines like "Reply instructions:" won’t match the [target=...] header, so the stub will still attempt to generate a token and send an extra message (often to the fallback target). Filter the loop to only handle actual message header lines (e.g., lines starting with [target=), and ignore the instruction/footer lines entirely.

Suggested change

```diff
 }
+// Only handle actual message header lines, e.g. "[target=...]"
+if !line.starts_with("[target=") {
+    continue;
+}
```

Copilot uses AI. Check for mistakes.
Comment thread crates/stub-agent/src/main.rs Outdated
```rust
    continue;
}

let target = parse_target(line).unwrap_or_else(|| "#general".to_string());
```

Copilot AI Apr 2, 2026


Defaulting to #general when parse_target() fails is unsafe here because any unexpected/auxiliary line (including "Reply instructions" or future formatting changes) will cause the stub to send messages to #general. Prefer skipping lines where the target cannot be parsed, or fail the turn with an error event so the behavior is deterministic and doesn’t leak messages into unrelated channels.

Suggested change

```diff
-let target = parse_target(line).unwrap_or_else(|| "#general".to_string());
+let Some(target) = parse_target(line) else {
+    emit_error(&format!("Could not parse target from line: {line}"));
+    continue;
+};
```
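Taken together, the two suggestions above change the loop from "handle every line, default to #general" to "handle only parseable message headers". A minimal sketch, assuming a header of the form [target=<id>] ... and a hypothetical parse_target and message_targets; eprintln! stands in for the stub's emit_error.

```rust
/// Hypothetical parser: extracts `#general` from `[target=#general] ...`.
fn parse_target(line: &str) -> Option<String> {
    let rest = line.strip_prefix("[target=")?;
    let end = rest.find(']')?;
    let target = rest[..end].trim();
    if target.is_empty() {
        None
    } else {
        Some(target.to_string())
    }
}

/// Returns (target, header line) for each real message in a
/// wait_for_message response, skipping footers like "Reply instructions:".
pub fn message_targets(response: &str) -> Vec<(String, String)> {
    let mut out = Vec::new();
    for line in response.lines() {
        let line = line.trim();
        if line.is_empty() || line.contains("No new messages.") {
            continue;
        }
        // Only message header lines carry a target; everything else is ignored.
        if !line.starts_with("[target=") {
            continue;
        }
        // No "#general" default: an unparseable header is reported and skipped.
        let Some(target) = parse_target(line) else {
            eprintln!("Could not parse target from line: {line}");
            continue;
        };
        out.push((target, line.to_string()));
    }
    out
}
```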

Comment thread crates/stub-agent/src/main.rs Outdated
```rust
    .unwrap_or(200);

// Process initial prompt
emit_text(&format!("Processing prompt: {}", args.prompt));
```

Copilot AI Apr 2, 2026


This logs the entire --prompt content to the manager activity log. The prompt can be very large and may contain sensitive or noisy context (agent descriptions, unread summaries, etc.), which hurts log signal and could leak data. Consider emitting a short fixed message (or a truncated prompt preview) instead.

Suggested change

```diff
-emit_text(&format!("Processing prompt: {}", args.prompt));
+emit_text("Processing prompt");
```
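The comment also mentions a truncated preview as an alternative to a fixed message. A minimal sketch of that option, with a hypothetical prompt_preview helper; it counts chars rather than bytes, so truncation never splits a multi-byte character, and it flattens line breaks so the preview stays on one log line.

```rust
/// Hypothetical helper: first `max_chars` characters of the prompt, with the
/// total length appended when anything was cut off.
pub fn prompt_preview(prompt: &str, max_chars: usize) -> String {
    // Flatten line breaks so the preview stays on one log line.
    let flat = |c: char| if c == '\n' || c == '\r' { ' ' } else { c };
    let total = prompt.chars().count();
    let head: String = prompt.chars().take(max_chars).map(flat).collect();
    if total <= max_chars {
        format!("Processing prompt: {head}")
    } else {
        format!("Processing prompt: {head}... ({total} chars)")
    }
}
```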

Comment thread crates/stub-agent/src/main.rs Outdated
Comment on lines +70 to +73
```rust
    .into_values()
    .next()
    .context("No MCP server entries in config")?;
Ok((entry.command, entry.args))
```

Copilot AI Apr 2, 2026


load_mcp_config() picks the first value from a HashMap (into_values().next()), which is non-deterministic if the config ever contains multiple MCP servers. Since the driver writes a chat entry, it’s safer to explicitly select mcpServers["chat"] (or otherwise choose deterministically) so the stub always launches the expected bridge command.

Suggested change

```diff
-    .into_values()
-    .next()
-    .context("No MCP server entries in config")?;
-Ok((entry.command, entry.args))
+    .get("chat")
+    .context("No MCP server entry named 'chat' in config")?;
+Ok((entry.command.clone(), entry.args.clone()))
```
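The suggested lookup can be shown in isolation. A minimal sketch, assuming a simplified McpServerEntry stand-in for one mcpServers value and a hypothetical select_chat_server; a String error replaces anyhow's .context to keep the example std-only.

```rust
use std::collections::HashMap;

/// Minimal stand-in for one entry under `mcpServers` in the --mcp-config file.
#[derive(Debug, Clone)]
pub struct McpServerEntry {
    pub command: String,
    pub args: Vec<String>,
}

/// Picks the bridge command by key instead of taking whichever entry
/// HashMap iteration happens to yield first.
pub fn select_chat_server(
    servers: &HashMap<String, McpServerEntry>,
) -> Result<(String, Vec<String>), String> {
    let entry = servers
        .get("chat")
        .ok_or_else(|| "No MCP server entry named 'chat' in config".to_string())?;
    // `get` borrows from the map, hence the clones; `into_values()` consumed it.
    Ok((entry.command.clone(), entry.args.clone()))
}
```

The .clone() calls in the suggestion follow directly from switching to a borrowed lookup.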

Fullstop000 and others added 8 commits April 3, 2026 01:13
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… #general fallback

- Select mcpServers["chat"] deterministically instead of HashMap::next()
- Ignore non-message lines; require parseable target (no #general default)
- Emit short processing status instead of full --prompt

Made-with: Cursor
- default-members so cargo build produces chorus-stub-agent
- Silence dead_code on stub Args.prompt (CLI still requires --prompt)
- ensureStubTrio: retry POST /api/agents; clickComboboxOption for Radix
- ERR-001: mock POST /api/attachments; assert toast copy
- Stub-aware: CHN-001/003, TMT-001/002/003/005, MSG-001/004/005/011/012, REC-002
- createAgentViaUi: exact model option; createTeamQaEngViaUi uses agentNames()
- ACT-002: disambiguate activity rows; AGT-004: longer senderDeleted poll

Made-with: Cursor
- stub-agent: parse_content allows spaces in @sender (OS usernames)
- playwright: 600s default timeout when CHORUS_E2E_LLM=stub (fixtures + slow polls)
- openMembersPanel: no-op if panel already open (fixes CHN-003 after #all)
- AGT-004: ensureStubTrio + stub runtime; wait for #all agent reply before delete;
  skip Reasoning UI edits when stub (form has no combobox)
- MSG-004: longer stub poll window; CHN-002/003, REC-002, TMT-006 timeouts / Team Settings dialog

Made-with: Cursor
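The parse_content change above (spaces allowed in @sender) can be illustrated as follows. Only that behavior comes from the commit; the line layout [target=<id>] @<sender>: <body> and the return shape are assumptions. Ending the sender at the first ": " rather than at whitespace keeps multi-word OS usernames intact.

```rust
/// Hypothetical: splits "[target=..] @Jane Doe: body" into ("Jane Doe", "body").
/// The sender runs up to the first ": ", so spaces inside it are kept.
pub fn parse_content(line: &str) -> Option<(String, String)> {
    // ']' is ASCII, so +1 stays on a char boundary.
    let after_header = &line[line.find(']')? + 1..];
    let rest = after_header.trim_start().strip_prefix('@')?;
    let (sender, body) = rest.split_once(": ")?;
    let sender = sender.trim();
    if sender.is_empty() {
        return None;
    }
    Some((sender.to_string(), body.trim().to_string()))
}
```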
…eady wait

- MSG-001: require OK-a/b/c on messages from stub-a/b/c (human lines also contain tokens)
- REC-002: poll #all history using agent rows only; wait for thread anchor in UI
- waitForAppReady: 90s when CHORUS_E2E_LLM=stub

Made-with: Cursor
…fore thread

- MSG-001: wait until ≥3 agent senders and OK-a/b/c appear in agent bodies (not human lines); case-insensitive senderType; longer stub timeout
- gotoApp: retry once on stub when sidebar shell is slow
- REC-002: reload #all after history sees marks so thread anchor is in the DOM

Made-with: Cursor
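The MSG-001 wait condition above lives in a Playwright (TypeScript) spec; it is restated here in Rust only to show the logic. A minimal sketch, assuming history rows expose sender_name, sender_type and content (hypothetical field names): human rows are excluded because the prompt asking for OK-a/b/c already contains those tokens, and senderType is compared case-insensitively.

```rust
use std::collections::HashSet;

/// Minimal stand-in for one history row; field names are assumptions.
#[derive(Clone, Copy)]
pub struct HistoryMessage<'a> {
    pub sender_name: &'a str,
    pub sender_type: &'a str,
    pub content: &'a str,
}

/// True once at least three distinct agents have posted and every expected
/// token appears in some agent-authored body.
pub fn msg001_done(history: &[HistoryMessage], tokens: &[&str]) -> bool {
    let agent_rows: Vec<&HistoryMessage> = history
        .iter()
        .filter(|m| m.sender_type.eq_ignore_ascii_case("agent"))
        .collect();
    let senders: HashSet<&str> = agent_rows.iter().map(|m| m.sender_name).collect();
    senders.len() >= 3
        && tokens
            .iter()
            .all(|t| agent_rows.iter().any(|m| m.content.contains(*t)))
}
```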
- QA_PRESETS: correct TMT stub skip list; note chorus-stub-agent build, timeouts, CHORUS_WORKERS
- README: document CHORUS_E2E_LLM=stub, CHORUS_WORKERS, default per-worker server vs CHORUS_BASE_URL

Made-with: Cursor