feat(compare): add acpx compare to run one prompt across multiple agents#320

Open
mvanhorn wants to merge 1 commit into openclaw:main from mvanhorn:feat/acpx-compare

Conversation

@mvanhorn

Summary

  • Adds acpx compare <agent>... '<prompt>'. Runs the same prompt against multiple ACP-compatible agents in parallel (via Promise.allSettled) and emits a per-agent table: wall-clock time, token usage (from usage_update events already in the protocol stream), stop reason, first 200 chars of final message, and transcript path.
  • Token data is aggregated from session/update.usage_update events the protocol already produces. This PR surfaces existing data; no new state is introduced.
  • Per-agent transcripts persist to ~/.acpx/compare/<run-id>/<agent>.ndjson so they survive the table render and stay reviewable later.
  • Flags: --cwd <dir>, --deny-all / --approve-all / --approve-reads (default deny-all), --timeout <seconds> (per-agent, default 300), --json for CompareRow[] output, --diff to run each agent in an isolated git worktree (requires --approve-all), and -f / --prompt-file <path> to read the prompt from a file.
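The parallel fan-out described above can be sketched as follows. The CompareRow fields and the runAgent helper are illustrative assumptions for this sketch, not the PR's actual code; only the CompareRow type name and the use of Promise.allSettled come from the PR description.

```typescript
// Hypothetical row shape — the real CompareRow likely carries more fields
// (token usage, stop reason, transcript path).
type CompareRow = {
  agent: string;
  status: "ok" | "error";
  wallMs: number;
  preview: string;
};

// Hypothetical per-agent runner. A real runner would spawn the ACP agent,
// stream session/update events, and enforce the per-agent timeout.
async function runAgent(agent: string, prompt: string, timeoutMs: number): Promise<string> {
  void timeoutMs; // the stub ignores the budget; a real runner would not
  return `${agent} finished: ${prompt}`;
}

async function compare(agents: string[], prompt: string, timeoutMs = 300_000): Promise<CompareRow[]> {
  // Promise.allSettled lets one agent fail or time out without sinking the others.
  const settled = await Promise.allSettled(
    agents.map(async (agent) => {
      const start = Date.now();
      const message = await runAgent(agent, prompt, timeoutMs);
      return { agent, wallMs: Date.now() - start, message };
    }),
  );
  return settled.map((result, i) => {
    if (result.status === "fulfilled") {
      const { agent, wallMs, message } = result.value;
      // First 200 chars of the final message become the table preview.
      return { agent, status: "ok" as const, wallMs, preview: message.slice(0, 200) };
    }
    return { agent: agents[i], status: "error" as const, wallMs: 0, preview: String(result.reason) };
  });
}
```

Because every promise settles, the table always renders one row per agent, matching the behavior the test suite checks.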

Why this matters

acpx already supports calling any individual ACP-compatible agent: acpx codex 'fix the test', acpx claude 'fix the test', acpx pi 'fix the test'. What's missing is the natural next step — running the same prompt across multiple agents in one command and seeing the results side-by-side.

"Which agent should I use for this task?" is unsolved in ACP-land. The current workflow is to run the prompt under each agent and compare by hand. One command closes that gap:

acpx compare codex claude pi 'fix the failing test in checkout.spec.ts'
acpx compare codex claude --json | jq '.[] | select(.status == "ok") | .agent'
acpx --approve-all compare codex claude 'refactor auth.ts' --diff   # isolated worktrees

Each agent's full NDJSON transcript is persisted to ~/.acpx/compare/<run-id>/<agent>.ndjson so the table render is a summary, not the only output.
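A minimal sketch of the transcript layout described above, assuming one JSON-encoded protocol event per line. The helper names and event shapes are hypothetical; only the ~/.acpx/compare/<run-id>/<agent>.ndjson path convention comes from the PR.

```typescript
import { appendFileSync, mkdirSync, readFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Build (and create) the per-agent transcript path under a base directory,
// mirroring the ~/.acpx/compare/<run-id>/<agent>.ndjson convention.
function transcriptPath(baseDir: string, runId: string, agent: string): string {
  const dir = join(baseDir, "compare", runId);
  mkdirSync(dir, { recursive: true });
  return join(dir, `${agent}.ndjson`);
}

// Append one event per line. NDJSON keeps the file streamable: partial
// transcripts stay readable with tail -f or jq while the agent is running.
function appendEvent(path: string, event: object): void {
  appendFileSync(path, JSON.stringify(event) + "\n");
}
```

Append-only NDJSON also means a crashed or cancelled agent still leaves every event it emitted on disk, which is what makes the table render a summary rather than the only output.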

Demo

Simulated demo:

acpx compare demo

The demo shows acpx compare codex claude pi 'fix the failing test' against three agents: codex finishes in 8.4s with a concise fix, claude takes 14.1s with deeper analysis, pi times out at the 300s cap. The viewer sees all three outcomes in one table — exactly the picking-an-agent decision the feature exists to support.

Testing

  • corepack pnpm typecheck
  • corepack pnpm lint (oxlint + oxfmt + flow-schema-terms + persisted-key-casing, all clean)
  • corepack pnpm test — 675 tests pass; new test/compare-command.test.ts uses stub agents to cover:
    • multi-agent run produces one table row per agent
    • --json returns valid CompareRow[]
    • an erroring agent shows status: error with error preview; other agents still ok
    • --timeout <s> cancels agents past the per-agent budget (status: cancelled)
    • token totals populate from stubbed usage_update events
    • transcripts persist to ~/.acpx/compare/<run-id>/<agent>.ndjson on disk
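The token aggregation those stubbed tests exercise can be sketched like this. The event and field names (inputTokens, outputTokens) are assumptions about the usage_update payload, not confirmed against the ACP protocol.

```typescript
type UsageTotals = { inputTokens: number; outputTokens: number };

// Hypothetical shape of the session/update stream: usage_update events
// interleaved with other event types the aggregator ignores.
type SessionEvent =
  | { type: "usage_update"; usage: UsageTotals }
  | { type: "message"; text: string };

// Fold the stream into per-agent totals; non-usage events pass through
// untouched, so this only surfaces data the protocol already produces.
function aggregateTokens(events: SessionEvent[]): UsageTotals {
  return events.reduce<UsageTotals>(
    (total, ev) =>
      ev.type === "usage_update"
        ? {
            inputTokens: total.inputTokens + ev.usage.inputTokens,
            outputTokens: total.outputTokens + ev.usage.outputTokens,
          }
        : total,
    { inputTokens: 0, outputTokens: 0 },
  );
}
```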

acpx compare <agent>... '<prompt>' runs the same prompt against multiple
ACP-compatible agents and shows wall-clock time, token usage, stop reason,
and final message preview side-by-side. Use it to pick the right agent
for a task.

Each agent runs in parallel via Promise.allSettled. Per-agent transcripts
are persisted to ~/.acpx/compare/<run-id>/<agent>.ndjson so they survive
the table render and can be inspected later.

Token data comes from usage_update events already in the protocol stream;
this PR aggregates and presents it, introducing no new state.

Flags:
- --cwd <dir>: target workspace
- --deny-all / --approve-all / --approve-reads: permission mode (default: deny-all)
- --timeout <seconds>: per-agent timeout (default 300)
- --json: emit CompareRow[] as JSON
- --diff: in approve-all mode, run each agent in an isolated worktree
- -f, --prompt-file <path>: read prompt from file
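One way the per-agent --timeout budget above could be enforced is to race each agent's work against a timer, a sketch under the assumption that an expired agent is reported as cancelled rather than rejected; the PR's actual cancellation mechanism is not shown here.

```typescript
// Resolve with the agent's result, or with the literal "cancelled" once the
// per-agent budget expires. Errors still reject so they surface as status: error.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T | "cancelled"> {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => resolve("cancelled"), ms);
    work.then(
      (value) => {
        clearTimeout(timer); // agent beat the budget; cancel the timer
        resolve(value);
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}
```

Resolving (not rejecting) on timeout keeps the Promise.allSettled fan-out simple: a timed-out agent still yields a fulfilled row that the table can label cancelled.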