feat(compare): add acpx compare to run one prompt across multiple agents by mvanhorn · Pull Request #320 · openclaw/acpx

mvanhorn · 2026-05-16T17:47:58Z

Summary

Adds acpx compare <agent>... '<prompt>'. Runs the same prompt against multiple ACP-compatible agents in parallel (via Promise.allSettled) and emits a per-agent table: wall-clock time, token usage (from usage_update events already in the protocol stream), stop reason, first 200 chars of final message, and transcript path.
Token data is aggregated from session/update.usage_update events the protocol already produces. This PR surfaces existing data; no new state is introduced.
Per-agent transcripts persist to ~/.acpx/compare/<run-id>/<agent>.ndjson so they survive the table render and stay reviewable later.
Flags: --cwd <dir>; --deny-all / --approve-all / --approve-reads (default deny-all); --timeout <seconds> (default 300, per-agent); --json for CompareRow[] output; --diff to run each agent in an isolated git worktree when combined with --approve-all; -f, --prompt-file <path> to read the prompt from a file.
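The parallel run with a per-agent timeout described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the CompareRow fields, the AgentRunner callback, and the function names runOne and compareAgents are all assumptions made for the example. Only Promise.allSettled, the 300-second default, the 200-character preview, and the ok / error / cancelled statuses come from the description.

```typescript
// Hypothetical row shape: the PR names CompareRow but does not list its fields.
type CompareStatus = "ok" | "error" | "cancelled";

interface CompareRow {
  agent: string;
  status: CompareStatus;
  wallClockMs: number;
  preview: string; // first 200 chars of the final message or of the error
}

// Hypothetical runner: resolves with the agent's final message.
type AgentRunner = (agent: string, prompt: string, signal: AbortSignal) => Promise<string>;

async function runOne(
  agent: string,
  prompt: string,
  run: AgentRunner,
  timeoutMs: number,
): Promise<CompareRow> {
  const controller = new AbortController();
  const started = Date.now();
  let timer: ReturnType<typeof setTimeout> | undefined;
  // Rejects once the per-agent budget is spent and signals the runner to stop.
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      controller.abort();
      reject(new Error("timeout"));
    }, timeoutMs);
  });
  try {
    const message = await Promise.race([run(agent, prompt, controller.signal), timeout]);
    return { agent, status: "ok", wallClockMs: Date.now() - started, preview: message.slice(0, 200) };
  } catch (err) {
    // An aborted signal means the timeout fired; anything else is an agent error.
    const status: CompareStatus = controller.signal.aborted ? "cancelled" : "error";
    const text = err instanceof Error ? err.message : String(err);
    return { agent, status, wallClockMs: Date.now() - started, preview: text.slice(0, 200) };
  } finally {
    clearTimeout(timer);
  }
}

async function compareAgents(
  agents: string[],
  prompt: string,
  run: AgentRunner,
  timeoutMs = 300_000,
): Promise<CompareRow[]> {
  // allSettled keeps one failing agent from hiding the results of the others.
  const settled = await Promise.allSettled(agents.map((a) => runOne(a, prompt, run, timeoutMs)));
  return settled.map((r, i): CompareRow =>
    r.status === "fulfilled"
      ? r.value
      : { agent: agents[i] ?? "unknown", status: "error", wallClockMs: 0, preview: String(r.reason).slice(0, 200) },
  );
}
```

Because runOne already converts failures into rows, the rejected branch in compareAgents is only a safety net; allSettled guarantees every agent gets a row either way.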

Why this matters

acpx already supports calling any individual ACP-compatible agent: acpx codex 'fix the test', acpx claude 'fix the test', acpx pi 'fix the test'. What's missing is the natural next step — running the same prompt across multiple agents in one command and seeing the results side-by-side.

"Which agent should I use for this task?" is an open question in ACP-land. The current workflow is to run the prompt under each agent and compare the results by hand. One command replaces that manual loop:

acpx compare codex claude pi 'fix the failing test in checkout.spec.ts'
acpx compare codex claude --json | jq '.[] | select(.status == "ok") | .agent'
acpx --approve-all compare codex claude 'refactor auth.ts' --diff   # isolated worktrees
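The --json output can also be consumed from a script instead of jq. The sketch below assumes each row carries agent and status (both appear in the jq filter above) plus a wallClockMs field, which is a hypothetical name chosen for this example; the real CompareRow fields may differ. The sample data reuses the timings from the simulated demo.

```typescript
// `agent` and `status` come from the jq example; `wallClockMs` is an assumed field name.
interface CompareRowLike {
  agent: string;
  status: string;
  wallClockMs?: number;
}

// Returns the name of the fastest agent whose status is "ok", if any.
function fastestOk(json: string): string | undefined {
  const rows = JSON.parse(json) as CompareRowLike[];
  const ok = rows.filter((r) => r.status === "ok");
  // Rows without a timing sort last.
  ok.sort(
    (a, b) =>
      (a.wallClockMs ?? Number.MAX_SAFE_INTEGER) - (b.wallClockMs ?? Number.MAX_SAFE_INTEGER),
  );
  return ok[0]?.agent;
}

// Illustrative input only, shaped like the assumed CompareRow[] output.
const sample = JSON.stringify([
  { agent: "claude", status: "ok", wallClockMs: 14100 },
  { agent: "codex", status: "ok", wallClockMs: 8400 },
  { agent: "pi", status: "cancelled", wallClockMs: 300000 },
]);

console.log(fastestOk(sample));
```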

Each agent's full NDJSON transcript is persisted to ~/.acpx/compare/<run-id>/<agent>.ndjson so the table render is a summary, not the only output.
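For reviewing a transcript after the fact, a small reader might look like the sketch below. The path layout follows the ~/.acpx/compare/<run-id>/<agent>.ndjson pattern stated above; the helper names transcriptPath and parseNdjson are hypothetical, and the code makes no assumption about what each NDJSON record contains.

```typescript
import * as os from "node:os";
import * as path from "node:path";

// Builds <home>/.acpx/compare/<run-id>/<agent>.ndjson (helper name is hypothetical).
function transcriptPath(runId: string, agent: string, home: string = os.homedir()): string {
  return path.join(home, ".acpx", "compare", runId, `${agent}.ndjson`);
}

// NDJSON is one JSON document per line; blank lines are skipped.
function parseNdjson(text: string): unknown[] {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as unknown);
}
```

A caller would read the file at transcriptPath(runId, agent) with fs.readFileSync and pass the text to parseNdjson.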

Demo

Simulated demo:

The demo shows acpx compare codex claude pi 'fix the failing test' against three agents: codex finishes in 8.4s with a concise fix, claude takes 14.1s with deeper analysis, pi times out at the 300s cap. The viewer sees all three outcomes in one table — exactly the picking-an-agent decision the feature exists to support.

Testing

corepack pnpm typecheck
corepack pnpm lint (oxlint + oxfmt + flow-schema-terms + persisted-key-casing, all clean)
corepack pnpm test — 675 tests pass; new test/compare-command.test.ts uses stub agents to cover:
- multi-agent run produces one table row per agent
- --json returns valid CompareRow[]
- an erroring agent shows status: error with error preview; other agents still ok
- --timeout <s> cancels agents past the per-agent budget (status: cancelled)
- token totals populate from stubbed usage_update events
- transcripts persist to ~/.acpx/compare/<run-id>/<agent>.ndjson on disk
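The token-total case can be illustrated with a small aggregation over stubbed events. The event shape here is an assumption: the PR says totals come from usage_update events but does not show their fields, so inputTokens and outputTokens are hypothetical names, and the sketch assumes each event reports an increment rather than a running total. If the protocol reports cumulative values, taking the last event would be the right aggregation instead of summing.

```typescript
// Hypothetical event shapes; field names are not taken from the PR or the protocol.
interface UsageUpdate {
  sessionUpdate: "usage_update";
  inputTokens?: number;
  outputTokens?: number;
}

interface OtherUpdate {
  sessionUpdate: string;
}

interface TokenTotals {
  input: number;
  output: number;
}

// Narrowing helper: picks out usage_update events from a mixed stream.
function isUsageUpdate(event: OtherUpdate | UsageUpdate): event is UsageUpdate {
  return event.sessionUpdate === "usage_update";
}

// Sums per-event increments; events of other kinds are ignored.
function totalTokens(events: Array<OtherUpdate | UsageUpdate>): TokenTotals {
  const totals: TokenTotals = { input: 0, output: 0 };
  for (const event of events) {
    if (!isUsageUpdate(event)) continue;
    totals.input += event.inputTokens ?? 0;
    totals.output += event.outputTokens ?? 0;
  }
  return totals;
}
```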

acpx compare <agent>... '<prompt>' runs the same prompt against multiple ACP-compatible agents and shows wall-clock time, token usage, stop reason, and final message preview side-by-side. Use it to pick the right agent for a task.

Each agent runs in parallel via Promise.allSettled. Per-agent transcripts are persisted to ~/.acpx/compare/<run-id>/<agent>.ndjson so they survive the table render and can be inspected later.

Token data comes from usage_update events already in the protocol stream; this PR aggregates and presents, no new state introduced.

Flags:
- --cwd <dir>: target workspace
- --deny-all / --approve-all / --approve-reads: permission mode (default: deny-all)
- --timeout <seconds>: per-agent timeout (default 300)
- --json: emit CompareRow[] as JSON
- --diff: in approve-all mode, run each agent in an isolated worktree
- -f, --prompt-file <path>: read prompt from file

mvanhorn wants to merge 1 commit into openclaw:main from mvanhorn:feat/acpx-compare
