feat(compare): add acpx compare to run one prompt across multiple agents #320
Open
mvanhorn wants to merge 1 commit into
acpx compare <agent>... '<prompt>' runs the same prompt against multiple ACP-compatible agents and shows wall-clock time, token usage, stop reason, and a final message preview side-by-side. Use it to pick the right agent for a task.

Each agent runs in parallel via Promise.allSettled. Per-agent transcripts are persisted to ~/.acpx/compare/<run-id>/<agent>.ndjson so they survive the table render and can be inspected later.

Token data comes from usage_update events already in the protocol stream; this PR only aggregates and presents that data and introduces no new state.

Flags:
- --cwd <dir>: target workspace
- --deny-all / --approve-all / --approve-reads: permission mode (default: deny-all)
- --timeout <seconds>: per-agent timeout (default 300)
- --json: emit CompareRow[] as JSON
- --diff: in approve-all mode, run each agent in an isolated worktree
- -f, --prompt-file <path>: read prompt from file
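The parallel run with a per-agent timeout described above can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: `AgentRunner`, `withTimeout`, `compareAgents`, and the fields on this `CompareRow` are hypothetical names chosen for the example, and the real `CompareRow` type may look different.

```typescript
// Hypothetical shapes for illustration only; the PR's real types may differ.
type CompareStatus = "ok" | "error" | "cancelled";

interface CompareRow {
  agent: string;
  status: CompareStatus;
  elapsedMs: number;
  preview: string; // first 200 chars of the final message or of the error
}

// Stand-in for "run this prompt under one agent" (assumed signature).
type AgentRunner = (prompt: string, signal: AbortSignal) => Promise<string>;

// Rejects with an Error named "TimeoutError" if `work` outlives the budget.
function withTimeout<T>(
  work: (signal: AbortSignal) => Promise<T>,
  timeoutMs: number,
): Promise<T> {
  const controller = new AbortController();
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => {
      controller.abort(); // lets a cooperative runner stop its work
      const err = new Error(`timed out after ${timeoutMs}ms`);
      err.name = "TimeoutError";
      reject(err);
    }, timeoutMs);
    Promise.resolve()
      .then(() => work(controller.signal))
      .then(resolve, reject)
      .finally(() => clearTimeout(timer));
  });
}

// Runs every agent in parallel; one failure or timeout never hides the others.
async function compareAgents(
  runners: Record<string, AgentRunner>,
  prompt: string,
  timeoutMs: number,
): Promise<CompareRow[]> {
  const entries = Object.entries(runners);
  const elapsedMs: number[] = entries.map(() => 0);
  const settled = await Promise.allSettled(
    entries.map(([, run], i) => {
      const started = Date.now();
      return withTimeout((signal) => run(prompt, signal), timeoutMs).finally(() => {
        elapsedMs[i] = Date.now() - started;
      });
    }),
  );
  return settled.map((result, i): CompareRow => {
    const agent = entries[i][0];
    if (result.status === "fulfilled") {
      return { agent, status: "ok", elapsedMs: elapsedMs[i], preview: result.value.slice(0, 200) };
    }
    const reason: unknown = result.reason;
    const timedOut = reason instanceof Error && reason.name === "TimeoutError";
    const message = reason instanceof Error ? reason.message : String(reason);
    return {
      agent,
      status: timedOut ? "cancelled" : "error",
      elapsedMs: elapsedMs[i],
      preview: message.slice(0, 200),
    };
  });
}
```

Because `Promise.allSettled` never rejects, each agent's outcome maps to exactly one row. In this sketch a timeout only aborts the signal; a real runner would still need to honor that signal to stop the underlying agent process.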
Summary
- Adds `acpx compare <agent>... '<prompt>'`. Runs the same prompt against multiple ACP-compatible agents in parallel (via `Promise.allSettled`) and emits a per-agent table: wall-clock time, token usage (from `usage_update` events already in the protocol stream), stop reason, first 200 chars of final message, and transcript path.
- Token data comes from the `session/update.usage_update` events the protocol already produces. This PR surfaces existing data; no new state is introduced.
- Per-agent transcripts are persisted to `~/.acpx/compare/<run-id>/<agent>.ndjson` so they survive the table render and stay reviewable later.
- Flags: `--cwd <dir>`, `--deny-all` / `--approve-all` / `--approve-reads` (default `deny-all`), `--timeout <seconds>` (default 300, per-agent), `--json` for CompareRow[] output, `--diff` to run each agent in an isolated git worktree when `--approve-all`, `-f, --prompt-file <path>`.

Why this matters
acpx already supports calling any individual ACP-compatible agent: `acpx codex 'fix the test'`, `acpx claude 'fix the test'`, `acpx pi 'fix the test'`. What's missing is the natural next step: running the same prompt across multiple agents in one command and seeing the results side-by-side.

"Which agent should I use for this task?" is unsolved in ACP-land. The current workflow is to run the prompt under each agent and compare by hand. One command, `acpx compare`, closes that gap.
Each agent's full NDJSON transcript is persisted to `~/.acpx/compare/<run-id>/<agent>.ndjson`, so the table render is a summary, not the only output.

Demo
The simulated demo shows `acpx compare codex claude pi 'fix the failing test'` running against three agents: codex finishes in 8.4s with a concise fix, claude takes 14.1s with deeper analysis, and pi times out at the 300s cap. The viewer sees all three outcomes in one table, which is exactly the picking-an-agent decision the feature exists to support.

Testing
- `corepack pnpm typecheck`
- `corepack pnpm lint` (oxlint + oxfmt + flow-schema-terms + persisted-key-casing, all clean)
- `corepack pnpm test`: 675 tests pass; new `test/compare-command.test.ts` uses stub agents to cover:
  - `--json` returns valid CompareRow[]
  - a failing agent reports `status: error` with error preview; other agents still ok
  - `--timeout <s>` cancels agents past the per-agent budget (`status: cancelled`)
  - transcripts land at `~/.acpx/compare/<run-id>/<agent>.ndjson` on disk
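As a companion to the transcript check above, a persisted NDJSON transcript could be reduced to token totals along these lines. This is a sketch under stated assumptions: the event shape (`type`, `inputTokens`, `outputTokens`) is hypothetical, the real `usage_update` payload may use different field names, and the sketch assumes each event carries a delta rather than a running total.

```typescript
// Hypothetical event shape; the actual usage_update payload may differ.
interface UsageUpdate {
  type: "usage_update";
  inputTokens: number;
  outputTokens: number;
}

interface TokenTotals {
  inputTokens: number;
  outputTokens: number;
}

// Narrow an arbitrary parsed JSON value to the assumed usage event shape.
function isUsageUpdate(value: unknown): value is UsageUpdate {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    v.type === "usage_update" &&
    typeof v.inputTokens === "number" &&
    typeof v.outputTokens === "number"
  );
}

// NDJSON is one JSON document per line; blank or malformed lines are skipped.
// Assumes each usage event is a delta, so the totals are a plain sum.
function sumUsage(ndjson: string): TokenTotals {
  const totals: TokenTotals = { inputTokens: 0, outputTokens: 0 };
  for (const line of ndjson.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue;
    let event: unknown;
    try {
      event = JSON.parse(trimmed);
    } catch {
      continue; // tolerate a truncated final line from an interrupted run
    }
    if (isUsageUpdate(event)) {
      totals.inputTokens += event.inputTokens;
      totals.outputTokens += event.outputTokens;
    }
  }
  return totals;
}
```

If the protocol instead reports cumulative totals, the last event should replace the totals rather than be added to them.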