2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -8,6 +8,8 @@ Repo: https://github.com/openclaw/acpx

### Changes

- CLI: add `acpx compare` to run one prompt across multiple agents and summarize timing, token usage, stop reason, and a final output preview for each, with per-agent transcripts persisted.

### Breaking

### Fixes
1 change: 1 addition & 0 deletions README.md
@@ -39,6 +39,7 @@ One command surface for Pi, OpenClaw ACP, Codex, Claude, and other ACP-compatibl
- **Structured output**: typed ACP messages (thinking, tool calls, diffs) instead of ANSI scraping
- **Any ACP agent**: built-in registry + `--agent` escape hatch for custom servers
- **One-shot mode**: `exec` for stateless fire-and-forget tasks
- **Compare across agents**: `acpx compare pi openclaw codex 'fix the bug'` runs the same prompt against multiple ACP-compatible agents and shows wall-clock time, token usage, and final output side by side so you can pick the right agent for a task
- **Experimental flows**: `flow run <file>` for TypeScript workflow modules over multiple prompts
- **Runtime-owned flow actions**: shell-backed action steps can prepare workspaces and other deterministic mechanics outside the agent turn
- **Flow workspace isolation**: `acp` nodes can target an explicit per-step cwd, so flows can keep agent work inside disposable worktrees
90 changes: 90 additions & 0 deletions docs/compare.md
@@ -0,0 +1,90 @@
# Compare Command

`acpx compare` runs the same one-shot prompt across multiple ACP-compatible agents and summarizes the results side by side.

```bash
acpx compare pi codex claude 'fix the failing test in checkout.spec.ts'
```

Each agent runs independently. By default, `compare` uses `deny-all` permissions, which suits review, planning, and other read-only evaluation prompts.

## Usage

```bash
acpx compare <agent>... '<prompt>'
acpx compare <agent>... -- prompt words after the delimiter
acpx compare <agent>... --prompt-file ./prompt.md
acpx compare <agent>... -f ./prompt.md
```

The final positional argument is treated as the prompt unless `--prompt-file` is provided. When you use `--`, every token after the delimiter is joined into the prompt.
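The prompt rules above can be sketched as a small resolver over the tokens that follow `acpx compare`. This is a hypothetical illustration, not acpx's parser: `resolveCompareArgs` and `CompareArgs` are made-up names, and joining the words after `--` with single spaces is an assumption.

```typescript
// Hypothetical sketch of the documented prompt rules; not acpx's parser.
// `argv` is assumed to be the raw tokens that follow `acpx compare`.
interface CompareArgs {
  agents: string[];
  // undefined means the prompt comes from --prompt-file instead.
  prompt: string | undefined;
}

function resolveCompareArgs(argv: readonly string[], promptFile?: string): CompareArgs {
  const delimiter = argv.indexOf("--");
  if (delimiter !== -1) {
    // Agents come before `--`; every token after it is joined into the prompt.
    // Joining with a single space is an assumption of this sketch.
    const agents = argv.slice(0, delimiter);
    const prompt = argv.slice(delimiter + 1).join(" ");
    if (agents.length === 0 || prompt === "") {
      throw new Error("expected agents before `--` and prompt words after it");
    }
    return { agents, prompt };
  }
  if (promptFile !== undefined) {
    // With --prompt-file, every positional argument names an agent.
    if (argv.length === 0) throw new Error("expected at least one agent");
    return { agents: [...argv], prompt: undefined };
  }
  // Otherwise the final positional argument is the prompt.
  if (argv.length < 2) {
    throw new Error("expected at least one agent followed by a prompt");
  }
  return { agents: argv.slice(0, -1), prompt: argv[argv.length - 1] };
}
```

For example, `["pi", "codex", "--", "fix", "the", "bug"]` resolves to the agents `pi` and `codex` with the prompt `fix the bug`.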

## Options

| Option | Description |
| -------------------------- | ----------------------------------------------------------------------------- |
| `--cwd <dir>` | Target workspace. Defaults to the current working directory. |
| `--deny-all` | Deny all permission requests. This is the default compare permission mode. |
| `--approve-reads` | Auto-approve read/search requests and prompt for writes. |
| `--approve-all` | Auto-approve all permission requests. |
| `--timeout <sec>` | Per-agent timeout in seconds. Defaults to `300`. Decimal seconds are allowed. |
| `--json` | Emit the full `CompareRow[]` payload instead of the text table. |
| `--diff` | Run each agent in an isolated git worktree and include diff summaries. |
| `-f, --prompt-file <path>` | Read prompt text from a file. Use `-` for stdin. |
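The permission flags and `--timeout` rules in the table can be normalized roughly as follows. This is a hypothetical sketch: `PermissionFlags`, `resolvePermissionMode`, and `parseTimeoutMs` are illustrative names, and treating the three permission flags as mutually exclusive is an assumption the table does not state.

```typescript
// Hypothetical option normalization for `acpx compare`; names are illustrative.
// Assumption: the permission flags are mutually exclusive. The docs only say
// that deny-all is the compare default.
type PermissionMode = "deny-all" | "approve-reads" | "approve-all";

interface PermissionFlags {
  denyAll?: boolean;
  approveReads?: boolean;
  approveAll?: boolean;
}

const DEFAULT_TIMEOUT_SEC = 300;

function resolvePermissionMode(flags: PermissionFlags): PermissionMode {
  const selected: PermissionMode[] = [];
  if (flags.denyAll) selected.push("deny-all");
  if (flags.approveReads) selected.push("approve-reads");
  if (flags.approveAll) selected.push("approve-all");
  if (selected.length > 1) {
    throw new Error(`conflicting permission flags: ${selected.join(", ")}`);
  }
  // No flag given: fall back to the documented compare default.
  return selected[0] ?? "deny-all";
}

// Convert a --timeout value in seconds (decimals allowed) to milliseconds.
function parseTimeoutMs(raw: string | undefined): number {
  if (raw === undefined) return DEFAULT_TIMEOUT_SEC * 1000;
  const seconds = Number(raw);
  // Reject blanks, non-numbers, zero, and negative values.
  if (raw.trim() === "" || !Number.isFinite(seconds) || seconds <= 0) {
    throw new Error(`invalid --timeout value: ${raw}`);
  }
  return Math.round(seconds * 1000);
}
```

With this sketch, `parseTimeoutMs("1.5")` returns `1500`, and omitting the flag yields the documented 300-second default (`300000` ms).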

## Table Output

Text output includes one row per agent:

| Column | Meaning |
| --------------- | ---------------------------------------------------------- |
| `agent` | Agent name or raw command token. |
| `status` | `ok`, `cancelled`, or `error`. |
| `wall_ms` | Wall-clock runtime in milliseconds. |
| `input` | Input token count from the latest `usage_update`, if any. |
| `output` | Output token count from the latest `usage_update`, if any. |
| `context` | Context usage from `usage_update.size` or `used`, if any. |
| `stop_reason` | ACP `session/prompt` stop reason, such as `end_turn`. |
| `final_message` | First 200 characters of assistant text output. |
| `transcript` | NDJSON transcript path. |
| `diff` | Diff summary when `--diff` is set. |
| `error` | Error preview for failed or timed-out runs. |

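The `input`, `output`, and `final_message` columns reduce a run's events to single values. A minimal sketch, assuming a simplified `UsageSnapshot` shape whose field names are placeholders rather than ACP's actual `usage_update` schema; `usageColumns` and `finalMessagePreview` are hypothetical helpers. The preview counts code points so that a surrogate pair (such as an emoji) is never split.

```typescript
// Hypothetical, simplified view of one usage_update event. Real ACP payload
// field names may differ; these are placeholders for illustration.
interface UsageSnapshot {
  inputTokens?: number;
  outputTokens?: number;
}

interface UsageColumns {
  input: number | null;
  output: number | null;
}

const PREVIEW_CHARS = 200;

// Only the most recent usage_update counts; earlier snapshots are superseded.
function usageColumns(updates: readonly UsageSnapshot[]): UsageColumns {
  const latest = updates.length > 0 ? updates[updates.length - 1] : undefined;
  return {
    input: latest?.inputTokens ?? null,
    output: latest?.outputTokens ?? null,
  };
}

// First 200 characters of assistant text. Iterating by code point avoids
// cutting a surrogate pair in half.
function finalMessagePreview(assistantText: string): string {
  return Array.from(assistantText).slice(0, PREVIEW_CHARS).join("");
}
```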
Transcripts are persisted under:

```text
~/.acpx/compare/<run-id>/<agent>.ndjson
```
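The run-id format can be inferred from the `transcript_path` in the JSON Output example: an ISO timestamp with `:` and `.` replaced by `-`, followed by a short hex suffix. The sketch below reproduces that shape under assumptions; `makeRunId`, `transcriptPath`, and the filename sanitization for raw command tokens are hypothetical, not confirmed acpx behavior.

```typescript
import { randomBytes } from "node:crypto";
import { homedir } from "node:os";
import { join } from "node:path";

// Hypothetical run id modeled on the example transcript path:
// ISO timestamp with ":" and "." replaced by "-", plus 6 random hex chars.
function makeRunId(now: Date = new Date()): string {
  const stamp = now.toISOString().replace(/[:.]/g, "-");
  const suffix = randomBytes(3).toString("hex");
  return `${stamp}-${suffix}`;
}

// Agent names may be raw command tokens, so anything that is not safe in a
// file name is replaced with "_" (an assumption of this sketch).
function transcriptPath(runId: string, agent: string, home: string = homedir()): string {
  const fileName = `${agent.replace(/[^A-Za-z0-9._-]/g, "_")}.ndjson`;
  return join(home, ".acpx", "compare", runId, fileName);
}
```

Under these assumptions, a run started at `2026-05-16T12:00:00.000Z` gets an id beginning with `2026-05-16T12-00-00-000Z-`, matching the example path.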

## JSON Output

`--json` emits an array of rows:

```json
[
{
"agent": "codex",
"status": "ok",
"stop_reason": "end_turn",
"wall_ms": 1240,
"input_tokens": 1200,
"output_tokens": 340,
"context_used": 1540,
"final_message": "The failing test is caused by...",
"transcript_path": "/Users/me/.acpx/compare/2026-05-16T12-00-00-000Z-a1b2c3/codex.ndjson",
"error": null,
"diff_stat": null,
"diff_path": null
}
]
```
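Scripts that consume `--json` output can type each element after the example row. A sketch under assumptions: the `CompareRow` field types are inferred from the single example above (which fields may be `null` is a guess), and `parseCompareRows` and `fastestOk` are hypothetical helpers, not part of acpx.

```typescript
// Hypothetical row type inferred from the JSON example; nullability of each
// field is an assumption.
type CompareStatus = "ok" | "cancelled" | "error";

interface CompareRow {
  agent: string;
  status: CompareStatus;
  stop_reason: string | null;
  wall_ms: number;
  input_tokens: number | null;
  output_tokens: number | null;
  context_used: number | null;
  final_message: string | null;
  transcript_path: string;
  error: string | null;
  diff_stat: string | null;
  diff_path: string | null;
}

// Parse `acpx compare --json` output, checking only the top-level shape.
function parseCompareRows(json: string): CompareRow[] {
  const data: unknown = JSON.parse(json);
  if (!Array.isArray(data)) {
    throw new Error("expected a JSON array of compare rows");
  }
  return data as CompareRow[];
}

// Pick the fastest agent that finished with status "ok", if any.
function fastestOk(rows: readonly CompareRow[]): CompareRow | undefined {
  let best: CompareRow | undefined;
  for (const row of rows) {
    if (row.status === "ok" && (best === undefined || row.wall_ms < best.wall_ms)) {
      best = row;
    }
  }
  return best;
}
```

Rows with `error` or `cancelled` status are skipped even when they finished sooner, so a fast failure never wins.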

## Diff Mode

When `--diff` is set, each agent runs in a separate detached git worktree created from the current repository `HEAD`. After the run completes, acpx writes the full diff to the compare transcript directory and includes the `git diff --stat` summary in the table.

```bash
acpx compare codex claude --approve-all --diff 'implement the smallest fix'
```

Use diff mode for write-capable comparisons. Without `--diff`, all agents run in the same `--cwd`, which is appropriate for `deny-all` review-style prompts.
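The worktree lifecycle described in this section maps onto standard git commands: create a detached worktree, run the agent with that worktree as its cwd, capture `git diff --stat` and the full `git diff`, then remove the worktree. The sketch below only builds the argument lists and wraps `git` in a thin helper; `planWorktree`, `WorktreePlan`, and the `agent-<index>` directory naming are hypothetical and not acpx's implementation.

```typescript
import { execFileSync } from "node:child_process";
import { join } from "node:path";

// Hypothetical plan for one agent's isolated worktree in --diff mode.
// Each string[] holds the arguments passed to `git`.
interface WorktreePlan {
  worktree: string;
  create: string[];
  diffStat: string[];
  fullDiff: string[];
  remove: string[];
}

function planWorktree(repoRoot: string, baseDir: string, index: number): WorktreePlan {
  // Index-based directory names stay unique even if one agent is listed twice.
  const worktree = join(baseDir, `agent-${index}`);
  return {
    worktree,
    // Detached checkout of the repository's current HEAD.
    create: ["-C", repoRoot, "worktree", "add", "--detach", worktree, "HEAD"],
    // Summary for the table's `diff` column.
    diffStat: ["-C", worktree, "diff", "--stat"],
    // Full patch to persist next to the transcript. Plain `git diff` does not
    // include untracked files that the agent created.
    fullDiff: ["-C", worktree, "diff"],
    // Cleanup once the diff has been captured.
    remove: ["-C", repoRoot, "worktree", "remove", "--force", worktree],
  };
}

// Runs git with the given arguments and returns stdout as text.
// Typical order: git(plan.create), run the agent in plan.worktree,
// git(plan.diffStat), git(plan.fullDiff), git(plan.remove).
function git(args: string[]): string {
  return execFileSync("git", args, { encoding: "utf8" });
}
```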
1 change: 1 addition & 0 deletions src/cli-core.ts
@@ -35,6 +35,7 @@ const TOP_LEVEL_VERBS = new Set([
"prompt",
"exec",
"cancel",
"compare",
"flow",
"set-mode",
"set",
2 changes: 2 additions & 0 deletions src/cli/command-registration.ts
@@ -15,6 +15,7 @@ import {
handleSetMode,
parseHistoryLimit,
} from "./command-handlers.js";
import { registerCompareCommand } from "./compare-command.js";
import { registerConfigCommand } from "./config-command.js";
import type { ResolvedAcpxConfig } from "./config.js";
import {
@@ -280,5 +281,6 @@ export function registerDefaultCommands(program: Command, config: ResolvedAcpxCo

registerSessionsCommand(program, undefined, config);
registerConfigCommand(program, config);
registerCompareCommand(program, config);
registerFlowCommand(program, config);
}