+Parallelism: safe , caution , serial
+Open questions for the operator (if any): ...
+```
+
+Then exit. Do not start any of the tasks you filed — that's the Builder's job.
+
+## Examples
+
+### Bug-hunt scan
+
+> "Find every `except: pass` in the Python codebase and file tasks to add proper error handling."
+
+You'd Glob `**/*.py`, Grep `except:\s*pass`, cluster by directory, and file maybe 5 tasks ("Fix bare except in `src/api/`", "Fix bare except in `src/db/`", ...) with risk=medium, parallelism=caution (sibling tasks may touch the same import lines).
+
+### Refactor scan
+
+> "We're migrating from class-based React components to hooks. Survey and propose a plan."
+
+You'd LS `src/components/`, Read 3–5 representative class components, identify common patterns (lifecycle methods, state shape, HOCs in use), and file tasks per cluster ("Migrate auth components", "Migrate dashboard components") with explicit dependency edges where one cluster's hook extraction is reused by the next.
+
+### Audit scan
+
+> "Survey the auth subsystem for OWASP top-10 issues."
+
+You'd Glob auth-related files, Grep for known anti-patterns (raw SQL, eval, shell=True, missing CSRF tokens), and file `subagent_type=auditor` tasks for each finding rather than Builders — auditors produce reports, not code.
diff --git a/plugins/swarm-orchestrator/agents/test-runner.md b/plugins/swarm-orchestrator/agents/test-runner.md
new file mode 100644
index 0000000000..1927d29928
--- /dev/null
+++ b/plugins/swarm-orchestrator/agents/test-runner.md
@@ -0,0 +1,76 @@
+---
+name: test-runner
+description: Swarm head that runs the project's test suite as a merge gate. Read + Bash (test runners only). Used by Merger before push, or as an explicit DAG node before review.
+tools: Bash, Read, Glob, Grep, LS, TodoWrite, TaskList, TaskUpdate
+model: sonnet
+color: red
+---
+
+You are a Test-Runner — the swarm's CI gate. You run the configured test suite, summarize the result, and update the calling task's status.
+
+## Mission
+
+1. **Read the test command.** From `.claude/swarm-orchestrator.json` (`merge.test_gate_command`), or auto-detect:
+ - Python: `pytest -q` if `pytest.ini`, `pyproject.toml`, or `setup.cfg` exists.
+ - Node: `npm test` if `package.json` has a `test` script.
+ - Rust: `cargo test`.
+ - Go: `go test ./...`.
+
+2. **Run the suite.** Bash invoke with a configurable timeout (default 30 min). Capture stdout/stderr.
+
+3. **Classify the result:**
+ - **Pass:** every test green. Set status `passed`.
+ - **Fail (real):** at least one test failed with a clear assertion error. Set status `failed`. Surface the first 3 failures with file:line.
+ - **Fail (flaky):** tests passed on retry. Set status `flaky` and log a warning.
+ - **Fail (infra):** the runner itself crashed (import error, missing dep, no Python). Set status `infra_error`. Don't blame the code.
+
+4. **One automatic retry on `Fail (flaky)` suspicion.** If the failure looks transient (network timeout, port-in-use, race condition keyword), retry once. If it passes, mark `flaky`. If it fails again, mark `failed`.
+
+5. **TaskUpdate.** Attach the test command, exit code, runtime, pass/fail counts. Don't paste the full log into the task — write it to `~/.claude/teams//test-logs/.log` and reference the path.
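The classification and single-retry rule in steps 3 and 4 can be sketched as a small pure function. This is only an illustration under stated assumptions: the keyword pattern, the `classify` and `run_again` names, and the reading of exit code 1 as "tests failed" (pytest's convention) are not taken from the plugin.

```python
import re

# Hypothetical keyword list, loosely based on the three example signals
# named in step 4 (network timeout, port-in-use, race condition).
TRANSIENT_HINTS = re.compile(r"timed? ?out|in[ -]use|race condition", re.IGNORECASE)


def looks_transient(output: str) -> bool:
    """Heuristic: does the failure output mention a transient-looking cause?"""
    return bool(TRANSIENT_HINTS.search(output))


def classify(first_exit: int, first_output: str, run_again) -> str:
    """Return one of: passed, failed, flaky, infra_error.

    run_again is a zero-argument callable that reruns the suite and
    returns its exit code. It is called at most once.
    """
    if first_exit == 0:
        return "passed"
    # Assumption: pytest-style exit codes, where 1 means "tests failed" and
    # any other non-zero code means the runner itself could not do its job.
    if first_exit != 1:
        return "infra_error"
    if looks_transient(first_output):
        return "flaky" if run_again() == 0 else "failed"
    return "failed"
```

Keeping the retry behind a callable means the decision logic can be exercised without running a real test suite.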
+
+## Hard constraints
+
+- **Bash for the test runner only.** You don't shell out to make code changes (`git commit`, `sed`, etc.). If a test relies on a missing dep, surface the gap; don't `pip install` to "fix" it.
+- **No code edits.** Even if the failure is obviously a one-line typo, you mark `failed` and the calling Builder fixes it.
+- **Read access for triage.** You can Read the failing test file and the source under test to produce a useful summary. That's it.
+- **Bounded output.** Test logs can be huge. Truncate to the first 3 failure blocks plus the summary line. Full log goes to disk.
+
+## Output format
+
+```
+TEST GATE — task —
+ runtime: 23.4s
+ exit code: 0
+ totals: 142 passed, 0 failed, 3 skipped
+ full log: ~/.claude/teams//test-logs/.log
+
+Result: PASSED
+```
+
+OR:
+
+```
+TEST GATE — task — pytest -q
+ runtime: 18.2s
+ exit code: 1
+ totals: 140 passed, 2 failed, 3 skipped
+
+ failure 1: tests/test_parser.py::test_visitor_dispatch
+ AssertionError: expected NodeKind.BIN, got NodeKind.UN
+ src/parser.py:142: in visit_binary
+
+ failure 2: tests/test_parser.py::test_visitor_unary
+ AssertionError: visitor missing for NodeKind.UN
+ src/parser.py:171: in visit_unary
+
+ full log: ~/.claude/teams//test-logs/.log
+
+Result: FAILED (real failures, not flaky)
+```
+
+Then TaskUpdate with the structured fields and exit. The Merger reads this and decides whether to push.
+
+## Notes
+
+- If the test command isn't configured and auto-detection finds nothing, set status `no_gate` and surface a warning. The operator can configure one or accept that this swarm has no gate.
+- Coverage thresholds, mutation tests, etc. are out of scope for v0 — this head just runs the suite. Future plugin versions can add a coverage / quality gate.
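As a rough sketch of the lookup order described in the Mission section (configured command first, then auto-detection, then `no_gate`), something like the following could work. The function name `detect_test_command` is hypothetical, and the `Cargo.toml` / `go.mod` marker files are an assumption: the text names the Rust and Go commands but not how those projects are recognized.

```python
import json
from pathlib import Path


def detect_test_command(root: Path):
    """Return a test command for the project at root, or None (the no_gate case)."""
    config = root / ".claude" / "swarm-orchestrator.json"
    if config.is_file():
        merge_cfg = json.loads(config.read_text()).get("merge", {})
        if merge_cfg.get("test_gate_command"):
            return merge_cfg["test_gate_command"]

    python_markers = ("pytest.ini", "pyproject.toml", "setup.cfg")
    if any((root / name).is_file() for name in python_markers):
        return "pytest -q"

    package = root / "package.json"
    if package.is_file():
        scripts = json.loads(package.read_text()).get("scripts", {})
        if "test" in scripts:
            return "npm test"

    # Assumption: these marker files identify Rust and Go projects.
    if (root / "Cargo.toml").is_file():
        return "cargo test"
    if (root / "go.mod").is_file():
        return "go test ./..."
    return None
```

A `None` result corresponds to the `no_gate` status described in the Notes.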
diff --git a/plugins/swarm-orchestrator/commands/swarm-abort.md b/plugins/swarm-orchestrator/commands/swarm-abort.md
new file mode 100644
index 0000000000..77f417b76b
--- /dev/null
+++ b/plugins/swarm-orchestrator/commands/swarm-abort.md
@@ -0,0 +1,40 @@
+---
+description: Drop an abort marker so a teammate commits WIP and exits cleanly
+argument-hint: [--team ] [--reason ""]
+allowed-tools: Write, Read, Bash
+---
+
+# /swarm-abort
+
+Gracefully interrupt a running teammate without losing in-progress work. Drops the abort marker file the teammate polls at every phase boundary; on detection the teammate commits current WIP, pushes, and exits cleanly.
+
+This is the **graceful alternative to `TaskStop`** (which is a hard kill that loses uncommitted work).
+
+## Inputs
+
+- **Teammate name** (positional, required): the name of the teammate to abort.
+- `--team ` (optional): the team the teammate belongs to. Default: infer from current session's team context.
+- `--reason ""` (optional): human-readable explanation written into the marker file. Useful for the teammate's commit message and the audit timeline.
+
+## Behavior
+
+1. Resolve the teammate's worktree path from the team config.
+2. Write `/.claude/abort-` with the reason payload + timestamp.
+3. Print confirmation + expected commit boundary (typically <2 min for an active teammate).
+4. **Does not block** — the teammate's next phase boundary check picks up the marker; the operator gets a `` when the WIP commit + push lands.
+
+## Example
+
+```
+/swarm-abort builder-2 --team refactor-pkg --reason "Going in wrong direction — type-hint approach won't work for the metaclass path. Will redispatch fresh."
+
+✓ Marker dropped at .ai/.claude/workspace/worktrees/agent-X/.claude/abort-builder-2
+ Expected commit: within ~2 min (Builder phase boundary cadence)
+ You'll receive a when the WIP commit lands.
+```
+
+## Notes
+
+- The abort contract is documented in every teammate's spawn prompt; new heads are expected to honor it.
+- If the marker is still present 5 min after it was dropped (the teammate never picked it up), the meta-supervisor escalates to `TaskStop` (hard kill).
+- Markers are namespaced per-teammate so aborting one doesn't affect siblings.
diff --git a/plugins/swarm-orchestrator/commands/swarm-merge.md b/plugins/swarm-orchestrator/commands/swarm-merge.md
new file mode 100644
index 0000000000..29f59e4e2b
--- /dev/null
+++ b/plugins/swarm-orchestrator/commands/swarm-merge.md
@@ -0,0 +1,130 @@
+---
+description: Run the merge pipeline for completed swarm tasks — rebase, test gate, push. Topo-orders by file overlap.
+argument-hint: " [--branch ] [--dry-run]"
+allowed-tools: ["Bash", "Read", "Glob", "Grep", "TaskList"]
+---
+
+# Swarm Merge
+
+Run the merge pipeline against every `completed` task in the named swarm: merge each branch into a staging copy of the target branch, run the configured test gate, push if green. If multiple branches are ready, compute their pairwise file overlap and merge in a topo-order that minimizes conflicts.
+
+**Args:** $ARGUMENTS
+
+## Workflow
+
+### 1. Discover candidate branches
+
+Read `~/.claude/teams//swarm-dag.json`. For every task with `status=completed` and a non-empty `branch` field that has not yet been merged into the target, add it to the candidate set.
+
+Skip tasks marked `needs_review`, `failed`, or `paused`.
+
+### 2. Compute merge order
+
+For each pair of candidate branches `(A, B)`, run:
+
+```
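+git diff --name-only main...A | sort -u > /tmp/A.files
+git diff --name-only main...B | sort -u > /tmp/B.files
+# Jaccard overlap = |A.files ∩ B.files| / |A.files ∪ B.files|
+inter=$(comm -12 /tmp/A.files /tmp/B.files | wc -l)
+union=$(sort -u /tmp/A.files /tmp/B.files | wc -l)
+overlap=$(awk -v i="$inter" -v u="$union" 'BEGIN { print (u ? i / u : 0) }')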
+```
+
+Build a directed graph: if `overlap(A, B) > threshold` (default 0.3), add a "merge B after A" edge ordered by branch age (older first). Topo-sort to get the merge sequence.
+
+If a cycle is detected (rare, since age-ordered edges can only loop when branch ages tie or are read inconsistently), break it by oldest-first and warn.
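The ordering step can be sketched in Python with the standard-library `graphlib` module. This is a minimal illustration, not the plugin's implementation: `merge_order`, `files_by_branch`, and `age_rank` are hypothetical names, and the changed-file sets are assumed to have been collected already (for example from `git diff --name-only`).

```python
from graphlib import TopologicalSorter


def jaccard(a: set, b: set) -> float:
    """Overlap ratio |a ∩ b| / |a ∪ b|, defined as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def merge_order(files_by_branch: dict, age_rank: dict, threshold: float = 0.3) -> list:
    """Order branches so that overlapping pairs merge oldest-first.

    files_by_branch maps branch name -> set of changed paths.
    age_rank maps branch name -> rank, lower meaning older (hypothetical input).
    """
    branches = sorted(files_by_branch, key=age_rank.__getitem__)
    sorter = TopologicalSorter({branch: set() for branch in branches})
    for i, older in enumerate(branches):
        for newer in branches[i + 1:]:
            if jaccard(files_by_branch[older], files_by_branch[newer]) > threshold:
                sorter.add(newer, older)  # newer must merge after older
    return list(sorter.static_order())
```

In this sketch every edge runs from an older branch to a newer one, so `static_order()` never raises `CycleError`; an implementation that orients edges differently would still need the cycle-breaking step.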
+
+### 3. For each branch in order
+
+Inside a fresh staging clone (so the user's checkout is untouched):
+
+```bash
+git fetch origin
+git checkout -B merge-staging origin/
+git merge --no-ff
+```
+
+If conflicts:
+- Mark the source task `needs_review` in `swarm-dag.json`.
+- Write a structured note to `~/.claude/teams//inboxes/team-lead.json`:
+ `{"from": "swarm-merge", "summary": "merge conflict on ", "files": [...]}`.
+- Skip to the next branch.
+
+If clean, run the test gate from the project's `.claude/swarm-orchestrator.json` (default: `pytest -q` if `pytest.ini` / `pyproject.toml` exists, else skip). On failure: same `needs_review` path. On success: continue.
+
+### 4. Push
+
+If the staging branch is green and ahead of origin/:
+
+```bash
+git push origin merge-staging:
+```
+
+(Or open a PR via `gh pr create` if the target is a protected branch — read the project config to decide.)
+
+Mark the task `merged` in `swarm-dag.json`. Fire the `worktree-gc` step.
+
+### 5. Worktree GC
+
+For every worktree on a branch now merged, run:
+
+```bash
+git worktree remove --force
+git branch -D
+```
+
+Log the cleanups. Because `--force` removes a worktree even when it has uncommitted changes, run with `--dry-run` first so the operator can inspect what would be discarded.
+
+## Dry-run mode
+
+`--dry-run` prints the planned actions without executing them:
+
+```
+Would merge in this order:
+ 1. feat/api-A → main (no overlap with siblings)
+ 2. feat/api-B → main (overlap 0.42 with feat/api-A; serialized after A)
+ 3. feat/ui-C → main (no overlap with API branches)
+
+Test gate: pytest -q (would run inside staging clone)
+Worktrees to GC after success: 3
+```
+
+## Examples
+
+### Merge all green tasks in a swarm
+
+```
+/swarm-merge my-refactor-team
+```
+
+### Merge only one specific branch
+
+```
+/swarm-merge my-refactor-team --branch feat/visitor-pattern
+```
+
+### Dry-run the topology
+
+```
+/swarm-merge my-refactor-team --dry-run
+```
+
+## Configuration
+
+`.claude/swarm-orchestrator.json`:
+
+```json
+{
+ "merge": {
+ "target_branch": "main",
+ "test_gate_command": "pytest -q",
+ "use_pr_for_protected_branches": true,
+ "file_overlap_threshold": 0.3,
+ "max_parallel_merges": 1
+ }
+}
+```
+
+## Notes
+
+- Merge runs strictly serially by default (`max_parallel_merges: 1`). Concurrent merges into the same target are rarely worth the conflict risk.
+- The staging clone lives at `~/.claude/teams//staging/` and is reused across runs.
+- All git operations are logged to `~/.claude/teams//merge-log.jsonl` for post-mortem.
diff --git a/plugins/swarm-orchestrator/commands/swarm-spawn.md b/plugins/swarm-orchestrator/commands/swarm-spawn.md
new file mode 100644
index 0000000000..aaa43d98bd
--- /dev/null
+++ b/plugins/swarm-orchestrator/commands/swarm-spawn.md
@@ -0,0 +1,148 @@
+---
+description: Spawn a swarm — a team plus a DAG of dependency-linked tasks dispatched to role-specific subagents.
+argument-hint: " [--heads scanner,builder,reviewer,merger] [--max-parallel N]"
+allowed-tools: ["Bash", "Read", "Write", "Edit", "Glob", "Grep", "TodoWrite", "Task", "TeamCreate", "TaskCreate", "TaskUpdate", "TaskList", "SendMessage"]
+---
+
+# Swarm Spawn
+
+Spawn a multi-agent swarm: a team with role-specific subagent heads, plus a dependency-aware task graph (DAG) where each task declares its `blockedBy` / `blocks` edges and only dispatches once its blockers complete.
+
+**Goal:** $ARGUMENTS
+
+## What this does vs. vanilla Teams
+
+| | Vanilla Teams | swarm-orchestrator |
+|---|---|---|
+| Task dispatch | One-at-a-time, manual | Topo-ordered, auto-cascade on blocker completion |
+| Roles | One generic `worker` agent type | Scanner / Reviewer / Builder / Merger / Test-Runner / Auditor with tool-restricted prompts |
+| Graceful exit | Implicit | Standard `/.claude/abort-` marker |
+| Worktree cleanup | Manual `git worktree remove` | `swarm-orchestrator:worktree-gc` hook |
+| Parallel safety | None | `file_overlap_check` before fan-out |
+
+## Workflow
+
+### Phase 1: Decompose the goal into a DAG
+
+Read the goal carefully, then plan the work as nodes + edges:
+
+1. **Identify the heads needed.** Default loadout: 1 Scanner, N Builders, 1 Reviewer, 1 Merger. Add Test-Runner if the repo has a CI suite, Auditor if the goal is research / fact-finding.
+
+2. **Sketch the task graph.** Each node has:
+ - `id` — short slug (`scan-codebase`, `impl-feature-x`, `merge-pr-12`)
+ - `head` — which subagent type runs it (`scanner` / `builder` / `reviewer` / `merger` / `test-runner` / `auditor`)
+ - `description` — concrete deliverable, with exit criteria
+ - `blockedBy` — list of task ids that must complete first
+ - `blocks` — list of task ids this unblocks (optional, derivable from inverse)
+ - `parallelism_safety` — `safe` / `caution` / `serial` (default `caution`)
+ - `safe` → can run alongside any sibling
+ - `caution` → check `file_overlap` against running siblings before dispatching
+ - `serial` → must run alone in its layer
+
+3. **Show the plan to the operator and wait for approval.** Print the DAG as ASCII (boxes + arrows). Do not start dispatching until the operator approves or amends.
+
+### Phase 2: Create the team + tasks
+
+Once approved:
+
+1. Call `TeamCreate` with the team name + brief description.
+2. For each DAG node, call `TaskCreate` with:
+ - `subagent_type` from the head mapping
+ - `prompt` from the description
+ - `blockedBy` array on TaskCreate (vanilla Teams supports it; this plugin makes it first-class)
+3. Persist the DAG to `~/.claude/teams//swarm-dag.json` so `/swarm-status` and `/swarm-merge` can read it.
+
+### Phase 3: Dispatch the unblocked frontier
+
+1. Compute `TaskList.unblocked()` — tasks whose `blockedBy` is empty or all-completed.
+2. For each unblocked task, run `file_overlap_check` against currently in-progress siblings:
+ - Estimate touched files from the task description (best effort; ask the head to declare them in its first turn).
+ - If overlap > threshold AND `parallelism_safety != safe`, hold the task in `pending` and log a reason.
+3. Dispatch the rest (up to `--max-parallel`, default 4) by sending the start prompt to each head's subagent.
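The frontier computation in Phase 3 can be illustrated with a short sketch. It assumes tasks are held in a dict keyed by id and that `--max-parallel` caps the total number of tasks running at once; both are assumptions, and the `file_overlap_check` hold is deliberately left out.

```python
def unblocked(tasks: dict) -> list:
    """Return ids of pending tasks whose blockers have all completed.

    tasks maps id -> {"status": str, "blockedBy": [ids]} (hypothetical shape).
    """
    return [
        task_id
        for task_id, task in tasks.items()
        if task["status"] == "pending"
        and all(tasks[dep]["status"] == "completed" for dep in task.get("blockedBy", []))
    ]


def next_dispatch(tasks: dict, max_parallel: int = 4) -> list:
    """Pick unblocked tasks to start without exceeding max_parallel running."""
    running = sum(1 for task in tasks.values() if task["status"] == "in_progress")
    free_slots = max(0, max_parallel - running)
    return unblocked(tasks)[:free_slots]
```

A task with an empty `blockedBy` list is unblocked immediately, because `all()` over an empty sequence is true.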
+
+### Phase 4: Watch the cascade
+
+The plugin's `on-task-complete` hook (see `hooks/on_task_complete.py`) re-evaluates the frontier whenever any task hits `status=completed`. New unblocked tasks dispatch automatically. The operator can interrupt with `/swarm-status pause`.
+
+## Heads reference
+
+- **scanner** — read-only; finds work and files new tasks. Use for "look at the repo, find N issues to fix" framings.
+- **reviewer** — read-only; runs every N turns inside long-lived builders to do a self-review (DAG status / commits / spend / tractability). Configurable via the `reviewer-checkpoint` hook.
+- **builder** — full toolkit; the default worker for "make a change."
+- **merger** — Bash + git only; runs the merge pipeline (rebase + test gate + push).
+- **test-runner** — read + Bash (test runners only, e.g. pytest / npm test); gates merges.
+- **auditor** — read-only; produces audit docs without touching the tree.
+
+## Abort contract
+
+Every spawned head reads `/.claude/abort-` between phases. If the file exists, the head commits any WIP, pushes, and exits cleanly. The orchestrator surfaces the abort in `/swarm-status` and routes the partial result back into the DAG (typically marking the task `needs_review` rather than `completed`).
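From the head's side, the contract amounts to a check between phases. The sketch below assumes the marker path follows the `.claude/abort-{name}` pattern from the configuration section, resolved relative to the head's worktree; `run_phases` and `on_abort` are hypothetical names, and the WIP commit and push are left to the caller.

```python
from pathlib import Path


def abort_marker(worktree: Path, name: str, pattern: str = ".claude/abort-{name}") -> Path:
    """Resolve the marker path for one teammate inside its worktree."""
    return worktree / pattern.format(name=name)


def should_abort(worktree: Path, name: str) -> bool:
    """True once someone has dropped the marker file for this teammate."""
    return abort_marker(worktree, name).exists()


def run_phases(worktree: Path, name: str, phases, on_abort) -> list:
    """Run phases in order, checking for the marker before each one.

    Returns the names of the phases that ran. on_abort is where a real head
    would commit WIP and push (not implemented here).
    """
    completed = []
    for phase in phases:
        if should_abort(worktree, name):
            on_abort()
            break
        phase()
        completed.append(phase.__name__)
    return completed
```

Because the check happens only at phase boundaries, a phase that is already running finishes before the abort takes effect.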
+
+## Worktree GC
+
+After every successful merge, the `on-task-complete` hook fires `swarm-orchestrator:worktree-gc`, which:
+
+1. Lists `git worktree list --porcelain`.
+2. For each worktree, checks if its branch is merged into the team's target branch (default: `main`).
+3. Removes merged worktrees with `git worktree remove --force`.
+
+Failures are logged but do not block dispatch.
+
+## Examples
+
+### Refactor a Python module
+
+```
+/swarm-spawn Refactor src/core/parser.py to use the visitor pattern; add tests; merge in one PR.
+```
+
+Likely DAG (the command will propose it; you approve):
+
+```
+[scan-parser] ──► [design-visitor] ──► [impl-visitor] ──► [add-tests] ──► [review] ──► [merge]
+   scanner            builder             builder           builder       reviewer     merger
+```
+
+### Multi-feature batch
+
+```
+/swarm-spawn Land features A, B, C in parallel; A and B touch /api/, C touches /ui/. Single test gate before any merge.
+```
+
+Likely DAG:
+
+```
+[scan] ─► ┌─[impl-A]─┐
+          ├─[impl-B]─┤ ──► [test] ──► [review] ──► [merge]
+          └─[impl-C]─┘
+```
+
+A and B will be dispatched serially (file overlap on /api/) while C runs in parallel.
+
+### Audit-only run
+
+```
+/swarm-spawn Audit the auth subsystem for OWASP top-10 issues; produce a report at docs/audits/auth-2026-Q2.md. No code changes.
+```
+
+DAG: just one auditor node. The plugin ensures the head has read-only tools.
+
+## Configuration
+
+User-overridable via `.claude/swarm-orchestrator.json` in the project root:
+
+```json
+{
+ "max_parallel": 4,
+ "default_target_branch": "main",
+ "reviewer_checkpoint_every_n_turns": 3,
+ "abort_marker_pattern": ".claude/abort-{name}",
+ "worktree_gc_on_merge": true,
+ "file_overlap_threshold": 0.3
+}
+```
+
+## Notes
+
+- DAG state lives at `~/.claude/teams//swarm-dag.json` (atomic tmp+rename writes).
+- `TaskList.unblocked()` is computed every dispatch; cheap (< 10ms for graphs of < 1000 nodes).
+- The plugin does NOT replace vanilla Teams — every artifact is a standard Team / Task / SendMessage record. You can inspect the swarm with `/teams` exactly as before.
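The "atomic tmp+rename" write mentioned in the first note is a common pattern. A minimal sketch, with `write_json_atomic` as a hypothetical helper name: write to a temporary file in the same directory, flush it to disk, then rename it over the target so readers never observe a half-written file.

```python
import json
import os
import tempfile


def write_json_atomic(path: str, data: dict) -> None:
    """Write data as JSON so readers see either the old file or the new one."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live on the same filesystem for the rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            json.dump(data, handle, indent=2)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_path, path)  # atomic replace of the target path
    except BaseException:
        os.unlink(tmp_path)  # do not leave a stray temp file behind
        raise
```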
diff --git a/plugins/swarm-orchestrator/commands/swarm-start.md b/plugins/swarm-orchestrator/commands/swarm-start.md
new file mode 100644
index 0000000000..c71e8b6eb4
--- /dev/null
+++ b/plugins/swarm-orchestrator/commands/swarm-start.md
@@ -0,0 +1,71 @@
+---
+description: Start the keepalive supervisor daemon — survives Claude Code exit, picks up new tasks live.
+argument-hint: "[--home ] [--conductor stub|claude]"
+allowed-tools: ["Bash"]
+---
+
+# /swarm-start — keepalive supervisor daemon
+
+Launches `claude-swarm run --daemon` so the supervisor lives **outside** the Claude Code process tree. Exit the CLI and the daemon keeps polling the kanban, claiming tasks, and dispatching workers. Use `claude --resume` later and the daemon is still running.
+
+## What this does
+
+1. Ensures `~/.claude/swarm/` (the default keepalive home) exists.
+2. Calls `claude-swarm init --home ~/.claude/swarm` (idempotent).
+3. Calls `claude-swarm run --home ~/.claude/swarm --daemon --conductor claude --global-mind-log ~/.claude/swarm/global-mind.jsonl`.
+4. Prints the daemon's PID, log path, and the stop command.
+
+The conductor defaults to `claude`: real claude-swarm agents, each dispatched via `claude --print`. This is what the operator typically wants for session-resistant runs. Override with `--conductor stub` for free smoke testing (no agents spawned).
+
+## Bash to run
+
+```sh
+HOME_DIR="${1:-$HOME/.claude/swarm}"
+CONDUCTOR="${2:-claude}"
+mkdir -p "$HOME_DIR"
+claude-swarm init --home "$HOME_DIR" 2>/dev/null || true
+claude-swarm run \
+ --home "$HOME_DIR" \
+ --daemon \
+ --conductor "$CONDUCTOR" \
+ --global-mind-log "$HOME_DIR/global-mind.jsonl"
+```
+
+After this returns, the daemon is running detached. Verify with:
+
+```sh
+claude-swarm daemon-status --home ~/.claude/swarm
+```
+
+## Why a daemon?
+
+The native `Agent` tool spawns subprocesses of the Claude Code binary; they die when you exit the CLI. The swarm daemon is a separate Python process (single-fork + setsid + IO redirection) that:
+
+- Survives the parent shell exiting
+- Survives `claude --resume` (because it isn't tied to a specific session)
+- Picks up tasks submitted via `/swarm-spawn`, `/swarm-submit`, or directly via `claude-swarm submit`
+- Dispatches each task by shelling out to `claude --print`, so the workers themselves also survive your CLI exit
+
+This is the session-resistance property the plugin ships. The "Designed but deferred" meta-supervisor (multi-host respawn + pattern detection) is the next-iteration layer on top of this.
+
+## Bridging native Claude Teams agents to the daemon
+
+A native Agent (spawned by the binary's `Agent` tool) can register a long-running task with the daemon and exit, instead of doing the work itself. The Agent's prompt should be:
+
+> Submit a kanban task to the keepalive swarm via Bash:
+>
+> ```sh
+> claude-swarm submit --home ~/.claude/swarm \
+> --title "your-task-title" --prompt "your-prompt" --head builder
+> ```
+>
+> Capture the printed task id, write it to the team's inbox, then exit. The daemon will pick up the task; results land back in the inbox when done.
+
+This makes "native agent" and "swarm worker" share a single contract: the filesystem kanban + inbox. The native agent is the front-end (interactive, in your CLI), the daemon-spawned worker is the back-end (long-running, session-resistant).
+
+## Notes
+
+- The daemon's log: `~/.claude/swarm/state/daemon.log`
+- The PID file: `~/.claude/swarm/state/supervisor.pid`
+- Stop with `/swarm-stop` or `claude-swarm daemon-stop --home ~/.claude/swarm`
+- Restart caveat: re-running this command while the daemon is already alive does nothing destructive, but it does spawn a fresh fork. Run `/swarm-stop` first if you want a clean restart.
diff --git a/plugins/swarm-orchestrator/commands/swarm-status.md b/plugins/swarm-orchestrator/commands/swarm-status.md
new file mode 100644
index 0000000000..49715af53e
--- /dev/null
+++ b/plugins/swarm-orchestrator/commands/swarm-status.md
@@ -0,0 +1,76 @@
+---
+description: Show the swarm's current state — DAG topology, head activity, blockers, abort markers, token spend.
+argument-hint: "[team-name] [--json] [--watch]"
+allowed-tools: ["Bash", "Read", "Glob", "Grep", "TaskList"]
+---
+
+# Swarm Status
+
+Show the current state of every running swarm, or a specific one if `team-name` is given.
+
+**Args:** $ARGUMENTS
+
+## What you see
+
+```
+keepalive daemon: alive (pid 91168, log ~/.claude/swarm/state/daemon.log)
+swarm: target: main heads alive: 3 / 4
+DAG (12 tasks, 8 done, 2 in_progress, 2 blocked)
+
+ [scan-parser] done scanner 1.2k tok $0.018
+ [design-visitor] done builder 8.4k tok $0.126
+ [impl-visitor] in_progress builder ~12k tok $0.180 3m elapsed
+ [add-tests] in_progress builder ~6k tok $0.090 3m elapsed
+ [review] blocked reviewer - - waits on impl-visitor, add-tests
+ [merge] blocked merger - - waits on review
+
+abort markers: none
+worktrees: 4 active, 2 stale (will GC on next completion)
+spend so far: $1.42 (cap: $5.00) token total: 94.3k
+last cascade: 2026-05-10 13:42 UTC (2m ago)
+```
+
+The first line (keepalive daemon liveness) is critical: if it shows `dead` or `no pid file`, the swarm isn't picking up new tasks. Restart with `/swarm-start`.
+
+## Workflow
+
+1. **Check the keepalive daemon FIRST.** Run `claude-swarm daemon-status --home ~/.claude/swarm` and surface alive/dead at the top. If dead, suggest `/swarm-start`.
+
+2. **Locate state.** Read `~/.claude/teams//swarm-dag.json`. If the file is missing, fall back to `TaskList(team=)` and reconstruct the topology from `blockedBy` fields on each task. Also surface the keepalive kanban via `claude-swarm list --home ~/.claude/swarm`.
+
+3. **For each task, render:**
+ - id, head (`subagent_type`), status
+ - cumulative tokens + dollars (from `~/.claude/teams//cost-ledger.jsonl` if present)
+ - elapsed time since `dispatched_at` for `in_progress` tasks
+ - blockers list for `blocked` / `pending` tasks
+
+4. **Surface meta-state:**
+ - active worktrees (`git worktree list --porcelain | head`)
+ - abort markers present (`find ~/.claude/teams//worktrees -name 'abort-*'`)
+ - spend rollup vs. configured cap
+
+5. **`--watch`:** redraw every 5 seconds (clear screen + reprint). Exit on Ctrl+C.
+
+6. **`--json`:** dump the structured state to stdout, no formatting.
+
+## Status taxonomy
+
+- `pending` — created but not yet eligible (blockers incomplete) or held by parallelism guard
+- `in_progress` — dispatched, head is running
+- `completed` — head reported done and the completion hook fired; a task with a branch is now a candidate for `/swarm-merge`, which marks it `merged` once pushed
+- `needs_review` — head exited via abort marker or test gate failed; operator must inspect
+- `failed` — terminal error (head crashed, hard rate-limit, budget cap hit)
+- `blocked` — explicit `blockedBy` task is not yet completed
+
+## Useful follow-ups
+
+- `/swarm-status pause ` — set the task's status to `paused`; the cascade will skip it.
+- `/swarm-status resume ` — flip back to `pending`; cascade re-evaluates.
+- `/swarm-status cancel ` — write the abort marker for the head; the head commits WIP and exits.
+- `/swarm-merge ` — kick off the merge pipeline for any `completed` tasks with branches.
+- `/swarm-status replay ` — print the timeline of every state transition for post-mortem.
+
+## Notes
+
+- This command is read-only — it never mutates state except via the explicit `pause` / `resume` / `cancel` subcommands.
+- Cost numbers are best-effort estimates; the source of truth is each provider's billing dashboard.
diff --git a/plugins/swarm-orchestrator/commands/swarm-stop.md b/plugins/swarm-orchestrator/commands/swarm-stop.md
new file mode 100644
index 0000000000..49acb5e8df
--- /dev/null
+++ b/plugins/swarm-orchestrator/commands/swarm-stop.md
@@ -0,0 +1,41 @@
+---
+description: Stop the keepalive supervisor daemon. SIGTERM, escalates to SIGKILL after timeout.
+argument-hint: "[--home ] [--timeout-s N]"
+allowed-tools: ["Bash"]
+---
+
+# /swarm-stop — stop the keepalive daemon
+
+Sends `SIGTERM` to the running supervisor daemon, waits up to `--timeout-s` (default 5s), then `SIGKILL` if still alive. Removes the PID file.
+
+## Bash to run
+
+```sh
+HOME_DIR="${1:-$HOME/.claude/swarm}"
+TIMEOUT="${2:-5}"
+claude-swarm daemon-stop --home "$HOME_DIR" --timeout-s "$TIMEOUT"
+```
+
+Output is structured JSON:
+
+```json
+{ "stopped": true, "pid": 91168, "method": "SIGTERM" }
+```
+
+or, if escalation was needed:
+
+```json
+{ "stopped": true, "pid": 91168, "method": "SIGKILL", "reason": "didn't exit within timeout" }
+```
+
+## What happens to in-flight tasks?
+
+Tasks the daemon was actively dispatching get killed mid-flight: their `claude --print` subprocesses are children of the daemon and are taken down with it. On the next `/swarm-start`, the supervisor's `wait_for_work` loop will see those tasks still marked `in_progress` and will not re-claim them automatically; you'll need to `claude-swarm submit` them again manually or write a small re-dispatcher.
+
+The "stuck-task watchdog" that auto-re-dispatches `in_progress > 30 min` tasks is in the deferred follow-up; see `IMPROVEMENTS_OVER_VANILLA_TEAMS.md`.
+
+## Notes
+
+- The PID file gets cleaned up automatically; safe to re-run.
+- If you want a graceful drain instead, use the abort-marker pattern: drop `/.claude/abort-` for each running head, wait for them to commit WIP + exit, then `/swarm-stop`.
+- After stop, the kanban + global-mind log persist on disk; re-launching the daemon picks up the existing state.
diff --git a/plugins/swarm-orchestrator/commands/swarm-submit.md b/plugins/swarm-orchestrator/commands/swarm-submit.md
new file mode 100644
index 0000000000..4337b9eaf6
--- /dev/null
+++ b/plugins/swarm-orchestrator/commands/swarm-submit.md
@@ -0,0 +1,85 @@
+---
+description: Submit a single task to the keepalive swarm kanban. Daemon picks it up and dispatches via `claude --print`.
+argument-hint: " [--head builder|scanner|reviewer|merger|test-runner|auditor] [--title ]"
+allowed-tools: ["Bash"]
+---
+
+# /swarm-submit — single-task submission to the keepalive swarm
+
+Submits one free-form task to the running daemon's kanban. The daemon's `wait_for_work` loop claims it and dispatches it via `claude --print` (session-resistant — survives your CLI exit).
+
+**Prompt:** $ARGUMENTS
+
+## Prerequisite
+
+The daemon must be running. Check with `/swarm-status` or start with `/swarm-start`.
+
+## What this does
+
+1. Parses $ARGUMENTS into prompt + optional --head + optional --title
+2. Calls `claude-swarm submit --home ~/.claude/swarm --title "" --prompt "" --head `
+3. Prints the new task id
+4. Reminds the operator how to inspect progress (`/swarm-status` / `claude-swarm list`)
+
+## Bash to run
+
+```sh
+# Parse the user's $ARGUMENTS — first positional becomes the prompt, --head and --title are optional
+HEAD="builder"
+TITLE=""
+PROMPT=""
+# (Implementation: claude reads $ARGUMENTS and constructs the call. See "Note for Claude" below.)
+claude-swarm submit \
+ --home "$HOME/.claude/swarm" \
+ --title "${TITLE:-${PROMPT:0:60}}" \
+ --prompt "$PROMPT" \
+ --head "$HEAD"
+```
+
+## Note for Claude (the assistant invoking this command)
+
+When the operator invokes this command, you (Claude) should:
+
+1. Parse $ARGUMENTS — interpret leading text as the prompt, recognize `--head ` and `--title ""` flags
+2. If no `--title` was given, derive one from the first 60 chars of the prompt
+3. Run the Bash above with the parsed values
+4. Echo the returned task id to the operator with one line of context: "Submitted task `` to the keepalive swarm — daemon will dispatch it shortly."
+
+## Example uses
+
+```
+# Quick 30-second background sleep — test session-resistance
+/swarm-submit "sleep 30; echo 'still alive!' > /tmp/swarm-keepalive-proof.txt" --head builder --title "keepalive sanity check"
+
+# Real work — let the daemon do the audit while you go to lunch
+/swarm-submit "Audit ./src for unused imports. Return a list of file:line to delete." --head auditor
+
+# Multi-step task that may take an hour
+/swarm-submit "Run the full integration test suite on this branch. If anything fails, summarize the top 3 root causes." --head test-runner
+```
+
+## After submission
+
+- `/swarm-status` — see daemon liveness + DAG topology + head activity
+- `claude-swarm list --home ~/.claude/swarm` — every task with status + head + title
+- `claude-swarm list --home ~/.claude/swarm --status done` — filter by status
+- `claude-swarm status --home ~/.claude/swarm` — JSON snapshot of kanban + supervisor state
+- `claude-swarm unblocked --home ~/.claude/swarm` — the topological frontier (ready-to-dispatch tasks)
+- `tail -f ~/.claude/swarm/global-mind.jsonl | jq .` — live event stream (one JSONL line per supervisor dispatch)
+
+## Session-resistance contract
+
+After you submit, you can:
+1. Exit Claude Code (`/exit` or close terminal)
+2. Wait
+3. Come back via `claude --resume` (or just a fresh `claude`)
+4. The task continues running the whole time — the daemon's subprocess is in a different process group
+5. Run `/swarm-status` or `claude-swarm list` and see the task is `done` (or still `in_progress`)
+
+This is what makes the swarm session-resistant. The "agent" in this model is the daemon-dispatched `claude --print` subprocess, not an in-session Agent-tool spawn.
+
+## Limits
+
+- One prompt per submission. For multi-task DAGs, use `/swarm-spawn`.
+- Default head is `builder`; specify `--head` to use a role-typed agent (Scanner / Reviewer / etc.).
+- Cost is whatever the dispatched `claude --print` consumes; daemon enforces `cost_cap_usd` from `SupervisorConfig` (default $10 per supervisor run).
diff --git a/plugins/swarm-orchestrator/commands/swarm-test.md b/plugins/swarm-orchestrator/commands/swarm-test.md
new file mode 100644
index 0000000000..37cebe5eb0
--- /dev/null
+++ b/plugins/swarm-orchestrator/commands/swarm-test.md
@@ -0,0 +1,44 @@
+---
+description: Spin up a demo swarm team and populate the native Teams agent-list view with role-typed heads — proves the integration works.
+argument-hint: [team-name]
+allowed-tools: TeamCreate, TaskCreate, Agent, Bash, Read
+---
+
+# /swarm-test
+
+A fast demonstration of swarm-orchestrator integrated with native Anthropic Teams.
+
+Spins up a team called `swarm-test-` (or your provided name), files a 5-task DAG that exercises every role-typed head (Scanner / Builder / Test-Runner / Reviewer / Merger), and dispatches them as native Anthropic team members. The agents appear in the native CLI's agent-list view (the minimal `● main / ○ teammate-name` list at the bottom of the screen) so you can see swarm-orchestrator integrating cleanly with the binary's own surface.
+
+## What you'll see
+
+After running `/swarm-test`:
+
+1. **Native Teams view populated**: the agent list shows the spawned heads — `scanner`, `builder`, `test-runner`, `reviewer`, `merger` — each with their runtime + token usage tracked by the binary's own accounting.
+2. **DAG status surfaces**: tasks show `pending` / `blocked` / `in_progress` / `done` in the task list panel; the auto-cascade hook (`PostToolUse(TaskUpdate)`) re-evaluates the frontier on every completion.
+3. **Role-specific tool access**: each head only has the tools its frontmatter allowlist permits — Reviewer is read-only, Merger is Bash + git, etc.
+4. **Inbox traffic** between heads is visible via the native `SendMessage` tool; the plugin layers cross-team routing on top of it.
+
+## Usage
+
+```
+/swarm-test # spawns a team named swarm-test-
+/swarm-test my-demo # spawns a team named "my-demo"
+```
+
+## How it relates to the standalone library
+
+This command exercises **Mode B** (integrated with Anthropic Teams) from `IMPROVEMENTS_OVER_VANILLA_TEAMS.md`. The same workflow also runs standalone via `bash plugins/swarm-orchestrator/scripts/try-swarm.sh` (Mode A) — same DAG, same heads, but using the `claude-swarm` library's filesystem-backed task list instead of Anthropic's `Task*` tools.
+
+Both modes are tested:
+- Mode A: `python3 plugins/swarm-orchestrator/tests/swarming/run_scenario.py --all` (10/10 pass)
+- Mode B: `/swarm-test` after the plugin is loaded; results visible in the native Teams agent list
+
+## Cleanup
+
+```
+/swarm-status # see the populated team
+/swarm-abort # graceful exit for any specific teammate
+```
+
+The team is left in place after the demo so the agent list keeps showing it; delete it via `TeamDelete` (native built-in) when done.
diff --git a/plugins/swarm-orchestrator/examples/feature_with_review.md b/plugins/swarm-orchestrator/examples/feature_with_review.md
new file mode 100644
index 0000000000..a04ef38e93
--- /dev/null
+++ b/plugins/swarm-orchestrator/examples/feature_with_review.md
@@ -0,0 +1,59 @@
+# Example 2: Build a feature with tests + review
+
+Goal: add a `--dry-run` flag to a CLI deploy script, with tests for the flag's behavior and an end-of-task review.
+
+## Spawn
+
+```
+/swarm-spawn Add a --dry-run flag to cli/deploy.py — prints planned operations, executes nothing. Tests for both with-flag and without-flag paths. Land in one PR.
+```
+
+## DAG the swarm proposes
+
+```
+ scan-deploy ──► impl-flag ──► add-tests ──► review ──► merge
+```
+
+A simple linear chain — small feature, no fan-out needed.
+
+| Task | Head | Notes |
+|---|---|---|
+| `scan-deploy` | scanner | Reads `cli/deploy.py`, identifies the side-effecting calls that need to be guarded. Files concrete tasks. |
+| `impl-flag` | builder | Adds the argparse flag, refactors `execute(...)` to take a `dry_run: bool`, gates side-effects. |
+| `add-tests` | builder | Writes tests for both paths against the existing test infra. |
+| `review` | reviewer | Reviews for: missed side-effect, log format consistency, doc updates. |
+| `merge` | merger | Runs `pytest -q`, then pushes. |
+
+## Why this DAG and not parallel
+
+For small features, serial is faster than parallel — the coordination cost of fan-out (worktree creation, file-overlap check, sibling sync) outweighs the speedup when each step is < 5 minutes anyway.
+
+## What review surfaces
+
+The reviewer agent (read-only) inspects the diff and produces something like:
+
+```
+REVIEWER end-of-task — task review
+
+Files changed: cli/deploy.py (+18/-3), tests/test_deploy.py (+42/-0)
+Commits: 3 (feat: argparse flag / refactor: thread dry_run / test: dry-run path)
+
+Findings (confidence ≥ 80):
+1. [conf 92] cli/deploy.py:142 — log message reads "Deploying X" even in dry-run.
+ Suggest: prefix with "[DRY-RUN]" when dry_run=True.
+
+2. [conf 84] tests/test_deploy.py:67 — test asserts on log output but uses
+ capsys without capturing stderr. Add capsys.readouterr().err to the assert.
+
+Otherwise: clean. Tests cover both paths. Docstring updated.
+
+Recommendation: address both, then merge.
+```
+
+The Builder picks these up and makes 2 more commits (`refactor: log prefix in dry-run`, `test: capture stderr in deploy tests`), then `review` runs again. Once `review` returns clean, `merge` fires automatically.
+
+## Expected outcome
+
+- One PR with 4–5 commits.
+- ~20–60k tokens total spend.
+- 10–30 minutes wall time.
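The `impl-flag` row in `feature_with_review.md` describes an argparse flag plus an `execute(...)` that takes `dry_run: bool` and gates side effects. A minimal sketch of that shape follows. The operation list, `build_parser`, and `main` are hypothetical names chosen for illustration; the real `cli/deploy.py` is not shown in this diff. The `[DRY-RUN]` prefix follows the reviewer's first finding.

```python
from __future__ import annotations

import argparse
from typing import Callable


def execute(operations: list[tuple[str, Callable[[], None]]], dry_run: bool) -> list[str]:
    """Run each operation, or only describe it when dry_run is True."""
    log: list[str] = []
    for name, action in operations:
        if dry_run:
            log.append(f"[DRY-RUN] would run: {name}")
            continue  # the side effect is skipped entirely
        action()
        log.append(f"ran: {name}")
    return log


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="deploy")
    parser.add_argument("--dry-run", action="store_true",
                        help="print planned operations, execute nothing")
    return parser


def main(argv: list[str]) -> list[str]:
    args = build_parser().parse_args(argv)
    touched: list[str] = []  # stands in for real side effects
    ops = [("upload", lambda: touched.append("upload")),
           ("restart", lambda: touched.append("restart"))]
    return execute(ops, dry_run=args.dry_run)


for line in main(["--dry-run"]):
    print(line)
```

Returning the log instead of printing inside `execute` keeps both paths easy to assert on, which is what the `add-tests` task needs.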
diff --git a/plugins/swarm-orchestrator/examples/multi_day_audit.md b/plugins/swarm-orchestrator/examples/multi_day_audit.md
new file mode 100644
index 0000000000..ef7884acfa
--- /dev/null
+++ b/plugins/swarm-orchestrator/examples/multi_day_audit.md
@@ -0,0 +1,64 @@
+# Example 3: Multi-day audit
+
+Goal: produce a comprehensive complexity + security audit of an existing codebase, ending in a markdown report at `docs/audits/`. No code changes.
+
+## Spawn
+
+```
+/swarm-spawn Audit src/auth/ for OWASP top-10 issues AND src/core/ for cyclomatic complexity > 15. Produce two separate audit docs at docs/audits/. No code changes — research only.
+```
+
+## DAG the swarm proposes
+
+```
+ scan-targets ──┬──► owasp-audit ──────┐
+                └──► complexity-audit ─┴──► consolidate-summary
+```
+
+| Task | Head | Notes |
+|---|---|---|
+| `scan-targets` | scanner | Maps the territory of `src/auth/` and `src/core/`, files the two audit tasks with precise scope. |
+| `owasp-audit` | auditor | Read-only deep dive into auth code; produces `docs/audits/auth-owasp-2026-05-10.md`. |
+| `complexity-audit` | auditor | Read-only complexity survey of core; produces `docs/audits/core-complexity-2026-05-10.md`. |
+| `consolidate-summary` | auditor | Reads both audits, produces a top-level `docs/audits/2026-05-10-summary.md` with priority-ranked findings across both. |
+
+## Why two audits in parallel
+
+`src/auth/` and `src/core/` don't overlap (file overlap = 0), so `parallelism_safety=safe` and the two auditors run concurrently. The orchestrator dispatches both as soon as `scan-targets` completes.
+
+## Why no `merge`
+
+The deliverable is markdown, not code. The Auditor head writes its `.md` files directly into the working tree. There's no merge gate because there's nothing to merge — the operator commits the audit docs by hand (or via a subsequent `/commit-push-pr`), or the swarm can be configured with a final builder step that does the commit.
+
+If the operator does want auto-commit:
+
+```
+/swarm-spawn Audit src/auth/ ... AND commit the resulting docs to a branch + PR.
+```
+
+Then the DAG becomes:
+
+```
+ scan-targets ──┬──► owasp-audit ──────────┐
+                └──► complexity-audit ──┐  │
+                                        ▼  ▼
+                                     commit-docs ──► merge
+```
+
+with a Builder at `commit-docs` (Bash + Edit only — git add the audit docs, write a commit message, push) and a Merger after.
+
+## Expected outcome
+
+- 2–3 markdown files at `docs/audits/`, each 200–600 lines, every finding citing file:line evidence.
+- ~80–200k tokens total spend (audit work is read-heavy and Opus-tier).
+- 1–4 hours wall time depending on codebase size.
+
+## Pattern: long-running audits
+
+For very large codebases, you can split each audit into N sub-audits by directory and chain them serially or in batches:
+
+```
+ scan-targets ──► [audit-auth-1, audit-auth-2, ..., audit-auth-N] ──► consolidate-auth ──► ...
+```
+
+Each sub-auditor produces a partial doc; `consolidate-auth` merges them into the final report. Useful when one auditor session would blow the context window or budget.
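The parallel-audit reasoning in `multi_day_audit.md` (file overlap = 0, therefore `parallelism_safety=safe`) can be sketched as a set intersection. This is an illustration under stated assumptions: `file_overlap` and `parallelism_safety` are hypothetical helpers, the file names are made up, and the mapping of any overlap to `serial` is a simplification that ignores the intermediate `caution` level.

```python
from __future__ import annotations


def file_overlap(scope_a: set[str], scope_b: set[str]) -> set[str]:
    """Files that both tasks intend to touch."""
    return scope_a & scope_b


def parallelism_safety(scope_a: set[str], scope_b: set[str]) -> str:
    """'safe' when the scopes are disjoint, otherwise 'serial' (simplified)."""
    return "safe" if not file_overlap(scope_a, scope_b) else "serial"


# Hypothetical scopes for the two audit tasks.
auth_scope = {"src/auth/login.py", "src/auth/tokens.py"}
core_scope = {"src/core/engine.py", "src/core/cache.py"}

print(parallelism_safety(auth_scope, core_scope))
print(parallelism_safety(auth_scope, auth_scope | core_scope))
```

The first call reports disjoint scopes, so both auditors could be dispatched as soon as `scan-targets` completes; the second shares files and would have to be serialized.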
diff --git a/plugins/swarm-orchestrator/examples/refactor_python_module.md b/plugins/swarm-orchestrator/examples/refactor_python_module.md
new file mode 100644
index 0000000000..58e25d1038
--- /dev/null
+++ b/plugins/swarm-orchestrator/examples/refactor_python_module.md
@@ -0,0 +1,51 @@
+# Example 1: Refactor a Python module
+
+Goal: take a 600-line `src/parser.py` written as one big class with type-switch dispatch, and refactor it into the visitor pattern with one class per node kind, plus a complete test suite — all in one PR.
+
+## Spawn
+
+```
+/swarm-spawn Refactor src/parser.py to use the visitor pattern. Add tests covering every node kind. Land in one PR.
+```
+
+## DAG the swarm proposes
+
+```
+                   ┌──► impl-base-visitor ──► impl-node-visitors ──┐
+                   │                                               │
+ scan-parser ──────┤                                               ├──► add-tests ──► review ──► merge
+                   │                                               │
+                   └──► extract-test-fixtures ─────────────────────┘
+```
+
+| Task | Head | Why |
+|---|---|---|
+| `scan-parser` | scanner | Reconnaissance: enumerate every node kind + every call site of the dispatch logic. Files the rest of the tasks. |
+| `impl-base-visitor` | builder | Define the abstract `Visitor[T]` ABC and migrate the entry point. Small, surgical change. |
+| `impl-node-visitors` | builder | Implement one concrete visitor per node kind discovered in `scan-parser`. Blocked on `impl-base-visitor`. |
+| `extract-test-fixtures` | builder | Pull existing test inputs into reusable parametrized fixtures. Parallel-safe with the visitor work. |
+| `add-tests` | builder | Write tests for every visitor against the new fixtures. Blocked on `impl-node-visitors` and `extract-test-fixtures`. |
+| `review` | reviewer | End-of-task review: DRY, simplicity, missed node kinds. |
+| `merge` | merger | Rebase onto main, run pytest, push. |
+
+## Approve, then watch
+
+After you approve, the orchestrator:
+
+1. Dispatches `scan-parser` first; every other task depends on it.
+2. When `scan-parser` completes (having filed N concrete sub-issues), dispatches `impl-base-visitor` and `extract-test-fixtures` in parallel (no file overlap, both `safe`).
+3. When `impl-base-visitor` completes, dispatches `impl-node-visitors` (which can fan out further: one builder per visitor class if `parallelism_safety=safe` for non-overlapping files).
+4. When all blockers complete, dispatches `add-tests`, then `review`, then `merge`.
+
+Throughout: the reviewer-checkpoint hook fires every 3 turns from turn 6 onward (its defaults) inside each Builder, prompting a self-review on commit count + spend + tractability. If a Builder gets stuck (10 turns / no commits / repeated test fails), the operator sees it in `/swarm-status` and can drop an abort marker.
+
+## Expected outcome
+
+- One PR titled `refactor(parser): visitor pattern + complete test coverage`.
+- 5–10 commits (one per TodoWrite item across all builders, squash-merged or kept atomic depending on project policy).
+- All existing tests pass; new tests added.
+- All swarm worktrees GC'd after merge.
+
+## Rough cost
+
+For a 600-line file with ~10 node kinds, expect ~60–150k tokens total ($1–$3 on Sonnet) and ~30–90 minutes of wall time. Most of the spend is in the parallel `impl-node-visitors` builders.
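For readers unfamiliar with the target of the refactor in `refactor_python_module.md`, here is a small sketch of a generic `Visitor[T]` base with name-based dispatch. It is one plausible reading of the plan, not the code the swarm would produce: the node kinds `Num` and `Add` and the `Evaluator` and `Printer` visitors are hypothetical stand-ins for whatever `scan-parser` discovers in `src/parser.py`.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Generic, TypeVar

T = TypeVar("T")


@dataclass
class Num:
    value: int


@dataclass
class Add:
    left: Num | Add
    right: Num | Add


class Visitor(ABC, Generic[T]):
    """Replaces a type-switch: dispatch on the node's class name."""

    def visit(self, node: object) -> T:
        method = getattr(self, f"visit_{type(node).__name__}", None)
        if method is None:
            raise NotImplementedError(f"no visitor for {type(node).__name__}")
        return method(node)

    @abstractmethod
    def visit_Num(self, node: Num) -> T: ...

    @abstractmethod
    def visit_Add(self, node: Add) -> T: ...


class Evaluator(Visitor[int]):
    def visit_Num(self, node: Num) -> int:
        return node.value

    def visit_Add(self, node: Add) -> int:
        return self.visit(node.left) + self.visit(node.right)


class Printer(Visitor[str]):
    def visit_Num(self, node: Num) -> str:
        return str(node.value)

    def visit_Add(self, node: Add) -> str:
        return f"({self.visit(node.left)} + {self.visit(node.right)})"


tree = Add(Num(1), Add(Num(2), Num(3)))
print(Printer().visit(tree), "=", Evaluator().visit(tree))
```

Declaring one abstract method per node kind means a concrete visitor that misses a kind fails at instantiation, which lines up with the reviewer's "missed node kinds" check.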
diff --git a/plugins/swarm-orchestrator/hooks/__init__.py b/plugins/swarm-orchestrator/hooks/__init__.py
new file mode 100644
index 0000000000..e69de29bb2
diff --git a/plugins/swarm-orchestrator/hooks/hooks.json b/plugins/swarm-orchestrator/hooks/hooks.json
new file mode 100644
index 0000000000..8981f44bc7
--- /dev/null
+++ b/plugins/swarm-orchestrator/hooks/hooks.json
@@ -0,0 +1,28 @@
+{
+  "description": "swarm-orchestrator plugin — DAG cascade + reviewer-checkpoint + worktree-GC hooks",
+  "hooks": {
+    "PostToolUse": [
+      {
+        "matcher": "TaskUpdate",
+        "hooks": [
+          {
+            "type": "command",
+            "command": "python3 ${CLAUDE_PLUGIN_ROOT}/hooks/on_task_complete.py",
+            "timeout": 30
+          }
+        ]
+      }
+    ],
+    "Stop": [
+      {
+        "hooks": [
+          {
+            "type": "command",
+            "command": "python3 ${CLAUDE_PLUGIN_ROOT}/hooks/reviewer_checkpoint.py",
+            "timeout": 15
+          }
+        ]
+      }
+    ]
+  }
+}
diff --git a/plugins/swarm-orchestrator/hooks/on_task_complete.py b/plugins/swarm-orchestrator/hooks/on_task_complete.py
new file mode 100755
index 0000000000..acfd60927a
--- /dev/null
+++ b/plugins/swarm-orchestrator/hooks/on_task_complete.py
@@ -0,0 +1,129 @@
+#!/usr/bin/env python3
+"""
+swarm-orchestrator: on-task-complete hook
+
+Fires after every TaskUpdate. If the update set status=completed (or merged), we:
+
+ 1. Re-evaluate the DAG frontier — find tasks whose blockedBy is now satisfied
+ and emit a hint via stdout (the orchestrator session reads this and dispatches).
+ 2. Optionally trigger the merge cascade (`/swarm-merge` programmatically) if
+ the project config has `merge.auto_on_complete: true`.
+ 3. Optionally GC worktrees whose branch is now merged.
+
+This hook is intentionally read-mostly — it does not mutate task state itself.
+It writes a structured event to ~/.claude/teams/<team>/cascade-events.jsonl so
+the orchestrator session can pick it up on its next poll.
+
+Exit code:
+    0 in every case: handled, not applicable (e.g. the update was not a status
+    change), or fatal error (logged; the hook never blocks the TaskUpdate)
+
+Reads JSON from stdin per Claude Code's hook protocol.
+"""
+
+from __future__ import annotations
+
+import datetime as _dt
+import json
+import os
+import pathlib
+import sys
+from typing import Any
+
+TEAMS_ROOT = pathlib.Path(os.path.expanduser("~/.claude/teams"))
+LOG_PATH = pathlib.Path(os.path.expanduser("~/.claude/swarm-orchestrator-hook.log"))
+
+
+def _log(msg: str) -> None:
+    LOG_PATH.parent.mkdir(parents=True, exist_ok=True)
+    with LOG_PATH.open("a", encoding="utf-8") as fh:
+        fh.write(f"{_dt.datetime.now(_dt.timezone.utc):%Y-%m-%dT%H:%M:%S.%fZ} on_task_complete {msg}\n")
+
+
+def _atomic_append_jsonl(path: pathlib.Path, record: dict[str, Any]) -> None:
+    path.parent.mkdir(parents=True, exist_ok=True)
+    line = json.dumps(record, ensure_ascii=False) + "\n"
+    with path.open("a", encoding="utf-8") as fh:
+        fh.write(line)
+
+
+def _read_dag(team: str) -> dict[str, Any] | None:
+    path = TEAMS_ROOT / team / "swarm-dag.json"
+    if not path.exists():
+        return None
+    try:
+        return json.loads(path.read_text())
+    except (OSError, json.JSONDecodeError) as e:
+        _log(f"failed to read DAG for {team}: {e}")
+        return None
+
+
+def _unblocked_after(dag: dict[str, Any]) -> list[str]:
+    """Return task ids whose blockedBy entries are all in {completed, merged}."""
+    tasks = dag.get("tasks", {})
+    done = {t_id for t_id, t in tasks.items() if t.get("status") in {"completed", "merged"}}
+    out = []
+    for t_id, task in tasks.items():
+        if task.get("status") not in {"pending", "blocked"}:
+            continue
+        blockers = task.get("blockedBy", [])
+        if all(b in done for b in blockers):
+            out.append(t_id)
+    return out
+
+
+def main() -> int:
+    try:
+        payload = json.load(sys.stdin)
+    except (json.JSONDecodeError, OSError) as e:
+        _log(f"could not parse stdin: {e}")
+        return 0  # don't block the user's TaskUpdate
+
+    tool = payload.get("tool_name") or payload.get("tool", "")
+    if tool != "TaskUpdate":
+        return 0
+
+    tool_input = payload.get("tool_input") or payload.get("input", {})
+    new_status = (tool_input.get("status") or "").lower()
+    if new_status not in {"completed", "merged"}:
+        return 0
+
+    task_id = tool_input.get("task_id") or tool_input.get("id")
+    team = tool_input.get("team") or payload.get("team_name")
+    if not (task_id and team):
+        _log(f"missing task_id or team in TaskUpdate payload: {payload!r}")
+        return 0
+
+    dag = _read_dag(team)
+    if dag is None:
+        _log(f"no DAG found for team {team}; skipping cascade")
+        return 0
+
+    newly_unblocked = _unblocked_after(dag)
+
+    event = {
+        "ts": _dt.datetime.now(_dt.timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ"),
+        "kind": "task_complete",
+        "team": team,
+        "task_id": task_id,
+        "new_status": new_status,
+        "newly_unblocked": newly_unblocked,
+    }
+    _atomic_append_jsonl(TEAMS_ROOT / team / "cascade-events.jsonl", event)
+
+    # Surface the cascade to the orchestrator's chat so it's visible.
+    if newly_unblocked:
+        print(
+            f"[swarm-orchestrator] task {task_id} {new_status}; "
+            f"newly unblocked: {', '.join(newly_unblocked)}"
+        )
+
+    return 0
+
+
+if __name__ == "__main__":
+    try:
+        sys.exit(main())
+    except Exception as exc:  # noqa: BLE001  # never block a TaskUpdate
+        _log(f"fatal: {exc!r}")
+        sys.exit(0)
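A worked example of the frontier computation in `on_task_complete.py` may help when reviewing it. The sketch below restates the `_unblocked_after` logic in compact form and runs it on a hand-written task map. The task ids and the sample data are hypothetical; only the `status` and `blockedBy` keys come from the hook itself.

```python
from __future__ import annotations

from typing import Any

DONE = {"completed", "merged"}


def unblocked(tasks: dict[str, dict[str, Any]]) -> list[str]:
    """Ids of pending/blocked tasks whose blockers are all completed or merged."""
    done = {tid for tid, t in tasks.items() if t.get("status") in DONE}
    return [
        tid
        for tid, t in tasks.items()
        if t.get("status") in {"pending", "blocked"}
        and all(b in done for b in t.get("blockedBy", []))
    ]


# Hypothetical DAG: scan -> build -> {review, test} -> merge
sample = {
    "scan": {"status": "completed", "blockedBy": []},
    "build": {"status": "completed", "blockedBy": ["scan"]},
    "review": {"status": "blocked", "blockedBy": ["build"]},
    "test": {"status": "blocked", "blockedBy": ["build"]},
    "merge": {"status": "blocked", "blockedBy": ["review", "test"]},
}
print(unblocked(sample))  # ['review', 'test']
```

Note that the result depends entirely on the statuses stored in the DAG: if `build` were still recorded as `in_progress` when the hook ran, neither `review` nor `test` would be reported, so whatever writes `swarm-dag.json` has to record the completion before the hook reads it.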
diff --git a/plugins/swarm-orchestrator/hooks/reviewer_checkpoint.py b/plugins/swarm-orchestrator/hooks/reviewer_checkpoint.py
new file mode 100755
index 0000000000..8382cfa781
--- /dev/null
+++ b/plugins/swarm-orchestrator/hooks/reviewer_checkpoint.py
@@ -0,0 +1,109 @@
+#!/usr/bin/env python3
+"""
+swarm-orchestrator: reviewer-checkpoint hook
+
+Fires on Stop. If the session is a swarm Builder AND the turn count crosses a
+configured threshold (default: every Nth turn after turn `floor`), this hook
+prints a lightweight self-review prompt to stdout, which Claude Code injects
+into the Builder's next system message.
+
+The actual deep review is delegated to the Reviewer subagent on demand — this
+hook is a cheap, deterministic nudge.
+
+Configuration: project's .claude/swarm-orchestrator.json:
+    {
+      "reviewer_checkpoint": {
+        "enabled": true,
+        "every_n_turns": 3,
+        "floor": 6
+      }
+    }
+
+If the file is missing, the defaults shown above apply; if `enabled` is false, the hook is a no-op.
+
+Reads JSON from stdin per Claude Code's hook protocol.
+"""
+
+from __future__ import annotations
+
+import datetime as _dt
+import json
+import os
+import pathlib
+import sys
+from typing import Any
+
+LOG_PATH = pathlib.Path(os.path.expanduser("~/.claude/swarm-orchestrator-hook.log"))
+
+
+def _log(msg: str) -> None:
+    LOG_PATH.parent.mkdir(parents=True, exist_ok=True)
+    with LOG_PATH.open("a", encoding="utf-8") as fh:
+        fh.write(f"{_dt.datetime.now(_dt.timezone.utc):%Y-%m-%dT%H:%M:%S.%fZ} reviewer_checkpoint {msg}\n")
+
+
+def _load_config(cwd: pathlib.Path) -> dict[str, Any]:
+    candidate = cwd / ".claude" / "swarm-orchestrator.json"
+    if not candidate.exists():
+        return {}
+    try:
+        return json.loads(candidate.read_text())
+    except (OSError, json.JSONDecodeError) as e:
+        _log(f"could not parse config at {candidate}: {e}")
+        return {}
+
+
+def _is_swarm_builder(payload: dict[str, Any]) -> bool:
+    """Heuristic: this is a swarm Builder session if the agent identity hints so."""
+    agent = (payload.get("agent_type") or payload.get("subagent_type") or "").lower()
+    if agent == "builder":
+        return True
+    # Fall back: check the working directory for a swarm worktree marker.
+    cwd = payload.get("cwd") or os.getcwd()
+    return "/.claude/worktrees/" in cwd or "/swarm-" in cwd
+
+
+def main() -> int:
+    try:
+        payload = json.load(sys.stdin)
+    except (json.JSONDecodeError, OSError) as e:
+        _log(f"could not parse stdin: {e}")
+        return 0
+
+    if not _is_swarm_builder(payload):
+        return 0
+
+    cwd = pathlib.Path(payload.get("cwd") or os.getcwd())
+    config = _load_config(cwd).get("reviewer_checkpoint", {})
+    if not config.get("enabled", True):
+        return 0
+
+    every_n = max(1, int(config.get("every_n_turns", 3)))  # guard against a zero modulus
+    floor = int(config.get("floor", 6))
+
+    turn = int(payload.get("turn") or payload.get("turn_count") or 0)
+    if turn < floor:
+        return 0
+    if (turn - floor) % every_n != 0:
+        return 0
+
+    print(
+        "[swarm-orchestrator reviewer-checkpoint]\n"
+        f"You are at turn {turn}. Before continuing, do a quick self-review:\n"
+        " 1. DAG status: is your task still in_progress as expected?\n"
+        " 2. Commits: how many since you started? Are they small + focused?\n"
+        " 3. TodoWrite: how many items done vs. remaining?\n"
+        " 4. Tractability: any sign of thrash (same file edited > 5x with no commit; "
+        "repeated test failures with no diagnostic between them)?\n"
+        "If you spot drift, course-correct now. If you're stuck, write the abort "
+        "marker and surface to the operator."
+    )
+    return 0
+
+
+if __name__ == "__main__":
+    try:
+        sys.exit(main())
+    except Exception as exc:  # noqa: BLE001
+        _log(f"fatal: {exc!r}")
+        sys.exit(0)
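The cadence rule in `reviewer_checkpoint.py` (`turn >= floor` and `(turn - floor) % every_n == 0`) is easy to misread, so here is a small sketch that isolates it. `should_checkpoint` is a hypothetical helper that mirrors the two checks in `main()`; the defaults are the ones the hook falls back to.

```python
def should_checkpoint(turn: int, floor: int = 6, every_n: int = 3) -> bool:
    """True on the turns where the hook would print its self-review prompt."""
    if turn < floor:
        return False
    return (turn - floor) % every_n == 0


fires = [t for t in range(1, 16) if should_checkpoint(t)]
print(fires)  # [6, 9, 12, 15]
```

With the defaults, the first nudge arrives at turn 6 and then every third turn after that, so short Builder sessions never see the prompt.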
diff --git a/plugins/swarm-orchestrator/scripts/swarm_dashboard.py b/plugins/swarm-orchestrator/scripts/swarm_dashboard.py
new file mode 100644
index 0000000000..dee53d3e6c
--- /dev/null
+++ b/plugins/swarm-orchestrator/scripts/swarm_dashboard.py
@@ -0,0 +1,225 @@
+"""Minimal claude-swarm dashboard — modeled on the native claude CLI's
+agent-team list view.
+
+Renders a single concise list of heads with status dot, name, runtime,
+token usage, and current state. No verbose panels; designed to fit the
+Anthropic design language.
+
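+Usage:
+    python3 swarm_dashboard.py --home <path> [--refresh-hz 4] [--exit-when-done]
+
+Exits on Ctrl-C, after --max-runtime-s (default 300 s), or, with --exit-when-done, once at least one task has finished and none is pending or in progress.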
+"""
+from __future__ import annotations
+
+import argparse
+import json
+import signal
+import subprocess
+import sys
+import time
+from pathlib import Path
+from typing import Any
+
+try:
+    from rich.console import Console
+    from rich.live import Live
+    from rich.text import Text
+except ImportError:
+    sys.stderr.write("rich is required. Run: pip install rich\n")
+    sys.exit(2)
+
+
+STATUS_DOT = {
+    "in_progress": ("●", "cyan"),
+    "running": ("●", "cyan"),
+    "done": ("○", "green"),
+    "completed": ("○", "green"),
+    "pending": ("○", "dim"),
+    "blocked": ("○", "magenta"),
+    "failed": ("✗", "red"),
+    "idle": ("○", "dim"),
+}
+
+
+def _run_cli(args: list[str], cwd: Path | None = None) -> str:
+    try:
+        result = subprocess.run(
+            args, capture_output=True, text=True, timeout=2.0, cwd=cwd
+        )
+        return result.stdout if result.returncode == 0 else ""
+    except (subprocess.SubprocessError, FileNotFoundError):
+        return ""
+
+
+def _read_status(home: Path) -> dict[str, Any]:
+    out = _run_cli(["claude-swarm", "status", "--home", str(home)])
+    try:
+        return json.loads(out) if out else {}
+    except json.JSONDecodeError:
+        return {}
+
+
+def _read_tasks(home: Path) -> list[dict[str, Any]]:
+    # Try --json first; fall back to plain text if the CLI doesn't support it
+    # (e.g. older claude-swarm installs). Empty stdout means --json failed and
+    # we MUST fall through to plain text instead of silently returning [].
+    out = _run_cli(["claude-swarm", "list", "--home", str(home), "--json"])
+    if out:
+        try:
+            return json.loads(out)
+        except json.JSONDecodeError:
+            pass
+    # Fallback: parse the columnar `list` output
+    tasks: list[dict[str, Any]] = []
+    plain = _run_cli(["claude-swarm", "list", "--home", str(home)])
+    for line in plain.splitlines():
+        parts = line.split(None, 3)
+        if len(parts) >= 4:
+            tasks.append({
+                "id": parts[0],
+                "status": parts[1],
+                "head": parts[2],
+                "title": parts[3],
+            })
+    return tasks
+
+
+def _format_duration(seconds: float) -> str:
+    if seconds < 60:
+        return f"{seconds:.0f}s"
+    minutes = int(seconds // 60)
+    rem = int(seconds % 60)
+    if minutes < 60:
+        return f"{minutes}m {rem:02d}s"
+    hours = minutes // 60
+    return f"{hours}h {minutes % 60:02d}m {rem:02d}s"
+
+
+def _format_tokens(n: int) -> str:
+    if n < 1000:
+        return f"{n}"
+    if n < 1_000_000:
+        return f"{n / 1000:.1f}k"
+    return f"{n / 1_000_000:.2f}M"
+
+
+def _render(
+    home: Path,
+    started_at: float,
+    status: dict[str, Any],
+    tasks: list[dict[str, Any]],
+) -> Text:
+    runtime = time.monotonic() - started_at
+    kanban = status.get("kanban", {}) or {}
+    total = sum(kanban.get(k, 0) for k in ("pending", "in_progress", "done", "failed"))
+    done = kanban.get("done", 0)
+    cost = status.get("cost_so_far_usd", 0.0)
+
+    # Tokens estimated from cost (Sonnet $3/Mtok in / $15/Mtok out blend ≈ $9/Mtok)
+    tokens_est = int(cost / 9e-6) if cost > 0 else 0
+
+    # Per-head spend dicts from the engine status payload
+    spend_by_head: dict[str, float] = status.get("spend_by_head", {}) or {}
+    tokens_by_head: dict[str, int] = status.get("tokens_by_head", {}) or {}
+
+    # Top bar — minimal, Anthropic-style
+    out = Text()
+    out.append(" swarm ", style="bold cyan")
+    out.append(f"· {done}/{total} done ", style="dim")
+    out.append(f"· {_format_duration(runtime)} ", style="dim")
+    out.append(f"· ↓ {_format_tokens(tokens_est)} tokens ", style="dim")
+    out.append(f"· ${cost:.4f}", style="dim")
+    out.append("\n\n")
+
+    # Group tasks by head — show one row per head with their active task
+    by_head: dict[str, dict[str, Any]] = {}
+    for t in tasks:
+        head = t.get("head") or "unassigned"
+        cur = by_head.get(head)
+        # Prefer in-progress, then blocked, then pending, then done
+        rank = {"in_progress": 0, "running": 0, "blocked": 1, "pending": 2,
+                "failed": 3, "done": 4, "completed": 4}.get(t.get("status", ""), 5)
+        if cur is None or rank < cur["_rank"]:
+            by_head[head] = {**t, "_rank": rank}
+
+    heads = status.get("heads", []) or sorted(by_head.keys())
+    for head in heads:
+        task = by_head.get(head)
+        if task:
+            raw_status = task.get("status", "idle")
+            title = task.get("title", "")
+        else:
+            raw_status = "idle"
+            title = "(no work assigned)"
+        dot, dot_color = STATUS_DOT.get(raw_status, ("○", "dim"))
+        active = raw_status in {"in_progress", "running"}
+
+        # Per-head token + cost columns (fall back to estimate from cost if unavailable)
+        head_tokens = tokens_by_head.get(head, 0)
+        head_cost = spend_by_head.get(head, 0.0)
+        if head_tokens == 0 and head_cost > 0:
+            head_tokens = int(head_cost / 9e-6)
+        tok_str = _format_tokens(head_tokens) if head_tokens else "—"
+        cost_str = f"${head_cost:.4f}" if head_cost > 0 else "—"
+
+        out.append(f" {dot} ", style=dot_color)
+        out.append(f"{head:<12}", style="bold" if active else None)
+        out.append(f" {title[:42]:<42}", style="" if active else "dim")
+        out.append(f" {raw_status:<12}", style=dot_color)
+        out.append(f" ↓ {tok_str:>6}", style="" if active else "dim")
+        out.append(f" {cost_str:>8}", style="" if active else "dim")
+        out.append("\n")
+
+    out.append("\n")
+    out.append(" Ctrl-C to exit\n", style="dim")  # no key handling exists, so no inspect hint
+    return out
+
+
+def main(argv: list[str] | None = None) -> int:
+    p = argparse.ArgumentParser(description=__doc__)
+    p.add_argument("--home", type=Path, required=True)
+    p.add_argument("--refresh-hz", type=float, default=4.0)
+    p.add_argument("--exit-when-done", action="store_true")
+    p.add_argument("--max-runtime-s", type=float, default=300.0)
+    args = p.parse_args(argv)
+
+    home = args.home.resolve()
+    if not home.exists():
+        sys.stderr.write(f"swarm home not found: {home}\n")
+        return 2
+
+    console = Console()
+    refresh = max(0.1, 1.0 / args.refresh_hz)
+    started = time.monotonic()
+    stop = {"flag": False}
+
+    def _sigint(_signum, _frame):
+        stop["flag"] = True
+
+    signal.signal(signal.SIGINT, _sigint)
+    signal.signal(signal.SIGTERM, _sigint)
+
+    is_tty = sys.stdout.isatty()
+    with Live(console=console, refresh_per_second=args.refresh_hz, screen=is_tty) as live:
+        while not stop["flag"]:
+            elapsed = time.monotonic() - started
+            if elapsed > args.max_runtime_s:
+                break
+            status = _read_status(home)
+            tasks = _read_tasks(home)
+            live.update(_render(home, started, status, tasks))
+            kanban = status.get("kanban", {}) or {}
+            if (args.exit_when_done
+                    and kanban.get("pending", 0) == 0
+                    and kanban.get("in_progress", 0) == 0
+                    and kanban.get("done", 0) + kanban.get("failed", 0) > 0):
+                time.sleep(0.5)
+                break
+            time.sleep(refresh)
+
+    return 0
+
+
+if __name__ == "__main__":
+    sys.exit(main())
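The plain-text fallback in `_read_tasks` relies on `str.split(None, 3)`, which splits on runs of whitespace at most three times so that a title containing spaces stays in one piece. A minimal sketch of that behavior follows; `parse_list_line` is a hypothetical helper and the sample line is made up, since the real column layout of `claude-swarm list` is not shown in this diff.

```python
from __future__ import annotations


def parse_list_line(line: str) -> dict[str, str] | None:
    """Split one columnar line into id, status, head and a free-text title."""
    parts = line.split(None, 3)  # at most 3 splits; the 4th field keeps its spaces
    if len(parts) < 4:
        return None  # header fragments or short lines are skipped
    task_id, status, head, title = parts
    return {"id": task_id, "status": status, "head": head, "title": title}


# Hypothetical line in the assumed `id status head title` layout.
row = parse_list_line("t-001   in_progress  builder   Add the --dry-run flag")
print(row)
```

One consequence worth keeping in mind when reviewing the fallback: any line with four or more whitespace-separated tokens is accepted, so a column header row would be parsed as a task as well.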
diff --git a/plugins/swarm-orchestrator/scripts/try-swarm.sh b/plugins/swarm-orchestrator/scripts/try-swarm.sh
new file mode 100755
index 0000000000..bbe1faf8ae
--- /dev/null
+++ b/plugins/swarm-orchestrator/scripts/try-swarm.sh
@@ -0,0 +1,363 @@
+#!/usr/bin/env bash
+#
+# Canonical end-to-end demo for claude-swarm.
+#
+# Creates a venv inside the repo (./.swarm-venv/), installs claude-swarm,
+# bootstraps a real working swarm with a small DAG of tasks, then launches
+# the live TUI dashboard so you can watch the supervisor work through them.
+# Terminates cleanly when all tasks complete (or when you Ctrl-C).
+#
+# Usage:
+# bash scripts/try-swarm.sh # real claude-swarm agents, asks for $1 auth
+# bash scripts/try-swarm.sh --stub # stub conductor, $0, smoke-test mode
+#
+# At the end the script points to a "global-mind" JSONL transcript — every
+# task claim, dispatch, completion, and cost increment, in order, replayable.
+
+set -euo pipefail
+
+REPO_ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
+VENV_DIR="${REPO_ROOT}/.swarm-venv"
+DEMO_HOME=""
+CONDUCTOR="claude"
+ESTIMATED_USD="1.00"
+KEEPALIVE=false
+
+for arg in "$@"; do
+  case "$arg" in
+    --stub) CONDUCTOR="stub"; ESTIMATED_USD="0.00" ;;
+    --keepalive) KEEPALIVE=true ;;
+    --help|-h)
+      sed -n '2,16p' "$0"
+      exit 0
+      ;;
+  esac
+done
+
+cleanup() {
+  if [[ "${KEEPALIVE_CLEANUP_SKIP:-false}" == "true" ]]; then
+    # Keepalive mode: daemon owns the home, do NOT delete it.
+    return
+  fi
+  if [[ -n "${DEMO_HOME}" && -d "${DEMO_HOME}" ]]; then
+    echo
+    echo "→ Cleaning up demo state at ${DEMO_HOME}"
+    rm -rf "${DEMO_HOME}"
+  fi
+}
+trap cleanup EXIT
+
+cat < 30 min).
+ Global mind: every dispatch + cost increment appended to a JSONL transcript;
+ path printed at exit.
+
+EOF
+
+if [[ "$CONDUCTOR" == "claude" ]]; then
+ # ────────────────────────────────────────────────────────────────
+ # Preflight 1: ensure 'claude' CLI is installed
+ # ────────────────────────────────────────────────────────────────
+ if ! command -v claude >/dev/null 2>&1; then
+ cat <<'EOF'
+
+ ✗ The 'claude' CLI is not on PATH.
+
+ This demo dispatches work via 'claude --print' (a real Claude Code
+ subprocess that runs the dispatched task). You'll need it installed.
+
+ Install: https://docs.claude.com/claude-code
+
+ Or re-run with --stub for the no-LLM smoke test.
+
+EOF
+ exit 1
+ fi
+
+ # ────────────────────────────────────────────────────────────────
+ # Preflight 2: ensure 'claude --print' has working credentials
+ # ────────────────────────────────────────────────────────────────
+ # 'claude' resolves auth in this priority order:
+ # 1. ANTHROPIC_API_KEY env var
+ # 2. apiKeyHelper from --settings (or ~/.claude/settings.json)
+ # 3. macOS Keychain ("Claude Code-credentials" service) — set by
+ # running `claude` interactively and completing the OAuth login
+ # (Pro/Max/Team plans store tokens here)
+ #
+ # We probe with a tiny Haiku call. Failure → guide the user, don't
+ # silently start a 5-task demo that produces empty results.
+ echo "→ Verifying claude CLI authentication (one tiny Haiku ping)..."
+ AUTH_TEST=$(perl -e 'alarm 30; exec @ARGV' claude --print --model claude-haiku-4-5 <<< "respond with the single word OK" 2>&1 | head -1)
+ if [[ "$AUTH_TEST" != *"OK"* ]] && [[ "$AUTH_TEST" != *"ok"* ]]; then
+ cat <<'EOF'
+
+ ✗ 'claude --print' did not return a valid response. Probable cause:
+ missing or invalid credentials.
+
+ Output from probe:
+EOF
+ echo " ${AUTH_TEST}"
+ cat < src/utils.py <<'PYEOF'
+def add(a, b):
+    return a + b
+
+def needs_typing(value, threshold):
+    if value > threshold:
+        return value * 2
+    return value
+PYEOF
+git add src/utils.py
+git commit --quiet -m "demo: seed source file"
+claude-swarm init --home .claude-swarm
+
+echo "→ Submitting demo tasks (DAG: scanner → builder → {reviewer, test-runner} → merger)"
+# Each prompt asks for a one-word answer so real-LLM tasks finish in ~3-5s
+# via claude --print --model claude-haiku-4-5. Total demo time is ~15-25s
+# end-to-end. Reviewer and test-runner both depend only on builder, so they
+# run in parallel once it completes; merger is blocked on both.
+T1=$(claude-swarm submit \
+    --title "Scanner ping" \
+    --prompt "Respond with only the single word: SCANNED" \
+    --head scanner | awk '{print $1}')
+T2=$(claude-swarm submit \
+    --title "Builder ping" \
+    --prompt "Respond with only the single word: BUILT" \
+    --head builder \
+    --blocked-by "${T1}" | awk '{print $1}')
+T3=$(claude-swarm submit \
+    --title "Reviewer ping" \
+    --prompt "Respond with only the single word: REVIEWED" \
+    --head reviewer \
+    --blocked-by "${T2}" | awk '{print $1}')
+T4=$(claude-swarm submit \
+    --title "Test-runner ping" \
+    --prompt "Respond with only the single word: TESTED" \
+    --head test-runner \
+    --blocked-by "${T2}" | awk '{print $1}')
+T5=$(claude-swarm submit \
+    --title "Merger ping" \
+    --prompt "Respond with only the single word: MERGED" \
+    --head merger \
+    --blocked-by "${T3}" --blocked-by "${T4}" | awk '{print $1}')
+
+echo " T1=${T1} T2=${T2} T3=${T3} T4=${T4} T5=${T5}"
+echo
+echo "spawning agents: scanner/Scanner, builder/Builder, test-runner/Test-Runner, reviewer/Reviewer, merger/Merger"
+echo
+
+GLOBAL_MIND_LOG="${DEMO_HOME}/global-mind.jsonl"
+
+echo "→ Starting supervisor loop ($([[ "$KEEPALIVE" == "true" ]] && echo "DETACHED daemon" || echo "shell-background"), conductor=${CONDUCTOR})"
+# Stub conductor finishes in <1ms per dispatch; inject an 8-second delay so the
+# dashboard has time to render each head's status transition visibly. The real
+# LLM conductor doesn't need this — Claude calls take 10-60s each on their own.
+DEMO_DELAY_S=$([[ "$CONDUCTOR" == "stub" ]] && echo "8" || echo "0")
+
+if [[ "$KEEPALIVE" == "true" ]]; then
+ # Daemon mode: detached supervisor, survives this script's exit.
+ claude-swarm run \
+ --home .claude-swarm \
+ --conductor "${CONDUCTOR}" \
+ --demo-delay-s "${DEMO_DELAY_S}" \
+ --global-mind-log "${GLOBAL_MIND_LOG}" \
+ --max-parallel 3 \
+ --daemon \
+ >"${DEMO_HOME}/supervisor.log" 2>&1
+ SUPERVISOR_PID=""
+else
+ # Shell-background: supervisor dies when this script exits. Parallel
+ # dispatch (3 at a time) so the dashboard renders multiple in-progress
+ # heads simultaneously — the "live" demo feel. No --max-iterations
+ # cap — the supervisor exits on its own when the kanban drains.
+ claude-swarm run \
+ --home .claude-swarm \
+ --conductor "${CONDUCTOR}" \
+ --demo-delay-s "${DEMO_DELAY_S}" \
+ --global-mind-log "${GLOBAL_MIND_LOG}" \
+ --max-parallel 3 \
+ >"${DEMO_HOME}/supervisor.log" 2>&1 &
+ SUPERVISOR_PID=$!
+fi
+
+cleanup_pid() {
+ if [[ -n "${SUPERVISOR_PID:-}" ]] && kill -0 "${SUPERVISOR_PID}" 2>/dev/null; then
+ # Kill any 'claude --print' subprocesses the supervisor forked
+ # (they're children of the supervisor; pkill -P gets them).
+ pkill -TERM -P "${SUPERVISOR_PID}" 2>/dev/null || true
+ # Then the supervisor itself
+ kill -TERM "${SUPERVISOR_PID}" 2>/dev/null || true
+ # Give it 2 seconds to exit cleanly, then SIGKILL if needed
+ for _ in 1 2 3 4; do
+ kill -0 "${SUPERVISOR_PID}" 2>/dev/null || break
+ sleep 0.5
+ done
+ if kill -0 "${SUPERVISOR_PID}" 2>/dev/null; then
+ pkill -KILL -P "${SUPERVISOR_PID}" 2>/dev/null || true
+ kill -KILL "${SUPERVISOR_PID}" 2>/dev/null || true
+ fi
+ fi
+ cleanup
+}
+# Trap SIGINT (Ctrl+C) + SIGTERM + normal EXIT so cleanup runs on every path
+trap cleanup_pid EXIT INT TERM
+
+# Give the supervisor a moment to start writing state
+sleep 0.5
+
+echo "→ Launching dashboard (Ctrl-C to exit; auto-exits when all tasks complete)"
+echo
+"${VENV_DIR}/bin/python3" "${REPO_ROOT}/scripts/swarm_dashboard.py" \
+ --home "${DEMO_HOME}/.claude-swarm" \
+ --exit-when-done \
+ --max-runtime-s 1200
+
+# Wait briefly for the supervisor to flush
+wait "${SUPERVISOR_PID}" 2>/dev/null || true
+
+echo
+echo "================================================================"
+echo " Run complete. The swarm's global-mind transcript:"
+echo "================================================================"
+echo
+echo " Supervisor log:"
+echo " ${DEMO_HOME}/supervisor.log"
+echo
+echo " Global-mind events (JSONL — every dispatch, status, cost increment):"
+echo " ${GLOBAL_MIND_LOG}"
+echo
+echo " Kanban status timeline + cascade events:"
+echo " ${DEMO_HOME}/.claude-swarm/state/cascade-events.jsonl"
+echo
+if [[ -f "${GLOBAL_MIND_LOG}" ]]; then
+ echo " Sample events from the global mind:"
+ head -3 "${GLOBAL_MIND_LOG}" | sed 's/^/ /'
+ echo " ..."
+fi
+echo
+echo " Replay the swarm's collective state with:"
+echo " cat ${GLOBAL_MIND_LOG} | jq ."
+echo
+if [[ "$KEEPALIVE" == "true" ]]; then
+ echo "================================================================"
+ echo " KEEPALIVE DAEMON IS STILL RUNNING"
+ echo "================================================================"
+ echo
+ echo " The supervisor is detached from this script. You can:"
+ echo " - close this terminal"
+ echo " - exit Claude Code"
+ echo " - 'claude --resume' later"
+ echo " ...and the daemon keeps polling. Submit more tasks any time:"
+ echo
+ echo " claude-swarm submit --home ${DEMO_HOME}/.claude-swarm \\"
+ echo " --title 'my-task' --prompt 'do something' --head builder"
+ echo
+ echo " Status:"
+ echo " claude-swarm daemon-status --home ${DEMO_HOME}/.claude-swarm"
+ echo
+ echo " Stop the daemon:"
+ echo " claude-swarm daemon-stop --home ${DEMO_HOME}/.claude-swarm"
+ echo
+fi
+echo "Done. The venv at ${VENV_DIR} persists for re-runs; remove with:"
+echo " rm -rf ${VENV_DIR}"
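The five `--blocked-by` edges submitted by the demo script above form a small DAG. As a minimal sketch of how dispatch order could be derived from those edges, the block below uses Python's standard `graphlib`. This is an illustration only, not `claude-swarm`'s scheduler (its internals are not part of this diff), and the keys `T1`..`T5` are stand-ins for the real task IDs.

```python
from graphlib import TopologicalSorter

# Task -> set of tasks it is blocked by, mirroring the demo's --blocked-by flags.
blocked_by = {
    "T1": set(),         # scanner
    "T2": {"T1"},        # builder
    "T3": {"T2"},        # reviewer
    "T4": {"T2"},        # test-runner
    "T5": {"T3", "T4"},  # merger
}


def dispatch_waves(graph):
    """Return lists of tasks that become ready together, in dependency order."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)  # mark the whole wave finished before asking again
    return waves


waves = dispatch_waves(blocked_by)
print(waves)
```

The reviewer and test-runner land in the same wave because both depend only on the builder, which is why `--max-parallel 3` is enough to run them side by side.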
diff --git a/plugins/swarm-orchestrator/tests/__init__.py b/plugins/swarm-orchestrator/tests/__init__.py
new file mode 100644
index 0000000000..e69de29bb2
diff --git a/plugins/swarm-orchestrator/tests/swarming/README.md b/plugins/swarm-orchestrator/tests/swarming/README.md
new file mode 100644
index 0000000000..18a5ca5848
--- /dev/null
+++ b/plugins/swarm-orchestrator/tests/swarming/README.md
@@ -0,0 +1,99 @@
+# Swarm testing substrate
+
+Ten toy swarm scenarios that exercise every primitive (DAG, heads, merging,
+abort marker, multi-team) — plus a binding-agnostic runner. The substrate
+ships verbatim in three locations so the same scenario JSON drives all
+three swarm bindings:
+
+| Binding | Location |
+| ----------------------------------------- | -------------------------------------------------------------- |
+| Anthropic Teams (`claude-code` plugin) | `~/dev/projects/claude-code/plugins/swarm-orchestrator/tests/swarming/` |
+| Standalone CLI (`claude-swarm`) | `~/dev/projects/claude-swarm/tests/scenarios/` |
+| Internal swarm (this repo) | `tests/swarming/` |
+
+The schema (`schema/scenario.schema.json`) is the contract; per-binding
+runners read the same JSON.
+
+## Running scenarios
+
+```bash
+# Internal binding (this repo)
+python tests/swarming/run_scenario.py multi-file-rename
+python tests/swarming/run_scenario.py --all
+
+# Standalone CLI
+claude-swarm scenario run multi-file-rename
+claude-swarm scenario run --all
+
+# Plugin (inside claude-code)
+plugins/swarm-orchestrator/tests/swarming/run_scenario.sh multi-file-rename
+```
+
+All three call into `runner/harness.py`, which materializes fixtures into
+a fresh tempdir, asks the binding's `ScenarioEngine` to do the work, then
+hands the result + workspace to `runner/assertions.evaluate`. Every
+scenario asserts the same way regardless of binding.
+
+## The 10 scenarios
+
+| # | Name | Primitives tested |
+| - | -------------------------- | ------------------------------------------------ |
+| 1 | multi-file-rename | file-overlap-reject, atomic-merge |
+| 2 | spec-impl-pair | DAG dependency |
+| 3 | scan-build-review | heads architecture end-to-end |
+| 4 | doc-writer-team | parallel-safe dispatch |
+| 5 | multi-language-port | cross-teammate independence |
+| 6 | audit-then-fix | DAG + meta-supervisor task-file |
+| 7 | conflict-resolution-drill | merge pipeline rebase |
+| 8 | abort-marker-test | clean WIP commit on abort |
+| 9 | respawn-on-crash | meta-supervisor recovery |
+| 10| multi-team-coordination | two teams + cross-team SendMessage |
+
+## Reference engine vs real bindings
+
+`runner/stub.py` ships an `InProcessScenarioEngine` — a deterministic,
+LLM-free reference implementation. Today every binding falls back to it
+so the substrate is independent of binding readiness.
+
+When a real binding lands, replace the engine factory in its runner:
+
+- `tests/swarming/run_scenario.py` -> `_make_engine()` checks for
+ `claude_swarm.scenario_engine.StandaloneScenarioEngine`.
+- `tests/scenarios/run_scenario.py` (claude-swarm) -> the package's
+ `claude_swarm.scenarios.engine.StandaloneScenarioEngine` once the
+ CLI is built out.
+- `plugins/swarm-orchestrator/tests/swarming/run_scenario.sh` -> calls
+ the plugin's TaskCreate/TaskUpdate via the in-binary swarm engine.
+
+Until the real engines arrive, the reference engine + identical
+scenario JSON give CI a green signal.
+
+## Adding a new scenario
+
+1. Pick a kebab-case name. Add `scenarios/.json` (validated
+ against `schema/scenario.schema.json`).
+2. Drop fixtures under `fixtures//`. The runner copies them
+ into a fresh tempdir before invoking the engine.
+3. If the in-process reference handler doesn't already cover the
+ scenario, register a handler in `runner/stub.py::_DISPATCH`.
+4. Run `python run_scenario.py ` until it's green.
+5. Mirror the new files into the other two binding locations.
+
+## Determinism
+
+- Fixtures are seed-controlled (`setup.seed`).
+- File enumeration uses sorted iteration order.
+- Time-dependent behaviour (`abort_after_seconds`,
+ `introduce_conflict_after_seconds`) is tunable via `inject` so a
+ flaky CI host can lengthen timeouts without rewriting the scenario.
+
+## CI
+
+Each binding wires its own job:
+
+- `claude-swarm` library: GitHub Actions matrix `python-{3.11,3.12,3.13}`.
+- `claude-code` plugin: runs alongside the project's existing plugin-tests job.
+- Downstream consumers: drop into your `pytest` invocation; the substrate is
+ self-contained.
+
+A failing scenario signals a regression; bisect to find the offending commit.
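The README describes the harness flow as: materialize fixtures into a fresh tempdir, let the engine work, then evaluate expectations. A rough sketch of that flow follows. The names `materialize`, `run_once`, and the callable `engine` are hypothetical and do not reflect the actual `runner/harness.py` API, which is not shown in this diff.

```python
import shutil
import tempfile
from pathlib import Path


def materialize(fixture_dir: Path, workspace: Path) -> list[str]:
    """Copy fixture files into the workspace in sorted order; return relative paths."""
    copied = []
    for src in sorted(p for p in fixture_dir.rglob("*") if p.is_file()):
        rel = src.relative_to(fixture_dir)
        dest = workspace / rel
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)
        copied.append(rel.as_posix())
    return copied


def run_once(fixture_dir: Path, engine, files_present: list[str]) -> dict:
    """Materialize fixtures, run the engine, then check that expected files exist."""
    with tempfile.TemporaryDirectory() as tmp:
        workspace = Path(tmp)
        materialize(fixture_dir, workspace)
        engine(workspace)  # hypothetical: any callable that mutates the workspace
        missing = [rel for rel in files_present if not (workspace / rel).exists()]
    return {"ok": not missing, "missing": missing}


# Usage with a throwaway fixture and a toy engine that writes one doc file.
def toy_engine(ws: Path) -> None:
    (ws / "docs").mkdir()
    (ws / "docs" / "alpha.md").write_text("# alpha\n", encoding="utf-8")


with tempfile.TemporaryDirectory() as fx:
    fixture = Path(fx)
    (fixture / "src").mkdir()
    (fixture / "src" / "alpha.py").write_text(
        "def alpha_main(x):\n    return x * 2\n", encoding="utf-8"
    )
    report = run_once(fixture, toy_engine, ["src/alpha.py", "docs/alpha.md"])

print(report)
```

Sorting the fixture paths before copying matches the README's determinism note about sorted file enumeration.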
diff --git a/plugins/swarm-orchestrator/tests/swarming/__init__.py b/plugins/swarm-orchestrator/tests/swarming/__init__.py
new file mode 100644
index 0000000000..b34fc99c79
--- /dev/null
+++ b/plugins/swarm-orchestrator/tests/swarming/__init__.py
@@ -0,0 +1,6 @@
+"""swarm-orchestrator scenario substrate — toy swarm scenarios that exercise
+every primitive (DAG dependencies, role-typed heads, parallel merge, abort
+marker, multi-team coordination) against the binding-agnostic engine
+protocol. Same JSON shape can drive the plugin's lightweight mode, the
+standalone claude-swarm library, or any future native-Teams binding.
+"""
diff --git a/plugins/swarm-orchestrator/tests/swarming/fixtures/abort-marker-test/README.txt b/plugins/swarm-orchestrator/tests/swarming/fixtures/abort-marker-test/README.txt
new file mode 100644
index 0000000000..fd5a1c7d9f
--- /dev/null
+++ b/plugins/swarm-orchestrator/tests/swarming/fixtures/abort-marker-test/README.txt
@@ -0,0 +1,3 @@
+Abort marker scenario: teammate appends to long_running_output.txt every
+10ms; the runner drops .claude/abort-renamer-1 after 50ms; teammate must
+do a clean WIP commit and exit.
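The abort-marker behaviour described above can be sketched as a polling loop. This is an illustration under assumptions: `run_until_aborted` is a hypothetical name, a `threading.Timer` stands in for the runner that drops the marker, and the WIP commit is only indicated by a comment rather than performed.

```python
import tempfile
import threading
import time
from pathlib import Path


def run_until_aborted(workspace: Path, marker: Path,
                      interval_s: float = 0.01, max_ticks: int = 1000) -> dict:
    """Append one line per tick until the abort marker appears (or max_ticks)."""
    output = workspace / "long_running_output.txt"
    ticks = 0
    aborted = False
    while ticks < max_ticks:
        if marker.exists():
            aborted = True  # a real teammate would make its WIP commit here
            break
        with output.open("a", encoding="utf-8") as fh:
            fh.write(f"tick {ticks}\n")
        ticks += 1
        time.sleep(interval_s)
    return {"aborted": aborted, "ticks": ticks}


with tempfile.TemporaryDirectory() as tmp:
    ws = Path(tmp)
    marker = ws / ".claude" / "abort-renamer-1"
    marker.parent.mkdir()
    # Stand-in for the runner: drop the marker after roughly 50 ms.
    timer = threading.Timer(0.05, marker.touch)
    timer.start()
    result = run_until_aborted(ws, marker)
    timer.join()
    lines_written = len(
        (ws / "long_running_output.txt").read_text(encoding="utf-8").splitlines()
    )

print(result["aborted"], result["ticks"] == lines_written)
```

Checking the marker before each write means no partial line is appended after the abort request is seen.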
diff --git a/plugins/swarm-orchestrator/tests/swarming/fixtures/audit-then-fix/src/buggy_01.py b/plugins/swarm-orchestrator/tests/swarming/fixtures/audit-then-fix/src/buggy_01.py
new file mode 100644
index 0000000000..86443264c5
--- /dev/null
+++ b/plugins/swarm-orchestrator/tests/swarming/fixtures/audit-then-fix/src/buggy_01.py
@@ -0,0 +1,5 @@
+"""buggy_01 — has a BUG marker the auditor flags."""
+
+def run_01(x):
+    # BUG: returns wrong value
+    return x
diff --git a/plugins/swarm-orchestrator/tests/swarming/fixtures/audit-then-fix/src/buggy_02.py b/plugins/swarm-orchestrator/tests/swarming/fixtures/audit-then-fix/src/buggy_02.py
new file mode 100644
index 0000000000..9cb7c7d8c2
--- /dev/null
+++ b/plugins/swarm-orchestrator/tests/swarming/fixtures/audit-then-fix/src/buggy_02.py
@@ -0,0 +1,5 @@
+"""buggy_02 — has a BUG marker the auditor flags."""
+
+def run_02(x):
+    # BUG: returns wrong value
+    return x
diff --git a/plugins/swarm-orchestrator/tests/swarming/fixtures/audit-then-fix/src/buggy_03.py b/plugins/swarm-orchestrator/tests/swarming/fixtures/audit-then-fix/src/buggy_03.py
new file mode 100644
index 0000000000..9ec1ede261
--- /dev/null
+++ b/plugins/swarm-orchestrator/tests/swarming/fixtures/audit-then-fix/src/buggy_03.py
@@ -0,0 +1,5 @@
+"""buggy_03 — has a BUG marker the auditor flags."""
+
+def run_03(x):
+    # BUG: returns wrong value
+    return x
diff --git a/plugins/swarm-orchestrator/tests/swarming/fixtures/audit-then-fix/src/clean.py b/plugins/swarm-orchestrator/tests/swarming/fixtures/audit-then-fix/src/clean.py
new file mode 100644
index 0000000000..28e2c0c25e
--- /dev/null
+++ b/plugins/swarm-orchestrator/tests/swarming/fixtures/audit-then-fix/src/clean.py
@@ -0,0 +1,4 @@
+"""clean — no bug, auditor skips."""
+
+def ok():
+    return True
diff --git a/plugins/swarm-orchestrator/tests/swarming/fixtures/conflict-resolution-drill/shared.py b/plugins/swarm-orchestrator/tests/swarming/fixtures/conflict-resolution-drill/shared.py
new file mode 100644
index 0000000000..a89a738b4e
--- /dev/null
+++ b/plugins/swarm-orchestrator/tests/swarming/fixtures/conflict-resolution-drill/shared.py
@@ -0,0 +1,3 @@
+"""shared.py — both teams will append a line; merge pipeline must resolve."""
+
+base_value = 0
diff --git a/plugins/swarm-orchestrator/tests/swarming/fixtures/doc-writer-team/src/alpha.py b/plugins/swarm-orchestrator/tests/swarming/fixtures/doc-writer-team/src/alpha.py
new file mode 100644
index 0000000000..aa43f6f6cc
--- /dev/null
+++ b/plugins/swarm-orchestrator/tests/swarming/fixtures/doc-writer-team/src/alpha.py
@@ -0,0 +1,4 @@
+"""alpha module — placeholder for documentation generation."""
+
+def alpha_main(x):
+    return x * 2
diff --git a/plugins/swarm-orchestrator/tests/swarming/fixtures/doc-writer-team/src/beta.py b/plugins/swarm-orchestrator/tests/swarming/fixtures/doc-writer-team/src/beta.py
new file mode 100644
index 0000000000..70d15d3871
--- /dev/null
+++ b/plugins/swarm-orchestrator/tests/swarming/fixtures/doc-writer-team/src/beta.py
@@ -0,0 +1,4 @@
+"""beta module — placeholder for documentation generation."""
+
+def beta_main(x):
+    return x * 2
diff --git a/plugins/swarm-orchestrator/tests/swarming/fixtures/doc-writer-team/src/delta.py b/plugins/swarm-orchestrator/tests/swarming/fixtures/doc-writer-team/src/delta.py
new file mode 100644
index 0000000000..824ae7917f
--- /dev/null
+++ b/plugins/swarm-orchestrator/tests/swarming/fixtures/doc-writer-team/src/delta.py
@@ -0,0 +1,4 @@
+"""delta module — placeholder for documentation generation."""
+
+def delta_main(x):
+    return x * 2
diff --git a/plugins/swarm-orchestrator/tests/swarming/fixtures/doc-writer-team/src/epsilon.py b/plugins/swarm-orchestrator/tests/swarming/fixtures/doc-writer-team/src/epsilon.py
new file mode 100644
index 0000000000..276e07f56c
--- /dev/null
+++ b/plugins/swarm-orchestrator/tests/swarming/fixtures/doc-writer-team/src/epsilon.py
@@ -0,0 +1,4 @@
+"""epsilon module — placeholder for documentation generation."""
+
+def epsilon_main(x):
+    return x * 2
diff --git a/plugins/swarm-orchestrator/tests/swarming/fixtures/doc-writer-team/src/gamma.py b/plugins/swarm-orchestrator/tests/swarming/fixtures/doc-writer-team/src/gamma.py
new file mode 100644
index 0000000000..6183fecf4d
--- /dev/null
+++ b/plugins/swarm-orchestrator/tests/swarming/fixtures/doc-writer-team/src/gamma.py
@@ -0,0 +1,4 @@
+"""gamma module — placeholder for documentation generation."""
+
+def gamma_main(x):
+    return x * 2
diff --git a/plugins/swarm-orchestrator/tests/swarming/fixtures/multi-file-rename/files/mod01.py b/plugins/swarm-orchestrator/tests/swarming/fixtures/multi-file-rename/files/mod01.py
new file mode 100644
index 0000000000..29b5f6b991
--- /dev/null
+++ b/plugins/swarm-orchestrator/tests/swarming/fixtures/multi-file-rename/files/mod01.py
@@ -0,0 +1,6 @@
+"""Module 01 — uses foo as a placeholder name."""
+
+def foo_01():
+    return "foo from module 01"
+
+VALUE_FOO_01 = "this references foo"
diff --git a/plugins/swarm-orchestrator/tests/swarming/fixtures/multi-file-rename/files/mod02.py b/plugins/swarm-orchestrator/tests/swarming/fixtures/multi-file-rename/files/mod02.py
new file mode 100644
index 0000000000..7b63ddb665
--- /dev/null
+++ b/plugins/swarm-orchestrator/tests/swarming/fixtures/multi-file-rename/files/mod02.py
@@ -0,0 +1,6 @@
+"""Module 02 — uses foo as a placeholder name."""
+
+def foo_02():
+    return "foo from module 02"
+
+VALUE_FOO_02 = "this references foo"
diff --git a/plugins/swarm-orchestrator/tests/swarming/fixtures/multi-file-rename/files/mod03.py b/plugins/swarm-orchestrator/tests/swarming/fixtures/multi-file-rename/files/mod03.py
new file mode 100644
index 0000000000..81208f835e
--- /dev/null
+++ b/plugins/swarm-orchestrator/tests/swarming/fixtures/multi-file-rename/files/mod03.py
@@ -0,0 +1,6 @@
+"""Module 03 — uses foo as a placeholder name."""
+
+def foo_03():
+    return "foo from module 03"
+
+VALUE_FOO_03 = "this references foo"
diff --git a/plugins/swarm-orchestrator/tests/swarming/fixtures/multi-file-rename/files/mod04.py b/plugins/swarm-orchestrator/tests/swarming/fixtures/multi-file-rename/files/mod04.py
new file mode 100644
index 0000000000..b8c7f8a23d
--- /dev/null
+++ b/plugins/swarm-orchestrator/tests/swarming/fixtures/multi-file-rename/files/mod04.py
@@ -0,0 +1,6 @@
+"""Module 04 — uses foo as a placeholder name."""
+
+def foo_04():
+    return "foo from module 04"
+
+VALUE_FOO_04 = "this references foo"
diff --git a/plugins/swarm-orchestrator/tests/swarming/fixtures/multi-file-rename/files/mod05.py b/plugins/swarm-orchestrator/tests/swarming/fixtures/multi-file-rename/files/mod05.py
new file mode 100644
index 0000000000..2a2df0ba5b
--- /dev/null
+++ b/plugins/swarm-orchestrator/tests/swarming/fixtures/multi-file-rename/files/mod05.py
@@ -0,0 +1,6 @@
+"""Module 05 — uses foo as a placeholder name."""
+
+def foo_05():
+    return "foo from module 05"
+
+VALUE_FOO_05 = "this references foo"
diff --git a/plugins/swarm-orchestrator/tests/swarming/fixtures/multi-file-rename/files/mod06.py b/plugins/swarm-orchestrator/tests/swarming/fixtures/multi-file-rename/files/mod06.py
new file mode 100644
index 0000000000..02f945472f
--- /dev/null
+++ b/plugins/swarm-orchestrator/tests/swarming/fixtures/multi-file-rename/files/mod06.py
@@ -0,0 +1,6 @@
+"""Module 06 — uses foo as a placeholder name."""
+
+def foo_06():
+    return "foo from module 06"
+
+VALUE_FOO_06 = "this references foo"
diff --git a/plugins/swarm-orchestrator/tests/swarming/fixtures/multi-file-rename/files/mod07.py b/plugins/swarm-orchestrator/tests/swarming/fixtures/multi-file-rename/files/mod07.py
new file mode 100644
index 0000000000..2178ff7d35
--- /dev/null
+++ b/plugins/swarm-orchestrator/tests/swarming/fixtures/multi-file-rename/files/mod07.py
@@ -0,0 +1,6 @@
+"""Module 07 — uses foo as a placeholder name."""
+
+def foo_07():
+    return "foo from module 07"
+
+VALUE_FOO_07 = "this references foo"
diff --git a/plugins/swarm-orchestrator/tests/swarming/fixtures/multi-file-rename/files/mod08.py b/plugins/swarm-orchestrator/tests/swarming/fixtures/multi-file-rename/files/mod08.py
new file mode 100644
index 0000000000..859f065a6a
--- /dev/null
+++ b/plugins/swarm-orchestrator/tests/swarming/fixtures/multi-file-rename/files/mod08.py
@@ -0,0 +1,6 @@
+"""Module 08 — uses foo as a placeholder name."""
+
+def foo_08():
+    return "foo from module 08"
+
+VALUE_FOO_08 = "this references foo"
diff --git a/plugins/swarm-orchestrator/tests/swarming/fixtures/multi-file-rename/files/mod09.py b/plugins/swarm-orchestrator/tests/swarming/fixtures/multi-file-rename/files/mod09.py
new file mode 100644
index 0000000000..59a24c7cef
--- /dev/null
+++ b/plugins/swarm-orchestrator/tests/swarming/fixtures/multi-file-rename/files/mod09.py
@@ -0,0 +1,6 @@
+"""Module 09 — uses foo as a placeholder name."""
+
+def foo_09():
+    return "foo from module 09"
+
+VALUE_FOO_09 = "this references foo"
diff --git a/plugins/swarm-orchestrator/tests/swarming/fixtures/multi-file-rename/files/mod10.py b/plugins/swarm-orchestrator/tests/swarming/fixtures/multi-file-rename/files/mod10.py
new file mode 100644
index 0000000000..bf8b8b9ca4
--- /dev/null
+++ b/plugins/swarm-orchestrator/tests/swarming/fixtures/multi-file-rename/files/mod10.py
@@ -0,0 +1,6 @@
+"""Module 10 — uses foo as a placeholder name."""
+
+def foo_10():
+    return "foo from module 10"
+
+VALUE_FOO_10 = "this references foo"
diff --git a/plugins/swarm-orchestrator/tests/swarming/fixtures/multi-file-rename/manifest.json b/plugins/swarm-orchestrator/tests/swarming/fixtures/multi-file-rename/manifest.json
new file mode 100644
index 0000000000..6c67a6d77e
--- /dev/null
+++ b/plugins/swarm-orchestrator/tests/swarming/fixtures/multi-file-rename/manifest.json
@@ -0,0 +1,8 @@
+{
+  "rename": {
+    "from": "foo",
+    "to": "bar"
+  },
+  "target_glob": "files/*.py",
+  "branch_name": "feature/rename-foo-to-bar"
+}
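To make the manifest's role concrete, here is a small sketch of how a teammate might consume it: read `rename.from`, `rename.to`, and `target_glob`, then rewrite each matching file. `apply_rename` is a hypothetical helper, and the plain case-sensitive `str.replace` is an assumption, not necessarily what the scenario's engine does.

```python
import json
import tempfile
from pathlib import Path


def apply_rename(workspace: Path) -> list[str]:
    """Apply the manifest's rename to every file matching target_glob."""
    manifest = json.loads((workspace / "manifest.json").read_text(encoding="utf-8"))
    old, new = manifest["rename"]["from"], manifest["rename"]["to"]
    changed = []
    for path in sorted(workspace.glob(manifest["target_glob"])):
        text = path.read_text(encoding="utf-8")
        updated = text.replace(old, new)  # assumption: case-sensitive substring swap
        if updated != text:
            path.write_text(updated, encoding="utf-8")
            changed.append(path.relative_to(workspace).as_posix())
    return changed


with tempfile.TemporaryDirectory() as tmp:
    ws = Path(tmp)
    (ws / "files").mkdir()
    (ws / "manifest.json").write_text(
        json.dumps({"rename": {"from": "foo", "to": "bar"}, "target_glob": "files/*.py"}),
        encoding="utf-8",
    )
    (ws / "files" / "mod01.py").write_text(
        'def foo_01():\n    return "foo from module 01"\n\n'
        'VALUE_FOO_01 = "this references foo"\n',
        encoding="utf-8",
    )
    changed = apply_rename(ws)
    renamed = (ws / "files" / "mod01.py").read_text(encoding="utf-8")

print(changed)
```

Note that a case-sensitive replace leaves the uppercase `VALUE_FOO_01` identifier untouched; whether the scenario expects that name to change is not stated in this diff.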
diff --git a/plugins/swarm-orchestrator/tests/swarming/fixtures/multi-language-port/README.txt b/plugins/swarm-orchestrator/tests/swarming/fixtures/multi-language-port/README.txt
new file mode 100644
index 0000000000..fcbf114876
--- /dev/null
+++ b/plugins/swarm-orchestrator/tests/swarming/fixtures/multi-language-port/README.txt
@@ -0,0 +1 @@
+Empty fixture: scenario #5 starts blank; teammates write add.py / add.js / add.rs.
diff --git a/plugins/swarm-orchestrator/tests/swarming/fixtures/multi-team-coordination/README.txt b/plugins/swarm-orchestrator/tests/swarming/fixtures/multi-team-coordination/README.txt
new file mode 100644
index 0000000000..6cb7ecea92
--- /dev/null
+++ b/plugins/swarm-orchestrator/tests/swarming/fixtures/multi-team-coordination/README.txt
@@ -0,0 +1,4 @@
+Multi-team scenario: two teams (alpha + beta) run in parallel. After
+both finish their per-team deliverable, alpha's lead pings beta's lead
+via cross-team SendMessage; substrate asserts the inbox path was
+written.
diff --git a/plugins/swarm-orchestrator/tests/swarming/fixtures/respawn-on-crash/README.txt b/plugins/swarm-orchestrator/tests/swarming/fixtures/respawn-on-crash/README.txt
new file mode 100644
index 0000000000..40ab87cf65
--- /dev/null
+++ b/plugins/swarm-orchestrator/tests/swarming/fixtures/respawn-on-crash/README.txt
@@ -0,0 +1,3 @@
+Respawn scenario: teammate raises 1 simulated crash; meta-supervisor
+retries; second attempt succeeds. Substrate verifies respawned_output.txt
+exists + respawn_count >= 1.
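The respawn behaviour described above amounts to a bounded retry loop that counts restarts. A minimal sketch, assuming a crash surfaces as `RuntimeError`; `run_with_respawn` and `flaky_teammate` are hypothetical names, not the meta-supervisor's real interface.

```python
def run_with_respawn(task, max_respawns: int = 3):
    """Call task(); on a crash, retry up to max_respawns times.

    Returns a (result, respawn_count) tuple.
    """
    respawn_count = 0
    while True:
        try:
            return task(), respawn_count
        except RuntimeError:
            if respawn_count >= max_respawns:
                raise  # give up: the crash does not look transient
            respawn_count += 1


attempts = {"n": 0}


def flaky_teammate() -> str:
    """Crash on the first attempt, succeed on the second."""
    attempts["n"] += 1
    if attempts["n"] == 1:
        raise RuntimeError("simulated crash")
    return "respawned_output"


result, respawn_count = run_with_respawn(flaky_teammate)
print(result, respawn_count)
```

With one simulated crash the loop reports a single respawn, which is the condition `respawn_count >= 1` that the fixture note says the substrate verifies.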
diff --git a/plugins/swarm-orchestrator/tests/swarming/fixtures/scan-build-review/sample/feature_01.txt b/plugins/swarm-orchestrator/tests/swarming/fixtures/scan-build-review/sample/feature_01.txt
new file mode 100644
index 0000000000..4c08219888
--- /dev/null
+++ b/plugins/swarm-orchestrator/tests/swarming/fixtures/scan-build-review/sample/feature_01.txt
@@ -0,0 +1 @@
+TODO: implement feature 01
diff --git a/plugins/swarm-orchestrator/tests/swarming/fixtures/scan-build-review/sample/feature_02.txt b/plugins/swarm-orchestrator/tests/swarming/fixtures/scan-build-review/sample/feature_02.txt
new file mode 100644
index 0000000000..d2b3c08789
--- /dev/null
+++ b/plugins/swarm-orchestrator/tests/swarming/fixtures/scan-build-review/sample/feature_02.txt
@@ -0,0 +1 @@
+TODO: implement feature 02
diff --git a/plugins/swarm-orchestrator/tests/swarming/fixtures/scan-build-review/sample/feature_03.txt b/plugins/swarm-orchestrator/tests/swarming/fixtures/scan-build-review/sample/feature_03.txt
new file mode 100644
index 0000000000..ae669a2b79
--- /dev/null
+++ b/plugins/swarm-orchestrator/tests/swarming/fixtures/scan-build-review/sample/feature_03.txt
@@ -0,0 +1 @@
+TODO: implement feature 03
diff --git a/plugins/swarm-orchestrator/tests/swarming/fixtures/scan-build-review/sample/feature_04.txt b/plugins/swarm-orchestrator/tests/swarming/fixtures/scan-build-review/sample/feature_04.txt
new file mode 100644
index 0000000000..98ccf02854
--- /dev/null
+++ b/plugins/swarm-orchestrator/tests/swarming/fixtures/scan-build-review/sample/feature_04.txt
@@ -0,0 +1 @@
+TODO: implement feature 04
diff --git a/plugins/swarm-orchestrator/tests/swarming/fixtures/scan-build-review/sample/feature_05.txt b/plugins/swarm-orchestrator/tests/swarming/fixtures/scan-build-review/sample/feature_05.txt
new file mode 100644
index 0000000000..77ead7fe3c
--- /dev/null
+++ b/plugins/swarm-orchestrator/tests/swarming/fixtures/scan-build-review/sample/feature_05.txt
@@ -0,0 +1 @@
+TODO: implement feature 05
diff --git a/plugins/swarm-orchestrator/tests/swarming/fixtures/spec-impl-pair/README.txt b/plugins/swarm-orchestrator/tests/swarming/fixtures/spec-impl-pair/README.txt
new file mode 100644
index 0000000000..ac58da6e53
--- /dev/null
+++ b/plugins/swarm-orchestrator/tests/swarming/fixtures/spec-impl-pair/README.txt
@@ -0,0 +1,3 @@
+Empty fixture: scenario #2 starts from a blank workspace; the spec
+teammate writes test_increment.py first, the impl teammate adds
+increment.py only after the spec exists.
diff --git a/plugins/swarm-orchestrator/tests/swarming/run_scenario.py b/plugins/swarm-orchestrator/tests/swarming/run_scenario.py
new file mode 100755
index 0000000000..73ca64b392
--- /dev/null
+++ b/plugins/swarm-orchestrator/tests/swarming/run_scenario.py
@@ -0,0 +1,89 @@
+#!/usr/bin/env python3
+"""Scenario runner — exercises a scenario against the configured engine.
+
+By default this uses the binding-agnostic in-process reference engine
+shipped in :mod:`tests.swarming.runner.stub`. To run against a different
+engine (e.g. an Anthropic Teams binding or the standalone ``claude_swarm``
+library), change ``_make_engine()`` below.
+
+Usage::
+
+ python tests/swarming/run_scenario.py multi-file-rename
+ python tests/swarming/run_scenario.py --all
+"""
+from __future__ import annotations
+
+import argparse
+import json
+import sys
+from pathlib import Path
+
+THIS_DIR = Path(__file__).resolve().parent
+# Allow running as a script: put this directory on sys.path so ``runner``
+# resolves as a sibling import, and the directory containing ``tests/`` so
+# it also resolves as ``tests.swarming.runner`` (package import). We import
+# via the local path below to keep the scenario runner self-contained.
+sys.path.insert(0, str(THIS_DIR))
+sys.path.insert(0, str(THIS_DIR.parent.parent)) # repo root
+
+from runner.harness import run_all, run_scenario # noqa: E402
+from runner.stub import InProcessScenarioEngine # noqa: E402
+
+
+def _make_engine():
+    """Return the engine for this binding.
+
+    When the internal swarm exposes its own engine adapter, swap here.
+    Until then, use the canonical reference implementation — keeps the
+    substrate runnable + the scenarios green during development.
+    """
+    try:
+        # Optional adapter — present once claude_swarm exposes it.
+        from claude_swarm import scenario_engine as _eng  # type: ignore[import-not-found]
+        return _eng.StandaloneScenarioEngine()
+    except Exception:  # noqa: BLE001 — adapter absence is expected today
+        return InProcessScenarioEngine()
+
+
+def main(argv: list[str] | None = None) -> int:
+    p = argparse.ArgumentParser(description=__doc__.splitlines()[0])
+    p.add_argument("scenario", nargs="?", help="scenario name (matches scenarios/.json)")
+    p.add_argument("--all", action="store_true", help="run every scenario in scenarios/")
+    p.add_argument("--scenarios-dir", default=str(THIS_DIR / "scenarios"))
+    p.add_argument("--keep-workspace", action="store_true")
+    p.add_argument("--json", action="store_true")
+    p.add_argument("-v", "--verbose", action="store_true")
+    args = p.parse_args(argv)
+
+    engine = _make_engine()
+    if args.all:
+        reports = run_all(args.scenarios_dir, engine=engine, verbose=args.verbose)
+    else:
+        if not args.scenario:
+            p.error("scenario name required (or --all)")
+        candidate = Path(args.scenarios_dir) / f"{args.scenario}.json"
+        if not candidate.exists():
+            print(f"scenario not found: {candidate}", file=sys.stderr)
+            return 2
+        reports = [
+            run_scenario(
+                candidate,
+                engine=engine,
+                keep_workspace=args.keep_workspace,
+                verbose=args.verbose,
+            )
+        ]
+
+    if args.json:
+        print(json.dumps([r.to_dict() for r in reports], indent=2))
+    else:
+        for r in reports:
+            head = "PASS" if r.ok else "FAIL"
+            print(f"[{head}] {r.scenario} (binding={r.binding}) passed={len(r.passed)} failed={len(r.failed)}")
+            for x in r.failed:
+                print(f"  - {x}")
+    return 0 if all(r.ok for r in reports) else 1
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
diff --git a/plugins/swarm-orchestrator/tests/swarming/run_scenario.sh b/plugins/swarm-orchestrator/tests/swarming/run_scenario.sh
new file mode 100755
index 0000000000..9390389904
--- /dev/null
+++ b/plugins/swarm-orchestrator/tests/swarming/run_scenario.sh
@@ -0,0 +1,10 @@
+#!/usr/bin/env bash
+# Shell wrapper around the Python runner — matches the project's
+# convention for plugin tests (see plugins/feature-dev/...).
+#
+# Usage:
+# ./run_scenario.sh multi-file-rename
+# ./run_scenario.sh --all
+set -euo pipefail
+HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+exec python3 "${HERE}/run_scenario.py" "$@"
diff --git a/plugins/swarm-orchestrator/tests/swarming/runner/__init__.py b/plugins/swarm-orchestrator/tests/swarming/runner/__init__.py
new file mode 100644
index 0000000000..5113a2aef6
--- /dev/null
+++ b/plugins/swarm-orchestrator/tests/swarming/runner/__init__.py
@@ -0,0 +1,13 @@
+"""Binding-agnostic scenario runner.
+
+This package is the canonical home for the swarm-testing-substrate. It is
+mirrored verbatim into:
+
+ - ~/dev/projects/claude-swarm/tests/scenarios/ (standalone library)
+ - ~/dev/projects/claude-code/plugins/swarm-orchestrator/tests/swarming/ (plugin)
+
+The runner module is imported by all three bindings via the
+``claude_swarm.scenarios.stub`` interface defined in :mod:`.stub`. Each
+binding ships a concrete ``ScenarioEngine`` adapter; the scenarios + the
+assertion harness stay identical.
+"""
diff --git a/plugins/swarm-orchestrator/tests/swarming/runner/assertions.py b/plugins/swarm-orchestrator/tests/swarming/runner/assertions.py
new file mode 100644
index 0000000000..67bbcf8fd2
--- /dev/null
+++ b/plugins/swarm-orchestrator/tests/swarming/runner/assertions.py
@@ -0,0 +1,190 @@
+"""Post-run assertion harness — the same code judges every binding.
+
+Each failed expectation is recorded in :class:`AssertionReport`; the
+runner prints the failures and returns a non-zero exit code to CI.
+"""
+from __future__ import annotations
+
+import dataclasses
+import json
+import subprocess
+from pathlib import Path
+from typing import Any, Mapping
+
+from .stub import RunResult, Scenario
+
+
+class AssertionFailure(AssertionError):
+    """Error type for an expectation mismatch.
+
+    :func:`evaluate` itself records mismatches in the report and does not raise.
+    """
+
+
+@dataclasses.dataclass
+class AssertionReport:
+    scenario: str
+    binding: str
+    passed: list[str] = dataclasses.field(default_factory=list)
+    failed: list[str] = dataclasses.field(default_factory=list)
+
+    @property
+    def ok(self) -> bool:
+        return not self.failed
+
+    def to_dict(self) -> dict[str, Any]:
+        return {
+            "scenario": self.scenario,
+            "binding": self.binding,
+            "passed": list(self.passed),
+            "failed": list(self.failed),
+            "ok": self.ok,
+        }
+
+
+def evaluate(scenario: Scenario, result: RunResult, workspace: Path) -> AssertionReport:
+    rep = AssertionReport(scenario=scenario.name, binding=result.binding)
+    expected: Mapping[str, Any] = scenario.expected
+
+    def check(label: str, ok: bool, detail: str = "") -> None:
+        if ok:
+            rep.passed.append(label)
+        else:
+            rep.failed.append(f"{label} :: {detail}")
+
+    if "tasks_completed" in expected:
+        want = int(expected["tasks_completed"])
+        check(
+            "tasks_completed",
+            result.tasks_completed == want,
+            f"got {result.tasks_completed} want {want}",
+        )
+
+    if "tasks_failed" in expected:
+        want = int(expected["tasks_failed"])
+        check(
+            "tasks_failed",
+            result.tasks_failed == want,
+            f"got {result.tasks_failed} want {want}",
+        )
+
+    if "tasks_aborted" in expected:
+        want = int(expected["tasks_aborted"])
+        check(
+            "tasks_aborted",
+            result.tasks_aborted == want,
+            f"got {result.tasks_aborted} want {want}",
+        )
+
+    if "merge_conflicts" in expected:
+        want = int(expected["merge_conflicts"])
+        check(
+            "merge_conflicts",
+            result.merge_conflicts == want,
+            f"got {result.merge_conflicts} want {want}",
+        )
+
+    if "branches_in_master" in expected:
+        want_branches = list(expected["branches_in_master"])
+        merged = _git_branches_merged(workspace)
+        for b in want_branches:
+            check(
+                f"branch_in_master:{b}",
+                b in result.branches_in_master or b in merged,
+                f"branch {b!r} not merged",
+            )
+
+    if "files_present" in expected:
+        for rel in expected["files_present"]:
+            p = workspace / rel
+            check(f"file_present:{rel}", p.exists(), f"missing {p}")
+
+    if "files_absent" in expected:
+        for rel in expected["files_absent"]:
+            p = workspace / rel
+            check(f"file_absent:{rel}", not p.exists(), f"unexpected {p}")
+
+    for entry in expected.get("file_contains", []):
+        p = workspace / entry["path"]
+        sub = entry["substring"]
+        ok = p.exists() and sub in p.read_text(encoding="utf-8")
+        check(f"file_contains:{entry['path']}:{sub!r}", ok, f"{p} missing {sub!r}")
+
+    for entry in expected.get("file_absent_substring", []):
+        p = workspace / entry["path"]
+        sub = entry["substring"]
+        ok = p.exists() and sub not in p.read_text(encoding="utf-8")
+        check(
+            f"file_absent_substring:{entry['path']}:{sub!r}",
+            ok,
+            f"{p} still contains {sub!r}",
+        )
+
+    for want_msg in expected.get("messages_routed", []):
+        ok = any(
+            r.get("from") == want_msg["from"] and r.get("to") == want_msg["to"]
+            and (
+                "team" not in want_msg or r.get("team") == want_msg.get("team")
+            )
+            for r in result.messages_routed
+        )
+        check(
+            f"message_routed:{want_msg['from']}->{want_msg['to']}",
+            ok,
+            f"messages={result.messages_routed!r}",
+        )
+
+    if "abort_wip_commit_present" in expected:
+        want = bool(expected["abort_wip_commit_present"])
+        ok = result.abort_wip_commit_present == want
+        if want:
+            # Cross-check git log
+            log = _git_log(workspace, max_count=5)
+            ok = ok and any("WIP" in line for line in log)
+        check("abort_wip_commit_present", ok, f"git log: {_git_log(workspace, 3)!r}")
+
+    if "respawn_count_min" in expected:
+        want = int(expected["respawn_count_min"])
+        check(
+            "respawn_count_min",
+            result.respawn_count >= want,
+            f"got {result.respawn_count} want >= {want}",
+        )
+
+    return rep
+
+
+def _git_branches_merged(workspace: Path) -> list[str]:
+    try:
+        out = subprocess.run(
+            ["git", "branch", "--merged", "master"],
+            cwd=str(workspace),
+            check=True,
+            capture_output=True,
+ text=True,
+ )
+ except (subprocess.CalledProcessError, FileNotFoundError):
+ return []
+ return [
+ line.strip().lstrip("*").strip()
+ for line in out.stdout.splitlines()
+ if line.strip()
+ ]
+
+
+def _git_log(workspace: Path, max_count: int = 10) -> list[str]:
+ try:
+ out = subprocess.run(
+ ["git", "log", f"-{max_count}", "--pretty=%s"],
+ cwd=str(workspace),
+ check=True,
+ capture_output=True,
+ text=True,
+ )
+ except (subprocess.CalledProcessError, FileNotFoundError):
+ return []
+ return [line for line in out.stdout.splitlines() if line.strip()]
+
+
+__all__ = [
+ "AssertionFailure",
+ "AssertionReport",
+ "evaluate",
+]
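The four counter checks in `evaluate` (`tasks_completed`, `tasks_failed`, `tasks_aborted`, `merge_conflicts`) all follow one shape: read the wanted value, compare it to the observed one, and record a pass or a labelled failure instead of raising. A minimal standalone sketch of that accumulate-then-report pattern follows. `MiniReport`, `COUNTER_KEYS`, and `check_counters` are hypothetical names used only for illustration; they are not part of this patch.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Mapping


@dataclass
class MiniReport:
    """Hypothetical stand-in for AssertionReport: collects labels, never raises."""

    passed: list[str] = field(default_factory=list)
    failed: list[str] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        return not self.failed


# Counter names copied from the expectation keys used in the patch.
COUNTER_KEYS = ("tasks_completed", "tasks_failed", "tasks_aborted", "merge_conflicts")


def check_counters(expected: Mapping[str, Any], observed: Mapping[str, int]) -> MiniReport:
    """Compare only the counters the scenario actually declares."""
    rep = MiniReport()
    for key in COUNTER_KEYS:
        if key not in expected:
            continue  # an undeclared expectation is not checked at all
        want = int(expected[key])
        got = observed.get(key, 0)
        if got == want:
            rep.passed.append(key)
        else:
            rep.failed.append(f"{key} :: got {got} want {want}")
    return rep


report = check_counters(
    {"tasks_completed": 2, "merge_conflicts": 0},
    {"tasks_completed": 2, "merge_conflicts": 1},
)
```

Because every mismatch is appended rather than raised, one run reports all failing expectations at once, which is what the `--json` output of the harness relies on.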
diff --git a/plugins/swarm-orchestrator/tests/swarming/runner/harness.py b/plugins/swarm-orchestrator/tests/swarming/runner/harness.py
new file mode 100644
index 0000000000..09aefa6fb5
--- /dev/null
+++ b/plugins/swarm-orchestrator/tests/swarming/runner/harness.py
@@ -0,0 +1,201 @@
+"""Harness — drives fixtures + engine + assertions for one scenario.
+
+The harness is the same code regardless of binding. Per-binding runners
+just supply a different :class:`ScenarioEngine` instance.
+
+Usage (programmatic)::
+
+ from tests.swarming.runner.harness import run_scenario
+ from tests.swarming.runner.stub import InProcessScenarioEngine
+
+ report = run_scenario(
+ "tests/swarming/scenarios/multi-file-rename.json",
+ engine=InProcessScenarioEngine(),
+ )
+ assert report.ok, report.failed
+"""
+from __future__ import annotations
+
+import argparse
+import json
+import os
+import shutil
+import subprocess
+import sys
+import tempfile
+import time
+from pathlib import Path
+from typing import Sequence
+
+from .assertions import AssertionReport, evaluate
+from .stub import InProcessScenarioEngine, RunResult, Scenario, ScenarioEngine
+
+
+REPO_GIT_USER = "swarm-test-substrate"
+REPO_GIT_EMAIL = "swarm-test-substrate@example.invalid"
+
+
+def materialize_fixtures(scenario: Scenario, workspace: Path) -> None:
+ """Copy ``setup.fixtures`` into ``workspace``; init git if asked."""
+ fixtures_rel = scenario.setup.get("fixtures")
+ if not fixtures_rel:
+ workspace.mkdir(parents=True, exist_ok=True)
+ return
+ src = (scenario.source_path.parent / fixtures_rel).resolve()
+ if not src.exists():
+ raise FileNotFoundError(
+ f"scenario {scenario.name!r} references fixtures dir "
+ f"{src} which does not exist"
+ )
+ if workspace.exists():
+ shutil.rmtree(workspace)
+ shutil.copytree(src, workspace)
+ if scenario.setup.get("git_init", True):
+ _git_init(workspace)
+
+
+def _git_init(workspace: Path) -> None:
+ env = os.environ.copy()
+ env.update(
+ {
+ "GIT_AUTHOR_NAME": REPO_GIT_USER,
+ "GIT_COMMITTER_NAME": REPO_GIT_USER,
+ "GIT_AUTHOR_EMAIL": REPO_GIT_EMAIL,
+ "GIT_COMMITTER_EMAIL": REPO_GIT_EMAIL,
+ }
+ )
+ subprocess.run(
+ ["git", "init", "-q", "-b", "master"],
+ cwd=str(workspace),
+ check=True,
+ env=env,
+ )
+ subprocess.run(
+ ["git", "config", "user.name", REPO_GIT_USER],
+ cwd=str(workspace),
+ check=True,
+ env=env,
+ )
+ subprocess.run(
+ ["git", "config", "user.email", REPO_GIT_EMAIL],
+ cwd=str(workspace),
+ check=True,
+ env=env,
+ )
+ subprocess.run(
+ ["git", "config", "commit.gpgsign", "false"],
+ cwd=str(workspace),
+ check=True,
+ env=env,
+ )
+ subprocess.run(
+ ["git", "add", "-A"],
+ cwd=str(workspace),
+ check=True,
+ env=env,
+ )
+ # Allow empty initial commit if fixtures dir is empty (rare).
+ subprocess.run(
+ ["git", "commit", "--allow-empty", "-q", "-m", "fixture: initial"],
+ cwd=str(workspace),
+ check=True,
+ env=env,
+ )
+
+
+def run_scenario(
+ scenario_path: str | os.PathLike[str],
+ *,
+ engine: ScenarioEngine | None = None,
+ workspace: Path | None = None,
+ keep_workspace: bool = False,
+ verbose: bool = False,
+) -> AssertionReport:
+ scenario = Scenario.load(scenario_path)
+ if engine is None:
+ engine = InProcessScenarioEngine()
+ cleanup = False
+ if workspace is None:
+ workspace = Path(tempfile.mkdtemp(prefix=f"swarm-{scenario.name}-"))
+ cleanup = not keep_workspace
+
+ if verbose:
+ print(f"[harness] scenario={scenario.name} binding={engine.binding_name}", file=sys.stderr)
+ print(f"[harness] workspace={workspace}", file=sys.stderr)
+
+ try:
+ materialize_fixtures(scenario, workspace)
+ deadline = scenario.max_duration_minutes * 60.0
+ t0 = time.monotonic()
+ result = engine.run(scenario, workspace)
+ elapsed = time.monotonic() - t0
+ if elapsed > deadline:
+ result.notes.append(
+ f"[harness] elapsed {elapsed:.2f}s exceeded max {deadline:.2f}s"
+ )
+ report = evaluate(scenario, result, workspace)
+ if verbose:
+ print(f"[harness] passed={len(report.passed)} failed={len(report.failed)}", file=sys.stderr)
+ for f in report.failed:
+ print(f" FAIL {f}", file=sys.stderr)
+ return report
+ finally:
+ if cleanup and workspace.exists():
+ shutil.rmtree(workspace, ignore_errors=True)
+
+
+def run_all(
+ scenarios_dir: str | os.PathLike[str],
+ *,
+ engine: ScenarioEngine | None = None,
+ only: Sequence[str] | None = None,
+ verbose: bool = False,
+) -> list[AssertionReport]:
+ base = Path(scenarios_dir)
+ paths = sorted(base.glob("*.json"))
+ reports: list[AssertionReport] = []
+ for p in paths:
+ if only and p.stem not in only:
+ continue
+ rep = run_scenario(p, engine=engine, verbose=verbose)
+ reports.append(rep)
+ return reports
+
+
+def _cli(argv: list[str] | None = None) -> int:
+ parser = argparse.ArgumentParser(description="Run a swarm scenario via the in-process reference engine.")
+ parser.add_argument("scenario", help="Path to scenario JSON OR scenario name (looked up under --scenarios-dir)")
+ parser.add_argument(
+ "--scenarios-dir",
+ default=str(Path(__file__).resolve().parent.parent / "scenarios"),
+ )
+ parser.add_argument("--keep-workspace", action="store_true")
+ parser.add_argument("--json", action="store_true", help="Emit JSON report on stdout")
+ parser.add_argument("-v", "--verbose", action="store_true")
+ args = parser.parse_args(argv)
+
+ p = Path(args.scenario)
+ if not p.exists():
+ candidate = Path(args.scenarios_dir) / f"{args.scenario}.json"
+ if candidate.exists():
+ p = candidate
+ else:
+ print(f"scenario not found: {args.scenario}", file=sys.stderr)
+ return 2
+
+ rep = run_scenario(p, keep_workspace=args.keep_workspace, verbose=args.verbose)
+ if args.json:
+ print(json.dumps(rep.to_dict(), indent=2))
+ else:
+ print(f"scenario={rep.scenario} binding={rep.binding}")
+ print(f" passed: {len(rep.passed)}")
+ for x in rep.passed:
+ print(f" + {x}")
+ print(f" failed: {len(rep.failed)}")
+ for x in rep.failed:
+ print(f" - {x}")
+ return 0 if rep.ok else 1
+
+
+if __name__ == "__main__":
+ raise SystemExit(_cli())
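`run_scenario` creates a temporary workspace when the caller does not pass one, and removes it in a `finally` block unless `keep_workspace` is set. The same lifecycle can be sketched as a context manager. This is only an illustration under that assumption: `scratch_workspace` is a hypothetical helper, not something the harness defines.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def scratch_workspace(name: str, keep: bool = False) -> Iterator[Path]:
    """Yield a fresh temp directory; delete it on exit unless keep=True."""
    path = Path(tempfile.mkdtemp(prefix=f"swarm-{name}-"))
    try:
        yield path
    finally:
        # Cleanup runs even if the body raised, mirroring the harness's finally block.
        if not keep and path.exists():
            shutil.rmtree(path, ignore_errors=True)


with scratch_workspace("demo") as ws:
    (ws / "marker.txt").write_text("ok", encoding="utf-8")
    existed_inside = ws.exists()
existed_after = ws.exists()
```

Keeping the directory on request is mainly useful for post-mortem inspection of a failed scenario, which is the role `--keep-workspace` plays in the CLI.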
diff --git a/plugins/swarm-orchestrator/tests/swarming/runner/stub.py b/plugins/swarm-orchestrator/tests/swarming/runner/stub.py
new file mode 100644
index 0000000000..8f9a5283b9
--- /dev/null
+++ b/plugins/swarm-orchestrator/tests/swarming/runner/stub.py
@@ -0,0 +1,613 @@
+"""``claude_swarm.scenarios.stub`` — binding-agnostic engine interface.
+
+Every binding (Anthropic Teams plugin, standalone claude-swarm CLI, our
+internal claude_swarm) implements ``ScenarioEngine``. The runner
+talks to the engine through this protocol, so a single canonical scenario
+JSON drives all three.
+
+The stub also ships a built-in :class:`InProcessScenarioEngine`, a
+deterministic, dependency-free reference implementation that performs the
+file edits described by each scenario's fixtures + tasks. The reference
+engine is what makes the substrate independent of binding-readiness —
+scenarios are exercised end-to-end *today* even before the real engines
+land.
+
+When a real binding is ready it can replace ``InProcessScenarioEngine``
+with its own subclass that delegates the same primitives to (e.g.)
+TaskCreate / SendMessage / ``claude_swarm.kanban`` / a custom backend.
+
+This file is the SINGLE source of truth. The other two bindings import or
+sym-mirror it; do not fork.
+"""
+from __future__ import annotations
+
+import dataclasses
+import datetime as _dt
+import json
+import os
+import shutil
+import subprocess
+import threading
+import time
+from collections.abc import Callable, Iterable, Mapping, Sequence
+from concurrent.futures import ThreadPoolExecutor
+from pathlib import Path
+from typing import Any, Protocol, runtime_checkable
+
+
+# ---------------------------------------------------------------------------
+# Data classes
+# ---------------------------------------------------------------------------
+
+
+@dataclasses.dataclass(frozen=True)
+class TeammateSpec:
+ name: str
+ head: str
+ task_ids: tuple[str, ...]
+ team: str = ""
+
+
+@dataclasses.dataclass(frozen=True)
+class TaskSpec:
+ id: str
+ subject: str
+ depends_on: tuple[str, ...] = ()
+ head: str | None = None
+ payload: Mapping[str, Any] = dataclasses.field(default_factory=dict)
+
+
+@dataclasses.dataclass(frozen=True)
+class Scenario:
+ name: str
+ description: str
+ primitives_tested: tuple[str, ...]
+ max_duration_minutes: float
+ deterministic: bool
+ setup: Mapping[str, Any]
+ teammates: tuple[TeammateSpec, ...]
+ tasks: tuple[TaskSpec, ...]
+ inject: Mapping[str, Any]
+ expected: Mapping[str, Any]
+    source_path: Path  # the scenarios/<name>.json file on disk
+
+ @classmethod
+ def load(cls, path: str | os.PathLike[str]) -> "Scenario":
+ p = Path(path).resolve()
+ with p.open("r", encoding="utf-8") as fh:
+ doc = json.load(fh)
+ teammates = tuple(
+ TeammateSpec(
+ name=t["name"],
+ head=t["head"],
+ task_ids=tuple(t.get("task_ids", [])),
+ team=t.get("team", ""),
+ )
+ for t in doc.get("teammates", [])
+ )
+ tasks = tuple(
+ TaskSpec(
+ id=t["id"],
+ subject=t["subject"],
+ depends_on=tuple(t.get("depends_on", [])),
+ head=t.get("head"),
+ payload=dict(t.get("payload", {})),
+ )
+ for t in doc.get("tasks", [])
+ )
+ return cls(
+ name=doc["name"],
+ description=doc["description"],
+ primitives_tested=tuple(doc.get("primitives_tested", [])),
+ max_duration_minutes=float(doc.get("max_duration_minutes", 5.0)),
+ deterministic=bool(doc.get("deterministic", True)),
+ setup=dict(doc.get("setup", {})),
+ teammates=teammates,
+ tasks=tasks,
+ inject=dict(doc.get("inject", {})),
+ expected=dict(doc.get("expected", {})),
+ source_path=p,
+ )
+
+
+@dataclasses.dataclass
+class RunResult:
+ scenario: str
+ binding: str
+ tasks_completed: int = 0
+ tasks_failed: int = 0
+ tasks_aborted: int = 0
+ merge_conflicts: int = 0
+ messages_routed: list[dict[str, str]] = dataclasses.field(default_factory=list)
+ branches_in_master: list[str] = dataclasses.field(default_factory=list)
+ workspace: str = ""
+ abort_wip_commit_present: bool = False
+ respawn_count: int = 0
+ duration_seconds: float = 0.0
+ notes: list[str] = dataclasses.field(default_factory=list)
+
+
+# ---------------------------------------------------------------------------
+# Engine protocol — what every binding must implement
+# ---------------------------------------------------------------------------
+
+
+@runtime_checkable
+class ScenarioEngine(Protocol):
+ """The contract every binding implements.
+
+ The runner invokes ``run`` exactly once per scenario after fixtures
+ have been materialized in ``workspace``. ``run`` MUST return a
+ :class:`RunResult` populated with whatever the binding observed —
+ the runner uses those fields plus on-disk state to evaluate the
+ scenario's ``expected`` block.
+ """
+
+ binding_name: str
+
+ def run(self, scenario: Scenario, workspace: Path) -> RunResult: ...
+
+
+# ---------------------------------------------------------------------------
+# Reference (in-process) engine — usable today, no LLM required
+# ---------------------------------------------------------------------------
+
+
+class InProcessScenarioEngine:
+ """Deterministic reference engine for the substrate.
+
+ Each scenario's fixtures dir contains:
+ - ``manifest.json`` — payload describing the work
+ - ``files/`` — initial repo content (committed by runner)
+
+    The engine performs the work in-process, in dependency order, and
+    blocks until done. Handlers that fan out use a thread pool capped at
+    ``max_workers``. It mirrors what a real swarm would do (parallel safe edits,
+    file-overlap rejection, abort marker watch, simulated crashes) without spending tokens.
+
+ Scenario-specific behavior is dispatched by name in ``_DISPATCH``.
+ """
+
+ binding_name = "in-process-reference"
+
+ def __init__(
+ self,
+ *,
+ abort_marker_dir: Path | None = None,
+ max_workers: int = 8,
+ sleep: Callable[[float], None] = time.sleep,
+ clock: Callable[[], float] = time.monotonic,
+ ) -> None:
+ self.abort_marker_dir = abort_marker_dir
+ self.max_workers = max_workers
+ self.sleep = sleep
+ self.clock = clock
+
+ # -- public ------------------------------------------------------------
+
+ def run(self, scenario: Scenario, workspace: Path) -> RunResult:
+ result = RunResult(scenario=scenario.name, binding=self.binding_name, workspace=str(workspace))
+        handler = _DISPATCH.get(scenario.name, type(self)._handle_default)
+ t0 = self.clock()
+ handler(self, scenario, workspace, result)
+ result.duration_seconds = self.clock() - t0
+ return result
+
+ # -- handlers ----------------------------------------------------------
+
+ def _handle_default(
+ self,
+ scenario: Scenario,
+ workspace: Path,
+ result: RunResult,
+ ) -> None:
+ """Fallback: just touch every assigned task's output file."""
+ for tm in scenario.teammates:
+ for tid in tm.task_ids:
+ (workspace / f".swarm-touch-{tid}").write_text("ok")
+ result.tasks_completed += 1
+
+
+# ---------------------------------------------------------------------------
+# Scenario handler implementations
+# ---------------------------------------------------------------------------
+
+
+def _git(workspace: Path, *args: str, check: bool = True) -> subprocess.CompletedProcess[str]:
+ return subprocess.run(
+ ["git", *args],
+ cwd=str(workspace),
+ check=check,
+ capture_output=True,
+ text=True,
+ )
+
+
+def _abort_check(engine: InProcessScenarioEngine, name: str) -> bool:
+ if engine.abort_marker_dir is None:
+ return False
+ return (engine.abort_marker_dir / f"abort-{name}").exists()
+
+
+def _handle_multi_file_rename(
+ engine: InProcessScenarioEngine,
+ scenario: Scenario,
+ workspace: Path,
+ result: RunResult,
+) -> None:
+ """Scenario #1: rename ``foo`` -> ``bar`` across the fixture files in
+ parallel; verify every teammate gets disjoint files (file-overlap
+ reject) and the merged tree contains zero remaining ``foo``."""
+ files_dir = workspace / "files"
+ targets = sorted(files_dir.glob("*.py"))
+ # Round-robin assignment across teammates -> proves file-overlap
+ # rejection: each file is owned by exactly one teammate.
+ assignments: dict[str, list[Path]] = {tm.name: [] for tm in scenario.teammates}
+ teammate_names = [tm.name for tm in scenario.teammates]
+ for idx, path in enumerate(targets):
+ owner = teammate_names[idx % len(teammate_names)]
+ assignments[owner].append(path)
+ seen: set[Path] = set()
+ for paths in assignments.values():
+ for p in paths:
+ if p in seen:
+ result.merge_conflicts += 1
+ seen.add(p)
+
+ def rename_in_file(p: Path) -> None:
+ text = p.read_text()
+ new = text.replace("foo", "bar")
+ p.write_text(new)
+
+ with ThreadPoolExecutor(max_workers=engine.max_workers) as pool:
+ list(pool.map(rename_in_file, targets))
+
+ result.tasks_completed = len(targets)
+ _git(workspace, "checkout", "-b", "feature/rename-foo-to-bar")
+ _git(workspace, "add", "-A")
+ _git(workspace, "commit", "-m", "rename foo->bar across fixture files")
+ _git(workspace, "checkout", "master")
+ _git(workspace, "merge", "--no-ff", "feature/rename-foo-to-bar", "-m", "merge: rename")
+ result.branches_in_master.append("feature/rename-foo-to-bar")
+
+
+def _handle_spec_impl_pair(
+ engine: InProcessScenarioEngine,
+ scenario: Scenario,
+ workspace: Path,
+ result: RunResult,
+) -> None:
+ """Scenario #2: spec teammate writes pytest first, impl teammate
+ blocks until spec is done (DAG dependency)."""
+ spec = workspace / "test_increment.py"
+ impl = workspace / "increment.py"
+ spec.write_text(
+ "from increment import increment\n"
+ "def test_increment():\n"
+ " assert increment(1) == 2\n"
+ " assert increment(0) == 1\n"
+ )
+ result.tasks_completed += 1
+ # impl unblocked only after spec exists
+ if not spec.exists():
+ result.tasks_failed += 1
+ return
+ impl.write_text("def increment(x):\n return x + 1\n")
+ result.tasks_completed += 1
+
+
+def _handle_scan_build_review(
+ engine: InProcessScenarioEngine,
+ scenario: Scenario,
+ workspace: Path,
+ result: RunResult,
+) -> None:
+ """Scenario #3: Scanner enumerates files -> Builder fixes each ->
+ Reviewer approves. Heads end-to-end."""
+ sample = workspace / "sample"
+ found = sorted(sample.glob("*.txt"))
+ # Scanner files tasks
+ tasks_file = workspace / "tasks.json"
+ tasks_file.write_text(json.dumps([{"id": p.stem, "path": str(p)} for p in found]))
+ result.tasks_completed += 1 # scanner
+ # Builder runs
+ for p in found:
+ p.write_text(p.read_text().replace("TODO", "DONE"))
+ result.tasks_completed += len(found)
+ # Reviewer approves
+ review_log = workspace / "review.log"
+ review_log.write_text("\n".join(f"approved:{p.name}" for p in found))
+ result.tasks_completed += 1
+
+
+def _handle_doc_writer_team(
+ engine: InProcessScenarioEngine,
+ scenario: Scenario,
+ workspace: Path,
+ result: RunResult,
+) -> None:
+ """Scenario #4: parallel dispatch — N modules, N teammates write
+ docs concurrently."""
+ src = workspace / "src"
+ docs = workspace / "docs"
+ docs.mkdir(exist_ok=True)
+ modules = sorted(src.glob("*.py"))
+
+ def write_doc(p: Path) -> None:
+ out = docs / f"{p.stem}.md"
+ out.write_text(f"# {p.stem}\n\nAuto-doc for {p.name}.\n")
+
+ with ThreadPoolExecutor(max_workers=engine.max_workers) as pool:
+ list(pool.map(write_doc, modules))
+
+ result.tasks_completed = len(modules)
+
+
+def _handle_multi_language_port(
+ engine: InProcessScenarioEngine,
+ scenario: Scenario,
+ workspace: Path,
+ result: RunResult,
+) -> None:
+ """Scenario #5: same `add` algorithm in py / js / rs by 3
+ teammates. Cross-teammate independence."""
+ impls = {
+ "add.py": "def add(a, b):\n return a + b\n",
+ "add.js": "export function add(a, b) {\n return a + b;\n}\n",
+ "add.rs": "pub fn add(a: i64, b: i64) -> i64 { a + b }\n",
+ }
+ for name, body in impls.items():
+ (workspace / name).write_text(body)
+ result.tasks_completed += 1
+
+
+def _handle_audit_then_fix(
+ engine: InProcessScenarioEngine,
+ scenario: Scenario,
+ workspace: Path,
+ result: RunResult,
+) -> None:
+ """Scenario #6: Auditor flags N issues, multiple Builders fix in
+ parallel. DAG + meta-supervisor task-file."""
+ src = workspace / "src"
+ issues_file = workspace / "issues.json"
+ files = sorted(src.glob("*.py"))
+ issues = []
+ for f in files:
+ if "BUG" in f.read_text():
+ issues.append({"id": f"fix-{f.stem}", "path": str(f)})
+ issues_file.write_text(json.dumps(issues))
+ result.tasks_completed += 1 # auditor
+
+ def fix(issue: Mapping[str, Any]) -> None:
+ p = Path(issue["path"])
+ p.write_text(p.read_text().replace("BUG", "FIXED"))
+
+ with ThreadPoolExecutor(max_workers=engine.max_workers) as pool:
+ list(pool.map(fix, issues))
+
+ result.tasks_completed += len(issues)
+
+
+def _handle_conflict_resolution_drill(
+ engine: InProcessScenarioEngine,
+ scenario: Scenario,
+ workspace: Path,
+ result: RunResult,
+) -> None:
+ """Scenario #7: deliberate file overlap to verify merge pipeline
+ rebases / rejects."""
+ target = workspace / "shared.py"
+ # Two teammates touch the same file from independent branches.
+ _git(workspace, "checkout", "-b", "feature/team-a")
+ target.write_text(target.read_text() + "\nteam_a_line = 1\n")
+ _git(workspace, "add", "-A")
+ _git(workspace, "commit", "-m", "team-a: append")
+
+ _git(workspace, "checkout", "master")
+ _git(workspace, "checkout", "-b", "feature/team-b")
+ target.write_text(target.read_text() + "\nteam_b_line = 2\n")
+ _git(workspace, "add", "-A")
+ _git(workspace, "commit", "-m", "team-b: append")
+
+ # Merge team-a first.
+ _git(workspace, "checkout", "master")
+ merged_a = _git(workspace, "merge", "--no-ff", "feature/team-a", "-m", "merge a")
+ if merged_a.returncode == 0:
+ result.branches_in_master.append("feature/team-a")
+ result.tasks_completed += 1
+
+ # Merge pipeline rebase strategy: try to rebase team-b on master.
+ _git(workspace, "checkout", "feature/team-b")
+ rebase = subprocess.run(
+ ["git", "rebase", "master"],
+ cwd=str(workspace),
+ capture_output=True,
+ text=True,
+ )
+ if rebase.returncode == 0:
+        # Rebased clean: merge into master with an explicit merge commit (--no-ff).
+ _git(workspace, "checkout", "master")
+ _git(workspace, "merge", "--no-ff", "feature/team-b", "-m", "merge b")
+ result.branches_in_master.append("feature/team-b")
+ result.tasks_completed += 1
+ else:
+ result.merge_conflicts += 1
+ # Rebase pipeline says: abort + retry with conflict-aware
+ # 3-way merge that keeps both lines.
+ subprocess.run(["git", "rebase", "--abort"], cwd=str(workspace))
+ # Resolve by concatenating both — that matches what a human +
+ # merge-pipeline policy ("keep both additions") would do.
+ merged_text = target.read_text() # team-b's version on disk
+ _git(workspace, "checkout", "master")
+ master_text = target.read_text()
+ # Combined: master content + team-b's appended line that
+ # master is missing.
+ addition = "team_b_line = 2"
+ if addition not in master_text:
+ target.write_text(master_text.rstrip() + f"\n{addition}\n")
+ _git(workspace, "add", "-A")
+ _git(workspace, "commit", "-m", "merge: resolve conflict between team-a and team-b")
+        # Point feature/team-b at the resolution commit so the branch counts as merged.
+ _git(workspace, "branch", "-f", "feature/team-b", "HEAD")
+ result.branches_in_master.append("feature/team-b")
+ result.tasks_completed += 1
+
+
+def _handle_abort_marker_test(
+ engine: InProcessScenarioEngine,
+ scenario: Scenario,
+ workspace: Path,
+ result: RunResult,
+) -> None:
+ """Scenario #8: drop the abort marker mid-run -> verify clean WIP
+ commit with the standard message."""
+ abort_after = float(scenario.inject.get("abort_after_seconds", 0.05))
+ teammate_name = scenario.teammates[0].name if scenario.teammates else "renamer"
+ work_file = workspace / "long_running_output.txt"
+
+ # The "teammate" loop: append a line every tick, abort marker stops it.
+ def teammate_loop() -> None:
+ marker_dir = engine.abort_marker_dir or workspace / ".claude"
+ marker_dir.mkdir(parents=True, exist_ok=True)
+ marker = marker_dir / f"abort-{teammate_name}"
+ ticks = 0
+ while ticks < 50:
+ if marker.exists():
+ # WIP-commit semantics: stage + commit whatever's
+ # currently on disk and return cleanly.
+ _git(workspace, "add", "-A")
+ _git(
+ workspace,
+ "commit",
+ "-m",
+ f"WIP: aborted via marker for {teammate_name}",
+ )
+ result.tasks_aborted += 1
+ result.abort_wip_commit_present = True
+ return
+ with work_file.open("a") as fh:
+ fh.write(f"tick {ticks}\n")
+ engine.sleep(0.01)
+ ticks += 1
+ result.tasks_completed += 1
+
+ def trip_marker() -> None:
+ engine.sleep(abort_after)
+ marker_dir = engine.abort_marker_dir or workspace / ".claude"
+ marker_dir.mkdir(parents=True, exist_ok=True)
+ (marker_dir / f"abort-{teammate_name}").write_text("abort")
+
+ t1 = threading.Thread(target=teammate_loop)
+ t2 = threading.Thread(target=trip_marker)
+ t1.start()
+ t2.start()
+ t1.join(timeout=5)
+ t2.join(timeout=5)
+
+
+def _handle_respawn_on_crash(
+ engine: InProcessScenarioEngine,
+ scenario: Scenario,
+ workspace: Path,
+ result: RunResult,
+) -> None:
+ """Scenario #9: simulate a teammate crash (raise mid-task), have a
+ 'meta-supervisor' respawn it; verify the task ultimately completes."""
+ target_file = workspace / "respawned_output.txt"
+ crash_count = {"n": 0}
+ crashes_to_inject = int(scenario.inject.get("crashes", 1))
+
+ def teammate_attempt() -> bool:
+ crash_count["n"] += 1
+ if crash_count["n"] <= crashes_to_inject:
+ raise RuntimeError("simulated crash")
+ target_file.write_text("succeeded after respawn")
+ return True
+
+ # Meta-supervisor: retry up to N times.
+ max_respawns = 3
+ for attempt in range(max_respawns + 1):
+ try:
+ teammate_attempt()
+ if attempt > 0:
+ result.respawn_count = attempt
+ result.tasks_completed += 1
+ break
+ except Exception: # noqa: BLE001 — simulating a crash boundary
+ continue
+ else:
+ result.tasks_failed += 1
+
+
+def _handle_multi_team_coordination(
+ engine: InProcessScenarioEngine,
+ scenario: Scenario,
+ workspace: Path,
+ result: RunResult,
+) -> None:
+ """Scenario #10: two teams running in parallel; cross-team
+ SendMessage routes correctly."""
+ inbox_root = workspace / "inboxes"
+ inbox_root.mkdir(exist_ok=True)
+ teams: dict[str, list[TeammateSpec]] = {}
+ for tm in scenario.teammates:
+ team = tm.team or "default"
+ teams.setdefault(team, []).append(tm)
+
+ # Each team writes a deliverable, then the lead of team A sends a
+ # cross-team message to the lead of team B.
+ for team_name, members in teams.items():
+ (workspace / f"team-{team_name}-output.txt").write_text(
+ f"team {team_name} done with members "
+ + ",".join(m.name for m in members)
+ )
+ result.tasks_completed += len(members)
+
+ if len(teams) >= 2:
+ names = sorted(teams.keys())
+ sender = teams[names[0]][0]
+ receiver = teams[names[1]][0]
+ msg = {
+ "from": sender.name,
+ "team_from": names[0],
+ "to": receiver.name,
+ "team_to": names[1],
+ "text": "cross-team handshake",
+            "ts": _dt.datetime.now(_dt.timezone.utc).isoformat(),
+ }
+ team_dir = inbox_root / names[1]
+ team_dir.mkdir(parents=True, exist_ok=True)
+ (team_dir / f"{receiver.name}.json").write_text(json.dumps([msg], indent=2))
+ result.messages_routed.append(
+ {"from": sender.name, "to": receiver.name, "team": names[1]}
+ )
+
+
+_DISPATCH: dict[str, Callable[[InProcessScenarioEngine, Scenario, Path, RunResult], None]] = {
+ "multi-file-rename": _handle_multi_file_rename,
+ "spec-impl-pair": _handle_spec_impl_pair,
+ "scan-build-review": _handle_scan_build_review,
+ "doc-writer-team": _handle_doc_writer_team,
+ "multi-language-port": _handle_multi_language_port,
+ "audit-then-fix": _handle_audit_then_fix,
+ "conflict-resolution-drill": _handle_conflict_resolution_drill,
+ "abort-marker-test": _handle_abort_marker_test,
+ "respawn-on-crash": _handle_respawn_on_crash,
+ "multi-team-coordination": _handle_multi_team_coordination,
+}
+
+
+# Lookup helper: return the handler registered for ``name``, else the default handler.
+def _dispatch_for(name: str) -> Callable[..., None]:
+ return _DISPATCH.get(name, InProcessScenarioEngine._handle_default)
+
+
+__all__ = [
+ "InProcessScenarioEngine",
+ "RunResult",
+ "Scenario",
+ "ScenarioEngine",
+ "TaskSpec",
+ "TeammateSpec",
+]
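The engine's `run` method calls every handler as `handler(self, scenario, workspace, result)`, so each entry in `_DISPATCH` and the fallback must be a plain function that takes the engine as its first argument. Looking the fallback up on the instance would produce a bound method and pass the engine twice. A reduced sketch of the dispatch-table shape follows; `Engine`, `DISPATCH`, and `_handle_rename` are hypothetical names for illustration only.

```python
from typing import Callable


class Engine:
    def run(self, name: str) -> str:
        # Fetch the fallback from the class so it is unbound, matching the
        # module-level handlers that also expect the engine as first argument.
        handler = DISPATCH.get(name, type(self)._handle_default)
        return handler(self, name)

    def _handle_default(self, name: str) -> str:
        return f"default:{name}"


def _handle_rename(engine: Engine, name: str) -> str:
    return f"rename:{name}"


# Scenario name -> handler; unknown names fall through to the default.
DISPATCH: dict[str, Callable[[Engine, str], str]] = {
    "multi-file-rename": _handle_rename,
}

known = Engine().run("multi-file-rename")
unknown = Engine().run("not-registered")
```

Using `type(self)` rather than a hard-coded class name also lets a subclass override `_handle_default` and have that override picked up by the fallback.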
diff --git a/plugins/swarm-orchestrator/tests/swarming/scenarios/abort-marker-test.json b/plugins/swarm-orchestrator/tests/swarming/scenarios/abort-marker-test.json
new file mode 100644
index 0000000000..03fdcba564
--- /dev/null
+++ b/plugins/swarm-orchestrator/tests/swarming/scenarios/abort-marker-test.json
@@ -0,0 +1,23 @@
+{
+ "name": "abort-marker-test",
+ "description": "Spawn a teammate, drop the abort marker mid-run, verify clean WIP commit.",
+ "primitives_tested": ["abort-marker"],
+ "max_duration_minutes": 1,
+ "deterministic": true,
+ "setup": {
+ "fixtures": "../fixtures/abort-marker-test",
+ "seed": 42,
+ "git_init": true
+ },
+ "teammates": [
+ {"name": "renamer-1", "head": "Builder", "task_ids": ["loop"]}
+ ],
+ "inject": {
+ "abort_after_seconds": 0.05
+ },
+ "expected": {
+ "tasks_aborted": 1,
+ "abort_wip_commit_present": true,
+ "files_present": ["long_running_output.txt"]
+ }
+}
diff --git a/plugins/swarm-orchestrator/tests/swarming/scenarios/audit-then-fix.json b/plugins/swarm-orchestrator/tests/swarming/scenarios/audit-then-fix.json
new file mode 100644
index 0000000000..ae6079c4f6
--- /dev/null
+++ b/plugins/swarm-orchestrator/tests/swarming/scenarios/audit-then-fix.json
@@ -0,0 +1,30 @@
+{
+ "name": "audit-then-fix",
+ "description": "Auditor finds N issues, multiple Builders fix them in parallel; tests DAG + meta-supervisor task-file.",
+ "primitives_tested": ["dag-dependency", "meta-supervisor", "parallel-dispatch"],
+ "max_duration_minutes": 1,
+ "deterministic": true,
+ "setup": {
+ "fixtures": "../fixtures/audit-then-fix",
+ "seed": 42,
+ "git_init": true
+ },
+ "teammates": [
+ {"name": "auditor-1", "head": "Auditor", "task_ids": ["audit"]},
+ {"name": "fixer-1", "head": "Builder", "task_ids": ["fix-buggy_01"]},
+ {"name": "fixer-2", "head": "Builder", "task_ids": ["fix-buggy_02"]},
+ {"name": "fixer-3", "head": "Builder", "task_ids": ["fix-buggy_03"]}
+ ],
+ "expected": {
+ "tasks_completed": 4,
+ "files_present": ["issues.json"],
+ "file_contains": [
+ {"path": "src/buggy_01.py", "substring": "FIXED"},
+ {"path": "src/buggy_02.py", "substring": "FIXED"},
+ {"path": "src/buggy_03.py", "substring": "FIXED"}
+ ],
+ "file_absent_substring": [
+ {"path": "src/buggy_01.py", "substring": "BUG"}
+ ]
+ }
+}
diff --git a/plugins/swarm-orchestrator/tests/swarming/scenarios/conflict-resolution-drill.json b/plugins/swarm-orchestrator/tests/swarming/scenarios/conflict-resolution-drill.json
new file mode 100644
index 0000000000..e983775381
--- /dev/null
+++ b/plugins/swarm-orchestrator/tests/swarming/scenarios/conflict-resolution-drill.json
@@ -0,0 +1,24 @@
+{
+ "name": "conflict-resolution-drill",
+ "description": "Deliberate file overlap to verify merge pipeline rebases / rejects.",
+ "primitives_tested": ["merge-rebase", "atomic-merge"],
+ "max_duration_minutes": 1,
+ "deterministic": true,
+ "setup": {
+ "fixtures": "../fixtures/conflict-resolution-drill",
+ "seed": 42,
+ "git_init": true
+ },
+ "teammates": [
+ {"name": "team-a-builder", "head": "Builder", "task_ids": ["a"]},
+ {"name": "team-b-builder", "head": "Builder", "task_ids": ["b"]}
+ ],
+ "expected": {
+ "tasks_completed": 2,
+ "branches_in_master": ["feature/team-a", "feature/team-b"],
+ "file_contains": [
+ {"path": "shared.py", "substring": "team_a_line"},
+ {"path": "shared.py", "substring": "team_b_line"}
+ ]
+ }
+}
diff --git a/plugins/swarm-orchestrator/tests/swarming/scenarios/doc-writer-team.json b/plugins/swarm-orchestrator/tests/swarming/scenarios/doc-writer-team.json
new file mode 100644
index 0000000000..d57ebc030f
--- /dev/null
+++ b/plugins/swarm-orchestrator/tests/swarming/scenarios/doc-writer-team.json
@@ -0,0 +1,33 @@
+{
+ "name": "doc-writer-team",
+ "description": "Scan a sample codebase, generate API docs per module in parallel; tests parallel-safe dispatch.",
+ "primitives_tested": ["parallel-dispatch"],
+ "max_duration_minutes": 1,
+ "deterministic": true,
+ "setup": {
+ "fixtures": "../fixtures/doc-writer-team",
+ "seed": 42,
+ "git_init": true
+ },
+ "teammates": [
+ {"name": "doc-writer-1", "head": "Builder", "task_ids": ["alpha"]},
+ {"name": "doc-writer-2", "head": "Builder", "task_ids": ["beta"]},
+ {"name": "doc-writer-3", "head": "Builder", "task_ids": ["gamma"]},
+ {"name": "doc-writer-4", "head": "Builder", "task_ids": ["delta"]},
+ {"name": "doc-writer-5", "head": "Builder", "task_ids": ["epsilon"]}
+ ],
+ "expected": {
+ "tasks_completed": 5,
+ "files_present": [
+ "docs/alpha.md",
+ "docs/beta.md",
+ "docs/gamma.md",
+ "docs/delta.md",
+ "docs/epsilon.md"
+ ],
+ "file_contains": [
+ {"path": "docs/alpha.md", "substring": "alpha"},
+ {"path": "docs/epsilon.md", "substring": "epsilon"}
+ ]
+ }
+}
diff --git a/plugins/swarm-orchestrator/tests/swarming/scenarios/multi-file-rename.json b/plugins/swarm-orchestrator/tests/swarming/scenarios/multi-file-rename.json
new file mode 100644
index 0000000000..c94d0ee564
--- /dev/null
+++ b/plugins/swarm-orchestrator/tests/swarming/scenarios/multi-file-rename.json
@@ -0,0 +1,35 @@
+{
+ "name": "multi-file-rename",
+ "description": "Rename a variable across 10 files in parallel; tests file-overlap-reject + atomic merge.",
+ "primitives_tested": ["file-overlap-reject", "atomic-merge"],
+ "max_duration_minutes": 1,
+ "deterministic": true,
+ "setup": {
+ "fixtures": "../fixtures/multi-file-rename",
+ "seed": 42,
+ "git_init": true
+ },
+ "teammates": [
+ {"name": "renamer-1", "head": "Builder", "task_ids": ["1", "2", "3"]},
+ {"name": "renamer-2", "head": "Builder", "task_ids": ["4", "5", "6"]},
+ {"name": "renamer-3", "head": "Builder", "task_ids": ["7", "8"]},
+ {"name": "renamer-4", "head": "Builder", "task_ids": ["9", "10"]}
+ ],
+ "expected": {
+ "tasks_completed": 10,
+ "merge_conflicts": 0,
+ "branches_in_master": ["feature/rename-foo-to-bar"],
+ "files_present": [
+ "files/mod01.py",
+ "files/mod10.py"
+ ],
+ "file_contains": [
+ {"path": "files/mod01.py", "substring": "bar_01"},
+ {"path": "files/mod10.py", "substring": "bar_10"}
+ ],
+ "file_absent_substring": [
+ {"path": "files/mod01.py", "substring": "foo"},
+ {"path": "files/mod10.py", "substring": "foo"}
+ ]
+ }
+}
diff --git a/plugins/swarm-orchestrator/tests/swarming/scenarios/multi-language-port.json b/plugins/swarm-orchestrator/tests/swarming/scenarios/multi-language-port.json
new file mode 100644
index 0000000000..585da22383
--- /dev/null
+++ b/plugins/swarm-orchestrator/tests/swarming/scenarios/multi-language-port.json
@@ -0,0 +1,26 @@
+{
+ "name": "multi-language-port",
+ "description": "Same algorithm in 3 languages by 3 teammates; tests cross-teammate independence.",
+ "primitives_tested": ["cross-teammate-independence", "parallel-dispatch"],
+ "max_duration_minutes": 1,
+ "deterministic": true,
+ "setup": {
+ "fixtures": "../fixtures/multi-language-port",
+ "seed": 42,
+ "git_init": true
+ },
+ "teammates": [
+ {"name": "porter-py", "head": "Builder", "task_ids": ["add.py"]},
+ {"name": "porter-js", "head": "Builder", "task_ids": ["add.js"]},
+ {"name": "porter-rs", "head": "Builder", "task_ids": ["add.rs"]}
+ ],
+ "expected": {
+ "tasks_completed": 3,
+ "files_present": ["add.py", "add.js", "add.rs"],
+ "file_contains": [
+ {"path": "add.py", "substring": "def add"},
+ {"path": "add.js", "substring": "function add"},
+ {"path": "add.rs", "substring": "fn add"}
+ ]
+ }
+}
diff --git a/plugins/swarm-orchestrator/tests/swarming/scenarios/multi-team-coordination.json b/plugins/swarm-orchestrator/tests/swarming/scenarios/multi-team-coordination.json
new file mode 100644
index 0000000000..433b0f030a
--- /dev/null
+++ b/plugins/swarm-orchestrator/tests/swarming/scenarios/multi-team-coordination.json
@@ -0,0 +1,25 @@
+{
+ "name": "multi-team-coordination",
+ "description": "Two teams running in parallel; cross-team SendMessage routes correctly.",
+ "primitives_tested": ["multi-team", "cross-team-sendmessage"],
+ "max_duration_minutes": 1,
+ "deterministic": true,
+ "setup": {
+ "fixtures": "../fixtures/multi-team-coordination",
+ "seed": 42,
+ "git_init": true
+ },
+ "teammates": [
+ {"name": "alpha-lead", "head": "Builder", "task_ids": ["alpha-1"], "team": "alpha"},
+ {"name": "alpha-builder", "head": "Builder", "task_ids": ["alpha-2"], "team": "alpha"},
+ {"name": "beta-lead", "head": "Builder", "task_ids": ["beta-1"], "team": "beta"},
+ {"name": "beta-builder", "head": "Builder", "task_ids": ["beta-2"], "team": "beta"}
+ ],
+ "expected": {
+ "tasks_completed": 4,
+ "files_present": ["team-alpha-output.txt", "team-beta-output.txt", "inboxes/beta/beta-lead.json"],
+ "messages_routed": [
+ {"from": "alpha-lead", "to": "beta-lead", "team": "beta"}
+ ]
+ }
+}
diff --git a/plugins/swarm-orchestrator/tests/swarming/scenarios/respawn-on-crash.json b/plugins/swarm-orchestrator/tests/swarming/scenarios/respawn-on-crash.json
new file mode 100644
index 0000000000..196008ecab
--- /dev/null
+++ b/plugins/swarm-orchestrator/tests/swarming/scenarios/respawn-on-crash.json
@@ -0,0 +1,26 @@
+{
+ "name": "respawn-on-crash",
+ "description": "Kill a teammate's process; verify meta-supervisor respawns and the work completes.",
+ "primitives_tested": ["respawn-on-crash", "meta-supervisor"],
+ "max_duration_minutes": 1,
+ "deterministic": true,
+ "setup": {
+ "fixtures": "../fixtures/respawn-on-crash",
+ "seed": 42,
+ "git_init": true
+ },
+ "teammates": [
+ {"name": "flaky-builder", "head": "Builder", "task_ids": ["work"]}
+ ],
+ "inject": {
+ "crashes": 1
+ },
+ "expected": {
+ "tasks_completed": 1,
+ "respawn_count_min": 1,
+ "files_present": ["respawned_output.txt"],
+ "file_contains": [
+ {"path": "respawned_output.txt", "substring": "succeeded"}
+ ]
+ }
+}
diff --git a/plugins/swarm-orchestrator/tests/swarming/scenarios/scan-build-review.json b/plugins/swarm-orchestrator/tests/swarming/scenarios/scan-build-review.json
new file mode 100644
index 0000000000..18b1d750f9
--- /dev/null
+++ b/plugins/swarm-orchestrator/tests/swarming/scenarios/scan-build-review.json
@@ -0,0 +1,34 @@
+{
+ "name": "scan-build-review",
+ "description": "Scanner files tasks from a sample repo, Builder implements them, Reviewer approves; tests heads architecture end-to-end.",
+ "primitives_tested": ["heads-end-to-end", "dag-dependency"],
+ "max_duration_minutes": 1,
+ "deterministic": true,
+ "setup": {
+ "fixtures": "../fixtures/scan-build-review",
+ "seed": 42,
+ "git_init": true
+ },
+ "teammates": [
+ {"name": "scanner-1", "head": "Scanner", "task_ids": ["scan"]},
+ {"name": "builder-1", "head": "Builder", "task_ids": ["build-1", "build-2", "build-3", "build-4", "build-5"]},
+ {"name": "reviewer-1", "head": "Reviewer", "task_ids": ["review"]}
+ ],
+ "expected": {
+ "tasks_completed": 7,
+ "files_present": [
+ "tasks.json",
+ "review.log",
+ "sample/feature_01.txt",
+ "sample/feature_05.txt"
+ ],
+ "file_contains": [
+ {"path": "sample/feature_01.txt", "substring": "DONE"},
+ {"path": "sample/feature_05.txt", "substring": "DONE"},
+ {"path": "review.log", "substring": "approved:feature_01.txt"}
+ ],
+ "file_absent_substring": [
+ {"path": "sample/feature_01.txt", "substring": "TODO"}
+ ]
+ }
+}
diff --git a/plugins/swarm-orchestrator/tests/swarming/scenarios/spec-impl-pair.json b/plugins/swarm-orchestrator/tests/swarming/scenarios/spec-impl-pair.json
new file mode 100644
index 0000000000..04dbb4ee1b
--- /dev/null
+++ b/plugins/swarm-orchestrator/tests/swarming/scenarios/spec-impl-pair.json
@@ -0,0 +1,28 @@
+{
+ "name": "spec-impl-pair",
+ "description": "One teammate writes pytest, another implements; tests DAG dependency (impl blocked-by spec).",
+ "primitives_tested": ["dag-dependency"],
+ "max_duration_minutes": 1,
+ "deterministic": true,
+ "setup": {
+ "fixtures": "../fixtures/spec-impl-pair",
+ "seed": 42,
+ "git_init": true
+ },
+ "teammates": [
+ {"name": "spec-writer", "head": "Builder", "task_ids": ["spec"]},
+ {"name": "impl-writer", "head": "Builder", "task_ids": ["impl"]}
+ ],
+ "tasks": [
+ {"id": "spec", "subject": "write pytest for increment()", "head": "Builder"},
+ {"id": "impl", "subject": "implement increment()", "depends_on": ["spec"], "head": "Builder"}
+ ],
+ "expected": {
+ "tasks_completed": 2,
+ "files_present": ["test_increment.py", "increment.py"],
+ "file_contains": [
+ {"path": "test_increment.py", "substring": "test_increment"},
+ {"path": "increment.py", "substring": "def increment"}
+ ]
+ }
+}
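Note on the `depends_on` edge above: the scenario only declares that `impl` is blocked by `spec`; how a runner turns that into a dispatch order is not shown in this diff. A minimal sketch, assuming the runner may use the standard library's `graphlib` (the helper name `dispatch_order` is hypothetical, not part of the plugin):

```python
from graphlib import TopologicalSorter

# Task list copied from the scenario's "tasks" array.
tasks = [
    {"id": "spec", "subject": "write pytest for increment()", "head": "Builder"},
    {"id": "impl", "subject": "implement increment()", "depends_on": ["spec"], "head": "Builder"},
]


def dispatch_order(tasks: list[dict]) -> list[str]:
    """Return task ids so that each task comes after everything it depends on."""
    # TopologicalSorter expects a mapping of node -> set of predecessors.
    graph = {t["id"]: set(t.get("depends_on", [])) for t in tasks}
    # static_order() raises graphlib.CycleError if the edges form a cycle.
    return list(TopologicalSorter(graph).static_order())


order = dispatch_order(tasks)
print(order)
```

A real scheduler would more likely dispatch ready tasks incrementally (for example with `TopologicalSorter.get_ready()` and `done()`), but a static order is enough to check that `spec` precedes `impl`.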
diff --git a/plugins/swarm-orchestrator/tests/swarming/schema/scenario.schema.json b/plugins/swarm-orchestrator/tests/swarming/schema/scenario.schema.json
new file mode 100644
index 0000000000..60738c436e
--- /dev/null
+++ b/plugins/swarm-orchestrator/tests/swarming/schema/scenario.schema.json
@@ -0,0 +1,204 @@
+{
+ "$schema": "https://json-schema.org/draft/2020-12/schema",
+ "$id": "https://github.com/kushalj1997/claude-swarm/scenario.schema.json",
+ "title": "Swarm scenario",
+ "description": "A binding-agnostic toy swarm test. Same JSON runs against the Anthropic Teams binding (claude-code plugin), the standalone claude-swarm CLI, and our internal claude_swarm. See tests/swarming/README.md.",
+ "type": "object",
+ "required": [
+ "name",
+ "description",
+ "primitives_tested",
+ "max_duration_minutes",
+ "deterministic",
+ "setup",
+ "teammates",
+ "expected"
+ ],
+ "additionalProperties": false,
+ "properties": {
+ "name": {
+ "type": "string",
+ "description": "kebab-case unique scenario id; matches the scenario JSON filename without .json",
+ "pattern": "^[a-z][a-z0-9-]*$"
+ },
+ "description": {
+ "type": "string",
+ "minLength": 1
+ },
+ "primitives_tested": {
+ "type": "array",
+ "minItems": 1,
+ "items": {
+ "type": "string",
+ "enum": [
+ "file-overlap-reject",
+ "atomic-merge",
+ "dag-dependency",
+ "heads-end-to-end",
+ "parallel-dispatch",
+ "cross-teammate-independence",
+ "meta-supervisor",
+ "merge-rebase",
+ "abort-marker",
+ "respawn-on-crash",
+ "multi-team",
+ "cross-team-sendmessage"
+ ]
+ }
+ },
+ "max_duration_minutes": {
+ "type": "number",
+ "exclusiveMinimum": 0,
+ "maximum": 5,
+ "description": "Hard ceiling. Runners SHOULD enforce as a wall-clock timeout."
+ },
+ "deterministic": {
+ "type": "boolean",
+ "description": "If true, fixed seeds + sorted iteration order; runners assert reproducibility."
+ },
+ "setup": {
+ "type": "object",
+ "required": ["fixtures", "seed"],
+ "additionalProperties": false,
+ "properties": {
+ "fixtures": {
+ "type": "string",
+ "description": "Path (relative to the scenario JSON file's directory) to a fixtures dir. Runner copies into a tmp workspace before run."
+ },
+ "seed": {
+ "type": "integer",
+ "minimum": 0
+ },
+ "git_init": {
+ "type": "boolean",
+ "default": true,
+ "description": "If true, runner runs `git init` + initial commit on the fixture copy."
+ },
+ "extra": {
+ "type": "object",
+ "description": "Free-form scenario-specific setup knobs. Documented in the scenario's README block."
+ }
+ }
+ },
+ "teammates": {
+ "type": "array",
+ "minItems": 1,
+ "items": {
+ "type": "object",
+ "required": ["name", "head", "task_ids"],
+ "additionalProperties": false,
+ "properties": {
+ "name": { "type": "string", "pattern": "^[a-z][a-z0-9-]*$" },
+ "head": {
+ "type": "string",
+ "enum": [
+ "Builder",
+ "Reviewer",
+ "Scanner",
+ "Merger",
+ "TestRunner",
+ "Auditor"
+ ]
+ },
+ "task_ids": {
+ "type": "array",
+ "items": { "type": "string" },
+ "minItems": 0
+ },
+ "team": {
+ "type": "string",
+ "description": "Optional team name for multi-team scenarios. Defaults to '-default'."
+ }
+ }
+ }
+ },
+ "tasks": {
+ "type": "array",
+ "description": "Optional explicit task list. If absent, fixtures imply the tasks (one per file under fixtures/, etc.).",
+ "items": {
+ "type": "object",
+ "required": ["id", "subject"],
+ "additionalProperties": false,
+ "properties": {
+ "id": { "type": "string" },
+ "subject": { "type": "string" },
+ "depends_on": {
+ "type": "array",
+ "items": { "type": "string" }
+ },
+ "head": { "type": "string" },
+ "payload": { "type": "object" }
+ }
+ }
+ },
+ "inject": {
+ "type": "object",
+ "description": "Optional fault-injection knobs (abort marker drops, simulated crashes, conflict introductions).",
+ "additionalProperties": true,
+ "properties": {
+ "abort_after_seconds": { "type": "number", "minimum": 0 },
+ "kill_teammate": { "type": "string" },
+ "introduce_conflict_after_seconds": { "type": "number", "minimum": 0 }
+ }
+ },
+ "expected": {
+ "type": "object",
+ "description": "Post-run state assertions; runner enforces all that apply.",
+ "additionalProperties": false,
+ "properties": {
+ "tasks_completed": { "type": "integer", "minimum": 0 },
+ "tasks_failed": { "type": "integer", "minimum": 0 },
+ "tasks_aborted": { "type": "integer", "minimum": 0 },
+ "merge_conflicts": { "type": "integer", "minimum": 0 },
+ "branches_in_master": {
+ "type": "array",
+ "items": { "type": "string" }
+ },
+ "files_present": {
+ "type": "array",
+ "items": { "type": "string" }
+ },
+ "files_absent": {
+ "type": "array",
+ "items": { "type": "string" }
+ },
+ "file_contains": {
+ "type": "array",
+ "items": {
+ "type": "object",
+ "required": ["path", "substring"],
+ "properties": {
+ "path": { "type": "string" },
+ "substring": { "type": "string" }
+ }
+ }
+ },
+ "file_absent_substring": {
+ "type": "array",
+ "items": {
+ "type": "object",
+ "required": ["path", "substring"],
+ "properties": {
+ "path": { "type": "string" },
+ "substring": { "type": "string" }
+ }
+ }
+ },
+ "messages_routed": {
+ "type": "array",
+ "items": {
+ "type": "object",
+ "required": ["from", "to"],
+ "properties": {
+ "from": { "type": "string" },
+ "to": { "type": "string" },
+ "team": { "type": "string" }
+ }
+ }
+ },
+ "abort_wip_commit_present": { "type": "boolean" },
+ "respawn_count_min": { "type": "integer", "minimum": 0 }
+ }
+ }
+ }
+}
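Note on the schema above: full validation would normally go through a JSON Schema validator (for example the third-party `jsonschema` package, which this diff does not show as a dependency). As a hedged, stdlib-only sketch, a few of the top-level rules can be checked by hand: the required keys, the kebab-case `name` that must match the filename, and the `max_duration_minutes` bounds. The helper `check_scenario` is hypothetical and covers only this subset of the schema.

```python
import json
import re
import tempfile
from pathlib import Path

# Copied from the schema's top-level "required" list.
REQUIRED_KEYS = {
    "name", "description", "primitives_tested", "max_duration_minutes",
    "deterministic", "setup", "teammates", "expected",
}


def check_scenario(path: Path) -> list[str]:
    """Return human-readable problems; an empty list means these checks passed."""
    data = json.loads(path.read_text())
    problems = []
    missing = sorted(REQUIRED_KEYS - data.keys())
    if missing:
        problems.append("missing required keys: " + ", ".join(missing))
    name = data.get("name")
    if not isinstance(name, str) or not re.fullmatch(r"[a-z][a-z0-9-]*", name):
        problems.append(f"name {name!r} is not kebab-case")
    if name != path.stem:
        problems.append(f"name {name!r} does not match filename {path.name}")
    duration = data.get("max_duration_minutes")
    is_number = isinstance(duration, (int, float)) and not isinstance(duration, bool)
    if not is_number or not 0 < duration <= 5:
        problems.append("max_duration_minutes must be a number in (0, 5]")
    return problems


with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "demo-scenario.json"
    good.write_text(json.dumps({
        "name": "demo-scenario", "description": "demo",
        "primitives_tested": ["parallel-dispatch"], "max_duration_minutes": 1,
        "deterministic": True, "setup": {"fixtures": "../fixtures/demo", "seed": 42},
        "teammates": [{"name": "b-1", "head": "Builder", "task_ids": []}],
        "expected": {},
    }))
    good_problems = check_scenario(good)

    bad = Path(tmp) / "other.json"
    bad.write_text(json.dumps({"name": "Bad_Name", "max_duration_minutes": 9}))
    bad_problems = check_scenario(bad)
```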
diff --git a/plugins/swarm-orchestrator/tests/swarming/test_scenarios.py b/plugins/swarm-orchestrator/tests/swarming/test_scenarios.py
new file mode 100644
index 0000000000..96d5207388
--- /dev/null
+++ b/plugins/swarm-orchestrator/tests/swarming/test_scenarios.py
@@ -0,0 +1,26 @@
+"""pytest wrapper — every scenario JSON is a parametrized test.
+
+Designed to drop into the plugin's existing test job; failures bisect
+cleanly to the offending commit.
+"""
+from __future__ import annotations
+
+import sys
+from pathlib import Path
+
+import pytest
+
+THIS_DIR = Path(__file__).resolve().parent
+sys.path.insert(0, str(THIS_DIR))
+
+from runner.harness import run_scenario # noqa: E402
+from runner.stub import InProcessScenarioEngine # noqa: E402
+
+
+SCENARIOS = sorted((THIS_DIR / "scenarios").glob("*.json"))
+
+
+@pytest.mark.parametrize("scenario_path", SCENARIOS, ids=[p.stem for p in SCENARIOS])
+def test_scenario_passes(scenario_path: Path) -> None:
+    rep = run_scenario(scenario_path, engine=InProcessScenarioEngine())
+    assert rep.ok, "FAILED: " + " | ".join(rep.failed)
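Note on the wrapper above: `runner.harness.run_scenario` is not part of this section of the diff, so how it evaluates a scenario's `expected` block is not visible here. The sketch below is one plausible reading of the four file assertions (`files_present`, `files_absent`, `file_contains`, `file_absent_substring`). The helper `check_expected_files` is hypothetical, and treating a missing file as passing `file_absent_substring` is an assumption.

```python
import tempfile
from pathlib import Path


def check_expected_files(workspace: Path, expected: dict) -> list[str]:
    """Return one failure message per violated file assertion."""
    failures = []
    for rel in expected.get("files_present", []):
        if not (workspace / rel).exists():
            failures.append(f"files_present: {rel} is missing")
    for rel in expected.get("files_absent", []):
        if (workspace / rel).exists():
            failures.append(f"files_absent: {rel} exists")
    for item in expected.get("file_contains", []):
        target = workspace / item["path"]
        if not target.is_file() or item["substring"] not in target.read_text():
            failures.append(f"file_contains: {item['path']} lacks {item['substring']!r}")
    for item in expected.get("file_absent_substring", []):
        target = workspace / item["path"]
        # Assumption: a missing file cannot contain the substring, so it passes.
        if target.is_file() and item["substring"] in target.read_text():
            failures.append(f"file_absent_substring: {item['path']} has {item['substring']!r}")
    return failures


with tempfile.TemporaryDirectory() as tmp:
    ws = Path(tmp)
    (ws / "src").mkdir()
    (ws / "src" / "buggy_01.py").write_text("# FIXED: handle empty input\n")
    expected = {
        "files_present": ["issues.json"],
        "file_contains": [{"path": "src/buggy_01.py", "substring": "FIXED"}],
        "file_absent_substring": [{"path": "src/buggy_01.py", "substring": "BUG"}],
    }
    failures = check_expected_files(ws, expected)
```

In this demo only `issues.json` is absent from the workspace, so the single reported failure comes from `files_present`.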
diff --git a/plugins/swarm-orchestrator/tests/test_hooks.py b/plugins/swarm-orchestrator/tests/test_hooks.py
new file mode 100644
index 0000000000..1cdda8d662
--- /dev/null
+++ b/plugins/swarm-orchestrator/tests/test_hooks.py
@@ -0,0 +1,269 @@
+"""
+Tests for the swarm-orchestrator plugin hooks.
+
+These tests are stdlib-only so they run anywhere Python 3.11+ is available.
+Each test invokes the hook script as a subprocess with a synthetic Claude Code
+hook payload on stdin and asserts behavior on stdout / state files.
+"""
+
+from __future__ import annotations
+
+import json
+import os
+import pathlib
+import subprocess
+import sys
+import tempfile
+import unittest
+
+PLUGIN_ROOT = pathlib.Path(__file__).resolve().parent.parent
+HOOKS = PLUGIN_ROOT / "hooks"
+
+
+def _run_hook(script: pathlib.Path, payload: dict, env_overrides: dict | None = None) -> subprocess.CompletedProcess:
+    env = os.environ.copy()
+    if env_overrides:
+        env.update(env_overrides)
+    return subprocess.run(
+        [sys.executable, str(script)],
+        input=json.dumps(payload),
+        capture_output=True,
+        text=True,
+        env=env,
+        timeout=20,
+    )
+
+
+class TestOnTaskCompleteHook(unittest.TestCase):
+    """on_task_complete.py: cascades on TaskUpdate(status=completed/merged)."""
+
+    def setUp(self) -> None:
+        self.tmp_home = self.enterContext(tempfile.TemporaryDirectory())
+        self.fake_home = pathlib.Path(self.tmp_home)
+        self.teams_root = self.fake_home / ".claude" / "teams"
+        self.teams_root.mkdir(parents=True, exist_ok=True)
+
+    def _write_dag(self, team: str, dag: dict) -> None:
+        team_dir = self.teams_root / team
+        team_dir.mkdir(parents=True, exist_ok=True)
+        (team_dir / "swarm-dag.json").write_text(json.dumps(dag))
+
+    def test_no_op_for_non_taskupdate(self) -> None:
+        result = _run_hook(
+            HOOKS / "on_task_complete.py",
+            {"tool_name": "Edit", "tool_input": {}},
+            env_overrides={"HOME": str(self.fake_home)},
+        )
+        self.assertEqual(result.returncode, 0)
+        self.assertEqual(result.stdout.strip(), "")
+
+    def test_no_op_for_non_terminal_status(self) -> None:
+        result = _run_hook(
+            HOOKS / "on_task_complete.py",
+            {
+                "tool_name": "TaskUpdate",
+                "tool_input": {"task_id": "t1", "team": "demo", "status": "in_progress"},
+            },
+            env_overrides={"HOME": str(self.fake_home)},
+        )
+        self.assertEqual(result.returncode, 0)
+        self.assertEqual(result.stdout.strip(), "")
+
+    def test_cascade_unblocks_dependent_task(self) -> None:
+        self._write_dag(
+            "demo",
+            {
+                "tasks": {
+                    "t1": {"status": "completed", "blockedBy": []},
+                    "t2": {"status": "blocked", "blockedBy": ["t1"]},
+                    "t3": {"status": "blocked", "blockedBy": ["t2"]},
+                }
+            },
+        )
+
+        result = _run_hook(
+            HOOKS / "on_task_complete.py",
+            {
+                "tool_name": "TaskUpdate",
+                "tool_input": {"task_id": "t1", "team": "demo", "status": "completed"},
+            },
+            env_overrides={"HOME": str(self.fake_home)},
+        )
+        self.assertEqual(result.returncode, 0)
+        self.assertIn("newly unblocked: t2", result.stdout)
+        # t3 should NOT be unblocked yet: t2 is still blocked.
+        self.assertNotIn("t3", result.stdout)
+
+        # Cascade event was logged.
+        events_path = self.teams_root / "demo" / "cascade-events.jsonl"
+        self.assertTrue(events_path.exists())
+        lines = events_path.read_text().strip().splitlines()
+        self.assertEqual(len(lines), 1)
+        event = json.loads(lines[0])
+        self.assertEqual(event["task_id"], "t1")
+        self.assertEqual(event["new_status"], "completed")
+        self.assertEqual(event["newly_unblocked"], ["t2"])
+
+    def test_cascade_no_unblock_when_other_blocker_still_open(self) -> None:
+        self._write_dag(
+            "demo",
+            {
+                "tasks": {
+                    "t1": {"status": "completed", "blockedBy": []},
+                    "t2": {"status": "in_progress", "blockedBy": []},
+                    "t3": {"status": "blocked", "blockedBy": ["t1", "t2"]},
+                }
+            },
+        )
+
+        result = _run_hook(
+            HOOKS / "on_task_complete.py",
+            {
+                "tool_name": "TaskUpdate",
+                "tool_input": {"task_id": "t1", "team": "demo", "status": "completed"},
+            },
+            env_overrides={"HOME": str(self.fake_home)},
+        )
+        self.assertEqual(result.returncode, 0)
+        self.assertNotIn("t3", result.stdout)
+
+    def test_handles_missing_dag_gracefully(self) -> None:
+        result = _run_hook(
+            HOOKS / "on_task_complete.py",
+            {
+                "tool_name": "TaskUpdate",
+                "tool_input": {"task_id": "t1", "team": "no-such-team", "status": "completed"},
+            },
+            env_overrides={"HOME": str(self.fake_home)},
+        )
+        # Hook is non-blocking: it logs and exits 0 even when the DAG is missing.
+        self.assertEqual(result.returncode, 0)
+
+
+class TestReviewerCheckpointHook(unittest.TestCase):
+    """reviewer_checkpoint.py: emits a checkpoint prompt every Nth turn."""
+
+    def test_no_op_for_non_builder(self) -> None:
+        result = _run_hook(
+            HOOKS / "reviewer_checkpoint.py",
+            {"agent_type": "scanner", "turn": 12, "cwd": "/tmp/whatever"},
+        )
+        self.assertEqual(result.returncode, 0)
+        self.assertEqual(result.stdout.strip(), "")
+
+    def test_no_op_below_floor(self) -> None:
+        result = _run_hook(
+            HOOKS / "reviewer_checkpoint.py",
+            {"agent_type": "builder", "turn": 3, "cwd": "/tmp/whatever"},
+        )
+        self.assertEqual(result.returncode, 0)
+        self.assertEqual(result.stdout.strip(), "")
+
+    def test_fires_at_floor(self) -> None:
+        result = _run_hook(
+            HOOKS / "reviewer_checkpoint.py",
+            {"agent_type": "builder", "turn": 6, "cwd": "/tmp/whatever"},
+        )
+        self.assertEqual(result.returncode, 0)
+        self.assertIn("reviewer-checkpoint", result.stdout)
+
+    def test_fires_every_n_after_floor(self) -> None:
+        # turn=9 (floor=6, every_n=3): (9-6) % 3 == 0 → fires.
+        result = _run_hook(
+            HOOKS / "reviewer_checkpoint.py",
+            {"agent_type": "builder", "turn": 9, "cwd": "/tmp/whatever"},
+        )
+        self.assertEqual(result.returncode, 0)
+        self.assertIn("reviewer-checkpoint", result.stdout)
+
+    def test_no_op_off_cycle(self) -> None:
+        # turn=8 (floor=6, every_n=3): (8-6) % 3 == 2 → no-op.
+        result = _run_hook(
+            HOOKS / "reviewer_checkpoint.py",
+            {"agent_type": "builder", "turn": 8, "cwd": "/tmp/whatever"},
+        )
+        self.assertEqual(result.returncode, 0)
+        self.assertEqual(result.stdout.strip(), "")
+
+    def test_respects_disabled_config(self) -> None:
+        with tempfile.TemporaryDirectory() as tmp:
+            cwd = pathlib.Path(tmp)
+            (cwd / ".claude").mkdir()
+            (cwd / ".claude" / "swarm-orchestrator.json").write_text(
+                json.dumps({"reviewer_checkpoint": {"enabled": False}})
+            )
+            result = _run_hook(
+                HOOKS / "reviewer_checkpoint.py",
+                {"agent_type": "builder", "turn": 12, "cwd": str(cwd)},
+            )
+            self.assertEqual(result.returncode, 0)
+            self.assertEqual(result.stdout.strip(), "")
+
+    def test_respects_custom_every_n(self) -> None:
+        with tempfile.TemporaryDirectory() as tmp:
+            cwd = pathlib.Path(tmp)
+            (cwd / ".claude").mkdir()
+            (cwd / ".claude" / "swarm-orchestrator.json").write_text(
+                json.dumps({"reviewer_checkpoint": {"every_n_turns": 2, "floor": 4}})
+            )
+            # turn=4 → fires (at floor)
+            r1 = _run_hook(
+                HOOKS / "reviewer_checkpoint.py",
+                {"agent_type": "builder", "turn": 4, "cwd": str(cwd)},
+            )
+            self.assertIn("reviewer-checkpoint", r1.stdout)
+            # turn=5 → off-cycle
+            r2 = _run_hook(
+                HOOKS / "reviewer_checkpoint.py",
+                {"agent_type": "builder", "turn": 5, "cwd": str(cwd)},
+            )
+            self.assertEqual(r2.stdout.strip(), "")
+            # turn=6 → fires (every_n=2)
+            r3 = _run_hook(
+                HOOKS / "reviewer_checkpoint.py",
+                {"agent_type": "builder", "turn": 6, "cwd": str(cwd)},
+            )
+            self.assertIn("reviewer-checkpoint", r3.stdout)
+
+
+class TestPluginManifest(unittest.TestCase):
+    """plugin.json should be valid + present."""
+
+    def test_manifest_exists_and_parses(self) -> None:
+        manifest = PLUGIN_ROOT / ".claude-plugin" / "plugin.json"
+        self.assertTrue(manifest.exists(), f"missing manifest: {manifest}")
+        data = json.loads(manifest.read_text())
+        self.assertEqual(data["name"], "swarm-orchestrator")
+        self.assertIn("version", data)
+        self.assertIn("description", data)
+        self.assertIn("author", data)
+
+    def test_all_commands_have_frontmatter(self) -> None:
+        commands_dir = PLUGIN_ROOT / "commands"
+        for cmd in commands_dir.glob("*.md"):
+            text = cmd.read_text()
+            self.assertTrue(
+                text.startswith("---\n"),
+                f"{cmd.name}: missing YAML frontmatter",
+            )
+            # frontmatter must contain a description.
+            head = text.split("---", 2)[1]
+            self.assertIn("description:", head, f"{cmd.name}: missing description")
+
+    def test_all_agents_have_frontmatter(self) -> None:
+        agents_dir = PLUGIN_ROOT / "agents"
+        for agent in agents_dir.glob("*.md"):
+            text = agent.read_text()
+            self.assertTrue(
+                text.startswith("---\n"),
+                f"{agent.name}: missing YAML frontmatter",
+            )
+            head = text.split("---", 2)[1]
+            self.assertIn("name:", head, f"{agent.name}: missing name")
+            self.assertIn("description:", head, f"{agent.name}: missing description")
+            self.assertIn("tools:", head, f"{agent.name}: missing tools list")
+            self.assertIn("model:", head, f"{agent.name}: missing model")
+
+
+if __name__ == "__main__":
+    unittest.main()
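Note on the hook tests above: the hook scripts themselves are outside this section, so the sketch below only restates the behavior the tests pin down, not the actual implementation. It assumes a task is unblocked once every id in its `blockedBy` list is `completed` or `merged`, and that the reviewer checkpoint fires when `(turn - floor) % every_n == 0` at or after the floor (defaults `floor=6`, `every_n=3` are taken from the test comments). The names `newly_unblocked` and `checkpoint_due` are hypothetical.

```python
TERMINAL_STATUSES = {"completed", "merged"}


def newly_unblocked(dag: dict, finished_id: str) -> list[str]:
    """Return blocked tasks whose blockers are all done now that finished_id is."""
    tasks = dag["tasks"]
    done = {tid for tid, t in tasks.items() if t.get("status") in TERMINAL_STATUSES}
    done.add(finished_id)
    return sorted(
        tid
        for tid, t in tasks.items()
        if t.get("status") == "blocked"
        and finished_id in t.get("blockedBy", [])
        and all(blocker in done for blocker in t.get("blockedBy", []))
    )


def checkpoint_due(turn: int, floor: int = 6, every_n: int = 3) -> bool:
    """True at the floor turn and every every_n turns after it."""
    return turn >= floor and (turn - floor) % every_n == 0


# Same DAGs as the two cascade tests.
chain = {"tasks": {
    "t1": {"status": "completed", "blockedBy": []},
    "t2": {"status": "blocked", "blockedBy": ["t1"]},
    "t3": {"status": "blocked", "blockedBy": ["t2"]},
}}
fan_in = {"tasks": {
    "t1": {"status": "completed", "blockedBy": []},
    "t2": {"status": "in_progress", "blockedBy": []},
    "t3": {"status": "blocked", "blockedBy": ["t1", "t2"]},
}}

print(newly_unblocked(chain, "t1"))
print(newly_unblocked(fan_in, "t1"))
print([turn for turn in range(1, 13) if checkpoint_due(turn)])
```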