
Autonomous Claude Swarms -- Teams Improvement => feat(plugins): swarm-orchestrator — DAG-aware multi-tier coordination with role-typed heads for autonomous agent teams #57880

Open

kushalj1997 wants to merge 3 commits into anthropics:main from kushalj1997:feature/swarm-orchestrator-plugin

Conversation


@kushalj1997 kushalj1997 commented May 10, 2026

Autonomous Claude Swarms and Agent Teams Improvement

This PR improves the native Agent Teams feature of Claude Code. It began as a personal project (built with Claude Code) to speed up task execution on my own repos. When I noticed the native Teams feature on Sunday, May 10, 2026, the two efforts naturally aligned. Claude Code drove the integration with the native Teams feature; I drove the initial swarming DAG + meta-supervisor (multi-headed) featureset. Looking forward to iterating on this with the team; your review time is much appreciated.

Summary

Claude Code has opened up a world of possibility for me. In this PR, I propose an architecture for autonomous development leveraging the Claude API; it sits naturally alongside Anthropic's in-beta native Teams feature. Thanks for reviewing!

Adds a new plugin at plugins/swarm-orchestrator/ that layers DAG-aware multi-agent coordination on top of vanilla Teams — additive only, no changes to TaskCreate / TaskUpdate / SendMessage / Team* schemas.

The headline capability: autonomous, session-resistant swarm orchestration, backed by a DAG-aware kanban with atomic task claiming so multiple parallel workers can safely pull from the same task queue without races. When claude-swarm init --persistent is configured, the meta-supervisor daemon polls task lists, dispatches role-typed heads, gates merges, and recovers crashed teammates — driven by filesystem state, not by a live session. Tasks continue to completion even after the Claude CLI exits. Scanner heads file new tasks autonomously; Builders implement them; Mergers integrate the results. A visibility CLI (claude-swarm status, swarm-watch TUI) attaches to the daemon regardless of which session started the work.

What the dashboard looks like

The canonical minimal dashboard (scripts/swarm_dashboard.py), modeled on the native Anthropic Teams agent-list interface:

  swarm · 3/5 done · 12s · ↓ 18.4k tokens · $0.0420

  ○ scanner       Scan codebase for type-hint gaps            done          ↓   4.2k   $0.0095
  ○ builder       Refactor utils.py to add type hints         done          ↓   5.1k   $0.0118
  ● test-runner   Write tests for refactored utils            in_progress   ↓   3.8k   $0.0089
  ○ reviewer      Periodic checkpoint review                  done          ↓   2.1k   $0.0048
  ○ merger        Merge cleanly into main                     blocked       ↓      —        —

  ↑/↓ to inspect · Ctrl-C to exit

One header line — aggregate progress, runtime, tokens, cost. One row per role-typed head — status dot (cyan in_progress, green done, dim idle, magenta blocked, red failed), name, current task, status word, per-head tokens, per-head spend.

What ships in this PR (5 deliverables):

  • claude-swarm CLI + library
    • DAG-aware kanban (SQLite WAL + BEGIN IMMEDIATE atomic claim), supervisor with parallel dispatch, 12 subcommands
    • 94 tests passing, mypy strict + ruff clean, Apache 2.0
    • The goal is to upstream this library into Anthropic infrastructure; source at https://github.com/kushalj1997/claude-swarm
    • Postgres upgrade planned — current SQLite WAL + BEGIN IMMEDIATE is single-host; next step is a Postgres adapter using SELECT FOR UPDATE SKIP LOCKED for multi-host production scale (same Kanban.claim_one() public API; only the backend swaps)
  • Session-resistant keepalive daemon
    • claude-swarm run --daemon detaches via single-fork + setsid; survives Claude Code exit + claude --resume
    • Companion daemon-status / daemon-stop for clean lifecycle
  • 6 role-typed agents + 8 slash commands
    • Scanner / Builder / Test-Runner / Reviewer / Merger / Auditor with tool-restriction frontmatter
    • /swarm-spawn, /swarm-submit, /swarm-status, /swarm-start, /swarm-stop, /swarm-merge, /swarm-abort, /swarm-test
  • Live demo + minimal TUI dashboard
    • try-swarm.sh runs a 5-task DAG in ~25 s via Haiku with auth probe + consent prompt
    • Dashboard mirrors the native Anthropic Teams agent-list (one row per head, status + per-head tokens + cost)
  • Persistent-agent state schema + global-mind transcript
    • claude_swarm/agents.py is the on-disk substrate (~/.claude/teams/<team>/agents/<name>.json) the future native Agent(..., persistent=True) would write to
    • Every supervisor dispatch is appended to JSONL — replayable for audit, observable in real time

Pertinent — Agent Tool

Two concrete native-Teams limitations this PR confronts head-on:

  • SendMessage does not surface in spawned teammates. ToolSearch select:SendMessage returns "no matching deferred tools" inside Agent-spawned subprocesses, even with the experimental flag on. Workaround: a filesystem-RPC fallback (claude_swarm/messaging.py) writes directly to ~/.claude/teams/<name>/inboxes/<recipient>.json, which Anthropic's runtime auto-delivers as a conversation turn. Same on-disk schema, drop-in compatible.
  • The Agent tool itself is not session-resistant. In-binary Agent spawns die when Claude Code exits and claude --resume cannot reattach them. This PR ships a parallel substrate (the keepalive daemon dispatching claude --print subprocesses in a detached process group) so the work survives — but making the same Agent survive requires a binary refactor (proposed in "What this PR ships, and what's next").
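A minimal sketch of that filesystem-RPC fallback, combining the inbox path layout above with the atomic tmp + os.replace write and bounded drop-oldest behavior described later in this PR. Function and constant names are illustrative only, not the actual claude_swarm/messaging.py API:

```python
import json
import os
import tempfile
from pathlib import Path

MAX_MESSAGES = 1000  # bounded inbox: drop-oldest beyond this cap

def send_message(team: str, recipient: str, message: dict,
                 base: Path = Path.home() / ".claude" / "teams") -> None:
    """Append a message to a teammate's JSON inbox with an atomic write.

    Illustrative sketch; the real module may differ in schema and naming.
    """
    inbox = base / team / "inboxes" / f"{recipient}.json"
    inbox.parent.mkdir(parents=True, exist_ok=True)
    messages = json.loads(inbox.read_text()) if inbox.exists() else []
    messages.append(message)
    if len(messages) > MAX_MESSAGES:
        messages = messages[-MAX_MESSAGES:]  # drop-oldest
    # Atomic write: tmp file in the same directory, then os.replace,
    # so a crash mid-write can never leave a torn inbox file.
    fd, tmp = tempfile.mkstemp(dir=inbox.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(messages, f)
    os.replace(tmp, inbox)
```

A reader process drains the file the same way: read the JSON array, atomically rewrite it empty.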

Where it started — a Kanban board

The seed of this project was a DAG-aware kanban board with atomic claim_one() task claiming, sitting underneath a small set of role-typed agents. Every other primitive in this PR — supervisor loop, heads architecture, auto-cascade unblock, abort-marker contract, keepalive daemon — grew outward from that one data structure. Tasks declare blockedBy edges, the supervisor polls unblocked() for the topological frontier, parallel heads pull atomically so two workers never claim the same task. Shipped backend is SQLite WAL + BEGIN IMMEDIATE; a Postgres adapter with SELECT FOR UPDATE SKIP LOCKED is the natural next step for multi-host production scale.
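The claim-and-frontier mechanics described above can be sketched directly against SQLite. The tasks/edges schema below is an assumption for illustration, not the shipped Kanban's real schema; the point is that BEGIN IMMEDIATE takes the write lock up front, so two parallel workers can never claim the same row:

```python
import sqlite3

def claim_one(conn: sqlite3.Connection, head: str):
    """Atomically claim one unblocked pending task for `head`, or return None.

    Sketch only: table and column names are illustrative assumptions.
    """
    conn.isolation_level = None  # manual transaction control
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        row = conn.execute(
            """
            SELECT t.id FROM tasks t
            WHERE t.status = 'pending' AND t.head = ?
              AND NOT EXISTS (              -- topological frontier: no live blockers
                SELECT 1 FROM edges e JOIN tasks b ON b.id = e.blocker_id
                WHERE e.task_id = t.id AND b.status != 'done'
              )
            LIMIT 1
            """,
            (head,),
        ).fetchone()
        if row is None:
            conn.execute("ROLLBACK")
            return None
        conn.execute("UPDATE tasks SET status = 'in_progress' WHERE id = ?", (row[0],))
        conn.execute("COMMIT")
        return row[0]
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

The same NOT EXISTS subquery, run read-only, is the unblocked() iterator.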

Slash commands reference

The plugin ships 8 slash commands at plugins/swarm-orchestrator/commands/*.md — single markdown files with YAML frontmatter declaring description / argument-hint / allowed-tools, plus the prompt body Claude executes.

| Command | Purpose |
| --- | --- |
| /swarm-spawn <goal> | Decompose a goal into a DAG, propose the plan, dispatch a multi-agent team via native TeamCreate + TaskCreate |
| /swarm-submit <prompt> | Submit a single free-form task to the keepalive swarm; daemon picks it up and dispatches via claude --print |
| /swarm-status [name] | Show daemon liveness + DAG topology + head activity + blockers + abort markers + spend |
| /swarm-start | Launch the keepalive supervisor as a detached daemon (single-fork + setsid); survives CLI exit |
| /swarm-stop | Stop the keepalive daemon (SIGTERM → SIGKILL after timeout); removes PID file |
| /swarm-merge [name] | Run the merge pipeline against completed tasks — rebase, test gate, push. Topo-orders by file overlap |
| /swarm-abort <head> | Drop a <worktree>/.claude/abort-<head> marker; the head commits WIP, pushes, and exits cleanly |
| /swarm-test [name] | Demo path — spawn a team via native TeamCreate + TaskCreate and populate the agent-list view with role-typed heads |

claude-swarm CLI reference

The plugin's daemon + slash commands wrap the claude-swarm Python CLI (installed automatically into .swarm-venv/ by try-swarm.sh).

| Command | Purpose | Key flags |
| --- | --- | --- |
| claude-swarm init | Create the swarm home directory tree | --home DIR |
| claude-swarm submit | File a task to the kanban | --title --prompt --head --blocked-by --tag --priority |
| claude-swarm list | List tasks, filtered by status / tag | --status pending\|in_progress\|done\|failed\|... --tag |
| claude-swarm status | JSON snapshot of kanban counts + supervisor state + cost | --home |
| claude-swarm unblocked | Print tasks whose blockers are all done (topological frontier) | --head |
| claude-swarm heads | List the 6 built-in role-typed heads + their default models | |
| claude-swarm inbox send / recv | Write a directed message / drain a teammate's inbox | --from --to --kind --body / --name --drain |
| claude-swarm abort {set,clear,check} | Manage <worktree>/.claude/abort-<name> markers | --worktree --teammate --reason |
| claude-swarm merge | Auto-merge pipeline (rebase + test gate + push, file-overlap reject) | --repo --test-cmd --no-overlap-reject |
| claude-swarm run | The supervisor loop — claims tasks, dispatches heads | --conductor stub\|claude --demo-delay-s --max-parallel --global-mind-log --daemon --max-iterations --poll-s |
| claude-swarm daemon-status | Check daemon liveness (PID + log paths) | --home |
| claude-swarm daemon-stop | SIGTERM the daemon; SIGKILL after timeout | --home --timeout-s |
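The abort {set,clear,check} subcommands are just marker files on disk. A minimal sketch under the path convention described in this PR (the real claude_swarm/abort.py API may differ in names and behavior):

```python
from pathlib import Path

def _marker(worktree: Path, head: str) -> Path:
    # Path follows the <worktree>/.claude/abort-<head> contract described above.
    return worktree / ".claude" / f"abort-{head}"

def set_abort(worktree: Path, head: str, reason: str) -> None:
    """Drop the marker; the head is expected to commit WIP and exit cleanly."""
    marker = _marker(worktree, head)
    marker.parent.mkdir(parents=True, exist_ok=True)
    marker.write_text(reason)

def check_abort(worktree: Path, head: str):
    """Return the abort reason if a marker exists, else None."""
    marker = _marker(worktree, head)
    return marker.read_text() if marker.exists() else None

def clear_abort(worktree: Path, head: str) -> None:
    _marker(worktree, head).unlink(missing_ok=True)
```

A head polls check_abort between tool calls; the supervisor calls clear_abort after the head exits.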

Dependency on the claude-swarm library

This plugin requires the standalone claude-swarm Python library (Apache 2.0, 94 tests, mypy strict + ruff clean, ~3 KLOC) for its functional behavior. The plugin's static surfaces (manifest, agent frontmatter, hook scripts, examples, design doc) are self-contained markdown + Python, but the orchestration runtime — kanban with atomic claim, supervisor loop, keepalive daemon, parallel dispatch, global-mind transcript, persistent-agent state schema, dashboard — all live in claude-swarm. The plugin's slash commands shell out to claude-swarm CLI.

The library installs automatically when the operator runs try-swarm.sh (pip install into a sandboxed .swarm-venv/ inside the plugin directory).

Three integration paths Anthropic can pick from:

  1. Accept the dependency as-is: try-swarm.sh continues to pip install from GitHub @ main. Lowest-touch.
  2. Vendor claude_swarm/ into plugins/swarm-orchestrator/lib/ — ~3 KLOC additional inside claude-code, no external dep. Happy to prepare the follow-up PR.
  3. Publish claude-swarm to PyPI under Anthropic's namespace (or accept a transfer) — install becomes pip install claude-swarm with formal versioning.

Try it

One command (no prerequisites beyond the claude CLI on PATH):

bash plugins/swarm-orchestrator/scripts/try-swarm.sh

The script:

  1. Creates .swarm-venv/ inside the plugin directory
  2. pip installs claude-swarm + rich (force-reinstall on every run, so a stale venv from a previous demo never silently uses old code)
  3. Probes auth with a tiny Haiku ping. On failure, prints two clear next-step paths (interactive login via Pro/Max/Team plan, or ANTHROPIC_API_KEY env var) and exits without dispatching anything
  4. Asks for explicit yes consent before any LLM dispatch
  5. Submits a 5-task DAG (Scanner → Builder → {Reviewer, Test-Runner} → Merger), runs the supervisor with --max-parallel 3, launches the dashboard
  6. Auto-exits when all tasks finish (~15–25 s with the default Haiku conductor). Cleanup trap covers EXIT INT TERM — no orphan subprocesses on Ctrl + C

Two non-default modes:

bash plugins/swarm-orchestrator/scripts/try-swarm.sh --stub        # ~20s, no LLM dispatch — free smoke test
bash plugins/swarm-orchestrator/scripts/try-swarm.sh --keepalive   # supervisor runs as a detached daemon

How to review this PR

The diff is partitioned cleanly. Suggested order:

  1. plugins/swarm-orchestrator/plugin.json + .claude-plugin/marketplace.json — surface area
  2. plugins/swarm-orchestrator/agents/*.md (6 files) — role-typed heads with tool allowlists
  3. plugins/swarm-orchestrator/commands/*.md (8 files) — operator entrypoints
  4. plugins/swarm-orchestrator/hooks/{on_task_complete,reviewer_checkpoint}.py — DAG cascade + checkpoint
  5. plugins/swarm-orchestrator/scripts/{try-swarm.sh,swarm_dashboard.py} — demo + dashboard
  6. plugins/swarm-orchestrator/tests/ — 25 / 25 tests (15 hook + 10 scenario), no LLM dispatch required
  7. plugins/swarm-orchestrator/IMPROVEMENTS_OVER_VANILLA_TEAMS.md — design proposal + roadmap

Native Teams limitations addressed (the motivation)

The plugin treats current native Anthropic Teams as the substrate (Agent, TeamCreate, TaskCreate, SendMessage, TaskUpdate, idle/wake) and layers DAG-aware orchestration on top. Every gap below is addressed by shipped code in this PR — nothing aspirational. The full design doc + items deferred to follow-up PRs is in plugins/swarm-orchestrator/IMPROVEMENTS_OVER_VANILLA_TEAMS.md.

| # | Limitation | This PR's response |
| --- | --- | --- |
| 1 | SendMessage doesn't surface in spawned teammates (ToolSearch select:SendMessage returns nothing inside Agent-spawned subprocesses) | Filesystem-RPC fallback (claude_swarm/messaging.py) writes directly to ~/.claude/teams/<name>/inboxes/<recipient>.json — same on-disk schema Anthropic's auto-delivery already reads |
| 2 | Native Agent spawns die on CLI exit; claude --resume can't reattach them | Keepalive supervisor daemon (--daemon flag, single-fork + setsid). Workers dispatched as claude --print subprocesses in a different process group; the work survives even though the in-binary Agent doesn't |
| 3 | No DAG iterator — blockedBy is settable but no unblocked() topological iterator or auto-cascade | Kanban.unblocked() topological iterator + PostToolUse(TaskUpdate) hook auto-cascades on status=completed |
| 4 | One generic worker agent type, no role-typed heads | 6 role-typed heads (Scanner / Builder / Test-Runner / Reviewer / Merger / Auditor) with tool-restriction frontmatter |
| 5 | No abort contract — interrupting an Agent loses WIP | AbortMarker (claude_swarm/abort.py) — drop <worktree>/.claude/abort-<head>, the head commits WIP + exits cleanly. Verified end-to-end during PR development. |
| 6 | No bounded inbox — a stuck recipient grows the JSON inbox without limit | Bounded Inbox (claude_swarm/messaging.py) caps at max_messages (default 1000); drop-oldest with logged warning |
| 7 | No atomic file writes — Anthropic's task / inbox writes risk torn state on crash | tmp-file + os.replace (claude_swarm/messaging.py _atomic_write) — applied to every task / inbox / status write |
| 8 | No atomic task claim — two parallel workers could double-claim | Kanban.claim_one() uses SQLite BEGIN IMMEDIATE; tested under simulated concurrency |
| 9 | No conductor timeouts — a stuck LLM call hangs the supervisor indefinitely | 600 s default per-dispatch timeout (claude_swarm/conductor.py); merge-pipeline test commands at 900 s |
| 10 | No worktree garbage collection — branches merged into main leave orphan worktrees | WorktreeManager.gc_stale() (claude_swarm/worktree.py) — removes worktrees whose branch is merged upstream |
| 11 | No replayable record of supervisor decisions — state is ephemeral, lost on exit | --global-mind-log flag emits JSONL per supervisor step (turn, task_id, head, status, elapsed_s, cost_so_far_usd) |
| 12 | No per-head cost / token accounting in the team-list UI | Dashboard renders per-head ↓ tokens / $cost columns from spend_by_head / tokens_by_head |
| 13 | No filesystem-backed agent registry — claude --resume doesn't restore prior agents | claude_swarm.agents.AgentState schema + read/write helpers at ~/.claude/teams/<team>/agents/<name>.json — the substrate a future native Agent(..., persistent=True) would write to |

Authentication — how the demo gets credentials

The demo dispatches via claude --print and leverages whatever auth the operator's claude binary already uses. It does NOT need its own API key plumbing. The script verifies this with a tiny Haiku probe before doing anything else; failure prints two concrete next-steps and exits cleanly.

The claude binary resolves auth in this priority order:

| # | Source | Typical user |
| --- | --- | --- |
| 1 | ANTHROPIC_API_KEY env var | Developer / CI / no Max plan |
| 2 | apiKeyHelper via --settings | Org-managed credential rotation |
| 3 | OAuth tokens in macOS Keychain (service: Claude Code-credentials) | Pro / Max / Team plan after claude interactive login |

Nothing is stored in the plugin tree. The plugin is keyless.

Plugin surface

  • DAG-aware kanban — atomic claim_one() for race-free task claiming under N parallel workers. unblocked() topological iterator, add_blocked_by / add_blocks mutation, full status timeline, auto-unblock cascade via PostToolUse(TaskUpdate) hook
  • 8 slash commands + 6 role-typed heads with role-specific tool allowlists
  • 2 lifecycle hooks: PostToolUse(TaskUpdate) for DAG cascade dispatch, Stop for periodic reviewer checkpoints
  • Abort-marker contract at <worktree>/.claude/abort-<head-name> for graceful interrupt
  • Testing substrate: 10 binding-agnostic scenarios + 15 hook unit tests, 25 / 25 pass via stdlib unittest (no third-party deps)
  • Marketplace registration in .claude-plugin/marketplace.json + the plugins/README.md table

Self-healing properties

Designed around the assumption that any individual teammate may die mid-task. Below is the honest split between shipped + deferred.

Shipped in this PR:

  • Abort-marker contract (claude_swarm/abort.py) — verified end-to-end during PR development
  • Session-resistant supervisor daemon (--daemon flag) — single-fork + setsid + stdio redirect; survives CLI exit + claude --resume + terminal close
  • Bounded inbox queues (claude_swarm/messaging.py) — drop-oldest with logged warning; no runaway memory
  • Atomic file writes — tmp + os.replace everywhere
  • Worktree garbage collection — removes branches merged into upstream; safe by default
  • Conductor timeouts — 600 s LLM dispatch, 900 s merge-pipeline tests
  • Filesystem-RPC fallback for SendMessage — works when the native tool doesn't surface
  • Atomic task claim — SQLite BEGIN IMMEDIATE prevents double-claim under concurrency

Designed but deferred to a follow-up PR:

  • Multi-host meta-supervisor with auto-respawn for crashed teammates
  • Stuck-task watchdog (in_progress > 30 min → re-dispatch) — depends on multi-host coordinator
  • Pattern-detection classifier — log every supervisor decision; offline-train on task_description → parallelism_safety. Logging is shipped (global-mind transcript); classifier is future work
  • Multi-team-per-session — requires TeamCreate schema extension on Anthropic's side

Session-resistance — the keepalive daemon

claude-swarm run --daemon detaches the supervisor: single-fork + setsid + stdio redirect to log file. Parent prints PID + log path then exits; child keeps polling the kanban and dispatching workers regardless of what happens to the shell, the Claude Code session, or the terminal.

# Start the daemon
claude-swarm run --home ~/.claude/swarm --daemon --conductor claude \
    --global-mind-log ~/.claude/swarm/global-mind.jsonl

# Submit a task at any time
claude-swarm submit --home ~/.claude/swarm --head builder \
    --title "audit imports" --prompt "Audit ./src for unused imports."

# Inspect daemon liveness
claude-swarm daemon-status --home ~/.claude/swarm    # exit 0 alive, 1 dead

# Stop cleanly
claude-swarm daemon-stop --home ~/.claude/swarm      # SIGTERM, SIGKILL after 5s
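The detach sequence (single-fork + setsid + stdio redirect + PID file) can be sketched as follows. This is an illustrative reconstruction, not the library's actual code; the daemonize name and signature are assumptions:

```python
import os
import sys

def daemonize(target, pid_file: str, log_file: str) -> int:
    """Run `target` detached from the calling session; return the child PID.

    Sketch of the single-fork + setsid pattern described above.
    """
    pid = os.fork()
    if pid > 0:
        return pid  # parent: print PID + log path, then carry on (or exit)
    os.setsid()  # child: new session, detached from the controlling terminal
    # Redirect stdio to the log file so terminal close can't SIGHUP-kill output.
    log = open(log_file, "a")
    os.dup2(log.fileno(), sys.stdout.fileno())
    os.dup2(log.fileno(), sys.stderr.fileno())
    with open(pid_file, "w") as f:
        f.write(str(os.getpid()))
    try:
        target()  # the real daemon runs the supervisor poll loop here
    finally:
        os._exit(0)  # never return into the parent's code path
```

The PID file is what daemon-status reads for liveness and daemon-stop reads to send SIGTERM.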

Bridging native Teams agents to the keepalive daemon. A native Agent (spawned by the binary's Agent tool) can register long-running work with the daemon and exit cleanly — without doing the work itself:

# Inside the native agent's prompt (Python, invoked via the Bash tool):
import os, subprocess
task_id = subprocess.check_output([
    "claude-swarm", "submit",
    "--home", os.path.expanduser("~/.claude/swarm"),  # no shell, so expand ~ explicitly
    "--head", "builder",
    "--prompt", "<long-running prompt>",
]).decode().strip()
# Agent exits; daemon picks up the task and runs it via `claude --print`.

The native agent is the front-end (interactive, in your CLI, dies with the binary); the daemon-dispatched worker is the back-end (session-resistant, runs detached). They share the kanban + inbox as the contract.

Operator-facing test plan — session-resistance proof

# 1. Open a fresh Claude Code session in any project
claude
# 2. In the session, start the keepalive daemon
/swarm-start
# 3. Submit a small task (sleeps 30s, writes a proof file)
/swarm-submit "sleep 30 && echo 'still alive after $(date)' > /tmp/swarm-keepalive-proof.txt" --head builder
# 4. EXIT Claude Code
/exit
# 5. Run claude --resume to restore the session
claude --resume
# 6. Verify daemon survived
/swarm-status        # should still show alive with same PID
# 7. Confirm the proof file was written while you were out
cat /tmp/swarm-keepalive-proof.txt
# 8. Stop the daemon
/swarm-stop

Global mind — the swarm's collective transcript

Every supervisor dispatch is appended to <home>/global-mind.jsonl (controlled via claude-swarm run --global-mind-log <path>). One line per event:

{"ts": 1778447590.33, "turn": 1, "task_id": "0019e1...", "head": "scanner", "title": "Scan codebase", "status": "done", "elapsed_s": 0.024, "cost_so_far_usd": 0.0}
{"ts": 1778447590.39, "turn": 2, "task_id": "0019e1...", "head": "builder", "title": "Refactor utils.py", "status": "done", "elapsed_s": 0.060, "cost_so_far_usd": 0.0}

The transcript is replayable (cat global-mind.jsonl | jq .), streamable (tail -f is a live feed any dashboard can subscribe to), and observable (cost accounting, latency tracking, per-head spend rollups all derive from it).
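Per-head rollups derive directly from the transcript. A sketch, assuming (as the field name suggests) that cost_so_far_usd is a running total, so per-event spend is the delta between consecutive lines; the function name is illustrative:

```python
import json
from collections import defaultdict

def spend_by_head(jsonl_path: str) -> dict:
    """Roll up per-head spend from a global-mind JSONL transcript.

    Assumes each line carries `head` and a cumulative `cost_so_far_usd`.
    """
    totals = defaultdict(float)
    prev = 0.0
    with open(jsonl_path) as f:
        for line in f:
            event = json.loads(line)
            totals[event["head"]] += event["cost_so_far_usd"] - prev
            prev = event["cost_so_far_usd"]
    return dict(totals)
```

The same pass over the file yields latency tracking (sum elapsed_s per head) or a live feed when driven by tail -f.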

Architecture stack

| Layer | Choice | Why |
| --- | --- | --- |
| CLI framework | click | Same as pip, flask, mkdocs — familiar conventions |
| TUI / live dashboard | rich | Live renderer + Text styling |
| Web / RPC (optional persistent mode) | FastAPI | Python-native, no extra runtime; for downstream multi-host meta-supervisors with WebSocket pub/sub |
| Concurrency | stdlib asyncio + subprocess | Supervisor is single-writer by design |
| Task storage (library) | sqlite3 with WAL + BEGIN IMMEDIATE | Atomic claim without external services. Postgres adapter (SELECT FOR UPDATE SKIP LOCKED) is the planned next step. |
| Type checking / lint | mypy --strict + ruff | Clean across library + plugin |
| Tests | pytest (library, 94 tests) + stdlib unittest (plugin, 25 tests) | Both green |

No deep-learning frameworks, no databases beyond stdlib SQLite (Postgres is a planned optional adapter), no message brokers in the default install.

What this PR ships, and what's next

The 5 deliverables are summarized at the top of this PR. Here's the honest follow-up roadmap:

Designed but deferred to a follow-up PR (not shipped here):

  • Multi-host meta-supervisor — multiple keepalive daemons across hosts coordinating via a shared task graph; cross-machine SendMessage. Single-host keepalive shipped here is the foundation
  • Dead-teammate respawn — depends on the multi-supervisor coordinator
  • Stuck-task watchdog — same dependency; status timeline already records claimed_at
  • Pattern-detection classifier — log every supervisor decision; offline-train on task_description → parallelism_safety. Logging is shipped; classifier is future work
  • The next concrete native-Teams refactor — a small Agent(..., persistent=True) flag + filesystem-backed reattachment on claude --resume. The on-disk schema this PR's daemon already uses (~/.claude/teams/<team>/agents/<name>.json) is the schema the future native flag would write to — zero migration when the binary catches up.

Known limitation — read first

This PR ships session-resistance for the WORK (via the keepalive daemon + filesystem kanban + persistent-agent state schema). It does NOT ship session-resistance for the native Agent tool's team-list UI — that surface requires changes inside the claude-code binary itself.

  • You exit Claude Code, run claude --resume: daemon still alive, dispatched workers still running, kanban state intact, global-mind transcript continues appending. ✅
  • You exit Claude Code, run claude --resume: native in-binary Agent spawns are gone from the team-list UI. ❌ Architectural limit of the current Agent tool — the proposed binary-side fix is in "What this PR ships, and what's next".

Deferred polish — to land after review

Two demo-only conveniences this PR keeps project-local rather than canonical, to keep the review diff scoped:

  • Demo venv location. try-swarm.sh creates .swarm-venv/ inside plugins/swarm-orchestrator/. Once production-ready, it should move to ~/.claude/claude-swarm/venv/ (matching the keepalive daemon's home at ~/.claude/swarm/). Keeping it local during review keeps install + cleanup symmetric with the rest of the plugin tree.
  • Demo model selection. The CLI's claude-swarm run --conductor claude forces claude-haiku-4-5 for every head so the demo finishes in <30 s at minimal token spend. Production callers wire model_override=None (Python API) or CLAUDE_SWARM_MODEL_OVERRIDE="" (env var) to use each head's role-default.

Breaking changes

None. No changes to TaskCreate / TaskUpdate / SendMessage / Team* schemas. No changes to existing plugins. No changes to vanilla Teams behavior. New hooks only fire when the plugin is installed AND the session matches a swarm context.

Test plan

Library (github.com/kushalj1997/claude-swarm):

  • 94 / 94 tests pass (kanban, supervisor, conductor, abort marker, worktree manager, merge pipeline, messaging, reviewer checkpoint, agents)
  • mypy --strict clean + ruff check clean

Plugin (plugins/swarm-orchestrator/):

  • 15 / 15 hook unit tests pass: python3 -m unittest tests.test_hooks -v
  • 10 / 10 scenario tests pass: python3 tests/swarming/run_scenario.py --all
  • Plugin manifest validates against the marketplace schema
  • Every command + agent file has well-formed YAML frontmatter
  • Hooks exit 0 on malformed stdin (no-op for non-TaskUpdate / non-Builder events)

Demo verification (operator-facing try-swarm.sh path):

  • --stub mode completes in ~20 s, no LLM dispatch, no auth prompt
  • Default mode presents preflight banner + explicit consent prompt; refusing exits cleanly without dispatch
  • Auto-exits when kanban shows done ≥ 1 and pending = in_progress = 0

Maintainer review:

  • claude plugin validate plugins/swarm-orchestrator/
  • CI workflows pass on the PR branch

License

Open to your guidance. This plugin directory does not currently include a LICENSE file — matching the convention I observed across the existing 14 plugins in plugins/ (none ship per-plugin licenses; the repo's top-level LICENSE.md applies). The companion standalone library at https://github.com/kushalj1997/claude-swarm ships under Apache 2.0 explicitly.

CLA

Will sign Anthropic's standard CLA on submission if required.

References

  • Plugin structure follows plugins/feature-dev/ (commands + agents) and plugins/hookify/ (hooks layout)
  • DAG dispatch primitive is built on the existing blockedBy field of TaskCreate (no schema change)
  • Reviewer-checkpoint hook conceptually similar to plugins/learning-output-style/'s SessionStart injection but fires on Stop against turn-count config

@kushalj1997 force-pushed the feature/swarm-orchestrator-plugin branch from 6c81c49 to 969cb94 on May 10, 2026 20:19
@kushalj1997 changed the title from "feat(plugins): swarm-orchestrator — DAG-aware multi-tier coordination with role-typed heads for autonomous agent teams" to "Autonomous Claude Swarms => feat(plugins): swarm-orchestrator — DAG-aware multi-tier coordination with role-typed heads for autonomous agent teams" on May 10, 2026
@kushalj1997 force-pushed the feature/swarm-orchestrator-plugin branch from 9f516c4 to bc75369 on May 10, 2026 21:26
@kushalj1997 marked this pull request as ready for review on May 10, 2026 22:10
@kushalj1997 force-pushed the feature/swarm-orchestrator-plugin branch 2 times, most recently from 2efddd3 to 959482b on May 10, 2026 22:40
@kushalj1997 changed the title from "Autonomous Claude Swarms => feat(plugins): swarm-orchestrator — DAG-aware multi-tier coordination with role-typed heads for autonomous agent teams" to "Autonomous Claude Swarms -- Teams Improvement => feat(plugins): swarm-orchestrator — DAG-aware multi-tier coordination with role-typed heads for autonomous agent teams" on May 10, 2026
@kushalj1997 force-pushed the feature/swarm-orchestrator-plugin branch from 959482b to 389f885 on May 10, 2026 22:50
A new plugin that layers DAG-aware multi-agent coordination on top of
native Anthropic Teams — additive only, no schema changes to TaskCreate
/ TaskUpdate / SendMessage / Team*.

Plugin surface (purely additive):
- 8 slash commands: /swarm-spawn, /swarm-submit, /swarm-status,
  /swarm-start, /swarm-stop, /swarm-merge, /swarm-abort, /swarm-test
- 6 role-typed subagents with tool-restriction frontmatter:
  Scanner, Builder, Test-Runner, Reviewer, Merger, Auditor
- 2 lifecycle hooks: PostToolUse(TaskUpdate) for DAG cascade dispatch,
  Stop for periodic reviewer checkpoints
- 3 worked examples: refactor, feature+review, multi-day audit
- Marketplace registration in .claude-plugin/marketplace.json plus
  a row in plugins/README.md

Designed to operate in two modes: standalone (lightweight kanban +
JSON inbox routing, no Anthropic Teams required) or integrated with
native Teams (where SendMessage / TaskCreate / TeamCreate already
exist and the plugin layers DAG iterators + role-typed heads on top).
Testing substrate at plugins/swarm-orchestrator/tests/swarming/:
- 10 self-contained, deterministic, sub-5-minute toy scenarios
  exercising every primitive: multi-file-rename, spec-impl-pair,
  scan-build-review, doc-writer-team, multi-language-port,
  audit-then-fix, conflict-resolution-drill, abort-marker-test,
  respawn-on-crash, multi-team-coordination
- Canonical scenario JSON schema at schema/scenario.schema.json
- Binding-agnostic ScenarioEngine protocol + reference
  InProcessScenarioEngine
- 15 hook unit tests for the cascade + checkpoint hooks
- All tests pass: 10/10 scenarios + 15/15 hook tests, no LLM dispatch
  (every scenario test runs against a deterministic in-process
  reference engine, so the test suite costs zero tokens)

Design proposal at IMPROVEMENTS_OVER_VANILLA_TEAMS.md:
- 45 documented primitives across core / reliability / observability
  / coordination / quality / docs / advanced layers
- Multi-tier architecture: supervisor per team, meta-supervisor
  per host, head-typed agents
- Explicit shipped-vs-deferred split (PR description's tables remain
  authoritative; design doc sketches the full target surface)
- Comparison table vs vanilla Teams
…oard

scripts/try-swarm.sh — canonical 'try it' entrypoint:
- Three modes: --stub (free smoke test, ~20s), default real-LLM
  dispatch (~15-25s via Haiku with one-word prompts; explicit consent
  prompt), and --keepalive (supervisor runs as a detached daemon that
  survives CLI exit + 'claude --resume')
- Preflight: checks the 'claude' CLI is installed + probes auth with a
  tiny Haiku ping. Failure prints two clear next-step paths (interactive
  login via Pro/Max/Team plan, or ANTHROPIC_API_KEY env var) and exits
  cleanly without dispatching anything
- Force-reinstalls claude-swarm on every run so stale venv state from
  a previous demo can't silently use older code missing the latest flags
- Parallel dispatch (3 tasks concurrent) so multiple in-progress rows
  render simultaneously — the 'live' demo feel
- DAG ordering: scanner → builder → {reviewer, test-runner} → merger,
  so reviewer reviews the build's output (not running in parallel from
  t=0); test-runner runs in parallel with reviewer after builder
- Cleanup trap covers EXIT, SIGINT (Ctrl+C), SIGTERM — pkill -P on the
  supervisor's children, 2s grace, then SIGKILL escalation. No orphans.
- At exit, points to supervisor.log, global-mind.jsonl (every dispatch
  + cost increment as JSONL — the swarm's collective transcript), and
  the kanban cascade-events.jsonl

scripts/swarm_dashboard.py — minimal TUI:
- Modeled on the native Anthropic Teams agent-list interface
- One header line: progress, runtime, tokens, aggregate cost
- One row per role-typed head: status dot (●/○/✗), name, current
  task, status word, per-head tokens, per-head spend
- Reads from claude-swarm list (plain text + optional --json), so it
  works against any installed version of the library

The 'global-mind' framing: the JSONL transcript is the swarm's
collective state-of-the-world, replayable for audit and observable
in real time.