Autonomous Claude Swarms -- Teams Improvement => feat(plugins): swarm-orchestrator — DAG-aware multi-tier coordination with role-typed heads for autonomous agent teams#57880
Open
kushalj1997 wants to merge 3 commits into
A new plugin that layers DAG-aware multi-agent coordination on top of native Anthropic Teams. It is additive only, with no schema changes to TaskCreate / TaskUpdate / SendMessage / Team*.
Plugin surface (purely additive):
- 8 slash commands: /swarm-spawn, /swarm-submit, /swarm-status, /swarm-start, /swarm-stop, /swarm-merge, /swarm-abort, /swarm-test
- 6 role-typed subagents with tool-restriction frontmatter: Scanner, Builder, Test-Runner, Reviewer, Merger, Auditor
- 2 lifecycle hooks: PostToolUse(TaskUpdate) for DAG cascade dispatch, Stop for periodic reviewer checkpoints
- 3 worked examples: refactor, feature+review, multi-day audit
- Marketplace registration in .claude-plugin/marketplace.json plus a row in plugins/README.md
Designed to operate in two modes: standalone (lightweight kanban + JSON inbox routing, no Anthropic Teams required) or integrated with native Teams (where SendMessage / TaskCreate / TeamCreate already exist and the plugin layers DAG iterators + role-typed heads on top).
Testing substrate at plugins/swarm-orchestrator/tests/swarming/:
- 10 self-contained, deterministic, sub-5-minute toy scenarios exercising every primitive: multi-file-rename, spec-impl-pair, scan-build-review, doc-writer-team, multi-language-port, audit-then-fix, conflict-resolution-drill, abort-marker-test, respawn-on-crash, multi-team-coordination
- Canonical scenario JSON schema at schema/scenario.schema.json
- Binding-agnostic ScenarioEngine protocol + reference InProcessScenarioEngine
- 15 hook unit tests for the cascade + checkpoint hooks
- All tests pass: 10/10 scenarios + 15/15 hook tests, no LLM dispatch (every scenario test runs against a deterministic in-process reference engine, so the test suite costs zero tokens)
Design proposal at IMPROVEMENTS_OVER_VANILLA_TEAMS.md:
- 45 documented primitives across core / reliability / observability / coordination / quality / docs / advanced layers
- Multi-tier architecture: supervisor per team, meta-supervisor per host, head-typed agents
- Explicit shipped-vs-deferred split (PR description's tables remain authoritative; design doc sketches the full target surface)
- Comparison table vs vanilla Teams
…oard
scripts/try-swarm.sh — canonical 'try it' entrypoint:
- Three modes: --stub (free smoke test, ~20s), default real-LLM
dispatch (~15-25s via Haiku with one-word prompts; explicit consent
prompt), and --keepalive (supervisor runs as a detached daemon that
survives CLI exit + 'claude --resume')
- Preflight: checks the 'claude' CLI is installed + probes auth with a
tiny Haiku ping. Failure prints two clear next-step paths (interactive
login via Pro/Max/Team plan, or ANTHROPIC_API_KEY env var) and exits
cleanly without dispatching anything
- Force-reinstalls claude-swarm on every run so stale venv state from
a previous demo can't silently use older code missing the latest flags
- Parallel dispatch (3 tasks concurrent) so multiple in-progress rows
render simultaneously — the 'live' demo feel
- DAG ordering: scanner → builder → {reviewer, test-runner} → merger,
so reviewer reviews the build's output (not running in parallel from
t=0); test-runner runs in parallel with reviewer after builder
- Cleanup trap covers EXIT, SIGINT (Ctrl+C), SIGTERM — pkill -P on the
supervisor's children, 2s grace, then SIGKILL escalation. No orphans.
- At exit, points to supervisor.log, global-mind.jsonl (every dispatch
+ cost increment as JSONL — the swarm's collective transcript), and
the kanban cascade-events.jsonl
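The DAG ordering described above (scanner → builder → {reviewer, test-runner} → merger) can be illustrated with a small frontier computation. This is a minimal sketch, not the claude-swarm implementation: the `BLOCKED_BY` mapping and the `unblocked` / `dispatch_waves` helpers are hypothetical names chosen for illustration, and the real library keeps this state in its kanban rather than in a dict.

```python
# Hypothetical in-memory model of the demo DAG; not the claude-swarm API.
BLOCKED_BY = {
    "scanner": [],
    "builder": ["scanner"],
    "reviewer": ["builder"],
    "test-runner": ["builder"],
    "merger": ["reviewer", "test-runner"],
}


def unblocked(blocked_by, done):
    """Return the frontier: tasks not yet done whose blockers are all done."""
    return sorted(
        task
        for task, deps in blocked_by.items()
        if task not in done and all(dep in done for dep in deps)
    )


def dispatch_waves(blocked_by):
    """Yield successive frontiers, treating each wave as done before the next."""
    done = set()
    while len(done) < len(blocked_by):
        wave = unblocked(blocked_by, done)
        if not wave:
            # Nothing is runnable but work remains: a cycle or a bad edge.
            raise ValueError("cycle or missing dependency in DAG")
        yield wave
        done.update(wave)
```

Under these assumptions, reviewer and test-runner land in the same wave, which is why they can run in parallel once builder finishes, while merger waits for both.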
scripts/swarm_dashboard.py — minimal TUI:
- Modeled on the native Anthropic Teams agent-list interface
- One header line: progress, runtime, tokens, aggregate cost
- One row per role-typed head: status dot (●/○/✗), name, current
task, status word, per-head tokens, per-head spend
- Reads from claude-swarm list (plain text + optional --json), so it
works against any installed version of the library
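As a rough illustration of the per-head row described above, the following sketch formats one dashboard line in plain text. The status-to-dot mapping follows the PR description; the row dictionary keys (`name`, `task`, `status`, `tokens`, `cost_usd`) are assumptions for this example and are not the actual output schema of claude-swarm list --json, and colors are omitted.

```python
# Status dots as described in the PR; colors are left out of this sketch.
STATUS_DOTS = {
    "in_progress": "●",
    "done": "○",
    "idle": "○",
    "blocked": "○",
    "failed": "✗",
}


def format_row(head):
    """Render one head as: dot, name, current task, status, tokens, spend.

    `head` is a hypothetical dict; its keys are assumed for illustration.
    """
    dot = STATUS_DOTS.get(head["status"], "?")
    task = head.get("task") or "-"
    return (
        f"{dot} {head['name']:<12} {task:<24} {head['status']:<12} "
        f"{head['tokens']:>8} ${head['cost_usd']:.4f}"
    )
```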
The 'global-mind' framing: the JSONL transcript is the swarm's
collective state-of-the-world, replayable for audit and observable
in real time.
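To make the "replayable" claim concrete, here is a minimal sketch of a per-head rollup over the transcript. It assumes the field names shown in the PR description's sample transcript lines (`head`, `elapsed_s`, `cost_so_far_usd`) and assumes `cost_so_far_usd` is cumulative, so the last event carries the running total; the `rollup` function itself is hypothetical and not part of claude-swarm.

```python
import json
from collections import defaultdict


def rollup(lines):
    """Aggregate JSONL transcript lines into per-head stats and total cost."""
    per_head = defaultdict(lambda: {"events": 0, "elapsed_s": 0.0})
    total_cost = 0.0
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines in the log
        event = json.loads(raw)
        stats = per_head[event["head"]]
        stats["events"] += 1
        stats["elapsed_s"] += event["elapsed_s"]
        # Assumed cumulative: keep the most recent running total.
        total_cost = event["cost_so_far_usd"]
    return dict(per_head), total_cost
```

Because the function takes any iterable of lines, the same code could be fed an open file handle for replay or a tailing generator for a live view.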
This was referenced May 11, 2026
Autonomous Claude Swarms and Agent Teams Improvement
This PR is intended to improve the native Agent Teams feature of Claude Code. It started as a self-built project (using Claude Code) to speed up task execution for personal projects. I noticed the native Teams feature on Sunday, May 10, 2026, and figured it would align naturally with this work. I look forward to iterating on this with the team, and I appreciate your time. I used Claude Code to build this project and the entire PR: Claude Code drove the integration with the native Teams feature, and I drove the initial swarming DAG + meta-supervisor (multi-headed) feature set.
Summary
Claude Code has opened up a world of possibility for me. In this PR, I propose an architecture for autonomous development leveraging the Claude API. It sits naturally alongside Anthropic's native, in-beta Teams feature. Thanks for reviewing!
Adds a new plugin at plugins/swarm-orchestrator/ that layers DAG-aware multi-agent coordination on top of vanilla Teams. It is additive only, with no changes to TaskCreate / TaskUpdate / SendMessage / Team* schemas.
The headline capability: autonomous, session-resistant swarm orchestration, backed by a DAG-aware kanban with atomic task claiming so multiple parallel workers can safely pull from the same task queue without races. When claude-swarm init --persistent is configured, the meta-supervisor daemon polls task lists, dispatches role-typed heads, gates merges, and recovers crashed teammates, driven by filesystem state rather than a live session. Tasks continue to completion even after the Claude CLI exits. Scanner heads file new tasks autonomously; Builders implement them; Mergers integrate the results. A visibility CLI (claude-swarm status, swarm-watch TUI) attaches to the daemon regardless of which session started the work.
What the dashboard looks like
The canonical minimal dashboard (scripts/swarm_dashboard.py) is modeled on the native Anthropic Teams agent-list interface:
- One header line: aggregate progress, runtime, tokens, cost.
- One row per role-typed head: status dot (● cyan in_progress, ○ green done / dim idle / magenta blocked, ✗ red failed), name, current task, status word, per-head tokens, per-head spend.
What ships in this PR (5 deliverables):
- claude-swarm CLI + library: kanban (SQLite WAL + BEGIN IMMEDIATE atomic claim), supervisor with parallel dispatch, 12 subcommands. BEGIN IMMEDIATE is single-host; next step is a Postgres adapter using SELECT FOR UPDATE SKIP LOCKED for multi-host production scale (same Kanban.claim_one() public API; only the backend swaps)
- claude-swarm run --daemon detaches via single-fork + setsid; survives Claude Code exit + claude --resume. daemon-status / daemon-stop for clean lifecycle
- /swarm-spawn, /swarm-submit, /swarm-status, /swarm-start, /swarm-stop, /swarm-merge, /swarm-abort, /swarm-test
- try-swarm.sh runs a 5-task DAG in ~25 s via Haiku with auth probe + consent prompt
- claude_swarm/agents.py is the on-disk substrate (~/.claude/teams/<team>/agents/<name>.json) the future native Agent(..., persistent=True) would write to
Pertinent: Agent Tool
Two concrete native-Teams limitations this PR confronts head-on:
- SendMessage does not surface in spawned teammates. ToolSearch select:SendMessage returns "no matching deferred tools" inside Agent-spawned subprocesses, even with the experimental flag on. Workaround: a filesystem-RPC fallback (claude_swarm/messaging.py) writes directly to ~/.claude/teams/<name>/inboxes/<recipient>.json, which Anthropic's runtime auto-delivers as a conversation turn. Same on-disk schema, drop-in compatible.
- The Agent tool itself is not session-resistant. In-binary Agent spawns die when Claude Code exits and claude --resume cannot reattach them. This PR ships a parallel substrate (the keepalive daemon dispatching claude --print subprocesses in a detached process group) so the work survives, but making the same Agent survive requires a binary refactor (proposed in "What this PR ships, and what's next").
Where it started: a Kanban board
The seed of this project was a DAG-aware kanban board with atomic claim_one() task claiming, sitting underneath a small set of role-typed agents. Every other primitive in this PR (supervisor loop, heads architecture, auto-cascade unblock, abort-marker contract, keepalive daemon) grew outward from that one data structure. Tasks declare blockedBy edges, the supervisor polls unblocked() for the topological frontier, and parallel heads pull atomically so two workers never claim the same task. The shipped backend is SQLite WAL + BEGIN IMMEDIATE; a Postgres adapter with SELECT FOR UPDATE SKIP LOCKED is the natural next step for multi-host production scale.
Slash commands reference
The plugin ships 8 slash commands at plugins/swarm-orchestrator/commands/*.md: single markdown files with YAML frontmatter declaring description / argument-hint / allowed-tools, plus the prompt body Claude executes.
- /swarm-spawn <goal>: TeamCreate + TaskCreate
- /swarm-submit <prompt>: claude --print
- /swarm-status [name]
- /swarm-start
- /swarm-stop
- /swarm-merge [name]
- /swarm-abort <head>: <worktree>/.claude/abort-<head> marker; the head commits WIP, pushes, and exits cleanly
- /swarm-test [name]: TeamCreate + TaskCreate and populate the agent-list view with role-typed heads
claude-swarm CLI reference
The plugin's daemon + slash commands wrap the claude-swarm Python CLI (installed automatically into .swarm-venv/ by try-swarm.sh).
- claude-swarm init: --home DIR
- claude-swarm submit: --title --prompt --head --blocked-by --tag --priority
- claude-swarm list: --status pending|in_progress|done|failed|... --tag
- claude-swarm status: --home
- claude-swarm unblocked: --head
- claude-swarm heads
- claude-swarm inbox send / recv: --from --to --kind --body / --name --drain
- claude-swarm abort {set,clear,check}: <worktree>/.claude/abort-<name> markers; --worktree --teammate --reason
- claude-swarm merge: --repo --test-cmd --no-overlap-reject
- claude-swarm run: --conductor stub|claude --demo-delay-s --max-parallel --global-mind-log --daemon --max-iterations --poll-s
- claude-swarm daemon-status: --home
- claude-swarm daemon-stop: --home --timeout-s
Dependency on the claude-swarm library
This plugin requires the standalone
claude-swarm Python library (Apache 2.0, 94 tests, mypy strict + ruff clean, ~3 KLOC) for its functional behavior. The plugin's static surfaces (manifest, agent frontmatter, hook scripts, examples, design doc) are self-contained markdown + Python, but the orchestration runtime (kanban with atomic claim, supervisor loop, keepalive daemon, parallel dispatch, global-mind transcript, persistent-agent state schema, dashboard) lives in claude-swarm. The plugin's slash commands shell out to the claude-swarm CLI.
The library installs automatically when the operator runs try-swarm.sh (pip install into a sandboxed .swarm-venv/ inside the plugin directory).
Three integration paths Anthropic can pick from:
- try-swarm.sh continues to pip install from GitHub @ main. Lowest-touch.
- claude_swarm/ into plugins/swarm-orchestrator/lib/: ~3 KLOC additional inside claude-code, no external dep. Happy to prepare the follow-up PR.
- claude-swarm to PyPI under Anthropic's namespace (or accept a transfer): install becomes pip install claude-swarm with formal versioning.
Try it
One command (no prerequisites beyond the claude CLI on PATH):
The script:
- Creates .swarm-venv/ inside the plugin directory
- pip installs claude-swarm + rich (force-reinstall on every run, so a stale venv from a previous demo never silently uses old code)
- Probes auth with a tiny Haiku ping; on failure it prints two next-step paths (interactive login, or the ANTHROPIC_API_KEY env var) and exits without dispatching anything
- Asks for an explicit yes consent before any LLM dispatch
- Dispatches with --max-parallel 3, launches the dashboard
- Installs a cleanup trap on EXIT INT TERM: no orphan subprocesses on Ctrl + C
Two non-default modes:
How to review this PR
The diff is partitioned cleanly. Suggested order:
1. plugins/swarm-orchestrator/plugin.json + .claude-plugin/marketplace.json: surface area
2. plugins/swarm-orchestrator/agents/*.md (6 files): role-typed heads with tool allowlists
3. plugins/swarm-orchestrator/commands/*.md (8 files): operator entrypoints
4. plugins/swarm-orchestrator/hooks/{on_task_complete,reviewer_checkpoint}.py: DAG cascade + checkpoint
5. plugins/swarm-orchestrator/scripts/{try-swarm.sh,swarm_dashboard.py}: demo + dashboard
6. plugins/swarm-orchestrator/tests/: 25 / 25 tests (15 hook + 10 scenario), no LLM dispatch required
7. plugins/swarm-orchestrator/IMPROVEMENTS_OVER_VANILLA_TEAMS.md: design proposal + roadmap
Native Teams limitations addressed (the motivation)
The plugin treats current native Anthropic Teams as the substrate (Agent, TeamCreate, TaskCreate, SendMessage, TaskUpdate, idle/wake) and layers DAG-aware orchestration on top. Every gap below is addressed by shipped code in this PR; nothing is aspirational. The full design doc, including items deferred to follow-up PRs, is in plugins/swarm-orchestrator/IMPROVEMENTS_OVER_VANILLA_TEAMS.md.
- SendMessage doesn't surface in spawned teammates (ToolSearch select:SendMessage returns nothing inside Agent-spawned subprocesses)
  Filesystem-RPC fallback (claude_swarm/messaging.py) writes directly to ~/.claude/teams/<name>/inboxes/<recipient>.json, the same on-disk schema Anthropic's auto-delivery already reads
- Agent spawns die on CLI exit; claude --resume can't reattach them
  Keepalive daemon (--daemon flag, single-fork + setsid). Workers dispatched as claude --print subprocesses in a different process group; the work survives even though the in-binary Agent doesn't
- blockedBy is settable but no unblocked() topological iterator or auto-cascade
  Kanban.unblocked() topological iterator + PostToolUse(TaskUpdate) hook auto-cascades on status=completed
- Agent loses WIP
  AbortMarker (claude_swarm/abort.py): drop <worktree>/.claude/abort-<head>, the head commits WIP + exits cleanly. Verified end-to-end during PR development.
- Inbox (claude_swarm/messaging.py) caps at max_messages (default 1000); drop-oldest with logged warning
- os.replace (claude_swarm/messaging.py _atomic_write): applied to every task / inbox / status write
- Kanban.claim_one() uses SQLite BEGIN IMMEDIATE; tested under simulated concurrency
- (claude_swarm/conductor.py); merge-pipeline test commands at 900 s
- WorktreeManager.gc_stale() (claude_swarm/worktree.py): removes worktrees whose branch is merged upstream
- --global-mind-log flag emits JSONL per supervisor step (turn, task_id, head, status, elapsed_s, cost_so_far_usd)
- ↓ tokens / $cost columns from spend_by_head / tokens_by_head
- claude --resume doesn't restore prior agents
  claude_swarm.agents.AgentState schema + read/write helpers at ~/.claude/teams/<team>/agents/<name>.json, the substrate a future native Agent(..., persistent=True) would write to
Authentication: how the demo gets credentials
The demo dispatches via claude --print and leverages whatever auth the operator's claude binary already uses. It does NOT need its own API key plumbing. The script verifies this with a tiny Haiku probe before doing anything else; failure prints two concrete next steps and exits cleanly.
The claude binary resolves auth in this priority order:
- ANTHROPIC_API_KEY env var
- apiKeyHelper via --settings
- Claude Code-credentials)
- claude interactive login
Nothing is stored in the plugin tree. The plugin is keyless.
Plugin surface
- claim_one() for race-free task claiming under N parallel workers. unblocked() topological iterator, add_blocked_by / add_blocks mutation, full status timeline, auto-unblock cascade via PostToolUse(TaskUpdate) hook
- PostToolUse(TaskUpdate) for DAG cascade dispatch, Stop for periodic reviewer checkpoints
- <worktree>/.claude/abort-<head-name> for graceful interrupt
- .claude-plugin/marketplace.json + the plugins/README.md table
Self-healing properties
Designed around the assumption that any individual teammate may die mid-task. Below is the honest split between shipped + deferred.
Shipped in this PR:
- AbortMarker (claude_swarm/abort.py): verified end-to-end during PR development
- Keepalive daemon (--daemon flag): single-fork + setsid + stdio redirect; survives CLI exit + claude --resume + terminal close
- Inbox cap (claude_swarm/messaging.py): drop-oldest with logged warning; no runaway memory
- Atomic writes via os.replace everywhere
- Filesystem-RPC fallback for SendMessage: works when the native tool doesn't surface
- BEGIN IMMEDIATE prevents double-claim under concurrency
Designed but deferred to a follow-up PR:
- in_progress > 30 min → re-dispatch): depends on multi-host coordinator
- task_description → parallelism_safety. Logging is shipped (global-mind transcript); classifier is future work
- TeamCreate schema extension on Anthropic's side
Session-resistance: the keepalive daemon
claude-swarm run --daemon detaches the supervisor: single-fork + setsid + stdio redirect to log file. Parent prints PID + log path then exits; child keeps polling the kanban and dispatching workers regardless of what happens to the shell, the Claude Code session, or the terminal.
Bridging native Teams agents to the keepalive daemon. A native Agent (spawned by the binary's Agent tool) can register long-running work with the daemon and exit cleanly, without doing the work itself:
The native agent is the front-end (interactive, in your CLI, dies with the binary); the daemon-dispatched worker is the back-end (session-resistant, runs detached). They share the kanban + inbox as the contract.
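The single-fork + setsid + stdio-redirect pattern named above can be sketched as follows. This is a minimal POSIX-only illustration, not the claude-swarm daemon code: `detach_and_run` and its parameters are hypothetical names, and the real supervisor presumably adds PID files, signal handling, and the polling loop.

```python
import os
import sys


def detach_and_run(work, log_path):
    """Fork once; the child leads a new session and runs work().

    Returns the child's PID in the parent. Never returns in the child.
    """
    # Flush first so buffered output is not written twice after the fork.
    sys.stdout.flush()
    sys.stderr.flush()
    pid = os.fork()
    if pid > 0:
        return pid  # parent: report the PID and carry on (or exit)
    status = 1
    try:
        # New session: no controlling terminal, own process group.
        os.setsid()
        log_fd = os.open(log_path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        null_fd = os.open(os.devnull, os.O_RDONLY)
        os.dup2(null_fd, 0)  # stdin from /dev/null
        os.dup2(log_fd, 1)   # stdout to the log file
        os.dup2(log_fd, 2)   # stderr to the log file
        work()
        status = 0
    finally:
        # Skip the parent's atexit handlers and inherited cleanup.
        os._exit(status)
```

In a real daemon the parent would print the returned PID and the log path and then exit, so the child is reparented and keeps running after the shell or CLI goes away.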
Operator-facing test plan — session-resistance proof
Global mind — the swarm's collective transcript
Every supervisor dispatch is appended to <home>/global-mind.jsonl (controlled via claude-swarm run --global-mind-log <path>). One line per event:
{"ts": 1778447590.33, "turn": 1, "task_id": "0019e1...", "head": "scanner", "title": "Scan codebase", "status": "done", "elapsed_s": 0.024, "cost_so_far_usd": 0.0}
{"ts": 1778447590.39, "turn": 2, "task_id": "0019e1...", "head": "builder", "title": "Refactor utils.py", "status": "done", "elapsed_s": 0.060, "cost_so_far_usd": 0.0}
The transcript is replayable (cat global-mind.jsonl | jq .), streamable (tail -f is a live feed any dashboard can subscribe to), and observable (cost accounting, latency tracking, per-head spend rollups all derive from it).
Architecture stack
- click: pip, flask, mkdocs; familiar conventions
- rich: Live renderer + Text styling
- FastAPI
- asyncio + subprocess
- sqlite3 with WAL + BEGIN IMMEDIATE; SELECT FOR UPDATE SKIP LOCKED) is the planned next step.
- mypy --strict + ruff
- pytest (library, 94 tests) + stdlib unittest (plugin, 25 tests)
No deep-learning frameworks, no databases beyond optional SQLite/Postgres, no message brokers in the default install.
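The atomic-claim idea behind the sqlite3 + WAL + BEGIN IMMEDIATE entry can be sketched with the standard library alone. This is an illustrative sketch under stated assumptions, not the Kanban.claim_one() implementation: the table layout, the `open_kanban` helper, and this standalone `claim_one` function are hypothetical.

```python
import sqlite3


def open_kanban(path):
    """Open a connection in autocommit mode so BEGIN/COMMIT are explicit."""
    conn = sqlite3.connect(path, isolation_level=None, timeout=5.0)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tasks ("
        "id INTEGER PRIMARY KEY, title TEXT NOT NULL, "
        "status TEXT NOT NULL DEFAULT 'pending', claimed_by TEXT)"
    )
    return conn


def claim_one(conn, worker):
    """Claim the oldest pending task, or return None if there is none.

    BEGIN IMMEDIATE takes the write lock before the SELECT, so two workers
    cannot both read the same pending row and then both update it.
    """
    conn.execute("BEGIN IMMEDIATE")
    try:
        row = conn.execute(
            "SELECT id, title FROM tasks "
            "WHERE status = 'pending' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is not None:
            conn.execute(
                "UPDATE tasks SET status = 'in_progress', claimed_by = ? "
                "WHERE id = ?",
                (worker, row[0]),
            )
        conn.execute("COMMIT")
        return row
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

The lock is per database file, which is why this approach is single-host; a Postgres backend would instead rely on row-level locking such as SELECT FOR UPDATE SKIP LOCKED, as the PR notes.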
What this PR ships, and what's next
The 5 deliverables are summarized at the top of this PR. Here's the honest follow-up roadmap:
Designed but deferred to a follow-up PR (not shipped here):
- SendMessage. Single-host keepalive shipped here is the foundation
- claimed_at
- task_description → parallelism_safety. Logging is shipped; classifier is future work
- Agent(..., persistent=True) flag + filesystem-backed reattachment on claude --resume. The on-disk schema this PR's daemon already uses (~/.claude/teams/<team>/agents/<name>.json) is the schema the future native flag would write to, so no migration is needed when the binary catches up.
Known limitation: read first
This PR ships session-resistance for the WORK (via the keepalive daemon + filesystem kanban + persistent-agent state schema). It does NOT ship session-resistance for the native Agent tool's team-list UI; that surface requires changes inside the claude-code binary itself.
- claude --resume → daemon still alive, dispatched workers still running, kanban state intact, global-mind transcript continues appending. ✅
- claude --resume → native in-binary Agent spawns are gone from the team-list UI. ❌ Architectural limit of the current Agent tool; the proposed binary-side fix is in "What this PR ships, and what's next".
Deferred polish: to land after review
Two demo-only conveniences this PR keeps project-local rather than canonical, to keep the review diff scoped:
- try-swarm.sh creates .swarm-venv/ inside plugins/swarm-orchestrator/. Once production-ready, it should move to ~/.claude/claude-swarm/venv/ (matching the keepalive daemon's home at ~/.claude/swarm/). Keeping it local during review keeps install + cleanup symmetric with the rest of the plugin tree.
- claude-swarm run --conductor claude forces claude-haiku-4-5 for every head so the demo finishes in <30 s at minimal token spend. Production callers wire model_override=None (Python API) or CLAUDE_SWARM_MODEL_OVERRIDE="" (env var) to use each head's role-default.
Breaking changes
None. No changes to TaskCreate / TaskUpdate / SendMessage / Team* schemas. No changes to existing plugins. No changes to vanilla Teams behavior. New hooks only fire when the plugin is installed AND the session matches a swarm context.
Test plan
Library (github.com/kushalj1997/claude-swarm):
- mypy --strict clean + ruff check clean
Plugin (plugins/swarm-orchestrator/):
- python3 -m unittest tests.test_hooks -v
- python3 tests/swarming/run_scenario.py --all
- TaskUpdate / non-Builder events)
Demo verification (operator-facing try-swarm.sh path):
- --stub mode completes in ~20 s, no LLM dispatch, no auth prompt
- done ≥ 1 and pending = in_progress = 0
Maintainer review:
- claude plugin validate plugins/swarm-orchestrator/
License
Open to your guidance. This plugin directory does not currently include a LICENSE file, matching the convention I observed across the existing 14 plugins in plugins/ (none ship per-plugin licenses; the repo's top-level LICENSE.md applies). The companion standalone library at https://github.com/kushalj1997/claude-swarm ships under Apache 2.0 explicitly.
CLA
Will sign Anthropic's standard CLA on submission if required.
References
- plugins/feature-dev/ (commands + agents) and plugins/hookify/ (hooks layout)
- blockedBy field of TaskCreate (no schema change)
- plugins/learning-output-style/'s SessionStart injection but fires on Stop against turn-count config