Fleet engineering replaces ad-hoc populations of agents with an accountable organization. You design the registry, identity, permissions, inbox, audit trail, and sovereign control that let many loops run safely across a team.
A fleet is not "many agents." A fleet is a governed population where every action answers one sentence:
Which agent did it, with what authority, against what task, evidenced by what?
→ cobusgreyling.github.io/fleet-engineering
→ Fleet Engineering essay on Substack
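The one-sentence accountability question above maps naturally onto a four-field record. The sketch below is illustrative only: the `ActionRecord` type, its field names, and the `missing_answers` helper are assumptions made for this example, not part of any tool in this repository.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class ActionRecord:
    """One fleet action, shaped around the four-part accountability question.

    Hypothetical structure: field names are assumptions for illustration.
    """
    agent_id: str      # which agent did it
    authority: str     # with what authority (e.g. a permission or approval id)
    task_id: str       # against what task
    evidence_ref: str  # evidenced by what (e.g. a trace or log reference)


def missing_answers(record: ActionRecord) -> List[str]:
    """Return the names of accountability fields that are blank."""
    return [f.name for f in fields(record) if not getattr(record, f.name).strip()]


if __name__ == "__main__":
    rec = ActionRecord("triage-bot", "run:workspace", "TASK-17", "")
    print(missing_answers(rec))  # ['evidence_ref']
```

An action whose record returns a non-empty list fails the test: at least one part of the sentence has no answer.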
| Symptom | Start with |
|---|---|
| "We have agents everywhere" | Team Agent Registry |
| Agents act without oversight | Shared Inbox HITL |
| Token bill surprise | Fleet Budget Guard |
| "Who did this?" in an incident | Cross-Agent Audit |
| Already have loops | Fleet + Loop starter |
Unsure? Use the Pattern Picker.
| Layer | Unit of design | Question |
|---|---|---|
| Context Engineering | One inference | What does the model see? |
| Harness Engineering | One agent run | How does a single run execute safely? |
| Loop Engineering | One autonomous system | What keeps prompting and verifying over time? |
| Fleet Engineering | Many agents + loops | How do populations coordinate and govern at scale? |
| Start here | Description |
|---|---|
| Concepts | Fleet vs loop vs harness — read this first |
| Maturity Model | F0–F3 phased rollout |
| Five Concerns | Topology, choreography, identity, economics, sovereign control |
| Accountability Test | The one-sentence standard for real fleets |
| Pattern Picker | Which fleet pattern to adopt first |
| Failure Modes | Incident-style catalog |
| Primitives Matrix | LangSmith Fleet vs DIY vs OpenHermit |
| Fleet Design Checklist | Ship readiness rubric (F0–F3) |
| Patterns | 6 production fleet patterns |
| Starters | Clone-and-run kits + GitHub template |
| Examples | Runnable DIY / LangSmith / OpenHermit walkthroughs |
| fleet-audit | `npx @cobusgreyling/fleet-audit` |
| fleet-init | `npx @cobusgreyling/fleet-init` |
| fleet-budget | `npx @cobusgreyling/fleet-budget` |
| Stories | Real wins and honest failures |
```bash
# 1. Scaffold a minimal fleet workspace
npx @cobusgreyling/fleet-init ~/my-fleet --pattern team-agent-registry
# 2. Audit readiness
npx @cobusgreyling/fleet-audit ~/my-fleet --suggest
# 3. Roll up token caps
npx @cobusgreyling/fleet-budget ~/my-fleet
# 4. Start F1: registry + permissions doc only, no unattended autonomy
```

From a clone:

```bash
git clone https://github.com/cobusgreyling/fleet-engineering.git
cd fleet-engineering && npm install && npm test
node tools/fleet-init/cli.js /tmp/fleet-demo --pattern team-agent-registry
```

Phased rollout: F0 ad-hoc → F1 catalog + inbox → F2 shared agents + budgets → F3 enterprise governance
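The phased rollout can be read as a ladder in which each level builds on the one before it. The sketch below encodes that reading. The capability names in `LEVEL_REQUIREMENTS` are hypothetical labels derived from the one-line summary above, and the rule that levels cannot be skipped is an assumption, not something the maturity model states here.

```python
from typing import List, Set, Tuple

# Hypothetical capability names, taken from the rollout summary line.
# Ordered from lowest to highest level.
LEVEL_REQUIREMENTS: List[Tuple[str, Set[str]]] = [
    ("F1", {"catalog", "inbox"}),
    ("F2", {"shared_agents", "budgets"}),
    ("F3", {"enterprise_governance"}),
]


def maturity_level(capabilities: Set[str]) -> str:
    """Return the highest level whose requirements, and all lower ones, are met."""
    level = "F0"  # ad-hoc: no fleet capabilities in place yet
    for name, required in LEVEL_REQUIREMENTS:
        if not required <= capabilities:
            break  # assumed rule: a level cannot be reached by skipping one
        level = name
    return level


if __name__ == "__main__":
    print(maturity_level({"catalog", "inbox", "budgets"}))  # F1
```

Under this reading, a team with budgets but no catalog still sits at F0, which matches the advice to start with the registry.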
| Primitive | Job in the Fleet |
|---|---|
| Registry | What agents exist, who owns them, version, lifecycle |
| Identity & credentials | Claw (service) vs assistant (act-as-user) |
| Permissions & sharing | clone / run / edit; workspace vs individual |
| Inbox / escalation | Fleet-wide HITL; approve/reject across agents |
| Observability & audit | Traces, decision evidence, cross-agent search |
| Economics | Budgets, quotas, cost attribution per agent/team |
| Sovereign control | Kill switch, rollback, autonomy tiers |
Full detail: docs/primitives.md · Cross-platform matrix: docs/primitives-matrix.md
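To make the registry, identity, and permissions rows concrete, here is a minimal sketch of an agent manifest with a permission check. Everything in it is an assumption for illustration: the `AgentManifest` fields, the lifecycle values, and the rule that an owner may take any action. The real manifest format is described in docs/primitives.md.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Set


class Identity(Enum):
    CLAW = "claw"            # service identity
    ASSISTANT = "assistant"  # acts as a user


class Action(Enum):
    CLONE = "clone"
    RUN = "run"
    EDIT = "edit"


@dataclass
class AgentManifest:
    """Hypothetical registry entry: what exists, who owns it, version, lifecycle."""
    name: str
    owner: str
    version: str
    lifecycle: str  # assumed values, e.g. "draft", "active", "retired"
    identity: Identity
    # Maps a principal (user or team name) to the actions it may take.
    grants: Dict[str, Set[Action]] = field(default_factory=dict)


def is_allowed(manifest: AgentManifest, principal: str, action: Action) -> bool:
    """Check a principal's permission; the owner is assumed to hold all actions."""
    if principal == manifest.owner:
        return True
    return action in manifest.grants.get(principal, set())


if __name__ == "__main__":
    bot = AgentManifest(
        name="triage-bot", owner="alice", version="1.2.0", lifecycle="active",
        identity=Identity.CLAW, grants={"support-team": {Action.RUN, Action.CLONE}},
    )
    print(is_allowed(bot, "support-team", Action.EDIT))  # False
```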
```mermaid
flowchart TB
    subgraph Registry
        R[Agent Registry<br/>manifests + owners]
    end
    subgraph Identity
        I[Credentials model<br/>Claw vs Assistant]
        P[Permissions<br/>clone · run · edit]
    end
    subgraph Operations
        L1[Loop A]
        L2[Loop B]
        L3[Loop C]
    end
    subgraph Control
        IN[Shared Inbox<br/>HITL]
        AU[Audit / Traces]
        EC[Budgets & Quotas]
        KS[Kill Switch]
    end
    R --> L1 & L2 & L3
    I --> L1 & L2 & L3
    P --> L1 & L2 & L3
    L1 & L2 & L3 --> IN
    L1 & L2 & L3 --> AU
    EC --> L1 & L2 & L3
    KS --> L1 & L2 & L3
```
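In the diagram, budgets and the kill switch feed every loop. One way to enforce that is a gate each loop consults before it spends tokens. The sketch below is a minimal illustration under stated assumptions: the `FleetControl` class, its method names, and the per-agent token cap model are hypothetical and do not describe how `fleet-budget` works internally.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class FleetControl:
    """Hypothetical control gate: per-agent token caps plus a fleet-wide kill switch."""
    token_caps: Dict[str, int]
    tokens_used: Dict[str, int] = field(default_factory=dict)
    killed: bool = False

    def fleet_cap(self) -> int:
        """Roll up per-agent caps into one fleet-wide total."""
        return sum(self.token_caps.values())

    def may_run(self, agent_id: str, estimated_tokens: int) -> Tuple[bool, str]:
        """Decide whether a loop step may proceed, with a reason for the audit trail."""
        if self.killed:
            return False, "kill switch engaged"
        cap = self.token_caps.get(agent_id)
        if cap is None:
            return False, "agent has no budget entry"
        if self.tokens_used.get(agent_id, 0) + estimated_tokens > cap:
            return False, "token cap exceeded"
        return True, "ok"

    def record(self, agent_id: str, tokens: int) -> None:
        """Attribute spent tokens to the agent that used them."""
        self.tokens_used[agent_id] = self.tokens_used.get(agent_id, 0) + tokens


if __name__ == "__main__":
    control = FleetControl(token_caps={"loop-a": 1000, "loop-b": 500})
    print(control.fleet_cap())              # 1500
    print(control.may_run("loop-a", 1200))  # (False, 'token cap exceeded')
```

Checking the kill switch first means a stopped fleet refuses work regardless of remaining budget, and an agent missing from the cap table is denied by default.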
| Pattern | Scale | Starter | Week 1 | Cost risk |
|---|---|---|---|---|
| Team Agent Registry | 3–20 agents | minimal-fleet | F1 catalog only | Low |
| Shared Inbox HITL | 2+ active agents | minimal-fleet | F1 approve-only | Low |
| Hierarchical Delegation | manager + workers | minimal-fleet | F1 report chain | Medium |
| Agent Clone & Fork | 1 → many teams | minimal-fleet | F1 clone policy | Low |
| Fleet Budget Guard | any active fleet | minimal-fleet | F1 caps only | Low |
| Cross-Agent Audit | compliance / incidents | minimal-fleet | F1 read-only audit | Low |
Machine-readable index: patterns/registry.yaml
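As an illustration of the Shared Inbox HITL pattern in its approve-only form, the sketch below shows agents submitting requests to one queue that a human reviewer approves or rejects. The `SharedInbox` class and its methods are hypothetical names chosen for this example; they are not taken from the minimal-fleet starter.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ApprovalRequest:
    request_id: int
    agent_id: str
    summary: str
    status: str = "pending"  # "pending", "approved", or "rejected"
    decided_by: Optional[str] = None


class SharedInbox:
    """Hypothetical fleet-wide inbox: one queue for requests from every agent."""

    def __init__(self) -> None:
        self._requests: Dict[int, ApprovalRequest] = {}
        self._next_id = 1

    def submit(self, agent_id: str, summary: str) -> ApprovalRequest:
        """An agent asks a human to approve an action before taking it."""
        request = ApprovalRequest(self._next_id, agent_id, summary)
        self._requests[request.request_id] = request
        self._next_id += 1
        return request

    def pending(self) -> List[ApprovalRequest]:
        """All undecided requests across agents, oldest first."""
        return [r for r in self._requests.values() if r.status == "pending"]

    def decide(self, request_id: int, reviewer: str, approve: bool) -> ApprovalRequest:
        """Record a human decision; each request can be decided only once."""
        request = self._requests[request_id]
        if request.status != "pending":
            raise ValueError(f"request {request_id} already {request.status}")
        request.status = "approved" if approve else "rejected"
        request.decided_by = reviewer
        return request


if __name__ == "__main__":
    inbox = SharedInbox()
    first = inbox.submit("loop-a", "Send refund email to customer")
    inbox.submit("loop-b", "Merge dependency update")
    inbox.decide(first.request_id, reviewer="dana", approve=True)
    print([r.agent_id for r in inbox.pending()])  # ['loop-b']
```

Recording `decided_by` alongside the status gives each approval an accountable human reviewer.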
- Failure Modes — incident-style catalog
- Multi-Fleet Coordination — when teams run separate fleets
- Operating Fleets — cost, logging, when to kill
- Safety — autonomy tiers, denylist, kill switches
- Security — reporting and fleet-scale automation risks
- Stack — context → harness → loop → fleet trail
- loop-engineering — the layer below: autonomous loops that prompt your agents
- awesome-harness-engineering — harness primitives and curated resources
Share production patterns, platform mappings, and failure stories. See CONTRIBUTING.md.
MIT
Practical, platform-aware reference for fleet engineering — patterns you can clone, checklists you can ship against, and stories that include what broke.
