An autonomous multi-agent development team powered by Claude Code - with quality gates to ensure nothing ships broken.
The Agency is a framework for running multiple AI agents as a coordinated software development team. Each agent has a specialized role (Product Owner, Tech Lead, Developers, QA, Reviewer, DevOps) and they communicate through shared markdown files - enabling full observability and human intervention at any point.
Key Features:
- Quality-first - Mandatory QA gate verifies work before shipping
- Autonomous operation - Agents poll for work and execute independently
- Human-readable state - All coordination happens via markdown files you can read/edit
- Token efficient - Stateless agents spawn fresh for each task, no accumulated context
- Git-friendly - Templates tracked in git, runtime state gitignored
Prerequisites:
- Claude Code CLI installed and authenticated
- Bash shell (Linux/macOS/WSL)
- Git (optional, for version control)
Quick Start:

```bash
# Clone the repository
git clone https://github.com/forever8896/agency.git
cd agency

# Initialize the data directory (creates agency/data/ with templates)
./agency.sh status

# Add your first request to the inbox
cat >> agency/data/inbox.md << 'EOF'
## NEW: Build a simple CLI todo app
**Priority:** high
**Description:** Command-line todo app with add, list, complete, delete
**Context:** Use Python, keep it simple
EOF

# Start the squad
./agency.sh start

# Watch live activity (no token usage)
./agency.sh watch
```

Project Structure:

```text
agency/
├── inbox.md            # Template: request format documentation
├── backlog.md          # Template: workflow documentation
├── board.md            # Template: kanban structure
├── standup.md          # Template: standup format
├── metrics.md          # Template: DORA metrics explanation
│
├── agency/data/        # Runtime state (gitignored)
│   ├── inbox.md        # Active requests
│   ├── backlog.md      # Work items in progress
│   ├── board.md        # Current kanban state
│   ├── standup.md      # Agent status updates
│   ├── metrics.md      # Tracked metrics
│   ├── handoffs/       # Inter-agent communication
│   ├── projects/       # Project specifications
│   └── knowledge/      # Shared knowledge base
│
├── agents/             # Agent definitions (AGENT.md prompts)
│   ├── product-owner/
│   ├── tech-lead/
│   ├── dev-alpha/
│   ├── dev-beta/
│   ├── dev-gamma/
│   ├── qa/
│   ├── reviewer/
│   └── devops/
│
├── agency.sh           # Main orchestrator
└── run-agent.sh        # Individual agent runner
```
The Agency separates templates (tracked in git) from runtime data (gitignored):
| Location | Purpose | Git Status |
|---|---|---|
| `inbox.md` | Template showing request format | Tracked |
| `agency/data/inbox.md` | Your actual requests | Ignored |
| `backlog.md` | Template showing workflow | Tracked |
| `agency/data/backlog.md` | Your actual work items | Ignored |
On first run, templates are automatically copied to agency/data/ if no data exists. This means:
- New users get clean templates with documentation
- Your work state persists across sessions
- Git stays clean - no accidental commits of work-in-progress
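The copy-on-first-run behavior can be pictured as a copy-if-missing step. This is a minimal sketch, not the actual `agency.sh` logic: `init_data_dir` is a hypothetical name, the template list is taken from the directory tree above, and checking file by file is one reasonable reading of "if no data exists".

```bash
#!/usr/bin/env bash
# Hypothetical sketch: copy tracked templates into the runtime data
# directory, but never overwrite runtime files that already exist.
init_data_dir() {
  local template_dir="$1" data_dir="$2" f
  # Assumed template set, taken from this README's directory tree
  local templates=(inbox.md backlog.md board.md standup.md metrics.md)

  mkdir -p "$data_dir"
  for f in "${templates[@]}"; do
    # Copy only when the template exists and no runtime copy is present
    if [[ -f "$template_dir/$f" && ! -e "$data_dir/$f" ]]; then
      cp "$template_dir/$f" "$data_dir/$f"
    fi
  done
  # Runtime subdirectories for handoffs, projects and shared knowledge
  mkdir -p "$data_dir/handoffs" "$data_dir/projects" "$data_dir/knowledge"
}
```

Something like `init_data_dir "$AGENCY_DIR" "$AGENCY_DIR/agency/data"` leaves existing runtime files untouched, which is what keeps work state intact across sessions.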
Workflow:

```text
1. You add request       → agency/data/inbox.md (## NEW:)
2. PO triages            → agency/data/backlog.md (## READY:)
3. Dev claims work       → ## IN_PROGRESS: @dev-alpha
4. Dev completes         → ## DONE: @dev-alpha
5. QA verifies           → ## QA_PASSED: @qa (or ## QA_FAILED:)
6. Reviewer (if required) → ## REVIEWED: @reviewer
7. DevOps deploys        → ## SHIPPED:
```
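Step 3 (claiming work) amounts to rewriting one header line in the backlog. The following is a minimal sketch, assuming each item's title follows the state marker on the same header line (`## READY: <title>`); `claim_first_ready` is a hypothetical helper, not a function from `agency.sh`.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: turn the first "## READY:" header in a backlog file
# into "## IN_PROGRESS: @<agent>", keeping the rest of the line (the title).
claim_first_ready() {
  local backlog="$1" agent="$2" tmp status
  # Return 1 when there is nothing to claim
  grep -q '^## READY:' "$backlog" || return 1
  tmp="$(mktemp)"
  awk -v agent="$agent" '
    !claimed && /^## READY:/ {
      sub(/^## READY:/, "## IN_PROGRESS: @" agent)
      claimed = 1
    }
    { print }
  ' "$backlog" > "$tmp" && cat "$tmp" > "$backlog"
  status=$?
  rm -f "$tmp"
  return "$status"
}
```

For example, `claim_first_ready agency/data/backlog.md dev-alpha` would turn `## READY: Build a simple CLI todo app` into `## IN_PROGRESS: @dev-alpha Build a simple CLI todo app`. Because several developers may poll the same file, a real implementation would also need some form of locking, which this sketch leaves out.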
Work items flow through states via markdown headers:
- `## NEW:` - Fresh request in inbox
- `## TRIAGED:` - PO has reviewed and added context
- `## READY:` - In backlog, available for claiming
- `## IN_PROGRESS: @agent` - Being worked on
- `## DONE: @agent` - Completed, awaiting QA
- `## QA_TESTING: @qa` - Being verified by QA
- `## QA_PASSED: @qa` - Verified working
- `## QA_FAILED: @qa` - Failed verification (returns to dev)
- `## REVIEWING: @reviewer` - Under code review
- `## REVIEWED: @reviewer` - Code review approved
- `## SHIPPED:` - Live in production
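Because every state lives in a `## STATE:` header, a quick board summary needs nothing more than a pattern match. A minimal sketch with `awk`, assuming headers always start at the beginning of a line; `count_states` is a hypothetical name:

```bash
#!/usr/bin/env bash
# Hypothetical sketch: count work items per state by matching "## STATE:"
# headers, then print one "STATE COUNT" line per state, sorted by name.
count_states() {
  local file="$1"
  awk '
    match($0, /^## [A-Z_]+:/) {
      # Drop the leading "## " and the trailing ":" to get the state name
      state = substr($0, 4, RLENGTH - 4)
      counts[state]++
    }
    END { for (s in counts) print s, counts[s] }
  ' "$file" | sort
}
```

Running `count_states agency/data/backlog.md` gives a token-free snapshot of how much work sits in each state.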
Configure via environment variables:
```bash
# Agency files location (default: script directory)
export AGENCY_DIR=~/path/to/agency

# Runtime data location (default: $AGENCY_DIR/agency/data)
export DATA_DIR=~/path/to/data

# Where agents create code projects (default: ~/projects)
export PROJECTS_DIR=~/code

# Polling interval in seconds (default: 30)
export POLL_INTERVAL=60

# Start with custom config
./agency.sh start
```

Point `AGENCY_DIR` to your Obsidian vault for seamless note-taking integration:

```bash
AGENCY_DIR=~/obsidian/Agency ./agency.sh start
```

Commands:

```bash
./agency.sh start      # Start all agents in background
./agency.sh stop       # Stop all agents
./agency.sh status     # Show agent status and DORA metrics
./agency.sh watch      # Live activity log (no token usage)
./agency.sh <agent>    # Run single agent in foreground (for debugging)
```

Available agents: `product-owner`, `tech-lead`, `dev-alpha`, `dev-beta`, `dev-gamma`, `qa`, `reviewer`, `devops`
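The defaults listed for the environment variables above can be resolved with bash's `${VAR:-default}` expansion. This is a minimal sketch, assuming the variables are read once at startup; `resolve_config` is a hypothetical name, and the real `agency.sh` may do this differently.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: apply the documented defaults. Values already set
# in the environment take precedence over the defaults on the right.
resolve_config() {
  local script_dir="$1"
  AGENCY_DIR="${AGENCY_DIR:-$script_dir}"
  DATA_DIR="${DATA_DIR:-$AGENCY_DIR/agency/data}"
  PROJECTS_DIR="${PROJECTS_DIR:-$HOME/projects}"
  POLL_INTERVAL="${POLL_INTERVAL:-30}"

  # Reject a non-numeric polling interval before any agent starts
  if ! [[ "$POLL_INTERVAL" =~ ^[0-9]+$ ]]; then
    echo "POLL_INTERVAL must be a whole number of seconds" >&2
    return 1
  fi

  # Export so child processes (such as individual agent runners) see them
  export AGENCY_DIR DATA_DIR PROJECTS_DIR POLL_INTERVAL
}
```

A script would call it once near the top, for example `resolve_config "$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"`, so that exported values such as `POLL_INTERVAL=60` override the defaults.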
```text
THE AGENCY v2
Squad Model - Quality First

   ┌─────────────┐
   │   Product   │◄── Triages inbox, defines acceptance
   │    Owner    │    criteria, prioritizes backlog
   └─────────────┘
          │
          ▼
   ┌─────────────┐
   │  Tech Lead  │◄── Architecture, unblocks devs,
   │             │    CAN ALSO CODE (playing coach)
   └─────────────┘
          │
   ┌──────┼──────┐
   ▼      ▼      ▼
┌─────┐┌─────┐┌─────┐
│Dev α││Dev β││Dev γ│◄── Parallel builders, claim work directly
└─────┘└─────┘└─────┘
   │      │      │
   └──────┼──────┘
          │
          ▼
   ┌─────────────┐
   │     QA      │◄── MANDATORY: Verifies work actually works
   └─────────────┘    before shipping (catches broken code)
          │
          ├─────────────────────┐
          ▼                     ▼  (if Review Required)
   ┌─────────────┐       ┌─────────────┐
   │   DevOps    │       │  Reviewer   │◄── Code quality, security,
   └─────────────┘       └─────────────┘    patterns (optional)
          │                     │
          └──────────┬──────────┘
                     │
                     ▼
                 [SHIPPED] ◄── Deployed to production
```
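The routing in the diagram can be read as a small state machine. This sketch maps an item's current state to the role expected to act next; `next_role`, the `review_required` flag and the role names are assumptions for illustration, and treating `TRIAGED` as still owned by the Product Owner is one reading of the state list.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: map a work item's state to the role expected to act
# next. State names come from this README; role names are illustrative.
next_role() {
  local state="$1" review_required="${2:-no}"
  case "$state" in
    NEW|TRIAGED) echo "product-owner" ;;
    READY) echo "developer" ;;                 # any dev, or the tech lead
    IN_PROGRESS|QA_TESTING|REVIEWING) echo "current-owner" ;;
    DONE) echo "qa" ;;                         # the mandatory QA gate
    QA_FAILED) echo "developer" ;;             # back to the dev who built it
    QA_PASSED)
      if [[ "$review_required" == "yes" ]]; then
        echo "reviewer"
      else
        echo "devops"
      fi
      ;;
    REVIEWED) echo "devops" ;;
    SHIPPED) echo "none" ;;
    *) echo "unknown state: $state" >&2; return 1 ;;
  esac
}
```

Note that every path to `SHIPPED` passes through `DONE` and therefore through QA; only the review step is conditional.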
Three files to monitor your squad:
| File | What to Watch |
|---|---|
| `agency/data/standup.md` | Real-time agent status, blockers |
| `agency/data/backlog.md` | Work states: Ready → In Progress → Done |
| `agency/data/metrics.md` | DORA metrics: deployment frequency, lead time |
Use watch mode for live updates without token usage:
```bash
./agency.sh watch
```

Design rationale:

| Challenge | Approach | Solution |
|---|---|---|
| 1 developer bottleneck | More builders = more throughput | 3 developers + tech lead who can code |
| 4 handoffs per task | Handoffs are "silent killers" | Direct claiming, minimal handoffs |
| Broken code shipping | Quality gates catch issues early | Mandatory QA verification |
| Code quality variance | Review catches patterns/security | Optional code review for complex changes |
| No real standups | Async saves ~4 hrs/week | Real async standup with blockers |
| No metrics | DORA metrics correlate with performance | Built-in DORA tracking |
While some research suggests eliminating QA gates, in practice with AI agents:
- Agents can miss edge cases during self-testing
- "It works on my machine" happens with AI too
- A quick verification prevents shipping broken code
- Failed QA loops back to the dev who built it (ownership preserved)
QA is lightweight - verify it works, not exhaustive testing.
Research foundations:
- DORA State of DevOps Report - 10 years, 32,000+ professionals
- Spotify Squad Model - Cross-functional autonomous teams
- Amazon Two-Pizza Teams - Max 7 people with end-to-end ownership
- Handoffs Research - Each handoff is a failure point
- Async Standup Research - 23 minutes to regain focus after each interruption
The Agency tracks the four key metrics that correlate with high-performing teams:
| Metric | Target | Description |
|---|---|---|
| Deployment Frequency | Daily | How often code ships to production |
| Lead Time | < 1 day | Time from commit to production |
| Change Failure Rate | < 15% | % of deployments causing issues |
| MTTR | < 1 hour | Time to recover from incidents |
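Two of these metrics reduce to simple arithmetic. A hedged sketch in bash, assuming you already have deployment and failure counts and Unix epoch timestamps; `change_failure_rate` and `lead_time_minutes` are hypothetical helpers, and neither reads `metrics.md`.

```bash
#!/usr/bin/env bash
# Hypothetical DORA helpers; inputs are plain numbers, not metrics.md.

# Change failure rate as a whole-number percentage (integer division).
change_failure_rate() {
  local deployments="$1" failures="$2"
  if (( deployments == 0 )); then
    echo 0   # no deployments yet, so nothing has failed
    return 0
  fi
  echo $(( failures * 100 / deployments ))
}

# Lead time in whole minutes between a commit and its deployment,
# both given as Unix epoch seconds.
lead_time_minutes() {
  local committed_at="$1" deployed_at="$2"
  echo $(( (deployed_at - committed_at) / 60 ))
}
```

With 20 deployments and 3 failures, `change_failure_rate 20 3` prints `15`, which sits on the boundary of the `< 15%` target rather than under it. Bash arithmetic is integer-only, so both helpers truncate.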
Adding a new agent:

1. Create a directory under `agents/`:
   ```bash
   mkdir -p agents/my-agent
   ```
2. Create `AGENT.md` with the agent's prompt:
   ```markdown
   # My Agent
   You are a specialized agent that...

   ## Responsibilities
   - Task 1
   - Task 2

   ## Workflow
   1. Check for work in...
   2. Process and update...
   ```
3. Add it to the `AGENTS` array in `agency.sh`:
   ```bash
   AGENTS=("product-owner" "tech-lead" ... "my-agent")
   ```
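A forgotten `AGENT.md` is an easy mistake when adding agents. A small check can catch it before starting the squad; `check_agents` is a hypothetical helper, assuming the `agents/<name>/AGENT.md` layout described above.

```bash
#!/usr/bin/env bash
# Hypothetical sanity check: every agent name passed in should have a
# matching <agents_dir>/<name>/AGENT.md prompt file.
check_agents() {
  local agents_dir="$1"; shift
  local name missing=0
  for name in "$@"; do
    if [[ ! -f "$agents_dir/$name/AGENT.md" ]]; then
      echo "missing prompt: $agents_dir/$name/AGENT.md" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_agents agents "${AGENTS[@]}"` returns non-zero and names each missing prompt file.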
Agents are designed to be stateless and token-efficient:
- One task, one session - Agents spawn fresh for each task
- Minimal context - Only relevant files are read
- Exit when done - No idle polling inside Claude sessions
- Watch mode - Monitor activity without using tokens
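The point about "no idle polling inside Claude sessions" suggests that waiting happens in a cheap shell loop and a session starts only when there is work. A minimal sketch under that assumption: `poll_for_work` is hypothetical, the `## READY:` check stands in for whatever condition each agent really uses, and the bounded `max_polls` exists only so the sketch terminates.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of "poll outside, work inside": check the backlog
# from the shell and run the given command only when work is available.
# Arguments after the first three form the command to run (an assumption).
poll_for_work() {
  local backlog="$1" interval="$2" max_polls="$3"; shift 3
  local i
  for (( i = 0; i < max_polls; i++ )); do
    if grep -q '^## READY:' "$backlog"; then
      # One task, one session: the command is expected to exit when done
      "$@"
    fi
    sleep "$interval"
  done
}
```

Something like `poll_for_work agency/data/backlog.md "$POLL_INTERVAL" 120 start_dev_session` would check the backlog once per interval and call `start_dev_session` (a hypothetical function that launches a fresh agent session) only while a `READY` item exists, so no tokens are spent while idle.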
Debugging:

Run a single agent in the foreground to see its output:

```bash
./agency.sh dev-alpha
```

Check agent logs:

```bash
# Agents run via nohup; check standup.md for status
cat agency/data/standup.md
```

> "Speed and stability are not tradeoffs. Elite teams excel at both." - DORA Research

> "Handoffs are a silent killer in software development." - Scrum.org

> "No team should be set up that cannot be fed by two pizzas." - Jeff Bezos
The Agency is an experiment in applying organizational research to AI agents. The goal is not just to build software, but to observe how different team structures affect autonomous agent performance.
Contributions welcome! Areas of interest:
- New agent roles
- Alternative team structures
- Metrics and observability
- Integration with other AI tools
License: MIT - do whatever you want with this.

Credits:
- Research: DORA, Spotify Engineering, Amazon/Google team structure studies
- Original concept: Denislav Gavrilov