Contract-based accountability runtime for AI agents.
AgentGuard eliminates the "false progress" problem — when an AI agent reports work is done, but nothing actually happened. Instead of trusting agent output, AgentGuard independently verifies it.
You ask an agent to deploy a website. It responds: "Done! The website is deployed at example.com."
But is it? Did the agent actually run the deploy command? Is the URL live? Does the page contain the right content?
Today, you have two options: trust the agent blindly, or manually verify everything yourself. Neither scales.
With agents running multi-step pipelines — writing code, deploying services, calling APIs — you need a system that verifies results independently, not one that takes the agent's word for it.
AgentGuard is an MCP server that sits between you and your agent. You define a contract: what the agent must do, and what evidence proves it was done. The agent works autonomously, submits evidence, and AgentGuard verifies it independently.
```text
You write a contract     Agent gets the task    Agent submits evidence       AgentGuard verifies
┌──────────────────┐     ┌────────────────┐     ┌──────────────────────┐     ┌─────────────────┐
│ "Deploy site,    │────>│ Reads task +   │────>│ Sends command output │────>│ Checks exit code│
│  prove it's live │     │ requirements   │     │ and deploy URL       │     │ GETs the URL    │
│  with http 200"  │     └────────────────┘     └──────────────────────┘     │ Verifies body   │
└──────────────────┘                                                         └────────┬────────┘
                                                                                      │
                                                                             VERIFIED or FAILED
```
No evidence = not done. It's that simple.
Single-agent mode: run one agent against one contract (the classic mode). Define a task, the required evidence, and bounds.
Pipeline mode: orchestrate multi-agent pipelines as a DAG. Each stage has its own agent, contract, and evidence. Stages run in dependency order, and artifacts are passed between them:
```yaml
stages:
  - id: build
    agent_id: builder
    contract: { ... }
    output_artifacts: ["binary"]
  - id: test
    agent_id: tester
    depends_on: [build]
    contract: { ... }
  - id: deploy
    agent_id: deployer
    depends_on: [test]
    contract: { ... }
```

Free Mode: give an orchestrator agent a high-level goal with meta-evidence. The orchestrator decomposes the task, creates sub-contracts, and delegates them to worker agents, all within enforced limits (max sub-contracts, max depth, max cost).
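Running stages "in dependency order" amounts to a topological sort over each stage's `depends_on` list. The Go sketch below uses Kahn's algorithm and rejects unknown dependencies and cycles; the `Stage` type and `StageOrder` function are hypothetical illustrations, not AgentGuard's internal API.

```go
package main

import "fmt"

// Stage mirrors the id/depends_on fields of a pipeline stage.
// The type and field names are hypothetical, not AgentGuard's own.
type Stage struct {
	ID        string
	DependsOn []string
}

// StageOrder returns stage IDs in dependency order using Kahn's algorithm.
// For a given input order the result is deterministic. It fails on
// duplicate IDs, unknown dependencies, and cycles.
func StageOrder(stages []Stage) ([]string, error) {
	indegree := make(map[string]int, len(stages))
	dependents := make(map[string][]string)
	for _, s := range stages {
		if _, dup := indegree[s.ID]; dup {
			return nil, fmt.Errorf("duplicate stage %q", s.ID)
		}
		indegree[s.ID] = 0
	}
	for _, s := range stages {
		for _, dep := range s.DependsOn {
			if _, ok := indegree[dep]; !ok {
				return nil, fmt.Errorf("stage %q depends on unknown stage %q", s.ID, dep)
			}
			indegree[s.ID]++
			dependents[dep] = append(dependents[dep], s.ID)
		}
	}

	// Seed the queue with stages that have no dependencies.
	var queue []string
	for _, s := range stages {
		if indegree[s.ID] == 0 {
			queue = append(queue, s.ID)
		}
	}

	order := make([]string, 0, len(stages))
	for len(queue) > 0 {
		id := queue[0]
		queue = queue[1:]
		order = append(order, id)
		// Finishing a stage may unblock the stages that depend on it.
		for _, next := range dependents[id] {
			indegree[next]--
			if indegree[next] == 0 {
				queue = append(queue, next)
			}
		}
	}
	// Any stage never reaching indegree 0 sits on a cycle.
	if len(order) != len(stages) {
		return nil, fmt.Errorf("pipeline has a dependency cycle")
	}
	return order, nil
}

func main() {
	order, err := StageOrder([]Stage{
		{ID: "build"},
		{ID: "test", DependsOn: []string{"build"}},
		{ID: "deploy", DependsOn: []string{"test"}},
	})
	fmt.Println(order, err) // [build test deploy] <nil>
}
```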
MCP over SSE: multiple agents connect to a single AgentGuard instance via HTTP Server-Sent Events. Each agent gets its own SSE stream and message endpoint. Pass the `?agent_id=` query parameter to tell agents apart when every instance reports the same `clientInfo.name`.
Install:

```bash
go install github.com/agentguard/agentguard/cmd/agentguard@latest
```

Or build from source:

```bash
git clone https://github.com/agentguard/agentguard.git
cd agentguard
go build -o agentguard ./cmd/agentguard
```

Create `contract.yaml`:
```yaml
id: ct_deploy
version: 1
task:
  summary: "Create a test output file"
  context: |
    Create a file at /tmp/agentguard-test/output.txt
    with the content "hello agentguard".
evidence:
  required:
    - id: file_created
      type: file_exists
      description: "Output file exists and is not empty"
      verify:
        path: "/tmp/agentguard-test/output.txt"
        non_empty: true
    - id: content_correct
      type: file_content
      description: "File contains the expected text"
      depends_on: file_created
      verify:
        path: "/tmp/agentguard-test/output.txt"
        contains: "hello agentguard"
bounds:
  timeout: "300s"
  max_retries: 2
on_failure:
  action: escalate
```

Validate it:

```bash
agentguard contract validate contract.yaml
# Contract is valid.
```

Serve a single agent over MCP stdio:

```bash
agentguard serve --contract contract.yaml
```

Or run a multi-agent pipeline over SSE:

```bash
agentguard pipeline run --file pipeline.yaml --port 9200
```

Agents connect via SSE:

```text
GET http://127.0.0.1:9200/sse?agent_id=builder
GET http://127.0.0.1:9200/sse?agent_id=tester
```
Every state transition and verification is recorded:

```bash
agentguard audit log --contract-id ct_deploy
# [14:28:01] contract.created        contract=ct_deploy
# [14:28:01] contract.state_changed  contract=ct_deploy {from: created, to: assigned}
# [14:28:05] evidence.verified       contract=ct_deploy {evidence_id: file_created}
# [14:28:06] evidence.verified       contract=ct_deploy {evidence_id: content_correct}
# [14:28:06] contract.state_changed  contract=ct_deploy {from: assigned, to: done}
```

MCP tools available to agents:

| Tool | Purpose |
|---|---|
| `task/get` | Get the assigned task and evidence requirements |
| `evidence/submit` | Submit evidence for independent verification |
| `task/complete` | Declare the task done (fails if evidence is missing) |
| `task/blocked` | Report a blocker; triggers escalation |
| `status/get` | Check current verification progress |
| `artifact/get` | Retrieve input artifacts |
| `artifact/put` | Store output artifacts |
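The rule that `task/complete` fails while evidence is missing can be sketched as a gate over the contract's required evidence IDs. The `EvidenceStatus` type and `CanComplete` function below are hypothetical; AgentGuard's own bookkeeping may differ.

```go
package main

import "fmt"

// EvidenceStatus is a hypothetical per-evidence verification state.
type EvidenceStatus int

const (
	Pending EvidenceStatus = iota // zero value: nothing verified yet
	Verified
	Failed
)

// CanComplete reports whether task/complete may succeed: every required
// evidence ID must be Verified. It also returns the IDs that are not, in
// the order they were declared.
func CanComplete(required []string, status map[string]EvidenceStatus) (bool, []string) {
	var missing []string
	for _, id := range required {
		// A missing map entry yields Pending, so unsubmitted evidence blocks completion.
		if status[id] != Verified {
			missing = append(missing, id)
		}
	}
	return len(missing) == 0, missing
}

func main() {
	required := []string{"file_created", "content_correct"}
	status := map[string]EvidenceStatus{"file_created": Verified}
	ok, missing := CanComplete(required, status)
	fmt.Println(ok, missing) // false [content_correct]
}
```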
Additional tools for orchestrator agents (Free Mode):

| Tool | Purpose |
|---|---|
| `task/create` | Create a child task contract for a worker agent |
| `task/list-children` | List all child task contracts |
| `task/get-child` | Get details of a child task contract |
| `task/cancel-child` | Cancel a child task contract |
AgentGuard independently verifies each evidence type:

| Type | What the agent provides | What AgentGuard checks |
|---|---|---|
| `command_output` | command, stdout, exit_code | Validates the exit code and patterns in stdout |
| `http_check` | URL | Makes a GET request; checks status code and body |
| `file_exists` | File path | Checks existence, size, SHA256 hash |
| `file_content` | File path | Checks contains / regex / valid JSON / valid YAML |
| `json_value` | JSON data | JSONPath extraction + equality/range checks |
| `test_result` | Test command | Runs tests, verifies exit code (Phase 2) |
| `api_state` | Resource ID | Calls provider API to verify state (Phase 2) |
Evidence can depend on other evidence. For example, first verify a deploy command succeeded, then use the URL from its output to verify the site is live:
```yaml
evidence:
  required:
    - id: deploy_done
      type: command_output
      verify:
        exit_code: 0
        stdout_pattern: "https://.*\\.vercel\\.app"
    - id: site_live
      type: http_check
      depends_on: deploy_done
      verify:
        extract_pattern: "https://[\\S]+"
        status: 200
        body_contains: "Welcome"
```

Contract lifecycle:

```text
CREATED ──> ASSIGNED ──> VERIFYING ──> DONE
   │           │             │
   │           │             ├──> FAILED
   │           │             │
   │           │             └──> RETRYING ──> ASSIGNED (retry)
   │           │                      │
   │           │                      └──> FAILED (max retries)
   │           │
   │           └──> EXPIRED (timeout)
   │
   └──> CANCELLED
```
Every transition is recorded in the audit log with a timestamp.
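The lifecycle diagram can be read as a transition table. This sketch encodes only the arrows drawn in the diagram and rejects everything else; the `State` values and `Transition` function are hypothetical, and the returned string merely imitates the audit log line format.

```go
package main

import "fmt"

// State names follow the lifecycle diagram; the type itself is hypothetical.
type State string

const (
	Created   State = "created"
	Assigned  State = "assigned"
	Verifying State = "verifying"
	Retrying  State = "retrying"
	Done      State = "done"
	Failed    State = "failed"
	Expired   State = "expired"
	Cancelled State = "cancelled"
)

// transitions lists the allowed next states. Done, Failed, Expired, and
// Cancelled have no entry, so they are terminal.
var transitions = map[State][]State{
	Created:   {Assigned, Cancelled},
	Assigned:  {Verifying, Expired},
	Verifying: {Done, Retrying, Failed},
	Retrying:  {Assigned, Failed},
}

// Transition validates a state change and returns an audit-style line.
func Transition(contractID string, from, to State) (string, error) {
	for _, allowed := range transitions[from] {
		if allowed == to {
			return fmt.Sprintf("contract.state_changed contract=%s {from: %s, to: %s}",
				contractID, from, to), nil
		}
	}
	return "", fmt.Errorf("illegal transition %s -> %s", from, to)
}

func main() {
	line, err := Transition("ct_deploy", Created, Assigned)
	fmt.Println(line, err)
	// contract.state_changed contract=ct_deploy {from: created, to: assigned} <nil>

	_, err = Transition("ct_deploy", Done, Assigned)
	fmt.Println(err) // illegal transition done -> assigned
}
```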
```text
┌──────────────────────────────────────────────────────────┐
│                        AgentGuard                        │
│                                                          │
│  ┌─────────────┐    ┌──────────────┐    ┌────────────┐   │
│  │  Transport  │    │   Contract   │    │   Store    │   │
│  │  MCP stdio  │    │  Lifecycle   │    │  (bbolt)   │   │
│  │  MCP SSE    │    │ DAG / Parse  │    │            │   │
│  └──────┬──────┘    └──────┬───────┘    └─────┬──────┘   │
│         │                  │                  │          │
│  ┌──────┴──────────────────┴──────────────────┴───────┐  │
│  │                 Evidence Verifiers                 │  │
│  │         command | http | file | json | ...         │  │
│  └────────────────────────────────────────────────────┘  │
│                                                          │
│  ┌────────────────────────────────────────────────────┐  │
│  │                   Control Plane                    │  │
│  │        limits | circuit breaker | scheduler        │  │
│  └────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────┘
          ▲                                     │
    JSON-RPC 2.0                            Audit Log
     stdio / SSE                          (append-only)
          │
     ┌────┴────┐
     │ Agents  │
     └─────────┘
```
- Single binary, zero external dependencies at runtime
- bbolt embedded database — no PostgreSQL, no Redis, no Docker
- Append-only audit log for compliance
- Requires Go 1.23+ to install or build
Create `agentguard.yaml` (all values are optional; sensible defaults are built in):

```yaml
store:
  path: "./data/agentguard.db"
  artifacts_dir: "./data/artifacts"
defaults:
  timeout: 300s
  max_retries: 2
  max_evidence_submissions: 50
escalation:
  webhook_url: "https://hooks.slack.com/..."
verification:
  http_check:
    timeout: 10s
    retries: 3
logging:
  level: info     # debug | info | warn | error
  format: json    # json | text
```

CLI reference:

```text
agentguard serve                      Start MCP server (stdio, single agent)
  --contract <path>                   Contract YAML (required)
  --config <path>                     Config file
  --log-level <level>                 debug|info|warn|error

agentguard pipeline run               Start pipeline server (SSE, multi-agent)
  --file <path>                       Pipeline YAML (required)
  --port <port>                       HTTP port (default: 9200)

agentguard contract validate <path>   Validate contract YAML

agentguard audit log                  View audit events
  --contract-id <id>                  Filter by contract
  --limit <n>                         Max entries (default: 100)

agentguard status                     System health check
agentguard version                    Print version
```
```bash
# Unit and integration tests
go test ./...

# E2E test with Claude Code agents (requires the claude CLI)
bash test/e2e/pipeline/run.sh
```

Manual verification vs. AgentGuard:

| | Manual | AgentGuard |
|---|---|---|
| Speed | You check after the fact | Real-time, as evidence comes in |
| Consistency | Depends on your attention | Same checks every time |
| Audit trail | Hope you remember | Every event recorded with proof |
| Scale | 1 agent, maybe | Multi-agent pipelines with dependency chains |
| Cost | Your time | Milliseconds of CPU |
- v0.1 — Single agent, contract verification, MCP server (stdio), audit log
- v0.2 (current) — Multi-agent pipelines, Free Mode, MCP over SSE, circuit breaker, limits enforcement
- v0.3 — gRPC API, terminal dashboard, plugin system
- v0.4 — Kubernetes operator, distributed mode
License: MIT