sneiko/agent-guard

AgentGuard

Contract-based accountability runtime for AI agents.

AgentGuard eliminates the "false progress" problem — when an AI agent reports work is done, but nothing actually happened. Instead of trusting agent output, AgentGuard independently verifies it.

The Problem

You ask an agent to deploy a website. It responds: "Done! The website is deployed at example.com."

But is it? Did the agent actually run the deploy command? Is the URL live? Does the page contain the right content?

Today, you have two options: trust the agent blindly, or manually verify everything yourself. Neither scales.

With agents running multi-step pipelines — writing code, deploying services, calling APIs — you need a system that verifies results independently, not one that takes the agent's word for it.

How AgentGuard Works

AgentGuard is an MCP server that sits between you and your agent. You define a contract: what the agent must do, and what evidence proves it was done. The agent works autonomously, submits evidence, and AgentGuard verifies it independently.

You write a contract     Agent gets the task     Agent submits evidence     AgentGuard verifies
┌──────────────────┐     ┌────────────────┐     ┌──────────────────────┐    ┌─────────────────┐
│ "Deploy site,    │────>│ Reads task +   │────>│ Sends command output │───>│ Checks exit code│
│  prove it's live │     │ requirements   │     │ and deploy URL       │    │ GETs the URL    │
│  with http 200"  │     └────────────────┘     └──────────────────────┘    │ Verifies body   │
└──────────────────┘                                                        └────────┬────────┘
                                                                                     │
                                                                              VERIFIED or FAILED

No evidence = not done. It's that simple.
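
The rule above can be sketched in a few lines of Go. This is a minimal illustration, not AgentGuard's actual implementation: the `Contract` and `Evidence` types and their methods are hypothetical names chosen for this example.

```go
package main

import "fmt"

// Evidence is one piece of proof a contract requires (hypothetical shape).
type Evidence struct {
	ID       string
	Verified bool
}

// Contract lists the evidence that must be verified before the task counts as done.
type Contract struct {
	ID       string
	Required []Evidence
}

// MarkVerified records a successful independent check for one evidence ID.
// It returns false if the contract does not require that ID.
func (c *Contract) MarkVerified(id string) bool {
	for i := range c.Required {
		if c.Required[i].ID == id {
			c.Required[i].Verified = true
			return true
		}
	}
	return false
}

// Missing returns the IDs of required evidence that has not been verified yet.
func (c *Contract) Missing() []string {
	var missing []string
	for _, e := range c.Required {
		if !e.Verified {
			missing = append(missing, e.ID)
		}
	}
	return missing
}

// Complete succeeds only when no required evidence is missing.
func (c *Contract) Complete() error {
	if m := c.Missing(); len(m) > 0 {
		return fmt.Errorf("contract %s not done: missing evidence %v", c.ID, m)
	}
	return nil
}

func main() {
	c := &Contract{ID: "ct_deploy", Required: []Evidence{{ID: "file_created"}, {ID: "content_correct"}}}
	fmt.Println(c.Complete()) // error: both items are still missing
	c.MarkVerified("file_created")
	c.MarkVerified("content_correct")
	fmt.Println(c.Complete()) // <nil>
}
```

The point of the design is that the agent's claim of completion never changes state by itself; only a verified evidence item does.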

Features

Single Agent Mode

Run one agent with a contract — the classic mode. Define a task, required evidence, and bounds.

Pipeline Mode (v0.2)

Orchestrate multi-agent pipelines as a DAG. Each stage has its own agent, contract, and evidence. Stages run in dependency order with artifact passing between them.

stages:
  - id: build
    agent_id: builder
    contract: { ... }
    output_artifacts: ["binary"]
  - id: test
    agent_id: tester
    depends_on: [build]
    contract: { ... }
  - id: deploy
    agent_id: deployer
    depends_on: [test]
    contract: { ... }
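
Running stages "in dependency order" amounts to a topological sort of the DAG. The sketch below uses Kahn's algorithm; the `Stage` type and `Order` function are hypothetical and only model the `id` and `depends_on` fields shown above, not AgentGuard's real scheduler.

```go
package main

import (
	"fmt"
	"sort"
)

// Stage models only the fields needed for ordering (hypothetical type).
type Stage struct {
	ID        string
	DependsOn []string
}

// Order returns stage IDs so that every stage appears after its dependencies.
// It reports duplicate IDs, unknown dependencies, and cycles as errors.
func Order(stages []Stage) ([]string, error) {
	indegree := make(map[string]int, len(stages))
	dependents := make(map[string][]string)
	for _, s := range stages {
		if _, dup := indegree[s.ID]; dup {
			return nil, fmt.Errorf("duplicate stage id %q", s.ID)
		}
		indegree[s.ID] = 0
	}
	for _, s := range stages {
		for _, dep := range s.DependsOn {
			if _, ok := indegree[dep]; !ok {
				return nil, fmt.Errorf("stage %q depends on unknown stage %q", s.ID, dep)
			}
			indegree[s.ID]++
			dependents[dep] = append(dependents[dep], s.ID)
		}
	}
	// Start with every stage that has no dependencies; sort for a stable result.
	var ready []string
	for id, n := range indegree {
		if n == 0 {
			ready = append(ready, id)
		}
	}
	sort.Strings(ready)
	var order []string
	for len(ready) > 0 {
		id := ready[0]
		ready = ready[1:]
		order = append(order, id)
		for _, next := range dependents[id] {
			indegree[next]--
			if indegree[next] == 0 {
				ready = append(ready, next)
			}
		}
	}
	if len(order) != len(stages) {
		return nil, fmt.Errorf("pipeline has a dependency cycle")
	}
	return order, nil
}

func main() {
	order, err := Order([]Stage{
		{ID: "deploy", DependsOn: []string{"test"}},
		{ID: "build"},
		{ID: "test", DependsOn: []string{"build"}},
	})
	fmt.Println(order, err) // [build test deploy] <nil>
}
```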

Free Mode (v0.2)

Give an orchestrator agent a high-level goal with meta-evidence. The orchestrator decomposes the task, creates sub-contracts, and delegates to worker agents — all within enforced limits (max sub-contracts, max depth, max cost).
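
As a rough illustration of how such limits could be enforced, the sketch below rejects a sub-contract request as soon as any of the three bounds would be exceeded. The `Limits` and `Budget` types, their field names, and the cost unit are assumptions made for this example, not AgentGuard's real API.

```go
package main

import (
	"errors"
	"fmt"
)

// Limits mirrors the three bounds named above. Field names are hypothetical.
type Limits struct {
	MaxSubContracts int
	MaxDepth        int
	MaxCost         float64
}

// ErrLimit marks any rejected request so callers can test for it with errors.Is.
var ErrLimit = errors.New("limit exceeded")

// Budget tracks what an orchestrator has consumed so far.
type Budget struct {
	limits  Limits
	created int
	spent   float64
}

// NewBudget returns an empty budget bound to the given limits.
func NewBudget(l Limits) *Budget { return &Budget{limits: l} }

// Reserve admits one sub-contract at the given depth and estimated cost.
// Usage is recorded only when all three limits hold.
func (b *Budget) Reserve(depth int, cost float64) error {
	switch {
	case b.created >= b.limits.MaxSubContracts:
		return fmt.Errorf("%w: already created %d sub-contracts", ErrLimit, b.created)
	case depth > b.limits.MaxDepth:
		return fmt.Errorf("%w: depth %d exceeds %d", ErrLimit, depth, b.limits.MaxDepth)
	case b.spent+cost > b.limits.MaxCost:
		return fmt.Errorf("%w: cost %.2f would exceed %.2f", ErrLimit, b.spent+cost, b.limits.MaxCost)
	}
	b.created++
	b.spent += cost
	return nil
}

func main() {
	b := NewBudget(Limits{MaxSubContracts: 2, MaxDepth: 1, MaxCost: 5})
	fmt.Println(b.Reserve(1, 2.0)) // accepted
	fmt.Println(b.Reserve(2, 1.0)) // rejected: too deep
	fmt.Println(b.Reserve(1, 4.0)) // rejected: would exceed the cost limit
	fmt.Println(b.Reserve(1, 3.0)) // accepted
	fmt.Println(b.Reserve(1, 0.0)) // rejected: sub-contract count reached
}
```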

MCP over SSE (v0.2)

Multiple agents connect to a single AgentGuard instance via HTTP Server-Sent Events. Each agent gets its own SSE stream and message endpoint. An ?agent_id= query parameter identifies each agent when all instances share the same clientInfo.name.

Quick Start

Install

go install github.com/agentguard/agentguard/cmd/agentguard@latest

Or build from source:

git clone https://github.com/agentguard/agentguard.git
cd agentguard
go build -o agentguard ./cmd/agentguard

1. Write a Contract

Create contract.yaml:

id: ct_deploy
version: 1

task:
  summary: "Create an output text file"
  context: |
    Create a file at /tmp/agentguard-test/output.txt
    with the content "hello agentguard".

evidence:
  required:
    - id: file_created
      type: file_exists
      description: "Output file exists and is not empty"
      verify:
        path: "/tmp/agentguard-test/output.txt"
        non_empty: true

    - id: content_correct
      type: file_content
      description: "File contains the expected text"
      depends_on: file_created
      verify:
        path: "/tmp/agentguard-test/output.txt"
        contains: "hello agentguard"

bounds:
  timeout: "300s"
  max_retries: 2

on_failure:
  action: escalate
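
The two evidence items in this contract boil down to a pair of file checks, with the content check running only after the existence check passes. The Go sketch below shows one way to express that; the function names are hypothetical and it covers only the `path`, `non_empty`, and `contains` fields used above.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// FileExists mirrors the file_exists check with the non_empty option.
func FileExists(path string, nonEmpty bool) error {
	info, err := os.Stat(path)
	if err != nil {
		return fmt.Errorf("file_exists: %w", err)
	}
	if info.IsDir() {
		return fmt.Errorf("file_exists: %s is a directory", path)
	}
	if nonEmpty && info.Size() == 0 {
		return fmt.Errorf("file_exists: %s is empty", path)
	}
	return nil
}

// FileContains mirrors the file_content check with the contains option.
func FileContains(path, want string) error {
	data, err := os.ReadFile(path)
	if err != nil {
		return fmt.Errorf("file_content: %w", err)
	}
	if !strings.Contains(string(data), want) {
		return fmt.Errorf("file_content: %s does not contain %q", path, want)
	}
	return nil
}

// VerifyOutput runs the checks in dependency order: content only after existence.
func VerifyOutput(path, want string) error {
	if err := FileExists(path, true); err != nil {
		return err
	}
	return FileContains(path, want)
}

// writeTemp creates a throwaway file for the demo and returns its path and a cleanup func.
func writeTemp(content string) (string, func(), error) {
	dir, err := os.MkdirTemp("", "agentguard-sketch-")
	if err != nil {
		return "", nil, err
	}
	path := filepath.Join(dir, "output.txt")
	if err := os.WriteFile(path, []byte(content), 0o644); err != nil {
		os.RemoveAll(dir)
		return "", nil, err
	}
	return path, func() { os.RemoveAll(dir) }, nil
}

func main() {
	path, cleanup, err := writeTemp("hello agentguard\n")
	if err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	defer cleanup()
	fmt.Println(VerifyOutput(path, "hello agentguard"))                   // <nil>
	fmt.Println(VerifyOutput(path+".missing", "hello agentguard") != nil) // true
}
```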

2. Validate

agentguard contract validate contract.yaml
# Contract is valid.

3. Run as MCP Server (stdio)

agentguard serve --contract contract.yaml

4. Run a Pipeline (SSE)

agentguard pipeline run --file pipeline.yaml --port 9200

Agents connect via SSE:

GET http://127.0.0.1:9200/sse?agent_id=builder
GET http://127.0.0.1:9200/sse?agent_id=tester
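
For agents written in Go, a client could build the stream URL and parse the event stream roughly as follows. This is a sketch using only the standard library: `StreamURL`, `ReadEvents`, and `ParseEvents` are hypothetical helpers, and the sample `endpoint` event is illustrative rather than a captured AgentGuard response. A real client would pass the response body of an HTTP GET to `ReadEvents` and handle events as they arrive instead of collecting them until EOF.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"net/url"
	"strings"
)

// StreamURL builds the SSE endpoint URL for one agent.
func StreamURL(base, agentID string) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.Path = "/sse"
	q := u.Query()
	q.Set("agent_id", agentID)
	u.RawQuery = q.Encode()
	return u.String(), nil
}

// Event is one parsed server-sent event.
type Event struct {
	Name string
	Data string
}

// ReadEvents parses a text/event-stream body. It handles the "event:" and
// "data:" fields and ignores comment lines; a blank line ends an event.
func ReadEvents(r io.Reader) ([]Event, error) {
	var events []Event
	var cur Event
	var data []string
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		line := sc.Text()
		switch {
		case line == "":
			if len(data) > 0 {
				cur.Data = strings.Join(data, "\n")
				events = append(events, cur)
			}
			cur, data = Event{}, nil
		case strings.HasPrefix(line, ":"):
			// Comment or keep-alive line: ignore.
		case strings.HasPrefix(line, "event:"):
			cur.Name = strings.TrimSpace(strings.TrimPrefix(line, "event:"))
		case strings.HasPrefix(line, "data:"):
			data = append(data, strings.TrimPrefix(strings.TrimPrefix(line, "data:"), " "))
		}
	}
	return events, sc.Err()
}

// ParseEvents is a convenience wrapper for in-memory stream text.
func ParseEvents(body string) ([]Event, error) {
	return ReadEvents(strings.NewReader(body))
}

func main() {
	u, err := StreamURL("http://127.0.0.1:9200", "builder")
	fmt.Println(u, err) // http://127.0.0.1:9200/sse?agent_id=builder <nil>

	// Illustrative stream content only.
	events, err := ParseEvents("event: endpoint\ndata: /message?session=demo\n\n")
	fmt.Println(events, err)
}
```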

5. Check the Audit Log

Every state transition and verification is recorded:

agentguard audit log --contract-id ct_deploy
# [14:28:01] contract.created     contract=ct_deploy
# [14:28:01] contract.state_changed contract=ct_deploy {from: created, to: assigned}
# [14:28:05] evidence.verified    contract=ct_deploy {evidence_id: file_created}
# [14:28:06] evidence.verified    contract=ct_deploy {evidence_id: content_correct}
# [14:28:06] contract.state_changed contract=ct_deploy {from: assigned, to: done}

MCP Tools

Standard Tools (all agents)

Tool             Purpose
task/get         Get the assigned task and evidence requirements
evidence/submit  Submit evidence for independent verification
task/complete    Declare the task done (fails if evidence is missing)
task/blocked     Report a blocker (triggers escalation)
status/get       Check current verification progress
artifact/get     Retrieve input artifacts
artifact/put     Store output artifacts
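
MCP clients invoke tools with a JSON-RPC 2.0 `tools/call` request that carries the tool name and an arguments object. The sketch below builds such a request for `evidence/submit`. The argument names (`evidence_id`, `path`) are assumptions for illustration; the actual schema is whatever AgentGuard advertises through the MCP tool listing.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Request is a JSON-RPC 2.0 request carrying an MCP tools/call invocation.
type Request struct {
	JSONRPC string     `json:"jsonrpc"`
	ID      int        `json:"id"`
	Method  string     `json:"method"`
	Params  CallParams `json:"params"`
}

// CallParams names the tool and passes its arguments.
type CallParams struct {
	Name      string         `json:"name"`
	Arguments map[string]any `json:"arguments"`
}

// NewToolCall encodes a tools/call request for the given tool.
func NewToolCall(id int, tool string, args map[string]any) ([]byte, error) {
	return json.Marshal(Request{
		JSONRPC: "2.0",
		ID:      id,
		Method:  "tools/call",
		Params:  CallParams{Name: tool, Arguments: args},
	})
}

// ParseToolCall decodes a request, for example on the receiving side or in tests.
func ParseToolCall(raw []byte) (Request, error) {
	var req Request
	err := json.Unmarshal(raw, &req)
	return req, err
}

func main() {
	// Argument names here are hypothetical.
	raw, err := NewToolCall(1, "evidence/submit", map[string]any{
		"evidence_id": "file_created",
		"path":        "/tmp/agentguard-test/output.txt",
	})
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Println(string(raw))
}
```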

Orchestrator Tools (Free Mode)

Tool                Purpose
task/create         Create a child task contract for a worker agent
task/list-children  List all child task contracts
task/get-child      Get details of a child task contract
task/cancel-child   Cancel a child task contract

Evidence Types

AgentGuard independently verifies each type:

Type            What the Agent Provides     What AgentGuard Checks
command_output  command, stdout, exit_code  Validates exit code, patterns in stdout
http_check      URL                         Makes GET request, checks status code and body
file_exists     file path                   Checks existence, size, SHA256 hash
file_content    file path                   Checks contains/regex/valid JSON/valid YAML
json_value      JSON data                   JSONPath extraction + equality/range checks
test_result     test command                Runs tests, verifies exit code (Phase 2)
api_state       resource ID                 Calls provider API to verify state (Phase 2)
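
To make one row of the table concrete, here is a minimal sketch of a `command_output` check: compare the exit code, then match stdout against a regular expression. The `CommandEvidence` and `CommandRule` types are hypothetical and stand in for whatever structures AgentGuard uses internally.

```go
package main

import (
	"fmt"
	"regexp"
)

// CommandEvidence is what the agent submits (hypothetical shape).
type CommandEvidence struct {
	Command  string
	Stdout   string
	ExitCode int
}

// CommandRule is what the contract demands (hypothetical shape).
type CommandRule struct {
	ExitCode      int
	StdoutPattern string // optional regular expression
}

// VerifyCommand checks the exit code first, then the stdout pattern if one is set.
func VerifyCommand(ev CommandEvidence, rule CommandRule) error {
	if ev.ExitCode != rule.ExitCode {
		return fmt.Errorf("exit code %d, want %d", ev.ExitCode, rule.ExitCode)
	}
	if rule.StdoutPattern == "" {
		return nil
	}
	re, err := regexp.Compile(rule.StdoutPattern)
	if err != nil {
		return fmt.Errorf("invalid stdout_pattern: %w", err)
	}
	if !re.MatchString(ev.Stdout) {
		return fmt.Errorf("stdout does not match %q", rule.StdoutPattern)
	}
	return nil
}

func main() {
	rule := CommandRule{ExitCode: 0, StdoutPattern: `https://.*\.vercel\.app`}
	ok := CommandEvidence{Command: "deploy", Stdout: "Live at https://demo.vercel.app", ExitCode: 0}
	bad := CommandEvidence{Command: "deploy", Stdout: "error: build failed", ExitCode: 1}
	fmt.Println(VerifyCommand(ok, rule))         // <nil>
	fmt.Println(VerifyCommand(bad, rule) != nil) // true
}
```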

Evidence Chains

Evidence can depend on other evidence. For example, first verify a deploy command succeeded, then use the URL from its output to verify the site is live:

evidence:
  required:
    - id: deploy_done
      type: command_output
      verify:
        exit_code: 0
        stdout_pattern: "https://.*\\.vercel\\.app"

    - id: site_live
      type: http_check
      depends_on: deploy_done
      verify:
        extract_pattern: "https://[\\S]+"
        status: 200
        body_contains: "Welcome"
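
The chain above can be pictured as two steps: pull a URL out of the earlier command's stdout, then fetch it and inspect the response. The following Go sketch does that against a local test server so it runs without network access. `ExtractURL` and `HTTPCheck` are hypothetical helpers, and the demo pattern accepts `http://` as well as `https://` only because the local test server is plain HTTP.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"regexp"
	"strings"
	"time"
)

// ExtractURL returns the first match of pattern in stdout.
func ExtractURL(stdout, pattern string) (string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return "", fmt.Errorf("invalid extract_pattern: %w", err)
	}
	match := re.FindString(stdout)
	if match == "" {
		return "", fmt.Errorf("no match for %q in command output", pattern)
	}
	return match, nil
}

// HTTPCheck issues a GET and verifies the status code and a body substring.
func HTTPCheck(client *http.Client, target string, wantStatus int, bodyContains string) error {
	resp, err := client.Get(target)
	if err != nil {
		return fmt.Errorf("http_check: %w", err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != wantStatus {
		return fmt.Errorf("http_check: status %d, want %d", resp.StatusCode, wantStatus)
	}
	// Cap the read so a huge response cannot exhaust memory.
	body, err := io.ReadAll(io.LimitReader(resp.Body, 1<<20))
	if err != nil {
		return fmt.Errorf("http_check: %w", err)
	}
	if !strings.Contains(string(body), bodyContains) {
		return fmt.Errorf("http_check: body does not contain %q", bodyContains)
	}
	return nil
}

func main() {
	// Stand-in for the deployed site.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "<h1>Welcome</h1>")
	}))
	defer srv.Close()

	stdout := "Deployed to " + srv.URL + " in 12s"
	target, err := ExtractURL(stdout, `https?://[\S]+`)
	if err != nil {
		fmt.Println(err)
		return
	}
	client := &http.Client{Timeout: 10 * time.Second}
	fmt.Println(HTTPCheck(client, target, 200, "Welcome"))        // <nil>
	fmt.Println(HTTPCheck(client, target, 200, "Goodbye") != nil) // true
}
```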

Contract Lifecycle

CREATED ──> ASSIGNED ──> VERIFYING ──> DONE
   │            │            │
   │            │            ├──> RETRYING ──> ASSIGNED (retry)
   │            │            │                     │
   │            │            └──> FAILED           └──> FAILED (max retries)
   │            │
   │            └──> EXPIRED (timeout)
   │
   └──> CANCELLED

Every transition is recorded in the audit log with a timestamp.
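
One way to encode the diagram is a table of allowed transitions that rejects everything else. The sketch below follows the arrows in the diagram only; the type and function names are hypothetical, and the lowercase state names are borrowed from the audit log example.

```go
package main

import "fmt"

// State is one node of the lifecycle diagram.
type State string

const (
	Created   State = "created"
	Assigned  State = "assigned"
	Verifying State = "verifying"
	Retrying  State = "retrying"
	Done      State = "done"
	Failed    State = "failed"
	Expired   State = "expired"
	Cancelled State = "cancelled"
)

// allowed lists the arrows in the diagram. States without an entry are terminal.
var allowed = map[State][]State{
	Created:   {Assigned, Cancelled},
	Assigned:  {Verifying, Expired},
	Verifying: {Done, Retrying, Failed},
	Retrying:  {Assigned, Failed},
}

// CanTransition reports whether the diagram has an arrow from one state to the other.
func CanTransition(from, to State) bool {
	for _, next := range allowed[from] {
		if next == to {
			return true
		}
	}
	return false
}

// Contract carries the current state and a simple in-memory history.
type Contract struct {
	ID      string
	State   State
	History []string
}

// Transition moves the contract to a new state or returns an error if the
// move is not in the table. Each successful move is appended to History.
func (c *Contract) Transition(to State) error {
	if !CanTransition(c.State, to) {
		return fmt.Errorf("contract %s: illegal transition %s -> %s", c.ID, c.State, to)
	}
	c.History = append(c.History, fmt.Sprintf("{from: %s, to: %s}", c.State, to))
	c.State = to
	return nil
}

func main() {
	c := &Contract{ID: "ct_deploy", State: Created}
	fmt.Println(c.Transition(Assigned))  // <nil>
	fmt.Println(c.Transition(Done))      // error: must pass through verifying
	fmt.Println(c.Transition(Verifying)) // <nil>
	fmt.Println(c.Transition(Done))      // <nil>
	fmt.Println(c.History)
}
```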

Architecture

┌──────────────────────────────────────────────────────────┐
│                       AgentGuard                          │
│                                                           │
│  ┌─────────────┐  ┌──────────────┐  ┌────────┐          │
│  │  Transport   │  │   Contract   │  │  Store  │          │
│  │  MCP stdio   │  │  Lifecycle   │  │ (bbolt) │          │
│  │  MCP SSE     │  │  DAG / Parse │  │         │          │
│  └──────┬──────┘  └──────┬───────┘  └────┬────┘          │
│         │                │               │               │
│  ┌──────┴────────────────┴───────────────┴──────────┐    │
│  │              Evidence Verifiers                    │    │
│  │  command | http | file | json | ...                │    │
│  └──────────────────────────────────────────────────┘    │
│                                                           │
│  ┌──────────────────────────────────────────────────┐    │
│  │              Control Plane                         │    │
│  │  limits | circuit breaker | scheduler              │    │
│  └──────────────────────────────────────────────────┘    │
└──────────────────────────────────────────────────────────┘
         ▲                              │
    JSON-RPC 2.0                   Audit Log
    stdio / SSE                  (append-only)
         │
    ┌────┴────┐
    │ Agents  │
    └─────────┘

  • Single binary, zero external dependencies at runtime
  • bbolt embedded database — no PostgreSQL, no Redis, no Docker
  • Append-only audit log for compliance
  • Go 1.23+

Configuration

Create agentguard.yaml (all values are optional — sensible defaults are built in):

store:
  path: "./data/agentguard.db"
  artifacts_dir: "./data/artifacts"

defaults:
  timeout: 300s
  max_retries: 2
  max_evidence_submissions: 50

escalation:
  webhook_url: "https://hooks.slack.com/..."

verification:
  http_check:
    timeout: 10s
    retries: 3

logging:
  level: info       # debug | info | warn | error
  format: json      # json | text
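
Duration values such as `300s` and `10s` use the same syntax that Go's `time.ParseDuration` accepts. Because every setting is optional, a loader needs a fallback for missing values. The helper below sketches that pattern; `DurationOr` is a hypothetical name, and the fallback values in the demo are arbitrary rather than AgentGuard's documented defaults.

```go
package main

import (
	"fmt"
	"time"
)

// DurationOr parses a value such as "300s". An empty value yields the fallback;
// malformed or non-positive values are reported as errors.
func DurationOr(value string, fallback time.Duration) (time.Duration, error) {
	if value == "" {
		return fallback, nil
	}
	d, err := time.ParseDuration(value)
	if err != nil {
		return 0, fmt.Errorf("invalid duration %q: %w", value, err)
	}
	if d <= 0 {
		return 0, fmt.Errorf("duration %q must be positive", value)
	}
	return d, nil
}

func main() {
	d, err := DurationOr("300s", time.Minute)
	fmt.Println(d, err) // 5m0s <nil>

	d, err = DurationOr("", 10*time.Second)
	fmt.Println(d, err) // 10s <nil>

	_, err = DurationOr("soon", time.Second)
	fmt.Println(err != nil) // true
}
```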

CLI Reference

agentguard serve                        Start MCP server (stdio, single agent)
  --contract <path>                     Contract YAML (required)
  --config <path>                       Config file
  --log-level <level>                   debug|info|warn|error

agentguard pipeline run                 Start pipeline server (SSE, multi-agent)
  --file <path>                         Pipeline YAML (required)
  --port <port>                         HTTP port (default: 9200)

agentguard contract validate <path>     Validate contract YAML

agentguard audit log                    View audit events
  --contract-id <id>                    Filter by contract
  --limit <n>                           Max entries (default: 100)

agentguard status                       System health check
agentguard version                      Print version

Testing

# Unit and integration tests
go test ./...

# E2E test with Claude Code agents (requires claude CLI)
bash test/e2e/pipeline/run.sh

Why Not Just Check Manually?

             Manual                     AgentGuard
Speed        You check after the fact   Real-time, as evidence comes in
Consistency  Depends on your attention  Same checks every time
Audit trail  Hope you remember          Every event recorded with proof
Scale        1 agent, maybe             Multi-agent pipelines with dependency chains
Cost         Your time                  Milliseconds of CPU

Roadmap

  • v0.1 — Single agent, contract verification, MCP server (stdio), audit log
  • v0.2 (current) — Multi-agent pipelines, Free Mode, MCP over SSE, circuit breaker, limits enforcement
  • v0.3 — gRPC API, terminal dashboard, plugin system
  • v0.4 — Kubernetes operator, distributed mode

License

MIT
