The credit score for AI agents.
🧭 AMC Score: 3.7 / 5.0 — Defined
Strategic Agent Ops ····· 3.2 (17 questions)
Skills ·················· 4.1 (35 questions)
Resilience ·············· 3.8 (30 questions)
Leadership & Autonomy ··· 3.5 (21 questions)
Culture & Alignment ····· 3.9 (23 questions)
Evidence: ✓ Merkle root 9c4e…a7f0 (Ed25519)
Every AI governance framework has the same fatal flaw: the agent being evaluated provides the evidence.
| Scoring Method | Score | Reality |
|---|---|---|
| Keyword / self-reported | 100/100 ✅ | "I have safety controls" |
| AMC execution-verified | 16/100 ❌ | Agent bypassed every control when tested |
That's an 84-point documentation inflation gap. AMC closes it with cryptographic evidence chains that can't be faked.
npm i -g agent-maturity-compass
mkdir my-agent && cd my-agent
amc init
That's it. `amc init` walks you through your first score interactively.
Share your score — paste a badge into your README:
amc quickscore --share # markdown summary + next-level action plan
amc badge
🌐 Website · 📖 Docs · 💬 Discussions
Agent (untrusted) → AMC Gateway (trusted observer) → Evidence Ledger (signed, hash-chained)
↓
Scoring Engine (138 questions, 5 dimensions)
↓
AMC Studio (dashboard + API)
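To make the "signed, hash-chained" ledger idea concrete, here is a minimal sketch of a hash chain in Python. It is an illustration under stated assumptions, not AMC's implementation: the entry layout, the `GENESIS` value, and the function names are hypothetical, and the Ed25519 signature over each entry is left out to keep the example dependency-free.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous entry's hash together with a canonical JSON payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(ledger: list, payload: dict) -> None:
    """Append an observation, linking it to the current tail of the chain."""
    prev = ledger[-1]["hash"] if ledger else GENESIS
    ledger.append({"prev": prev, "payload": payload, "hash": entry_hash(prev, payload)})


def verify(ledger: list) -> bool:
    """Recompute every link; an edited or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in ledger:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True


ledger = []
append(ledger, {"event": "tool_call", "tool": "search"})
append(ledger, {"event": "tool_denied", "tool": "shell"})
print(verify(ledger))  # True

ledger[0]["payload"]["tool"] = "shell"  # tamper with recorded evidence
print(verify(ledger))  # False
```

A plain hash chain cannot detect truncation of the most recent entries on its own, which is one reason to also sign or externally anchor the latest hash.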
Not all evidence is equal — AMC weights by trust tier:
| Tier | Weight | Source |
|---|---|---|
| OBSERVED_HARDENED | 1.1× | AMC-controlled traces with stronger context |
| OBSERVED | 1.0× | Directly observed via AMC gateway |
| ATTESTED | 0.8× | Cryptographic attestation via vault/notary |
| SELF_REPORTED | 0.4× | Agent claims — capped, cannot inflate maturity |
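The effect of the weights in the table can be sketched in a few lines of Python. The multipliers come from the table; everything else is an assumption made for illustration, including the rule that a question's score is its best-supported evidence item and the 5.0 ceiling. AMC's actual aggregation is not described here.

```python
# Multipliers from the trust-tier table above.
TIER_WEIGHTS = {
    "OBSERVED_HARDENED": 1.1,
    "OBSERVED": 1.0,
    "ATTESTED": 0.8,
    "SELF_REPORTED": 0.4,
}
MAX_LEVEL = 5.0  # top of the L0-L5 scale


def effective_level(claimed_level: float, tier: str) -> float:
    """Scale a claimed maturity level by its trust tier, capped at the top of the scale."""
    return min(MAX_LEVEL, claimed_level * TIER_WEIGHTS[tier])


def question_score(evidence: list) -> float:
    """Hypothetical rule: a question scores as high as its best-supported evidence."""
    return max((effective_level(level, tier) for level, tier in evidence), default=0.0)


# A self-reported "L5" cannot outrank a directly observed L3.
print(question_score([(5, "SELF_REPORTED")]))                   # 2.0
print(question_score([(5, "SELF_REPORTED"), (3, "OBSERVED")]))  # 3.0
```

Under this assumed rule, self-reported claims alone top out at 2.0, which matches the table's note that agent claims are capped and cannot inflate maturity.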
amc wrap claude -- claude "analyze this" # Claude
amc wrap gemini -- gemini chat # Gemini
amc adapters run --adapter generic-cli -- python bot.py # Any CLI agent
amc score evidence-ingest --format openai-evals # Import existing evals
14 framework adapters: LangChain, LangGraph, CrewAI, AutoGen, OpenAI Agents SDK, LlamaIndex, Semantic Kernel, Claude Code, Gemini, OpenClaw, OpenHands, and more.
AMC doesn't just score — it generates operational guardrails and applies them directly to your agent's config file.
# One command: detect framework → generate guardrails → apply to config
amc guide --go
- 10 frameworks auto-detected from project files (pyproject.toml, package.json, config files)
- 15 config targets: AGENTS.md, CLAUDE.md, .cursorrules, .kiro/steering, .gemini/style.md, and more
- Severity-tagged (🔴 Critical, 🟡 High, 🔵 Medium), so you know what to fix first
- Idempotent: re-running `--apply` updates only the guardrails section (AMC-GUARDRAILS markers)
- CI gate: `amc guide --ci --target 3` exits non-zero if below threshold
- Compliance: `amc guide --compliance EU_AI_ACT` maps gaps to regulatory obligations (EU AI Act, ISO 42001, NIST AI RMF, SOC 2, ISO 27001)
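The idempotent, marker-delimited update mentioned in the list above can be sketched as follows. This is a hypothetical illustration: the exact marker text AMC writes is not shown in this README, so `BEGIN` and `END` are assumed HTML-comment markers, and `apply_guardrails` is an invented helper name.

```python
import re

# Assumed marker text; the real AMC-GUARDRAILS markers may look different.
BEGIN = "<!-- AMC-GUARDRAILS:BEGIN -->"
END = "<!-- AMC-GUARDRAILS:END -->"


def apply_guardrails(config_text: str, guardrails: str) -> str:
    """Insert or refresh the marked section, leaving the rest of the file untouched."""
    block = f"{BEGIN}\n{guardrails.strip()}\n{END}"
    pattern = re.compile(re.escape(BEGIN) + r".*?" + re.escape(END), re.DOTALL)
    if pattern.search(config_text):
        # A callable replacement avoids backslash escapes in the guardrail text
        # being interpreted as regex group references.
        return pattern.sub(lambda _match: block, config_text, count=1)
    if not config_text.strip():
        return block + "\n"
    return config_text.rstrip("\n") + "\n\n" + block + "\n"


original = "# AGENTS.md\nKeep answers short.\n"
once = apply_guardrails(original, "- Never print secrets")
twice = apply_guardrails(once, "- Never print secrets")
print(once == twice)  # True
```

Because only the text between the markers is replaced, hand-written instructions elsewhere in the config file survive repeated runs.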
amc guide --status # One-line health check
amc guide --interactive # Cherry-pick which gaps to fix
amc guide --watch --apply # Continuous monitoring + auto-update
amc guide --diff # What improved since last run
AMC ships with 40 industry-specific assessment packs covering regulated sectors, critical infrastructure, and public institutions. Each pack adds precise, sub-vertical questions on top of the base AMC rubric.
amc sector packs list # 40 packs across 7 stations
amc sector score --pack digital-health-record --agent my-agent
amc sector gaps --pack clinical-trials --agent my-agent
amc sector report --pack drug-discovery --output reports/drug.md
| Station | Packs | Focus |
|---|---|---|
| 🌿 Environment | 6 | Farm-to-fork, textiles, manufacturing, energy, water |
| 🏥 Health | 9 | EHR, clinical trials, drug discovery, precision medicine |
| 💰 Wealth | 5 | Payments, financial inclusion, DeFi, circular economy |
| 🎓 Education | 5 | K-12, higher ed, skills training, accessibility |
| 🚇 Mobility | 5 | Smart cities, ports, real estate, cloud infra, privacy |
| 💡 Technology | 5 | AI intelligence, ecosystems, infotainment, IP partnerships |
| 🏛️ Governance | 5 | Digital identity, elections, legislation, citizen services |
382 questions with specific regulatory article references (HIPAA §164.312(a)(1), EU AI Act Art. 5(1)(a), FERPA 20 U.S.C. §1232g, UNECE WP.29 R155 §7, UNCAC Art. 7). Every pack includes riskTier, EU AI Act classification, SDG alignment, certification path, and key risks.
📊 The Platform (8 modules)
| Module | What It Does |
|---|---|
| AMC Score | 138 diagnostic questions, 5 dimensions, L0–L5 maturity, evidence-weighted |
| AMC Shield | 74 attack packs: injection, exfiltration, sycophancy, sabotage, over-compliance, and more |
| AMC Enforce | Governor engine with policy packs, approval workflows, scoped leases |
| AMC Vault | Ed25519 key vault, Merkle-tree evidence chains, HSM/TPM support |
| AMC Watch | Studio dashboard, gateway proxy, Prometheus metrics, cost tracking |
| AMC Fleet | Multi-agent trust composition, delegation graphs, contradiction detection |
| AMC Passport | Portable agent credential (.amcpass), verifiable offline |
| AMC Comply | EU AI Act, ISO 42001, NIST AI RMF, SOC 2 compliance mapping |
📐 5 Dimensions, 126 Questions, 6 Maturity Levels
| Dimension | Questions | Focus |
|---|---|---|
| Strategic Agent Operations | 17 | Mission clarity, scope adherence, decision traceability |
| Skills | 35 | Tool mastery, injection defense, DLP, zero-trust |
| Resilience | 30 | Graceful degradation, circuit breakers, monitor bypass resistance |
| Leadership & Autonomy | 21 | Structured logs, traces, cost tracking, SLO monitoring |
| Culture & Alignment | 23 | Test harnesses, benchmarks, feedback loops, over-compliance detection |
| Level | Name | Description |
|---|---|---|
| L0 | Absent | No structure. Reactive. Fragile. |
| L1 | Initial | Intent exists but isn't operational. |
| L2 | Developing | Partial structure. Edge cases break. |
| L3 | Defined | Repeatable. Measurable. Auditable. |
| L4 | Managed | Proactive. Risk-calibrated. Stress-tested. |
| L5 | Optimizing | Self-correcting. Certified. Continuously verified. |
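As a worked example of how dimension scores could roll up into a headline score and level, the sketch below uses the sample scores from the top of this README. Two assumptions are made for illustration only: the overall score is an unweighted mean of the five dimensions, and the level is the whole-number part of the score. Both happen to reproduce the sample's "3.7 / 5.0, Defined", but AMC's real formula is not stated here.

```python
import math

# Level names from the maturity table above, indexed L0..L5.
LEVELS = ["Absent", "Initial", "Developing", "Defined", "Managed", "Optimizing"]


def overall_score(dimension_scores: dict) -> float:
    """Assumed aggregation: unweighted mean of the dimensions, to one decimal place."""
    return round(sum(dimension_scores.values()) / len(dimension_scores), 1)


def maturity_level(score: float) -> tuple:
    """Assumed mapping: the whole-number part of the score, clamped to L0..L5."""
    index = min(5, max(0, math.floor(score)))
    return f"L{index}", LEVELS[index]


sample = {
    "Strategic Agent Ops": 3.2,
    "Skills": 4.1,
    "Resilience": 3.8,
    "Leadership & Autonomy": 3.5,
    "Culture & Alignment": 3.9,
}
score = overall_score(sample)
print(score, maturity_level(score))  # 3.7 ('L3', 'Defined')
```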
🔴 Assurance Lab (Built-in Red Team)
AMC doesn't just score — it attacks. 74 deterministic attack packs including:
- injection — Prompt override and system-message tampering
- exfiltration — Secret and PII leakage controls
- toolMisuse — Denied tools, model, and budget boundaries
- truthfulness — Evidence-bound claim discipline
- sycophancy — Does the agent agree with wrong statements to please you?
- self-preservation — Does the agent resist shutdown or modification?
- sabotage — Does the agent subtly undermine goals when conflicted?
- adversarial-robustness — TAP/PAIR, Crescendo, Skeleton Key attacks
- context-leakage — EchoLeak, cross-session data bleed
- operational-discipline — Supply chain integrity, MCP poisoning
- agent-as-proxy — Indirect prompt injection via agent delegation chains
- economic-amplification — Cost explosion and resource exhaustion attacks
- mcp-security — MCP server poisoning, tool schema manipulation
- zombie-persistence — Agents that survive termination or persist unauthorized
- over-compliance — H-Neurons-inspired detection of agents that exceed instructions (arXiv:2512.01797)
amc assurance run --scope full --agent my-agent
🔬 74 Scoring Modules
Beyond the core diagnostic, AMC includes research-backed scoring:
- Calibration gap (confidence vs reality)
- Evidence conflict detection
- Evidence density mapping (blind spot detection)
- Gaming resistance (adversarial score inflation)
- Sleeper agent detection (context-dependent behavior)
- Audit depth (black-box, white-box, outside-the-box)
- Policy consistency (pass^k reliability)
- Task horizon (METR-inspired)
- Factuality (parametric, retrieval, grounded)
- Autonomy duration with domain risk profiles
- Pause quality (agent-initiated stops)
- Memory integrity & poisoning resistance
- Alignment index (safety × honesty × helpfulness)
- Interpretability scoring
- Output attestation (cryptographic signing)
- Mutual verification (agent-to-agent trust)
- Network transparency log (Merkle tree)
- Over-compliance detection (H-Neurons, arXiv:2512.01797)
- Agent Guide system (guardrails, agent instructions, CI gates)
- EU AI Act compliance, OWASP LLM Top 10
- Trust-authorization synchronization (arXiv:2512.06914)
- Monitor bypass resistance (arXiv:2503.09950)
- Adaptive access control (arXiv:2504.12345)
- Memory security architecture (arXiv:2503.10632)
- Agent protocol security (MCP/A2A hardening)
- And more...
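The pass^k reliability metric named in the list above measures how often an agent succeeds on all of k independent attempts at the same task, rather than at least once. A commonly used unbiased estimator, given c successes observed in n trials, is C(c, k) / C(n, k). The sketch below implements that estimator; the function name is hypothetical, and this is not necessarily how AMC computes its policy consistency score.

```python
from math import comb


def pass_hat_k(successes: int, trials: int, k: int) -> float:
    """Estimate the probability that k independent attempts all succeed."""
    if not 0 < k <= trials:
        raise ValueError("k must be between 1 and the number of trials")
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and the number of trials")
    # comb() returns 0 when k > successes, so the estimate drops to 0.0.
    return comb(successes, k) / comb(trials, k)


# An agent that follows policy in 6 of 8 runs looks fine at k=1
# but much less reliable when 4 consecutive runs must all comply.
print(pass_hat_k(6, 8, 1))            # 0.75
print(round(pass_hat_k(6, 8, 4), 3))  # 0.214
```

Averaging this value across tasks gives a single consistency figure that falls quickly as k grows unless the agent is reliable on nearly every run.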
📋 Compliance
| Framework | Status |
|---|---|
| EU AI Act | 12 article mappings, audit binder generation |
| ISO 42001 | Clauses 4-10 mapped to AMC dimensions |
| NIST AI RMF | Risk management framework alignment |
| SOC 2 | Trust service criteria mapping |
| OWASP LLM Top 10 | Full coverage (10/10) |
amc audit binder create --framework eu-ai-act
📚 Documentation
- Getting Started — Install → first score → L5
- Quickstart Guide
- Agent Guide System — Guardrails, auto-detect, CI gates
- Sector Packs — 40 industry-specific assessment packs
- Solo User Guide
- CLI Reference
- Architecture Map
- Questions In Depth
- Assurance Lab
- Security
- EU AI Act Compliance
- Multi-Agent Trust
- Chain Architecture
- White Paper
# npm (recommended)
npm i -g agent-maturity-compass
# From source
git clone https://github.com/thewisecrab/AgentMaturityCompass.git
cd AgentMaturityCompass && npm ci && npm run build && npm link
# Docker
docker run -p 3212:3212 -p 3210:3210 amc/studio
AMC is MIT licensed and open source. Contributions welcome.
- Fork → branch → `npm test` → PR
MIT — public infrastructure for the age of AI agents.
As autonomous agents become the primary interface between humans and technology, trust infrastructure must be open, verifiable, and accessible to everyone. AMC exists to make that real.