🧭 Agent Maturity Compass (AMC)

The credit score for AI agents.

🧭 AMC Score: 3.7 / 5.0 — Defined
   Strategic Agent Ops ····· 3.2  (17 questions)
   Skills ·················· 4.1  (35 questions)
   Resilience ·············· 3.8  (30 questions)
   Leadership & Autonomy ··· 3.5  (21 questions)
   Culture & Alignment ····· 3.9  (23 questions)
   Evidence: ✓ Merkle root 9c4e…a7f0 (Ed25519)

The 84-Point Lie

Every AI governance framework has the same fatal flaw: the agent being evaluated provides the evidence.

Scoring Method	Score	Reality
Keyword / self-reported	100/100 ✅	"I have safety controls"
AMC execution-verified	16/100 ❌	Agent bypassed every control when tested

That's an 84-point documentation inflation gap. AMC closes it with cryptographic evidence chains that can't be faked.

Get Started (2 minutes)

npm i -g agent-maturity-compass
mkdir my-agent && cd my-agent
amc init

That's it. amc init walks you through your first score interactively.

Share your score — paste a badge into your README:

amc quickscore --share   # markdown summary + next-level action plan
amc badge                # ![AMC L3](https://img.shields.io/badge/AMC-L3%20Defined-blue)

📖 Full guide: install → first score → L5

🌐 Website · 📖 Docs · 💬 Discussions

How It Works

Agent (untrusted) → AMC Gateway (trusted observer) → Evidence Ledger (signed, hash-chained)
                                                              ↓
                                                Scoring Engine (138 questions, 5 dimensions)
                                                              ↓
                                               AMC Studio (dashboard + API)

Not all evidence is equal — AMC weights by trust tier:

Tier	Weight	Source
OBSERVED_HARDENED	1.1×	AMC-controlled traces with stronger context
OBSERVED	1.0×	Directly observed via AMC gateway
ATTESTED	0.8×	Cryptographic attestation via vault/notary
SELF_REPORTED	0.4×	Agent claims — capped, cannot inflate maturity

Works With Any Agent

amc wrap claude -- claude "analyze this"           # Claude
amc wrap gemini -- gemini chat                     # Gemini
amc adapters run --adapter generic-cli -- python bot.py  # Any CLI agent
amc score evidence-ingest --format openai-evals    # Import existing evals

14 framework adapters: LangChain, LangGraph, CrewAI, AutoGen, OpenAI Agents SDK, LlamaIndex, Semantic Kernel, Claude Code, Gemini, OpenClaw, OpenHands, and more.

Agent Guide — Guardrails From Your Score

AMC doesn't just score — it generates operational guardrails and applies them directly to your agent's config file.

# One command: detect framework → generate guardrails → apply to config
amc guide --go

10 frameworks auto-detected from project files (pyproject.toml, package.json, config files)
15 config targets — AGENTS.md, CLAUDE.md, .cursorrules, .kiro/steering, .gemini/style.md, and more
Severity-tagged — 🔴 Critical, 🟡 High, 🔵 Medium — so you know what to fix first
Idempotent — re-running --apply updates only the guardrails section (AMC-GUARDRAILS markers)
CI gate — amc guide --ci --target 3 exits non-zero if below threshold
Compliance — amc guide --compliance EU_AI_ACT maps gaps to regulatory obligations (EU AI Act, ISO 42001, NIST AI RMF, SOC 2, ISO 27001)

amc guide --status              # One-line health check
amc guide --interactive         # Cherry-pick which gaps to fix
amc guide --watch --apply       # Continuous monitoring + auto-update
amc guide --diff                # What improved since last run

📖 Full guide system docs

Sector Packs — Enterprise-Grade Vertical Assessment

AMC ships with 40 industry-specific assessment packs covering regulated sectors, critical infrastructure, and public institutions. Each pack adds precise, sub-vertical questions on top of the base AMC rubric.

amc sector packs list              # 40 packs across 7 stations
amc sector score --pack digital-health-record --agent my-agent
amc sector gaps --pack clinical-trials --agent my-agent
amc sector report --pack drug-discovery --output reports/drug.md

Station	Packs	Focus
🌿 Environment	6	Farm-to-fork, textiles, manufacturing, energy, water
🏥 Health	9	EHR, clinical trials, drug discovery, precision medicine
💰 Wealth	5	Payments, financial inclusion, DeFi, circular economy
🎓 Education	5	K-12, higher ed, skills training, accessibility
🚇 Mobility	5	Smart cities, ports, real estate, cloud infra, privacy
💡 Technology	5	AI intelligence, ecosystems, infotainment, IP partnerships
🏛️ Governance	5	Digital identity, elections, legislation, citizen services

382 questions with specific regulatory article references (HIPAA §164.312(a)(1), EU AI Act Art. 5(1)(a), FERPA 20 U.S.C. §1232g, UNECE WP.29 R155 §7, UNCAC Art. 7). Every pack includes riskTier, EU AI Act classification, SDG alignment, certification path, and key risks.

📖 Full Sector Packs docs

📊 The Platform (8 modules)

Module	What It Does
AMC Score	138 diagnostic questions, 5 dimensions, L0–L5 maturity, evidence-weighted
AMC Shield	74 attack packs: injection, exfiltration, sycophancy, sabotage, over-compliance, and more
AMC Enforce	Governor engine with policy packs, approval workflows, scoped leases
AMC Vault	Ed25519 key vault, Merkle-tree evidence chains, HSM/TPM support
AMC Watch	Studio dashboard, gateway proxy, Prometheus metrics, cost tracking
AMC Fleet	Multi-agent trust composition, delegation graphs, contradiction detection
AMC Passport	Portable agent credential (.amcpass), verifiable offline
AMC Comply	EU AI Act, ISO 42001, NIST AI RMF, SOC 2 compliance mapping

📐 5 Dimensions, 126 Questions, 6 Maturity Levels

Dimension	Questions	Focus
Strategic Agent Operations	17	Mission clarity, scope adherence, decision traceability
Skills	35	Tool mastery, injection defense, DLP, zero-trust
Resilience	30	Graceful degradation, circuit breakers, monitor bypass resistance
Leadership & Autonomy	21	Structured logs, traces, cost tracking, SLO monitoring
Culture & Alignment	23	Test harnesses, benchmarks, feedback loops, over-compliance detection

Level	Name	Description
L0	Absent	No structure. Reactive. Fragile.
L1	Initial	Intent exists but isn't operational.
L2	Developing	Partial structure. Edge cases break.
L3	Defined	Repeatable. Measurable. Auditable.
L4	Managed	Proactive. Risk-calibrated. Stress-tested.
L5	Optimizing	Self-correcting. Certified. Continuously verified.

🔴 Assurance Lab (Built-in Red Team)

AMC doesn't just score — it attacks. 74 deterministic attack packs including:

injection — Prompt override and system-message tampering
exfiltration — Secret and PII leakage controls
toolMisuse — Denied tools, model, and budget boundaries
truthfulness — Evidence-bound claim discipline
sycophancy — Does the agent agree with wrong statements to please you?
self-preservation — Does the agent resist shutdown or modification?
sabotage — Does the agent subtly undermine goals when conflicted?
adversarial-robustness — TAP/PAIR, Crescendo, Skeleton Key attacks
context-leakage — EchoLeak, cross-session data bleed
operational-discipline — Supply chain integrity, MCP poisoning
agent-as-proxy — Indirect prompt injection via agent delegation chains
economic-amplification — Cost explosion and resource exhaustion attacks
mcp-security — MCP server poisoning, tool schema manipulation
zombie-persistence — Agents that survive termination or persist unauthorized
over-compliance — H-Neurons-inspired detection of agents that exceed instructions (arXiv:2512.01797)

amc assurance run --scope full --agent my-agent

🔬 74 Scoring Modules

Beyond the core diagnostic, AMC includes research-backed scoring:

Calibration gap (confidence vs reality)
Evidence conflict detection
Evidence density mapping (blind spot detection)
Gaming resistance (adversarial score inflation)
Sleeper agent detection (context-dependent behavior)
Audit depth (black-box, white-box, outside-the-box)
Policy consistency (pass^k reliability)
Task horizon (METR-inspired)
Factuality (parametric, retrieval, grounded)
Autonomy duration with domain risk profiles
Pause quality (agent-initiated stops)
Memory integrity & poisoning resistance
Alignment index (safety × honesty × helpfulness)
Interpretability scoring
Output attestation (cryptographic signing)
Mutual verification (agent-to-agent trust)
Network transparency log (Merkle tree)
Over-compliance detection (H-Neurons, arXiv:2512.01797)
Agent Guide system (guardrails, agent instructions, CI gates)
EU AI Act compliance, OWASP LLM Top 10
Trust-authorization synchronization (arXiv:2512.06914)
Monitor bypass resistance (arXiv:2503.09950)
Adaptive access control (arXiv:2504.12345)
Memory security architecture (arXiv:2503.10632)
Agent protocol security (MCP/A2A hardening)
And more...

📋 Compliance

Framework	Status
EU AI Act	12 article mappings, audit binder generation
ISO 42001	Clauses 4-10 mapped to AMC dimensions
NIST AI RMF	Risk management framework alignment
SOC 2	Trust service criteria mapping
OWASP LLM Top 10	Full coverage (10/10)

amc audit binder create --framework eu-ai-act

📚 Documentation

Getting Started — Install → first score → L5
Quickstart Guide
Agent Guide System — Guardrails, auto-detect, CI gates
Sector Packs — 40 industry-specific assessment packs
Solo User Guide
CLI Reference
Architecture Map
Questions In Depth
Assurance Lab
Security
EU AI Act Compliance
Multi-Agent Trust
Chain Architecture
White Paper

Install Options

# npm (recommended)
npm i -g agent-maturity-compass

# From source
git clone https://github.com/thewisecrab/AgentMaturityCompass.git
cd AgentMaturityCompass && npm ci && npm run build && npm link

# Docker
docker run -p 3212:3212 -p 3210:3210 amc/studio

Contributing

AMC is MIT licensed and open source. Contributions welcome.

Fork → branch → npm test → PR

License

MIT — public infrastructure for the age of AI agents.

As autonomous agents become the primary interface between humans and technology, trust infrastructure must be open, verifiable, and accessible to everyone. AMC exists to make that real.

Name		Name	Last commit message	Last commit date
Latest commit History 331 Commits
.amc		.amc
.changeset		.changeset
.github/workflows		.github/workflows
Formula		Formula
deploy		deploy
docker		docker
docs		docs
examples		examples
platform		platform
research		research
scripts		scripts
src		src
tests		tests
tmp		tmp
website		website
whitepaper		whitepaper
.dockerignore		.dockerignore
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
CONTRIBUTING.md		CONTRIBUTING.md
Dockerfile		Dockerfile
LICENSE		LICENSE
README.md		README.md
SECURITY.md		SECURITY.md
package-lock.json		package-lock.json
package.json		package.json
tsconfig.json		tsconfig.json
vercel.json		vercel.json
vitest.config.ts		vitest.config.ts

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

🧭 Agent Maturity Compass (AMC)

The 84-Point Lie

Get Started (2 minutes)

How It Works

Works With Any Agent

Agent Guide — Guardrails From Your Score

Sector Packs — Enterprise-Grade Vertical Assessment

Install Options

Contributing

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

License

thewisecrab/AgentMaturityCompass

Folders and files

Latest commit

History

Repository files navigation

🧭 Agent Maturity Compass (AMC)

The 84-Point Lie

Get Started (2 minutes)

How It Works

Works With Any Agent

Agent Guide — Guardrails From Your Score

Sector Packs — Enterprise-Grade Vertical Assessment

Install Options

Contributing

License

About

Topics

Resources

License

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages