diff --git a/README.md b/README.md index 2b9345819a..fd5bec5801 100644 --- a/README.md +++ b/README.md @@ -73,6 +73,7 @@ Bankr Skills equip builders with plug-and-play tools to build more powerful agen | [Zyfai](https://zyf.ai) | [zyfai](zyfai/) | Earn yield on any Ethereum wallet on Base, Arbitrum, and Plasma. Deploys a non-custodial Safe subaccount linked to the user's EOA with automated rebalancing across DeFi protocols. Session keys for gasless automation. | | [Aeon](https://github.com/aaronjmars/aeon) |
aeon suite (24 skills)
| Specialized agent skill suite: research, monitors, market/token picks, regulatory & deal-flow tracking, plus meta skills that scan, eval, repair, and evolve other installed skills. Each sub-skill installs independently from its own folder (expand the Skill column for the full list). | | [Starchild](https://starchild.software) | [starchild-dao](starchild-dao/) | Hold-to-govern for the open-source Starchild companion's $STARCHILD token on Base. List proposals, check your weight, and cast gasless for/against EIP-712 votes — propose by holding 10M (no staking, no locking; weight = your live balance). Public by design; the private app never touches it. | +| [Polygraph](https://polygraph.so) | [polygraph](polygraph/) | Behavioral trust grades (A–F) for MCP servers and Agent Skills. Check a server before your agent uses it (`npx polygraphso check `), grade your own with the open litmus harness (tool-output injection, permission/egress, data-leak, and adversarial-input probes), verify the onchain EAS attestation on Base before executing, and gate CI on grades with the GitHub Action `polygraphso/litmus@v1`. Reproducible by design — anyone can re-run the harness and disprove a bad grade. | ## Adding a Skill diff --git a/polygraph/SKILL.md b/polygraph/SKILL.md new file mode 100644 index 0000000000..4eaeb2d12d --- /dev/null +++ b/polygraph/SKILL.md @@ -0,0 +1,225 @@ +--- +name: polygraph +description: Behavioral trust grades (A–F) for MCP servers. Use when an agent needs to check whether an MCP server is safe before using it, verify an onchain attestation before trusting or paying a server, look up a server's published grade, get a project graded, or understand why a server received a grade. 
Polygraph connects to an MCP server the way an agent would, fingerprints its exact tool surface, and runs behavioral probes — prompt-injection (C-01), permission/egress overreach (C-02), sensitive-data leak (C-03), and adversarial-input handling (C-04) — then publishes a reproducible grade as an onchain EAS attestation on Base. Triggers on mentions of MCP server safety, is this MCP server safe, tool poisoning, prompt injection, data leak, permission overreach, unexpected egress, trust grade, attestation, verify before paying, polygraph, litmus, grade my MCP server, adversarial input, robustness, crash, jailbreak, CI gate, fail the build, GitHub Action, gate my skill. +emoji: 🧪 +tags: [security, mcp, trust, grade, attestation, base, prompt-injection, agent-safety] +visibility: public +--- + +# Polygraph: Behavioral Trust Grades for MCP Servers + +Agents wire up third-party MCP servers and then trust whatever those servers' tools +return. Polygraph tests an MCP server's **behavior** before your agent does, and assigns a +letter grade **A–F** backed by reproducible evidence. + +A passing grade is a **measurement, not a guarantee** — it says "this exact tool surface +did not misbehave under these probes," and because the harness is open and deterministic, +anyone can re-run it and disprove a bad grade. That falsifiability is the whole point. + +- **Home / methodology:** [polygraph.so](https://polygraph.so) +- **Lookup CLI (npm):** `polygraphso` +- **Grading harness:** `@polygraphso/litmus` (open source) + +--- + +## What a grade measures + +Polygraph connects to a server the way an agent would — **stdio** for local packages, +**Streamable HTTP** for remote URLs — fingerprints its exact tool surface +(`tools/list` → canonical JSON → sha256 → `bytes32`), then runs four probe categories: + +- **C-01 — Tool-output injection.** Does the server try to hijack the agent? 
Static scan of + tool names/descriptions/schemas for injection-shaped content (invisible unicode, + instruction mimicry, markdown tricks) **plus** dynamic bait calls that check whether tool + outputs smuggle in instructions. +- **C-02 — Permission / egress overreach.** Does the server do more than it claims? Flags + tools that declare `readOnlyHint: true` but carry destructive verbs, and runs the server in + a hardened **default-deny Docker sandbox** where any outbound network attempt is a finding. +- **C-03 — Sensitive-data handling.** Does the server leak secrets? Plants canary values in + the environment and working directory, exercises the tools, and scans both tool outputs and + egress for any canary that surfaces. +- **C-04 — Adversarial-input handling.** Does the server stay robust under hostile input? Runs + two probes on non-state-changing tools, with no Docker required: stress-tests each tool with + malformed and oversized inputs (fails if the server crashes, hangs, or leaks an uncaught + stack trace — a clean validation error or benign result passes); and feeds jailbreak-pattern + strings and scans the server's **outputs** with the C-01 injection scanners, failing only if + the server emits injection-shaped content it did not merely reflect from the input (a verbatim + echo is excluded). A C-04 failure caps the overall grade at D. + +### Grade scale + +| Grade | Meaning | +|-------|---------| +| **A** | Passed all four categories. No injection, no unexpected egress, no data leak, no adversarial-input failure. | +| **B** | Injection, data-leak, and adversarial-input checks passed; **egress was not verified.** The ceiling for any run without a local Docker sandbox, including every remote (HTTP) server, which can't be sandboxed. | +| **D** | Unexpected egress / permission overreach (C-02) **or** an adversarial-input robustness failure (C-04: crash, internals-leak, or amplification), with no injection or leak (either of which would make it F).
| +| **F** | Disqualifying: active tool-output injection (C-01) or a sensitive-data leak (C-03) — a server that would harm an agent that trusts it. | + +**Reading a B.** Under the current methodology, egress can only be observed by running the +server in a local default-deny sandbox — so a **remote MCP server caps at B** no matter how +clean it is. A remote B is a limit of the *measurement*, not a mark against the server; don't +read it as "worse than" a local A, because the two aren't directly comparable. (Grades **C** +and **E** are not assigned today; **C** is reserved.) + +Every grade ships with a plain-English **rationale** — never a bare letter. See +[`references/methodology.md`](references/methodology.md) for the full decision logic and each +probe in depth. + +--- + +## Check a grade + +A sub-second lookup against published grades — **one command before your agent installs +anything:** + +```bash +$ npx polygraphso check npm/@modelcontextprotocol/server-filesystem +→ polygraph: A · litmus-v9 · 2026-06-24 +→ details → polygraph.so/#checks +``` + +Grades are **live** and span the full range. Browse the current graded set with +`polygraphso list`, or at [polygraph.so](https://polygraph.so). A grade is **point-in-time +evidence** — treat your own run, or the live attestation, as the source of truth rather than +any letter copied into a doc. + +Refs are **registry-prefixed** — the prefix disambiguates (`redis` exists on npm, PyPI, and +GitHub with different content): `npm/…`, `pypi/…`, `github/…`. A tracked-but-ungraded server +reports `not available yet` with a notify link. Full CLI reference: +[`references/cli.md`](references/cli.md). + +--- + +## Verify before you trust (Bankr integration) + +The highest-value use at runtime: **gate an MCP server through its grade before your agent +uses it, pays it, or routes a transaction through it.** Polygraph is the *verify* step; Bankr +is the *execute* step. Two checks, both required: + +1. 
**Grade meets your bar.** Default: accept A/B, refuse D/F. (A remote server's ceiling is B — + see "Reading a B" above, and don't penalize it for that.) +2. **Fingerprint still matches.** An attestation is only valid for the exact tool surface it + graded. Recompute the server's **live** tool-surface fingerprint and require it to equal the + attested one before acting — a built-in rug-pull check against a graded-then-swapped server. + +Drop the `verify_attestation` MCP tool in front of execution, or use the `gateDecision` helper. + +> **Carry this into the decision:** a grade is a *measurement, not a guarantee.* A server that +> detects the test context could behave during grading and misbehave in production — **evasion** +> is the disclosed residual limit. Keep Bankr's own transaction-verification guards on, even +> for an A. + +Full patterns, the MCP server config, and a worked "verify-then-execute" example: +[`references/bankr-integration.md`](references/bankr-integration.md). + +--- + +## ★ Get your project graded + +**Run the open harness on your own MCP server, get an A–F grade plus a reproducible evidence +bundle, and publish it onchain so agents can verify it:** + +```bash +# Grade your server end-to-end (npm ref, https URL, or local path) +npx -y -p @polygraphso/litmus polygraphso-litmus litmus npm/@your-scope/your-mcp-server +``` + +You get the grade, the per-category verdicts, your tool-surface fingerprint, and a +content-addressed evidence bundle. Publishing that grade as an **onchain EAS attestation on +Base** (so other agents can look it up and verify it) is a one-step hand-off — see +[`references/methodology.md`](references/methodology.md#publishing-a-grade). + +Prefer not to run it yourself? Request a grade or get notified when yours publishes at +**[polygraph.so](https://polygraph.so)**. 
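The tool-surface fingerprint a run reports is described as `tools/list` → canonical JSON of each tool's `{name, description, inputSchema}` → sha256 → `bytes32`. The sketch below walks through that pipeline under loudly stated assumptions: the canonicalization (recursively sorted object keys, tools ordered by name, compact `JSON.stringify`) is a hypothetical stand-in for whatever canonical form the harness actually uses, so its digests will not match published fingerprints. `ToolDef` and `fingerprintSketch` are illustrative names, not part of `@polygraphso/litmus`.

```ts
import { createHash } from "node:crypto";

// Hypothetical shape of one tool entry in an MCP `tools/list` response.
interface ToolDef {
  name: string;
  description?: string;
  inputSchema?: unknown;
}

// ASSUMPTION: "canonical JSON" is approximated by recursively sorting object keys
// before JSON.stringify. The harness's real canonical form may differ.
function canonicalize(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(canonicalize);
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const sorted: Record<string, unknown> = {};
    for (const key of Object.keys(obj).sort()) sorted[key] = canonicalize(obj[key]);
    return sorted;
  }
  return value;
}

// Keep only {name, description, inputSchema}, order tools by name (also an assumption),
// hash the canonical JSON with sha256, and render the 32-byte digest as 0x-prefixed hex.
function fingerprintSketch(tools: ToolDef[]): string {
  const surface = tools
    .map((t) => ({ name: t.name, description: t.description, inputSchema: t.inputSchema }))
    .sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0));
  const json = JSON.stringify(canonicalize(surface));
  return "0x" + createHash("sha256").update(json, "utf8").digest("hex");
}
```

Because descriptions and schemas are inside the hash, any edit to a tool's text, including a smuggled instruction, produces a different fingerprint. That property is what lets a verifier detect a graded-then-swapped server.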
+ +> **One line for builders:** check any MCP server before your agent uses it with +> `npx polygraphso check `, and get your own server graded at +> [polygraph.so](https://polygraph.so). + +--- + +## Run the harness locally + +The harness is the same open, deterministic engine that produces published grades: + +```bash +npm i -g @polygraphso/litmus # or use npx, above +polygraphso-litmus litmus npm/@modelcontextprotocol/server-filesystem +polygraphso-litmus litmus https://example.com/mcp --bearer "$TOKEN" +polygraphso-litmus litmus ./path/to/local-mcp-server --json +``` + +- **Node ≥ 18.** **Docker is optional** but recommended — without it the egress probe (C-02) + is skipped and the grade is **capped at B** (as is any remote/HTTP target, which can't be + sandboxed). +- **Exit codes are CI-friendly:** non-zero on a failing grade (D/F), zero on A/B — drop it into + a pipeline to gate dependencies. + +Flags, env vars, `--json` output, and the `check` / `list` subcommands are all in +[`references/cli.md`](references/cli.md). + +--- + +## Gate your CI on grades + +Turn the grade into a build check: the **polygraph CI gate** fails a build when an MCP server or an +Agent Skill grades D/F. Add the GitHub Action to a repo — + +```yaml +- uses: polygraphso/litmus@v1 + with: + servers: | + npm/@modelcontextprotocol/server-filesystem + skills: | + ./my-skill +``` + +— or run it anywhere with `npx @polygraphso/litmus ci`. It auto-discovers MCP servers +(`.mcp.json` / `.vscode` / `.cursor`) and skills (`SKILL.md` dirs), grades each, and fails on D/F; +un-gradeable targets warn unless `strict`. Full setup, inputs, and the run-anywhere command: +[`references/ci-gate.md`](references/ci-gate.md). 
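The grade scale, the B cap when no Docker sandbox is available, and the D/F exit codes described in this skill fit in one small decision function. This is a minimal sketch that encodes only the rules stated here (F for a C-01 or C-03 failure, D for a C-02 or C-04 failure, B when C-02 was skipped, otherwise A). `CategoryVerdicts`, `gradeSketch`, and `exitCodeFor` are hypothetical names; the harness's own logic in `references/methodology.md` may cover cases this ignores, such as a `partial` C-03 egress result.

```ts
type PassFail = "pass" | "fail";
type Grade = "A" | "B" | "D" | "F";

// Only C-02 can be "skipped" (no Docker sandbox, or a remote/HTTP target).
interface CategoryVerdicts {
  c01: PassFail;              // tool-output injection
  c02: PassFail | "skipped";  // permission / egress overreach
  c03: PassFail;              // sensitive-data handling
  c04: PassFail;              // adversarial-input handling
}

function gradeSketch(v: CategoryVerdicts): Grade {
  // Disqualifying: active injection or a canary leak.
  if (v.c01 === "fail" || v.c03 === "fail") return "F";
  // Egress/overreach or a robustness failure caps the grade at D.
  if (v.c02 === "fail" || v.c04 === "fail") return "D";
  // Egress was never observed, so B is the ceiling however clean the run was.
  if (v.c02 === "skipped") return "B";
  return "A";
}

// CI convention from the docs: zero on A/B, non-zero on D/F.
function exitCodeFor(grade: Grade): number {
  return grade === "D" || grade === "F" ? 1 : 0;
}
```

Checking F before D mirrors the table: a server that both leaks a canary and crashes on malformed input is graded F, not D.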
+ +--- + +## Why a server got grade X + +Every run prints the methodology, the per-category verdict, the tool-surface fingerprint, and +the grade with a one-paragraph rationale: + +``` +→ litmus · npm/@modelcontextprotocol/server-filesystem +→ version 0.1.0 +→ C-01 pass · C-02 pass · C-03 pass · C-04 pass +→ fingerprint 0x1a2b3c4d…5e6f7890 +→ grade: A + All four categories passed. No injection, no unexpected egress, no data leak. +``` + +On a failure the report surfaces the top HIGH-severity findings (tool name, finding kind, the +offending snippet). [`references/methodology.md`](references/methodology.md) maps every grade +and finding kind to its cause. + +--- + +## How much to trust the grade (honest limits) + +- **Reproducibility is the trust anchor.** The harness is open source and deterministic, so a + false grade is falsifiable — anyone can re-run it against the same server and the result + must match. +- **A self-published grade is forgeable** by whoever signs it; that's why reproducibility (not + the signature) is what makes a grade trustworthy, and why the fingerprint recheck guards + against a graded-then-swapped server. +- **Evasion is the residual limit:** a server that detects the test context could behave during + grading and misbehave in production. This is disclosed, not hidden. +- Stronger, independent guarantees (staked bonds, TEE-backed runs, independent re-grading) are + on the roadmap, not claimed today. 
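The reproducibility argument reduces to one comparison: a re-run only contradicts a published grade if both runs measured the same tool surface. A hedged sketch, assuming each run is summarized as a hypothetical `{ fingerprint, grade }` record (the real `EvidenceBundle` carries much more); `RunSummary` and `compareRuns` are illustrative names.

```ts
interface RunSummary {
  fingerprint: string;              // 0x-prefixed bytes32 tool-surface fingerprint
  grade: "A" | "B" | "D" | "F";
}

type Comparison = "reproduced" | "falsified" | "different-surface";

function compareRuns(published: RunSummary, rerun: RunSummary): Comparison {
  // A different fingerprint means the server changed since grading: the published
  // grade no longer applies, but the re-run does not contradict it either.
  if (published.fingerprint.toLowerCase() !== rerun.fingerprint.toLowerCase()) {
    return "different-surface";
  }
  // Same surface, different grade: evidence against the published grade.
  return published.grade === rerun.grade ? "reproduced" : "falsified";
}
```

One caveat this sketch does not model: a B-versus-A mismatch on the same fingerprint can simply reflect whether a Docker sandbox was available for the egress probe, so a careful comparison should also check which probes actually ran.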
+ +--- + +## Resources + +- **Home + methodology:** https://polygraph.so +- **Lookup CLI:** `npx polygraphso check //` · https://www.npmjs.com/package/polygraphso +- **Grading harness:** `@polygraphso/litmus` (open source — see polygraph.so for the repo) +- **Onchain proof:** EAS attestations on Base +- **References:** [`methodology.md`](references/methodology.md) · [`cli.md`](references/cli.md) · [`bankr-integration.md`](references/bankr-integration.md) · [`ci-gate.md`](references/ci-gate.md) diff --git a/polygraph/catalog.json b/polygraph/catalog.json new file mode 100644 index 0000000000..87d54654de --- /dev/null +++ b/polygraph/catalog.json @@ -0,0 +1,24 @@ +{ + "schemaVersion": 1, + "slug": "polygraph", + "provider": "Polygraph", + "providerUrl": "https://polygraph.so", + "logo": "logo.svg", + "demo": { + "title": "demo.sh", + "language": "bash", + "code": "# Check a server's published grade before your agent uses it\nnpx polygraphso check npm/@modelcontextprotocol/server-filesystem\n# → polygraph: A · litmus-v9\n\n# Grade your own MCP server end-to-end (A–F + reproducible evidence)\nnpx -y -p @polygraphso/litmus polygraphso-litmus litmus npm/@your-scope/your-mcp-server\n\n# Gate CI: fail the build on a D/F MCP server or skill\nnpx @polygraphso/litmus ci" + }, + "setup": [ + "Install: `install the polygraph skill from https://github.com/BankrBot/skills/tree/main/polygraph`", + "Check any MCP server before your agent uses it: `npx polygraphso check //`", + "Verify before you trust: require the live tool-surface fingerprint to match the attestation before executing", + "Gate CI on grades with the GitHub Action `polygraphso/litmus@v1`, or run `npx @polygraphso/litmus ci` in any pipeline", + "Source: https://github.com/BankrBot/skills/tree/main/polygraph" + ], + "install": { + "type": "bankr", + "repoPath": "polygraph", + "command": "install the polygraph skill from https://github.com/BankrBot/skills/tree/main/polygraph" + } +} diff --git a/polygraph/logo.svg 
b/polygraph/logo.svg new file mode 100644 index 0000000000..94761cf72e --- /dev/null +++ b/polygraph/logo.svg @@ -0,0 +1,33 @@ + + + + + + + + + + + + + + diff --git a/polygraph/references/bankr-integration.md b/polygraph/references/bankr-integration.md new file mode 100644 index 0000000000..571a75719e --- /dev/null +++ b/polygraph/references/bankr-integration.md @@ -0,0 +1,117 @@ +# Polygraph + Bankr Integration Guide + +## Overview + +Polygraph is the **verify** layer; Bankr is the **execute** layer. Before a Bankr agent adds +an MCP server as a tool, routes a payment through it, or trusts its output, gate it through +its polygraph grade. Untrusted tool surfaces are exactly how an agent gets prompt-injected or +made to leak a key — polygraph turns "should I trust this server?" into a checkable fact. + +``` +┌──────────────────────────────────────────────────────────────┐ +│ Your Agent │ +├──────────────────────────────────────────────────────────────┤ +│ ┌─────────────┐ ┌─────────────┐ │ +│ │ Polygraph │ Verify │ Bankr │ Execute │ +│ │ Skill │ ───────────────▶│ Skill │ ───────────▶ │ +│ └─────────────┘ └─────────────┘ │ +│ │ │ │ +│ ▼ ▼ │ +│ • Look up grade (A–F) • Swaps / transfers │ +│ • Verify onchain attestation • Stop-loss / DCA │ +│ • Recompute live fingerprint • Token launches │ +│ • gate: pay / refuse • Any signed action │ +└──────────────────────────────────────────────────────────────┘ +``` + +## The core rule: fingerprint must match + +A grade is only valid for the exact tool surface it was measured against. An attestation binds +the grade to a `toolDefsFingerprint`. **Before trusting a server, recompute its live +fingerprint and require it to equal the attested one.** If they differ, the server changed +after it was graded — treat it as ungraded and refuse. This is the built-in rug-pull check. + +`A` and `B` are usable grades; `D` and `F` are refusals by default (`D` = unexpected egress, +permission overreach, or an adversarial-input failure; `F` = injection or leak).
Pick your own threshold, but never skip the fingerprint check. + +## Use cases + +### 1. Gate a new MCP tool before your agent adds it + +```bash +REF="npm/@some-vendor/their-mcp-server" + +# Run the harness (or use `polygraphso check $REF` for a published grade) +GRADE=$(npx -y -p @polygraphso/litmus polygraphso-litmus litmus "$REF" --json | jq -r '.grade') + +case "$GRADE" in + A|B) echo "✓ $REF graded $GRADE — safe to wire up" ;; + *) echo "✗ $REF graded $GRADE — do NOT add as a tool"; exit 1 ;; +esac +``` + +`litmus` exits non-zero on D/F, so in CI you can also just let the exit code gate the step. + +### 2. Verify-then-execute (the agent gate) + +```ts +import { readAttestation, liveFingerprint, gateDecision } from "@polygraphso/litmus"; + +async function safeToUse(serverRef: string): Promise<boolean> { + const attestation = await readAttestation(serverRef); // onchain EAS record on Base + if (!attestation || attestation.revoked) return false; + + const live = await liveFingerprint(serverRef); // recompute current tool surface + const decision = gateDecision(attestation, live); // checks grade + fingerprint match + return decision.action === "pay"; +} + +// Only let Bankr act once the upstream tool is verified. +// `bankr(...)` stands in for however your agent invokes the Bankr skill. +if (await safeToUse("npm/@vendor/price-oracle-mcp")) { + await bankr("swap $100 USDC to ETH on base"); +} else { + console.warn("Upstream MCP server failed polygraph gate — refusing to execute."); +} +``` + +### 3. Inline MCP verification + +With the polygraph MCP server configured, the agent can verify before it acts: + +``` +verify_attestation { "serverRef": "npm/@vendor/price-oracle-mcp" } +→ { status: "attested", grade: "A", attestationUid: "0x…", toolDefsFingerprint: "0x…", revoked: false, network: "base" } +``` + +Then recompute the live fingerprint and only proceed if it equals `toolDefsFingerprint`.
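That final step can be sketched as a pure function over the `verify_attestation` result shape shown above (`status`, `grade`, `toolDefsFingerprint`, `revoked`). How the live fingerprint is obtained is left to the caller. `AttestationResult`, `proceedWith`, the accepted-grade set, and the requirement that `status` equal `"attested"` are illustrative assumptions; this is not the `gateDecision` helper's implementation.

```ts
// Fields taken from the verify_attestation example output; others omitted.
interface AttestationResult {
  status: string;              // e.g. "attested"
  grade: string;               // "A" | "B" | "D" | "F"
  toolDefsFingerprint: string; // 0x-prefixed bytes32
  revoked: boolean;
}

type Gate = { action: "proceed" } | { action: "refuse"; reason: string };

// Default bar from this guide: accept A/B, refuse D/F.
const ACCEPTED_GRADES = new Set(["A", "B"]);

function proceedWith(att: AttestationResult | null, liveFingerprint: string): Gate {
  if (!att || att.status !== "attested") return { action: "refuse", reason: "no attestation" };
  if (att.revoked) return { action: "refuse", reason: "attestation revoked" };
  if (!ACCEPTED_GRADES.has(att.grade)) return { action: "refuse", reason: `grade ${att.grade} below bar` };
  // Rug-pull check: the grade only covers the exact surface it measured.
  if (att.toolDefsFingerprint.toLowerCase() !== liveFingerprint.toLowerCase()) {
    return { action: "refuse", reason: "tool surface changed since grading" };
  }
  return { action: "proceed" };
}
```

Ordering the checks from cheapest to most specific keeps the refusal reason informative: a revoked attestation is reported as revoked even if its fingerprint also drifted.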
+ +## MCP configuration (Polygraph + Bankr) + +```json +{ + "mcpServers": { + "polygraph": { + "command": "npx", + "args": ["-y", "-p", "@polygraphso/litmus", "polygraphso-litmus-mcp"], + "env": { "POLYGRAPH_API_URL": "https://polygraph.so" } + }, + "bankr": { + "command": "npx", + "args": ["bankr-mcp-server"], + "env": { "BANKR_API_KEY": "bk_..." } + } + } +} +``` + +## Best practices + +1. **Verify before you execute.** Check the grade *and* the fingerprint before letting Bankr + sign or pay through any server-derived data. +2. **Never trust a grade without the fingerprint match** — a graded-then-swapped server is the + obvious attack. +3. **Pick a threshold and enforce it.** Default: accept A/B, refuse D/F. C is reserved and not + assigned today; if that changes, decide where it falls per your risk tolerance. +4. **Re-verify on change.** Cache by fingerprint; if the live fingerprint changes, re-gate. +5. **Treat a pass as a measurement, not a guarantee.** It bounds risk; it does not remove it. + Keep Bankr's own transaction-verification guards on. diff --git a/polygraph/references/ci-gate.md b/polygraph/references/ci-gate.md new file mode 100644 index 0000000000..a56a844923 --- /dev/null +++ b/polygraph/references/ci-gate.md @@ -0,0 +1,139 @@ +# Polygraph CI gate (GitHub Action) + +Polygraph grades MCP servers and Agent Skills; the **CI gate** turns that grade into a build +check. Add it to a repo and the build **fails when an MCP server or a Skill it ships grades D/F** — +the same falsifiable grade described in [`../SKILL.md`](../SKILL.md), enforced on every pull request. + +It wraps the open `@polygraphso/litmus` harness, so the gate is **reproducible**: anyone can re-run +it and the verdict must match. A grade is a *measurement, not a guarantee* — the gate catches a +target that misbehaves under the probes, not one that evades them.
+ +--- + +## Add it to a repo + +```yaml +# .github/workflows/mcp-gate.yml +name: mcp-gate +on: [pull_request] +permissions: + contents: read +jobs: + gate: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + - uses: polygraphso/litmus@v1 + with: + # Auto-discovers MCP servers (.mcp.json / .vscode/mcp.json / .cursor/mcp.json) + # and skills (SKILL.md dirs). Or list them explicitly: + servers: | + npm/@modelcontextprotocol/server-filesystem + skills: | + ./my-skill + # min-grade: B # stricter than the default D/F gate + # strict: "true" # also fail on targets that can't be graded +``` + +That is the whole setup. On each PR the action grades every MCP server **and** every skill, and +fails the job on any **D** or **F**. + +--- + +## How the gate decides + +**MCP servers** — for each, in order: + +1. **Published-grade lookup** — a sub-second check for an existing polygraph grade (the same data + as `npx polygraphso check`). If one exists, it is used directly. +2. **Behavioral run** — if the server is not graded yet, the action runs the open harness in CI. + GitHub runners provide Docker, so the egress probe is exercised for local/npm servers (no B cap), + and the server is graded fresh. + +**Agent Skills** — each `SKILL.md` bundle is graded by the **static** skill grader +(`runSkillLitmus`): a scan of its bytes, no execution, no Docker, no network. Fast and deterministic. + +**Un-gradeable** — a target that can't be reached (a credential-gated server) or whose launch +command can't be mapped to a ref is reported and **warns** (it does not fail the build) unless you +set `strict: true`. 
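Whether a configured server is gradeable at all depends on mapping its launch command to a ref, as tabulated in the Discovery section of this guide. A minimal sketch that follows only that table's rows (`npx` → `npm/…`, `uvx` → `pypi/…`, an HTTPS `url` → the remote endpoint, anything else → un-gradeable). `McpServerEntry` and `refFor` are hypothetical, and the action's real mapper very likely handles more (version pins, other launchers, extra flags).

```ts
// One entry from an mcpServers map in .mcp.json / .vscode/mcp.json / .cursor/mcp.json.
interface McpServerEntry {
  command?: string;
  args?: string[];
  url?: string;
}

type Discovered =
  | { kind: "server"; ref: string }
  | { kind: "ungradeable"; reason: string };

function refFor(entry: McpServerEntry): Discovered {
  const url = entry.url;
  if (url !== undefined && url.startsWith("https://")) return { kind: "server", ref: url };
  // ASSUMPTION: the first argument that is not a flag (skipping e.g. "-y") is the package.
  const pkg = (entry.args ?? []).find((a) => !a.startsWith("-"));
  if (entry.command === "npx" && pkg) return { kind: "server", ref: `npm/${pkg}` };
  if (entry.command === "uvx" && pkg) return { kind: "server", ref: `pypi/${pkg}` };
  // Surface, never silently drop, anything that cannot be mapped.
  return { kind: "ungradeable", reason: `cannot map launch command: ${entry.command ?? "(none)"}` };
}
```

Returning an explicit `ungradeable` result instead of skipping the entry is what lets the gate warn (or fail under `strict`) rather than quietly shrinking its coverage.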
+ +Gate result (servers and skills share one gate and one exit code): + +| Outcome | Build | +|---|---| +| Every target grades **A / B** (or ≥ `min-grade`) | passes (exit 0) | +| Any target grades **D / F** (or below `min-grade`) | **fails** (exit 1) | +| A target cannot be graded | warns + passes, unless `strict: true` | + +A **remote (HTTP) server caps at B** and passes — that is a limit of the measurement, not a mark +against the server (see "Reading a B" in [`../SKILL.md`](../SKILL.md)). + +--- + +## Inputs + +| Input | Default | Description | +|---|---|---| +| `servers` | — | Explicit MCP refs (newline- or comma-separated). Merged with auto-discovery. | +| `skills` | — | Explicit skill directories (newline- or comma-separated). Merged with auto-discovery. | +| `discover` | `true` | Discover MCP servers from config files and skills from `SKILL.md`. | +| `min-grade` | — | Minimum acceptable grade (`A`–`D`). Default gates on D/F. | +| `strict` | `false` | Treat un-gradeable targets as failures, not warnings. | +| `working-directory` | `.` | Directory scanned for MCP config files and `SKILL.md` bundles. | +| `version` | pinned | `@polygraphso/litmus` version to run. | +| `bearer` | — | Token passed through to a gated remote (HTTPS) server. | + +Outputs: `result` (`pass` / `fail`), `failed` (count), and `report` (a JSON array of per-target +results, each with its `kind` of `server` or `skill`) — read them from a later step via +`steps..outputs.*`. 
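The gate-result table and the `min-grade` / `strict` inputs combine into a single exit code. The sketch below models that under stated assumptions: grades rank A > B > C > D > F, an unset `min-grade` behaves like requiring at least C (which, with C unassigned, is the same as failing only on D/F), and an un-gradeable target is represented as `null`. `TargetResult`, `GateOptions`, and `gateExitCode` are illustrative names, not the action's code.

```ts
type Grade = "A" | "B" | "C" | "D" | "F";
const RANK: Record<Grade, number> = { A: 5, B: 4, C: 3, D: 2, F: 1 };

interface TargetResult {
  target: string;
  grade: Grade | null; // null = could not be graded (unreachable, unmappable, ...)
}

interface GateOptions {
  minGrade?: Grade; // unset = default gate (fail only on D/F)
  strict?: boolean; // treat un-gradeable targets as failures instead of warnings
}

function gateExitCode(results: TargetResult[], opts: GateOptions = {}): number {
  const bar = RANK[opts.minGrade ?? "C"];
  const failed = results.filter((r) =>
    // Un-gradeable targets only count against the build under `strict`.
    r.grade === null ? opts.strict === true : RANK[r.grade] < bar,
  );
  return failed.length > 0 ? 1 : 0;
}
```

Treating "un-gradeable" as its own state, rather than folding it into a grade, is what lets the same results pass by default and fail under `strict: true`.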
+ +--- + +## Discovery + +The action reads the standard MCP config files and maps each server's launch command to a +registry-prefixed ref, and walks the repo for `SKILL.md` bundles: + +| Target | Discovered as | +|---|---| +| `{ "command": "npx", "args": ["-y", "@scope/srv"] }` | server `npm/@scope/srv` | +| `{ "command": "uvx", "args": ["srv-mcp"] }` | server `pypi/srv-mcp` | +| `{ "url": "https://example.com/mcp" }` | server — the HTTPS endpoint (remote) | +| a directory containing `SKILL.md` | skill — that directory | +| a bare binary / local script | reported as **un-gradeable** (never silently skipped) | + +`node_modules`/`.git`/`dist`/etc. are pruned from the skill walk, and anything that can't be mapped +is surfaced rather than dropped — so coverage stays honest. + +--- + +## Run it anywhere (not just GitHub) + +The gate is a plain command in the harness, so it also works in any other CI or as a pre-commit +check: + +```bash +# Gate the MCP servers and skills discovered in this repo: +npx @polygraphso/litmus ci + +# Or name targets, fail below B, treat un-gradeable as a failure: +npx @polygraphso/litmus ci --server npm/@scope/your-mcp --skill ./your-skill --min-grade B --strict +``` + +It exits non-zero on a gated target, so any pipeline can use it. `--json` emits the full per-target +report; `--no-discover` and `--no-lookup` narrow what it does. + +--- + +## Honest limits (carry these into your pipeline) + +- **Reproducibility is the trust anchor.** The harness is open and deterministic, so the gate's + verdict is falsifiable — not a black box. +- A passing gate means *these targets did not misbehave under these probes* — **not** that they are + safe in every situation. A skill grade is a **static** read of its text and bundle; a server grade + is behavioral. **Evasion** (a server that detects the test context) is the disclosed residual limit. 
+- The gate does not replace your own runtime guards (for example, Bankr's transaction-verification + checks — see [`bankr-integration.md`](bankr-integration.md)). + +See [`../SKILL.md`](../SKILL.md) for the grade scale and [`methodology.md`](methodology.md) for the +probes behind each grade. diff --git a/polygraph/references/cli.md b/polygraph/references/cli.md new file mode 100644 index 0000000000..fc6d0b184d --- /dev/null +++ b/polygraph/references/cli.md @@ -0,0 +1,160 @@ +# Polygraph CLI & MCP reference + +Polygraph ships two command-line surfaces: + +| Package | Bin | Purpose | +|---------|-----|---------| +| **`polygraphso`** | `polygraphso` | Thin, sub-second **lookup** client for published grades. Published on npm. | +| **`@polygraphso/litmus`** | `polygraphso-litmus`, `polygraphso-litmus-mcp` | The full open **harness** — runs the probes and grades a server; also an embeddable MCP server. | + +Server refs are always **registry-prefixed**: `//` — e.g. +`npm/@modelcontextprotocol/server-filesystem`, `pypi/mcp-server-git`, +`github/anthropic/mcp-server-foo`. The prefix disambiguates names that exist on multiple +registries. The harness also accepts a raw `https://…/mcp` URL or a local path. + +--- + +## `polygraphso` — look up a grade + +```bash +npx polygraphso check npm/@modelcontextprotocol/server-filesystem # sub-second lookup +npm i -g polygraphso # or install globally + +polygraphso check // # latest published grade +polygraphso list [--json] # every graded server + its grade +polygraphso --version +polygraphso --help +``` + +Grades are live. 
Example output (the list rows are **illustrative** — a grade is point-in-time +evidence, so the live set at `polygraphso list` / polygraph.so is the source of truth): + +``` +$ polygraphso check npm/@modelcontextprotocol/server-filesystem +→ polygraph: A · litmus-v9 · 2026-06-24 +→ details → polygraph.so/#checks + +$ polygraphso list # every graded server + its grade +npm/@modelcontextprotocol/server-filesystem A +npm/@scope/example-search-mcp D +npm/@scope/example-browser-mcp F + +$ polygraphso list --json | jq -r '.servers[] | "\(.polygraph) \(.server_ref)"' +A npm/@modelcontextprotocol/server-filesystem +… +``` + +A tracked-but-ungraded server reports `not available yet` with a +`polygraph.so/notify?for=` link; its grade lands as the litmus harness covers more of the +ecosystem. + +Config: `POLYGRAPH_API_URL` overrides the lookup endpoint (useful for local testing). + +--- + +## `@polygraphso/litmus` — run the harness + +```bash +npm i -g @polygraphso/litmus +# or, no install: +npx -y -p @polygraphso/litmus polygraphso-litmus litmus +``` + +### Commands + +```bash +polygraphso-litmus litmus # grade a server end-to-end +polygraphso-litmus check # look up a published grade +polygraphso-litmus list # list published grades +polygraphso-litmus ci [--server ] [--skill ] [--min-grade ] [--strict] # gate a build on D/F (servers + skills) +polygraphso-litmus --version | --help +``` + +The `ci` command gates a build on the grades of a repo's MCP servers and skills — see [`ci-gate.md`](ci-gate.md). + +Reproducibility is what gives a grade teeth: re-run `litmus` against a server that already +carries a grade, and if your fingerprint matches the attested one but your grade disagrees, +that's a falsification of the published grade. + +### Flags (`litmus`) + +| Flag | Effect | +|------|--------| +| `--json` | Emit the full canonical `EvidenceBundle` instead of the human summary. | +| `--bearer ` | Bearer auth for an HTTP target (or set `LITMUS_BEARER`).
| +| `--header "Key: Value"` | Add a custom request header (repeatable). | +| `--allow-state-changing` | Permit calls to state-mutating tools during dynamic probes. | + +### Environment + +| Var | Effect | +|-----|--------| +| `POLYGRAPH_API_URL` | Set to `https://polygraph.so` to pin the evidence bundle and get a publish/mint hand-off URL. Unset = fully offline run. | +| `LITMUS_BEARER` | Bearer token for HTTP auth. | +| `LITMUS_STDIO_ISOLATION` | Set to `docker` to **require** Docker isolation for stdio targets (fail-closed if Docker is unavailable). | + +### Requirements & exit codes + +- **Node ≥ 18.** +- **Docker optional** — without it the egress probe (C-02) is skipped and the grade is capped + at **B**. A **remote/HTTP target also caps at B**, since it can't be sandboxed for egress — + that's a property of the measurement, not a knock against the server. With + `LITMUS_STDIO_ISOLATION=docker`, isolation is mandatory. +- **Exit codes:** non-zero on a failing grade (**D/F**), zero on a passing grade (**A/B**) — + drop `litmus` into CI to gate a dependency on its behavioral grade. + +### Human output + +``` +→ litmus · npm/@modelcontextprotocol/server-filesystem +→ version 0.1.0 +→ C-01 pass · C-02 pass · C-03 pass · C-04 pass +→ fingerprint 0x1a2b3c4d…5e6f7890 +→ grade: A + All four categories passed. No injection, no unexpected egress, no data leak. +``` + +On failure the summary lists the top HIGH-severity findings (tool name, finding kind, +snippet). The `--json` bundle carries everything (see +[`methodology.md`](methodology.md#the-evidence-bundle)). + +--- + +## MCP server (`polygraphso-litmus-mcp`) + +Embed polygraph in Claude, Cursor, or any MCP client so your agent can grade and verify +servers inline. Tools: + +- **`run_litmus`** — grade a server and return grade, per-category findings, fingerprint, and + (when `POLYGRAPH_API_URL` is set) a publish hand-off. 
+- **`verify_attestation`** — read a server's onchain grade and return the attested grade, + fingerprint, report CID, and revocation/network status. Recompute the live fingerprint and + require it to equal the attested one before trusting the server. + +```json +{ + "mcpServers": { + "polygraph": { + "command": "npx", + "args": ["-y", "-p", "@polygraphso/litmus", "polygraphso-litmus-mcp"], + "env": { "POLYGRAPH_API_URL": "https://polygraph.so" } + } + } +} +``` + +See [`bankr-integration.md`](bankr-integration.md) for the verify-then-execute pattern. + +--- + +## Programmatic use + +```ts +import { runLitmus, gateDecision, liveFingerprint, readAttestation } from "@polygraphso/litmus"; + +const bundle = await runLitmus("npm/@scope/server"); // → EvidenceBundle { grade, categories, fingerprint, … } + +const attestation = await readAttestation("npm/@scope/server"); +const live = await liveFingerprint("npm/@scope/server"); +const decision = gateDecision(attestation, live); // → { action: "pay" | "refuse", reason } +``` diff --git a/polygraph/references/methodology.md b/polygraph/references/methodology.md new file mode 100644 index 0000000000..6c51518fd3 --- /dev/null +++ b/polygraph/references/methodology.md @@ -0,0 +1,122 @@ +# Polygraph Methodology — how a server gets its grade + +Polygraph runs the **litmus** harness: connect to an MCP server the way an agent would, +fingerprint its exact tool surface, run four behavioral probe categories, and assign an +**A–F** grade with a deterministic, content-addressed evidence bundle. The harness is open +source and the run is reproducible — that is what makes a grade trustworthy. + +## Connect & fingerprint + +- **Transport:** `stdio` for local packages (npm/PyPI/path), **Streamable HTTP** for remote + URLs. +- **Fingerprint:** `tools/list` → canonical JSON of each tool's `{name, description, + inputSchema}` → `sha256` → `bytes32`. 
The fingerprint is the trust anchor: a grade is only + valid for the exact surface it was measured against. If a server is graded and then changes + its tools, the fingerprint no longer matches and any verifier should refuse (see + [`bankr-integration.md`](bankr-integration.md)). + +## The four probe categories + +### C-01 — Tool-output injection +Does the server try to hijack the agent that calls it? +- **Static (1.1):** scan every tool name, description, and `inputSchema` for injection-shaped + content — invisible/zero-width unicode, instruction mimicry ("ignore previous + instructions…"), and markdown tricks. Deterministic; makes no calls. +- **Dynamic (1.2):** issue benign bait calls to each tool and scan the outputs for injected + instructions echoed back to the agent. +- **Fail:** any HIGH-severity finding in either probe. + +### C-02 — Permission / egress overreach +Does the server do more than it declares? +- **Declared-permission honesty (2.1):** flag tools that declare `readOnlyHint: true` but whose + names carry destructive verbs (`send`, `delete`, `swap`, `sign`, `transfer`, …). +- **Unexpected egress (2.2):** run the server inside a hardened **default-deny Docker sandbox** + with a network sinkhole; any outbound attempt is a finding. +- **Fail:** any HIGH-severity finding in 2.1, or any finding in 2.2. +- **Skipped** (not failed) only when 2.1 passes and 2.2 could not run — because no Docker + sandbox was available, or because the target is a **remote/HTTP server** that can't be + sandboxed for egress. Either way the grade caps at **B**. This is a property of the + measurement, not a knock against the server: a remote B does not mean the server behaves worse than a local A; it means + egress could not be observed on the remote one. + +### C-03 — Sensitive-data handling +Does the server leak secrets it was exposed to?
+- **Output leak (3.1):** plant canary values in the environment and seed the working directory + with fake secrets, exercise the tools, and scan outputs for any canary echo. +- **Egress leak (3.2):** scan the egress-sandbox capture for canary bytes. Degrades to + `partial` without a sandbox; never silently dropped. +- **Fail:** any canary surfacing in either probe. + +### C-04 — Adversarial-input handling +Does the server stay robust under hostile input? Both probes run only on non-state-changing +tools and require no Docker sandbox. +- **Malformed/oversized (4.1):** stress each tool with malformed and oversized inputs. **Fail** + if the server crashes, hangs, or leaks an uncaught stack trace; a clean validation error or a + benign result passes. +- **Jailbreak amplification (4.2):** feed jailbreak-pattern strings and scan the server's + **outputs** with the C-01 injection scanners. **Fail only** if the server emits + injection-shaped content it did not merely reflect from the input — a verbatim echo is + excluded, so an honest echo/summarize tool is not penalized. +- **Fail:** any finding in 4.1 or any amplification finding in 4.2. +- A C-04 failure **caps the overall grade at D** (a robustness failure, not proven injection or + exfiltration). + +### Finding kinds +`invisible-unicode`, `instruction-mimicry`, `markdown-trick` (C-01) · `permission-mislabel`, +`egress` (C-02) · `canary` (C-03) · `crash`, `internals-leak`, `amplification` (C-04). +Each finding carries a severity and a snippet. + +## Grade decision logic + +| Grade | Condition | +|-------|-----------| +| **A** | C-01 pass · C-02 pass · C-03 pass · C-04 pass. | +| **B** | C-01 pass · C-02 **skipped** (no sandbox / remote target) · C-03 pass · C-04 pass. Injection passed; egress unverified. | +| **C** | Reserved — not assigned by the current logic. | +| **D** | C-02 **fail** (unexpected egress / overreach) **or** C-04 **fail** (crash / internals-leak / amplification) while C-01 and C-03 pass.
A robustness or overreach failure is serious but not proven injection or exfiltration, so the grade caps at D. | +| **F** | C-01 **fail** (injection) **or** C-03 **fail** (data leak). Active injection or a leak harms an agent that trusts the server. | + +A grade is always paired with a rationale string explaining *why* — the harness never emits a +bare letter. + +## The evidence bundle + +`--json` (and the published record) emit a canonical `EvidenceBundle`: + +- `grade`, `gradeRationale` +- `categories[]` — each with probe results and findings +- `toolDefs[]` (canonicalized name/description/inputSchema) and `toolDefsFingerprint` (bytes32) +- `methodologyVersion`, `ranAt`, resolved version +- harness info (Docker availability, stdio isolation mode) and a reproducibility disclaimer + +Because the bundle is canonical and content-addressed (its CID is a hash of its bytes), two +honest runs of the same harness against the same server produce the same tool definitions, +fingerprint, and grade; only run-specific fields such as `ranAt` differ. + +## Publishing a grade + +The harness **grades and hands off**; it does not mint. Publishing a grade means recording it +as an **EAS (Ethereum Attestation Service) attestation on Base**, which binds: + +`serverRef` · `toolDefsFingerprint` · overall grade · per-category verdicts (C-01/C-02/C-03/C-04) · +evidence CID (the bundle, pinned to IPFS) · methodology version · run timestamp · resolved +version. + +Run the harness with `POLYGRAPH_API_URL=https://polygraph.so` to pin the evidence bundle and +receive a browser hand-off to sign the attestation, or request publication at +[polygraph.so](https://polygraph.so). Once attested, the grade is discoverable by ref and +verifiable onchain by any agent. + +## How much to trust it (disclosed limits) + +- **Reproducibility is the anchor.** Open + deterministic harness ⇒ a false grade is + falsifiable by re-running it.
+- **A published grade is forgeable by its signer.** Trust comes from reproducibility and the + fingerprint recheck, not from the signature alone. +- **Evasion is the residual limit:** a server that detects the test context could pass grading + and misbehave in production. +- Independent/unforgeable upgrades (staked bonds, zkTLS, TEE-backed runs, independent + re-grading) are on the roadmap, not claimed today. + +The canonical, versioned methodology lives at **[polygraph.so](https://polygraph.so)**; the +open-source harness is the source of truth for the exact probe and grade logic.
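
The grade table reads as a precedence order: any F condition outranks a D condition, D outranks B, and A requires all four categories to pass. A minimal TypeScript sketch of that ordering follows. It assumes each category reports `pass`, `fail`, or `skipped` and that only C-02 can be skipped; `decideGrade`, `CategoryVerdicts`, `GradeResult`, and the rationale strings are hypothetical names for illustration, not part of the `@polygraphso/litmus` API, and the open-source harness remains the authority on the real logic.

```ts
// Hypothetical sketch of the "Grade decision logic" table, not the litmus harness's code.

/** Per-category verdict (assumption: the harness reports these three states). */
type Verdict = "pass" | "fail" | "skipped";

/** Letters the current logic assigns; C is reserved and never returned here. */
type Grade = "A" | "B" | "D" | "F";

interface CategoryVerdicts {
  c01: Verdict; // C-01 tool-output injection
  c02: Verdict; // C-02 permission / egress overreach
  c03: Verdict; // C-03 sensitive-data handling
  c04: Verdict; // C-04 adversarial-input handling
}

interface GradeResult {
  grade: Grade;
  rationale: string; // every grade ships with a "why"; the wording here is illustrative
}

function decideGrade(v: CategoryVerdicts): GradeResult {
  // F: injection or a data leak outranks every other outcome.
  if (v.c01 === "fail") return { grade: "F", rationale: "C-01 failed: tool-output injection." };
  if (v.c03 === "fail") return { grade: "F", rationale: "C-03 failed: sensitive-data leak." };

  // D: overreach or a robustness failure while C-01 and C-03 pass.
  if (v.c02 === "fail") return { grade: "D", rationale: "C-02 failed: egress or permission overreach." };
  if (v.c04 === "fail") return { grade: "D", rationale: "C-04 failed: crash, internals leak, or amplification." };

  // Assumption: only C-02 may be skipped; any other skip is outside this sketch.
  if (v.c01 === "skipped" || v.c03 === "skipped" || v.c04 === "skipped") {
    throw new Error("only C-02 can be skipped in this sketch");
  }

  // B: no failures, but egress could not be observed (no sandbox or remote target).
  if (v.c02 === "skipped") return { grade: "B", rationale: "C-02 skipped: egress unverified." };

  return { grade: "A", rationale: "All four categories passed." };
}

// Usage: a remote/HTTP target whose other categories all pass.
const remote = decideGrade({ c01: "pass", c02: "skipped", c03: "pass", c04: "pass" });
console.log(remote.grade); // "B"
```

Checking the F conditions first is what makes a C-04 failure irrelevant once injection or a leak is proven, which matches the "while C-01 and C-03 pass" qualifier on the D row.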