MCP Firewall is a thin governance proxy that sits between MCP clients and upstream MCP servers. It requires governed/bonded authorization for protected tool calls before forwarding them upstream.
In the intended public path, the Governed WriteFile Demo is the cold-reader starting point. This repo is the verification layer behind that demo for readers who want implementation depth.
The current shipped proof is narrow: on two current filesystem proof surfaces, the firewall does not resolve from upstream-reported success alone. It independently verifies the filesystem effect it can observe on disk and resolves from that observed effect.
Centralized MCP portals, access controls, logging, and cost controls are valuable and necessary parts of enterprise MCP governance. They help decide which clients, users, and tools should be allowed to interact, and they make that activity visible to operators.
Access governance is not the same as proof that a tool produced the intended effect. MCP Firewall's shipped claim is narrower: for its supported write_file and delete_file proof surfaces, it records the intended filesystem effect and independently verifies the observed on-disk outcome before resolving.
This is not general MCP security, not a replacement for enterprise MCP gateways, and not a claim to verify arbitrary tools.
- governed write_file on the supported write proof path
- governed single-path delete_file on the dedicated delete-capable upstream fixture/demo path
Both proof surfaces are intentionally small. They depend on:
- effects confined to governed_root
- a shared filesystem view between firewall and upstream
- deterministic postconditions the firewall can check directly
For the delete_file surface, the claim attaches only to the local delete-file-test-server fixture. The pinned reference upstream @modelcontextprotocol/server-filesystem still does not expose a native named delete_file tool, so the delete_file proof does not attach to that upstream.
For these two shipped proof surfaces only, the firewall already performs governed-root before/after snapshot-diff verification in substance. The verifier is not limited to checking whether the requested target exists or whether requested content shows up at the target path. On these proof surfaces it captures a full governed-root snapshot before forwarding, captures a full governed-root snapshot after the upstream returns, diffs the changed governed paths, and treats non-target governed-path mutation as unexpected and malicious. On delete_file, it also records target pre-state and rejects missing or non-regular targets before forwarding. This still does not justify any broader general MCP verification claim.
What this repo shows:
- a thin governance proxy can sit in front of MCP tool calls and require governed/bonded authorization
- for the shipped write_file and delete_file proof surfaces only, it resolves from governed-root before/after snapshot-diff verification rather than upstream-reported success alone
- on those proof surfaces, it verifies the requested target effect and also detects other governed-path mutation
- a compromised upstream can claim "success" and still be caught when no effect happened, the wrong governed path changed, or the requested write/delete outcome is wrong
This repo does not claim:
- general MCP security or general MCP verification
- independent verification for all MCP tools, all upstream servers, or all upstream results
- arbitrary tool verification
- a general proof against all compromised MCP behavior
- a claim that every upstream result can be independently verified
Important: Start here first. The fastest outsider-readable proof path is the Governed WriteFile Demo. Run that first, then come back here for the implementation details behind this repo's firewall, verifier, policy gate, and audit trail.
For a short visual introduction to the AgentGate / MCP Firewall idea, see:
AgentGate explainer thread on X
Part 3 covers the governed write_file example directly.
An MCP client normally has to trust the upstream MCP server's answer about whether a tool call succeeded. That is not a safe assumption for a governance proxy. If the upstream is compromised or dishonest, it can claim success without producing the intended effect, or it can produce a different effect than the one the client requested.
The repo's claim remains intentionally small: it shows that the firewall can govern a small set of independently checkable filesystem effects without treating upstream self-report as authoritative. The first shipped proof surface was write_file, because it is easy to verify mechanically and easy to demonstrate honestly; the repo now also includes a second narrow delete_file proof surface on its dedicated test/demo upstream.
In scope for this release:
- one upstream filesystem-style MCP server
- one high-risk tool surface: write_file
- effects confined to governed_root
- a shared filesystem view between firewall and upstream
- one deterministic verifier for observable filesystem write outcomes
- honest and dishonest upstream test scenarios
- structured outcome logging that records the basis for each governed decision
Not claimed in this release:
- generalized attestation
- anomaly scoring or reputation systems
- cryptographic proof of remote execution
- coverage for every filesystem tool
- protection against all possible upstream side effects
For the shipped write_file and delete_file proof surfaces, the verifier works from the firewall's own filesystem view. It captures a full governed-root snapshot before forwarding, forwards the request, captures a full governed-root snapshot after the upstream returns, diffs the changed paths, and then evaluates both the requested target effect and any other governed-path mutation it observed.
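The before/after snapshot-diff idea can be illustrated with a minimal sketch. This is not the repo's verifier: the Entry and Snapshot shapes and the takeSnapshot and diffSnapshots helpers are hypothetical names, and the temporary directory stands in for governed_root.

```typescript
import { createHash } from "node:crypto";
import { lstatSync, mkdtempSync, readdirSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join, relative } from "node:path";

// Hypothetical shapes for illustration; the repo's real types may differ.
type Entry = { kind: "file" | "dir" | "other"; size: number; sha256: string };
type Snapshot = Map<string, Entry>;

// Record one entry per path under root, keyed by path relative to root.
function takeSnapshot(root: string, dir: string = root, out: Snapshot = new Map()): Snapshot {
  for (const name of readdirSync(dir)) {
    const abs = join(dir, name);
    const st = lstatSync(abs); // lstat: symlinks are recorded as "other", not followed
    const rel = relative(root, abs);
    if (st.isDirectory()) {
      out.set(rel, { kind: "dir", size: 0, sha256: "" });
      takeSnapshot(root, abs, out);
    } else if (st.isFile()) {
      const sha256 = createHash("sha256").update(readFileSync(abs)).digest("hex");
      out.set(rel, { kind: "file", size: st.size, sha256 });
    } else {
      out.set(rel, { kind: "other", size: 0, sha256: "" });
    }
  }
  return out;
}

// List every relative path that was added, removed, or changed between snapshots.
function diffSnapshots(before: Snapshot, after: Snapshot): string[] {
  const changed = new Set<string>();
  after.forEach((a, p) => {
    const b = before.get(p);
    if (!b || b.kind !== a.kind || b.size !== a.size || b.sha256 !== a.sha256) changed.add(p);
  });
  before.forEach((_b, p) => {
    if (!after.has(p)) changed.add(p);
  });
  return Array.from(changed).sort();
}

// Usage: snapshot, let a stand-in "upstream" write one file, snapshot again, diff.
const root = mkdtempSync(join(tmpdir(), "governed-root-"));
writeFileSync(join(root, "existing.txt"), "unchanged");
const before = takeSnapshot(root);
writeFileSync(join(root, "target.txt"), "hello");
const after = takeSnapshot(root);
const changedPaths = diffSnapshots(before, after);
```

Here changedPaths contains only the path that the stand-in write touched. Any additional entry would be a non-target governed-path mutation.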
For governed write_file calls, the intended effect is:
- the exact target path should exist as a regular file
- that file's content hash and byte size should match the requested content
- no other path inside governed_root should have changed during the call
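As a rough illustration of the three write_file conditions above, the sketch below checks them against a temporary directory. The checkWriteEffect helper and its return labels are hypothetical, and changedPaths is assumed to come from a before/after snapshot diff of governed_root.

```typescript
import { createHash } from "node:crypto";
import { lstatSync, mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

const sha256Hex = (data: Buffer): string => createHash("sha256").update(data).digest("hex");

// lstat-based so a symlink is never accepted as a regular file.
function isRegularFile(abs: string): boolean {
  try {
    return lstatSync(abs).isFile();
  } catch {
    return false;
  }
}

// Hypothetical labels for illustration; not the repo's reason codes.
type WriteCheck = "ok" | "target-not-regular-file" | "content-mismatch" | "non-target-mutation";

function checkWriteEffect(root: string, target: string, requested: Buffer, changedPaths: string[]): WriteCheck {
  const abs = join(root, target);
  if (!isRegularFile(abs)) return "target-not-regular-file";
  const actual = readFileSync(abs);
  // Compare both byte size and content hash against the requested content.
  if (actual.length !== requested.length || sha256Hex(actual) !== sha256Hex(requested)) return "content-mismatch";
  if (changedPaths.some((p) => p !== target)) return "non-target-mutation";
  return "ok";
}

// Usage against a temporary stand-in for governed_root.
const root = mkdtempSync(join(tmpdir(), "governed-root-"));
writeFileSync(join(root, "note.txt"), "hello");
const honest = checkWriteEffect(root, "note.txt", Buffer.from("hello"), ["note.txt"]);
const mismatch = checkWriteEffect(root, "note.txt", Buffer.from("other"), ["note.txt"]);
const extra = checkWriteEffect(root, "note.txt", Buffer.from("hello"), ["evil.txt", "note.txt"]);
const missing = checkWriteEffect(root, "absent.txt", Buffer.from("hello"), []);
```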
For governed single-path delete_file calls on the dedicated delete fixture in this repo, the intended effect is:
- the exact target path must exist as a regular file before forwarding, or the call is rejected before forward as failed
- that exact target path should be absent after the upstream-reported success
- no other path inside governed_root should have changed during the call
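The delete_file conditions above split into a pre-forward gate and a post-return check. The sketch below shows one way to express both. The helper names and return labels are hypothetical, and changedPaths is again assumed to come from a governed-root snapshot diff.

```typescript
import { lstatSync, mkdtempSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// lstat-based checks so a symlink is never mistaken for a regular file.
function isRegularFile(abs: string): boolean {
  try {
    return lstatSync(abs).isFile();
  } catch {
    return false;
  }
}

function pathExists(abs: string): boolean {
  try {
    lstatSync(abs);
    return true;
  } catch {
    return false;
  }
}

// Pre-forward gate: the target must already exist as a regular file.
function eligibleForDelete(root: string, target: string): boolean {
  return isRegularFile(join(root, target));
}

// Post-return check. The labels are hypothetical, not the repo's reason codes.
type DeleteCheck = "ok" | "target-still-present" | "non-target-mutation";

function checkDeleteEffect(root: string, target: string, changedPaths: string[]): DeleteCheck {
  if (changedPaths.some((p) => p !== target)) return "non-target-mutation";
  if (pathExists(join(root, target))) return "target-still-present";
  return "ok";
}

// Usage against a temporary stand-in for governed_root.
const root = mkdtempSync(join(tmpdir(), "governed-root-"));
writeFileSync(join(root, "old.txt"), "bye");
const eligible = eligibleForDelete(root, "old.txt"); // regular file, may be forwarded
const ineligible = eligibleForDelete(root, "missing.txt"); // rejected before forwarding
const lied = checkDeleteEffect(root, "old.txt", []); // success claimed, nothing deleted
rmSync(join(root, "old.txt"));
const honest = checkDeleteEffect(root, "old.txt", ["old.txt"]);
```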
The shipped proof surfaces use a simple deterministic mapping:
- verified intended effect present -> success
- claimed success but intended effect not observed -> failed
- claimed success with a policy-violating observed effect -> malicious
Concretely:
- write_file: target file missing after upstream success -> failed
- write_file: target file content mismatch -> malicious
- write_file or delete_file: non-target governed-path mutation -> malicious
- delete_file: target missing or non-regular in pre-state -> failed
- delete_file: target still present unchanged after upstream success -> failed
- verifier internal failure -> failed
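The concrete mapping above is small enough to write as a lookup table. In this sketch only the three resolution values and the pairings come from the list. The Finding labels are hypothetical names chosen for illustration.

```typescript
type Resolution = "success" | "failed" | "malicious";

// Hypothetical labels for the verifier findings listed above.
type Finding =
  | "intended-effect-verified"
  | "write-target-missing"
  | "write-content-mismatch"
  | "non-target-mutation"
  | "delete-prestate-ineligible"
  | "delete-target-unchanged"
  | "verifier-internal-failure";

// A Record over the union makes the compiler reject any finding left unmapped.
const RESOLUTION_BY_FINDING: Record<Finding, Resolution> = {
  "intended-effect-verified": "success",
  "write-target-missing": "failed",
  "write-content-mismatch": "malicious",
  "non-target-mutation": "malicious",
  "delete-prestate-ineligible": "failed",
  "delete-target-unchanged": "failed",
  "verifier-internal-failure": "failed",
};

function resolveFinding(finding: Finding): Resolution {
  return RESOLUTION_BY_FINDING[finding];
}
```

Because the mapping is a pure table, the same function can drive both the outcome returned to the client and the bonded-action resolution.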
The firewall returns the governed outcome and, when AgentGate is configured, resolves the bonded action with the same mapping.
- The client sends write_file to the firewall.
- The firewall verifies AgentGate identity and bond state as usual.
- The firewall validates that the requested path stays inside governed_root.
- The firewall records the bonded action.
- The firewall snapshots the governed tree and records the intended effect.
- The firewall forwards write_file to the upstream server.
- The upstream returns success or failure.
- The firewall independently verifies the postcondition on disk.
- The firewall resolves the action from the observed effect, not from the upstream claim alone.
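The path-validation step in the flow above can be sketched as a lexical containment check. This is an assumption about how such a check might look, not the repo's implementation. The staysInsideRoot name is hypothetical, and the check does not resolve symlinks, which a real gate would also need to consider.

```typescript
import { isAbsolute, relative, resolve, sep } from "node:path";

// Lexical containment check: resolve the request against governed_root and
// require that the result is a strict descendant of the root.
function staysInsideRoot(governedRoot: string, requestedPath: string): boolean {
  const absolute = resolve(governedRoot, requestedPath);
  const rel = relative(governedRoot, absolute);
  if (rel === "") return false; // the root itself is not a valid file target
  if (rel === ".." || rel.startsWith(".." + sep)) return false; // escapes upward
  return !isAbsolute(rel); // different drive or root on platforms that have them
}

const inside = staysInsideRoot("/srv/governed", "notes/a.txt");
const traversal = staysInsideRoot("/srv/governed", "../etc/passwd");
const absoluteEscape = staysInsideRoot("/srv/governed", "/etc/passwd");
const dottedName = staysInsideRoot("/srv/governed", "..hidden.txt");
```

Comparing against ".." plus the separator, rather than a bare ".." prefix, keeps a legitimate file name such as ..hidden.txt from being rejected.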
The repo now includes deterministic coverage for these three scenarios:
- Honest upstream: the upstream returns success, the exact file is written, verification passes, and the final resolution is success.
- Lying upstream, no actual effect: the upstream returns success, no file appears, verification fails, and the final resolution is failed.
- Lying upstream, wrong or forbidden effect: the upstream returns success, a different governed path is written, verification detects the unexpected change, and the final resolution is malicious.
There is also a focused failure-path test where the verifier itself throws. In that case the firewall still fails closed and does not treat upstream success as authoritative.
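Failing closed when the verifier throws can be sketched as a wrapper around the verification call. The resolveFailClosed helper, the Decision shape, and the reason strings are hypothetical. Only the rule that a verifier failure resolves to failed comes from the text.

```typescript
type Resolution = "success" | "failed" | "malicious";

interface Decision {
  resolution: Resolution;
  reason: string;
}

// Run the verifier. If it throws, resolve as failed rather than falling back
// to whatever the upstream reported.
function resolveFailClosed(runVerifier: () => Resolution): Decision {
  try {
    return { resolution: runVerifier(), reason: "resolved-from-observed-effect" };
  } catch (err) {
    const detail = err instanceof Error ? err.message : String(err);
    return { resolution: "failed", reason: `verifier-internal-failure: ${detail}` };
  }
}

const crashed = resolveFailClosed(() => {
  throw new Error("snapshot unreadable");
});
const healthy = resolveFailClosed(() => "success");
```

The upstream-reported status is deliberately not an input to this function, so there is no code path on which it can override the verifier.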
For each governed write_file and delete_file decision on these shipped proof surfaces, the firewall emits a structured FIREWALL_OUTCOME log entry with:
- requested tool call
- intended effect
- upstream reported status and summary
- independent governed-root snapshot/diff verification result
- final resolution
- reason code and reason text
This is meant to make each decision inspectable without re-reading raw transport traffic.
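For readers who want to consume these entries, the sketch below shows one possible shape. The names upstreamReported.status, verification.status, and finalResolution appear in the demo's evidence summary. Every other field name, the single-line "FIREWALL_OUTCOME {json}" layout, and the example reason values are assumptions made for illustration, so check the real log output before relying on them.

```typescript
type Resolution = "success" | "failed" | "malicious";

interface FirewallOutcome {
  requestedToolCall: { tool: string; path: string }; // hypothetical field names
  intendedEffect: string; // hypothetical
  upstreamReported: { status: string; summary: string };
  verification: { status: string; changedPaths: string[] }; // changedPaths is hypothetical
  finalResolution: Resolution;
  reasonCode: string; // hypothetical
  reasonText: string; // hypothetical
}

const PREFIX = "FIREWALL_OUTCOME ";

// Assumed layout: the marker followed by one JSON object on a single line.
function formatOutcomeLine(entry: FirewallOutcome): string {
  return PREFIX + JSON.stringify(entry);
}

// Returns null for log lines that are not outcome entries.
function parseOutcomeLine(line: string): FirewallOutcome | null {
  if (!line.startsWith(PREFIX)) return null;
  return JSON.parse(line.slice(PREFIX.length)) as FirewallOutcome;
}

const exampleLine = formatOutcomeLine({
  requestedToolCall: { tool: "write_file", path: "flagship-demo-output.txt" },
  intendedEffect: "regular file at target with requested content",
  upstreamReported: { status: "success", summary: "wrote file" },
  verification: { status: "verified", changedPaths: ["flagship-demo-output.txt"] },
  finalResolution: "success",
  reasonCode: "EXAMPLE_CODE",
  reasonText: "example reason text",
});
const roundTripped = parseOutcomeLine(exampleLine);
```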
MCP Client
|
| Streamable HTTP
v
+------------------------+
| MCP Firewall |
| auth, bond gate, |
| path validation, |
| write/delete verifier |
+------------------------+
|
| Streamable HTTP
v
+------------------------+
| Upstream MCP Server |
| filesystem-style tool |
| surface |
+------------------------+
|
v
governed_root on disk
For the honest write_file path in tests and local demos, the upstream is @modelcontextprotocol/server-filesystem behind the included HTTP wrapper.
If you are new to the project, run the companion Governed WriteFile Demo first. It is the shortest outsider-readable proof of the shipped thesis.
Come back here when you want the implementation-level run that exercises this repo directly. If you only run one thing in this repo itself, run this demo.
It is the shortest honest path through the real governed write_file flow. It reuses the same happy-path sequence already proven in the filesystem end-to-end test:
- start the filesystem wrapper
- start MCP Firewall with a write_file-only policy
- register executor, resolver, and client identities on AgentGate
- lock executor and client bonds
- authenticate the MCP session with a signed authenticate call
- call governed write_file
- verify the written file on disk while the firewall emits the real FIREWALL_OUTCOME audit log
One successful run gives you three inspectable artifacts in one session:
- the raw FIREWALL_OUTCOME line from the firewall process
- a parsed copy of that outcome entry saved to ./data/flagship-demo/last-firewall-outcome.json
- the written file at ~/mcp-firewall-sandbox/flagship-demo-output.txt by default
Prerequisites:
- Node.js 20+
- AgentGate running locally at http://127.0.0.1:3000
- AGENTGATE_REST_KEY exported only if your AgentGate instance requires a REST key
Assumes you have local checkouts of both agentgate and agentgate-mcp-firewall; adjust the cd paths below to where you cloned them.
Terminal 1:
cd /path/to/agentgate
AGENTGATE_DEV_MODE=true npm run dev
Terminal 2:
cd /path/to/agentgate-mcp-firewall
npm install
npm run demo:write-file
If your AgentGate repo already has AGENTGATE_REST_KEY configured, export the same value in terminal 2 before running the demo:
export AGENTGATE_REST_KEY=your-key-here
npm run demo:write-file
AGENTGATE_DEV_MODE=true only skips REST auth when AgentGate starts without a REST key already configured. If you already run AgentGate with a valid REST key and do not need dev mode, plain npm run dev on the AgentGate repo also works.
If ports 4444 or 5555 are already in use:
DEMO_WRAPPER_PORT=4480 DEMO_FIREWALL_PORT=5580 npm run demo:write-file
The demo script:
- starts the filesystem wrapper internally, so you do not need a separate wrapper terminal
- starts the firewall internally, so you do not need a hand-written policy.json
- stores temporary demo identity files under ./data/flagship-demo/
- saves the last governed FIREWALL_OUTCOME entry to ./data/flagship-demo/last-firewall-outcome.json
- writes ~/mcp-firewall-sandbox/flagship-demo-output.txt by default
- fails if the file on disk, the captured audit entry, or the final governed resolution do not agree
- leaves the written file in place so you can inspect it after the demo exits
On a successful run:
- the firewall logs one real FIREWALL_OUTCOME line for the governed write_file call
- the demo prints a short evidence summary showing upstreamReported.status: success, verification.status: verified, and finalResolution: success
- the saved JSON audit copy at ./data/flagship-demo/last-firewall-outcome.json matches that same governed call
- the target file exists on disk with the exact requested content
- the demo uses the real signed authenticate flow before calling write_file
- the important point is not just that the file exists; it is that the firewall resolved the action from the observed disk effect after the MCP call
This repo also includes a narrow delete_file proof demo for the v0.4.0 surface:
npm run demo:delete-file
If your local AgentGate instance is already configured with AGENTGATE_REST_KEY, export the same value before running this command.
That demo does not use @modelcontextprotocol/server-filesystem. It starts the dedicated delete-capable fixture upstream in this repo, calls governed single-path delete_file, and then checks that:
- the target existed as a regular file before the call
- the upstream reported success
- the target is absent after the call
- no other governed path changed
- the final resolution is success
The saved audit copy lands at ./data/delete-file-demo/last-firewall-outcome.json.
Use this only if you want to start the wrapper and firewall yourself after you already understand the proof path. For a first read, use the companion demo repo first; for an implementation-level run in this repo, use the demo above.
Prerequisites:
- Node.js 20+
- AgentGate running locally
- AGENTGATE_REST_KEY exported only if your AgentGate instance requires a REST key
Install dependencies:
npm install
Start AgentGate (adjust the path to your local checkout):
cd ~/Desktop/projects/agentgate && AGENTGATE_DEV_MODE=true npm run dev
Start the filesystem wrapper. The wrapper bridges the stdio-only filesystem server to Streamable HTTP so the firewall can connect to it.
npx tsx test/fixtures/filesystem-server-wrapper.ts 4444 ~/mcp-firewall-sandbox
Create policy.json:
{
  "governed_root": "/Users/yourname/mcp-firewall-sandbox",
  "tools": {
    "write_file": {
      "tier": "high",
      "exposure_cents": 50
    }
  },
  "default_exposure_cents": 100
}
Use an absolute path for governed_root.
Start the firewall:
npm run dev
The firewall will:
- load the policy
- create/register executor and resolver identities on AgentGate
- lock a bond
- connect to the upstream
- filter exposed tools to the policy allowlist
- run a canary write_file probe to prove shared write access
- listen on port 5555
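Two of the startup steps above, loading the policy and filtering tools to the allowlist, can be sketched as follows. The Policy shape mirrors the example policy file. The parsePolicy and filterToAllowlist helpers are hypothetical, and treating the keys of "tools" as the allowlist is an assumption about how the policy is interpreted.

```typescript
import { isAbsolute } from "node:path";

interface ToolPolicy {
  tier: string;
  exposure_cents: number;
}

interface Policy {
  governed_root: string;
  tools: Record<string, ToolPolicy>;
  default_exposure_cents: number;
}

// Parse policy JSON and reject a relative governed_root.
function parsePolicy(json: string): Policy {
  const policy = JSON.parse(json) as Policy;
  if (typeof policy.governed_root !== "string" || !isAbsolute(policy.governed_root)) {
    throw new Error("governed_root must be an absolute path");
  }
  return policy;
}

// Assumption: a tool is exposed only if it has an entry under "tools".
function filterToAllowlist(upstreamTools: string[], policy: Policy): string[] {
  return upstreamTools.filter((tool) => Object.prototype.hasOwnProperty.call(policy.tools, tool));
}

const policy = parsePolicy(
  JSON.stringify({
    governed_root: "/Users/yourname/mcp-firewall-sandbox",
    tools: { write_file: { tier: "high", exposure_cents: 50 } },
    default_exposure_cents: 100,
  }),
);
const exposedTools = filterToAllowlist(["read_file", "write_file", "list_directory"], policy);
```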
Use an existing directory inside governed_root, or create parent directories out of band first. The upstream filesystem server does not create missing parent directories automatically.
A real client call sequence is:
- Connect to http://127.0.0.1:5555/mcp
- Call authenticate with signed arguments from AgentGateClient.createAuthenticationArguments(...)
- Call write_file
If you want a working example of that authenticated client flow, run:
npm run demo:write-file
For the dedicated delete_file proof surface in this repo, run:
npm run demo:delete-file

| Variable | Default | Description |
|---|---|---|
| UPSTREAM_MCP_URL | http://127.0.0.1:4444/mcp | Upstream MCP server URL |
| FIREWALL_PORT | 5555 | Firewall listen port |
| FIREWALL_POLICY_PATH | ./policy.json | Policy config path |
| FIREWALL_IDENTITY_PATH | ./agent-identity-firewall.json | Executor identity file |
| RESOLVER_IDENTITY_PATH | ./agent-identity-resolver.json | Resolver identity file |
| FIREWALL_BOND_CENTS | 100 | Firewall bond amount in cents |
| FIREWALL_BOND_TTL_SECONDS | 3600 | Firewall bond TTL in seconds |
| AGENTGATE_URL | http://127.0.0.1:3000 | AgentGate base URL |
| AGENTGATE_REST_KEY | unset | Optional REST key for AgentGate instances that are not running in open dev mode |
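The defaults in the table can be read as a small configuration loader. The variable names and default values below are taken from the table. The FirewallConfig shape and the loadConfig helper are hypothetical and only illustrate how the defaults apply.

```typescript
type Env = Record<string, string | undefined>;

interface FirewallConfig {
  upstreamMcpUrl: string;
  firewallPort: number;
  policyPath: string;
  firewallIdentityPath: string;
  resolverIdentityPath: string;
  bondCents: number;
  bondTtlSeconds: number;
  agentgateUrl: string;
  agentgateRestKey: string | undefined; // unset by default
}

// Read each variable from env, falling back to the documented default.
function loadConfig(env: Env): FirewallConfig {
  return {
    upstreamMcpUrl: env.UPSTREAM_MCP_URL ?? "http://127.0.0.1:4444/mcp",
    firewallPort: Number(env.FIREWALL_PORT ?? "5555"),
    policyPath: env.FIREWALL_POLICY_PATH ?? "./policy.json",
    firewallIdentityPath: env.FIREWALL_IDENTITY_PATH ?? "./agent-identity-firewall.json",
    resolverIdentityPath: env.RESOLVER_IDENTITY_PATH ?? "./agent-identity-resolver.json",
    bondCents: Number(env.FIREWALL_BOND_CENTS ?? "100"),
    bondTtlSeconds: Number(env.FIREWALL_BOND_TTL_SECONDS ?? "3600"),
    agentgateUrl: env.AGENTGATE_URL ?? "http://127.0.0.1:3000",
    agentgateRestKey: env.AGENTGATE_REST_KEY,
  };
}

const defaultConfig = loadConfig({});
const customConfig = loadConfig({ FIREWALL_PORT: "5580", AGENTGATE_REST_KEY: "your-key-here" });
```

In a Node.js process the same function could be called as loadConfig(process.env).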
Run the full suite with:
npm test
The v0.3.0 work adds focused tests for:
- honest write_file verification success
- upstream lies with no effect
- upstream lies with wrong-target write
- extra governed-path deletion during the claimed write
- governed-path type change during the claimed write
- verifier failure path
- deterministic resolution mapping in the standalone verifier
The repo also now includes focused delete_file tests on the dedicated delete-capable upstream fixture for:
- honest delete_file verification success
- pre-state ineligibility before forwarding
- unchanged target after claimed success
- extra governed-path mutation
- mutated target instead of delete
Tests that require a local AgentGate instance still skip cleanly when AgentGate is not running.
This section is deliberate. The repo should not claim more than the implementation proves.
- It verifies exactly two shipped proof surfaces: governed write_file on one filesystem-style upstream surface and governed single-path delete_file on the dedicated delete fixture used here. That still does not mean every MCP tool, every upstream, or general MCP verification.
- It verifies observable postconditions, not causality. If the target file already contained the requested content before the call, a dishonest no-op is indistinguishable from a real idempotent write.
- It watches governed_root, not the whole machine. A compromised upstream that writes outside the governed tree is out of scope for this verifier unless that behavior also produces an observable governed-tree violation.
- It assumes the firewall and upstream share the same filesystem view. If they do not share mounts, verification will fail or become meaningless.
- It assumes no unrelated concurrent writer is modifying governed_root during the governed call. Concurrent writes can create false malicious/failure signals because the verifier uses before/after snapshots.
- It is not a general attestation system. There is no cryptographic proof that the upstream executed particular code, only an independent check of one observable effect class.
- It is still a localhost proof-of-concept. Production-grade isolation, supervision, and multi-tenant containment are out of scope here.
- AgentGate - bond-and-slash enforcement substrate
- AgentGate Agents - reference agent implementations
- Governed WriteFile Demo - tiny companion demo repo showing the smallest outsider-readable path through AgentGate + MCP Firewall: identity -> bond -> authenticated governed write_file -> independent on-disk verification -> audit artifact
- Delegation Identity Proof - Ed25519 delegation demonstration