MCP Firewall

MCP Firewall is a thin governance proxy that sits between MCP clients and upstream MCP servers. It requires governed/bonded authorization for protected tool calls before forwarding them upstream.

For first-time readers, the Governed WriteFile Demo is the intended starting point. This repo is the verification layer behind that demo, for readers who want implementation depth.

The current shipped proof is narrow: on two current filesystem proof surfaces, the firewall does not resolve from upstream-reported success alone. It independently verifies the filesystem effect it can observe on disk and resolves from that observed effect.

Where This Fits in Enterprise MCP Governance

Centralized MCP portals, access controls, logging, and cost controls are valuable and necessary parts of enterprise MCP governance. They help decide which clients, users, and tools should be allowed to interact, and they make that activity visible to operators.

Access governance is not the same as proof that a tool produced the intended effect. MCP Firewall's shipped claim is narrower: for its supported write_file and delete_file proof surfaces, it records the intended filesystem effect and independently verifies the observed on-disk outcome before resolving.

This is not general MCP security, not a replacement for enterprise MCP gateways, and not a claim to verify arbitrary tools.

Current Shipped Proof Surfaces

governed write_file on the supported write proof path
governed single-path delete_file on the dedicated delete-capable upstream fixture/demo path

Both proof surfaces are intentionally small. They depend on:

effects confined to governed_root
a shared filesystem view between firewall and upstream
deterministic postconditions the firewall can check directly

For the delete_file surface, the claim attaches only to the local delete-file-test-server fixture. The pinned reference upstream @modelcontextprotocol/server-filesystem still does not expose a native named delete_file tool, so the delete_file proof does not attach to that upstream.

For these two shipped proof surfaces only, the firewall performs governed-root before/after snapshot-diff verification. The verifier does not stop at checking whether the requested target exists or whether the requested content shows up at the target path. On these proof surfaces it captures a full governed-root snapshot before forwarding, captures another after the upstream returns, diffs the changed governed paths, and treats any non-target governed-path mutation as unexpected and malicious. On delete_file, it also records the target's pre-state and rejects missing or non-regular targets before forwarding. This still does not justify any broader general MCP verification claim.

What This Repo Proves Today

a thin governance proxy can sit in front of MCP tool calls and require governed/bonded authorization
for the shipped write_file and delete_file proof surfaces only, it resolves from governed-root before/after snapshot-diff verification rather than upstream-reported success alone
on those proof surfaces, it verifies the requested target effect and also detects other governed-path mutation
a compromised upstream can claim "success" and still be caught when no effect happened, the wrong governed path changed, or the requested write/delete outcome is wrong

What This Repo Does Not Prove

general MCP security or general MCP verification
independent verification for all MCP tools, all upstream servers, or all upstream results
arbitrary tool verification
a general proof against all compromised MCP behavior
a claim that every upstream result can be independently verified

Important

Start Here First: The fastest outsider-readable proof path is the Governed WriteFile Demo. Run that first, then come back here for the implementation details behind this repo's firewall, verifier, policy gate, and audit trail.

Quick Explainer

For a short visual introduction to the AgentGate / MCP Firewall idea, see:

AgentGate explainer thread on X

Part 3 covers the governed write_file example directly.

Why This Exists

An MCP client normally has to trust the upstream MCP server's answer about whether a tool call succeeded. That is not a safe assumption for a governance proxy. If the upstream is compromised or dishonest, it can claim success without producing the intended effect, or it can produce a different effect than the one the client requested.

The repo's claim remains intentionally small: it shows that the firewall can govern a small set of independently checkable filesystem effects without treating upstream self-report as authoritative. The first shipped proof surface was write_file, because it is easy to verify mechanically and easy to demonstrate honestly; the repo now also includes a second narrow delete_file proof surface on its dedicated test/demo upstream.

Original v0.3.0 Write_File Scope

In scope for this release:

one upstream filesystem-style MCP server
one high-risk tool surface: write_file
effects confined to governed_root
a shared filesystem view between firewall and upstream
one deterministic verifier for observable filesystem write outcomes
honest and dishonest upstream test scenarios
structured outcome logging that records the basis for each governed decision

Not claimed in this release:

generalized attestation
anomaly scoring or reputation systems
cryptographic proof of remote execution
coverage for every filesystem tool
protection against all possible upstream side effects

What The Shipped Verifier Checks

For the shipped write_file and delete_file proof surfaces, the verifier works from the firewall's own filesystem view. It captures a full governed-root snapshot before forwarding, forwards the request, captures a full governed-root snapshot after the upstream returns, diffs the changed paths, and then evaluates both the requested target effect and any other governed-path mutation it observed.
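A minimal sketch of the snapshot-and-diff idea described above, not the repo's actual verifier. The `Entry` shape and the names `snapshotTree`, `diffSnapshots`, and `demoSnapshotDiff` are hypothetical. The sketch assumes SHA-256 as the content hash and records symlinks without following them.

```typescript
import * as crypto from "node:crypto";
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical entry shape: what one governed path looked like at snapshot time.
type Entry =
  | { kind: "file"; size: number; sha256: string }
  | { kind: "dir" }
  | { kind: "other" };

type Snapshot = Map<string, Entry>; // key: path relative to the governed root

// Walk the governed root and record every entry under it.
function snapshotTree(root: string, rel = "", out: Snapshot = new Map()): Snapshot {
  for (const dirent of fs.readdirSync(path.join(root, rel), { withFileTypes: true })) {
    const relPath = path.join(rel, dirent.name);
    if (dirent.isDirectory()) {
      out.set(relPath, { kind: "dir" });
      snapshotTree(root, relPath, out);
    } else if (dirent.isFile()) {
      const bytes = fs.readFileSync(path.join(root, relPath));
      const sha256 = crypto.createHash("sha256").update(bytes).digest("hex");
      out.set(relPath, { kind: "file", size: bytes.length, sha256 });
    } else {
      out.set(relPath, { kind: "other" }); // symlinks, sockets, and so on
    }
  }
  return out;
}

function sameEntry(a: Entry, b: Entry): boolean {
  if (a.kind === "file" && b.kind === "file") {
    return a.size === b.size && a.sha256 === b.sha256;
  }
  return a.kind === b.kind;
}

// Return the sorted relative paths that were added, removed, or changed.
function diffSnapshots(before: Snapshot, after: Snapshot): string[] {
  const changed = new Set<string>();
  for (const [p, entry] of before) {
    const next = after.get(p);
    if (next === undefined || !sameEntry(entry, next)) changed.add(p);
  }
  for (const p of after.keys()) {
    if (!before.has(p)) changed.add(p);
  }
  return [...changed].sort();
}

// Self-check: mutate a temporary tree between two snapshots and report the diff.
function demoSnapshotDiff(): string[] {
  const root = fs.mkdtempSync(path.join(os.tmpdir(), "governed-root-"));
  fs.writeFileSync(path.join(root, "kept.txt"), "same");
  fs.writeFileSync(path.join(root, "edited.txt"), "one");
  const before = snapshotTree(root);
  fs.writeFileSync(path.join(root, "edited.txt"), "two");
  fs.writeFileSync(path.join(root, "added.txt"), "new");
  const after = snapshotTree(root);
  fs.rmSync(root, { recursive: true, force: true });
  return diffSnapshots(before, after);
}
```

Because the diff covers the whole tree rather than only the requested target, a write to any other governed path shows up in the result even when the target itself looks correct.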

For governed write_file calls, the intended effect is:

the exact target path should exist as a regular file
that file's content hash and byte size should match the requested content
no other path inside governed_root should have changed during the call
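The three write_file conditions above could be evaluated roughly as follows. This is a sketch under assumptions: the `WriteCheck` shape, the function name, and the reason strings are hypothetical, the inputs are taken to come from before/after snapshots captured elsewhere, and treating a non-regular target as malicious is an assumption rather than something this README states.

```typescript
// Hypothetical shapes; the repo's real types may differ.
type Entry =
  | { kind: "file"; size: number; sha256: string }
  | { kind: "dir" }
  | { kind: "other" };

type Resolution = "success" | "failed" | "malicious";

interface WriteCheck {
  target: string; // governed-relative path the client asked to write
  expected: { size: number; sha256: string }; // derived from the requested content
  targetAfter: Entry | undefined; // target's entry in the post-call snapshot
  changedPaths: string[]; // paths reported by the before/after diff
}

// Evaluate a write_file call for which the upstream reported success.
function checkWriteEffect(c: WriteCheck): { resolution: Resolution; reason: string } {
  // Any mutation outside the requested target is a policy violation.
  const stray = c.changedPaths.filter((p) => p !== c.target);
  if (stray.length > 0) {
    return { resolution: "malicious", reason: `non-target mutation: ${stray.join(", ")}` };
  }
  const after = c.targetAfter;
  if (after === undefined) {
    return { resolution: "failed", reason: "target missing after upstream success" };
  }
  // Assumption: a non-regular target is handled like a content mismatch.
  if (after.kind !== "file") {
    return { resolution: "malicious", reason: "target is not a regular file" };
  }
  if (after.size !== c.expected.size || after.sha256 !== c.expected.sha256) {
    return { resolution: "malicious", reason: "target content mismatch" };
  }
  return { resolution: "success", reason: "intended effect observed" };
}
```

Note that the check passes when the target already held the requested content and nothing changed, which is the postcondition-versus-causality limit this README calls out later.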

For governed single-path delete_file calls on the dedicated delete fixture in this repo, the intended effect is:

the exact target path must exist as a regular file before forwarding; otherwise the call is rejected before forwarding and resolved as failed
that exact target path should be absent after the upstream reports success
no other path inside governed_root should have changed during the call
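For delete_file, the same idea splits into a pre-forward gate and a post-call check. The sketch below is illustrative only: the function names and reason strings are hypothetical, and resolving a target that was mutated instead of deleted as malicious is an assumption, since this README lists that case as tested but does not state its mapping.

```typescript
// Hypothetical shapes; the repo's real types may differ.
type Entry =
  | { kind: "file"; size: number; sha256: string }
  | { kind: "dir" }
  | { kind: "other" };

type Resolution = "success" | "failed" | "malicious";
interface Outcome { resolution: Resolution; reason: string; }

// Pre-forward gate: a failed outcome means "do not forward"; null means eligible.
function checkDeletePreState(targetBefore: Entry | undefined): Outcome | null {
  if (targetBefore === undefined) {
    return { resolution: "failed", reason: "target missing in pre-state" };
  }
  if (targetBefore.kind !== "file") {
    return { resolution: "failed", reason: "target is not a regular file in pre-state" };
  }
  return null;
}

function sameEntry(a: Entry, b: Entry): boolean {
  if (a.kind === "file" && b.kind === "file") {
    return a.size === b.size && a.sha256 === b.sha256;
  }
  return a.kind === b.kind;
}

// Post-call check for a delete_file call the upstream reported as successful.
function checkDeleteEffect(
  target: string,
  targetBefore: Entry,
  targetAfter: Entry | undefined,
  changedPaths: string[],
): Outcome {
  const stray = changedPaths.filter((p) => p !== target);
  if (stray.length > 0) {
    return { resolution: "malicious", reason: `non-target mutation: ${stray.join(", ")}` };
  }
  if (targetAfter === undefined) {
    return { resolution: "success", reason: "target absent after delete" };
  }
  if (sameEntry(targetBefore, targetAfter)) {
    return { resolution: "failed", reason: "target still present unchanged" };
  }
  // Assumption: a target that was altered rather than removed is a violation.
  return { resolution: "malicious", reason: "target mutated instead of deleted" };
}
```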

Resolution Policy

The shipped proof surfaces use a simple deterministic mapping:

verified intended effect present -> success
claimed success but intended effect not observed -> failed
claimed success with a policy-violating observed effect -> malicious

Concretely:

write_file: target file missing after upstream success -> failed
write_file: target file content mismatch -> malicious
write_file or delete_file: non-target governed-path mutation -> malicious
delete_file: target missing or non-regular in pre-state -> failed
delete_file: target still present unchanged after upstream success -> failed
verifier internal failure -> failed

The firewall returns the governed outcome and, when AgentGate is configured, resolves the bonded action with the same mapping.
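The mapping above is small enough to express as a lookup table. In this sketch the `Finding` identifiers are hypothetical names for the cases listed; only the three resolution values come from this README.

```typescript
type Resolution = "success" | "failed" | "malicious";

// Hypothetical identifiers for the cases listed in the resolution policy.
type Finding =
  | "verified"
  | "write_target_missing"
  | "write_content_mismatch"
  | "non_target_mutation"
  | "delete_prestate_ineligible"
  | "delete_target_unchanged"
  | "verifier_error";

// One fixed row per finding, so the same observation always resolves the same way.
const RESOLUTION_BY_FINDING: Record<Finding, Resolution> = {
  verified: "success",
  write_target_missing: "failed",
  write_content_mismatch: "malicious",
  non_target_mutation: "malicious",
  delete_prestate_ineligible: "failed",
  delete_target_unchanged: "failed",
  verifier_error: "failed",
};

function resolveFinding(finding: Finding): Resolution {
  return RESOLUTION_BY_FINDING[finding];
}
```

Typing the table as `Record<Finding, Resolution>` makes the compiler reject a new finding that has no resolution assigned, which keeps the mapping total.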

Write_File End-to-End Flow

The client sends write_file to the firewall.
The firewall verifies AgentGate identity and bond state as usual.
The firewall validates that the requested path stays inside governed_root.
The firewall records the bonded action.
The firewall snapshots the governed tree and records the intended effect.
The firewall forwards write_file to the upstream server.
The upstream returns success or failure.
The firewall independently verifies the postcondition on disk.
The firewall resolves the action from the observed effect, not from the upstream claim alone.
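The path-validation step in this flow can be sketched with Node's `path` module. The function name `resolveInsideRoot` is hypothetical, and the check is purely lexical: it does not resolve symlinks, so a real implementation would likely need an additional real-path check.

```typescript
import * as path from "node:path";

// Return the absolute target path if it stays strictly inside governedRoot, else null.
function resolveInsideRoot(governedRoot: string, requested: string): string | null {
  const root = path.resolve(governedRoot);
  const candidate = path.resolve(root, requested);
  const rel = path.relative(root, candidate);
  const escapes =
    rel === "" || // the root itself is not a valid file target
    rel === ".." ||
    rel.startsWith(".." + path.sep) ||
    path.isAbsolute(rel); // e.g. a different drive on Windows
  return escapes ? null : candidate;
}
```

Resolving first and then comparing with `path.relative` handles both `../` traversal and absolute paths supplied by the client, because `path.resolve` discards the root when the requested path is already absolute.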

Demo Scenarios

The repo now includes deterministic coverage for these three scenarios:

Honest upstream: The upstream returns success, the exact file is written, verification passes, final resolution is success.
Lying upstream, no actual effect: The upstream returns success, no file appears, verification fails, final resolution is failed.
Lying upstream, wrong or forbidden effect: The upstream returns success, a different governed path is written, verification detects the unexpected change, final resolution is malicious.

There is also a focused failure-path test where the verifier itself throws. In that case the firewall still fails closed and does not treat upstream success as authoritative.
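Failing closed on a verifier error can be as simple as wrapping the verifier call. This is a synchronous sketch with a hypothetical function name; the real verifier is likely asynchronous, but the shape of the guard would be the same.

```typescript
type Resolution = "success" | "failed" | "malicious";

// Run the verifier; if it throws, resolve as failed instead of trusting the upstream.
function resolveFailClosed(verify: () => Resolution): { resolution: Resolution; reason: string } {
  try {
    return { resolution: verify(), reason: "verifier completed" };
  } catch (err) {
    const detail = err instanceof Error ? err.message : String(err);
    return { resolution: "failed", reason: `verifier error: ${detail}` };
  }
}
```

The upstream's reported status never appears in this function, so an exception cannot cause the firewall to fall back to it.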

Audit Trail

For each governed write_file and delete_file decision on these shipped proof surfaces, the firewall emits a structured FIREWALL_OUTCOME log entry with:

requested tool call
intended effect
upstream reported status and summary
independent governed-root snapshot/diff verification result
final resolution
reason code and reason text

This is meant to make each decision inspectable without re-reading raw transport traffic.
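A rough picture of what one such entry might look like as a typed record. Only `upstreamReported.status`, `verification.status`, `finalResolution`, and the `FIREWALL_OUTCOME` tag appear in this README; every other field name, the `not_verified` value, and the "tag followed by JSON" line layout are assumptions made for illustration.

```typescript
// Hypothetical record shape covering the fields listed above.
interface FirewallOutcome {
  requestedToolCall: { tool: string; arguments: Record<string, unknown> };
  intendedEffect: string;
  upstreamReported: { status: "success" | "failure"; summary: string };
  verification: { status: "verified" | "not_verified"; changedPaths: string[] };
  finalResolution: "success" | "failed" | "malicious";
  reasonCode: string;
  reasonText: string;
}

const OUTCOME_PREFIX = "FIREWALL_OUTCOME "; // assumed line layout: tag, space, JSON

function formatOutcomeLine(outcome: FirewallOutcome): string {
  return OUTCOME_PREFIX + JSON.stringify(outcome);
}

// Parse one log line; returns null for unrelated or malformed lines.
function parseOutcomeLine(line: string): FirewallOutcome | null {
  if (!line.startsWith(OUTCOME_PREFIX)) return null;
  try {
    return JSON.parse(line.slice(OUTCOME_PREFIX.length)) as FirewallOutcome;
  } catch {
    return null;
  }
}
```

Keeping the upstream's report and the verification result as separate fields is what lets a reader see, per call, whether the two agreed.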

Architecture

MCP Client
   |
   | Streamable HTTP
   v
+------------------------+
|      MCP Firewall      |
| auth, bond gate,       |
| path validation,       |
| write/delete verifier  |
+------------------------+
   |
   | Streamable HTTP
   v
+------------------------+
| Upstream MCP Server    |
| filesystem-style tool  |
| surface                |
+------------------------+
   |
   v
governed_root on disk

For the honest write_file path in tests and local demos, the upstream is @modelcontextprotocol/server-filesystem behind the included HTTP wrapper.

Implementation Demo in This Repo

If you are new to the project, run the companion Governed WriteFile Demo first. It is the shortest outsider-readable proof of the shipped thesis.

Come back here when you want the implementation-level run that exercises this repo directly. If you only run one thing in this repo itself, run this demo.

It is the shortest honest path through the real governed write_file flow. It reuses the same happy-path sequence already proven in the filesystem end-to-end test:

start the filesystem wrapper
start MCP Firewall with a write_file-only policy
register executor, resolver, and client identities on AgentGate
lock executor and client bonds
authenticate the MCP session with a signed authenticate call
call governed write_file
verify the written file on disk while the firewall emits the real FIREWALL_OUTCOME audit log

One successful run gives you three inspectable artifacts in one session:

the raw FIREWALL_OUTCOME line from the firewall process
a parsed copy of that outcome entry saved to ./data/flagship-demo/last-firewall-outcome.json
the written file at ~/mcp-firewall-sandbox/flagship-demo-output.txt by default

Prerequisites

Node.js 20+
AgentGate running locally at http://127.0.0.1:3000
AGENTGATE_REST_KEY exported only if your AgentGate instance requires a REST key

Run it

Assumes you have local checkouts of both agentgate and agentgate-mcp-firewall; adjust the cd paths below to where you cloned them.

Terminal 1:

cd /path/to/agentgate
AGENTGATE_DEV_MODE=true npm run dev

Terminal 2:

cd /path/to/agentgate-mcp-firewall
npm install
npm run demo:write-file

If your AgentGate repo already has AGENTGATE_REST_KEY configured, export the same value in terminal 2 before running the demo:

export AGENTGATE_REST_KEY=your-key-here
npm run demo:write-file

AGENTGATE_DEV_MODE=true only skips REST auth when AgentGate starts without a REST key already configured. If you already run AgentGate with a valid REST key and do not need dev mode, plain npm run dev on the AgentGate repo also works.

If ports 4444 or 5555 are already in use:

DEMO_WRAPPER_PORT=4480 DEMO_FIREWALL_PORT=5580 npm run demo:write-file

The demo script:

starts the filesystem wrapper internally, so you do not need a separate wrapper terminal
starts the firewall internally, so you do not need a hand-written policy.json
stores temporary demo identity files under ./data/flagship-demo/
saves the last governed FIREWALL_OUTCOME entry to ./data/flagship-demo/last-firewall-outcome.json
writes ~/mcp-firewall-sandbox/flagship-demo-output.txt by default
fails if the file on disk, the captured audit entry, or the final governed resolution do not agree
leaves the written file in place so you can inspect it after the demo exits

What you should see

the firewall logs one real FIREWALL_OUTCOME line for the governed write_file call
the demo prints a short evidence summary showing upstreamReported.status: success, verification.status: verified, and finalResolution: success
the saved JSON audit copy at ./data/flagship-demo/last-firewall-outcome.json matches that same governed call
the target file exists on disk with the exact requested content
the demo uses the real signed authenticate flow before calling write_file
the important point is not just that the file exists; it is that the firewall resolved the action from the observed disk effect after the MCP call

Dedicated `delete_file` Proof Demo

This repo also includes a narrow delete_file proof demo for the v0.4.0 surface:

npm run demo:delete-file

If your local AgentGate instance is already configured with AGENTGATE_REST_KEY, export the same value before running this command.

That demo does not use @modelcontextprotocol/server-filesystem. It starts the dedicated delete-capable fixture upstream in this repo, calls governed single-path delete_file, and then checks that:

the target existed as a regular file before the call
the upstream reported success
the target is absent after the call
no other governed path changed
the final resolution is success

The saved audit copy lands at ./data/delete-file-demo/last-firewall-outcome.json.

Manual Startup (Optional)

Use this only if you want to start the wrapper and firewall yourself after you already understand the proof path. For a first read, use the companion demo repo first; for an implementation-level run in this repo, use the demo above.

Prerequisites

Node.js 20+
AgentGate running locally
AGENTGATE_REST_KEY exported only if your AgentGate instance requires a REST key

1. Install dependencies

npm install

2. Start AgentGate

cd /path/to/agentgate && AGENTGATE_DEV_MODE=true npm run dev

3. Start the filesystem wrapper

The wrapper bridges the stdio-only filesystem server to Streamable HTTP so the firewall can connect to it.

npx tsx test/fixtures/filesystem-server-wrapper.ts 4444 ~/mcp-firewall-sandbox

4. Create a narrow `policy.json`

{
  "governed_root": "/Users/yourname/mcp-firewall-sandbox",
  "tools": {
    "write_file": {
      "tier": "high",
      "exposure_cents": 50
    }
  },
  "default_exposure_cents": 100
}

Use an absolute path for governed_root.
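A quick sanity check for a policy file like the one above could look like this. The field names come from the example policy; the `policyProblems` helper and the rule that cent amounts must be non-negative integers are assumptions, not the firewall's documented validation.

```typescript
import * as path from "node:path";

interface ToolPolicy { tier: string; exposure_cents: number; }
interface Policy {
  governed_root: string;
  tools: Record<string, ToolPolicy>;
  default_exposure_cents: number;
}

// Assumed rule: cent amounts are non-negative integers.
const isCents = (n: unknown): boolean =>
  typeof n === "number" && Number.isInteger(n) && n >= 0;

// Return a list of human-readable problems; an empty list means the policy looks sane.
function policyProblems(policy: Policy): string[] {
  const problems: string[] = [];
  if (!path.isAbsolute(policy.governed_root)) {
    // "~" is expanded by shells, not by Node, so "~/dir" counts as relative here.
    problems.push(`governed_root must be absolute: ${policy.governed_root}`);
  }
  for (const [tool, cfg] of Object.entries(policy.tools)) {
    if (!isCents(cfg.exposure_cents)) {
      problems.push(`tools.${tool}.exposure_cents must be a non-negative integer`);
    }
  }
  if (!isCents(policy.default_exposure_cents)) {
    problems.push("default_exposure_cents must be a non-negative integer");
  }
  return problems;
}
```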

5. Start the firewall

npm run dev

The firewall will:

load the policy
create/register executor and resolver identities on AgentGate
lock a bond
connect to the upstream
filter exposed tools to the policy allowlist
run a canary write_file probe to prove shared write access
listen on port 5555

6. Connect a real MCP client and call `write_file`

Use an existing directory inside governed_root, or create parent directories out of band first. The upstream filesystem server does not create missing parent directories automatically.

A real client call sequence is:

Connect to http://127.0.0.1:5555/mcp
Call authenticate with signed arguments from AgentGateClient.createAuthenticationArguments(...)
Call write_file

If you want a working example of that authenticated client flow, run:

npm run demo:write-file

For the dedicated delete_file proof surface in this repo, run:

npm run demo:delete-file

Environment Variables

| Variable | Default | Description |
| --- | --- | --- |
| `UPSTREAM_MCP_URL` | `http://127.0.0.1:4444/mcp` | Upstream MCP server URL |
| `FIREWALL_PORT` | `5555` | Firewall listen port |
| `FIREWALL_POLICY_PATH` | `./policy.json` | Policy config path |
| `FIREWALL_IDENTITY_PATH` | `./agent-identity-firewall.json` | Executor identity file |
| `RESOLVER_IDENTITY_PATH` | `./agent-identity-resolver.json` | Resolver identity file |
| `FIREWALL_BOND_CENTS` | `100` | Firewall bond amount in cents |
| `FIREWALL_BOND_TTL_SECONDS` | `3600` | Firewall bond TTL in seconds |
| `AGENTGATE_URL` | `http://127.0.0.1:3000` | AgentGate base URL |
| `AGENTGATE_REST_KEY` | unset | Optional REST key for AgentGate instances that are not running in open dev mode |
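As an illustration of how these variables and defaults fit together, here is a small loader. The variable names and default values are taken from the table; the `FirewallConfig` shape and the `loadConfig` and `intFrom` helpers are hypothetical and not the repo's actual startup code.

```typescript
interface FirewallConfig {
  upstreamMcpUrl: string;
  firewallPort: number;
  policyPath: string;
  firewallIdentityPath: string;
  resolverIdentityPath: string;
  bondCents: number;
  bondTtlSeconds: number;
  agentGateUrl: string;
  agentGateRestKey?: string; // left undefined when the variable is unset
}

type Env = Record<string, string | undefined>;

// Read a non-negative integer variable, falling back to the documented default.
function intFrom(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw === "") return fallback;
  const n = Number(raw);
  if (!Number.isInteger(n) || n < 0) {
    throw new Error(`${name} must be a non-negative integer, got "${raw}"`);
  }
  return n;
}

function loadConfig(env: Env): FirewallConfig {
  return {
    upstreamMcpUrl: env.UPSTREAM_MCP_URL ?? "http://127.0.0.1:4444/mcp",
    firewallPort: intFrom(env, "FIREWALL_PORT", 5555),
    policyPath: env.FIREWALL_POLICY_PATH ?? "./policy.json",
    firewallIdentityPath: env.FIREWALL_IDENTITY_PATH ?? "./agent-identity-firewall.json",
    resolverIdentityPath: env.RESOLVER_IDENTITY_PATH ?? "./agent-identity-resolver.json",
    bondCents: intFrom(env, "FIREWALL_BOND_CENTS", 100),
    bondTtlSeconds: intFrom(env, "FIREWALL_BOND_TTL_SECONDS", 3600),
    agentGateUrl: env.AGENTGATE_URL ?? "http://127.0.0.1:3000",
    agentGateRestKey: env.AGENTGATE_REST_KEY,
  };
}

// Typical call site in a Node process: const config = loadConfig(process.env);
```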

Tests

Run the full suite with:

npm test

The v0.3.0 work adds focused tests for:

honest write_file verification success
upstream lies with no effect
upstream lies with wrong-target write
extra governed-path deletion during the claimed write
governed-path type change during the claimed write
verifier failure path
deterministic resolution mapping in the standalone verifier

The repo now also includes focused delete_file tests on the dedicated delete-capable upstream fixture for:

honest delete_file verification success
pre-state ineligibility before forwarding
unchanged target after claimed success
extra governed-path mutation
mutated target instead of delete

Tests that require a local AgentGate instance still skip cleanly when AgentGate is not running.

What The Current Repo Still Does Not Solve

This section is deliberate. The repo should not claim more than the implementation proves.

It verifies exactly two shipped proof surfaces: governed write_file on one filesystem-style upstream surface and governed single-path delete_file on the dedicated delete fixture used here. That still does not mean every MCP tool, every upstream, or general MCP verification.
It verifies observable postconditions, not causality. If the target file already contained the requested content before the call, a dishonest no-op is indistinguishable from a real idempotent write.
It watches governed_root, not the whole machine. A compromised upstream that writes outside the governed tree is out of scope for this verifier unless that behavior also produces an observable governed-tree violation.
It assumes the firewall and upstream share the same filesystem view. If they do not share mounts, verification will fail or become meaningless.
It assumes no unrelated concurrent writer is modifying governed_root during the governed call. Concurrent writes can create false malicious/failure signals because the verifier uses before/after snapshots.
It is not a general attestation system. There is no cryptographic proof that the upstream executed particular code, only an independent check of one observable effect class.
It is still a localhost proof-of-concept. Production-grade isolation, supervision, and multi-tenant containment are out of scope here.

Related Projects

AgentGate — bond-and-slash enforcement substrate
AgentGate Agents — reference agent implementations
Governed WriteFile Demo — tiny companion demo repo showing the smallest outsider-readable path through AgentGate + MCP Firewall: identity -> bond -> authenticated governed write_file -> independent on-disk verification -> audit artifact
Delegation Identity Proof — Ed25519 delegation demonstration
