Project Sentinel

A reference testbed for runtime governance of LLM coding agents: sandbox each agent, audit each action, and verify failure modes before customers run agents against production code.

When teams put LLM agents into real workflows, three operational questions come back:

How are they sandboxed?
How are their actions audited?
What happens when something goes wrong?

Project Sentinel makes those questions concrete. It runs a synthetic office workload — sixty personas across three shifts, with real LLM calls — and underneath it the runtime layer an organization would actually operate: per-agent sandboxing (bwrap + Landlock + cgroups + netns), event-sourced audit trails, three independent control planes, and a 9/9-passing breakout test report.

The full stack is documented as a TOGAF v22.1 architecture and runs on a provisioned VM. The included docker demo is a deliberate behavioral subset: it shows the workload and dashboard, but not the kernel-bound parts (eBPF, Landlock, FUSE) that need a real host.

Architecture Guide (TOGAF v22.1) · Sandbox Test Report (9/9) · Demo

Why It Exists

Three things are hard to study without a believable, persistent, multi-agent environment:

Sandbox primitives at scale. What does bwrap + Landlock + cgroups v2 + netns actually cost when 26 agents (one shift of 17 plus the 9 always-on duty staff) tick simultaneously? Where do the breakouts come from when nobody is looking? The security test report records 9/9 breakout tests passing.
Control-plane design. Three independent observe / decide / act / verify loops (Agent CP, Platform CP, API CP) coexist. Each owns one decision domain, and none reaches into another's. See docs/governance.md.
Boundary detection. A pattern detector for agent self-recognition (15 regexes plus a two-stage LLM judge) measures when a generation surfaces awareness markers; the synthesis engine intercepts ~70% of routine perceptions before they reach a real LLM call. See Research Context for the narrative convention that underpins the workload.
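One way to picture the detection flow is a cheap regex pass that gates a more expensive judge call. The sketch below is illustrative only and is not Sentinel's detector: the three patterns, `stub_judge`, and the function names are hypothetical, and the real judge is an LLM call rather than a regex.

```python
import re
from typing import Callable, Iterable, List

# Hypothetical patterns for illustration; the source does not list its 15 regexes.
PATTERNS = [
    re.compile(r"\bas an ai\b", re.IGNORECASE),
    re.compile(r"\blanguage model\b", re.IGNORECASE),
    re.compile(r"\bsimulat(?:ed|ion)\b", re.IGNORECASE),
]


def regex_hits(text: str, patterns: Iterable = PATTERNS) -> List[str]:
    """Return the source string of every pattern that matches the text."""
    return [p.pattern for p in patterns if p.search(text)]


def detect(text: str, judge: Callable[[str], bool]) -> bool:
    """Run the cheap regex stage first; call the judge only on flagged text."""
    if not regex_hits(text):
        return False      # no marker found: the judge is never called
    return judge(text)    # flagged: the judge confirms or rejects


def stub_judge(text: str) -> bool:
    # Stand-in for an LLM call (assumption): require a first-person "I".
    return re.search(r"\bI\b", text) is not None
```

The point of the ordering is cost: most generations never reach the judge, so the expensive call is reserved for text that already shows a marker.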

Architecture at a Glance

flowchart TB
  subgraph AGENTS["Agent Layer · 60 LLM personas"]
    A1["51 shift-bound (3 shifts × 17)"]
    A2["9 always-on duty staff"]
  end

  subgraph SANDBOX["Sandbox Stack (per agent)"]
    S1["bwrap (user-namespaces)"]
    S2["Landlock LSM"]
    S3["cgroups v2"]
    S4["netns + nftables"]
    S5["Wasmtime (tool runtime)"]
  end

  subgraph CP["Three Controlplanes — Observe → Decide → Act → Verify"]
    direction LR
    AGCP["Agent CP<br/>(bio · perception)"]
    PLCP["Platform CP<br/>(infra · health)"]
    APCP["API CP<br/>(cost · routing)"]
  end

  STORE["Event Store<br/>Limbo SQLite · append-only<br/>Lamport ordering · hash-chain"]

  subgraph GATEWAY["Cortex Gateway (Go)"]
    G1["7-step proxy + guardrails"]
    G2["10-rule synthesis engine"]
  end

  subgraph BRIDGE["Quality + Memory Plane"]
    J1["Sentinel Judge<br/>(NATS · drift · quality)"]
    J2["NATS Bridge<br/>(Limbo → JetStream)"]
    J3["Hippocampus<br/>(NMDA night-run)"]
  end

  DASH["Dashboard<br/>Bun + Hono + WebSocket"]

  AGENTS -.->|"sandboxed in"| SANDBOX
  AGENTS -->|prompts| GATEWAY
  GATEWAY -->|emit events| STORE
  STORE -->|projections| DASH
  STORE -->|stream| BRIDGE
  CP -.->|govern| AGENTS
  CP -.->|govern| GATEWAY
  CP -.->|govern| STORE
  BRIDGE -->|alerts + metrics| DASH

Layer	Tech
World simulation	Rust workspace (15 crates), `bevy_ecs`
LLM gateway	Go (`cmd/cortex-gateway`)
Quality monitor	Go (`services/sentinel-judge`)
Dashboard	Bun + Hono + vanilla-JS (`dashboard/`)
Pub/Sub	Zenoh (Rust SHM <10 µs) + NATS JetStream
Storage	redb (state) + Limbo SQLite (events)

For a terminal-friendly plain-text view of the same data flow see Architecture Details further down.

For per-cluster implementation status see docs/togaf-gap-v22.md. For deliberate deviations from the spec see docs/togaf-deviations-v22.md.
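The event store in the diagram is labelled append-only, Lamport-ordered, and hash-chained. The following is a minimal in-memory sketch of how those three properties fit together; it is not the sentinel-limbo schema. The `Record` fields, the all-zero `GENESIS` hash, and the JSON canonicalization are assumptions made for the example.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import List

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first record


@dataclass
class Record:
    lamport: int     # logical timestamp, strictly increasing within this log
    payload: dict
    prev_hash: str   # digest of the preceding record
    digest: str      # digest over (lamport, payload, prev_hash)


def _sha(lamport: int, payload: dict, prev_hash: str) -> str:
    body = json.dumps(
        {"lamport": lamport, "payload": payload, "prev": prev_hash},
        sort_keys=True, separators=(",", ":"),
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


class Log:
    def __init__(self) -> None:
        self.records: List[Record] = []
        self.clock = 0

    def append(self, payload: dict, seen: int = 0) -> Record:
        """Append an event; `seen` is the Lamport time carried by an incoming message."""
        self.clock = max(self.clock, seen) + 1  # Lamport clock rule
        prev = self.records[-1].digest if self.records else GENESIS
        rec = Record(self.clock, payload, prev, _sha(self.clock, payload, prev))
        self.records.append(rec)
        return rec


def verify(records: List[Record]) -> bool:
    """Recompute every digest and check the chain links and the ordering."""
    prev, last = GENESIS, 0
    for r in records:
        if r.prev_hash != prev or r.lamport <= last:
            return False
        if r.digest != _sha(r.lamport, r.payload, r.prev_hash):
            return False
        prev, last = r.digest, r.lamport
    return True
```

Because each digest covers the previous one, editing or reordering any record makes `verify` fail from that point on.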

Quick Start

Prerequisites

Tool	Version	Purpose
Rust	1.93+	ECS world, all Rust crates
Go	1.23+	Gateway, judge, nats-bridge
Bun	1.x	Dashboard
cargo-remote (optional)	latest	Remote build server
Docker + Compose	24+	Demo stack

Configure

Sentinel takes deployment-specific values from two local files. Copy the templates and fill in your own values:

cp .env.example .env
cp .make.local.example .make.local

The .env file holds runtime values (NATS URL, dashboard port). The .make.local file holds build values (cargo remote server address, deploy target). Neither file is committed.
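As a rough illustration of the kind of check you might run after filling in .env, the sketch below parses KEY=VALUE lines and reports missing values. The key names `NATS_URL` and `DASHBOARD_PORT` and the sample values are hypothetical; consult .env.example for the real keys.

```python
from typing import Dict, List


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines; blank lines and # comments are skipped."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"')
    return values


# Hypothetical keys: the README only says .env holds a NATS URL and a dashboard port.
SAMPLE = """
# runtime values
NATS_URL=nats://localhost:4222
DASHBOARD_PORT=18000
"""


def missing_keys(values: Dict[str, str],
                 required=("NATS_URL", "DASHBOARD_PORT")) -> List[str]:
    """Return required keys that are absent or empty."""
    return [k for k in required if not values.get(k)]
```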

Build

make ci          # full: fmt + clippy + test + cargo-deny + typos
make build       # workspace build
make test        # all tests

If you have cargo-remote configured for offload builds, those targets transparently use it.

Demo (one command)

The dashboard surfaces runtime governance signals: control-plane decisions, sandbox enforcer status, audit-event throughput, and agent quality drift.

make demo                                 # build binaries + image, then run
# or, step by step:
make demo-binaries                        # build sentinel-daemon + sentinel-nightrun
make demo-image                           # docker build
./scripts/demo.sh                         # run + open dashboard, tear down after 10 min

The Rust workspace is heavy. make demo-binaries uses cargo-remote against a build server if .cargo-remote.toml is present, otherwise falls back to a local cargo build --release (~8 GB RAM, ~20 min on a developer laptop). See CONTRIBUTING.md for cargo-remote setup if you want to offload the Rust compile.

The demo runs five agents through a 10-minute morning shift with the default workload configuration. Dashboard: http://localhost:18000 (host port 18000 is used because 8000 is commonly bound by local nginx/dev servers; adjust in docker-compose.demo.yml if you have 8000 free).

What the docker demo shows — and what it does not

The compose stack is deliberately a behavioral demo, not a full production deployment. It is meant to give a recruiter or curious reader a working dashboard in one command, not to reproduce the full sandbox story.

Feature	Demo container	VM deploy
ECS world, Bio-Engine, Physics	yes	yes
Event sourcing + projections + dashboard	yes	yes
Cortex Gateway pipeline + synthesis	yes	yes
NATS JetStream + sentinel-judge	yes	yes
bwrap + Landlock per-agent isolation	no (warned)	yes
cgroups v2 per-agent resource caps	no (warned)	yes
netns + nftables agent network	no (warned)	yes
eBPF probes (aya-rs)	no (warned)	yes
sentinel-fs CAS-FUSE	no (warned)	yes
Zenoh SHM transport	no (TCP only)	yes

These kernel-bound features need user namespaces, CAP_BPF, CAP_SYS_ADMIN, CAP_NET_ADMIN, and a writeable bpf-fs / /dev/fuse. A plain unprivileged container has none of those. The SandboxEnforcer (crates/sentinel-sandbox/src/enforcer.rs) detects the absence at boot and degrades gracefully — warnings in the daemon log are the expected demo signal.
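The detect-then-degrade behavior can be pictured with a small host probe. This is a sketch under stated assumptions, not the logic in enforcer.rs: the function names and the particular checks are chosen for illustration, although the probed paths are the standard Linux locations for the FUSE device, the bpf filesystem, the cgroup v2 root, and the active LSM list.

```python
import os
from typing import Dict, List, Tuple


def _read(path: str) -> str:
    """Read a small file, returning an empty string if it is unavailable."""
    try:
        with open(path, "r", encoding="utf-8") as fh:
            return fh.read()
    except OSError:
        return ""


def probe_host() -> Dict[str, bool]:
    """Check a few host preconditions for the kernel-bound features."""
    return {
        "fuse_device": os.path.exists("/dev/fuse"),
        "bpf_fs_writable": os.access("/sys/fs/bpf", os.W_OK),
        "cgroups_v2": os.path.exists("/sys/fs/cgroup/cgroup.controllers"),
        "landlock_lsm": "landlock" in _read("/sys/kernel/security/lsm"),
    }


def plan(probe: Dict[str, bool]) -> Tuple[List[str], List[str]]:
    """Split features into enabled ones and ones to skip with a warning."""
    enabled = sorted(name for name, ok in probe.items() if ok)
    warnings = sorted(
        f"{name} unavailable, running degraded"
        for name, ok in probe.items() if not ok
    )
    return enabled, warnings
```

In an unprivileged container most of these checks would be expected to fail, which corresponds to the warnings the daemon log shows in the demo.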

For the full stack with sandbox enforcement see deploy/systemd/*.service, the deployment notes in docs/governance.md, and the TOGAF v22.1 Architecture Guide.

Customer Workshop Path

For engineering leadership and DevSecOps teams evaluating runtime governance for AI coding agents, the recommended walkthrough is a 45-minute hands-on session:

Architecture overview (10 min): TOGAF v22.1 guide, three control planes, sandbox stack.
Hands-on demo (15 min): start the demo stack, observe agent activity, replay events.
Sandbox-config inspection (10 min): bwrap + Landlock + cgroups policy walkthrough.
9/9 breakout test report review (5 min): what the tests prove, what they don't.
Q&A + production deployment caveats (5 min).

Full agenda: docs/workshop-agent-runtime-governance.md.

Demo: What it proves and what it doesn't

The included docker demo (make demo) is a deliberate behavioral subset of the full stack; it is not meant to reproduce the full sandbox story.

What the demo proves

ECS world simulation, bio-engine, physics, room sim — 60-persona workload runs end-to-end on a 5-agent subset.
Event sourcing (Limbo SQLite, idempotent, replayable) — full audit trail captured per agent.
Cortex Gateway 7-step pipeline + 10-rule synthesis engine — agent reasoning is observable.
Dashboard (Bun + Hono + WebSocket) — live agent activity, drift, quality metrics.
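The "idempotent, replayable" property of the event store can be sketched in a few lines. This is an in-memory stand-in, not the Limbo SQLite implementation; the event shape (`id`, `agent`, `action`) and the class and function names are assumptions for the example.

```python
from collections import Counter
from typing import Dict, List, Optional, Set


class EventStore:
    """In-memory stand-in for an append-only store with idempotent writes."""

    def __init__(self) -> None:
        self._events: List[dict] = []
        self._seen_ids: Set[str] = set()

    def append(self, event: dict) -> bool:
        """Append once per event id; a retried write is a no-op."""
        if event["id"] in self._seen_ids:
            return False
        self._seen_ids.add(event["id"])
        self._events.append(dict(event))  # copy so callers cannot mutate history
        return True

    def replay(self, agent: Optional[str] = None) -> List[dict]:
        """Return events in append order, optionally filtered to one agent."""
        return [dict(e) for e in self._events
                if agent is None or e["agent"] == agent]


def actions_per_agent(events: List[dict]) -> Dict[str, int]:
    """A tiny projection: fold the event list into a read model."""
    return dict(Counter(e["agent"] for e in events))
```

Replaying the same events always yields the same projection, which is what makes the per-agent audit trail reviewable after the fact.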

What the demo does not exercise

The kernel-bound sandbox primitives (per-agent isolation) require CAP_BPF, CAP_SYS_ADMIN, CAP_NET_ADMIN, user namespaces, and a writeable bpf-fs / /dev/fuse. A plain unprivileged Docker container has none of those. The SandboxEnforcer (crates/sentinel-sandbox/src/enforcer.rs) detects the absence at boot and degrades gracefully — warnings in the daemon log are the expected demo signal.

For the full stack with sandbox enforcement (bwrap + Landlock + cgroups + netns + nftables + Wasmtime) see deploy/systemd/*.service and the TOGAF v22.1 architecture guide.

Verified by external tests

Sandbox Test Report: 9/9 breakout tests pass on a privileged host.

Status — what works in this alpha, what doesn't yet

Kernel-bound features are not missing: they are implemented and tested, but cannot be deployed in the docker demo. The VM deploy is the production target; the docker demo is a deliberate behavioral subset.

Area	Status	Demo-Container	VM-Deploy
ECS world (bevy_ecs), bio + physics + room sim	✅ implemented + exercised	yes	yes
Event sourcing (Limbo SQLite, idempotent, replayable)	✅ implemented + exercised	yes	yes
Cortex Gateway 7-step pipeline + 10-rule synthesis engine	✅ implemented + exercised	yes	yes
Dashboard (Bun + Hono + WebSocket)	✅ implemented + exercised	yes	yes
sentinel-judge quality + drift monitoring (NATS streaming)	✅ implemented + exercised	yes	yes
sentinel-projection CQRS read-models	✅ implemented + exercised	yes	yes
sentinel-nightrun batch consolidation, deterministic replay	✅ implemented, manual trigger	yes	yes
bwrap + Landlock per-agent isolation	✅ implemented + 9/9 breakout-tested (`crates/sentinel-sandbox/`)	no (kernel-caps)	yes
cgroups v2 per-agent caps	✅ implemented	no (kernel-caps)	yes
netns + nftables agent network	✅ implemented	no (kernel-caps)	yes
eBPF probes (aya-rs)	✅ implemented	no (kernel-caps)	yes
sentinel-fs CAS-FUSE	✅ implemented	no (FUSE)	yes
TOGAF v22.1 architecture guide + per-cluster gap report	✅ shipped in `docs/architecture/`	n/a	n/a
60 LLM-persona agents (`config/agents/AGENT-*.toml`)	✅ defined; demo runs a 5-agent subset	partial (5/60)	yes (full 60)
Pre-built demo binaries (linux-x86_64) on every release	✅ since v0.1.0-alpha	yes	yes
CodeQL pipeline	✅ green on main	n/a	n/a
Tag verified-badge on GitHub	✅ verified=true (Ed25519)	n/a	n/a
OpenGraph social-preview image	⏳ image in repo (`docs/images/opengraph-preview.png`); upload via repo Settings → Social preview pending (#351)	n/a	n/a
Demo binaries for arm64 / Apple Silicon	⏳ planned (currently linux-x86_64 only)	n/a	n/a
Multi-tenant company configs ("Gaia firmen-konfigurator")	⏳ tracked as roadmap issue (#266)	n/a	n/a

See docs/known-limitations.md for the full caveat list.

Repository Layout

Path	Contents
`crates/`	15 Rust crates (ECS, bio, physics, sandbox, eBPF, …)
`services/sentinel-daemon/`	Daemon + controlplane
`services/sentinel-judge/`	Quality / drift monitor (Go)
`services/sentinel-nightrun/`	Nightly consolidation (Rust)
`services/sentinel-nats-bridge/`	NATS event bridge (Go)
`cmd/cortex-gateway/`	LLM proxy + synthesis (Go)
`dashboard/`	Bun + Hono real-time UI
`pkg/sentinel-go/`	Shared Go package (judge heuristics, eventstore, messaging)
`config/`	Agent TOMLs, room layout, simulation parameters
`docs/`	Architecture, governance, gap, deviations, glossary
`deploy/`	systemd units, release manifest schema
`.github/workflows/`	16 CI workflows (build, test, security, supply chain)

Documentation

Doc	Purpose
llms.txt	LLM-friendly project index (read first)
docs/architecture/togaf-architecture-guide.html	Authoritative architecture reference (v22.1)
docs/governance.md	Governance mechanisms ↔ code path mapping
docs/togaf-gap-v22.md	Per-cluster implementation status
docs/togaf-deviations-v22.md	Intentional deviations from the spec
docs/glossary.md	Agent-persona narrative + agent-layer glossary
docs/security-test-report.md	Sandbox breakout test results
docs/workshop-agent-runtime-governance.md	45-min hands-on workshop: how to evaluate runtime governance for LLM coding agents
docs/research-context.md	Synthetic-workload personality model + role taxonomy + ethics
examples/	Copy-pasteable runtime-governance walkthroughs (sandbox policy, audit replay, control-plane isolation)
CONTRIBUTING.md	How to contribute
SECURITY.md	Reporting vulnerabilities
CHANGELOG.md	Release history

Architecture Details

Plain-text alternative to the Mermaid diagram above, useful for terminal-only viewers and screen-readers. Same data flow, lower fidelity:

Deterministic (ECS)              Probabilistic (LLM)
┌─────────────────────┐          ┌──────────────────────────────────┐
│ bevy_ecs World      │          │ Cortex Gateway                   │
│ Bio / Physics       │ ───────> │ 7-step pipeline                  │
│ 60 agent slots      │ <─────── │ Synthesis engine                 │
│ Event Store         │          │ Self-recognition pattern detector│
└─────────────────────┘          └──────────────────────────────────┘
          │                                   │
          └─────────── Event Sourcing ────────┘
                 (sentinel-limbo, append-only)

For full architectural depth (clusters, controlplane internals, deviation register) see the TOGAF v22.1 architecture guide and the gap report in docs/togaf-gap-v22.md.

Release status

This is the first public release boundary. The project was developed privately prior to v0.1.0-alpha; the tag marks the boundary between private development and public visibility, not the start of the project.

CI on main: ci, lint, coverage, supply-chain (cargo-deny, npm-audit, go-vuln, rust-audit), conventional-commits, dependency-freshness — green. CodeQL goes green on the first scheduled run after the public flip (GHAS gating). Security: dependency audit + gitleaks + trufflehog clean, 9/9 sandbox breakout tests passing on a privileged host.

See docs/known-limitations.md for full caveats and the Status table above for the per-feature picture.

Research Context

The synthetic office workload is a deliberate stress-test for the runtime layer. The personality model, role taxonomy, and bio-state mechanism are documented in docs/research-context.md. The platform underneath is the work; the workload is the evaluation.

Why this proof matters

When customers evaluate AI coding agent deployment, three runtime questions come back:

"How is the agent isolated from production?" — sandbox stack (bwrap + Landlock + cgroups + netns), 9/9 breakout tests passing.
"What evidence remains for review?" — event sourcing on Limbo SQLite, deterministic replay, hash-chained audit trail.
"Who decides what the agent can do?" — three independent control planes (Agent CP, Platform CP, API CP), each owning a single decision domain.

This repo is not a product. It is a reference implementation that makes those questions concrete. The TOGAF v22.1 architecture is the contract; the docker demo is a reduced behavioral subset (see Demo section above).
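To make the "who decides" answer concrete, here is a minimal sketch of three control planes that each own one decision domain and refuse observations from any other. It is an illustration, not the code in services/sentinel-daemon: the domain keys, the decision rules, and the thresholds are hypothetical, loosely following the labels in the architecture diagram (bio, health, cost).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class ControlPlane:
    name: str
    domain: str                        # the single decision domain it owns
    decide: Callable[[dict], str]      # observation -> action name
    log: List[Tuple[str, bool]] = field(default_factory=list)

    def tick(self, observation: dict, act: Callable[[str], dict]) -> str:
        """One observe -> decide -> act -> verify pass."""
        if observation["domain"] != self.domain:
            raise ValueError(f"{self.name} does not own {observation['domain']}")
        action = self.decide(observation)       # decide
        result = act(action)                    # act
        ok = result.get("applied") == action    # verify the effect took hold
        self.log.append((action, ok))
        return action


# Hypothetical rules and thresholds, for illustration only.
PLANES: Dict[str, ControlPlane] = {
    cp.domain: cp
    for cp in (
        ControlPlane("Agent CP", "agent",
                     lambda o: "rest" if o["fatigue"] > 0.8 else "continue"),
        ControlPlane("Platform CP", "platform",
                     lambda o: "noop" if o["healthy"] else "restart"),
        ControlPlane("API CP", "api",
                     lambda o: "throttle" if o["spend"] > o["budget"] else "allow"),
    )
}


def route(observation: dict, act: Callable[[str], dict]) -> str:
    """Send an observation to the one plane that owns its domain."""
    return PLANES[observation["domain"]].tick(observation, act)
```

Keying the registry by domain means an observation has exactly one owner, and the `ValueError` guard makes a cross-domain decision an explicit failure rather than a silent override.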

License

See LICENSE.
