A local-first, security-native AI agent framework built on LangGraph.
Security is enforced in the execution path — not layered on afterward.
Important
The source repository is not yet public — this is a pre-release preview. LegionForge v0.7.1-alpha is complete and verified (2247 smoke tests, 41 integration tests), currently in final UAT before the public repository goes live. Installation instructions and the full codebase will be available at release.
Watch / star this repo to be notified the moment it opens. Questions or early access requests: jp@legionforge.org
LegionForge is an open-source framework for running hardened AI agent systems on your own hardware. It runs local LLMs via Ollama or cloud APIs (OpenAI, Anthropic), with a full security stack baked into every layer of the execution pipeline — not bolted on afterward.
The one-line pitch: The hardened, self-hosted alternative to cloud agent platforms — and a security layer that other agent frameworks can plug into.
Why it exists: In January 2026, OpenClaw hit 60,000 GitHub stars in 72 hours and 300,000+ users in weeks. Kaspersky found 512 vulnerabilities (8 critical). Cisco found active data exfiltration in third-party skills. LegionForge is built in the opposite order: security first, product on top.
| Metric | Value |
|---|---|
| Smoke tests (no services required) | 2247 / 2247 passing (~21s) |
| Integration tests (PostgreSQL) | 41 / 41 passing |
| Kerberos live-KDC tests | 5 / 5 passing |
| UI tests (Playwright) | 40 / 40 passing |
| Tool accuracy tests | 79 / 79 passing |
| Guardian security checks per tool call | 7 (deterministic, 0 LLM calls) |
| Threat classes covered | 11 |
| Prompt injection detection patterns | 29 (2-tier: halt + log) |
| Auth backends | 5 (API key, OIDC, GitHub, LDAP, Kerberos) |
| Channel connectors | 4 (Discord, Telegram, Slack, Webhook) |
| Principle | Implementation |
|---|---|
| Fail-safe tiering | halt → sandbox/retry → degrade — never silent failure |
| Human gates on all mutations | No component autonomously changes security rules, promotes tools, or escalates privileges |
| Replace AI with determinism | Repeated deterministic tasks crystallize into signed, containerized tools with zero LLM overhead |
| Validate at trust boundaries | Guardian enforces checks at every inbound/outbound boundary — agents are processing nodes, not trust boundaries |
| Privilege tied to tasks | Short-lived JWT task tokens scoped to exactly what the current task requires |
Submit tasks from wherever you already are:
| Interface | How |
|---|---|
| Web UI | http://localhost:8080/ui — live SSE streaming, tool call blocks, session history |
| Discord | !<task> in any channel the bot can see |
| Telegram | /<task> to your bot |
| Slack | !<task> via Socket Mode (no public URL needed) |
| Webhook | POST :8081/inbound — HMAC-SHA256 verified, works with n8n / Zapier / any HTTP client |
| REST API | POST /tasks + GET /tasks/{id}/stream — full SSE streaming |
A standalone FastAPI process (:9766) running a deterministic-only 7-check pipeline on every tool call. No LLM calls in the hot path. Unpoisonable. Fails safe to halt on any error.
Check 0 Tool revocation — REVOKED tools blocked before any other check (cache TTL: 10s)
Check 1 Registry + Hash — tool must be APPROVED; SHA-256 hash must match registration
Check 2 Capability boundary — negative capability list enforced per agent profile
Check 3 Destructive patterns — regex detection of destructive command patterns in tool args
Check 4 Sequence contract — agent tool sequences validated; deviations → sandbox retry
Check 5 Ed25519 signature — crystallized tools must carry a valid signing-keypair signature
Check 6 Adaptive rules — human-approved threat rules hot-reload every 10s; no restart
When agents solve the same deterministic problem repeatedly, LegionForge crystallizes the solution into a signed, containerized tool — zero LLM inference cost for routine work.
Observer ──▶ Crystallizer ──▶ Pre-HITL Analyzer ──▶ Human Gate ──▶ Ed25519-Signed Tool
The Pre-HITL Analyzer runs AST guards (subscript bypass, MRO traversal, globals()/locals() hijack) and behavioral diffs before any human reviews a proposal.
The gateway supports five auth backends — swap without touching agent code:
| Backend | Scheme | Use case |
|---|---|---|
ApiKeyBackend |
Bearer | Default — bcrypt API keys in PostgreSQL |
OIDCBackend |
Bearer | Google, Okta, Auth0, Azure AD, Keycloak, Cognito |
GitHubOAuthBackend |
Bearer | GitHub OAuth app tokens |
LDAPBackend |
Basic | OpenLDAP, Active Directory |
KerberosBackend |
Negotiate | Kerberos/GSSAPI (requires KDC + keytab) |
Set gateway.auth_provider in your hardware profile YAML.
| Threat | Defense |
|---|---|
| Tool Poisoning | SHA-256 hash validation at registration + Ed25519 cryptographic signing |
| Rug-Pull | Hash mismatch detection + signed tool versioning |
| Prompt Injection (direct + indirect) | 29-pattern sanitizer (Tier 1 halt / Tier 2 log) + NFKC normalization + zero-width char stripping + RAG provenance scoring |
| Capability Amplification | Negative capability list enforced by Guardian Check 2 |
| Resource Bomb / Economic DOS | Pre-execution token cost estimator + per-user daily budgets + per-provider rate limits |
| Credential Theft | macOS Keychain / ~/.pgpass storage + PII redaction from all outbound API calls |
| RAG / Memory Poisoning | Document provenance scoring + embedding trust threshold flagging |
| Multi-Agent Cascade | Orchestrator-only routing + signed inter-agent messages |
| Supply Chain | AI-BOM + signed tool library + SHA-256 GGUF model integrity verification |
| TOCTOU | approved_snapshot stored pre-execution; post-execution result verified against snapshot |
| AST Bypass | Subscript, MRO traversal, globals()/locals() hijack detection in crystallization analyzer |
All phases complete.
| Phase | What Was Built |
|---|---|
| 0 | PostgreSQL + pgvector, async LLM factory (Ollama/OpenAI/Anthropic), health server |
| 1 | Researcher agent, tool registry + SHA-256 hash validation, capability boundaries, threat event logging |
| 2 | Docker containerization, Guardian security sidecar, immutable audit log (SHA-256 hash chain), RAG provenance |
| 3 | JWT task tokens + ACLs, sub-agent orchestrator, sandbox retry tier |
| 4 | Threat Analyst agent, adaptive Guardian rules, AI Bill of Materials |
| 5 | Crystallization Pipeline — Observer + Crystallizer agents, pre-HITL AST analyzer, Ed25519-signed tools |
| 5.5 | DB RBAC (legionforge_app restricted role), AST bypass guards, tool revocation, TOCTOU mitigation, Ollama model integrity |
| 6 | PentestAgent — air-gapped red-team bot, 8 attack classes × 3 variants, stop-at-proof mode |
| 7 | Guardian feedback loop (pentest → threat rules → Guardian), SECURITY.md, v1.0 hardening |
| 8 | FastAPI gateway (:8080), PostgreSQL task queue, SSE streaming, web UI, A2A + MCP endpoints, Discord connector |
| 9 | langchain 1.x migration, 5-tool library, parallel agent fan-out, Phase 9.5 hardening sprint |
| 10 | Multi-user auth — DB-backed stream tokens, per-user daily token budgets, /usage/me, user management CLI |
| 11 | SecureToolNode fix, 38 integration tests, AuthBackend protocol, Dockerfile.gateway, docs/SCALING.md |
| 12 | OIDCBackend, GitHubOAuthBackend, LDAPBackend, KerberosBackend; multi-scheme require_user |
| 13 | Real GSSAPI Kerberos backend, Redis-backed stream tokens, multi-instance docker-compose + Nginx |
| 14 | Redis global budget counters, Prometheus /metrics endpoint, X-Request-ID trace middleware |
| 15 | Polished web UI — localStorage key + task history, cancel, tool call blocks, live timer, copy, keyboard shortcuts |
| 16 | Telegram (polling), Slack (Socket Mode), generic Webhook (HMAC-SHA256 + async callback) connectors |
| 60–381 | 381-tool operator dashboard — every gateway API endpoint surfaced as a UI function with smoke tests |
| Security sprint | Extended exfiltration detection, NFKC normalization, DESTRUCTIVE_PATTERN DB logging, PostgreSQL scram-sha-256 |
| Web + Browser tools | web_fetch_js Playwright headless browser tool for JS-rendered sites; two-layer SSRF guard; private-IP PII regex fix |
| Lazy-load Dashboard | 296 operator tool cards moved into <template> — injected on first click; eliminates startup parse cost |
| Guardian spinoff G1–G4 | packages/legionforge_guardian standalone package; python -m legionforge_guardian entry point; legionforge-guardian v0.2.0 on PyPI; public repo at LegionForge/LegionForge-Guardian with auto-sync Action |
| Agent Memory — all 5 gaps | Persona bootstrap (Gap 1, DB-backed SOUL.md), user prefs (Gap 5), memory_write/memory_recall tools (Gap 3), daily episodic summaries (Gap 2), pre-compaction flush (Gap 4) — OpenClaw parity |
| UI polish | 4 color themes (Solarized, Warm, Nord, High-Contrast) + multi-theme cycler + favicon; session continuity sidebar with per-session turn count badge |
| Phase I — Multi-modal input | Paste or drag images directly into the task input; vision API routing to Ollama vision models or Anthropic Claude (auto-detected by model capability) |
| HITL approval flow | LangGraph interrupt_before operator gate — destructive tasks pause for human approval before execution; GET /hitl/pending + POST /hitl/{id}/approve |
| Phase J — WhatsApp | WhatsApp Business Cloud API connector — webhook ingestion, HMAC signature verification, message routing to gateway task queue |
| Security hardening | 8 targeted fixes: timing oracle, SSRF hardening, log injection guard, prompt injection tightening, pgvector isolation, budget atomicity, concurrency fix, admin audit trail |
| Component | Version | Notes |
|---|---|---|
| Python | 3.11+ | via pyenv recommended |
| PostgreSQL | 16 or 17 | with pgvector extension |
| Ollama | latest | for local LLM inference |
| Docker | 24+ | for Guardian sidecar + analyzer container |
| macOS | 14+ (Apple Silicon) | primary target; Linux support planned |
Recommended hardware: Mac Mini M4 16GB for single-user; Mac Mini M4 Pro 24GB for household (2–4 concurrent users).
# 1. Clone and set up
git clone https://github.com/LegionForge/LegionForge.git
cd LegionForge
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt
# 2. Set hardware profile
export AGENT_HARDWARE_PROFILE=mac_m4_mini_16gb
echo 'export AGENT_HARDWARE_PROFILE=mac_m4_mini_16gb' >> ~/.zshrc
# 3. Store PostgreSQL admin password
echo "localhost:5432:*:$(whoami):yourpassword" >> ~/.pgpass && chmod 0600 ~/.pgpass
# First install with default Homebrew trust auth? Use: export POSTGRES_TRUST_AUTH=true
# 4. Pull Ollama models
ollama pull llama3.1:8b # ~4.7 GB — primary agent model
ollama pull qwen2.5:3b # ~2.0 GB — fast router model
ollama pull nomic-embed-text # ~0.3 GB — RAG embeddings
# 5. Initialize the database and generate secrets
make db-init
make setup-task-token-secret
make setup-signing-key
# 6. Run smoke tests (no services required)
make test-smoke
# Expected: 2247 passed in ~21s
# 7. Start services (three terminals)
make health-server # Operator API at :8765
make gateway-start # User API + Web UI at :8080
make guardian-start # Security sidecar at :9766 (requires Docker)
# 8. Create a user and open the web UI
make create-user USERNAME=myname
open http://localhost:8080/ui→ Full setup guide: docs/quick-start.md
| Document | What It Covers |
|---|---|
docs/quick-start.md |
Step-by-step setup, connecting channels, first task, troubleshooting |
docs/architecture.md |
All components, ports, ASCII diagram, connection rationale |
docs/SCALING.md |
Horizontal scaling, Redis, Kerberos KDC, multi-instance Docker |
SECURITY.md |
Threat model, HITL policy, injection detection architecture |
TLDR.md |
Project orientation — what was built and why |
CHANGELOG.md |
Full version history |
docs/VISION.md |
Product vision, architecture rationale, design decisions |
| File | Purpose |
|---|---|
src/base_graph.py |
LangGraph agent template — copy to create new agents |
packages/guardian/src/legionforge_guardian/app.py |
Guardian sidecar — canonical source; deterministic 7-check security pipeline |
src/security/guardian.py |
Backward-compat shim — re-exports everything from legionforge_guardian.app |
src/security/core.py |
Keychain loader, PII redaction (8 patterns), injection detection (29 patterns), I/O sanitizer |
src/database.py |
Async PostgreSQL pool, LangGraph checkpointer, pgvector, audit log hash chain |
src/safeguards.py |
Three-layer loop protection (step counter, action history, token budget) |
src/gateway/app.py |
FastAPI gateway (:8080) — task queue, SSE streaming, web UI, A2A, MCP |
src/gateway/backends/ |
Auth backend package — ApiKey, OIDC, GitHub, LDAP, Kerberos |
src/connectors/ |
Channel connectors — Discord, Telegram, Slack, Webhook |
src/tools/signing.py |
Ed25519 keypair management + tool manifest signing |
src/tools/crystallization_analyzer.py |
Pre-HITL AST + behavioral diff analyzer |
config/settings.py |
Pydantic settings singleton (loaded from hardware YAML profile) |
Makefile |
All development, test, and operational commands |
# Environment
make check # Verify environment before starting
make start # Full startup (Ollama + PostgreSQL + model warmup)
make stop # Graceful shutdown
# Testing
make test-smoke # 2247 smoke tests, ~21s, no services required
make test-integration # 41 integration tests (requires PostgreSQL)
make test-kerberos # 5 Kerberos live-KDC tests (requires KDC)
make test-ui # 40 UI tests (Playwright)
make test-fast # All tests except slow ones
# Code quality
make lint # Black formatter check
make format # Auto-format
make security-audit # Smoke + bandit + URI secret scan
# Services
make health-server # Operator health API at :8765
make gateway-start # User-facing gateway at :8080
make guardian-start # Guardian sidecar at :9766 (requires Docker)
make discord-start # Discord bot connector
make telegram-start # Telegram bot connector
make slack-start # Slack Socket Mode connector
make webhook-start # Generic webhook connector at :8081
# Users & auth
make create-user USERNAME=<name>
make create-user USERNAME=<name> DAILY_LIMIT=<tokens>
# Security
make pentest # Air-gapped red-team suite (stop-at-proof)
make pentest-report # View most recent pentest findings
make audit-log-verify # Verify SHA-256 hash chain integrity
make revoke-tool TOOL_ID=<id> # Emergency tool revocation- Embedding-level anomaly detection — RAG poisoning at the semantic vector level is an open research problem. Provenance scoring and trust flagging exist; embedding-level detection is deferred.
- pip-audit / dependency hash pinning — Managed via Dependabot.
pip-auditreports no known CVEs; transitive hash pinning is accepted residual risk. - langchain 1.x migration — Planned for a future phase; current stack runs langchain 0.3.x. A LOW-severity SSRF CVE in
langchain-coreis accepted risk (LegionForge never calls the affected method with image content).
LegionForge exists in a space shaped by several projects and thinkers worth calling out directly.
OpenClaw (Peter Steinberger — née Clawd → Clawdbot → Moltbot → OpenClaw) — the primary inspiration for LegionForge. Proved the demand (60,000 GitHub stars in 72 hours), proved the architecture (six-component structure, workspace-as-files memory), and proved what happens when security is an afterthought (512 vulnerabilities, active data exfiltration in third-party skills). LegionForge is building in the opposite order: security first, product on top.
LangGraph — the graph execution engine underneath everything. Checkpoint-based state persistence, loop protection, and graph resumption are LangGraph primitives that LegionForge builds on heavily.
LATM — Learning to Use Tools by Making Them (Cai et al., ICLR 2024) and Voyager (Wang et al., NVIDIA 2023) — the foundational academic work closest to LegionForge-Anneal's tool crystallization pipeline.
Anchor Engine by Robert S. Balch II — deterministic semantic memory using graph traversal (the STAR algorithm). Anchor's insight that agent memory should be explainable and deterministic directly informed LegionForge's temporal decay memory recall. The STAR gravity formula is adapted from Anchor's published whitepaper.
The AI-Human Engineering Stack by Hayen Mill and Henrique Jr. Sanchez (March 2026) — a five-layer framework for AI engineering (Prompt, Context, Intent, Judgment, Coherence). The KV-cache stability insight from this paper directly motivated the context ordering in LegionForge's agent message assembly.
The security-first design here is a direct response to watching these ecosystems grow fast and ship security as an afterthought. That's not a criticism — it's the reality of how open-source moves. This project is an attempt to show what the stack looks like when security is the first constraint, not the last.
For the full canonical record of design influences, academic inspirations, and third-party attributions, see CREDITS.md. For machine-readable citation data, see CITATION.cff.
AGPL-3.0 with Section 7(b) attribution requirement.
Copyright 2026 John Paul "Jp" Cruz. Commercial licensing available — contact via GitHub Issues.
v0.7.1-alpha — Active Development. This project is currently under active development and is not yet at a stable 1.0 release. The security stack, gateway, and tool library are functionally complete and well-tested, but the project is still evolving toward its v1.0.0 public release.
| Version | v0.7.1-alpha |
| Smoke tests | 2247/2247 passing |
| Integration tests | 41/41 |
| Kerberos tests | 5/5 |
| UI tests | 40/40 |
| Pre-v1.0 security blockers | All resolved |
| APIs / config formats | May change before v1.0.0 |
Contributions, issues, and commercial licensing inquiries welcome via GitHub Issues.