Agency Core

The autonomous AI platform for engineering teams — self-hosted, privacy-first, runs anywhere.

What is Agency Core?

Agency Core is a self-hosted autonomous AI platform that turns any server — your laptop, a $10 VPS, or a GPU box — into a private AI team. It ships a CEO orchestrator agent and a fleet of domain specialists that work together on real engineering and business tasks: writing code, opening pull requests, running tests, updating docs, and managing recurring operations — all without sending your data to the cloud.

At its core it is also a drop-in OpenAI-compatible proxy for Ollama, so Cursor, Continue, Aider, Claude Code, and any other AI coding tool can point at http://localhost:8000 and use your local models through a single authenticated endpoint.

Why self-hosted autonomous agents?

The problem	What Agency Core does instead
Frontier AI tools upload your code to third-party servers	Everything runs on hardware you control; data never leaves your perimeter
ChatGPT / Copilot give one-shot answers, not persistent work	Agents plan, execute, verify, and loop back only when a human decision is needed
Managing multiple AI tools means multiple accounts, keys, and bills	One platform, one API key, one dashboard — unlimited local inference
AI "agents" are demo toys that can't commit code or open PRs	Full git integration: branch → commit → PR → CI watch → HITL approval gate → merge
No visibility into what the AI did or why	Langfuse observability: every LLM call, token count, latency, cost, and decision trace
Cloud AI pricing scales with usage — costs explode at team scale	Marginal inference cost is electricity; scale a 50-person team for the same server bill

The autonomous agency — what your agents can do

Once onboarded, Agency Core runs a fleet of specialists coordinated by a CEO agent. You describe what you want in plain English; the CEO decomposes it into a structured plan, assigns subtasks to the right specialist, and returns results with evidence — PR links, test output, diffs, and reasoning traces.

Engineering agents

Bug fixing: analyse a bug report, write a fix, open a PR, watch CI, wait for your approval before merging
Dependency audit: scan for CVEs, create a safe upgrade PR with passing tests
Code review: check any PR for security holes, N+1 queries, missing error handling, and injection risks
Test generation: write unit and integration tests for new or existing code
Refactoring: identify tech debt hotspots, propose a refactor plan, execute on approval
Release management: bump version, draft changelog, tag, verify CI, open the release PR
Documentation: keep API docs, architecture records, and runbooks in sync with code changes

Content & knowledge agents

Write product descriptions, blog posts, or wiki articles from a brief
Keep your internal knowledge base accurate — agents update docs when code changes
Summarise and classify incoming GitHub issues, Slack threads, and support tickets
Schedule weekly trend digests and release notes automatically

Operations agents

Monitor CI/CD pipelines and alert you when something needs a human decision
Manage recurring schedules: daily summaries, weekly audits, on-call handoffs
Classify every request to the optimal local model (code → Qwen3-Coder, reasoning → DeepSeek-R1)
Provide real-time health diagnostics for all running agents, runtimes, and providers

From onboarding to autonomous work — step by step

Step 1 — Boot and activate

Deploy Agency Core (Docker, Render, or uvicorn locally). On first boot, open the web UI and run the Setup Wizard — it walks you through five steps:

Connect your Ollama instance (or enter a cloud provider key — Nvidia NIM, AWS Bedrock, Anthropic)
Generate your first API key
Create your admin account
Pull a model (qwen2.5-coder:7b for starters — free, no GPU required via Nvidia NIM)
Run a health check — the system confirms every dependency is reachable

The Doctor screen (accessible any time from the sidebar) repeats this check live: git binary, GitHub token, repo access, Langfuse connectivity, and all registered runtimes. Green across the board means you're ready.

No local GPU? Set LLM_PROVIDER=nvidia-nim and NVIDIA_API_KEY=<your-key> to use Nvidia's free-tier hosted models. Zero local hardware required.

Step 2 — Describe your company

Open the Company screen. Paste your repository URL and answer a short set of tailored questions about your stack, team size, and goals. Agency Core builds an internal knowledge graph so agents give context-aware answers ("use Pydantic v2 for this, that's what your codebase uses") instead of generic advice.

You can update this profile any time. Agents re-index it on each task cycle.

Step 3 — Talk to the CEO agent

Open Chat. Describe what you want the way you'd brief a senior engineer:

"There's a memory leak in the session manager reported in issue #142. Find the root cause, write a fix, and open a PR for my review."

The CEO agent:

Reads the issue and the relevant source files
Produces a structured plan — you can review and edit it before execution starts
Delegates to the Dev specialist
Returns a PR link, a summary of the fix, and the test results

Every conversation is persisted. Pick up where you left off across sessions.

Step 4 — Watch the Task Board

Every agent job appears on the Task Board with live status:

queued → planning → executing → verifying → awaiting approval → done

Drill into any task to see:

The original plan the CEO agent produced
Every step the executing agent took (with diffs and tool call logs)
The verification result (did the tests pass?)
The judge's verdict (is the output production-ready?)
A plain-English summary you can paste into Slack

Step 5 — HITL approval gates

Agency Core never merges code, deploys to production, or sends external messages without your sign-off. When an agent reaches a gate it:

Pauses and surfaces the decision in your dashboard
Shows you exactly what will happen — the diff, the deploy command, the message body
Waits for your Approve, Deny, or Redirect (send back with comments)

Gates are configurable per task type: auto-approve low-risk operations like reformatting docs, require explicit sign-off on anything touching production.

Step 6 — Schedule recurring work

Open Schedules and set up recurring agent tasks:

Daily: summarise open PRs and surface anything blocked
Weekly: dependency CVE audit, changelog draft, code quality report
Per-commit: trigger a doc-sync agent on every merge to master
On-demand: one-click "run all agents" for a sprint review

Agents run on schedule, push results to the Task Board, and only interrupt you when a human decision is needed.

The V5 Control Plane — every screen

Screen	What it does
Dashboard	Live health of all agents, recent activity, and system metrics at a glance
Chat	Conversational interface to the CEO agent; full persistent history per session
Task Board	Kanban view of all agent jobs: queued → planning → executing → review → done
Agents	All registered specialists with capabilities, current workload, runtime, and model
Providers	Connected LLM providers (Ollama, AWS Bedrock, Nvidia NIM) with health and cost data
Runtimes	Execution substrates — internal loop, Docker agent, external harnesses (OpenCode, Aider, Goose)
Knowledge	Internal wiki maintained by agents from your code, docs, and past decisions
Schedules	Recurring agent tasks with cron-style timing and run history
Skills	The agent skill library — what each specialist knows how to do and when it activates
Intelligence	Routing policy editor — control which model handles which task type and at what cost tier
Logs	Full trace of every LLM call: token count, latency, provider, cost, and decision context
Company	Your organisation profile, tech stack, and knowledge graph seed data
Admin	User management, role assignment, instance activation, audit log, onboarding controls
Doctor	Self-diagnostics — checks every dependency, connectivity, and configuration item live

Screens

A visual tour of the dashboard. Screenshots reflect the most recent captured UI and are regenerated from scripts/sync_readme_gallery.py.

🛰 Control Plane

The command center: live agent health, recent activity, and system metrics at a glance.

🛬 Login

People can sign in through a simple starting page instead of touching raw config files.

🧙 Setup Wizard

The wizard helps you choose providers, models, runtimes, a default agent, and a cost policy.

💬 Chat

This is where people talk to the CEO agent directly, using the providers and rules you set up.

🗂 Task Board

This makes AI work visible. You can see what is waiting, running, blocked, in review, or done.

🤖 Agent Roster

This is your cast of AI helpers. Each agent can have its own model, runtime, specialty, and rules.

⚙️ Runtimes

This shows the engines behind the scenes that actually run your AI work.

🛣 Routing Policy

This is where you decide how smart, cheap, fast, or private the system should be when picking a model.

🔌 Providers and Models

This is where you connect local and cloud AI sources and decide what models are available.

📚 Knowledge

This is your team's memory: wiki pages, source material, and reusable context.

🔭 Logs and activity

This helps you answer, ‘what just happened?’ — every LLM call, token count, latency, and cost.

🗓 Schedules

This is how you make AI jobs run later or run again automatically.

🧭 Settings and guardrails

Central settings keep defaults, policies, and integrations in one place instead of scattered config files.

🛡 Admin portal

This gives admins a simpler place to manage access, instance activation, and system behavior.

📱 Mobile

The dashboard is responsive — sign in, run the setup wizard, and monitor agents from a phone.

Architecture

┌──────────────────────────────────────────────────────────────────┐
│   React V5 SPA (GitHub Pages)     Remote Admin (Vercel)          │
└──────────────────┬───────────────────────────────────────────────┘
                   │ HTTPS / JWT Bearer
┌──────────────────▼───────────────────────────────────────────────┐
│  FastAPI Backend (Render / Docker)                                │
│  ├─ /v1/chat/completions     OpenAI-compatible proxy             │
│  ├─ /api/chat/send           Agency Core conversational API      │
│  ├─ /api/tasks/*             Task CRUD + async dispatcher        │
│  ├─ /api/agent/*             Agent job management + HITL gates   │
│  ├─ /api/doctor              Live system health diagnostics      │
│  ├─ /api/activation/*        Instance licensing + user mgmt      │
│  └─ /mcp-internal            MCP server for agent tool calls     │
├──────────────────────────────────────────────────────────────────┤
│  ModelRouter — task classification → optimal model selection     │
│  ├─ Code tasks      → Qwen3-Coder / DeepSeek-Coder              │
│  ├─ Reasoning       → DeepSeek-R1                                │
│  └─ Fast / chat     → smallest capable model                     │
├──────────────────────────────────────────────────────────────────┤
│  AgentRunner — plan → execute → verify → judge → summarise       │
│  ├─ CEO agent (orchestrator + domain classifier)                 │
│  ├─ Dev / Release / Content / Analytics / Infra specialists      │
│  └─ Workflow engine (persisted state machine, HITL gates)        │
├──────────────────────────────────────────────────────────────────┤
│  Task Dispatcher — async poll loop + crash-recovery reconciler   │
│  ├─ Per-task git worktree isolation (concurrent-safe execution)  │
│  └─ Opt-in external runtimes: Docker, OpenCode, Aider, Goose    │
├──────────────────────────────────────────────────────────────────┤
│  Storage (swappable at runtime)                                  │
│  ├─ MongoDB (default) — Motor async driver                       │
│  └─ SQLite (STORAGE_BACKEND=sqlite) — zero external deps         │
│  Observability — Langfuse traces + local TCO cost model          │
└──────────────────────────────────────────────────────────────────┘

Quickstart

Prerequisites

Python 3.13+
Ollama with at least one model — or a free Nvidia NIM API key (no local GPU needed)
Node 20+ (for the web UI)
MongoDB — or set STORAGE_BACKEND=sqlite to skip it entirely

1. Clone and install

git clone https://github.com/strikersam/local-llm-server.git
cd local-llm-server
python -m venv .venv && source .venv/bin/activate
pip install -r backend/requirements.txt

2. Configure

cp .env.example .env
# Minimum required:
#   SECRET_KEY=$(openssl rand -hex 32)
#   STORAGE_BACKEND=sqlite          # skip MongoDB
#   ADMIN_EMAIL=you@example.com
#   ADMIN_PASSWORD=changeme
#
# Add one of:
#   OLLAMA_BASE_URL=http://localhost:11434   # local GPU
#   NVIDIA_API_KEY=nvapi-...                 # free cloud inference

3. Start the backend

uvicorn backend.server:app --reload --port 8001

4. Start the frontend (development)

cd frontend
npm install
REACT_APP_BACKEND_URL=http://localhost:8001 npm start

Visit http://localhost:3000 — the setup wizard appears on first boot.

5. Connect your AI coding tools

// Cursor — settings.json
{
  "cursor.ai.openaiBaseUrl": "http://localhost:8000",
  "cursor.ai.openaiApiKey": "your-api-key-here"
}

See client-configs/ for Aider, Continue, Zed, VSCode, and Claude Code configs.

Cloud deployment (Render + GitHub Pages)

Push to master — GitHub Actions does the rest automatically:

CI: Python 3.13 tests, frontend build, lint, SAST, secret scan, CVE audit
Backend: Docker build → Render deploy hook → health check
Frontend: React build → GitHub Pages

Required repository secrets:

Secret	Where to get it
`RENDER_DEPLOY_HOOK_URL`	Render dashboard → service → Settings → Deploy Hook
`RENDER_BACKEND_URL`	Your Render service URL (e.g. `https://my-service.onrender.com`)

Live demo:

Frontend: https://strikersam.github.io/local-llm-server/
Backend API: https://local-llm-server.onrender.com/docs

Render free tier note: the backend sleeps after 15 minutes of inactivity and takes ~30 s to wake. Upgrade to Starter ($7/mo) to eliminate cold starts in production.

Configuration reference

Variable	Default	Description
`SECRET_KEY`	(required)	JWT signing key — `openssl rand -hex 32`
`STORAGE_BACKEND`	`mongo`	Set to `sqlite` for zero-dependency storage
`MONGO_URL`	`mongodb://localhost:27017`	MongoDB connection string
`OLLAMA_BASE_URL`	`http://localhost:11434`	Local Ollama server
`LLM_PROVIDER`	`ollama`	`ollama` · `nvidia-nim` · `deepseek` · `bedrock` · `anthropic`
`NVIDIA_API_KEY`	(optional)	Nvidia NIM free-tier models — no local GPU required
`AWS_ACCESS_KEY_ID` + `AWS_SECRET_ACCESS_KEY`	(optional)	AWS Bedrock (Claude Opus, Titan)
`ANTHROPIC_API_KEY`	(optional)	Direct Anthropic API
`DEEPSEEK_API_KEY`	(optional)	DeepSeek cloud API
`GITHUB_TOKEN`	(optional)	Required for agents that open PRs, review code, or read issues
`LANGFUSE_HOST` + `_PUBLIC_KEY` + `_SECRET_KEY`	(optional)	Observability traces
`TELEGRAM_BOT_TOKEN`	(optional)	Remote control via Telegram
`ADMIN_EMAIL` + `ADMIN_PASSWORD`	(optional)	First admin — created on first boot
`RUNTIME_DOCKER_ENABLED`	`false`	Enable Docker agent runtime
`RUNTIME_OPENHANDS_ENABLED`	`false`	Enable OpenHands runtime
`RUNTIME_AIDER_ENABLED`	`false`	Enable Aider runtime

Full reference: docs/configuration.md

Provider priority chain

Agency Core tries providers in order until one responds:

AWS Bedrock (15) → Nvidia NIM (10) → DeepSeek (8) → Anthropic (7) → HuggingFace (5) → Ollama (3)

Only providers with keys configured are tried. Set just the keys you have.

Security

No secrets in source — all configuration via environment variables; nothing hardcoded
Ed25519 instance activation — tamper-evident licensing signatures
RBAC: three roles — user, power_user, admin
Bearer token auth on every API endpoint; JWT with configurable expiry
Audit log for all admin actions (user creation, key generation, role changes)
Bandit SAST + CodeQL + GitHub secret scanning on every push
Dependency CVE audit on every PR via pip-audit
Per-task git worktree isolation — concurrent agents cannot clobber each other's in-flight edits
Crash-recovery reconciler — stranded IN_PROGRESS tasks are automatically re-queued on restart

Found a vulnerability? Open a security advisory — please don't file a public issue.

Development

# Run tests — always before committing
pytest -x            # fast-fail mode
pytest -v            # verbose with full output

# Activate git hooks (blocks commits missing changelog entries)
git config core.hooksPath .claude/hooks

# Generate a new API key
python generate_api_key.py

# AI session watchdog (auto-resume AI coding sessions)
python scripts/ai_runner.py start
python scripts/ai_runner.py status
python scripts/ai_runner.py resume

See CLAUDE.md for the full contributor guide, skill map, risky-module policy, and AI agent working rules.

Roadmap

Phase	Status	Description
Phase 1 — Typed agent contract	✅ Done	`AgentJobRequest` / `AgentJobResult` Pydantic contract, E2E tests
Phase 2 — ModelRouter wiring	✅ Done	Single router for all request types; classification → model hint
Phase 3 — SQLite + one backend	✅ Done	Swappable storage adapter, dead-router removal, zero-dep option
Phase 4 — Runtime resilience	✅ Done	Crash-recovery reconciler, worktree isolation, opt-in external runtimes
Phase 5 — Doctor & dashboard resilience	✅ Done	`/api/doctor` endpoint, `useSafeData` hook, live DoctorScreen
Phase 6 — Workflow engine	🔄 In progress	Persisted state machine, safe CEO agency (branch/PR safety)
Phase 7 — Onboarding engine	📋 Planned	URL → stack inference → tailored questions → specialist provisioning
Phase 8 — Multi-tenant	📋 Planned	Organisation isolation, per-tenant model budgets, SSO

License

MIT — see LICENSE

_{Built for engineers who want the power of frontier AI without the cloud bill or the privacy compromise.}

Name		Name	Last commit message	Last commit date
Latest commit History 1,108 Commits
.agents/skills		.agents/skills
.claude		.claude
.codeql		.codeql
.devcontainer		.devcontainer
.emergent		.emergent
.githooks		.githooks
.github		.github
agent		agent
agents		agents
backend		backend
client-configs		client-configs
config-export		config-export
db		db
docker		docker
docs		docs
features		features
frontend		frontend
handlers		handlers
hardware		hardware
mcp_server		mcp_server
memory		memory
models		models
prompts		prompts
remote-admin		remote-admin
router		router
runtimes		runtimes
schedules		schedules
scratch		scratch
scripts		scripts
services		services
setup		setup
sync		sync
tasks		tasks
templates		templates
tests		tests
webui		webui
worker		worker
workflow		workflow
workspace		workspace
.dockerignore		.dockerignore
.gitconfig		.gitconfig
.gitignore		.gitignore
.python-version		.python-version
.replit		.replit
AGENCY_CORE_V5_PROGRESS.md		AGENCY_CORE_V5_PROGRESS.md
AGENTS.md		AGENTS.md
CHANGELOG.md		CHANGELOG.md
CLAUDE.md		CLAUDE.md
Dockerfile		Dockerfile
Dockerfile.backend		Dockerfile.backend
Dockerfile.dashboard.frontend		Dockerfile.dashboard.frontend
Dockerfile.frontend		Dockerfile.frontend
Dockerfile.runtime		Dockerfile.runtime
Makefile		Makefile
README.md		README.md
REVIEW_AND_FIXES.md		REVIEW_AND_FIXES.md
TOOLS.md		TOOLS.md
activation.py		activation.py
activation_api.py		activation_api.py
admin_auth.py		admin_auth.py
admin_gui.py		admin_gui.py
audit.py		audit.py
backend_test.py		backend_test.py
backend_test_iteration3.py		backend_test_iteration3.py
build-workflow		build-workflow
chat_handlers.py		chat_handlers.py
check_auto.py		check_auto.py
claude-code.bat		claude-code.bat
commercial_equivalent.py		commercial_equivalent.py
cost_insights.py		cost_insights.py
design_guidelines.json		design_guidelines.json
direct_chat.py		direct_chat.py
docker-compose.yml		docker-compose.yml
download_models.ps1		download_models.ps1
fix-pr271-remaining-bugs.patch		fix-pr271-remaining-bugs.patch
generate_api_key.py		generate_api_key.py
get_tunnel_url.ps1		get_tunnel_url.ps1
get_tunnel_url.sh		get_tunnel_url.sh
github-pages-index.html		github-pages-index.html
github-pages-setup.html		github-pages-setup.html
index.html		index.html
infra_cost.py		infra_cost.py
install.ps1		install.ps1
install.sh		install.sh
key_store.py		key_store.py
langfuse_obs.py		langfuse_obs.py
launch-claude-code.ps1		launch-claude-code.ps1
launch-claude-code.sh		launch-claude-code.sh
launcher.py		launcher.py
netlify.toml		netlify.toml
provider_router.py		provider_router.py
proxy.py		proxy.py
pytest.ini		pytest.ini
rbac.py		rbac.py
register_task.ps1		register_task.ps1
render.yaml		render.yaml
requirements.txt		requirements.txt
run-claude-code.py		run-claude-code.py
run.bat		run.bat
run.sh		run.sh

Folders and files

Latest commit

History

Repository files navigation

Agency Core

The autonomous AI platform for engineering teams — self-hosted, privacy-first, runs anywhere.

What is Agency Core?

Why self-hosted autonomous agents?

The autonomous agency — what your agents can do

Engineering agents

Content & knowledge agents

Operations agents

From onboarding to autonomous work — step by step

Step 1 — Boot and activate

Step 2 — Describe your company

Step 3 — Talk to the CEO agent

Step 4 — Watch the Task Board

Step 5 — HITL approval gates

Step 6 — Schedule recurring work

The V5 Control Plane — every screen

Screens

🛰 Control Plane

🛬 Login

🧙 Setup Wizard

💬 Chat

🗂 Task Board

🤖 Agent Roster

⚙️ Runtimes

🛣 Routing Policy

🔌 Providers and Models

📚 Knowledge

🔭 Logs and activity

🗓 Schedules

🧭 Settings and guardrails

🛡 Admin portal

📱 Mobile

Architecture

Quickstart

Prerequisites

1. Clone and install

2. Configure

3. Start the backend

4. Start the frontend (development)

5. Connect your AI coding tools

Cloud deployment (Render + GitHub Pages)

Configuration reference

Provider priority chain

Security

Development

Roadmap

License

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases 3

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages