A local routing layer that sits between Claude Code (or Codex) and the upstream model providers. Claude Code thinks it is talking to Anthropic. The gateway intercepts each request, classifies the cognitive load, and routes cheap work to Groq or Cerebras — keeping Anthropic in reserve for tool execution and large-context reasoning.
The result: Anthropic tokens spent only when Anthropic is actually needed.
Every Claude Code request passes through `/v1/messages`. The gateway classifies the request and routes it:
| Route class | Signal | Provider |
|---|---|---|
| `tool_loop` | `tool_result` message in history (Claude is mid-execution) | Anthropic |
| `long_context` | Estimated input >12k tokens | Anthropic |
| `planning` | Tools present but no tool loop; reasoning about approach | Groq (`llama-3.3-70b-versatile`) |
| `code_draft` | Code-generation intent, no tool loop | Groq |
| `summary` | Summarize / explain / extract | Cerebras (`llama3.1-8b`) |
| `fallback_safe` | Unknown | Anthropic |
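The `long_context` threshold can be sanity-checked by hand. A minimal sketch, assuming the gateway estimates tokens as roughly one per four characters (the real estimator may differ):

```shell
# Build a ~60k-character sample prompt, then apply the chars/4 heuristic.
# The chars/4 estimate is an assumption; the gateway's estimator may differ.
head -c 60000 /dev/zero | tr '\0' 'x' > /tmp/sample-prompt.txt
chars=$(wc -c < /tmp/sample-prompt.txt)
tokens=$(( chars / 4 ))   # ~15000 estimated tokens
if [ "$tokens" -gt 12000 ]; then
  echo "route=long_context provider=anthropic"
else
  echo "route=offload candidate"
fi
```

Anything over the 12k estimate stays on Anthropic regardless of intent, so very large prompts never reach the free-tier providers.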
Shadow mode (on by default): the gateway routes everything to Anthropic but logs what it *would have* routed and projects savings per turn. Check `/shadow/stats` after a session to see the breakdown. Flip `shadowMode: false` in config to go live.
Create `.env` at the repo root (already .gitignored):

```
CEREBRAS_API_KEY=csk-...
GROQ_API_KEY=gsk_...

# Optional: route through your own Cloudflare Workers AI account from local Node.
# When these are set, the gateway builds an AI binding shim automatically.
CLOUDFLARE_ACCOUNT_ID=...
CLOUDFLARE_API_TOKEN=...

# Optional override for API base (default: https://api.cloudflare.com/client/v4)
# CLOUDFLARE_API_BASE_URL=https://api.cloudflare.com/client/v4

# Optional: add your personal Anthropic key if you want tool_loop / long_context
# turns to actually hit Anthropic instead of erroring.
# Do NOT use the aegis-web worker key here — that account has no credits.
ANTHROPIC_API_KEY=sk-ant-...
```

`STACKBILT_GATEWAY_KEY` defaults to `local-dev-key` if not set. Override it here if you want a stronger local auth secret.
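Before starting the gateway, a quick grep can confirm the keys actually landed in `.env` without printing their values. A rough sketch (`npm run doctor` performs the full check):

```shell
# Report which provider keys are present in .env, value-free.
# grep -s stays quiet if .env does not exist yet.
for key in CEREBRAS_API_KEY GROQ_API_KEY ANTHROPIC_API_KEY; do
  if grep -qs "^${key}=" .env; then
    echo "$key: set"
  else
    echo "$key: missing"
  fi
done
```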
```shell
npm run setup
npm run install:global
```

From your project repo (any directory):

```shell
stackbilt-gw claude   # start gateway if needed, then launch Claude via gateway
stackbilt-gw codex    # start gateway if needed, then launch Codex via gateway
```

First-time key setup (interactive prompts):

```shell
stackbilt-gw init
```

If you prefer local repo scripts (while in `llm-gateway`):

```shell
npm run claude
npm run codex
```

Gateway lifecycle commands:
```shell
npm start                    # gateway up
npm run stop                 # gateway down
npm run status               # up/down + pid/log path
npm run logs                 # tail gateway logs
npm run doctor               # validate cli/tools/env/provider setup
npm run install:global       # install stackbilt-gw into ~/.local/bin
npm run uninstall:global     # remove global launcher
npm run gateway -- restart   # restart
npm run gateway -- up        # explicit up
npm run gateway -- down      # explicit down
```

Direct script usage:

```shell
./gateway.sh up
./gateway.sh init
./gateway.sh claude
./gateway.sh codex
```

Optional environment overrides:
```shell
STACKBILT_GATEWAY_PORT=9000 npm run claude
STACKBILT_GATEWAY_KEY=my-local-key npm run codex
```

Before first run, check setup (already included in `npm run setup`):

```shell
npm run doctor
```

If you prefer manual terminal wiring instead of `npm run claude`:

```shell
export ANTHROPIC_BASE_URL=http://localhost:8787
export ANTHROPIC_API_KEY=local-dev-key
claude
```

All endpoints except `/health` require the header `x-api-key: local-dev-key` (or whatever `STACKBILT_GATEWAY_KEY` is set to).
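With the base URL exported, it's worth confirming the gateway is actually listening before launching Claude. A small sketch; `--max-time` keeps curl from hanging if nothing is on the port:

```shell
# Probe the unauthenticated /health endpoint; fall back to a note if down.
resp=$(curl -s --max-time 2 http://localhost:8787/health \
  || echo "gateway not reachable on :8787")
echo "$resp"
```

If this prints the fallback note, run `npm start` (or `stackbilt-gw claude`, which starts the gateway for you) before exporting the variables again.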
```shell
curl http://localhost:8787/providers?live=1 -H "x-api-key: local-dev-key"
```

Shows each provider's circuit-breaker state, error count, and available models.

```shell
curl http://localhost:8787/metrics -H "x-api-key: local-dev-key"
```

Aggregate counts by provider and route class, average latency, and estimated cost.

```shell
curl http://localhost:8787/shadow/stats -H "x-api-key: local-dev-key"
```

Shows, per route class, how many turns would have been offloaded and the projected USD savings. Use this after a real Claude Code session to decide whether the routing is safe to enable live.
Example output:

```json
{
  "shadowMode": true,
  "totalRequests": 47,
  "shadowedRequests": 31,
  "totalProjectedSavingsUsd": 0.043,
  "byRoute": {
    "planning": { "count": 18, "projectedSavingsUsd": 0.024, "confidence": { "medium": 18 } },
    "summary": { "count": 13, "projectedSavingsUsd": 0.019, "confidence": { "high": 13 } }
  }
}
```

```shell
curl http://localhost:8787/events/recent -H "x-api-key: local-dev-key"
```

Last 100 requests with full routing metadata.

Once you have shadow data you trust, open `packages/llm-gateway/src/config.ts` and set:

```ts
shadowMode: false,
```

Rebuild and restart:

```shell
npm run build && npm start
```

`planning` and `summary` turns now hit Groq/Cerebras live. `tool_loop` and `long_context` still go to Anthropic.
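Once live, `/metrics` is the quickest way to confirm turns are actually leaving Anthropic. A sketch, assuming the gateway is running with the default key:

```shell
# Pull aggregate metrics; per-provider counts should show groq/cerebras
# traffic once planning and summary turns go live.
metrics=$(curl -s --max-time 2 -H "x-api-key: local-dev-key" \
  http://localhost:8787/metrics || echo "gateway not running")
echo "$metrics"
```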
The gateway exposes a dedicated endpoint that distills a Claude Code session transcript into structured facts. It runs on Cerebras (free).
```shell
curl -X POST http://localhost:8787/v1/context/compact \
  -H "Content-Type: application/json" \
  -H "x-api-key: local-dev-key" \
  -d '{
    "messages": [
      {"role": "user", "content": "..."},
      {"role": "assistant", "content": "..."}
    ]
  }'
```

Returns:

```json
{
  "ok": true,
  "provider": "cerebras",
  "compact": {
    "durable_facts": [],
    "decisions_made": [],
    "files_changed": [],
    "open_questions": [],
    "next_actions": [],
    "context_to_discard": ""
  }
}
```

Use this at the end of a long session to extract what actually matters before starting a fresh context.
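For real sessions the transcript usually lives in a file rather than inline JSON. A sketch, assuming `jq` is installed; the transcript path and contents here are illustrative:

```shell
# Wrap a saved transcript (a JSON array of {role, content} objects) into the
# request body the compact endpoint expects.
cat > /tmp/transcript.json <<'EOF'
[
  {"role": "user", "content": "Refactor the auth middleware."},
  {"role": "assistant", "content": "Done. Token checks now live in one helper."}
]
EOF
jq -n --slurpfile m /tmp/transcript.json '{messages: $m[0]}' > /tmp/compact-body.json

curl -s --max-time 2 -X POST http://localhost:8787/v1/context/compact \
  -H "Content-Type: application/json" \
  -H "x-api-key: local-dev-key" \
  -d @/tmp/compact-body.json || echo "gateway not running"
```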
Install the global launcher and ensure `~/.local/bin` is in your `PATH`:

```shell
npm run install:global
export PATH="$HOME/.local/bin:$PATH"
```

Persist the `PATH` line in `~/.bashrc` or `~/.zshrc`.

Run interactive setup:

```shell
stackbilt-gw init
```

This writes missing keys to `llm-gateway/.env`.

Install the missing CLI, then re-run:

```shell
stackbilt-gw doctor
```

Use another port:

```shell
STACKBILT_GATEWAY_PORT=9000 stackbilt-gw claude
```

Check status/logs, then restart:

```shell
stackbilt-gw status
stackbilt-gw logs
stackbilt-gw restart
```

The gateway merges config in this order (later wins):
- Defaults in `packages/llm-gateway/src/config.ts`
- `gateway.config.json` or `stackbilt.gateway.json` in the working directory
- CLI flags (`--port`)
Example `gateway.config.json` to flip shadow mode off and adjust routing:

```json
{
  "routing": {
    "shadowMode": false,
    "routes": {
      "planning": ["groq", "cerebras"],
      "summary": ["cerebras", "groq"]
    }
  }
}
```

To prefer Cloudflare for low-cost classes:
```json
{
  "routing": {
    "routes": {
      "planning": ["cloudflare", "groq", "cerebras"],
      "code_draft": ["cloudflare", "groq", "cerebras"],
      "summary": ["cloudflare", "cerebras", "groq"]
    }
  }
}
```

| Method | Path | Auth | Description |
|---|---|---|---|
| GET | `/health` | none | Service status, provider availability |
| GET | `/health?live=1` | none | Same + live provider probe |
| GET | `/providers` | key | Provider health snapshot |
| GET | `/providers?live=1` | key | Live provider health |
| GET | `/metrics` | key | Aggregate request metrics |
| GET | `/events/recent` | key | Last 100 request events |
| GET | `/shadow/stats` | key | Shadow routing summary + projected savings |
| POST | `/v1/messages` | key | Anthropic Messages API (Claude Code) |
| POST | `/v1/responses` | key | OpenAI Responses API (Codex) |
| POST | `/v1/chat/completions` | key | OpenAI Chat Completions API |
| POST | `/v1/context/compact` | key | Distill a session transcript to structured facts |
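The `/v1/chat/completions` route lets any OpenAI-compatible client point at the gateway. A sketch of a direct call, assuming the default local key; the model name is an assumption, so substitute whatever your configured providers expose:

```shell
# Minimal OpenAI-style chat call through the gateway; falls back to a note
# if the gateway is not running.
resp=$(curl -s --max-time 5 -X POST http://localhost:8787/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-api-key: local-dev-key" \
  -d '{
    "model": "llama-3.3-70b-versatile",
    "messages": [{"role": "user", "content": "Say hello in five words."}]
  }' || echo "gateway not running")
echo "$resp"
```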