Model-Proxy

OpenAI- and Anthropic-compatible LLM proxy on Bun and TypeScript. Route logical model names to multiple upstream providers with API-key rotation, cooldowns, format conversion, streaming, optional tool-call enforcement, and a built-in admin UI.

Features

  • Logical models: config/models/<name>.json maps a client-facing id to one or more provider routes and fallbacks
  • Multi-provider routing: OpenAI-compatible and Anthropic wire protocols (Groq, Cerebras, Gemini, OpenRouter, Nahcrof, etc.)
  • API key fallback: rotate keys and cool down failed keys per provider
  • Format conversion: OpenAI ↔ Anthropic at the proxy boundary
  • Streaming: SSE for chat completions
  • Context window metadata: GET /v1/models exposes context_window, context_length, and limit.context for harness compaction
  • Audio: OpenAI-style /v1/audio/transcriptions with provider routing
  • Admin UI: Next.js static app at /setup/ (models, providers, env, test bench, bundle import/export)
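
The key fallback feature listed above can be pictured as a small round-robin pool that skips keys while they cool down. The following is a minimal sketch, not the project's implementation: createKeyPool, acquire, and reportFailure are hypothetical names, and the round-robin order is an assumption.

```typescript
// Hypothetical sketch of API-key rotation with per-key cooldowns.
// None of these names come from the Model-Proxy source.
interface KeyPool {
  acquire(now: number): string | undefined;
  reportFailure(key: string, now: number): void;
}

function createKeyPool(keys: string[], cooldownMs: number): KeyPool {
  // Timestamp (ms) until which each failed key must be skipped.
  const cooldownUntil = new Map<string, number>();
  let next = 0; // index where the next round-robin scan starts

  return {
    // Returns the next key that is not cooling down, or undefined if none is usable.
    acquire(now) {
      for (let i = 0; i < keys.length; i++) {
        const idx = (next + i) % keys.length;
        const key = keys[idx];
        if (key !== undefined && (cooldownUntil.get(key) ?? 0) <= now) {
          next = (idx + 1) % keys.length;
          return key;
        }
      }
      return undefined;
    },
    // Marks a key as failed so it is skipped until the cooldown expires.
    reportFailure(key, now) {
      cooldownUntil.set(key, now + cooldownMs);
    },
  };
}
```

A caller would pass Date.now() as the clock and call reportFailure when an upstream request fails with that key. Taking the clock as an argument keeps the sketch deterministic in tests.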

Requirements

  • Bun ≥ 1.1 (runtime and tests)
  • Docker optional (recommended for production)

Quick start (Docker)

cp .env.example .env
# Edit .env: set CLIENT_API_KEY and provider API keys

docker compose build
docker compose up -d

curl -s http://127.0.0.1:9876/health
curl -s -H "Authorization: Bearer $CLIENT_API_KEY" http://127.0.0.1:9876/v1/models

Default listen address: http://127.0.0.1:9876
Admin UI: http://127.0.0.1:9876/setup/
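
The curl smoke test above can also be scripted from TypeScript. This is a sketch under assumptions: the helper names are hypothetical, and the response is assumed to follow the OpenAI list shape (an object with a data array whose entries carry an id), which the README implies but does not spell out.

```typescript
// Hypothetical client helpers; only the URL path and the Bearer scheme come from the README.
type FetchLike = (
  url: string,
  init?: { headers?: Record<string, string> },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Builds the URL and headers for GET /v1/models.
function buildModelsRequest(
  baseUrl: string,
  apiKey: string,
): { url: string; headers: Record<string, string> } {
  const root = baseUrl.replace(/\/+$/, ""); // tolerate a trailing slash
  return { url: `${root}/v1/models`, headers: { Authorization: `Bearer ${apiKey}` } };
}

// Pulls string ids out of an OpenAI-style list body (assumed shape).
function extractModelIds(body: unknown): string[] {
  if (typeof body !== "object" || body === null) return [];
  const data = (body as Record<string, unknown>)["data"];
  if (!Array.isArray(data)) return [];
  const ids: string[] = [];
  for (const entry of data) {
    if (typeof entry === "object" && entry !== null) {
      const id = (entry as Record<string, unknown>)["id"];
      if (typeof id === "string") ids.push(id);
    }
  }
  return ids;
}

// Fetch implementation is injected so the function can be tested without a network.
async function listModelIds(
  baseUrl: string,
  apiKey: string,
  fetchImpl: FetchLike,
): Promise<string[]> {
  const { url, headers } = buildModelsRequest(baseUrl, apiKey);
  const res = await fetchImpl(url, { headers });
  if (!res.ok) throw new Error(`GET /v1/models failed with HTTP ${res.status}`);
  return extractModelIds(await res.json());
}
```

Under Bun, the global fetch can be passed as fetchImpl, with http://127.0.0.1:9876 as baseUrl and the value of CLIENT_API_KEY as apiKey.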

Quick start (local dev)

cp .env.example .env
bun install

# Terminal 1 — API server (hot reload)
bun run dev

# Terminal 2 — admin UI (optional; or rely on Docker-built web-static)
cd web && bun install && bun run dev

If you run only the API process, build the UI once so that /setup/ has static files to serve:

cd web && bun install && bun run build
# Serves from web/out when MODEL_PROXY_WEB_ROOT is unset

Project layout

| Path | Purpose |
| --- | --- |
| src/ | Hono server, routing, providers, CLI entry |
| shared/schemas/ | Zod schemas for config and wire formats |
| web/ | Current Next.js admin UI (exported to web/out, copied as web-static in Docker) |
| config/providers/ | Provider endpoint + auth JSON (often gitignored locally; samples may ship in repo) |
| config/models/ | Per logical model routing JSON (gitignored locally) |
| config/templates/ | Templates for new provider/model files |
| config/audio-models/ | Audio transcription routing |
| tests/ | bun test integration tests |

There is no Python application in this tree. The v1 FastAPI codebase was replaced by this v2 TypeScript implementation.

Configuration

Environment (.env)

| Variable | Description |
| --- | --- |
| CLIENT_API_KEY | Required. Bearer token clients must send |
| HOST / PORT | Bind address (default 127.0.0.1:9876) |
| CORS_ORIGINS | Comma-separated origins or * |
| LOG_LEVEL | One of debug, info, warn, error |
| DEFAULT_CONTEXT_WINDOW | Fallback context size (tokens) when upstream/config omit it |
| UPSTREAM_MODELS_CACHE_TTL_SECONDS | Cache TTL for provider /v1/models catalogs (default 3600) |
| UPSTREAM_MODELS_FETCH_TIMEOUT_MS | Max wait on first upstream catalog fetch (default 2000) |
| KEY_COOLDOWN_SECONDS | API key cooldown after failures |
| ENFORCE_TOOL_CALL_* | Global tool-call enforcement defaults |
| Provider keys | e.g. GROQ_API_KEY, CEREBRAS_API_KEY, ANTHROPIC_API_KEY |

See .env.example for the full list.
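
To make the documented defaults concrete, here is a minimal sketch of how a subset of these variables could be read. The parseEnv and readInt helpers and the ProxyEnv shape are hypothetical; the defaults (127.0.0.1, 9876, 3600, 2000) come from the table above. The validation rules and the empty CORS default are assumptions.

```typescript
// Hypothetical env parsing; not the project's actual loader.
interface ProxyEnv {
  clientApiKey: string;
  host: string;
  port: number;
  corsOrigins: string[];
  upstreamModelsCacheTtlSeconds: number;
  upstreamModelsFetchTimeoutMs: number;
}

// Reads a non-negative integer, falling back when the variable is unset or blank.
function readInt(raw: string | undefined, fallback: number, name: string): number {
  if (raw === undefined || raw.trim() === "") return fallback;
  const n = Number(raw);
  if (!Number.isInteger(n) || n < 0) {
    throw new Error(`${name} must be a non-negative integer, got "${raw}"`);
  }
  return n;
}

function parseEnv(env: Record<string, string | undefined>): ProxyEnv {
  const clientApiKey = env["CLIENT_API_KEY"];
  if (!clientApiKey) throw new Error("CLIENT_API_KEY is required");
  return {
    clientApiKey,
    host: env["HOST"] ?? "127.0.0.1",
    port: readInt(env["PORT"], 9876, "PORT"),
    // Assumption: an unset CORS_ORIGINS means no extra origins.
    corsOrigins: (env["CORS_ORIGINS"] ?? "")
      .split(",")
      .map((origin) => origin.trim())
      .filter((origin) => origin !== ""),
    upstreamModelsCacheTtlSeconds: readInt(
      env["UPSTREAM_MODELS_CACHE_TTL_SECONDS"], 3600, "UPSTREAM_MODELS_CACHE_TTL_SECONDS"),
    upstreamModelsFetchTimeoutMs: readInt(
      env["UPSTREAM_MODELS_FETCH_TIMEOUT_MS"], 2000, "UPSTREAM_MODELS_FETCH_TIMEOUT_MS"),
  };
}
```

Under Bun, process.env can be passed in directly. Accepting a plain record instead of reading globals makes the defaults easy to check in a unit test.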

Logical model example

config/models/turbo.json:

{
  "logical_name": "turbo",
  "timeout_seconds": 20,
  "default_cooldown_seconds": 10,
  "context_window": 131072,
  "model_routings": [
    { "provider": "cerebras", "model": "zai-glm-4.7" }
  ],
  "fallback_model_routings": []
}

An optional context_window, set on the model or on an individual route, supplies the context size when upstream metadata is missing.
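
The shape of this file can be expressed as a TypeScript type with a small runtime check. This is a sketch only: the repository keeps its real Zod schemas under shared/schemas/, and the helper names here are hypothetical. Which fields are required, and the primary-then-fallback ordering, are assumptions drawn from the example above.

```typescript
// Hypothetical types mirroring the turbo.json example; not the project's Zod schemas.
interface Route {
  provider: string;
  model: string;
  context_window?: number;
}

interface LogicalModel {
  logical_name: string;
  timeout_seconds?: number;
  default_cooldown_seconds?: number;
  context_window?: number;
  model_routings: Route[];
  fallback_model_routings: Route[];
}

function isRoute(value: unknown): value is Route {
  if (typeof value !== "object" || value === null) return false;
  const r = value as Record<string, unknown>;
  return (
    typeof r["provider"] === "string" &&
    typeof r["model"] === "string" &&
    (r["context_window"] === undefined || typeof r["context_window"] === "number")
  );
}

// Parses and validates one logical model file; throws on malformed input.
function parseLogicalModel(json: string): LogicalModel {
  const raw: unknown = JSON.parse(json);
  if (typeof raw !== "object" || raw === null) {
    throw new Error("model config must be a JSON object");
  }
  const o = raw as Record<string, unknown>;
  const primary = o["model_routings"];
  const fallback = o["fallback_model_routings"] ?? [];
  if (typeof o["logical_name"] !== "string" || o["logical_name"] === "") {
    throw new Error("logical_name must be a non-empty string");
  }
  if (!Array.isArray(primary) || primary.length === 0 || !primary.every(isRoute)) {
    throw new Error("model_routings must be a non-empty array of routes");
  }
  if (!Array.isArray(fallback) || !fallback.every(isRoute)) {
    throw new Error("fallback_model_routings must be an array of routes");
  }
  return { ...o, fallback_model_routings: fallback } as unknown as LogicalModel;
}

// Assumed attempt order: primary routes first, then fallbacks.
function routesInOrder(model: LogicalModel): Route[] {
  return [...model.model_routings, ...model.fallback_model_routings];
}
```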

API surface

| Method | Path | Auth | Notes |
| --- | --- | --- | --- |
| GET | /health | No | Liveness |
| GET | /health/detailed | No | Models/providers counts |
| GET | /v1/models | Bearer | OpenAI list + context metadata |
| POST | /v1/chat/completions | Bearer | OpenAI chat |
| POST | /v1/chat/completions/stream | Bearer | Forces stream: true |
| POST | /v1/messages | Bearer | Anthropic messages |
| POST | /v1/audio/transcriptions | Bearer | Audio STT |
| GET | /setup/* | Session or Bearer | Admin UI static assets |
| | /v1/admin/* | Session or Bearer | Config CRUD, logs, bundle import |

Chat responses keep the logical model id the client requested.
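
Since the proxy accepts both OpenAI and Anthropic request shapes, a request may need converting before it reaches an upstream that speaks the other protocol. The sketch below shows one direction for text-only messages, plus the model id relabeling described above. It is illustrative, not the project's converter: the function names, the 1024 default for max_tokens, and the upstream model name in the usage note are hypothetical. It relies on two properties of the Anthropic Messages API: system text is a top-level field, and max_tokens is required.

```typescript
// Hypothetical, text-only conversion sketch (no tools, images, or multi-part content).
interface OpenAIChatRequest {
  model: string;
  messages: { role: "system" | "user" | "assistant"; content: string }[];
  max_tokens?: number;
  stream?: boolean;
}

interface AnthropicMessagesRequest {
  model: string;
  system?: string;
  messages: { role: "user" | "assistant"; content: string }[];
  max_tokens: number;
  stream?: boolean;
}

function toAnthropicRequest(
  req: OpenAIChatRequest,
  upstreamModel: string,
  defaultMaxTokens = 1024, // assumption: the real default is not documented here
): AnthropicMessagesRequest {
  const systemParts: string[] = [];
  const messages: AnthropicMessagesRequest["messages"] = [];
  for (const m of req.messages) {
    // Anthropic carries system text outside the messages array.
    if (m.role === "system") systemParts.push(m.content);
    else messages.push({ role: m.role, content: m.content });
  }
  const out: AnthropicMessagesRequest = {
    model: upstreamModel,
    messages,
    max_tokens: req.max_tokens ?? defaultMaxTokens,
  };
  if (systemParts.length > 0) out.system = systemParts.join("\n\n");
  if (req.stream !== undefined) out.stream = req.stream;
  return out;
}

// Restores the logical model id the client asked for on the way back.
function withLogicalModelId<T extends { model: string }>(response: T, logicalId: string): T {
  return { ...response, model: logicalId };
}
```

For example, a client request for the logical model turbo would be sent upstream with whatever provider model the route names, and the response would be passed through withLogicalModelId(response, "turbo") before it is returned.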

Context window resolution (GET /v1/models)

For each logical model, the context window is resolved from its primary route (model_routings[0]) using the first of these sources that yields a value:

  1. Upstream provider GET /v1/models (cached)
  2. provider.models.<id>.context_length in provider JSON
  3. Route or model context_window in config
  4. DEFAULT_CONTEXT_WINDOW env
  5. 128000 system default
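
The resolution order above amounts to a first-match fallback chain. Here is a minimal sketch, assuming each source has already been looked up and is passed in as an optional number. The names are hypothetical. Two details are assumptions not stated above: the route value is checked before the model value within step 3, and non-positive values are treated as missing.

```typescript
// Hypothetical resolver for the five-step order listed above.
interface ContextSources {
  upstream?: number;       // step 1: cached upstream GET /v1/models
  providerConfig?: number; // step 2: provider.models.<id>.context_length
  routeConfig?: number;    // step 3: context_window on the route
  modelConfig?: number;    // step 3: context_window on the logical model
  envDefault?: number;     // step 4: DEFAULT_CONTEXT_WINDOW
}

const SYSTEM_DEFAULT_CONTEXT_WINDOW = 128000; // step 5

function resolveContextWindow(s: ContextSources): { value: number; source: string } {
  const candidates: Array<[string, number | undefined]> = [
    ["upstream", s.upstream],
    ["provider_config", s.providerConfig],
    ["route_config", s.routeConfig],
    ["model_config", s.modelConfig],
    ["env_default", s.envDefault],
  ];
  for (const [source, value] of candidates) {
    // The first usable value wins; later sources are never consulted.
    if (value !== undefined && Number.isFinite(value) && value > 0) {
      return { value, source };
    }
  }
  return { value: SYSTEM_DEFAULT_CONTEXT_WINDOW, source: "system_default" };
}
```

Returning the source label alongside the value is a design choice for this sketch; it makes it easier to see why a model reports a particular window.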

CLI

The process entrypoint is Bun, not a separate Python package:

bun run start
# or
bun run ./src/cli/main.ts --host 0.0.0.0 --port 9876 --log-level info

Docker CMD uses the same entrypoint. Supported flags: --host, --port, --log-level. The optional start positional argument is accepted for compatibility.
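
As an illustration of the flag handling described above, the following sketch parses --host, --port, and --log-level and tolerates a leading start positional. It is not the code in src/cli/main.ts: the function name, the rejection of unknown arguments, and the port range check are assumptions.

```typescript
// Hypothetical argument parser; the real CLI may behave differently.
interface CliOptions {
  host?: string;
  port?: number;
  logLevel?: string;
}

function parseCliArgs(argv: string[]): CliOptions {
  const opts: CliOptions = {};
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    // The optional "start" positional is accepted and ignored.
    if (i === 0 && arg === "start") continue;
    if (arg !== "--host" && arg !== "--port" && arg !== "--log-level") {
      throw new Error(`unknown argument: ${String(arg)}`);
    }
    const value = argv[++i]; // each supported flag takes exactly one value
    if (value === undefined) throw new Error(`missing value for ${arg}`);
    if (arg === "--host") {
      opts.host = value;
    } else if (arg === "--log-level") {
      opts.logLevel = value;
    } else {
      const port = Number(value);
      if (!Number.isInteger(port) || port < 1 || port > 65535) {
        throw new Error(`invalid port: ${value}`);
      }
      opts.port = port;
    }
  }
  return opts;
}
```

Under Bun or Node, process.argv.slice(2) would be the usual input to a function like this.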

Scripts

bun run dev          # API with --hot
bun run start        # API production mode
bun test             # test suite
bun run typecheck    # tsc --noEmit
bun run build:web    # build admin UI → web/out

Docker

docker compose up -d --build
docker compose -f docker-compose.prod.yml up -d --build

Development

bun test
bun run typecheck

Tests live under tests/. Config loaders use a temp directory in tests; production config is read from config/ search paths (cwd, ~/.model-proxy/config, package config/).

License

MIT (see repository license file if present).

About

A model proxy that supports fallbacks across multiple API keys, providers, and models. It also translates between the OpenAI and Anthropic API formats, which lets tools such as Claude Code work through it.
