LLM Router - Open-Source AI Gateway for Local and Cloud LLM Infrastructure

Version: 0.5.1

LLM Router is a service that can be deployed on‑premises or in the cloud. It adds a layer between any application and the LLM provider. In real time it controls traffic, distributes load among providers of a specific LLM, and enables analysis of outgoing requests from a security perspective (masking, anonymization, prohibited content). It is an open‑source solution (Apache 2.0) that can be launched instantly by running a ready‑made image in your own infrastructure.

🌐 Ecosystem Overview

The LLM‑Router project is split across five dedicated repositories:

Repository	Description
llm-router (this repo)	Core gateway — unified REST proxy, Python SDK, and configuration management
llm-router-api	REST proxy that routes requests to any supported LLM backend (OpenAI‑compatible, Ollama, vLLM, LM Studio, Anthropic), with built‑in load‑balancing, health checks, streaming responses and optional Prometheus metrics
llm-router-lib	Python SDK that wraps the API with typed request/response models, automatic retries, token handling and a rich exception hierarchy
llm-router-web	Ready‑to‑use Flask UIs — a Config Manager for model/user settings and an Anonymizer UI that masks sensitive data
llm-router-plugins	Pluggable anonymizers (maskers), guardrails, semantic routing and RAG plugins
llm-router-services	HTTP services that power the plugin ecosystem (NASK‑PIB/Sojka guardrails, PII masker)
llm-router-utils	CLI tools, batch translation, GenAI classification and ready‑made deployment configs (Speakleash models)

✨ Key Features

Feature	Description
Unified REST interface	One endpoint schema works for OpenAI‑compatible, Ollama, vLLM, LM Studio and Anthropic.
Provider‑agnostic streaming	The `stream` flag (default `true`) controls whether the proxy forwards chunked responses as they arrive or returns a single aggregated payload. Streaming responses include proper Cache‑Control, Pragma, Expires and Vary headers.
Built‑in prompt library	Language‑aware system prompts stored under `resources/prompts` can be referenced automatically.
Dynamic model configuration	JSON file (`models-config.json`) defines providers, model name, default options and per‑model overrides.
Request validation	Pydantic models guarantee correct payloads; errors are returned with clear messages.
Structured logging	Configurable log level, filename, and optional JSON formatting.
Health & metadata endpoints	`/ping` (simple 200 OK) and `/tags` (available model tags/metadata).
Embeddings support	Dedicated endpoints for generating text embeddings across all supported providers.
Simple deployment	One‑liner run script, Docker image, or Helm chart for Kubernetes.
Extensible conversation formats	Basic chat, conversation with system prompt, and extended conversation with richer options (temperature, top‑k, custom system prompt).
Multi‑provider model support	Each model can be backed by multiple providers (VLLM, Ollama, OpenAI, Anthropic) defined in `models-config.json`.
Load‑balanced default strategy	`LoadBalancedStrategy` distributes requests evenly across providers using in‑memory usage counters.
Dynamic model handling	`ModelHandler` loads model definitions at runtime and resolves the appropriate provider per request.
Pluggable endpoint architecture	Automatic discovery and registration of all concrete `EndpointI` implementations via `EndpointAutoLoader`.
Prometheus metrics integration	Optional `/metrics` endpoint for latency, error counts, and provider usage statistics.
Docker & Kubernetes ready	Dockerfile (non‑root user) and Helm charts for containerised deployment.

🧩 Plugin System Architecture

LLM Router uses a registry-based pipeline pattern. Each plugin implements a tiny, well‑defined apply method and can be composed in an ordered list to form a pipeline. Pipelines are instantiated by the MaskerPipeline, GuardrailPipeline and UtilsPipeline classes and are driven automatically by the endpoint logic in endpoint_i.py.

Data flow

Request → MaskerPipeline → GuardrailPipeline → UtilsPipeline → Model Provider

Masker Plugins

Plugin ID	Type	Description
`fast_masker`	Local	Regex‑based PII masker with 30+ rule types (emails, IPs, URLs, phone numbers, PESEL, NIP, KRS, REGON, monetary amounts, dates, credit cards, JWTs, passports and more).
`pii_masker`	HTTP (remote)	ML‑based PII masker using a token‑classification model with an in‑memory cache to avoid redundant model calls for identical text inputs.

Guardrail Plugins

Plugin ID	Type	Description
`nask_guard`	HTTP (remote)	Safety check using the HerBERT‑PL‑Guard model (NASK‑PIB).
`sojka_guard`	HTTP (remote)	Safety check using the Bielik‑Guard‑0.1B model from SpeakLeash.

Utility Plugins

Plugin ID	Type	Description
`langchain_rag`	Local	Retrieves relevant document chunks from a FAISS vector store and injects them into the payload for Retrieval‑Augmented Generation.
`simple_semantic_routing`	Local	Two‑stage heuristic model selection: intent classification + complexity analysis. Activated when `payload["model"] == "auto"`.

Configuration

Pipelines are configured via environment variables:

# Comma-separated list of masker plugins to apply
export LLM_ROUTER_MASKING_STRATEGY_PIPELINE="fast_masker,pii_masker"

# Enable guardrails
export LLM_ROUTER_FORCE_GUARDRAIL_REQUEST=1

# Enable masking entirely
export LLM_ROUTER_FORCE_MASKING=1

# Record masking operations in audit log
export LLM_ROUTER_MASKING_WITH_AUDIT=1

🛡️ Monitoring

Component	Description
KeepAliveMonitor	Periodically pings model endpoints to keep them warm (prevents cold‑start latency).
ProviderMonitor	Tracks per‑provider availability using Redis as a shared state store.
ServicesMonitor	Periodically health‑checks the llm-router-services endpoints (guardrails, maskers).

📦 Quick Start

1️⃣ Create & activate a virtual environment

python3 -m venv .venv
source .venv/bin/activate

# Only the core library (llm-router-lib).
pip install .

# Core library + API wrapper (llm-router-api).
pip install .[api]

# Core library + API wrapper + Prometheus metrics.
pip install .[api,metrics]

Note: When Prometheus metrics are enabled, LLM_ROUTER_USE_PROMETHEUS=1 must be set and Redis is required ( used for provider availability state).

Then start the application with the environment variable set:

export LLM_ROUTER_USE_PROMETHEUS=1

When LLM_ROUTER_USE_PROMETHEUS is enabled, the router automatically registers a /metrics endpoint (under the API prefix, e.g. /api/metrics). This endpoint exposes Prometheus‑compatible metrics such as request counts, latencies, and any custom counters defined by the application.

2️⃣ Run the REST API

./run-rest-api.sh
# or
LLM_ROUTER_MINIMUM=1 python3 -m llm_router_api.rest_api

3️⃣ Quick‑start guides for local models

Gemma 3 12B‑IT – README
Bielik 11B‑v2.3‑Instruct – README

4️⃣ Integration boilerplates

Integration examples for popular LLM libraries (LlamaIndex, LangChain, OpenAI, LiteLLM, Haystack) are in the examples/ directory. See examples README for details.

🔐 Auditing

The router can record request‑level events (guard‑rail checks, payload masking, custom logs) in a tamper‑evident, encrypted form. All audit entries are written by the auditor module and stored under logs/auditor/ as GPG‑encrypted files.

For a complete guide — including key generation, encryption workflow, and decryption utilities — see:

➡️ Auditing subsystem documentation

Utility scripts:

scripts/gen_and_export_gpg.sh — generate and export GPG keys
scripts/decrypt_auditor_logs.sh — decrypt encrypted audit logs

📦 Docker

Run the container with the default configuration:

docker run -p 5555:8080 quay.io/radlab/llm-router:rc1

For more advanced usage you can use a custom launch script:

#!/bin/bash

PWD=$(pwd)

docker run \
  -p 5555:8080 \
  -e LLM_ROUTER_TIMEOUT=500 \
  -e LLM_ROUTER_IN_DEBUG=1 \
  -e LLM_ROUTER_MINIMUM=1 \
  -e LLM_ROUTER_EP_PREFIX="/api" \
  -e LLM_ROUTER_SERVER_TYPE=gunicorn \
  -e LLM_ROUTER_SERVER_PORT=8080 \
  -e LLM_ROUTER_SERVER_WORKERS_COUNT=4 \
  -e LLM_ROUTER_DEFAULT_EP_LANGUAGE="pl" \
  -e LLM_ROUTER_LOG_FILENAME="llm-proxy-rest.log" \
  -e LLM_ROUTER_EXTERNAL_TIMEOUT=300 \
  -e LLM_ROUTER_BALANCE_STRATEGY=balanced \
  -e LLM_ROUTER_REDIS_HOST="192.168.100.67" \
  -e LLM_ROUTER_REDIS_PORT=6379 \
  -e LLM_ROUTER_MODELS_CONFIG=/srv/cfg.json \
  -e LLM_ROUTER_PROMPTS_DIR="/srv/prompts" \
  -v "${PWD}/resources/configs/models-config.json":/srv/cfg.json \
  -v "${PWD}/resources/prompts":/srv/prompts \
  quay.io/radlab/llm-router:rc1

Kubernetes (Helm)

Helm charts for Kubernetes deployment are available in the helm_charts/ directory.

🛠️ Configuration (via environment)

A full list of environment variables is available at: API README

Core variables

Variable	Default	Description
`LLM_ROUTER_PROMPTS_DIR`	`resources/prompts`	Directory containing predefined system prompts.
`LLM_ROUTER_MODELS_CONFIG`	`resources/configs/models-config.json`	Path to the models configuration JSON file.
`LLM_ROUTER_DEFAULT_EP_LANGUAGE`	`pl`	Default language for endpoint prompts.
`LLM_ROUTER_TIMEOUT`	`0`	Timeout (seconds) for llm-router API calls.
`LLM_ROUTER_EXTERNAL_TIMEOUT`	`300`	Timeout (seconds) for external model API calls.
`LLM_ROUTER_LOG_FILENAME`	`llm-router.log`	Name of the log file.
`LLM_ROUTER_LOG_LEVEL`	`INFO`	Logging level (e.g., INFO, DEBUG).
`LLM_ROUTER_EP_PREFIX`	`/api`	Prefix for all API endpoints.
`LLM_ROUTER_MINIMUM`	`False`	Run service in proxy‑only mode.
`LLM_ROUTER_IN_DEBUG`	`False`	Run server in debug mode.
`LLM_ROUTER_BALANCE_STRATEGY`	`balanced`	Load‑balancing strategy: `balanced`, `weighted`, `dynamic_weighted`, `first_available`, `first_available_optim`.
`LLM_ROUTER_SERVER_TYPE`	`flask`	Server implementation: `flask`, `gunicorn`, `waitress`.
`LLM_ROUTER_SERVER_PORT`	`8080`	Port on which the server listens.
`LLM_ROUTER_SERVER_HOST`	`localhost`	Host address for the server.
`LLM_ROUTER_SERVER_WORKERS_COUNT`	`2`	Number of workers.
`LLM_ROUTER_SERVER_THREADS_COUNT`	`8`	Number of worker threads.
`LLM_ROUTER_SERVER_WORKER_CLASS`	`None`	Worker class for servers that support it.
`LLM_ROUTER_USE_PROMETHEUS`	`False`	Enable Prometheus metrics (`/metrics` endpoint).

Masking & guardrail variables

Variable	Default	Description
`LLM_ROUTER_FORCE_MASKING`	`False`	Enable force‑masking of every endpoint's payload.
`LLM_ROUTER_MASKING_STRATEGY_PIPELINE`	`["fast_masker"]`	Ordered list of masker plugins (e.g. `fast_masker,pii_masker`).
`LLM_ROUTER_MASKING_WITH_AUDIT`	`False`	Record each masking operation in the audit log.
`LLM_ROUTER_FORCE_GUARDRAIL_REQUEST`	`False`	Force guardrail evaluation on every request.
`LLM_ROUTER_MASKER_PII_HOST`	—	Host URL for the PII masker service.
`LLM_ROUTER_GUARDRAIL_SOJKA_GUARD_HOST`	—	Host URL for the Sojka guardrail service.

Redis variables

Variable	Default	Description
`LLM_ROUTER_REDIS_HOST`	(empty)	Redis host for load‑balancing across multi‑provider models.
`LLM_ROUTER_REDIS_PORT`	`6379`	Redis port.
`LLM_ROUTER_REDIS_PASSWORD`	(not set)	Redis password.
`LLM_ROUTER_REDIS_DB`	`0`	Redis database number.

Redis is now mandatory. The router raises RuntimeError at startup if Redis is unavailable.

⚖️ Load Balancing Strategies

The current list of available strategies, the interface description, and an example extension can be found at: Load‑Balancing Strategies

Strategies: balanced, weighted, dynamic_weighted, first_available, first_available_optim.

🛣️ Endpoints Overview

The list of endpoints — categorized into built‑in, provider‑dependent, and utility endpoints — and a description of the streaming mechanisms can be found at: Endpoints Overview

Highlights

Endpoint	Method	Description
`/ping`	GET	Health‑check
`/tags`	GET	List Ollama model tags
`/models`	GET	List OpenAI‑compatible models
`/api/chat/completions`	POST	OpenAI‑style chat completion
`/api/v1/chat/completions`	POST	vLLM‑like chat completion
`/v1/messages`	POST	Anthropic‑compatible messages endpoint (Claude)
`/v1/responses`	POST	OpenAI‑like responses endpoint
`/api/embeddings`	POST	Standard embeddings
`/api/conversation_with_model`	POST	Built‑in standard chat
`/api/extended_conversation_with_model`	POST	Built‑in chat with extended fields
`/api/generative_answer`	POST	Answer a question using provided context
`/api/translate`	POST	Translate texts
`/api/generate_questions`	POST	Generate questions from texts
`/api/simplify_text`	POST	Simplify input texts

🌐 Web Applications

Config Manager (port 8081)

Full web UI for managing LLM Router model configurations:

Multi‑user with authentication and role‑based access (admin/user)
Projects — group configurations by project
Model configuration — create, edit, import/export JSON configs; manage providers across families (Google, OpenAI, Qwen)
Version control — snapshot history with restore capability
Active model selection — choose which models to activate per config
Drag‑and‑drop provider reordering (HTMX)
Light/dark themes (Alpine.js)
26+ API endpoints under /configs

Run: ./run-configs-manager.sh

Anonymizer (port 8082)

Web UI for text anonymization and interactive chat:

3 anonymization algorithms: fast (regex), pii_masking (ML model), fast+pii (hybrid)
Interactive chat with streaming SSE responses and session persistence
Dynamic model selection from the router
i18n — Polish and English translations (122 keys)
Privacy warnings when anonymization is disabled
Privacy policy & terms pages

Run: ./run-anonymizer.sh

🧰 llm-router-utils

The llm-router-utils repository provides CLI tools and ready‑made deployment configs:

CLI Tools

Tool	Description
`translate-texts`	Batch translate texts in JSON/JSONL datasets via LLM Router
`genai-classifier`	Classify dataset texts using LLM prompts with multi‑threading and XLSX export

Speakleash Deployment Configs

The resources/llm-router-speakleash/ directory contains ready‑made configs for deploying Speakleash models:

speakleash-models.json — configures Bielik-11B-v2.3-Instruct across 8 vLLM providers on 3 hosts
run-bielik-*.sh — vLLM launch scripts for each GPU (cuda:0, cuda:1, cuda:2)
run-rest-api-gunicorn.sh — full LLM Router server with masking, guardrails, Redis balancing and Prometheus metrics
run-sojka-guardrail.sh — guardrail service with Bielik‑Guard model

⚙️ Configuration Details

Config File / Variable	Meaning
`resources/configs/models-config.json`	JSON map of provider → model → default options (e.g., `keep_alive`, `options.num_ctx`).
`LLM_ROUTER_PROMPTS_DIR`	Directory containing prompt templates (`*.prompt`). Sub‑folders are language‑specific (`en/`, `pl/`).
`LLM_ROUTER_DEFAULT_EP_LANGUAGE`	Language code used when a prompt does not explicitly specify one.
`LLM_ROUTER_TIMEOUT`	Upper bound for any request to an upstream LLM (seconds).
`LLM_ROUTER_LOG_FILENAME` / `LLM_ROUTER_LOG_LEVEL`	Logging destinations and verbosity.
`LLM_ROUTER_IN_DEBUG`	When set, enables DEBUG‑level logs and more verbose error payloads.

🔧 Development

Python 3.10+ (project is tested on 3.10.6)
All dependencies are listed in requirements.txt. Install them inside the virtualenv.
To add a new provider, create a class in llm_router_api/core/api_types that implements the BaseProvider interface and register it in llm_router_api/register/__init__.py.

📚 Changelog

See the CHANGELOG for a complete history of changes.

📜 License

See the LICENSE file.

Name		Name	Last commit message	Last commit date
Latest commit History 1,040 Commits
docker		docker
examples		examples
helm_charts		helm_charts
llm_router_api		llm_router_api
llm_router_cli		llm_router_cli
llm_router_lib		llm_router_lib
logs/auditor		logs/auditor
resources		resources
scripts		scripts
.bandit		.bandit
.gitignore		.gitignore
.pylintrc		.pylintrc
.version		.version
CHANGELOG.md		CHANGELOG.md
LICENSE		LICENSE
README.md		README.md
check-dev.sh		check-dev.sh
mypy.ini		mypy.ini
pyproject.toml		pyproject.toml
requirements-dev.txt		requirements-dev.txt
requirements.txt		requirements.txt
rest_api_test.py		rest_api_test.py
run-docker.sh		run-docker.sh
run-rest-api-gunicorn.sh		run-rest-api-gunicorn.sh
run-rest-api.sh		run-rest-api.sh
setup.py		setup.py
tox.ini		tox.ini

Folders and files

Latest commit

History

Repository files navigation

LLM Router - Open-Source AI Gateway for Local and Cloud LLM Infrastructure

🌐 Ecosystem Overview

✨ Key Features

🧩 Plugin System Architecture

Data flow

Masker Plugins

Guardrail Plugins

Utility Plugins

Configuration

🛡️ Monitoring

📦 Quick Start

1️⃣ Create & activate a virtual environment

2️⃣ Run the REST API

3️⃣ Quick‑start guides for local models

4️⃣ Integration boilerplates

🔐 Auditing

📦 Docker

Kubernetes (Helm)

🛠️ Configuration (via environment)

Core variables

Masking & guardrail variables

Redis variables

⚖️ Load Balancing Strategies

🛣️ Endpoints Overview

Highlights

🌐 Web Applications

Config Manager (port 8081)

Anonymizer (port 8082)

🧰 llm-router-utils

CLI Tools

Speakleash Deployment Configs

⚙️ Configuration Details

🔧 Development

📚 Changelog

📜 License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 16

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages