A self-hosted LLM chat UI that runs entirely on your local network. Built with FastAPI, HTMX, and PostgreSQL. Streams responses token-by-token from any model installed in Ollama.
Designed to run on a Raspberry Pi 5 alongside Pi-hole, with no external dependencies after first boot.
- Real-time token streaming via SSE
- Persistent conversation history (PostgreSQL)
- Markdown rendering with syntax highlighting
- Model selection per conversation (any model installed in Ollama)
- Single shared password auth — no accounts needed
- Fully offline after `docker compose up` (all JS vendored locally)
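
As a rough illustration of the streaming feature, the sketch below turns Ollama's newline-delimited JSON chat stream into SSE messages. It is not taken from this project's source: the function names `sse_event` and `ollama_to_sse`, the `token` and `done` event names, and the JSON payload shape are all assumptions made for the example. The input format follows Ollama's `/api/chat` streaming response, where each line carries a `message.content` fragment and a `done` flag.

```python
import json
from typing import Iterable, Iterator


def sse_event(data: str, event: str = "message") -> str:
    """Frame one payload as a Server-Sent Events message."""
    # Every line of the payload needs its own "data:" field;
    # the trailing blank line terminates the event.
    body = "".join(f"data: {line}\n" for line in data.split("\n"))
    return f"event: {event}\n{body}\n"


def ollama_to_sse(ndjson_lines: Iterable[str]) -> Iterator[str]:
    """Convert Ollama chat stream lines into SSE messages (hypothetical helper)."""
    for raw in ndjson_lines:
        raw = raw.strip()
        if not raw:
            continue  # skip keep-alive or empty lines
        chunk = json.loads(raw)
        token = chunk.get("message", {}).get("content", "")
        if token:
            # JSON-encode the token so embedded newlines stay on one data line.
            yield sse_event(json.dumps({"token": token}), event="token")
        if chunk.get("done"):
            yield sse_event("{}", event="done")
            break
```

JSON-encoding each token keeps whitespace and newlines intact on the client, which matters when the model emits code blocks.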
- Docker and Docker Compose
- Ollama installed and running on the host machine
- At least one model pulled in Ollama (e.g. `ollama pull llama3.2:3b`)
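
To confirm the last two requirements before starting the app, a small check along these lines may help. It is a sketch, not part of the project: the helper names `model_names` and `installed_models` are made up for this example, and the base URL assumes Ollama's default port on the same machine. It relies on Ollama's `/api/tags` endpoint, which lists locally installed models.

```python
import json
import urllib.request


def model_names(tags_payload: dict) -> list[str]:
    """Extract model names from an Ollama /api/tags response body."""
    return [m["name"] for m in tags_payload.get("models", []) if "name" in m]


def installed_models(base_url: str = "http://localhost:11434",
                     timeout: float = 5.0) -> list[str]:
    """Query a running Ollama instance for its installed models.

    Raises OSError (for example urllib.error.URLError) if Ollama is unreachable.
    """
    url = f"{base_url.rstrip('/')}/api/tags"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return model_names(json.load(resp))
```

Calling `installed_models()` on the host should return a non-empty list once at least one model has been pulled.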
1. Clone the repo and create your `.env` file:

       cp .env.example .env

   Fill in the required values:

       APP_SECRET=    # generate with: python3 -c "import secrets; print(secrets.token_hex(32))"
       APP_PASSWORD=  # the password used to log in to the UI
       DB_PASSWORD=   # password for the Postgres container
2. Start the app:

       docker compose up -d

   Visit http://localhost:8000.
Docker and UFW have a known networking conflict on Linux that prevents containers from reaching the host. Use the Pi-specific override file instead:

    docker compose -f docker-compose.yml -f docker-compose.pi.yml up -d

Also set `OLLAMA_HOST=http://localhost:11434` in your `.env` on the Pi.
| Variable | Required | Default | Description |
|---|---|---|---|
| `APP_SECRET` | Yes | — | Secret key for signing session cookies (min 32 chars) |
| `APP_PASSWORD` | Yes | — | Shared password to access the UI |
| `DB_PASSWORD` | Yes | — | Postgres container password |
| `OLLAMA_HOST` | No | `http://host.docker.internal:11434` | URL of the Ollama instance |
| `OLLAMA_DEFAULT_MODEL` | No | — | Pre-select a model for new conversations |
| `OLLAMA_TIMEOUT_S` | No | `300` | Max seconds to wait for a response |
| `LOG_LEVEL` | No | `INFO` | Uvicorn log level |
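
For readers who want to see how these variables and defaults fit together, here is a minimal sketch of a settings loader. It is an illustration only and may not match how the app actually reads its configuration: the `Settings` class and `load_settings` function are hypothetical, while the variable names, defaults, and the 32-character minimum come from the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

REQUIRED = ("APP_SECRET", "APP_PASSWORD", "DB_PASSWORD")


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings container; fields mirror the table above."""
    app_secret: str
    app_password: str
    db_password: str
    ollama_host: str
    ollama_default_model: Optional[str]
    ollama_timeout_s: int
    log_level: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Validate required variables and apply the documented defaults."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise ValueError("missing required variables: " + ", ".join(missing))
    if len(env["APP_SECRET"]) < 32:
        raise ValueError("APP_SECRET must be at least 32 characters")
    return Settings(
        app_secret=env["APP_SECRET"],
        app_password=env["APP_PASSWORD"],
        db_password=env["DB_PASSWORD"],
        ollama_host=env.get("OLLAMA_HOST") or "http://host.docker.internal:11434",
        ollama_default_model=env.get("OLLAMA_DEFAULT_MODEL") or None,
        ollama_timeout_s=int(env.get("OLLAMA_TIMEOUT_S") or 300),
        log_level=env.get("LOG_LEVEL") or "INFO",
    )
```

Failing fast on a missing or short `APP_SECRET` surfaces configuration mistakes at startup rather than at the first login attempt.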
The `docker-compose.override.yml` (gitignored) enables live reload and mounts the source directory into the container:

    services:
      app:
        volumes:
          - .:/app
        command: sh -c "alembic upgrade head && uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload"

Create this file locally to enable hot reload during development.
- FastAPI + Jinja2 templates
- HTMX + SSE for streaming (vendored, no CDN)
- SQLAlchemy 2.0 async + asyncpg
- Alembic for migrations
- PostgreSQL 16
- markdown-it-py + Pygments for server-side markdown rendering
- DOMPurify + marked.js for client-side streaming render