Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 5 additions & 5 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -88,7 +88,7 @@ All UI ports below are **internal** (container-network). Operators reach them vi
./compose up -d
```

**CPU-only / minimal services:** bring up a subset after init, e.g. `./compose up -d ollama dashboard open-webui`.
**CPU-only / minimal services:** bring up a subset after init, e.g. `./compose up -d llamacpp dashboard open-webui`.

## Installation

Expand Down Expand Up @@ -158,7 +158,7 @@ Large optional downloads on demand; first run can take a long time. Pull via the

### GPU / compute

Hardware detection writes **`overrides/compute.yml`**. The `compose` wrapper runs detection before commands. **No GPU:** use a minimal service set (`./compose up -d ollama dashboard open-webui`); ComfyUI will be slower.
Hardware detection writes **`overrides/compute.yml`**. The `compose` wrapper runs detection before commands. **No GPU:** use a minimal service set (`./compose up -d llamacpp dashboard open-webui`); ComfyUI will be slower.

### Architecture

Expand All @@ -171,7 +171,7 @@ Tailnet device → Caddy :443 (TLS) → oauth2-proxy (Google SSO + email allowli
├── /comfy/ → ComfyUI
└── /hermes/ → Hermes dashboard
├── Model Gateway → LiteLLM → llama.cpp / Ollama / (vLLM)
├── Model Gateway → LiteLLM → llama.cpp
├── MCP Gateway → shared tools (SearXNG, n8n, ComfyUI, …)
└── Ops Controller → Docker Compose lifecycle (token-auth, no host port)
```
Expand All @@ -180,7 +180,7 @@ Local-first AI; operator-deployed front door. Dashboard does not mount `docker.s

### Data

Bind mounts only. Set **`BASE_PATH`** (and optionally **`DATA_PATH`**). Ollama blobs under **`models/ollama`**. See [docs/data.md](docs/data.md).
Bind mounts only. Set **`BASE_PATH`** (and optionally **`DATA_PATH`**). See [docs/data.md](docs/data.md).

### MCP (Model Context Protocol)

Expand Down Expand Up @@ -231,7 +231,7 @@ Optional: `DOCTOR_DEPS_TIMEOUT_SEC`; `DASHBOARD_AUTH_TOKEN` from `.env` when pro
## Troubleshooting

1. **Services won’t start or images are stale** — Rebuild affected images and recreate, e.g. `docker compose build dashboard model-gateway` (or the `compose` wrapper), then `up -d`. Doctor **WARN** on missing `/api/dependencies` or `/ready` often indicates an old image.
2. **Doctor warns on Ollama (11434) or MCP (8811)** — Expected if those ports are not published; use `overrides/ollama-expose.yml` / `overrides/mcp-expose.yml` or set `DOCTOR_STRICT=1` only when you intend strict probes (see doctor script comments in repo).
2. **Doctor warns on MCP (8811)** — Expected if that port is not published; use `overrides/mcp-expose.yml` or set `DOCTOR_STRICT=1` only when you intend strict probes (see doctor script comments in repo).
3. **No GPU** — Use a minimal service set or CPU-oriented overrides; ComfyUI will be slower.
4. **Exposing to a network** — Enable **Open WebUI** auth (`WEBUI_AUTH=True`), set `DASHBOARD_AUTH_TOKEN`, and harden **n8n** — see [SECURITY.md](SECURITY.md).

Expand Down
2 changes: 1 addition & 1 deletion SECURITY.md
Original file line number Diff line number Diff line change
Expand Up @@ -71,4 +71,4 @@ All runtime data is stored under `BASE_PATH/data/` via bind mounts. Ensure appro
1. **Reset OPS_CONTROLLER_TOKEN:** Generate new token, update `.env`, restart dashboard + ops-controller
2. **Restore data:** Restore `data/` from a local backup
3. **Disable MCP tools:** Clear `data/mcp/servers.txt` or set to a single safe server
4. **Safe mode:** Stop `mcp-gateway` and `hermes-gateway`; use `ollama` + `open-webui` only
4. **Safe mode:** Stop `mcp-gateway` and `hermes-gateway`; use `llamacpp` + `open-webui` only
6 changes: 2 additions & 4 deletions compose
Original file line number Diff line number Diff line change
Expand Up @@ -4,13 +4,11 @@
#
# Examples:
# ./compose up -d # start all services
# ./compose up -d ollama dashboard open-webui # start core only
# ./compose up -d llamacpp dashboard open-webui # start core only
# ./compose down # stop all
# ./compose logs -f ollama # tail logs
# ./compose run --rm model-puller # pull Ollama models
# ./compose logs -f llamacpp # tail logs
#
# Compose overrides (in overrides/):
# ./compose -f docker-compose.yml -f overrides/ollama-expose.yml up -d
# ./compose -f docker-compose.yml -f overrides/vllm.yml --profile vllm up -d
set -e

Expand Down
6 changes: 2 additions & 4 deletions compose.ps1
Original file line number Diff line number Diff line change
Expand Up @@ -3,13 +3,11 @@
#
# Examples:
# .\compose.ps1 up -d # start all services
# .\compose.ps1 up -d ollama dashboard open-webui # start core only
# .\compose.ps1 up -d llamacpp dashboard open-webui # start core only
# .\compose.ps1 down # stop all
# .\compose.ps1 logs -f ollama # tail logs
# .\compose.ps1 run --rm model-puller # pull Ollama models
# .\compose.ps1 logs -f llamacpp # tail logs
#
# Compose overrides (in overrides/):
# .\compose.ps1 -f docker-compose.yml -f overrides/ollama-expose.yml up -d
# .\compose.ps1 -f docker-compose.yml -f overrides/vllm.yml --profile vllm up -d

param([Parameter(ValueFromRemainingArguments)][string[]]$PassThrough)
Expand Down
19 changes: 8 additions & 11 deletions docs/GETTING_STARTED.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,11 +6,11 @@ Quick paths to common workflows for a single homelab operator. The stack assumes

### I want to chat

1. Start: `docker compose up -d caddy oauth2-proxy ollama dashboard open-webui`
1. Start: `docker compose up -d caddy oauth2-proxy llamacpp dashboard open-webui`
2. Pull a model via the dashboard (`https://${CADDY_TAILNET_HOSTNAME}/dash/` → Starter pack, or pick one)
3. Open `https://${CADDY_TAILNET_HOSTNAME}/` — Open WebUI

No GPU required for chat (Ollama runs on CPU, slower but works).
No GPU required for chat (llama.cpp runs on CPU, slower but works).

### I want to generate images (LTX-2)

Expand All @@ -20,7 +20,7 @@ No GPU required for chat (Ollama runs on CPU, slower but works).

### I want workflow automation

1. Start: `docker compose up -d caddy oauth2-proxy ollama n8n`
1. Start: `docker compose up -d caddy oauth2-proxy llamacpp n8n`
2. Open `https://${CADDY_TAILNET_HOSTNAME}/n8n/` — n8n

### Full stack
Expand All @@ -35,7 +35,7 @@ Alternatively: `docker compose up -d` — same services without the full bootstr

Use local files as context in **Open WebUI** via Qdrant + the `rag-ingestion` service.

1. **Pull the embedding model** (once): use the dashboard or `docker compose run --rm model-puller` so **`nomic-embed-text`** (or your `EMBED_MODEL`) is available in Ollama.
1. **Provide the embedding model** (once): place the embedding GGUF (**`nomic-embed-text`**, or your `EMBED_MODEL`) under `models/gguf/` so the `llamacpp-embed` service can serve it.
2. **Start the RAG profile** (adds Qdrant + `rag-ingestion`):
```bash
docker compose --profile rag up -d
Expand All @@ -48,15 +48,12 @@ Env knobs (optional, in `.env`): `EMBED_MODEL`, `RAG_COLLECTION`, `RAG_CHUNK_SIZ

**Optional — [Agentic Design Patterns](https://github.com/Mathews-Tom/Agentic-Design-Patterns) (MIT book text):** clone or copy the `.md` tree into `data/rag-input/` (for example `git clone --depth 1 https://github.com/Mathews-Tom/Agentic-Design-Patterns.git data/rag-input/agentic-design-patterns`), then run the steps above so `rag-ingestion` can index it.

### Direct Ollama (Cursor, CLI on the host machine)
### Host tools (Cursor, CLI on the host machine)

By default Ollama is backend-only (no host port — host MCP clients should go through `127.0.0.1:11435` model-gateway instead). To expose Ollama directly on the host for tools that speak Ollama's native API:
The llama.cpp backend is internal (no host port). Host tools reach the models through the model-gateway's OpenAI-compatible API on `127.0.0.1:11435`:

- Start with the Ollama-expose override:
`docker compose -f docker-compose.yml -f overrides/ollama-expose.yml up -d`
- Use `http://localhost:11434` in Cursor or run `ollama run <model>` locally.

Note: this exposes Ollama on `127.0.0.1` to the host machine only — not to the tailnet. Tailnet peers reach models through the SSO-gated front door (Open WebUI at `/`, or via the dashboard's model surface).
- Point Cursor or any OpenAI-compatible client at `http://localhost:11435/v1`.
- This is bound to `127.0.0.1` on the host machine only — not to the tailnet. Tailnet peers reach models through the SSO-gated front door (Open WebUI at `/`, or via the dashboard's model surface).

### Optional: vLLM (OpenAI-compatible server)

Expand Down
6 changes: 2 additions & 4 deletions docs/configuration.md
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,7 @@ Copy `.env.example` to `.env` and set at least `BASE_PATH`. Everything else has
|---|---|---|
| `DATA_PATH` | `${BASE_PATH}/data` | Override data directory location |
| `DEFAULT_MODEL` | `local-chat` | Canonical model alias used by Open WebUI, Hermes, and LiteLLM |
| `MODELS` | *(see `.env.example`)* | Comma-separated Ollama models to pull on first start |
| `GGUF_MODELS` | *(see `.env.example`)* | Hugging Face repo(s) of GGUF files to pull for llama.cpp (`docker compose --profile models run --rm gguf-puller`) |
| `OPS_CONTROLLER_TOKEN` | *(empty)* | Required for dashboard-driven service lifecycle (`openssl rand -hex 32`) |
| `DASHBOARD_AUTH_TOKEN` | *(empty)* | Optional Bearer auth on dashboard `/api/*` |
| `HF_TOKEN` | *(empty)* | Hugging Face token for gated model downloads |
Expand Down Expand Up @@ -217,7 +217,6 @@ All `data/` and `models/` directories are bind-mounted and persist across contai
| `data/mcp/` | `servers.txt`, `registry.json`, `registry-custom.yaml` |
| `data/dashboard/` | Dashboard throughput / benchmark data |
| `data/comfyui-storage/` | ComfyUI outputs, custom nodes, local configs |
| `models/ollama/` | Ollama model blobs |
| `models/gguf/` | llama.cpp GGUF files |
| `models/comfyui/` | ComfyUI checkpoints, LoRAs, VAEs, encoders |

Expand All @@ -234,7 +233,6 @@ All `data/` and `models/` directories are bind-mounted and persist across contai
| n8n | `5678` | Workflow automation |
| Hermes dashboard | `9119` | Overridable via `HERMES_DASHBOARD_PORT` |
| MCP Gateway | `8811` | Published on host so external clients (Cursor, Claude Desktop) can reach it |
| Ollama | `11434` | **Backend-only by default.** Expose via `overrides/ollama-expose.yml` |
| Qdrant | `6333` | RAG profile only |
| Ops Controller | internal `9000` | Not published on the host |

Expand All @@ -244,7 +242,7 @@ All `data/` and `models/` directories are bind-mounted and persist across contai

```json
{"timestamp":"2026-03-22T10:00:00Z","action":"model_pulled","model":"qwen3:8b","status":"success"}
{"timestamp":"2026-03-22T10:01:00Z","action":"service_started","service":"ollama","status":"success"}
{"timestamp":"2026-03-22T10:01:00Z","action":"service_started","service":"llamacpp","status":"success"}
```

## Minimal `.env`
Expand Down
18 changes: 7 additions & 11 deletions docs/data.md
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,6 @@ Reference for where data lives, how it moves, and what survives a restart / rebu
| `data/mcp/registry.json` | MCP server metadata, `allow_clients`, rate limits | `mcp-gateway`, dashboard |
| `data/mcp/registry-custom.yaml` | Custom catalog fragment (e.g. ComfyUI MCP) | `mcp-gateway` |
| `data/rag-input/` | Drop zone for RAG documents | `rag-ingestion` watch directory |
| `models/ollama/` | Ollama model blobs | `ollama` bind mount |
| `models/gguf/` | llama.cpp GGUF files | `llamacpp` / `llamacpp-embed` bind mount |
| `models/comfyui/` | ComfyUI checkpoints, LoRAs, VAEs, encoders | `comfyui` bind mount |

Expand All @@ -36,7 +35,7 @@ Reference for where data lives, how it moves, and what survives a restart / rebu

```json
{"timestamp":"2026-03-22T10:00:00Z","action":"model_pulled","model":"qwen3:8b","status":"success"}
{"timestamp":"2026-03-22T10:01:00Z","action":"service_started","service":"ollama","status":"success"}
{"timestamp":"2026-03-22T10:01:00Z","action":"service_started","service":"llamacpp","status":"success"}
```

| Field | Type | Description |
Expand Down Expand Up @@ -108,9 +107,7 @@ All directories created this way persist across restarts and rebuilds.

### Model Pull

**Ollama:** `docker compose run --rm model-puller` reads `MODELS` from `.env` and pulls each into `models/ollama/`. Also exposed from the dashboard.

**llama.cpp GGUF:** `docker compose --profile models run --rm gguf-puller` with `GGUF_MODELS=org/repo` fetches GGUF files into `models/gguf/`.
**llama.cpp GGUF:** `docker compose --profile models run --rm gguf-puller` with `GGUF_MODELS=org/repo` fetches GGUF files into `models/gguf/`. Also exposed from the dashboard.

**ComfyUI:** `docker compose run --rm comfyui-model-puller` downloads the pack defined by `COMFYUI_PACKS` (default includes LTX-2 variants) into `models/comfyui/`. First run can be tens of GB.

Expand Down Expand Up @@ -145,7 +142,6 @@ Hermes maintains its own state under `data/hermes/` — session records, Discord
| `data/dashboard/` | Throughput / benchmarks | yes | yes |
| `data/comfyui-storage/` | ComfyUI outputs + custom nodes | yes | yes |
| `data/n8n-data/` | n8n workflows | yes | yes |
| `models/ollama/` | Ollama blobs | yes | yes |
| `models/gguf/` | llama.cpp GGUF files | yes | yes |
| `models/comfyui/` | ComfyUI weights | yes | yes |

Expand All @@ -161,7 +157,7 @@ Hermes maintains its own state under `data/hermes/` — session records, Discord
### What to back up

1. `data/hermes/` — agent state
2. `models/ollama/`, `models/gguf/`, `models/comfyui/` — expensive to re-download
2. `models/gguf/`, `models/comfyui/` — expensive to re-download
3. `data/ops-controller/audit.log*` — audit history
4. `data/qdrant/` — RAG collection
5. `.env` — environment configuration (**do not commit**)
Expand Down Expand Up @@ -210,13 +206,13 @@ docker compose up -d
| `data/ops-controller/audit.log` | Archive rotated files (`audit.log.1` etc.) | Monthly |
| `data/rag-input/` | Remove processed files | As needed |
| `data/comfyui-storage/output/` | Prune old outputs | As needed |
| `models/ollama/` | Remove unused models | Quarterly |
| `models/gguf/` | Remove unused models | Quarterly |

```bash
# Archive current audit log
mv data/ops-controller/audit.log data/ops-controller/audit.log.$(date +%Y%m%d)

# Prune Ollama
docker compose exec ollama ollama list
docker compose exec ollama ollama rm <model-name>
# Prune GGUF models (delete unused GGUF files)
ls models/gguf/
rm models/gguf/<model-file>.gguf
```
5 changes: 2 additions & 3 deletions docs/product requirements docs/appendix-env-vars.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,9 +4,8 @@
|----------|---------|-------------|---------|
| `BASE_PATH` | compose | Project root path | `.` |
| `DATA_PATH` | compose | Data directory | `${BASE_PATH}/data` |
| `OLLAMA_URL` | model-gateway, dashboard | Ollama internal URL | `http://ollama:11434` |
| `LLAMACPP_URL` | model-gateway, dashboard | llama.cpp internal URL | `http://llamacpp:8080` |
| `VLLM_URL` | model-gateway | vLLM internal URL (optional) | *(empty)* |
| `DEFAULT_PROVIDER` | model-gateway | Provider for unprefixed models | `ollama` |
| `MODEL_CACHE_TTL_SEC` | model-gateway | Model list cache TTL seconds | `60` |
| `DASHBOARD_URL` | model-gateway | Dashboard for throughput recording | `http://dashboard:8080` |
| `OPS_CONTROLLER_URL` | dashboard | Ops controller URL | `http://ops-controller:9000` |
Expand All @@ -20,7 +19,7 @@
| `MODEL_GATEWAY_PORT` | model-gateway | Model gateway host port | `11435` |
| `WEBUI_AUTH` | open-webui | Enable Open WebUI auth | `False` (target `True` in M6) |
| `OPENAI_API_BASE` | open-webui, n8n | OpenAI-compat base URL | `http://model-gateway:11435/v1` |
| `MODELS` | model-puller | Models to pull on startup | `deepseek-r1:7b,...` |
| `GGUF_MODELS` | gguf-puller | Hugging Face repo(s) of GGUF files to pull | *(empty)* |
| `COMPUTE_MODE` | compose | CPU/nvidia/amd | auto-detected |
| `QDRANT_PORT` | qdrant | Qdrant host port | `6333` |
| `EMBED_MODEL` | rag-ingestion | Embedding model for RAG | `nomic-embed-text` |
Expand Down
4 changes: 2 additions & 2 deletions docs/product requirements docs/appendix-quality-bar.md
Original file line number Diff line number Diff line change
Expand Up @@ -19,7 +19,7 @@
## Performance Targets

- Model list (cached): `<100ms` after first call
- Model list (cold): `<2s` when Ollama healthy
- Model list (cold): `<2s` when llama.cpp healthy
- RAG embedding: `<5s` per document chunk (depends on model)
- Tool invocation: `<30s` default timeout
- Ops restart: `<60s` for most services
Expand All @@ -42,4 +42,4 @@
3. Disable all tools: `echo "" > data/mcp/servers.txt`
4. Invalidate model cache: `curl -X DELETE http://localhost:11435/v1/cache`
5. Disable unsafe services: `docker compose stop mcp-gateway hermes-gateway comfyui rag-ingestion`
6. Safe mode: `docker compose up -d ollama model-gateway dashboard open-webui qdrant`
6. Safe mode: `docker compose up -d llamacpp model-gateway dashboard open-webui qdrant`
6 changes: 3 additions & 3 deletions docs/product requirements docs/appendix-rollback.md
Original file line number Diff line number Diff line change
@@ -1,11 +1,11 @@
# Appendix: Rollback Procedures

1. **Model gateway:** Point services directly to Ollama (`OLLAMA_BASE_URL=http://ollama:11434`); `docker compose stop model-gateway`. Restart affected services.
1. **Model gateway:** Point services directly to llama.cpp (`OPENAI_API_BASE=http://llamacpp:8080/v1`); `docker compose stop model-gateway`. Restart affected services.
2. **Ops controller:** Remove controller from compose or set no token; ops buttons show "unavailable" in dashboard. No data loss.
3. **MCP registry:** Delete `registry.json`; dashboard falls back to `servers.txt` only. Policy metadata disabled.
4. **cap_drop / read_only:** Remove from compose; `docker compose up -d --force-recreate <service>`.
5. **Reset OPS_CONTROLLER_TOKEN:** `openssl rand -hex 32` → update `.env` → `docker compose up -d dashboard ops-controller`.
6. **MCP tools:** Clear `data/mcp/servers.txt` or set to single safe server → gateway hot-reloads within 10s.
7. **RAG:** `docker compose stop rag-ingestion qdrant`; remove `VECTOR_DB=qdrant` from Open WebUI env → Open WebUI uses built-in vector store. Qdrant data preserved in `data/qdrant/`.
8. **Invalidate model cache:** `curl -X DELETE http://localhost:11435/v1/cache` — forces fresh fetch from Ollama on next `/v1/models` call.
9. **Safe mode:** `docker compose stop mcp-gateway hermes-gateway comfyui rag-ingestion` → Ollama + Open WebUI + dashboard only.
8. **Invalidate model cache:** `curl -X DELETE http://localhost:11435/v1/cache` — forces fresh fetch from llama.cpp on next `/v1/models` call.
9. **Safe mode:** `docker compose stop mcp-gateway hermes-gateway comfyui rag-ingestion` → llama.cpp + Open WebUI + dashboard only.
Loading
Loading