3 changes: 0 additions & 3 deletions compose
@@ -7,9 +7,6 @@
# ./compose up -d llamacpp dashboard open-webui # start core only
# ./compose down # stop all
# ./compose logs -f llamacpp # tail logs
-#
-# Compose overrides (in overrides/):
-# ./compose -f docker-compose.yml -f overrides/vllm.yml --profile vllm up -d
set -e

if [[ $# -eq 0 || "$1" == "--help" || "$1" == "-h" ]]; then
3 changes: 0 additions & 3 deletions compose.ps1
@@ -6,9 +6,6 @@
# .\compose.ps1 up -d llamacpp dashboard open-webui # start core only
# .\compose.ps1 down # stop all
# .\compose.ps1 logs -f llamacpp # tail logs
-#
-# Compose overrides (in overrides/):
-# .\compose.ps1 -f docker-compose.yml -f overrides/vllm.yml --profile vllm up -d

param([Parameter(ValueFromRemainingArguments)][string[]]$PassThrough)

12 changes: 0 additions & 12 deletions docs/GETTING_STARTED.md
@@ -55,18 +55,6 @@ The llama.cpp backend is internal (no host port). Host tools reach the models th
- Point Cursor or any OpenAI-compatible client at `http://localhost:11435/v1`.
- This is bound to `127.0.0.1` on the host machine only — not to the tailnet. Tailnet peers reach models through the SSO-gated front door (Open WebUI at `/`, or via the dashboard's model surface).

-### Optional: vLLM (OpenAI-compatible server)
-
-Use vLLM as an additional model provider (e.g. for Llama, Mistral via Hugging Face):
-
-1. Start with the vLLM profile:
-   `docker compose -f docker-compose.yml -f overrides/vllm.yml --profile vllm up -d`
-2. Set in `.env`: `VLLM_URL=http://vllm:8000`
-3. Restart model-gateway: `docker compose restart model-gateway`
-4. In clients (Open WebUI, Hermes), choose models with prefix `vllm/<model-id>` (e.g. `vllm/meta-llama/Llama-3.2-3B-Instruct`).
-
-See [overrides/vllm.yml](../overrides/vllm.yml) for `VLLM_MODEL` and resource limits.
-
## Tailscale + SSO front door

Single homelab operator with a small Google-account allowlist for friends / family / co-workers — that's the deployment model. UI services don't publish host ports; everything goes through Caddy on the tailnet.
1 change: 0 additions & 1 deletion docs/product requirements docs/appendix-env-vars.md
@@ -5,7 +5,6 @@
| `BASE_PATH` | compose | Project root path | `.` |
| `DATA_PATH` | compose | Data directory | `${BASE_PATH}/data` |
| `LLAMACPP_URL` | model-gateway, dashboard | llama.cpp internal URL | `http://llamacpp:8080` |
-| `VLLM_URL` | model-gateway | vLLM internal URL (optional) | *(empty)* |
| `MODEL_CACHE_TTL_SEC` | model-gateway | Model list cache TTL seconds | `60` |
| `DASHBOARD_URL` | model-gateway | Dashboard for throughput recording | `http://dashboard:8080` |
| `OPS_CONTROLLER_URL` | dashboard | Ops controller URL | `http://ops-controller:9000` |
7 changes: 3 additions & 4 deletions docs/product requirements docs/architecture-and-principles.md
@@ -57,10 +57,9 @@
│ │ │ servers.txt │ │ → ops ctrl API │ │ data/rag- │ │ │
│ │ │ registry.json │ │ registry.json │ │ input/ │ │ │
│ │ └─────────────────┘ └─────────────────┘ └──────────────┘ │ │
-│ │ ┌─────────────────┐ ┌─────────────────┐ │ │
-│ │ │ vLLM (opt) │ │ ComfyUI :8188 │ │ │
-│ │ │ overrides/ │ │ (frontend net) │ │ │
-│ │ │ vllm.yml │ └─────────────────┘ │ │
+│ │ ┌─────────────────┐ │ │
+│ │ │ ComfyUI :8188 │ │ │
+│ │ │ (frontend net) │ │ │
│ │ └─────────────────┘ │ │
│ └──────────────────────────────────────────────────────────────────────────┘ │
└────────────────────────────────────────────────────────────────────────────────┘
23 changes: 0 additions & 23 deletions docs/product requirements docs/component-model-gateway.md
@@ -69,29 +69,6 @@ model-gateway:
- DASHBOARD_URL=http://dashboard:8080
```

-### vLLM Compose Profile (Optional)
-
-```yaml
-# overrides/vllm.yml
-services:
-  vllm:
-    profiles: [vllm]
-    image: vllm/vllm-openai:latest
-    ports:
-      - "8000:8000"
-    environment:
-      - MODEL=${VLLM_MODEL:-meta-llama/Llama-3.2-3B-Instruct}
-    deploy:
-      resources:
-        limits:
-          memory: 16G
-        reservations:
-          devices:
-            - driver: nvidia
-              count: 1
-              capabilities: [gpu]
-```
-
## Non-Goals
- Direct UI rendering. UI components are separate and consume the gateway.
- Persistent storage of model results — the gateway only forwards results.
1 change: 0 additions & 1 deletion docs/product requirements docs/index.md
@@ -41,7 +41,6 @@ A self-hosted AI platform that any developer can run with `./compose up -d`. Cor
| llama.cpp backend-only (no host port) | Live | `docker-compose.yml` |
| SSRF egress block scripts | Live | `scripts/ssrf-egress-block.sh`, `.ps1` |
| Hermes agent (gateway + dashboard) | Live | `docker-compose.yml`, `hermes/` |
-| vLLM optional compose profile | Live | `overrides/vllm.yml` |
| Contract + smoke tests | Live | `tests/` |

## Open Risks
5 changes: 2 additions & 3 deletions docs/product requirements docs/milestones-and-roadmap.md
@@ -8,7 +8,7 @@
| **M1** | Done | Model Gateway: OpenAI-compat, llama.cpp, streaming, embeddings, throughput |
| **M2** | Done | Ops Controller: start/stop/restart/logs/pull/audit; dashboard calls controller; bearer auth |
| **M3** | Done | MCP registry.json + health API; cap_drop/read_only hardening; model list cache; Open WebUI → gateway default |
-| **M4** | Done | Explicit Docker networks (frontend/backend); correlation IDs (X-Request-ID → audit); vLLM compose profile; smoke tests |
+| **M4** | Done | Explicit Docker networks (frontend/backend); correlation IDs (X-Request-ID → audit); smoke tests |
| **M5** | Done | Dashboard MCP health dots (green/yellow/red); SSRF egress scripts; hardware stats; throughput benchmark; default-model management |
| **M5-ext** | Done | RAG pipeline (Qdrant + rag-ingestion); Open WebUI → Qdrant; RAG status endpoint; Responses API + completions compat; cache-bust endpoint |
| **M6** | Partial | **Done:** mcp-gateway backend-only; CI; audit log rotation. **Deferred:** MCP per-client / `X-Client-ID` (upstream). **Skipped:** `WEBUI_AUTH` default → True |
@@ -30,12 +30,11 @@

---

-## M4 — Networks + Correlation + vLLM + Smoke Tests (Done)
+## M4 — Networks + Correlation + Smoke Tests (Done)

**User-visible outcomes:**
- Explicit `ordo-ai-stack-frontend` / `ordo-ai-stack-backend` networks; llama.cpp/ops-controller on backend only
- Request IDs: `X-Request-ID` forwarded dashboard → ops-controller and stored in audit entries
-- vLLM: `overrides/vllm.yml` with profile `vllm`
- Smoke tests: `tests/test_compose_smoke.py`

---
1 change: 0 additions & 1 deletion docs/product requirements docs/risks-and-questions.md
@@ -23,7 +23,6 @@
| 3 | **MCP gateway policy:** Does Docker MCP Gateway support `X-Client-ID` for per-client allowlist? | Open — not yet; deferred to M6 |
| 5 | **llama.cpp host port:** Remove to reduce attack surface? | Resolved — backend-only; no host port |
| 6 | **Audit log rotation** | Resolved — size-based rotation (`AUDIT_LOG_MAX_BYTES`) |
-| 7 | **vLLM timing** | Resolved — `overrides/vllm.yml` with `--profile vllm` |
| 8 | **ComfyUI non-root** | Open — `yanwk/comfyui-boot` runs as root; image limitation |
| 9 | **Smoke test in CI** | Resolved — see `.github/workflows/ci.yml` |
| 10 | **N8N LLM node** | Open — use OpenAI-compat node with `baseURL: http://model-gateway:11435/v1`; needs example workflow doc |
12 changes: 1 addition & 11 deletions tests/test_compose_smoke.py
@@ -1,6 +1,6 @@
"""Compose config and optional runtime smoke tests.

-- Config tests: validate docker-compose.yml (and optional vllm override) parse and merge.
+- Config tests: validate docker-compose.yml parses and merges.
- Runtime smoke: set RUN_COMPOSE_SMOKE=1 to run 'compose up -d' and assert key services healthy
(requires Docker daemon; use in CI or locally).
"""
@@ -15,7 +15,6 @@

REPO_ROOT = Path(__file__).resolve().parent.parent
COMPOSE_FILE = REPO_ROOT / "docker-compose.yml"
-COMPOSE_VLLM = REPO_ROOT / "overrides" / "vllm.yml"

# Services that must be healthy for "smoke" (long-running core stack)
SMOKE_SERVICES = ["llamacpp", "llamacpp-embed", "model-gateway", "dashboard"]
@@ -33,8 +32,6 @@

def _compose_cmd(*args, extra_env=None, timeout=120):
    cmd = ["docker", "compose", "-f", str(COMPOSE_FILE)]
-    if COMPOSE_VLLM.exists():
-        cmd += ["-f", str(COMPOSE_VLLM)]
    cmd += list(args)
    env = {**os.environ, **_COMPOSE_REQUIRED_PLACEHOLDERS, **(extra_env or {})}
    return subprocess.run(
@@ -62,13 +59,6 @@ def test_compose_config_includes_networks():
    assert "ordo-ai-stack-backend" in out or "backend" in out


-@pytest.mark.skipif(not COMPOSE_VLLM.exists(), reason="overrides/vllm.yml not present")
-def test_compose_vllm_override_config_valid():
-    """With vllm override, compose config still valid (vllm profile)."""
-    r = _compose_cmd("config", "--quiet", extra_env={"COMPOSE_PROFILES": "vllm"})
-    assert r.returncode == 0, f"vllm config failed: {r.stderr or r.stdout}"
-
-
def _has_nvidia_gpu() -> bool:
    """Return True iff `nvidia-smi` is available and exits 0."""
    try:
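For reference, the command assembly that remains in `_compose_cmd` once the override branch is gone can be sketched as a pure function. This is a minimal sketch under stated assumptions, not the repository's code: `build_compose_argv` is a hypothetical name introduced here for illustration, and it covers only the argv construction, not the environment handling or the `subprocess.run` call.

```python
from __future__ import annotations

from pathlib import Path


def build_compose_argv(compose_file: Path, *args: str) -> list[str]:
    """Return the argv for a single-file `docker compose` invocation.

    Hypothetical helper (not part of the repository): it mirrors the
    simplified command assembly, where no override file is appended.
    """
    # Only the base compose file is passed with -f.
    argv = ["docker", "compose", "-f", str(compose_file)]
    # The subcommand and its flags (e.g. "config", "--quiet") follow unchanged.
    argv += list(args)
    return argv


if __name__ == "__main__":
    print(build_compose_argv(Path("docker-compose.yml"), "config", "--quiet"))
```

Keeping the argv construction separate from process execution would let a config test assert on the exact command without a Docker daemon, which is one possible way to guard against an override file being picked up again.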