docs/scripts: remove stale Ollama references (llama.cpp is sole backend) #65
Merged
…ackend)

The inference backend was switched from Ollama to llama.cpp (service `llamacpp` fronted by the LiteLLM `model-gateway`), but Ollama references lingered across docs and wrapper scripts. This PR scrubs the runtime/infra-level references so declared config matches reality.

- Delete dead overrides/ollama-expose.yml (added a port to a nonexistent service)
- Wrappers/probes (compose, compose.ps1, doctor.ps1/.sh, smoke_test.ps1, detect_hardware.py docstring): drop ollama service examples + health probes
- Docs (README, GETTING_STARTED, configuration, data, SECURITY, and the PRD set) updated to llama.cpp/LiteLLM; inference chain reduced to llama.cpp

Deferred to a follow-up PR (kept in sync with the code they describe):

- dashboard/app.py + static/index.html Ollama model-management subsystem + tests
- docker-compose.yml OLLAMA_* env shims, README "Ollama models" section, component-dashboard-ui.md

Untouched: CHANGELOG.md (historical).

Note: stale vLLM references also exist and warrant a separate cleanup.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This was referenced Jul 1, 2026
What & why

The inference backend was switched from Ollama to llama.cpp (service `llamacpp`, fronted by the LiteLLM `model-gateway`), but stale Ollama references lingered across docs and wrapper scripts, so declared config drifted from reality. This is PR1 of 2: scrub the runtime/infra-level references. PR2 will decommission the dashboard's dead Ollama model-management subsystem (`dashboard/app.py`, `static/index.html`, tests) and the `docker-compose.yml` `OLLAMA_*` env shims.

Changes

- Deleted: `overrides/ollama-expose.yml`, a dead override (added a host port to a service that no longer exists).
- Wrappers/probes: `compose`, `compose.ps1`, `doctor.ps1`, `doctor.sh`, `smoke_test.ps1`, `detect_hardware.py` (docstring). Drop `ollama` service examples + the `:11434` health probes; remove now-orphaned helper functions.
- Docs: `README.md`, `SECURITY.md`, `docs/GETTING_STARTED.md`, `docs/configuration.md`, `docs/data.md`, and the PRD set. Reflect llama.cpp/LiteLLM as the sole backend; inference chain reduced from "llama.cpp / Ollama / vLLM" to llama.cpp; host-tools reach models via the gateway at `127.0.0.1:11435/v1`.

Verification

- Identifiers referenced in the updated docs (`gguf-puller`, `GGUF_MODELS`, `llamacpp-embed`, `LLAMACPP_URL`, `LLAMACPP_EMBED_URL`, `CLAUDE_CODE_LOCAL_MODEL`, `models/gguf/`, port `11435`) verified to exist in the real compose/config.
- `model-gateway` confirmed to be LiteLLM (`litellm_config.yaml`, no `main.py`). The old doc's `main.py`/provider-prefix description was itself stale and is corrected.
- `detect_hardware.py` still parses.

Deferred / follow-ups

- PR2: `docker-compose.yml` `OLLAMA_*` env vars + README "Ollama models" section + `component-dashboard-ui.md`.
- Stale vLLM references (`overrides/vllm.yml` / removed profile) warrant a separate cleanup.
- `CHANGELOG.md` intentionally untouched (historical record).

🤖 Generated with Claude Code
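For reviewers who want to re-check the scrub locally, a minimal sketch of a stale-reference scan follows. It is not part of this PR: the function name `find_stale_references`, the search pattern (`ollama` or the `:11434` probe port), and the skip lists are assumptions chosen for illustration. `CHANGELOG.md` is skipped because it is intentionally left as a historical record.

```python
import re
from pathlib import Path

# Assumed pattern: the word "ollama" in any case, or the old :11434 probe port.
# The gateway port :11435 does not match.
STALE = re.compile(r"ollama|:11434\b", re.IGNORECASE)
SKIP_NAMES = {"CHANGELOG.md"}  # historical record, intentionally untouched
SKIP_DIRS = {".git"}           # hypothetical default; extend as needed


def find_stale_references(root, pattern=STALE,
                          skip_names=SKIP_NAMES, skip_dirs=SKIP_DIRS):
    """Return (relative_path, line_number, stripped_line) for each matching line."""
    root = Path(root)
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.name in skip_names:
            continue
        rel = path.relative_to(root)
        if any(part in skip_dirs for part in rel.parts):
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for number, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.append((rel.as_posix(), number, line.strip()))
    return hits
```

Calling `find_stale_references(".")` from the repository root returns the remaining matches. Until PR2 lands, hits in the deferred files (the dashboard subsystem and the `OLLAMA_*` env shims) are expected.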