A Docker-based collection of services for running local AI workflows.
I run this at home on a machine that has an AMD Radeon 7900XTX with 24GB of VRAM.
You'll be able to use locally hosted models to chat with Open WebUI, work with opencode, and run other agentic workflows with hermes.
In addition, when web searches are used by the agents, they are configured to use a locally hosted searxng instance.
| Service | Port | Description |
|---|---|---|
| llama.cpp server | 8080 | Local LLM inference backend |
| Open WebUI | 3000 | Chat interface |
| SearXNG | 9000 | Private, metasearch engine for web search tool calls |
docker compose up -dServices will be available at:
- LLM API:
http://localhost:8080 - Web UI:
http://localhost:3000 - Search:
http://localhost:9000
Models are configured via models.ini and autoloaded by the llama.cpp server. The current setup includes:
| Model | Alias | Context | Quantization |
|---|---|---|---|
| Qwen3.6-27B | qwen3.6-27b |
151k | Q4_K_XL |
| Qwen3.6-27B-MTP | qwen3.6-27b-mtp |
47k | Q4_K_XL |
| Qwen3.6-35B-A3B (MoE) | qwen3.6-35b-a3b |
139k | Q4_K_XL |
| Qwen3.6-35B-A3B-MTP (MoE) | qwen3.6-35b-a3b-mtp |
103k | Q4_K_XL |
| Gemma 4-26B-A4B (MoE) | gemma-4-26b-a4b |
256k | Q4_K_XL |
| Gemma 4-31B | gemma-4-31b |
67k | Q4_K_XL |
The context sizes were determined by running llama-server with --fit on --fit-target 1024 on a headless Ubuntu 26.04 system.
Models will be downloaded into models/ by llama-server using the Hugging Face API. After the initial download, they will persist in the 'models/' directory. Only one model is loaded at a time by llama-server (--models-max 1), but they are "hot swappable" and do not require a server restart to change to a different one.
SearXNG settings are stored in data/searxng-core-config/settings.yml. After the first docker compose up, edit this file directly to customize engines, search behavior, and other options. Then, restart the container.
use_default_settings:
engines:
keep_only:
- google
- duckduckgo
- bing
- brave
general:
enable_metrics: true
server:
limiter: true
secret_key: search123
bind_address: 0.0.0.0
port: 8080
search:
safe_search: 0
formats:
- html
- jsonTo set up Open WebUI, navigate to http://localhost:3000 and set up an admin username/password. Then follow the below instructions.
- In Open WebUI, go to Admin Panel > Settings > Connections > OpenAI API
- Enable the OpenAI API integration
- Set the API Base URL to:
http://llama-server:8080/v1with no authentication. - Check the box to enable Direct Connections
- Check the box to enable Cache Base Model List
- Click Save
- Navigate to Models on the left navigation. (you should now see 5 models)
- Enable or disable your models as desired. For each model you want to enable, I'd recommend disabling thinking for a better chat experience:
- Click "Edit" icon next to the model
- Next to "Advanced Params" click "Show"
- Add a custom parameter with name
chat_template_kwargsand value{"enable_thinking":false}
- In Open WebUI, go to Admin Panel > Settings > Web Search
- Set Search Engine to
SearXNG - Set SearXNG Query URL to:
http://searxng-core:8080/search?q=<query> - Configure Result Count to 10 and Concurrent Requests to 10
- Check the box to enable Bypass Embedding and Retrieval, Bypass Web Loader and Trust Proxy Environment
- Click Save
This project includes an example opencode.json configuration that connects opencode to the local llama.cpp server. Models defined in models.ini are available as providers with their respective context limits.
Add the following config to ~/.hermes/config.yaml:
custom_providers:
- name: local
base_url: http://localhost:8080/v1
api_key: none
format: openai
Then type hermes model and scroll down until you see the local provider. You should see the 5 models running on llama.cpp show up here. Select one of them.
-
Add
SEARXNG_URL=http://localhost:9000to the bottom of~/.hermes/.env -
Modify the 'web' section of
~/.hermes/config.yaml:web: backend: searxng search_backend: searxng -
Run
hermes tools enable web
You should now see web search tool calls utilize your local searxng instance.