myai: A personal collection of services for running local AI workflows.

A Docker-based collection of services for running local AI workflows.

I run this at home on a machine that has an AMD Radeon 7900XTX with 24GB of VRAM.

You'll be able to use locally hosted models to chat with Open WebUI, work with opencode, and run other agentic workflows with hermes.

In addition, when web searches are used by the agents, they are configured to use a locally hosted searxng instance.

Services

Service	Port	Description
llama.cpp server	8080	Local LLM inference backend
Open WebUI	3000	Chat interface
SearXNG	9000	Private, metasearch engine for web search tool calls

Getting Started

docker compose up -d

Services will be available at:

LLM API: http://localhost:8080
Web UI: http://localhost:3000
Search: http://localhost:9000

Models

Models are configured via models.ini and autoloaded by the llama.cpp server. The current setup includes:

Model	Alias	Context	Quantization
Qwen3.6-27B	`qwen3.6-27b`	151k	Q4_K_XL
Qwen3.6-27B-MTP	`qwen3.6-27b-mtp`	47k	Q4_K_XL
Qwen3.6-35B-A3B (MoE)	`qwen3.6-35b-a3b`	139k	Q4_K_XL
Qwen3.6-35B-A3B-MTP (MoE)	`qwen3.6-35b-a3b-mtp`	103k	Q4_K_XL
Gemma 4-26B-A4B (MoE)	`gemma-4-26b-a4b`	256k	Q4_K_XL
Gemma 4-31B	`gemma-4-31b`	67k	Q4_K_XL

The context sizes were determined by running llama-server with --fit on --fit-target 1024 on a headless Ubuntu 26.04 system.

Models will be downloaded into models/ by llama-server using the Hugging Face API. After the initial download, they will persist in the 'models/' directory. Only one model is loaded at a time by llama-server (--models-max 1), but they are "hot swappable" and do not require a server restart to change to a different one.

Integrations

Modifying SearXNG settings

SearXNG settings are stored in data/searxng-core-config/settings.yml. After the first docker compose up, edit this file directly to customize engines, search behavior, and other options. Then, restart the container.

Recommended settings.yml

use_default_settings:
  engines:
    keep_only:
      - google
      - duckduckgo
      - bing
      - brave

general:
  enable_metrics: true

server:
  limiter: true
  secret_key: search123
  bind_address: 0.0.0.0
  port: 8080

search:
  safe_search: 0
  formats:
    - html
    - json

Open WebUI

To set up Open WebUI, navigate to http://localhost:3000 and set up an admin username/password. Then follow the below instructions.

llama.cpp integration

In Open WebUI, go to Admin Panel > Settings > Connections > OpenAI API
Enable the OpenAI API integration
Set the API Base URL to: http://llama-server:8080/v1 with no authentication.
Check the box to enable Direct Connections
Check the box to enable Cache Base Model List
Click Save
Navigate to Models on the left navigation. (you should now see 5 models)
Enable or disable your models as desired. For each model you want to enable, I'd recommend disabling thinking for a better chat experience:
- Click "Edit" icon next to the model
- Next to "Advanced Params" click "Show"
- Add a custom parameter with name chat_template_kwargs and value {"enable_thinking":false}

SearXNG integration

In Open WebUI, go to Admin Panel > Settings > Web Search
Set Search Engine to SearXNG
Set SearXNG Query URL to: http://searxng-core:8080/search?q=<query>
Configure Result Count to 10 and Concurrent Requests to 10
Check the box to enable Bypass Embedding and Retrieval, Bypass Web Loader and Trust Proxy Environment
Click Save

Use with Opencode

This project includes an example opencode.json configuration that connects opencode to the local llama.cpp server. Models defined in models.ini are available as providers with their respective context limits.

Use with Hermes

Add models

Add the following config to ~/.hermes/config.yaml:

custom_providers:
  - name: local
    base_url: http://localhost:8080/v1
    api_key: none
    format: openai

Then type hermes model and scroll down until you see the local provider. You should see the 5 models running on llama.cpp show up here. Select one of them.

Use searxng for web search tool calls

Add SEARXNG_URL=http://localhost:9000 to the bottom of ~/.hermes/.env

Modify the 'web' section of ~/.hermes/config.yaml:

web:  
  backend: searxng
  search_backend: searxng

Run hermes tools enable web

You should now see web search tool calls utilize your local searxng instance.

Name		Name	Last commit message	Last commit date
Latest commit History 12 Commits
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
docker-compose.yml		docker-compose.yml
models.ini		models.ini
opencode.json		opencode.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

myai: A personal collection of services for running local AI workflows.

Services

Getting Started

Models

Integrations

Modifying SearXNG settings

Recommended settings.yml

Open WebUI

llama.cpp integration

SearXNG integration

Use with Opencode

Use with Hermes

Add models

Use searxng for web search tool calls

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

myai: A personal collection of services for running local AI workflows.

Services

Getting Started

Models

Integrations

Modifying SearXNG settings

Recommended settings.yml

Open WebUI

llama.cpp integration

SearXNG integration

Use with Opencode

Use with Hermes

Add models

Use searxng for web search tool calls

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Packages