macOS (Apple Silicon): launcher never starts Ollama, no model installed, runaway context — with fixes

Thanks for this project! While getting it running on **macOS (Apple Silicon, M1 Max, Ollama 0.24)** I hit a few Mac-specific issues. Reporting them together with the fixes that worked for me — happy to open a PR for the launcher + Modelfile changes if useful.

## 1. `start-mac.command` never starts the engine on current Ollama (blocker)

The launcher serves via the **GUI** binary:

```bash
"$MAC_OLLAMA_DIR/Ollama.app/Contents/MacOS/Ollama" serve
```

On current Ollama releases this prints **`serve command not supported, use ollama`** and exits, so no server runs and AnythingLLM has nothing to connect to. The CLI that actually serves is `Ollama.app/Contents/Resources/ollama`.

**Fix** (prefer the CLI binary, keep old layouts as fallback):

```bash
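# Prefer the CLI binary bundled inside the app; it is the one that implements `serve`.
if [ -f "$MAC_OLLAMA_DIR/Ollama.app/Contents/Resources/ollama" ]; then
    "$MAC_OLLAMA_DIR/Ollama.app/Contents/Resources/ollama" serve > /dev/null 2>&1 &
# Fallback: a bare `ollama` binary directly in the install directory.
elif [ -f "$MAC_OLLAMA_DIR/ollama" ]; then
    "$MAC_OLLAMA_DIR/ollama" serve > /dev/null 2>&1 &
# Last resort: the GUI binary, kept for older layouts.
elif [ -f "$MAC_OLLAMA_DIR/Ollama.app/Contents/MacOS/Ollama" ]; then
    "$MAC_OLLAMA_DIR/Ollama.app/Contents/MacOS/Ollama" serve > /dev/null 2>&1 &
fi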
```

## 2. No model gets installed on macOS

The interactive model menu only exists for Windows (`install.bat` / `install-core.ps1`) and Linux (`linux/install-core.sh`). On macOS, `start-mac.command` downloads Ollama + AnythingLLM but **never pulls a model** and never creates `models/installed-models.txt`, so the model name defaults to `nemomix-local`, which does not exist. Result: AnythingLLM has no usable model.

**Suggestion:** add a macOS model step mirroring Linux — either `ollama pull` from the registry, or download a GGUF + `ollama create <name> -f Modelfile`, then write `models/installed-models.txt` (`local_name|nice_name|label`).
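A minimal sketch of what such a step could look like for the GGUF route. The function names (`record_model`, `install_gguf_model`) and their argument layout are hypothetical and not part of the project; only `ollama create <name> -f Modelfile` and the `local_name|nice_name|label` format come from the description above. Whether `installed-models.txt` should be appended to or overwritten is also an assumption; this version appends.

```bash
# Hypothetical helpers for a macOS model step; names are illustrative only.

# Append one entry in the local_name|nice_name|label format.
record_model() {
    local models_dir="$1" local_name="$2" nice_name="$3" label="$4"
    mkdir -p "$models_dir" || return 1
    printf '%s|%s|%s\n' "$local_name" "$nice_name" "$label" \
        >> "$models_dir/installed-models.txt"
}

# Import a GGUF through a Modelfile, then record it only if the import succeeded.
# $1 = path to the ollama CLI, $2 = models directory containing the Modelfile.
install_gguf_model() {
    local ollama_bin="$1" models_dir="$2" local_name="$3" nice_name="$4" label="$5"
    "$ollama_bin" create "$local_name" -f "$models_dir/Modelfile" || return 1
    record_model "$models_dir" "$local_name" "$nice_name" "$label"
}
```

It could be called as `install_gguf_model "$MAC_OLLAMA_DIR/Ollama.app/Contents/Resources/ollama" models nemomix-local "NemoMix" "default"`, where the nice name and label are placeholder values. The server has to be running already, since `ollama create` talks to it.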

## 3. Runaway context → CPU offload → ~1.4 tok/s (performance)

Importing a GGUF with a bare `FROM ./model.gguf` makes Ollama adopt the model's *declared* context (NemoMix advertises ~1,024,000). Ollama then allocates a huge KV cache (observed: **256K context, ~40 GB KV cache**), spills layers to CPU (`21%/79% CPU/GPU`), and crawls at **~1.4 tok/s** on an M1 Max.
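A rough back-of-the-envelope estimate shows why the context size matters so much. The sketch below assumes Mistral-Nemo-like dimensions (40 layers, 8 KV heads, head dimension 128) and an unquantized f16 cache at 2 bytes per element. These figures are assumptions, not values read from the GGUF, and `kv_cache_bytes` is a hypothetical helper.

```bash
# Rough KV-cache size in bytes:
#   2 (K and V) * layers * kv_heads * head_dim * bytes_per_elem * ctx
# Defaults are ASSUMED Mistral-Nemo-like dimensions with an f16 cache.
kv_cache_bytes() {
    local ctx="$1" layers="${2:-40}" kv_heads="${3:-8}" head_dim="${4:-128}" bytes_per_elem="${5:-2}"
    echo $(( 2 * layers * kv_heads * head_dim * bytes_per_elem * ctx ))
}

kv_cache_bytes 262144   # 256K-token context
kv_cache_bytes 8192     # 8K-token context
```

Under these assumptions a 256K context needs 42,949,672,960 bytes (40 GiB), which lines up with the ~40 GB observed, while 8192 tokens needs 1,342,177,280 bytes (1.25 GiB).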

**Fix:** pin a sane context in the Modelfile, e.g. `PARAMETER num_ctx 8192`. Same machine afterwards: **100% GPU, ~31 tok/s** (≈22× faster).

**Optional speed/memory env in the launcher:** Flash Attention plus `q8_0` KV-cache quantization roughly halves KV-cache memory relative to the default `f16` cache, which leaves room to double the context:

```bash
export OLLAMA_FLASH_ATTENTION=1
export OLLAMA_KV_CACHE_TYPE=q8_0
export OLLAMA_KEEP_ALIVE=30m   # keep the model warm, avoid cold-reload lag
```

(Measured: ctx 8192 → 16384, KV cache ~2.7 GB → ~1.36 GB, still 100% GPU, ~31 tok/s.)
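To confirm the GPU placement after changing these settings, the launcher could check the processor split that `ollama ps` reports. This is a sketch only: `fully_on_gpu` is a hypothetical helper, and it assumes the output keeps the `100% GPU` / `21%/79% CPU/GPU` wording quoted in this report.

```bash
# Reads `ollama ps` output on stdin and succeeds only if the line for the
# given model reports "100% GPU". ASSUMES the wording quoted in this issue.
fully_on_gpu() {
    local model="$1"
    grep -F -- "$model" | grep -q '100% GPU'
}
```

For example, `"$MAC_OLLAMA_DIR/Ollama.app/Contents/Resources/ollama" ps | fully_on_gpu nemomix-local || echo "model is partly on CPU"` would flag a spill while a model is loaded.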

## 4. GGUF imports need a chat template (quality)

A bare `FROM` import leaves the Ollama template at the default `{{ .Prompt }}`, so chat output is incoherent (the model just continues raw text). For NemoMix (Mistral-Nemo base) a Mistral `[INST] … [/INST]` `TEMPLATE` plus `</s>` / `[INST]` / `[/INST]` stop params fixes it.
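A sketch of how the launcher or installer could generate such a Modelfile. `write_modelfile` is a hypothetical helper, and the `TEMPLATE` line is the commonly used Mistral instruct layout, given here as an assumption; it is worth checking against the model card before shipping. The default of 8192 is simply the value that worked in the measurements above.

```bash
# Write a Modelfile for a GGUF import (hypothetical helper; values are illustrative).
# $1 = GGUF path for the FROM line, $2 = output Modelfile, $3 = context size (default 8192).
write_modelfile() {
    local gguf_path="$1" out="$2" num_ctx="${3:-8192}"
    {
        printf 'FROM %s\n' "$gguf_path"
        printf 'PARAMETER num_ctx %s\n' "$num_ctx"
        # Quoted heredoc: the Go-template braces must reach the file unexpanded.
        cat <<'EOF'
TEMPLATE """[INST] {{ if .System }}{{ .System }} {{ end }}{{ .Prompt }} [/INST]"""
PARAMETER stop "</s>"
PARAMETER stop "[INST]"
PARAMETER stop "[/INST]"
EOF
    } > "$out"
}
```

Usage would be along the lines of `write_modelfile ./model.gguf models/Modelfile 8192`, followed by `ollama create <name> -f models/Modelfile`.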

---

Environment: macOS, Apple Silicon (M1 Max, 64 GB), Ollama 0.24 (bundled `ollama-darwin.zip`).

