Merged
67 changes: 36 additions & 31 deletions .env.example
@@ -4,55 +4,60 @@ TELEGRAM_BOT_TOKEN=REPLACE_ME
ALLOWLIST_BOOTSTRAP=123456789

# ── Engine routing ──────────────────────────────────────────────────────────
-# Default engine for messages with no `@` or `!` prefix. PR-B inversion: the
-# default is `ollama` (free, local). Set to `primary` (Sonnet) or `secondary`
-# (Opus) for Claude-only deploys without an Ollama daemon.
-# ollama → no-prefix routes to local Ollama (recommended; requires daemon)
+# Default engine for messages with no `@` or `!` prefix.
+# local → no-prefix routes to the local-engine backend (recommended; free)
+# primary → no-prefix routes to Anthropic Sonnet (Claude-only deploys)
+# secondary → no-prefix routes to Anthropic Opus
-SOLRAC_DEFAULT_ENGINE=ollama
+SOLRAC_DEFAULT_ENGINE=local
SOLRAC_PRIMARY_MODEL=claude-sonnet-4-6 # `@` prefix
SOLRAC_SECONDARY_MODEL=claude-opus-4-7 # `!` prefix

-# ── Ollama ──────────────────────────────────────────────────────────────────
-# Required when SOLRAC_DEFAULT_ENGINE=ollama. Boot fails loud otherwise.
-# `gpt-oss:20b` is the current default. Alternatives: `gemma4:e4b`
-# (native function-calling, ~9.6GB, 128K context), `qwen2.5`, `llama3.2`.
-OLLAMA_ENABLED=true
-OLLAMA_URL=http://localhost:11434
-OLLAMA_MODEL=gpt-oss:20b
+# ── Local engine (Ollama / LMStudio) ────────────────────────────────────────
+# Required when SOLRAC_DEFAULT_ENGINE=local. Boot fails loud otherwise.
+#
+# LOCAL_BACKEND picks the wire protocol:
+# ollama → POST /api/chat with NDJSON streaming, probe /api/tags
+# lmstudio → POST /v1/chat/completions with SSE streaming, probe /v1/models
+#
+# LOCAL_URL default is backend-aware:
+# LOCAL_BACKEND=ollama → http://localhost:11434
+# LOCAL_BACKEND=lmstudio → http://localhost:1234
+# Explicit LOCAL_URL always wins.
+#
+# LOCAL_MODEL is the model id the backend exposes. Examples:
+# Ollama: `gemma4:e4b` (native tool-calling, ~9.6GB, 128K ctx), `qwen2.5`, `llama3.2`
+# LMStudio: `qwen2.5-7b`, `llama-3.2-3b-instruct` (whatever's loaded via the UI/`lms load`)
+LOCAL_ENABLED=true
+LOCAL_BACKEND=ollama
+# LOCAL_URL=http://localhost:11434
+LOCAL_MODEL=gemma4:e4b
# Total turn timeout. Default 60s when tools are off; bumps to 120s when
-# OLLAMA_TOOLS_ENABLED=true (one mid-loop confirm prompt can consume 60s on
-# its own, leaving zero budget for model rounds otherwise). Explicit override
-# here always wins.
-OLLAMA_TIMEOUT_MS=60000
-OLLAMA_HISTORY_LIMIT=6
-# Ollama tool-calling. When true, the local model can call the same
+# LOCAL_TOOLS_ENABLED=true (one mid-loop confirm prompt can consume 60s on
+# its own). Explicit override here always wins.
+LOCAL_TIMEOUT_MS=60000
+LOCAL_HISTORY_LIMIT=6
+# Local tool-calling. When true, the local model can call the same
# `mcp__solrac__*` integration tools the Claude tiers see. Requires
-# SOLRAC_INTEGRATIONS_ENABLED=true and SOLRAC_DEFAULT_ENGINE=ollama
-# (boot validation: tools-on with Claude as default is unreachable since
-# PR-B removed the `>` prefix). Recommended `true` for the default deploy.
-OLLAMA_TOOLS_ENABLED=true
+# SOLRAC_INTEGRATIONS_ENABLED=true and SOLRAC_DEFAULT_ENGINE=local.
+LOCAL_TOOLS_ENABLED=true
# Hard ceiling on tool-loop rounds per turn. Loop detector fires earlier on
# duplicate calls; this is the runaway-loop backstop.
-OLLAMA_MAX_TOOL_ITERATIONS=8
+LOCAL_MAX_TOOL_ITERATIONS=8

-# ── Integrations (precondition for OLLAMA_TOOLS_ENABLED=true) ───────────────
+# ── Integrations (precondition for LOCAL_TOOLS_ENABLED=true) ───────────────
# Operator-authored TS modules + blessed built-ins. When true, both the
# blessed integrations bundled with solrac (`src/integrations-builtin/`) and
-# any operator integrations under SOLRAC_INTEGRATIONS_DIR are loaded. Effective
-# for Claude tiers (`@`, `!`) and for Ollama when OLLAMA_TOOLS_ENABLED=true.
-# Recommended `true` to pair with the default Ollama tools-on deploy.
+# any operator integrations under SOLRAC_INTEGRATIONS_DIR are loaded.
SOLRAC_INTEGRATIONS_ENABLED=true
SOLRAC_INTEGRATIONS_DIR=./integrations

# ── Claude-only deploy alternative ──────────────────────────────────────────
-# Uncomment this block (and comment out the Ollama section above) for hosts
-# that can't run Ollama. No-prefix messages then route to Anthropic Sonnet.
+# Uncomment this block (and comment out the local-engine section above) for
+# hosts that can't run a local model. No-prefix messages then route to Sonnet.
# `@`/`!` prefixes still work as before.
# SOLRAC_DEFAULT_ENGINE=primary
-# OLLAMA_ENABLED=false
-# OLLAMA_TOOLS_ENABLED=false
+# LOCAL_ENABLED=false
+# LOCAL_TOOLS_ENABLED=false
# SOLRAC_INTEGRATIONS_ENABLED=true # still useful for Claude tiers
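The backend-aware `LOCAL_URL` fallback documented above is small enough to sketch. A minimal illustration; `defaultLocalUrl` and its signature are assumptions, not names from the codebase:

```typescript
type LocalBackend = "ollama" | "lmstudio";

// Illustrative helper: an explicit LOCAL_URL always wins; otherwise fall
// back to the backend's conventional port (Ollama 11434, LMStudio 1234).
function defaultLocalUrl(backend: LocalBackend, explicitUrl?: string): string {
  if (explicitUrl) return explicitUrl;
  return backend === "ollama"
    ? "http://localhost:11434"
    : "http://localhost:1234";
}
```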

# ── Operational ─────────────────────────────────────────────────────────────
23 changes: 23 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,28 @@
# Changelog

## Unreleased — local LLM backend abstraction: Ollama + LMStudio (BREAKING)

Replaces the Ollama-specific path with a generic `local` engine that supports multiple backends behind a unified driver interface (`src/local-driver.ts`). Hard cutover — every `OLLAMA_*` env var, `engine: ollama` / `tier: ollama` frontmatter value, and `/clear ollama` / `>` slash alias is rejected with a rename hint. The audit-row tag becomes three-segment `local:<backend>:<modelId>` and matches the `claude:<tier>:<modelId>` shape so cross-engine queries are symmetric. LMStudio joins Ollama as a first-class backend with its own SSE wire format, `parallel_tool_calls: false` Gemma-4 workaround, and tool-call argument-delta accumulation.

- **Env vars.** All `OLLAMA_*` → `LOCAL_*`. New `LOCAL_BACKEND` (required when `LOCAL_ENABLED=true`): `ollama` or `lmstudio`. `LOCAL_URL` default is backend-aware (Ollama → `:11434`, LMStudio → `:1234`). Boot fails loud on any legacy `OLLAMA_*` env var with the rename mapping, and on `SOLRAC_DEFAULT_ENGINE=ollama` with a hint pointing at `local` + `LOCAL_BACKEND=ollama`.
- **Audit `model` column format.** `ollama:<modelId>` → `local:<backend>:<modelId>`. Migration runs idempotent retag at boot (`UPDATE audit SET model = 'local:ollama:' || substr(model, 8) WHERE model LIKE 'ollama:%'`) BEFORE the column rename below, so a crash between steps still leaves audit queries (dual-pattern reads, see next bullet) working.
- **Dual-pattern reads for one release.** `outOfBandForEngine` and `hasLocalTurnsSince` match BOTH `local:%` and legacy `ollama:%`. Mitigates rollback / partial-migration risk. The legacy clause is removed in a follow-up release once the migration has propagated.
- **Sessions schema.** Column rename `ollama_cutoff_ms` → `local_cutoff_ms` via `ALTER TABLE ... RENAME COLUMN` (SQLite 3.25+). Idempotent: legacy column → rename, neither → add new.
- **Slash commands.** `/clear ollama` → `/clear local`. Aliases `o` and `>` dropped; `l` is the new short form. `/status` line "ollama turns (24h)" → "local turns (24h)". The "Cleared <b>ollama</b>" reply text becomes "Cleared <b>local</b>".
- **Operator-edited markdown.** `tasks/*.md` `engine: ollama` and `skills/*.md` `tier: ollama` are **hard-rejected at parse** with rename hints. Replace with `engine: local` / `tier: local` before redeploying. Same hard-reject for `SOLRAC_DEFAULT_ENGINE=ollama`.
- **Web UI pill label.** `defaultEngineLabel` returns `local (<backend>)` for the local engine (e.g. `local (ollama)`, `local (lmstudio)`) so the operator sees the backend at a glance.
- **LMStudio driver hardening.** Sends `parallel_tool_calls: false` (Gemma-4 lmstudio-bug-tracker #1756 workaround) and dedupes identical `(name, args)` tool calls within one assistant message. Accumulates `function.arguments` deltas across SSE chunks before emitting one parsed `tool_call` event. Captures `usage` chunk for `inputTokens`/`outputTokens` whether it arrives inline or on a trailing dedicated chunk.
- **LMStudio silent-substitution detection.** LMStudio's `POST /v1/chat/completions` returns HTTP 200 with the *loaded* model when the requested id isn't loaded, rather than 404'ing. Caught during the carlos/solrac-local-llm-backend smoke run: a fake-model request returned a normal completion instead of erroring. Driver now compares `chunk.model` (echoed by the OpenAI streaming protocol) against the requested model on the first chunk that carries it; mismatch throws `LocalDriverError("lmstudio", "model_missing", ...)` with the served-model id surfaced in the message + `lms load <requested>` hint. Closes the mid-session hole that `probe()` (boot-only) doesn't cover. New tests in `local-driver.test.ts`: substitution detected, exact-match passes through.
- **Test coverage.** New `local-driver.test.ts` covers NDJSON partial-line buffering, SSE multi-event-per-chunk and single-event-split, `[DONE]` terminator, optional trailing `usage` chunk, tool-call args split across deltas, dedup behavior, and 404/5xx/network/abort error paths for both backends. New `local-tools.test.ts` covers `mcpToLocalTools` converter, `stripThoughts`, and `runToolLoop` via a scripted fake driver. New `local.test.ts` covers the capability-note matrix, audit-tag invariant (verified for both `local:ollama:%` and `local:lmstudio:%`), driver-error rendering, and token capture.
- **Smoke.** `test/smokes/ollama.ts` → `test/smokes/local.ts`. `npm run smoke:ollama` → `npm run smoke:local`. Switches on `LOCAL_BACKEND` env (defaults to `ollama` for back-compat with the historical smoke target). Backend-aware pull/load hint check (`ollama pull` vs `lms load`).
- **Pre-deploy backup recommendation.** Document in operator deploy procedure: `cp data/solrac.db data/solrac.db.pre-local-migration` before service restart. Rollback SQL is commented in `src/db.ts` next to the migration.
- **No SDK pin bump.** No new runtime deps. No anti-goal reversals.
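The silent-substitution check described in the bullets above reduces to one comparison on the first model-bearing chunk. A sketch under stated assumptions: `StreamChunk`, `checkServedModel`, and the exact `LocalDriverError` constructor shape are illustrative, not the real driver API.

```typescript
// Minimal shape of an OpenAI-compatible streaming chunk: the server echoes
// the model id it is actually serving.
interface StreamChunk {
  model?: string;
}

class LocalDriverError extends Error {
  constructor(
    readonly backend: string,
    readonly code: string,
    message: string,
  ) {
    super(message);
  }
}

// Compare the echoed model id against the requested one; a mismatch means
// LMStudio silently substituted its loaded model for the requested id.
function checkServedModel(requested: string, chunk: StreamChunk): void {
  if (chunk.model && chunk.model !== requested) {
    throw new LocalDriverError(
      "lmstudio",
      "model_missing",
      `requested '${requested}' but the server answered with '${chunk.model}'; try: lms load ${requested}`,
    );
  }
}
```

Chunks that carry no `model` field pass through untouched, so the check runs cheaply on every chunk but only bites on the first one that echoes an id.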

Files renamed/added:
- `src/ollama.ts` → `src/local.ts`, `src/ollama-tools.ts` → `src/local-tools.ts`, new `src/local-driver.ts`.
- `src/ollama.test.ts` + `src/ollama-tools.test.ts` → `src/local.test.ts`, `src/local-tools.test.ts`, new `src/local-driver.test.ts`.
- `test/smokes/ollama.ts` → `test/smokes/local.ts`.
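As a sanity check on the retag migration quoted above, the per-row transform is equivalent to this pure function (hypothetical name; the shipped migration is the SQL `UPDATE` in `src/db.ts`):

```typescript
// Mirrors the SQL retag: 'ollama:<modelId>' becomes 'local:ollama:<modelId>'.
// slice(7) matches SQLite's 1-indexed substr(model, 8). Idempotent: rows
// already in the three-segment form, and claude:* rows, pass through untouched.
function retagAuditModel(model: string): string {
  return model.startsWith("ollama:")
    ? `local:ollama:${model.slice("ollama:".length)}`
    : model;
}
```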

## Unreleased — scheduler: switch to unix cron (BREAKING TASK.md format)

Replaces the three-form schedule grammar (`every <dur>` / `daily_at HH:MM` / `at <ISO8601>`) with 5-field unix cron + optional per-task `tz:` (default: `$TZ` env / host runtime tz). One grammar closes four real gaps in a single change: time-of-day windows, day-of-week filtering, local-timezone scheduling, and anchored cadence. Predicate: the live stretch trigger on 2026-05-15 ("every 30m between 12:00 and 18:00 weekdays Denver") required thirteen separate `daily_at` TASK.md files under the old grammar.
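Under the new grammar, the 2026-05-15 trigger quoted above fits in one entry. A sketch only: the `schedule:` key name is an assumption (the `tz:` key is from the entry above), and `*/30 12-17 * * 1-5` fires at :00 and :30 from 12:00 through 17:30, which is one reading of "between 12:00 and 18:00":

```yaml
# Hypothetical TASK.md frontmatter sketch
schedule: "*/30 12-17 * * 1-5"   # every 30m, 12:00-17:30, Mon-Fri
tz: America/Denver               # optional; defaults to $TZ / host runtime tz
```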
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -32,7 +32,7 @@ For changes that touch policy, cost cap, audit, or shutdown semantics, also run

```sh
npm run smoke:flood
-npm run smoke:ollama # only if you have Ollama running locally
+LOCAL_BACKEND=ollama npm run smoke:local # or LOCAL_BACKEND=lmstudio; only if the backend is running locally
```

## Style