
add multi-backend local engine; deprecate OLLAMA_* (breaking) #23

Merged: cjus merged 1 commit into main from carlos/solrac-local-llm-backend on May 15, 2026

@cjus (Owner) commented May 15, 2026

Summary

Replaces the Ollama-specific engine path with a generic local engine fronted by a LocalDriver interface and two implementations: Ollama (NDJSON /api/chat) and LMStudio (SSE /v1/chat/completions). Hard cutover: every OLLAMA_* env var, every engine: ollama / tier: ollama frontmatter value, and the /clear ollama, /clear >, and /clear o aliases are hard-rejected at boot or parse time with a rename hint. The audit row tag becomes the three-segment local:<backend>:<modelId>, mirroring claude:<tier>:<modelId>.
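
For reviewers, a minimal sketch of the shape the driver seam and the audit tag could take. Every name here (LocalBackend, DriverEvent, LocalDriver, auditModelTag, parseAuditModelTag) is hypothetical and not lifted from the PR's source. The parser splits only on the first two colons, because model ids such as gemma4:e4b contain colons themselves.

```typescript
// Hypothetical sketch: none of these names are taken from the PR's source.
type LocalBackend = "ollama" | "lmstudio";

interface ChatMessage {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
}

// Normalized events a driver yields, whatever the wire format
// (NDJSON for Ollama /api/chat, SSE for LMStudio /v1/chat/completions).
type DriverEvent =
  | { kind: "text"; delta: string }
  | { kind: "tool_call"; name: string; args: string }
  | { kind: "usage"; promptTokens: number; completionTokens: number };

interface LocalDriver {
  readonly backend: LocalBackend;
  // Boot-time check that the backend is reachable and the model is available.
  probe(model: string): Promise<void>;
  // Streams one chat turn as normalized events.
  chat(model: string, messages: ChatMessage[]): AsyncIterable<DriverEvent>;
}

// Three-segment audit tag, mirroring claude:<tier>:<modelId>.
function auditModelTag(backend: LocalBackend, modelId: string): string {
  return `local:${backend}:${modelId}`;
}

// Splits on the first two colons only; model ids may contain colons.
function parseAuditModelTag(tag: string): { backend: string; modelId: string } | null {
  const m = /^local:([^:]+):(.+)$/.exec(tag);
  return m ? { backend: m[1], modelId: m[2] } : null;
}
```

Keeping the backend as its own segment lets queries filter on local:% for "any local turn" or local:lmstudio:% for one backend, without parsing model ids.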

Why. Solrac is a local-first deploy, so the engine layer needs to support more than one local backend without accruing abstraction debt. LMStudio joins as a first-class backend, the migration runway covers both directions (forward retag at boot, dual-pattern reads through the rollback window), and every operator-facing surface (env, frontmatter, slash commands, web UI label) reflects the new shape consistently.

Breaking changes

  • Env vars. All OLLAMA_* → LOCAL_*. LOCAL_BACKEND (ollama | lmstudio) is newly required when LOCAL_ENABLED=true. The LOCAL_URL default is backend-aware (:11434 for Ollama, :1234 for LMStudio). Boot fails loudly on legacy OLLAMA_* keys and on SOLRAC_DEFAULT_ENGINE=ollama.
  • Audit model format. ollama:<m> → local:<backend>:<m>. An idempotent retag migration runs at boot; order is load-bearing (retag before column rename). Dual-pattern reads (local:% + ollama:%) in outOfBandForEngine + hasLocalTurnsSince keep cross-engine queries correct during the rollback window. The legacy clause drop is scheduled for the next release.
  • Schema. sessions.ollama_cutoff_ms → sessions.local_cutoff_ms via ALTER TABLE … RENAME COLUMN.
  • Slash commands. /clear ollama, /clear >, /clear o → /clear local (alias: l). /status line "ollama turns (24h)" → "local turns (24h)".
  • Operator markdown. engine: ollama in tasks/*.md and tier: ollama in skills/*.md are hard-rejected at parse time with rename hints.
  • Web UI label. local (<backend>), e.g. local (ollama), local (lmstudio).
  • Thinking-stub emoji. 🦙 → 💻 (backend-neutral).
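
A hypothetical sketch of the boot-time env contract described above: reject legacy keys with a rename hint, require LOCAL_BACKEND when the local engine is enabled, and pick a backend-aware URL default. loadLocalConfig, the hint wording, and the localhost host are assumptions; only the ports and variable names come from this PR.

```typescript
// Hypothetical sketch of the LOCAL_* boot contract; names are illustrative.
type LocalBackend = "ollama" | "lmstudio";

interface LocalConfig {
  enabled: boolean;
  backend?: LocalBackend;
  url?: string;
}

// Ports are from the PR; the localhost host is an assumption.
const DEFAULT_URLS: Record<LocalBackend, string> = {
  ollama: "http://localhost:11434",
  lmstudio: "http://localhost:1234",
};

function loadLocalConfig(env: Record<string, string | undefined>): LocalConfig {
  // Fail loudly on any legacy OLLAMA_* key, with a rename hint.
  for (const key of Object.keys(env)) {
    if (key.startsWith("OLLAMA_")) {
      const renamed = "LOCAL_" + key.slice("OLLAMA_".length);
      throw new Error(`${key} is no longer supported; rename it to ${renamed}`);
    }
  }
  if (env["SOLRAC_DEFAULT_ENGINE"] === "ollama") {
    throw new Error("SOLRAC_DEFAULT_ENGINE=ollama is no longer supported; use local");
  }
  if (env["LOCAL_ENABLED"] !== "true") return { enabled: false };

  const backend = env["LOCAL_BACKEND"];
  if (backend !== "ollama" && backend !== "lmstudio") {
    throw new Error("LOCAL_BACKEND (ollama | lmstudio) is required when LOCAL_ENABLED=true");
  }
  // Backend-aware default unless LOCAL_URL is set explicitly.
  return { enabled: true, backend, url: env["LOCAL_URL"] ?? DEFAULT_URLS[backend] };
}
```

Checking legacy keys before anything else means a half-migrated .env fails on the first boot rather than silently running with defaults.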

Driver hardening (Ollama + LMStudio parity)

  • LMStudio: parallel_tool_calls: false plus identical-(name, args) dedup (workaround for Gemma-4 lmstudio-bug-tracker #1756). Tool-call argument deltas are accumulated across SSE chunks, and usage is captured whether it arrives inline or in a trailing chunk.
  • LMStudio silent-substitution detection. LMStudio's OpenAI-compatible endpoint returns 200 OK and serves whatever model is loaded when the requested id isn't loaded. This was caught during the live smoke run: the driver now compares chunk.model (case-insensitively) against the requested model on the first chunk that carries it, and throws model_missing, surfacing the served-model id plus an lms load <requested> hint. This closes the mid-session hole that the boot-time probe() doesn't cover.
  • Ollama driver: an instanceof LocalDriverError guard in stream-catch, for symmetry with LMStudio, so future defensive throws inside the stream loop won't get clobbered by the generic unreachable wrap.
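
A minimal sketch of the LMStudio response-side handling, assuming OpenAI-style streaming chunks in which each tool-call fragment carries an index and a partial function.arguments string. The request side would set parallel_tool_calls: false; that part is not shown. StreamChunk, accumulate, and dedupe are hypothetical names.

```typescript
// Assumed subset of OpenAI-style streaming chunk fields.
type Usage = { prompt_tokens: number; completion_tokens: number };

interface ToolCallDelta {
  index: number;
  id?: string;
  function?: { name?: string; arguments?: string };
}

interface StreamChunk {
  model?: string;
  choices: { delta: { content?: string; tool_calls?: ToolCallDelta[] } }[];
  usage?: Usage;
}

interface ToolCall {
  name: string;
  args: string;
}

// Hypothetical helper: stitches argument fragments per tool-call index and
// keeps usage whether it rides on a content chunk or a trailing chunk.
function accumulate(chunks: StreamChunk[]): { calls: ToolCall[]; usage?: Usage } {
  const byIndex = new Map<number, ToolCall>();
  let usage: Usage | undefined = undefined;
  for (const chunk of chunks) {
    if (chunk.usage) usage = chunk.usage;
    for (const choice of chunk.choices) {
      for (const d of choice.delta.tool_calls ?? []) {
        const call = byIndex.get(d.index) ?? { name: "", args: "" };
        // The name normally arrives whole on the first fragment for an index.
        if (d.function?.name) call.name = d.function.name;
        if (d.function?.arguments) call.args += d.function.arguments;
        byIndex.set(d.index, call);
      }
    }
  }
  const ordered = Array.from(byIndex.entries())
    .sort((a, b) => a[0] - b[0])
    .map(([, c]) => c);
  return { calls: dedupe(ordered), usage };
}

// Drops calls whose (name, args) pair repeats an earlier call verbatim.
function dedupe(calls: ToolCall[]): ToolCall[] {
  const seen = new Set<string>();
  return calls.filter((c) => {
    const key = JSON.stringify([c.name, c.args]);
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}
```

Dedup runs only after accumulation, because two calls can look different mid-stream and converge once their argument fragments are complete.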

Post-review hardening

  • LOCAL_* scrubbed from SDK subprocess env (agent.ts::sanitizedSubprocessEnv). LOCAL_URL in particular could leak internal network topology via auto-allowed Bash(echo $LOCAL_URL).
  • /clear ollama / /clear o / /clear > now returns an explicit → use /clear local hint instead of silent "Unknown command".
  • audit.tool_calls capped at 64KB (AUDIT_TOOL_CALLS_MAX_LEN) in db.ts::updateAuditEnd — defends against runaway local-model arg blobs (8 iterations × hallucinated 100KB args = potential MB-sized audit rows). Centralized so all audit writers (Claude SDK, local engine, skills) get the protection.
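
A sketch of the two post-review guards under stated assumptions: scrubEnv and capToolCalls are hypothetical names, the cap is measured in UTF-16 code units rather than bytes, and the truncation marker text is invented. NOTION_API_KEY and STATS_BEARER_TOKEN are included because this PR lists them among the existing scrub lines.

```typescript
// Hypothetical sketch. LOCAL_* is scrubbed by prefix; the two exact keys are
// ones this PR says the real scrub already covers.
const SCRUBBED_PREFIXES = ["LOCAL_"];
const SCRUBBED_KEYS = new Set(["NOTION_API_KEY", "STATS_BEARER_TOKEN"]);

// Returns a copy of env that is safe to hand to the SDK subprocess.
function scrubEnv(env: Record<string, string | undefined>): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [key, value] of Object.entries(env)) {
    if (value === undefined || SCRUBBED_KEYS.has(key)) continue;
    if (SCRUBBED_PREFIXES.some((p) => key.startsWith(p))) continue;
    out[key] = value;
  }
  return out;
}

const AUDIT_TOOL_CALLS_MAX_LEN = 64 * 1024;
const TRUNCATION_MARKER = "...[truncated]"; // assumed marker text

// Caps the serialized tool_calls text at max characters (UTF-16 code units,
// an assumption) so one runaway arg blob cannot bloat an audit row.
// The truncated value is no longer valid JSON; it is treated as opaque text.
function capToolCalls(serialized: string, max = AUDIT_TOOL_CALLS_MAX_LEN): string {
  if (serialized.length <= max) return serialized;
  return serialized.slice(0, max - TRUNCATION_MARKER.length) + TRUNCATION_MARKER;
}
```

Scrubbing by prefix rather than by exact key means a future LOCAL_* variable is excluded from the subprocess without another code change.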

Test plan

  • npm run typecheck — clean
  • bun test: 755/755 pass across 30 files (+8 net vs main)
  • Live Ollama smoke (pure inference) — 16/16 with gemma4:e4b
  • Live Ollama smoke (tools-on, time_now round-trip) — 21/21
  • Live LMStudio smoke (pure inference) — 17/17 with gemma-4-31b-it-mlx
  • Live LMStudio smoke (tools-on, SSE delta + Gemma-4 dedup) — 21/21
  • Migration snapshot on synthetic 250-row prod-like db with 84 legacy ollama:% rows + 2 sessions on the legacy ollama_cutoff_ms column — first boot retags + renames with correct row count log, second boot silent (idempotent), cutoff values preserved
  • Frontmatter rejection (engine: ollama, tier: ollama) — asserts the rename hint via scheduler.test.ts:446 and skills.test.ts:311
  • /clear local end-to-end — dispatcher + reply text + setLocalCutoff + cross-engine bridge dual-pattern read all covered by existing tests
  • Pre-deploy: cp data/solrac.db data/solrac.db.pre-local-migration before service restart
  • First boot logs db.migrated: audit.ollama_retagged_to_local rowsChanged: N, then sessions.ollama_cutoff_ms_renamed_to_local. Subsequent boots are silent on these migrations.
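
A sketch of why the boot migration is safe to rerun, assuming SQLite and assuming the audit tag lives in a column named model (the sessions rename is taken from this PR). The retag UPDATE matches nothing once applied, and the rename is gated on the old column still existing, for example as reported by PRAGMA table_info(sessions). planLocalMigration and retagModel are hypothetical helpers; RENAME COLUMN needs SQLite 3.25 or later.

```typescript
// Hypothetical sketch; assumes SQLite and an audit.model column.
// 'ollama:' is 7 characters, so substr(model, 8) is everything after it.
const RETAG_SQL =
  "UPDATE audit SET model = 'local:ollama:' || substr(model, 8) WHERE model LIKE 'ollama:%'";
const RENAME_SQL =
  "ALTER TABLE sessions RENAME COLUMN ollama_cutoff_ms TO local_cutoff_ms";

// Plans the boot migration given the current sessions column names.
// Retag always comes first (the PR calls this order load-bearing); it is a
// no-op once no ollama:% rows remain. The rename runs only while the old
// column still exists, so a second boot has nothing to report.
function planLocalMigration(sessionColumns: string[]): string[] {
  const steps = [RETAG_SQL];
  if (sessionColumns.includes("ollama_cutoff_ms")) steps.push(RENAME_SQL);
  return steps;
}

// In-memory mirror of RETAG_SQL for a single value.
function retagModel(model: string): string {
  return model.startsWith("ollama:") ? "local:ollama:" + model.slice("ollama:".length) : model;
}
```

Legacy rows can only have come from the Ollama path, which is why the retag hardcodes the ollama backend segment.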

Follow-ups (next release)

  • Drop the legacy ollama:% dual-pattern clause in outOfBandForEngine + hasLocalTurnsSince after one release cycle, and remove the dual-LIKE clauses from the operator SQL examples in docs/OPERATIONS.md, docs/SCHEMA.md, and docs/RUNBOOK.md.
  • Add explicit sanitizedSubprocessEnv test coverage — src/agent.test.ts doesn't exist yet and none of the scrub lines (incl. NOTION_API_KEY, STATS_BEARER_TOKEN) have direct coverage today.

Anti-goals

No reversals. No SDK pin bump. No new runtime deps.

replace the ollama-specific engine path with a generic `local` engine fronted
by a driver interface and two implementations: ollama (NDJSON /api/chat) and
lmstudio (SSE /v1/chat/completions). hard cutover — every OLLAMA_* env var,
`engine: ollama` / `tier: ollama` frontmatter value, and `/clear ollama`/`>`/`o`
alias is rejected at boot or parse time with a rename hint.

key changes:
- audit model column: `ollama:<m>` → `local:<backend>:<m>` (idempotent retag
  migration at boot; load-bearing order — retag before sessions column rename)
- sessions.ollama_cutoff_ms → sessions.local_cutoff_ms via RENAME COLUMN
- dual-pattern reads (local:% + ollama:%) for one release cycle in
  outOfBandForEngine + hasLocalTurnsSince
- LOCAL_BACKEND required when LOCAL_ENABLED=true; URL default is backend-aware
- web UI pill label: `local (<backend>)`
- thinking-stub emoji: 🦙 → 💻 (backend-neutral)
- lmstudio driver: parallel_tool_calls=false + identical-(name,args) dedup
  (gemma-4 lmstudio-bug-tracker #1756 workaround), arg-delta accumulation
  across SSE chunks, usage chunk capture (inline or trailing)
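
during the rollback window any read that classifies a turn as local has to
accept both tag shapes. a hypothetical sketch of the predicate plus an
in-memory equivalent; anyLocalTurnSince and the AuditRow shape (model, ts)
are illustrative, not the real hasLocalTurnsSince.

```typescript
// Hypothetical sketch of the transitional dual-pattern read.
// SQL form: accepts new local:% tags and legacy ollama:% tags.
// Drop the ollama:% branch after one release cycle.
const LOCAL_TAG_SQL = "(model LIKE 'local:%' OR model LIKE 'ollama:%')";

// TypeScript mirror of the same predicate. SQLite's LIKE is case-insensitive
// for ASCII while startsWith is not; tags are assumed to be lowercase.
function isLocalTag(model: string): boolean {
  return model.startsWith("local:") || model.startsWith("ollama:");
}

interface AuditRow {
  model: string;
  ts: number; // epoch ms; hypothetical column
}

// In-memory analogue of a "local turns since" check.
function anyLocalTurnSince(rows: AuditRow[], sinceMs: number): boolean {
  return rows.some((r) => r.ts >= sinceMs && isLocalTag(r.model));
}
```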

post-review hardening:
- lmstudio silent-substitution detection: chunk.model mismatch (case-insensitive)
  throws model_missing with served-model id surfaced + `lms load` hint. closes
  mid-session hole probe() didn't cover.
- LOCAL_* scrubbed from SDK subprocess env (LOCAL_URL could leak network topology)
- /clear ollama|o|> returns explicit rename hint instead of silent "unknown"
- audit.tool_calls capped at 64KB to defend against runaway local-model arg blobs
- ollama driver stream-catch gains instanceof LocalDriverError guard for symmetry
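
a minimal sketch of the silent-substitution guard and the stream-catch
symmetry, assuming the error class exposes a string code. makeModelGuard and
wrapStreamError are hypothetical; the comparison is case-insensitive and runs
once, on the first chunk that carries a model field.

```typescript
// Hypothetical error type; the PR names LocalDriverError but its shape is assumed.
class LocalDriverError extends Error {
  readonly code: string;
  constructor(code: string, message: string) {
    super(message);
    this.name = "LocalDriverError";
    this.code = code;
  }
}

// Returns a per-stream check to call with every chunk's model field.
// Only the first chunk that carries a model is compared, case-insensitively.
function makeModelGuard(requested: string): (chunkModel: string | undefined) => void {
  let checked = false;
  return (chunkModel) => {
    if (checked || chunkModel === undefined) return;
    checked = true;
    if (chunkModel.toLowerCase() !== requested.toLowerCase()) {
      throw new LocalDriverError(
        "model_missing",
        `requested ${requested} but LMStudio served ${chunkModel}; try: lms load ${requested}`,
      );
    }
  };
}

// Stream-catch helper: keep specific driver errors, wrap everything else.
function wrapStreamError(err: unknown): LocalDriverError {
  if (err instanceof LocalDriverError) return err;
  return new LocalDriverError("unreachable", `stream failed: ${String(err)}`);
}
```

without the instanceof check, the model_missing thrown inside the loop would
be rewrapped as unreachable and the lms load hint would be lost.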

verification: typecheck clean, bun test 755/755 pass (+8 net new tests across
local-driver, local-tools, local, db, commands). live smokes against ollama
gemma4:e4b (16/16 pure, 21/21 tools-on) and lmstudio gemma-4-31b-it-mlx
(17/17 pure, 21/21 tools-on).
migration snapshot verified on synthetic 250-row prod-like db with 84 legacy
ollama:% rows + 2 sessions on the legacy column — first boot retags + renames,
second boot is silent (idempotent).

no SDK pin bump. no anti-goal reversals. pre-deploy: cp data/solrac.db
data/solrac.db.pre-local-migration before service restart.

cjus merged commit cdb7fe5 into main on May 15, 2026 (1 check passed).