add multi-backend local engine; deprecate OLLAMA_* (breaking) #23
Merged
replace the ollama-specific engine path with a generic `local` engine fronted by a driver interface and two implementations: ollama (NDJSON /api/chat) and lmstudio (SSE /v1/chat/completions). hard cutover: every OLLAMA_* env var, `engine: ollama` / `tier: ollama` frontmatter value, and `/clear ollama`/`>`/`o` alias is rejected at boot or parse time with a rename hint.

key changes:
- audit model column: `ollama:<m>` → `local:<backend>:<m>` (idempotent retag migration at boot; load-bearing order: retag before sessions column rename)
- sessions.ollama_cutoff_ms → sessions.local_cutoff_ms via RENAME COLUMN
- dual-pattern reads (local:% + ollama:%) for one release cycle in outOfBandForEngine + hasLocalTurnsSince
- LOCAL_BACKEND required when LOCAL_ENABLED=true; URL default is backend-aware
- web UI pill label: `local (<backend>)`
- thinking-stub emoji: 🦙 → 💻 (backend-neutral)
- lmstudio driver: parallel_tool_calls=false + identical-(name,args) dedup (gemma-4 lmstudio-bug-tracker #1756 workaround), arg-delta accumulation across SSE chunks, usage chunk capture (inline or trailing)

post-review hardening:
- lmstudio silent-substitution detection: chunk.model mismatch (case-insensitive) throws model_missing with served-model id surfaced + `lms load` hint. closes mid-session hole probe() didn't cover.
- LOCAL_* scrubbed from SDK subprocess env (LOCAL_URL could leak network topology)
- /clear ollama|o|> returns explicit rename hint instead of silent "unknown"
- audit.tool_calls capped at 64KB to defend runaway local-model arg blobs
- ollama driver stream-catch gains instanceof LocalDriverError guard for symmetry

verification: typecheck clean, bun test 755/755 pass (+8 net new tests across local-driver, local-tools, local, db, commands). live smokes against ollama gemma4:e4b 21/21 and lmstudio gemma-4-31b-it-mlx 21/21 (pure + tools-on).

migration snapshot verified on synthetic 250-row prod-like db with 84 legacy ollama:% rows + 2 sessions on the legacy column: first boot retags + renames, second boot is silent (idempotent).

no SDK pin bump. no anti-goal reversals.

pre-deploy: cp data/solrac.db data/solrac.db.pre-local-migration before service restart.
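For reviewers who want the migration behaviour in one place, here is a rough sketch of the retag-then-rename order and why a second boot stays silent. It is not the code in `db.ts`: the `Db` type is a hypothetical in-memory stand-in for the SQLite tables, the function names and log strings are made up for illustration, and it assumes legacy `ollama:<m>` rows are retagged to the `ollama` backend.

```typescript
// Hypothetical in-memory stand-in for the real SQLite tables.
type AuditRow = { model: string };
type Db = { audit: AuditRow[]; sessionColumns: Set<string> };

const LEGACY_PREFIX = "ollama:";

// Step 1: retag legacy `ollama:<m>` rows as `local:ollama:<m>`.
// Assumption: legacy rows all belong to the ollama backend.
function retagAudit(db: Db): number {
  let changed = 0;
  for (const row of db.audit) {
    if (row.model.startsWith(LEGACY_PREFIX)) {
      row.model = "local:ollama:" + row.model.slice(LEGACY_PREFIX.length);
      changed++;
    }
  }
  return changed;
}

// Step 2: rename the sessions column only if the legacy name still exists.
function renameCutoffColumn(db: Db): boolean {
  if (!db.sessionColumns.has("ollama_cutoff_ms")) return false;
  db.sessionColumns.delete("ollama_cutoff_ms");
  db.sessionColumns.add("local_cutoff_ms");
  return true;
}

// Runs both steps in the stated order (retag first, then rename) and
// returns the log lines produced; an empty array means nothing was left to do.
function migrate(db: Db): string[] {
  const log: string[] = [];
  const rowsChanged = retagAudit(db);
  if (rowsChanged > 0) log.push(`audit retagged, rowsChanged: ${rowsChanged}`);
  if (renameCutoffColumn(db)) log.push("sessions cutoff column renamed");
  return log;
}

// Dual-pattern read: treat both the new and the legacy tag as local turns.
function isLocalTurn(model: string): boolean {
  return model.startsWith("local:") || model.startsWith(LEGACY_PREFIX);
}
```

Both steps check for leftover legacy state before acting, so calling `migrate` a second time on the same data returns an empty log.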
Summary
Replaces the Ollama-specific engine path with a generic `local` engine fronted by a `LocalDriver` interface and two implementations: Ollama (NDJSON `/api/chat`) and LMStudio (SSE `/v1/chat/completions`). Hard cutover: every `OLLAMA_*` env var, `engine: ollama` / `tier: ollama` frontmatter value, and `/clear ollama` / `>` / `o` alias is hard-rejected at boot or parse time with a rename hint. The audit row tag becomes the three-segment `local:<backend>:<modelId>`, mirroring `claude:<tier>:<modelId>`.

Why. Solrac is a local-first deploy; the engine layer needs to support more than one local backend without abstraction debt. LMStudio joins as a first-class backend, the migration runway covers both directions, and operator-facing surfaces (env, frontmatter, slash commands, web UI label) all reflect the new shape consistently.
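To illustrate the boot-time cutover, here is a minimal sketch of how legacy keys might be rejected and the backend-aware URL default chosen. `loadLocalConfig`, the `LocalConfig` type, and the error wording are hypothetical. The `http://localhost` host and the one-to-one `OLLAMA_X` to `LOCAL_X` suffix mapping in the hint are assumptions. Only the env var names and the two ports come from this PR.

```typescript
type Backend = "ollama" | "lmstudio";
type LocalConfig =
  | { enabled: false }
  | { enabled: true; backend: Backend; url: string };

// Ports are from the PR text; scheme and host are assumptions.
const DEFAULT_URL: Record<Backend, string> = {
  ollama: "http://localhost:11434",
  lmstudio: "http://localhost:1234",
};

function loadLocalConfig(env: Record<string, string | undefined>): LocalConfig {
  // Fail loud on any legacy key, with a rename hint per key.
  const legacy = Object.keys(env).filter((key) => key.startsWith("OLLAMA_"));
  if (legacy.length > 0) {
    const hints = legacy.map((key) => `${key} -> LOCAL_${key.slice("OLLAMA_".length)}`);
    throw new Error(`legacy env vars are no longer supported: ${hints.join(", ")}`);
  }
  if (env["SOLRAC_DEFAULT_ENGINE"] === "ollama") {
    throw new Error("SOLRAC_DEFAULT_ENGINE=ollama is no longer supported");
  }

  if (env["LOCAL_ENABLED"] !== "true") return { enabled: false };

  // LOCAL_BACKEND is required once the local engine is enabled.
  const backend = env["LOCAL_BACKEND"];
  if (backend !== "ollama" && backend !== "lmstudio") {
    throw new Error("LOCAL_BACKEND must be ollama or lmstudio when LOCAL_ENABLED=true");
  }

  // An explicit LOCAL_URL wins; otherwise fall back to the backend's default.
  return { enabled: true, backend, url: env["LOCAL_URL"] ?? DEFAULT_URL[backend] };
}
```

Under these assumptions, `loadLocalConfig({ LOCAL_ENABLED: "true", LOCAL_BACKEND: "lmstudio" })` resolves to the `:1234` default, while any `OLLAMA_*` key throws before the rest of the config is read.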
Breaking changes
- Env vars: `OLLAMA_*` → `LOCAL_*`. New required `LOCAL_BACKEND` (`ollama` | `lmstudio`) when `LOCAL_ENABLED=true`. `LOCAL_URL` default is backend-aware (`:11434` Ollama, `:1234` LMStudio). Boot fails loud on legacy `OLLAMA_*` keys and `SOLRAC_DEFAULT_ENGINE=ollama`.
- Audit `model` format: `ollama:<m>` → `local:<backend>:<m>`. Idempotent retag migration at boot; order is load-bearing (retag before column rename). Dual-pattern reads (`local:%` + `ollama:%`) in `outOfBandForEngine` + `hasLocalTurnsSince` keep cross-engine queries correct during the rollback window. Legacy clause drop scheduled for the next release.
- Sessions column: `sessions.ollama_cutoff_ms` → `sessions.local_cutoff_ms` via `ALTER TABLE … RENAME COLUMN`.
- Slash commands: `/clear ollama` / `/clear >` / `/clear o` → `/clear local` (alias: `l`). `/status` line `"ollama turns (24h)"` → `"local turns (24h)"`.
- Frontmatter: `tasks/*.md` `engine: ollama` and `skills/*.md` `tier: ollama` hard-rejected at parse with rename hints.
- Web UI pill label: `local (<backend>)`, e.g. `local (ollama)`, `local (lmstudio)`.

Driver hardening (Ollama + LMStudio parity)
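As a rough sketch of the LMStudio stream handling this section covers: tool-call `arguments` arrive as string fragments keyed by `index` in OpenAI-style streaming deltas, get concatenated, and identical `(name, args)` pairs are then dropped. The served-model check is sketched alongside. The function names and the shape of `LocalDriverError` are hypothetical, and comparing `args` as raw strings (rather than parsed JSON) is an assumption about what "identical" means here.

```typescript
// Shape of one streamed tool-call fragment in an OpenAI-style SSE chunk.
type ToolCallDelta = {
  index: number;
  id?: string;
  function?: { name?: string; arguments?: string };
};
type ToolCall = { name: string; args: string };

// Hypothetical error shape; the real LocalDriverError may differ.
class LocalDriverError extends Error {
  code: string;
  constructor(code: string, message: string) {
    super(message);
    this.code = code;
  }
}

// Concatenate argument fragments per tool-call index, in arrival order.
function accumulate(deltas: ToolCallDelta[]): ToolCall[] {
  const byIndex = new Map<number, ToolCall>();
  for (const delta of deltas) {
    const call = byIndex.get(delta.index) ?? { name: "", args: "" };
    if (delta.function?.name) call.name = delta.function.name;
    if (delta.function?.arguments) call.args += delta.function.arguments;
    byIndex.set(delta.index, call);
  }
  return Array.from(byIndex.entries())
    .sort((a, b) => a[0] - b[0])
    .map((entry) => entry[1]);
}

// Keep the first occurrence of each (name, args) pair.
// Assumption: args are compared as raw strings.
function dedupe(calls: ToolCall[]): ToolCall[] {
  const seen = new Set<string>();
  const unique: ToolCall[] = [];
  for (const call of calls) {
    const key = JSON.stringify([call.name, call.args]);
    if (seen.has(key)) continue;
    seen.add(key);
    unique.push(call);
  }
  return unique;
}

// Compare the served model id against the requested one, ignoring case.
function checkServedModel(requested: string, served: string | undefined): void {
  if (served === undefined) return; // this chunk carried no model id
  if (served.toLowerCase() !== requested.toLowerCase()) {
    throw new LocalDriverError(
      "model_missing",
      `requested ${requested} but server answered as ${served}; try: lms load ${requested}`,
    );
  }
}
```

With raw-string comparison, two calls whose JSON differs only in whitespace or key order would not be treated as duplicates; parsing and re-serializing before comparison would be the stricter variant.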
- LMStudio: `parallel_tool_calls: false` + identical-`(name, args)` dedup (Gemma-4 lmstudio-bug-tracker #1756 workaround). Tool-call `arguments` delta accumulation across SSE chunks. `usage` capture whether inline or trailing.
- LMStudio silent-substitution detection: compares `chunk.model` (case-insensitive) against the requested model on the first chunk that carries it, and throws `model_missing` with the served-model id surfaced + `lms load <requested>` hint. Closes the mid-session hole that boot-time `probe()` doesn't cover.
- Ollama: `instanceof LocalDriverError` guard in stream-catch for symmetry with LMStudio, so future defensive throws inside the stream loop won't get clobbered by the generic `unreachable` wrap.

Post-review hardening
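A minimal sketch of two of the measures in this section, the `LOCAL_*` env scrub and the `audit.tool_calls` cap. The names `sanitizedSubprocessEnv` and `AUDIT_TOOL_CALLS_MAX_LEN` are from this PR, but the bodies below are guesses: the real scrub list covers more than `LOCAL_*`, `capToolCalls` is a hypothetical helper, and measuring the cap in string length with plain truncation is an assumption.

```typescript
// 64KB cap; whether the real limit counts characters or bytes is an assumption.
const AUDIT_TOOL_CALLS_MAX_LEN = 64 * 1024;

// Return a copy of the environment without any LOCAL_* keys, so a subprocess
// cannot read values such as LOCAL_URL.
function sanitizedSubprocessEnv(
  env: Record<string, string | undefined>,
): Record<string, string> {
  const clean: Record<string, string> = {};
  for (const [key, value] of Object.entries(env)) {
    if (value === undefined) continue;
    if (key.startsWith("LOCAL_")) continue;
    clean[key] = value;
  }
  return clean;
}

// Hypothetical helper: truncate the serialized tool-call blob before it is
// written to the audit row. Truncated output may no longer be valid JSON.
function capToolCalls(serialized: string): string {
  if (serialized.length <= AUDIT_TOOL_CALLS_MAX_LEN) return serialized;
  return serialized.slice(0, AUDIT_TOOL_CALLS_MAX_LEN);
}
```

Applying the cap in the single write path means every audit writer gets the same limit without each caller having to remember it.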
- `LOCAL_*` scrubbed from SDK subprocess env (`agent.ts::sanitizedSubprocessEnv`). `LOCAL_URL` in particular could leak internal network topology via auto-allowed `Bash(echo $LOCAL_URL)`.
- `/clear ollama` / `/clear o` / `/clear >` now returns an explicit `→ use /clear local` hint instead of silent "Unknown command".
- `audit.tool_calls` capped at 64KB (`AUDIT_TOOL_CALLS_MAX_LEN`) in `db.ts::updateAuditEnd`. Defends against runaway local-model arg blobs (8 iterations × hallucinated 100KB args = potential MB-sized audit rows). Centralized so all audit writers (Claude SDK, local engine, skills) get the protection.

Test plan
- `npm run typecheck`: clean
- `bun test`: 755/755 pass across 30 files (+8 net vs main)
- Live smoke against Ollama `gemma4:e4b` (pure + tools-on, `time_now` round-trip): 21/21
- Live smoke against LMStudio `gemma-4-31b-it-mlx`: 21/21
- Migration snapshot on a synthetic 250-row prod-like db with 84 legacy `ollama:%` rows + 2 sessions on the legacy `ollama_cutoff_ms` column: first boot retags + renames with correct row count log, second boot silent (idempotent), cutoff values preserved
- Legacy frontmatter rejection (`engine: ollama`, `tier: ollama`): asserts the rename hint via `scheduler.test.ts:446` and `skills.test.ts:311`
- `/clear local` end-to-end: dispatcher + reply text + `setLocalCutoff` + cross-engine bridge dual-pattern read all covered by existing tests
- Pre-deploy: `cp data/solrac.db data/solrac.db.pre-local-migration` before service restart
- First boot logs `db.migrated: audit.ollama_retagged_to_local rowsChanged: N` then `sessions.ollama_cutoff_ms_renamed_to_local`. Subsequent boots silent on these migrations.

Follow-ups (next release)
- Drop the `ollama:%` dual-pattern clause in `outOfBandForEngine` + `hasLocalTurnsSince` after one release cycle. Removes the dual-`LIKE` clauses from operator SQL examples in `docs/OPERATIONS.md`, `docs/SCHEMA.md`, `docs/RUNBOOK.md`.
- Add `sanitizedSubprocessEnv` test coverage: `src/agent.test.ts` doesn't exist yet and none of the scrub lines (incl. `NOTION_API_KEY`, `STATS_BEARER_TOKEN`) have direct coverage today.

Anti-goals
No reversals. No SDK pin bump. No new runtime deps.