Merged
67 changes: 36 additions & 31 deletions .env.example
@@ -4,55 +4,60 @@ TELEGRAM_BOT_TOKEN=REPLACE_ME
ALLOWLIST_BOOTSTRAP=123456789

# ── Engine routing ──────────────────────────────────────────────────────────
-# Default engine for messages with no `@` or `!` prefix. PR-B inversion: the
-# default is `ollama` (free, local). Set to `primary` (Sonnet) or `secondary`
-# (Opus) for Claude-only deploys without an Ollama daemon.
-# ollama → no-prefix routes to local Ollama (recommended; requires daemon)
+# Default engine for messages with no `@` or `!` prefix.
+# local → no-prefix routes to the local-engine backend (recommended; free)
+# primary → no-prefix routes to Anthropic Sonnet (Claude-only deploys)
+# secondary → no-prefix routes to Anthropic Opus
-SOLRAC_DEFAULT_ENGINE=ollama
+SOLRAC_DEFAULT_ENGINE=local
SOLRAC_PRIMARY_MODEL=claude-sonnet-4-6 # `@` prefix
SOLRAC_SECONDARY_MODEL=claude-opus-4-7 # `!` prefix

-# ── Ollama ──────────────────────────────────────────────────────────────────
-# Required when SOLRAC_DEFAULT_ENGINE=ollama. Boot fails loud otherwise.
-# `gpt-oss:20b` is the current default. Alternatives: `gemma4:e4b`
-# (native function-calling, ~9.6GB, 128K context), `qwen2.5`, `llama3.2`.
-OLLAMA_ENABLED=true
-OLLAMA_URL=http://localhost:11434
-OLLAMA_MODEL=gpt-oss:20b
+# ── Local engine (Ollama / LMStudio) ────────────────────────────────────────
+# Required when SOLRAC_DEFAULT_ENGINE=local. Boot fails loud otherwise.
+#
+# LOCAL_BACKEND picks the wire protocol:
+# ollama → POST /api/chat with NDJSON streaming, probe /api/tags
+# lmstudio → POST /v1/chat/completions with SSE streaming, probe /v1/models
+#
+# LOCAL_URL default is backend-aware:
+# LOCAL_BACKEND=ollama → http://localhost:11434
+# LOCAL_BACKEND=lmstudio → http://localhost:1234
+# Explicit LOCAL_URL always wins.
+#
+# LOCAL_MODEL is the model id the backend exposes. Examples:
+# Ollama: `gemma4:e4b` (native tool-calling, ~9.6GB, 128K ctx), `qwen2.5`, `llama3.2`
+# LMStudio: `qwen2.5-7b`, `llama-3.2-3b-instruct` (whatever's loaded via the UI/`lms load`)
+LOCAL_ENABLED=true
+LOCAL_BACKEND=ollama
+# LOCAL_URL=http://localhost:11434
+LOCAL_MODEL=gemma4:e4b
# Total turn timeout. Default 60s when tools are off; bumps to 120s when
-# OLLAMA_TOOLS_ENABLED=true (one mid-loop confirm prompt can consume 60s on
-# its own, leaving zero budget for model rounds otherwise). Explicit override
-# here always wins.
-OLLAMA_TIMEOUT_MS=60000
-OLLAMA_HISTORY_LIMIT=6
-# Ollama tool-calling. When true, the local model can call the same
+# LOCAL_TOOLS_ENABLED=true (one mid-loop confirm prompt can consume 60s on
+# its own). Explicit override here always wins.
+LOCAL_TIMEOUT_MS=60000
+LOCAL_HISTORY_LIMIT=6
+# Local tool-calling. When true, the local model can call the same
# `mcp__solrac__*` integration tools the Claude tiers see. Requires
-# SOLRAC_INTEGRATIONS_ENABLED=true and SOLRAC_DEFAULT_ENGINE=ollama
-# (boot validation: tools-on with Claude as default is unreachable since
-# PR-B removed the `>` prefix). Recommended `true` for the default deploy.
-OLLAMA_TOOLS_ENABLED=true
+# SOLRAC_INTEGRATIONS_ENABLED=true and SOLRAC_DEFAULT_ENGINE=local.
+LOCAL_TOOLS_ENABLED=true
# Hard ceiling on tool-loop rounds per turn. Loop detector fires earlier on
# duplicate calls; this is the runaway-loop backstop.
-OLLAMA_MAX_TOOL_ITERATIONS=8
+LOCAL_MAX_TOOL_ITERATIONS=8

-# ── Integrations (precondition for OLLAMA_TOOLS_ENABLED=true) ───────────────
+# ── Integrations (precondition for LOCAL_TOOLS_ENABLED=true) ───────────────
# Operator-authored TS modules + blessed built-ins. When true, both the
# blessed integrations bundled with solrac (`src/integrations-builtin/`) and
-# any operator integrations under SOLRAC_INTEGRATIONS_DIR are loaded. Effective
-# for Claude tiers (`@`, `!`) and for Ollama when OLLAMA_TOOLS_ENABLED=true.
-# Recommended `true` to pair with the default Ollama tools-on deploy.
+# any operator integrations under SOLRAC_INTEGRATIONS_DIR are loaded.
SOLRAC_INTEGRATIONS_ENABLED=true
SOLRAC_INTEGRATIONS_DIR=./integrations

# ── Claude-only deploy alternative ──────────────────────────────────────────
-# Uncomment this block (and comment out the Ollama section above) for hosts
-# that can't run Ollama. No-prefix messages then route to Anthropic Sonnet.
+# Uncomment this block (and comment out the local-engine section above) for
+# hosts that can't run a local model. No-prefix messages then route to Sonnet.
# `@`/`!` prefixes still work as before.
# SOLRAC_DEFAULT_ENGINE=primary
-# OLLAMA_ENABLED=false
-# OLLAMA_TOOLS_ENABLED=false
+# LOCAL_ENABLED=false
+# LOCAL_TOOLS_ENABLED=false
# SOLRAC_INTEGRATIONS_ENABLED=true # still useful for Claude tiers
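The backend-aware `LOCAL_URL` fallback documented above is small enough to sketch. A minimal illustration; `defaultLocalUrl` and its signature are assumptions, not names from the codebase:

```typescript
type LocalBackend = "ollama" | "lmstudio";

// Illustrative helper: an explicit LOCAL_URL always wins; otherwise fall
// back to the backend's conventional port (Ollama 11434, LMStudio 1234).
function defaultLocalUrl(backend: LocalBackend, explicitUrl?: string): string {
  if (explicitUrl) return explicitUrl;
  return backend === "ollama"
    ? "http://localhost:11434"
    : "http://localhost:1234";
}
```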

# ── Operational ─────────────────────────────────────────────────────────────
23 changes: 23 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,28 @@
# Changelog

## Unreleased — local LLM backend abstraction: Ollama + LMStudio (BREAKING)

Replaces the Ollama-specific path with a generic `local` engine that supports multiple backends behind a unified driver interface (`src/local-driver.ts`). Hard cutover — every `OLLAMA_*` env var, `engine: ollama` / `tier: ollama` frontmatter value, and `/clear ollama` / `>` slash alias is rejected with a rename hint. The audit-row tag becomes three-segment `local:<backend>:<modelId>` and matches the `claude:<tier>:<modelId>` shape so cross-engine queries are symmetric. LMStudio joins Ollama as a first-class backend with its own SSE wire format, `parallel_tool_calls: false` Gemma-4 workaround, and tool-call argument-delta accumulation.

- **Env vars.** All `OLLAMA_*` → `LOCAL_*`. New `LOCAL_BACKEND` (required when `LOCAL_ENABLED=true`): `ollama` or `lmstudio`. `LOCAL_URL` default is backend-aware (Ollama → `:11434`, LMStudio → `:1234`). Boot fails loud on any legacy `OLLAMA_*` env var with the rename mapping, and on `SOLRAC_DEFAULT_ENGINE=ollama` with a hint pointing at `local` + `LOCAL_BACKEND=ollama`.
- **Audit `model` column format.** `ollama:<modelId>` → `local:<backend>:<modelId>`. Migration runs idempotent retag at boot (`UPDATE audit SET model = 'local:ollama:' || substr(model, 8) WHERE model LIKE 'ollama:%'`) BEFORE the column rename below, so a crash between steps still leaves audit queries (dual-pattern reads, see next bullet) working.
- **Dual-pattern reads for one release.** `outOfBandForEngine` and `hasLocalTurnsSince` match BOTH `local:%` and legacy `ollama:%`. Mitigates rollback / partial-migration risk. The legacy clause is removed in a follow-up release once the migration has propagated.
- **Sessions schema.** Column rename `ollama_cutoff_ms` → `local_cutoff_ms` via `ALTER TABLE ... RENAME COLUMN` (SQLite 3.25+). Idempotent: legacy column → rename, neither → add new.
- **Slash commands.** `/clear ollama` → `/clear local`. Aliases `o` and `>` dropped; `l` is the new short form. `/status` line "ollama turns (24h)" → "local turns (24h)". The "Cleared <b>ollama</b>" reply text becomes "Cleared <b>local</b>".
- **Operator-edited markdown.** `tasks/*.md` `engine: ollama` and `skills/*.md` `tier: ollama` are **hard-rejected at parse** with rename hints. Replace with `engine: local` / `tier: local` before redeploying. Same hard-reject for `SOLRAC_DEFAULT_ENGINE=ollama`.
- **Web UI pill label.** `defaultEngineLabel` returns `local (<backend>)` for the local engine (e.g. `local (ollama)`, `local (lmstudio)`) so the operator sees the backend at a glance.
- **LMStudio driver hardening.** Sends `parallel_tool_calls: false` (Gemma-4 lmstudio-bug-tracker #1756 workaround) and dedupes identical `(name, args)` tool calls within one assistant message. Accumulates `function.arguments` deltas across SSE chunks before emitting one parsed `tool_call` event. Captures `usage` chunk for `inputTokens`/`outputTokens` whether it arrives inline or on a trailing dedicated chunk.
- **LMStudio silent-substitution detection.** LMStudio's `POST /v1/chat/completions` returns HTTP 200 with the *loaded* model when the requested id isn't loaded, rather than 404'ing. Caught during the carlos/solrac-local-llm-backend smoke run: a fake-model request returned a normal completion instead of erroring. Driver now compares `chunk.model` (echoed by the OpenAI streaming protocol) against the requested model on the first chunk that carries it; mismatch throws `LocalDriverError("lmstudio", "model_missing", ...)` with the served-model id surfaced in the message + `lms load <requested>` hint. Closes the mid-session hole that `probe()` (boot-only) doesn't cover. New tests in `local-driver.test.ts`: substitution detected, exact-match passes through.
- **Test coverage.** New `local-driver.test.ts` covers NDJSON partial-line buffering, SSE multi-event-per-chunk and single-event-split, `[DONE]` terminator, optional trailing `usage` chunk, tool-call args split across deltas, dedup behavior, and 404/5xx/network/abort error paths for both backends. New `local-tools.test.ts` covers `mcpToLocalTools` converter, `stripThoughts`, and `runToolLoop` via a scripted fake driver. New `local.test.ts` covers the capability-note matrix, audit-tag invariant (verified for both `local:ollama:%` and `local:lmstudio:%`), driver-error rendering, and token capture.
- **Smoke.** `test/smokes/ollama.ts` → `test/smokes/local.ts`. `npm run smoke:ollama` → `npm run smoke:local`. Switches on `LOCAL_BACKEND` env (defaults to `ollama` for back-compat with the historical smoke target). Backend-aware pull/load hint check (`ollama pull` vs `lms load`).
- **Pre-deploy backup recommendation.** Document in operator deploy procedure: `cp data/solrac.db data/solrac.db.pre-local-migration` before service restart. Rollback SQL is commented in `src/db.ts` next to the migration.
- **No SDK pin bump.** No new runtime deps. No anti-goal reversals.
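The silent-substitution check described in the bullets above reduces to one comparison on the first model-bearing chunk. A sketch under stated assumptions: `StreamChunk`, `checkServedModel`, and the exact `LocalDriverError` constructor shape are illustrative, not the real driver API.

```typescript
// Minimal shape of an OpenAI-compatible streaming chunk: the server echoes
// the model id it is actually serving.
interface StreamChunk {
  model?: string;
}

class LocalDriverError extends Error {
  constructor(
    readonly backend: string,
    readonly code: string,
    message: string,
  ) {
    super(message);
  }
}

// Compare the echoed model id against the requested one; a mismatch means
// LMStudio silently substituted its loaded model for the requested id.
function checkServedModel(requested: string, chunk: StreamChunk): void {
  if (chunk.model && chunk.model !== requested) {
    throw new LocalDriverError(
      "lmstudio",
      "model_missing",
      `requested '${requested}' but the server answered with '${chunk.model}'; try: lms load ${requested}`,
    );
  }
}
```

Chunks that carry no `model` field pass through untouched, so the check runs cheaply on every chunk but only bites on the first one that echoes an id.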

Files renamed/added:
- `src/ollama.ts` → `src/local.ts`, `src/ollama-tools.ts` → `src/local-tools.ts`, new `src/local-driver.ts`.
- `src/ollama.test.ts` + `src/ollama-tools.test.ts` → `src/local.test.ts`, `src/local-tools.test.ts`, new `src/local-driver.test.ts`.
- `test/smokes/ollama.ts` → `test/smokes/local.ts`.
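As a sanity check on the retag migration quoted above, the per-row transform is equivalent to this pure function (hypothetical name; the shipped migration is the SQL `UPDATE` in `src/db.ts`):

```typescript
// Mirrors the SQL retag: 'ollama:<modelId>' becomes 'local:ollama:<modelId>'.
// slice(7) matches SQLite's 1-indexed substr(model, 8). Idempotent: rows
// already in the three-segment form, and claude:* rows, pass through untouched.
function retagAuditModel(model: string): string {
  return model.startsWith("ollama:")
    ? `local:ollama:${model.slice("ollama:".length)}`
    : model;
}
```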

## Unreleased — scheduler: switch to unix cron (BREAKING TASK.md format)

Replaces the three-form schedule grammar (`every <dur>` / `daily_at HH:MM` / `at <ISO8601>`) with 5-field unix cron + optional per-task `tz:` (default: `$TZ` env / host runtime tz). One grammar closes four real gaps in a single change: time-of-day windows, day-of-week filtering, local-timezone scheduling, and anchored cadence. Predicate: the live stretch trigger on 2026-05-15 ("every 30m between 12:00 and 18:00 weekdays Denver") required thirteen separate `daily_at` TASK.md files under the old grammar.
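Under the new grammar, the 2026-05-15 trigger quoted above fits in one entry. A sketch only: the `schedule:` key name is an assumption (the `tz:` key is from the entry above), and `*/30 12-17 * * 1-5` fires at :00 and :30 from 12:00 through 17:30, which is one reading of "between 12:00 and 18:00":

```yaml
# Hypothetical TASK.md frontmatter sketch
schedule: "*/30 12-17 * * 1-5"   # every 30m, 12:00-17:30, Mon-Fri
tz: America/Denver               # optional; defaults to $TZ / host runtime tz
```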
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -32,7 +32,7 @@ For changes that touch policy, cost cap, audit, or shutdown semantics, also run

```sh
npm run smoke:flood
-npm run smoke:ollama # only if you have Ollama running locally
+LOCAL_BACKEND=ollama npm run smoke:local # or LOCAL_BACKEND=lmstudio; only if the backend is running locally
```

## Style