feat: make GLM 5.2 the default Ollama Cloud model #175
Switch DEFAULT_OLLAMA_MODEL from kimi-k2.6:cloud to glm-5.2:cloud. Validated with a 16-task agent-sidebar benchmark (real test inbox, interleaved, warm, auto-draft disabled) against kimi-k2.7-code:cloud. GLM 5.2 was the faster model in both runs (faster on 10/16 tasks in the first run and 13/16 in the second), with final-answer quality at parity. Both models clearly beat the prior kimi-k2.6 default. Reasoning is returned in proper `thinking` blocks on Ollama's Anthropic-compat endpoint, so chain-of-thought does not leak into drafts.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Greptile Summary
This PR changes the default Ollama Cloud model from
Confidence Score: 5/5
Safe to merge: the change is a single constant string swap in a well-isolated location, and the eval fixes are correct improvements that prevent silent failures on retired model IDs. The core change is a one-line constant update that flows automatically to all callsites. The eval harness fixes are straightforward and correct: each service constructor is now given the live model ID rather than the retired hardcoded default. No production logic paths are altered beyond the new default model name.
No files require special attention. All changes are narrow, well-commented, and consistent with the existing pattern.
| Filename | Overview |
|---|---|
| src/shared/types.ts | Single-constant update: DEFAULT_OLLAMA_MODEL changed from kimi-k2.6:cloud to glm-5.2:cloud with an expanded rationale comment. Change is isolated and flows to all dependent callsites automatically. |
| tests/evals/runner.ts | Fixes a latent bug: EmailAnalyzer is now constructed with the resolved current-tier model ID instead of the retired claude-sonnet-4-20250514 default that causes 404s. |
| tests/evals/features/archive-ready-analyzer.ts | Same eval-harness fix as runner.ts applied to ArchiveReadyAnalyzer; model now resolved from DEFAULT_MODEL_CONFIG.archiveReady. |
| tests/evals/features/calendaring-agent.ts | Same eval-harness fix applied to CalendaringAgent; model resolved from DEFAULT_MODEL_CONFIG.calendaring. |
| tests/evals/features/draft-generator.ts | Same eval-harness fix applied to DraftGenerator; model resolved from DEFAULT_MODEL_CONFIG.drafts. |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["DEFAULT_OLLAMA_MODEL = glm-5.2:cloud"] --> B[OllamaCloudConfigSchema defaultModel]
A --> C[resolveAgentOllamaConfig]
A --> D[getFeatureModelConfig]
A --> E[Setup Wizard placeholder]
A --> F[Model-selector placeholder]
G["DEFAULT_MODEL_CONFIG + resolveModelId"] --> H[EmailAnalyzer eval]
G --> I[ArchiveReadyAnalyzer eval]
G --> J[CalendaringAgent eval]
G --> K[DraftGenerator eval]
C --> L[Agent path: /v1/messages thinking blocks]
D --> M[Non-agent path: /api/chat think=true]
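The two paths at the bottom of the flowchart return reasoning in different places, which is why chain-of-thought stays out of drafts on both. A minimal sketch, assuming simplified response shapes: on the Anthropic-compatible `/v1/messages` path the reply is a `content` array of typed blocks (`thinking` vs `text`), and on Ollama's native `/api/chat` with `think: true` the reasoning arrives in `message.thinking`, separate from `message.content`. The type names and both helper functions are hypothetical, not the app's code.

```typescript
// Hypothetical, simplified shapes for the two paths in the flowchart.
type ContentBlock =
  | { type: "thinking"; thinking: string }
  | { type: "text"; text: string };
type TextBlock = Extract<ContentBlock, { type: "text" }>;

// Agent path: Anthropic-compatible /v1/messages response.
interface MessagesResponse {
  content: ContentBlock[];
}

// Non-agent path: Ollama /api/chat response when the request sets think: true.
interface ChatResponse {
  message: { role: string; content: string; thinking?: string };
}

// Agent path: keep only text blocks, so thinking blocks never reach a draft.
function visibleTextFromMessages(res: MessagesResponse): string {
  return res.content
    .filter((b): b is TextBlock => b.type === "text")
    .map((b) => b.text)
    .join("");
}

// Non-agent path: parse JSON from message.content only; message.thinking is ignored.
function parseJsonFromChat<T>(res: ChatResponse): T {
  return JSON.parse(res.message.content) as T;
}

const agentReply = visibleTextFromMessages({
  content: [
    { type: "thinking", thinking: "The sender is asking for a meeting time..." },
    { type: "text", text: "Thanks, Tuesday at 3pm works." },
  ],
});

const analysis = parseJsonFromChat<{ needsReply: boolean }>({
  message: {
    role: "assistant",
    thinking: "Checking whether the email asks a question...",
    content: '{"needsReply":true}',
  },
});
```

Either way, the consumer reads a field that the model's reasoning is never written into, so switching models does not change what ends up in a draft or in parsed JSON.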
Reviews (2): Last reviewed commit: "fix(evals): eval harness used retired de..."
…20250514)

The eval runners constructed services with no model argument, so they fell back to the retired legacy constructor default `claude-sonnet-4-20250514`, which now returns 404. That broke `npm run eval` (and the pre-PR eval gate) for everyone. The app itself is unaffected because it passes the resolved live model.

Pass each eval service the app's actual default model via `resolveModelId(DEFAULT_MODEL_CONFIG[feature])`, so the eval tracks real default resolution and cannot rot when a model ID is retired:

- runner.ts (analyzer)
- features/draft-generator.ts
- features/calendaring-agent.ts
- features/archive-ready-analyzer.ts

Verified: analyzer eval 10/10 with no regressions; feature evals pass with +0.0 deltas vs. baseline.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
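The before/after of this eval fix can be sketched as follows. It is an illustration under assumptions, not the repo's code: the tier names, the `TIER_TO_MODEL_ID` table, the `analysis` feature key, and the placeholder model IDs are all hypothetical. Only `DEFAULT_MODEL_CONFIG`, `resolveModelId`, `EmailAnalyzer`, the `drafts` / `calendaring` / `archiveReady` keys, and the retired default ID come from the PR.

```typescript
// Hypothetical: each feature maps to a tier, and each tier to a live model ID.
type Tier = "fast" | "balanced";
type Feature = "analysis" | "drafts" | "calendaring" | "archiveReady";

const DEFAULT_MODEL_CONFIG: Record<Feature, Tier> = {
  analysis: "fast",
  drafts: "balanced",
  calendaring: "balanced",
  archiveReady: "fast",
};

// Placeholder IDs; in the app this mapping is updated when a model is retired.
const TIER_TO_MODEL_ID: Record<Tier, string> = {
  fast: "live-fast-model-id",
  balanced: "live-balanced-model-id",
};

function resolveModelId(tier: Tier): string {
  return TIER_TO_MODEL_ID[tier];
}

class EmailAnalyzer {
  readonly model: string;
  // Legacy constructor default: a retired ID that 404s if a caller omits the argument.
  constructor(model: string = "claude-sonnet-4-20250514") {
    this.model = model;
  }
}

// Before: no argument, so the eval silently uses the retired default.
const before = new EmailAnalyzer();

// After: the eval goes through the same default resolution as the app.
const after = new EmailAnalyzer(resolveModelId(DEFAULT_MODEL_CONFIG.analysis));
```

Because the eval now reads the same table the app does, retiring a model only requires updating that table; the constructor default is no longer on any live code path.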
✅ Pre-PR verification — PASS
Agentic verification: summary
Agentic verification: verify-diff
Summary
PR makes two categories of changes: (1) switches DEFAULT_OLLAMA_MODEL from kimi-k2.6:cloud to glm-5.2:cloud in types.ts; this only affects Ollama Cloud users and is invisible in Anthropic/demo mode; (2) updates eval test constructors to pass explicit model IDs instead of relying on no-arg defaults, a test infrastructure change with no UI impact. Verified: app loads cleanly, email detail view opens with draft content and sender profile panel, Settings panel with all tabs (General AI Models, Agents) opens correctly, back-navigation via Escape returns to inbox list with expected sticky selection state. No JS errors, no layout breaks, no stuck states observed.

Agentic verification: literal trace
[2026-06-16T21:19:54.129Z] text: Let me click on an email to test basic flow, then check Settings for the Ollama model default.
{"clicked":true,"threadId":"thread-bug-report"}
[2026-06-16T21:19:58.843Z] tool#8: mcp__chrome-devtools__take_screenshot
{"top":128,"left":0,"width":879,"height":32,"tag":"DIV"}
[2026-06-16T21:20:06.649Z] tool#10: ToolSearch
Latest page snapshot: uid=1_0 RootWebArea "Exo" url="http://localhost:5173/"
Latest page snapshot: uid=1_0 RootWebArea "Exo" url="http://localhost:5173/"
Latest page snapshot: uid=1_0 RootWebArea "Exo" url="http://localhost:5173/"
{"selectedRows":1,"senderName":"On-Call","emptySidebar":false}
[2026-06-16T21:21:26.506Z] tool#21: mcp__chrome-devtools__take_screenshot
Summary of what I verified:
{"verdict":"pass","summary":"PR makes two categories of changes: (1) switches DEFAULT_OLLAMA_MODEL from kimi-k2.6:cloud to glm-5.2:cloud in types.ts — only affects Ollama Cloud users, invisible in Anthropic/demo mode; (2) updates eval test constructors to pass explicit model IDs instead of relying on no-arg defaults — test infrastructure change with no UI impact. Verified: app loads cleanly, email detail view opens with draft content and sender profile panel, Settings panel with all tabs (General AI Models, Agents) opens correctly, back-navigation via Escape returns to inbox list with expected sticky selection state. No JS errors, no layout breaks, no stuck states observed.","anomalies":[],"actions_taken":13}
## Summary

Follow-up to #175 (make GLM 5.2 the default Ollama model). That PR changed `DEFAULT_OLLAMA_MODEL`, but the change is a **no-op for anyone who had already enabled Ollama**: the Settings UI and the agent both kept using `kimi-k2.6:cloud`.

## Root cause

`DEFAULT_OLLAMA_MODEL` is only the *fallback* when `ollamaCloud.defaultModel` is unset. But `SetupWizard` and `ExtensionsTab` **persist** `DEFAULT_OLLAMA_MODEL` into `ollamaCloud.defaultModel` whenever Ollama is set up or saved, and `resolveAgentOllamaConfig` resolves the agent model as `featureModels.agentDrafter ?? oc.defaultModel ?? DEFAULT_OLLAMA_MODEL`, so the persisted `kimi-k2.6:cloud` wins over the new constant. Existing installs therefore stay pinned to kimi-k2.6 (reported: the "Default Model" field still shows `kimi-k2.6:cloud` after #175).

## Fix

A `configVersion` 3 migration in `getConfig()` that flips a persisted `ollamaCloud.defaultModel` of **exactly** `"kimi-k2.6:cloud"` (the old default) to the new `DEFAULT_OLLAMA_MODEL` (`glm-5.2:cloud`). Any explicitly chosen model is left untouched. This mirrors the existing legacy-`model` migration pattern in the same function.

## Verification

Launched dev with a config persisted at `kimi-k2.6:cloud`. After launch, `settings.get()` returns `configVersion: 3` and `ollamaCloud.defaultModel: "glm-5.2:cloud"`, and the Settings "Default Model" field now shows GLM.
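The precedence problem from the root cause and the exact-match migration from the fix can be sketched together. This is a simplified illustration under assumed types: `AppConfig`, `resolveAgentModel`, `migrateToV3`, the prior `configVersion` of 2, and the `user-chosen-model:cloud` ID are hypothetical, and the real migration inside `getConfig()` may differ in detail. The `??` chain and the exact-match rule are taken from the description above.

```typescript
const DEFAULT_OLLAMA_MODEL = "glm-5.2:cloud";
const PREVIOUS_DEFAULT = "kimi-k2.6:cloud";

// Hypothetical, trimmed-down config shape.
interface AppConfig {
  configVersion: number;
  featureModels: { agentDrafter?: string };
  ollamaCloud?: { defaultModel?: string };
}

// Precedence from the PR: per-feature override, then persisted default, then the constant.
function resolveAgentModel(config: AppConfig): string {
  const oc = config.ollamaCloud;
  return config.featureModels.agentDrafter ?? oc?.defaultModel ?? DEFAULT_OLLAMA_MODEL;
}

// Version-3 migration: rewrite only a persisted value that exactly equals the old default.
function migrateToV3(config: AppConfig): AppConfig {
  if (config.configVersion >= 3) return config; // already migrated; never re-run
  const oc = config.ollamaCloud;
  const ollamaCloud =
    oc?.defaultModel === PREVIOUS_DEFAULT ? { ...oc, defaultModel: DEFAULT_OLLAMA_MODEL } : oc;
  return { ...config, ollamaCloud, configVersion: 3 };
}

// An install where setup persisted the old default (prior version assumed to be 2).
const pinned: AppConfig = {
  configVersion: 2,
  featureModels: {},
  ollamaCloud: { defaultModel: PREVIOUS_DEFAULT },
};

// An install where the user explicitly picked some other model (hypothetical ID).
const explicitChoice: AppConfig = {
  configVersion: 2,
  featureModels: {},
  ollamaCloud: { defaultModel: "user-chosen-model:cloud" },
};
```

Before migration, `resolveAgentModel(pinned)` still returns `kimi-k2.6:cloud`, because the persisted value wins over the constant. After `migrateToV3`, it returns `glm-5.2:cloud`, while `explicitChoice` keeps its model. Bumping the version means the rewrite happens once, so a user who later selects kimi-k2.6 on purpose is not overridden again.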
🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- PRE-PR-REPORT-START SHA=1e20bfb mode=full -->
**Pre-PR verdict**: PASS
- mode: `full`
- sha: `1e20bfb`
- generated: 2026-06-16T22:31:37.864Z

| Phase | Status | Duration |
|---|---|---|
| eval:analyzer | ✅ exit 0 | 13.4s |
| eval:features | ✅ exit 0 | 29.7s |
| agentic-verify | ✅ exit 0 | 78.3s |
| real-gmail:cached | ✅ exit 0 | 8.0s |
<!-- PRE-PR-REPORT-END -->

---------

Co-authored-by: Ankit Gupta <ankit@ycombinator.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Following the merge of #175 (GLM 5.2 as DEFAULT_OLLAMA_MODEL), update the curated dropdown list to match: GLM 5.2 is now the first entry and carries the "(default)" label, which moves off Kimi. glm-5.2:cloud was previously absent from the list, so the active default could not render as a dropdown option and the field would have been forced into "Custom…" mode; it is now included. MiniMax M3 / M2.7 remain in the picker. Dropped the now-superseded glm-4.6 entry. Verified live: the Default Model field selects "GLM 5.2 — z.ai (default)" and MiniMax remains selectable.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
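The dropdown behavior described in this commit (the active default must be a curated option, otherwise the field falls back to "Custom…" mode) can be sketched as below. The `ModelOption` shape, the option labels, and the `optionLabel` / `pickerState` helpers are hypothetical; the curated list here is trimmed to the two model IDs named in this PR and omits the MiniMax entries, whose exact IDs are not given.

```typescript
const DEFAULT_OLLAMA_MODEL = "glm-5.2:cloud";

interface ModelOption {
  id: string;
  label: string;
}

// Hypothetical, trimmed curated list: default first, then the other presets.
const CURATED_MODELS: ModelOption[] = [
  { id: "glm-5.2:cloud", label: "GLM 5.2" },
  { id: "kimi-k2.6:cloud", label: "Kimi K2.6" },
];

// The "(default)" suffix follows the constant instead of being hardcoded on one entry.
function optionLabel(opt: ModelOption): string {
  return opt.id === DEFAULT_OLLAMA_MODEL ? `${opt.label} (default)` : opt.label;
}

// A stored value that is not in the curated list can only be shown in "Custom…" mode.
function pickerState(value: string): { mode: "preset" | "custom"; label: string } {
  const match = CURATED_MODELS.find((o) => o.id === value);
  return match ? { mode: "preset", label: optionLabel(match) } : { mode: "custom", label: value };
}
```

Deriving the suffix from `DEFAULT_OLLAMA_MODEL` keeps the label from drifting the next time the default changes, though the list itself still has to contain the new default for it to render as a preset.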
Summary
Switches the default Ollama Cloud model (`DEFAULT_OLLAMA_MODEL`) from `kimi-k2.6:cloud` to `glm-5.2:cloud` (z.ai, 744B MoE, 1M context; launched June 2026, live on Ollama Cloud).

This is the single source of truth for the Ollama default: it flows to the agent worker (`resolveAgentOllamaConfig`), per-feature resolution (`getFeatureModelConfig`), the `OllamaCloudConfigSchema` default, the Setup Wizard, and the Extensions/Settings model-selector placeholders. No other code changes are required: the UI selectors are free-text inputs that reference the constant.

Why
Evaluated against the prior candidates with a 16-task agent-sidebar benchmark run through the real app (test inbox, real LLM calls), interleaved per-task (both models back-to-back, alternating order), warm, with auto-draft disabled so the single-threaded agent worker was dedicated.
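The interleaved, alternating-order protocol and a within-run speed comparison can be sketched as follows, assuming one latency measurement per model per task. `buildSchedule`, `countWins`, and the sample timings are hypothetical; they illustrate the protocol, not the benchmark harness or its results.

```typescript
type Model = string;

interface Run {
  task: number;
  model: Model;
}

// Interleave per task: both models run back-to-back, and the order flips on every task
// so neither model systematically benefits from running first or second.
function buildSchedule(taskCount: number, a: Model, b: Model): Run[] {
  const runs: Run[] = [];
  for (let task = 0; task < taskCount; task++) {
    const [first, second]: [Model, Model] = task % 2 === 0 ? [a, b] : [b, a];
    runs.push({ task, model: first }, { task, model: second });
  }
  return runs;
}

// Within one run, count the tasks on which model A finished faster than model B.
function countWins(latencyA: number[], latencyB: number[]): number {
  return latencyA.filter((t, i) => t < latencyB[i]).length;
}

const schedule = buildSchedule(16, "glm-5.2:cloud", "kimi-k2.7-code:cloud");

// Hypothetical per-task latencies in seconds for a 3-task illustration.
const wins = countWins([4.1, 6.0, 3.2], [5.0, 5.5, 3.9]);
```

Counting per-task wins inside a single run, rather than comparing absolute times across sessions, keeps shifts in cloud load from dominating the comparison.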
Speed (within-run; absolute times vary across sessions from cloud load, so only within-run comparisons are valid):
Quality: parity on final-turn answers (GLM slightly more thorough on extraction). Both produce natural, grounded drafts.
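Quality is compared on the final turn only. A minimal sketch of the difference between scoring all assistant text from an agent run and scoring just the final answer, assuming each turn's visible text is available as a string; `AgentTurn`, `concatenatedText`, and `finalAnswer` are hypothetical names, and the turn list is simplified (for example, tool results are modeled as their own role).

```typescript
interface AgentTurn {
  role: "assistant" | "tool";
  text: string;
}

// Naive scoring input: every assistant turn, including progress notes between tool calls.
function concatenatedText(turns: AgentTurn[]): string {
  return turns
    .filter((t) => t.role === "assistant")
    .map((t) => t.text)
    .join("\n");
}

// Final-turn scoring input: only the last assistant turn, i.e. the answer the user sees.
function finalAnswer(turns: AgentTurn[]): string {
  const assistant = turns.filter((t) => t.role === "assistant");
  return assistant.length > 0 ? assistant[assistant.length - 1].text : "";
}

const turns: AgentTurn[] = [
  { role: "assistant", text: "Checking the thread..." },
  { role: "tool", text: '{"messages": 3}' },
  { role: "assistant", text: "The invoice is due on March 3." },
];
```

With the naive input, a model that narrates its progress between tool calls looks more verbose even when its final answer is identical; scoring `finalAnswer(turns)` removes that bias.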
Integration correctness: verified via a direct compat-endpoint probe that both models emit reasoning in proper `thinking` blocks on Ollama's `/v1/messages`, so chain-of-thought is not leaking into drafts. (An initial benchmark wrongly penalized GLM because the harness concatenated assistant text across all agent turns, counting GLM's brief inter-turn progress lines as "verbosity"; the harness was fixed to score the final answer only.)

Changes
`src/shared/types.ts`: `DEFAULT_OLLAMA_MODEL = "glm-5.2:cloud"` plus an updated rationale comment.

Scope & caveats
`/api/chat` path with `think: true`, which keeps CoT out of parsed JSON the same way kimi-k2.6 did, but they were not separately benchmarked. A follow-up eval of those features on GLM 5.2 is worth doing.

🤖 Generated with Claude Code
Pre-PR verdict: PASS
mode: `full`, sha: `226806d`