feat: LLM model fallback on failure — fast failover to secondary provider/model#692
feat: LLM model fallback on failure — fast failover to secondary provider/model#692phuongvm wants to merge 14 commits into
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughAdds per-agent immediate LLM fallback and Nous provider support with OAuth auto-refresh; forces fallback on first retryable failure via a ContextVar; converts Langfuse traces to generation-type with token reporting; refactors summarizer telemetry; updates MCP Docker/Bun dev setup; expands tests and OpenSpec documentation. ChangesLLM Fallback, Nous Provider, and Observability
MCP Deployment and Configuration
OpenSpec Documentation and Planning
Sequence Diagram (high level) sequenceDiagram
participant Client
participant honcho_llm_call
participant Runtime
participant Backend
participant Langfuse
Client->>honcho_llm_call: request (track_name?)
honcho_llm_call->>Runtime: plan_attempt(force_fallback?)
Runtime->>Backend: select client_for_model_config & call
Backend->>honcho_llm_call: response / AuthenticationError / HTTP error
honcho_llm_call->>Runtime: classify retryable -> set force_fallback
honcho_llm_call->>Langfuse: update_current_generation(name/model, metadata, usage_details, is_fallback)
Estimated code review effort 🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related issues
Possibly related PRs
Suggested reviewers
✨ Finishing Touches🧪 Generate unit tests (beta)
|
|
hey @phuongvm thanks for the contribution. it looks like the commits/files changes are adding a lot of changes that are not in the scope of the PR title/description. could you refactor the PR to be focused on the |
There was a problem hiding this comment.
Actionable comments posted: 19
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
mcp/src/config.ts (1)
19-23:⚠️ Potential issue | 🟠 Major | ⚡ Quick winDocumentation contradicts implementation: baseUrl is now exposed via request header.
The comment at lines 19-23 states:
"It is intentionally not exposed as a request header: routing public requests to an internal URL would be a latency and security regression."
However, line 50 now reads
X-Honcho-Base-Urlfrom the request header as the first fallback source, which directly contradicts this stated security principle. If exposing the base URL via header is acceptable, the comment must be updated. If it's not acceptable, the header check should be removed.📝 Proposed documentation fix
Update the comment to reflect the new multi-source strategy:
/** * Parse configuration from request headers and Worker env bindings. * Throws on missing required fields so callers get clear errors. * - * The Honcho API URL is read from the `HONCHO_API_URL` env var when set, - * allowing operators to run this Worker alongside a self-hosted Honcho - * instance (see the "Self-Hosted Honcho" section in README.md). It is - * intentionally not exposed as a request header: routing public requests - * to an internal URL would be a latency and security regression. + * The Honcho API URL is resolved from multiple sources in priority order: + * 1. X-Honcho-Base-Url request header (for per-request override) + * 2. HONCHO_API_URL environment variable (Worker env or process.env) + * 3. Default https://api.honcho.dev + * + * Operators should validate that X-Honcho-Base-Url is only accepted from + * trusted sources to prevent routing public requests to internal endpoints. */Also applies to: 50-50
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@mcp/src/config.ts` around lines 19 - 23, The comment claiming the Honcho baseUrl is "intentionally not exposed as a request header" is out of sync with the code that reads the X-Honcho-Base-Url header first when resolving baseUrl; update the comment in mcp/src/config.ts to describe the current multi-source resolution order (X-Honcho-Base-Url header → HONCHO_API_URL env var → default), mention the header name `X-Honcho-Base-Url` and the `HONCHO_API_URL` env var, and note any security considerations for accepting a header-provided `baseUrl` (or alternatively remove the header-based fallback if you prefer to enforce the original policy) so the comment matches the behavior of the `baseUrl` resolution code.
🧹 Nitpick comments (6)
openspec/specs/observability-langfuse/spec.md (1)
4-4: ⚡ Quick winAddress the TBD placeholder in the Purpose section.
The "TBD:" prefix suggests this section is incomplete. Since this is the main observability-langfuse spec (not archived), the purpose statement should be finalized.
📝 Suggested completion
## Purpose -TBD: Core capabilities for integrating Langfuse LLM observability and tracing telemetry within Honcho. +Defines core capabilities for integrating Langfuse LLM observability and tracing telemetry within Honcho, including generation-type observations, model attribution, and metadata tracking.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@openspec/specs/observability-langfuse/spec.md` at line 4, Replace the placeholder "TBD: Core capabilities for integrating Langfuse LLM observability and tracing telemetry within Honcho." in the Purpose section with a finalized, declarative purpose sentence that summarizes the spec (e.g., "Define core capabilities and integration patterns for Langfuse LLM observability and tracing telemetry within Honcho, including required events, tracing spans, configuration, and storage/retention expectations."). Edit the Purpose header content so it no longer begins with "TBD" and clearly states the goal, scope, and intended audience for the observability-langfuse spec.openspec/workspace/memories/leader/lesson_learnt/L-001-langfuse-custom-model-tokens.md (1)
7-26: ⚡ Quick winFix markdown formatting for consistency and readability.
The document has several markdown formatting issues that affect rendering and readability:
- Headings at lines 7, 10, 13 need blank lines below them
- Code fence at lines 17-25 needs blank lines before and after
📝 Proposed formatting fixes
## The Issue + When integrating Langfuse for LLM traces, models like `gpt-4o-mini` or `claude-3-haiku` automatically displayed token usage in the Langfuse dashboard, while custom models (e.g., `qwen/qwen3.5-9b` hosted via LMStudio) showed empty token fields despite successful completions. ## Root Cause + Langfuse features an automatic server-side tokenizer that calculates token usage if it recognizes the `model` identifier. If the model is not recognized (as is the case with custom or local models), Langfuse silently fails to compute tokens unless explicit `usage_details` are pushed by the application SDK. ## The Fix / Lesson + To ensure consistent token logging across **all** models (including custom and local deployments), the SDK must explicitly capture `input_tokens` and `output_tokens` from the underlying client response and report them via the Langfuse generation active observer. Example Implementation: + ```python if usage_data: langfuse.get_client().update_current_generation( usage_details={ "input": input_tokens, "output": output_tokens } )
Always explicitly pass usage to Langfuse to guarantee observability parity regardless of backend.
</details> <details> <summary>🤖 Prompt for AI Agents</summary>Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.In
@openspec/workspace/memories/leader/lesson_learnt/L-001-langfuse-custom-model-tokens.md
around lines 7 - 26, The markdown has rendering issues: add a blank line after
each top-level heading ("The Issue", "Root Cause", "The Fix / Lesson") and
ensure the example code block is a proper fenced block with a language tag and
blank lines before and after it; specifically fix the example so the fenced
block contains the python snippet that calls
langfuse.get_client().update_current_generation(... usage_details={ "input":
input_tokens, "output": output_tokens }) and make sure the closing triple
backticks are present and followed by a blank line before the final sentence
"Always explicitly pass usage to Langfuse..." so headings, fences, and spacing
render correctly.</details> </blockquote></details> <details> <summary>mcp/src/config.ts (1)</summary><blockquote> `50-50`: _💤 Low value_ **Unsafe type cast accessing process.env.** The cast `(globalThis as any).process?.env?.HONCHO_API_URL` bypasses type safety. While the optional chaining prevents runtime errors, TypeScript won't catch if the access pattern changes. <details> <summary>♻️ Proposed type-safe alternative</summary> ```diff - baseUrl: request.headers.get("X-Honcho-Base-Url")?.trim() || env?.HONCHO_API_URL || (globalThis as any).process?.env?.HONCHO_API_URL || "https://api.honcho.dev", + baseUrl: + request.headers.get("X-Honcho-Base-Url")?.trim() || + env?.HONCHO_API_URL || + (typeof process !== 'undefined' ? process.env?.HONCHO_API_URL : undefined) || + "https://api.honcho.dev",This uses a typeof guard instead of casting globalThis.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@mcp/src/config.ts` at line 50, The baseUrl property in config.ts uses an unsafe cast "(globalThis as any).process?.env?.HONCHO_API_URL"; replace that cast with a type-safe runtime guard that checks for existence and type (e.g. typeof (globalThis as any).process === 'object' and typeof (globalThis as any).process.env === 'object') before reading HONCHO_API_URL, and fall back to the existing env?.HONCHO_API_URL or default; update the baseUrl expression to use that checked access instead of the direct cast.mcp/run-mocked.ts (1)
22-28: ⚡ Quick winConsider preserving type safety for the fetch handler.
The
as anycast on line 25 bypasses type checking for the fetch handler signature. While expedient, this could mask incompatibilities between the imported module's fetch signature and Bun.serve's expectations.♻️ Proposed type-safe alternative
import("./src/index.ts").then((module) => { const port = process.env.HONCHO_MCP_PORT ? Number(process.env.HONCHO_MCP_PORT) : 8787; + + // Explicitly type the fetch handler + const fetchHandler: (request: Request) => Response | Promise<Response> = + (req) => module.default.fetch(req, {} as any, {} as any); + Bun.serve({ - fetch: module.default.fetch as any, + fetch: fetchHandler, port: port, });Note: The Cloudflare Worker fetch signature includes
envandexecutionCtxparameters that aren't needed in the Bun mock environment, so we pass empty objects for those.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@mcp/run-mocked.ts` around lines 22 - 28, The current cast "module.default.fetch as any" hides type errors; instead create a type-safe wrapper that matches Bun.serve's expected fetch signature and calls module.default.fetch with dummy env/executionCtx parameters: implement a small function (e.g., const bunFetch = (req: Request) => module.default.fetch(req, {}, {})) and pass bunFetch to Bun.serve (replace module.default.fetch as any). This preserves type safety for Bun.serve while accommodating Cloudflare-style fetch handlers that expect (request, env, executionCtx).src/llm/nous_refresh.py (1)
17-17: 💤 Low valueUnused import.
asynciois imported but never referenced — onlyhttpx.AsyncClientis used for async I/O.🛠️ Remove unused import
-import asyncio import json import logging import os🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/llm/nous_refresh.py` at line 17, Remove the unused asyncio import from the top of src/llm/nous_refresh.py: delete the line importing asyncio since only httpx.AsyncClient is used for async I/O (there are no asyncio.* references in this module), and run the linter/pytest to confirm no breakages.tests/llm/test_fallback_integration.py (1)
169-198: 💤 Low valueConsider making the exception assertion more specific.
The test currently catches the base
Exceptionclass. While this works for verifying that an error is raised, being more specific would improve the test's precision.📋 Optional improvement to exception assertion
You could assert that the raised exception has a
status_codeattribute or is an instance of a specific error type:with pytest.raises(Exception): await honcho_llm_call(Could be:
- with pytest.raises(Exception): + with pytest.raises(Exception) as exc_info: await honcho_llm_call( model_config=config, prompt="test prompt", max_tokens=100, enable_retry=True, retry_attempts=2, ) + # Verify it's a retryable error + assert hasattr(exc_info.value, 'status_code')However, the current implementation is acceptable for an integration test focused on error propagation.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/llm/test_fallback_integration.py` around lines 169 - 198, The test test_both_primary_and_fallback_fail is too broad by using pytest.raises(Exception); update it to assert a more specific failure from honcho_llm_call: either replace pytest.raises(Exception) with a concrete error type your code raises (e.g., BackendError/HTTPError) or capture the exception via "with pytest.raises(Exception) as excinfo:" and then assert on excinfo.value.status_code (or another specific attribute) to ensure the raised error comes from the FailingBackend/fallback path; reference honcho_llm_call and FailingBackend when making the assertion.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@CLAUDE.md`:
- Line 321: Update the misleading example in the line that currently reads
"Non-retryable errors (400, 200, ValueError, etc.)..." by removing the 200
status and replacing it with appropriate non-success examples (e.g., 4xx codes
such as 400 or 404) or rewording to "Non-retryable errors (e.g., 400, 404,
ValueError)" so the phrase "Non-retryable errors" no longer lists a 200 success
code; locate the exact sentence "Non-retryable errors (400, 200, ValueError,
etc.)..." and change it accordingly.
In `@mcp/Dockerfile`:
- Around line 1-24: The image currently runs as root; create a non-root user and
switch to it before the final CMD to satisfy least-privilege checks. Add steps
in the Dockerfile after copying files and before CMD to create a user/group
(e.g., appuser), chown the /app directory to that user, set HOME and USER
environment variables as needed, and add a USER appuser line so the process (the
CMD invoking npx wrangler dev) runs as the non-root account instead of root;
ensure the created user exists in both the builder and runtime stages or that
ownership is adjusted in the runtime stage where CMD is defined.
- Line 24: The Dockerfile currently uses CMD with "npx wrangler dev ..." which
is a dev-only pattern; replace that CMD to invoke the Bun-native mock server
provided in this PR (run-mocked.ts) using the Bun runtime and pass
HONCHO_API_URL via the container environment instead of wrapping it into a --var
flag; locate the CMD line that mentions "wrangler dev" and change it to run the
Bun runtime for run-mocked.ts (or alternately switch to a production path such
as using "wrangler deploy" or a compiled bundle) so the image runs the
production-ready server instead of wrangler dev.
In
`@openspec/changes/archive/2026-05-05-honcho-langfuse-generation-traces/explorations/2026-05-05-langfuse-generation-observations-openspec-gap.md`:
- Line 126: Fix the typo in the "Next command" sentence: change the word
"approplies" to "applies" in the line that reads "**Next command**:
**`/opsx-propose`** (or workspace equivalent) when Commander approplies change
name and scope boundaries." so it reads "...when Commander applies change name
and scope boundaries." and ensure the corrected sentence preserves the bolding
and command formatting for "/opsx-propose".
In
`@openspec/changes/archive/2026-05-05-honcho-langfuse-generation-traces/tasks.md`:
- Around line 18-22: Update the verification command example so Python-invoked
tooling is run via the project's virtual env: replace the line containing
"openspec validate --changes --no-interactive --json" with "uv run openspec
validate --changes --no-interactive --json" in the snippet in tasks.md; leave
non-Python tools like the rg commands unchanged so contributors consistently use
uv for Python commands.
In `@openspec/changes/archive/2026-05-15-llm-model-fallback-2026-05-14/design.md`:
- Around line 49-53: The doc lists incorrect env var prefixes for Dialectic and
Dreamer fallbacks—operators will set ignored keys like
DIALECTIC_MODEL_CONFIG__... and DREAM_MODEL_CONFIG__...; update the examples to
the real keys: for Dialectic use per-level prefixes
DIALECTIC_LEVELS__<level>__MODEL_CONFIG__FALLBACK__FALLBACK__TRANSPORT and
DIALECTIC_LEVELS__<level>__MODEL_CONFIG__FALLBACK__FALLBACK__MODEL (replace
<level> with the level name/number), and for Dreamer split the examples into
DREAM_DEDUCTION_MODEL_CONFIG__FALLBACK__FALLBACK__TRANSPORT /
DREAM_DEDUCTION_MODEL_CONFIG__FALLBACK__FALLBACK__MODEL and
DREAM_INDUCTION_MODEL_CONFIG__FALLBACK__FALLBACK__TRANSPORT /
DREAM_INDUCTION_MODEL_CONFIG__FALLBACK__FALLBACK__MODEL so the documented
FALLBACK__TRANSPORT and FALLBACK__MODEL keys map to the actual config keys.
In
`@openspec/changes/archive/2026-05-15-llm-model-fallback-2026-05-14/proposal.md`:
- Around line 3-8: The proposal incorrectly prescribes global
FALLBACK_MODEL_CONFIG__TRANSPORT and FALLBACK_MODEL_CONFIG__MODEL variables
while the codebase uses per-agent (and per-level for Dialectic) fallback
settings; update the proposal text to reference and instruct configuring the
per-agent fallback keys (e.g., Deriver fallback, Dialectic per-level fallback,
Dreamer fallback, Summary fallback) instead of global FALLBACK_* vars and
include guidance on configuring each agent's specific config keys and rollout
steps to match the actual settings surface used by ModelConfig.fallback.
In
`@openspec/changes/archive/2026-05-15-llm-model-fallback-2026-05-14/specs/llm-model-fallback/spec.md`:
- Around line 11-12: Update the env var names in the spec to match the actual
configuration shape: replace uses of DIALECTIC_MODEL_CONFIG__FALLBACK__TRANSPORT
and DIALECTIC_MODEL_CONFIG__FALLBACK__MODEL with the per-level Dialectic
fallback keys (e.g., DIALECTIC__<LEVEL>__FALLBACK__TRANSPORT and
DIALECTIC__<LEVEL>__FALLBACK__MODEL) and replace DREAM_MODEL_CONFIG__... with
the Dream deduction/induction keys (e.g., DREAM__DEDUCTION__MODEL /
DREAM__INDUCTION__MODEL and corresponding TRANSPORT keys); apply the same naming
corrections to the other occurrences referenced (lines 15-16) so the spec
matches the executable config shape.
In `@openspec/changes/archive/2026-05-15-llm-model-fallback-2026-05-14/tasks.md`:
- Around line 12-13: The checklist items reference outdated filenames
(.env.example and docker-compose/honcho/.env); update the task lines in tasks.md
to the current file paths used by this change (replace `.env.example` with the
actual env file name used by the repo and replace `docker-compose/honcho/.env`
with the correct docker/ honcho env path), and mirror those same updates for the
corresponding entries mentioned on lines 18-20 so all checklist items point to
the real files for accurate traceability.
In `@openspec/specs/llm-model-fallback/spec.md`:
- Line 4: Replace the placeholder Purpose text "TBD - created by archiving
change llm-model-fallback-2026-05-14" in the spec's Purpose section with a
concise, descriptive statement of intent: describe the goal of the
llm-model-fallback spec, the problem it addresses, intended users/consumers, and
the expected behavior or outcomes (e.g., how fallback selection should work,
criteria for model choice, and any constraints). Ensure the Purpose is a few
clear sentences that make the spec discoverable and owned rather than leaving
the archived placeholder.
- Around line 14-15: The spec currently references non-existent top-level env
keys (DIALECTIC_MODEL_CONFIG__FALLBACK__TRANSPORT and
DIALECTIC_MODEL_CONFIG__FALLBACK__MODEL); update the normative scenarios to use
the implemented per-level Dialectic fallback keys (e.g.,
DIALECTIC_MODEL_CONFIG__<LEVEL>__FALLBACK__TRANSPORT and
DIALECTIC_MODEL_CONFIG__<LEVEL>__FALLBACK__MODEL) and adjust Dream agent cases
to use separate deduction/induction fallback keys (e.g.,
DREAM_MODEL_CONFIG__DEDUCTION__FALLBACK__* and
DREAM_MODEL_CONFIG__INDUCTION__FALLBACK__*) so the spec matches the actual
configuration shape and avoids invalid operator configuration.
In `@openspec/specs/mcp-qa-verification/spec.md`:
- Line 4: Replace the placeholder "TBD: Verification protocols and requirements
for testing Honcho MCP tools." in the Purpose section with a concise purpose
statement that summarizes the spec's intent (e.g., describe that this document
defines verification protocols, E2E tool verification criteria, data isolation
requirements, and acceptance criteria for Honcho MCP tools), ensuring the new
text appears under the Purpose header and clearly frames the requirements that
follow.
In `@src/llm/backends/openai.py`:
- Around line 117-141: The _call_with_autorefresh helper is implemented but
never used while its logic is duplicated three times; replace the duplicated
401-retry blocks in the parse, create, and stream code paths with calls to
_call_with_autorefresh(use_parse=..., params=...) so all retries go through that
single function, and inside _call_with_autorefresh change the silent "except
Exception: pass" around settings.LLM.NOUS_API_KEY = new_key to catch the
exception and log it (e.g., logger.exception or logger.error with context)
before continuing; ensure the helper still re-attempts the request after setting
self._client.api_key and returns the retried result or re-raises the original
AuthenticationError if refresh fails.
- Around line 8-11: The code imports AsyncOpenAI and AuthenticationError but
misses the exceptions used in the structured-output parsing paths; add imports
for BadRequestError and LengthFinishReasonError from the openai package so the
exception handlers that reference LengthFinishReasonError and BadRequestError
(used around the structured-output parsing logic that follows AsyncOpenAI usage)
will work correctly instead of raising NameError.
In `@src/llm/nous_refresh.py`:
- Around line 130-148: The function update_env_key currently returns early when
env_path doesn't exist, which breaks tests and first-run behavior; change
update_env_key to create the .env file instead of skipping: if not
env_path.exists() then create parent directories as needed, initialize lines to
an empty list (or include a leading newline if you want separation), and proceed
to append/write "LLM_NOUS_API_KEY={new_key}\n" (preserving the existing logic
for updating when the key exists); ensure you still call
env_path.write_text("".join(lines)) and logger.info("Updated .env with new Nous
API key") so tests and runtime see the file created.
- Around line 226-234: The code silently fails importing settings because it
uses `from honcho.config import settings`; replace that import with `from
src.config import settings` in src/llm/nous_refresh.py so the in-memory update
to settings.LLM.NOUS_API_KEY actually runs, and change the bare except to catch
ImportError (or log the exception) so failures aren't silently swallowed; keep
the logger.info("In-memory settings.LLM.NOUS_API_KEY updated") on success and
log a debug or warning with the exception on failure.
In `@tests/llm/test_nous_refresh.py`:
- Around line 22-25: The test test_load_state_missing is flaky because
load_state() reads NOUS_REFRESH_TOKEN from the environment; make the test
hermetic by deleting that env var before calling load_state: in
test_load_state_missing use the pytest monkeypatch fixture to call
monkeypatch.delenv("NOUS_REFRESH_TOKEN", raising=False) (keep the existing
patch.object(Path, "exists", return_value=False)) so the function cannot see an
external NOUS_REFRESH_TOKEN and the assertion load_state() == {} becomes
reliable.
- Around line 78-88: The test test_find_project_root_with_dotenv is dead code
with no assertions; replace the patch/pass block with a real exercise of
_find_project_root: change the working directory into the deep directory (use
monkeypatch.chdir(deep) or equivalent), call
src.llm.nous_refresh._find_project_root() and assert the returned Path equals
tmp_path (the directory containing the created .env), or if you prefer remove
the test entirely; update references to the test name and _find_project_root to
ensure the test either asserts correct behavior or is deleted to avoid
misleading coverage.
- Around line 155-158: Move the three delayed imports into the module import
block at the top of the test file: relocate "from types import SimpleNamespace",
"from unittest.mock import Mock", and "import sys" next to the other imports so
they're available before tests reference SimpleNamespace, Mock, and sys (used in
tests around usages of SimpleNamespace and Mock); remove the duplicate or
trailing imports at L156-L158 to keep import order clear and consistent.
---
Outside diff comments:
In `@mcp/src/config.ts`:
- Around line 19-23: The comment claiming the Honcho baseUrl is "intentionally
not exposed as a request header" is out of sync with the code that reads the
X-Honcho-Base-Url header first when resolving baseUrl; update the comment in
mcp/src/config.ts to describe the current multi-source resolution order
(X-Honcho-Base-Url header → HONCHO_API_URL env var → default), mention the
header name `X-Honcho-Base-Url` and the `HONCHO_API_URL` env var, and note any
security considerations for accepting a header-provided `baseUrl` (or
alternatively remove the header-based fallback if you prefer to enforce the
original policy) so the comment matches the behavior of the `baseUrl` resolution
code.
---
Nitpick comments:
In `@mcp/run-mocked.ts`:
- Around line 22-28: The current cast "module.default.fetch as any" hides type
errors; instead create a type-safe wrapper that matches Bun.serve's expected
fetch signature and calls module.default.fetch with dummy env/executionCtx
parameters: implement a small function (e.g., const bunFetch = (req: Request) =>
module.default.fetch(req, {}, {})) and pass bunFetch to Bun.serve (replace
module.default.fetch as any). This preserves type safety for Bun.serve while
accommodating Cloudflare-style fetch handlers that expect (request, env,
executionCtx).
In `@mcp/src/config.ts`:
- Line 50: The baseUrl property in config.ts uses an unsafe cast "(globalThis as
any).process?.env?.HONCHO_API_URL"; replace that cast with a type-safe runtime
guard that checks for existence and type (e.g. typeof (globalThis as
any).process === 'object' and typeof (globalThis as any).process.env ===
'object') before reading HONCHO_API_URL, and fall back to the existing
env?.HONCHO_API_URL or default; update the baseUrl expression to use that
checked access instead of the direct cast.
In `@openspec/specs/observability-langfuse/spec.md`:
- Line 4: Replace the placeholder "TBD: Core capabilities for integrating
Langfuse LLM observability and tracing telemetry within Honcho." in the Purpose
section with a finalized, declarative purpose sentence that summarizes the spec
(e.g., "Define core capabilities and integration patterns for Langfuse LLM
observability and tracing telemetry within Honcho, including required events,
tracing spans, configuration, and storage/retention expectations."). Edit the
Purpose header content so it no longer begins with "TBD" and clearly states the
goal, scope, and intended audience for the observability-langfuse spec.
In
`@openspec/workspace/memories/leader/lesson_learnt/L-001-langfuse-custom-model-tokens.md`:
- Around line 7-26: The markdown has rendering issues: add a blank line after
each top-level heading ("The Issue", "Root Cause", "The Fix / Lesson") and
ensure the example code block is a proper fenced block with a language tag and
blank lines before and after it; specifically fix the example so the fenced
block contains the python snippet that calls
langfuse.get_client().update_current_generation(... usage_details={ "input":
input_tokens, "output": output_tokens }) and make sure the closing triple
backticks are present and followed by a blank line before the final sentence
"Always explicitly pass usage to Langfuse..." so headings, fences, and spacing
render correctly.
In `@src/llm/nous_refresh.py`:
- Line 17: Remove the unused asyncio import from the top of
src/llm/nous_refresh.py: delete the line importing asyncio since only
httpx.AsyncClient is used for async I/O (there are no asyncio.* references in
this module), and run the linter/pytest to confirm no breakages.
In `@tests/llm/test_fallback_integration.py`:
- Around line 169-198: The test test_both_primary_and_fallback_fail is too broad
by using pytest.raises(Exception); update it to assert a more specific failure
from honcho_llm_call: either replace pytest.raises(Exception) with a concrete
error type your code raises (e.g., BackendError/HTTPError) or capture the
exception via "with pytest.raises(Exception) as excinfo:" and then assert on
excinfo.value.status_code (or another specific attribute) to ensure the raised
error comes from the FailingBackend/fallback path; reference honcho_llm_call and
FailingBackend when making the assertion.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: dabf06a1-1d82-4339-92bf-950b96274ba4
⛔ Files ignored due to path filters (1)
mcp/bun.lockis excluded by!**/*.lock
📒 Files selected for processing (57)
.dockerignore.env.templateCLAUDE.mdmcp/.dockerignoremcp/Dockerfilemcp/package.jsonmcp/run-mocked.tsmcp/src/config.tsmcp/src/index.tsopenspec/changes/archive/2026-05-05-fix-summarizer-telemetry-spans/.openspec.yamlopenspec/changes/archive/2026-05-05-fix-summarizer-telemetry-spans/design.mdopenspec/changes/archive/2026-05-05-fix-summarizer-telemetry-spans/explorations/2026-05-05-nested-langfuse-spans.mdopenspec/changes/archive/2026-05-05-fix-summarizer-telemetry-spans/proposal.mdopenspec/changes/archive/2026-05-05-fix-summarizer-telemetry-spans/specs/.gitkeepopenspec/changes/archive/2026-05-05-fix-summarizer-telemetry-spans/specs/observability-langfuse/spec.mdopenspec/changes/archive/2026-05-05-fix-summarizer-telemetry-spans/tasks.mdopenspec/changes/archive/2026-05-05-honcho-langfuse-generation-traces/.openspec.yamlopenspec/changes/archive/2026-05-05-honcho-langfuse-generation-traces/design.mdopenspec/changes/archive/2026-05-05-honcho-langfuse-generation-traces/explorations/2026-05-05-langfuse-generation-observations-openspec-gap.mdopenspec/changes/archive/2026-05-05-honcho-langfuse-generation-traces/proposal.mdopenspec/changes/archive/2026-05-05-honcho-langfuse-generation-traces/specs/observability-langfuse/spec.mdopenspec/changes/archive/2026-05-05-honcho-langfuse-generation-traces/tasks.mdopenspec/changes/archive/2026-05-07-verify-honcho-mcp-tools/.openspec.yamlopenspec/changes/archive/2026-05-07-verify-honcho-mcp-tools/design.mdopenspec/changes/archive/2026-05-07-verify-honcho-mcp-tools/explorations/2026-05-07-honcho-mcp-architecture.mdopenspec/changes/archive/2026-05-07-verify-honcho-mcp-tools/explorations/2026-05-07-honcho-philosophy-and-usecases.mdopenspec/changes/archive/2026-05-07-verify-honcho-mcp-tools/explorations/mcp-verification-report.mdopenspec/changes/archive/2026-05-07-verify-honcho-mcp-tools/proposal.mdopenspec/changes/archive/2026-05-07-verify-honcho-mcp-tools/specs/mcp-qa-verification/spec.mdopenspec/changes/archive/2026-05-07-verify-honcho-mcp-tools/tasks.mdopenspec/changes/archive/2026-05-15-llm-model-fallback-2026-05-14/design.mdopenspec/changes/archive/2026-05-15-llm-model-fallback-2026-05-14/proposal.mdopenspec/changes/archive/2026-05-15-llm-model-fallback-2026-05-14/specs/llm-model-fallback/spec.mdopenspec/changes/archive/2026-05-15-llm-model-fallback-2026-05-14/tasks.mdopenspec/specs/llm-model-fallback/spec.mdopenspec/specs/mcp-qa-verification/spec.mdopenspec/specs/observability-langfuse/spec.mdopenspec/workspace/journal/2026-05-05/agent_share_afternoon.mdopenspec/workspace/memories/leader/2026-05-05/archive_01.mdopenspec/workspace/memories/leader/2026-05-05/current.mdopenspec/workspace/memories/leader/lesson_learnt/L-001-langfuse-custom-model-tokens.mdopenspec/workspace/sessions/agent_share.mdsrc/config.pysrc/llm/api.pysrc/llm/backends/openai.pysrc/llm/credentials.pysrc/llm/nous_refresh.pysrc/llm/registry.pysrc/llm/runtime.pysrc/llm/tool_loop.pysrc/telemetry/logging.pysrc/utils/summarizer.pytests/llm/test_backends/test_nous_autorefresh.pytests/llm/test_fallback.pytests/llm/test_fallback_integration.pytests/llm/test_nous_refresh.pytests/utils/test_clients.py
| - `OSError` | ||
| - SDK-specific: `APIConnectionError`, `APITimeoutError`, `InternalServerError`, `ServiceUnavailableError`, `RateLimitError` | ||
|
|
||
| Non-retryable errors (400, 200, ValueError, etc.) do NOT trigger fallback — they follow normal retry behavior. |
There was a problem hiding this comment.
Clarify success vs error semantics in non-retryable list.
Including 200 in the non-retryable error examples is misleading because a 200 response is successful and should not enter retry/error handling.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@CLAUDE.md` at line 321, Update the misleading example in the line that
currently reads "Non-retryable errors (400, 200, ValueError, etc.)..." by
removing the 200 status and replacing it with appropriate non-success examples
(e.g., 4xx codes such as 400 or 404) or rewording to "Non-retryable errors
(e.g., 400, 404, ValueError)" so the phrase "Non-retryable errors" no longer
lists a 200 success code; locate the exact sentence "Non-retryable errors (400,
200, ValueError, etc.)..." and change it accordingly.
| # --- Stage 1: Lightning Fast Dependencies --- | ||
| FROM oven/bun:1 AS builder | ||
| WORKDIR /app | ||
|
|
||
| # Copy configuration and use Bun's superior async network stack to bypass WSL MTU hangs | ||
| COPY package.json ./ | ||
| RUN bun install | ||
|
|
||
| # Copy source code | ||
| COPY . . | ||
|
|
||
| # --- Stage 2: Node Native Runtime --- | ||
| FROM node:22 | ||
| WORKDIR /app | ||
|
|
||
| # Ingest all the raw files (including node_modules) built gracefully by Bun | ||
| COPY --from=builder /app /app | ||
|
|
||
| # Setup environment | ||
| ENV CI=true | ||
|
|
||
| # Revert boot command to native NPM/Node context to bypass Wrangler's anti-Bun blocker | ||
| # Crucially, we wrap it in a shell to inject the Docker container's environment variable into Wrangler's isolated sandbox | ||
| CMD ["sh", "-c", "npx wrangler dev src/index.ts --port 8787 --ip 0.0.0.0 --var HONCHO_API_URL:$HONCHO_API_URL"] |
There was a problem hiding this comment.
Add a non-root user for security.
The container runs as root, which violates the principle of least privilege and increases attack surface. Trivy static analysis flagged this as DS-0002.
🔒 Proposed fix
Add these lines before the final CMD:
# Setup environment
ENV CI=true
+# Create non-root user
+RUN groupadd -r mcpuser && useradd -r -g mcpuser mcpuser && \
+ chown -R mcpuser:mcpuser /app
+USER mcpuser
+
# Revert boot command to native NPM/Node context to bypass Wrangler's anti-Bun blocker🧰 Tools
🪛 Trivy (0.69.3)
[error] 1-1: Image user should not be 'root'
Specify at least 1 USER command in Dockerfile with non-root user as argument
Rule: DS-0002
(IaC/Dockerfile)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@mcp/Dockerfile` around lines 1 - 24, The image currently runs as root; create
a non-root user and switch to it before the final CMD to satisfy least-privilege
checks. Add steps in the Dockerfile after copying files and before CMD to create
a user/group (e.g., appuser), chown the /app directory to that user, set HOME
and USER environment variables as needed, and add a USER appuser line so the
process (the CMD invoking npx wrangler dev) runs as the non-root account instead
of root; ensure the created user exists in both the builder and runtime stages
or that ownership is adjusted in the runtime stage where CMD is defined.
|
|
||
| # Revert boot command to native NPM/Node context to bypass Wrangler's anti-Bun blocker | ||
| # Crucially, we wrap it in a shell to inject the Docker container's environment variable into Wrangler's isolated sandbox | ||
| CMD ["sh", "-c", "npx wrangler dev src/index.ts --port 8787 --ip 0.0.0.0 --var HONCHO_API_URL:$HONCHO_API_URL"] |
There was a problem hiding this comment.
Using wrangler dev in production is a critical deployment anti-pattern.
The CMD runs wrangler dev, which is explicitly designed for local development with hot-reload, verbose logging, and no production optimizations. For Docker deployments, you should either:
- Use
wrangler deploy(if targeting Cloudflare Workers), or - Compile a standalone bundle using
bun build --compileor similar, or - Use the native Bun runtime via
run-mocked.ts(which this PR already provides)
The shell wrapper workaround to inject $HONCHO_API_URL into --var further suggests this isn't the intended deployment path.
🚀 Proposed fix: Use the Bun native runtime
Replace line 24 with:
-CMD ["sh", "-c", "npx wrangler dev src/index.ts --port 8787 --ip 0.0.0.0 --var HONCHO_API_URL:$HONCHO_API_URL"]
+CMD ["bun", "run", "run-mocked.ts"]This uses the native Bun mock server already provided in this PR, which is production-ready and doesn't require Wrangler's dev mode.
Alternatively, if Node compatibility is required:
-CMD ["sh", "-c", "npx wrangler dev src/index.ts --port 8787 --ip 0.0.0.0 --var HONCHO_API_URL:$HONCHO_API_URL"]
+CMD ["node", "--import", "tsx", "run-mocked.ts"]🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@mcp/Dockerfile` at line 24, The Dockerfile currently uses CMD with "npx
wrangler dev ..." which is a dev-only pattern; replace that CMD to invoke the
Bun-native mock server provided in this PR (run-mocked.ts) using the Bun runtime
and pass HONCHO_API_URL via the container environment instead of wrapping it
into a --var flag; locate the CMD line that mentions "wrangler dev" and change
it to run the Bun runtime for run-mocked.ts (or alternately switch to a
production path such as using "wrangler deploy" or a compiled bundle) so the
image runs the production-ready server instead of wrangler dev.
| ## 10. Status | ||
|
|
||
| **Exploration**: Closed with **decision to formalize via OpenSpec proposal** (recommended Option A). | ||
| **Next command**: **`/opsx-propose`** (or workspace equivalent) when Commander approplies change name and scope boundaries. |
There was a problem hiding this comment.
Fix typo in the next-command sentence.
Line 126 contains approplies, which appears to be a typo. Please correct it to improve command guidance readability.
🧰 Tools
🪛 LanguageTool
[grammar] ~126-~126: Ensure spelling is correct
Context: ...or workspace equivalent) when Commander approplies change name and scope boundaries.
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@openspec/changes/archive/2026-05-05-honcho-langfuse-generation-traces/explorations/2026-05-05-langfuse-generation-observations-openspec-gap.md`
at line 126, Fix the typo in the "Next command" sentence: change the word
"approplies" to "applies" in the line that reads "**Next command**:
**`/opsx-propose`** (or workspace equivalent) when Commander approplies change
name and scope boundaries." so it reads "...when Commander applies change name
and scope boundaries." and ensure the corrected sentence preserves the bolding
and command formatting for "/opsx-propose".
| openspec validate --changes --no-interactive --json | ||
| rg "@conditional_observe\(name=\"LLM Call\", as_type=\"generation\"\)" src/llm/api.py | ||
| rg "update_current_generation\(" src/llm/runtime.py | ||
| rg "update_current_generation\.assert_called_once_with" tests/utils/test_clients.py | ||
| ``` |
There was a problem hiding this comment.
Use uv run in verification command examples.
Line 18 currently shows openspec validate ...; use uv run openspec validate ... (and keep Python-invoked tooling under uv) so contributors run the documented checks in the project environment consistently.
Based on learnings: Always use 'uv run' or 'uv' prefix for Python commands to ensure virtual environment usage.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@openspec/changes/archive/2026-05-05-honcho-langfuse-generation-traces/tasks.md`
around lines 18 - 22, Update the verification command example so Python-invoked
tooling is run via the project's virtual env: replace the line containing
"openspec validate --changes --no-interactive --json" with "uv run openspec
validate --changes --no-interactive --json" in the snippet in tasks.md; leave
non-Python tools like the rg commands unchanged so contributors consistently use
uv for Python commands.
| # Helper for SimpleNamespace | ||
| from types import SimpleNamespace | ||
| from unittest.mock import Mock | ||
| import sys |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win
Imports placed at end of file — confusing and works only by accident.
SimpleNamespace, Mock, and sys are used at L112, L114, L119, L121, L139, L140, but imported at L156-158. Python tolerates this because module-level code executes before any test function runs, but it makes the file very hard to scan and will trip readers expecting standard import order at the top. Move these next to the other imports.
🛠️ Move to the imports block at top of file
import json
import os
+import sys
from pathlib import Path
-from unittest.mock import AsyncMock, patch
+from types import SimpleNamespace
+from unittest.mock import AsyncMock, Mock, patch
import pytest
...
-# Helper for SimpleNamespace
-from types import SimpleNamespace
-from unittest.mock import Mock
-import sys📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # Helper for SimpleNamespace | |
| from types import SimpleNamespace | |
| from unittest.mock import Mock | |
| import sys | |
| import json | |
| import os | |
| import sys | |
| from pathlib import Path | |
| from types import SimpleNamespace | |
| from unittest.mock import AsyncMock, Mock, patch | |
| import pytest |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/llm/test_nous_refresh.py` around lines 155 - 158, Move the three
delayed imports into the module import block at the top of the test file:
relocate "from types import SimpleNamespace", "from unittest.mock import Mock",
and "import sys" next to the other imports so they're available before tests
reference SimpleNamespace, Mock, and sys (used in tests around usages of
SimpleNamespace and Mock); remove the duplicate or trailing imports at L156-L158
to keep import order clear and consistent.
da417b2 to
3a03615
Compare
There was a problem hiding this comment.
Actionable comments posted: 11
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
openspec/changes/archive/2026-05-07-verify-honcho-mcp-tools/specs/mcp-qa-verification/spec.md (1)
11-16:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winAlign cleanup requirement with non-deletable peer behavior.
Line 11 says “All test data MUST be deleted,” but the scenario on Line 15 only requires deleting sessions/conclusions. Please explicitly exclude non-deletable peer records or require peer tagging + archival-safe retention.
📝 Proposed wording
-The verification process MUST operate exclusively on test-specific entities (e.g. peers prefixed with `test-qa-`, isolated sessions). It MUST NOT mutate or delete existing production data. All test data MUST be deleted at the end. +The verification process MUST operate exclusively on test-specific entities (e.g. peers prefixed with `test-qa-`, isolated sessions). It MUST NOT mutate or delete existing production data. All deletable test entities (at minimum sessions and conclusions) MUST be deleted at the end.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@openspec/changes/archive/2026-05-07-verify-honcho-mcp-tools/specs/mcp-qa-verification/spec.md` around lines 11 - 16, Update the "Test cleanup" requirement to exclude non-deletable peer records: change the blanket "All test data MUST be deleted" to a constrained rule that deletes all test-specific entities (e.g., sessions, conclusions, resources prefixed with `test-qa-`) while explicitly excluding immutable/non-deletable peer records, and add a requirement that peers that cannot be deleted must be tagged as test artifacts and moved to archival-safe retention (or flagged for retention) instead of deletion; reference the "Test cleanup" scenario, the phrase "All test data MUST be deleted", and the peer prefix `test-qa-` when making the wording change.openspec/workspace/memories/leader/2026-05-05/archive_01.md (1)
8-29:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winAdd blank lines after section headings to satisfy markdownlint.
Line 8, Line 16, Line 20, Line 23, and Line 26 need one blank line below each heading (MD022).
Suggested patch
## What Was Done + 1. Conducted an investigation into Langfuse telemetry for Honcho LLM traces, focusing on missing model metadata in the dashboard for background tasks (`Dialectic Agent`, `Minimal Deriver`). ... ## Decisions Made + - Added explicit `usage_details` to the active generation instead of relying on Langfuse's server-side auto-computation, ensuring consistent observability for custom models. ... ## Blockers + - None. ## Next Steps + - Await next user instructions for any subsequent capabilities or changes. ## Key Learnings + - Container processes caching Python modules must be explicitly restarted during active QA.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@openspec/workspace/memories/leader/2026-05-05/archive_01.md` around lines 8 - 29, Add a single blank line after each top-level section heading in the markdown to satisfy markdownlint MD022: insert one empty line immediately after the headings "## What Was Done", "## Decisions Made", "## Blockers", "## Next Steps", and "## Key Learnings" so each heading is followed by a blank line before the content.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@openspec/changes/archive/2026-05-07-verify-honcho-mcp-tools/explorations/2026-05-07-honcho-philosophy-and-usecases.md`:
- Around line 321-326: The document excerpt leaks real participant identifiers
(e.g., "hermes" and "phuong_lambert") in the discussion of the peer_id vs
target_peer_id paradigm; redact these by replacing actual names with sanitized
placeholders (e.g., "<participant_A>", "<participant_B>") throughout the
paragraph and the four listed collection examples, and ensure any other
occurrences of peer_id and target_peer_id examples use the sanitized
placeholders consistently so committed history contains no real identifiers.
In
`@openspec/changes/archive/2026-05-07-verify-honcho-mcp-tools/explorations/mcp-verification-report.md`:
- Around line 18-35: The verification artifact contains real environment
identifiers; replace sensitive values in the JSON fields workspace_id,
peer_count, peers (each object id values) and session_count, sessions (each id
values) with synthetic placeholders (e.g. "test-workspace", "test-peer-1", ...,
"session-1", and matching counts) in the mcp verification report so the peers
array entries like {"id": "phuong_lambert"} and {"id": "hermes"} and sessions
like {"id": "hermes-agent"} are sanitized; ensure peer_count and session_count
are updated to reflect the new fixture counts and keep metadata/configuration
objects unchanged.
In `@openspec/specs/mcp-qa-verification/spec.md`:
- Around line 16-20: Update the "verification process" and "Scenario: Test
cleanup" wording to enumerate exactly which test entities must be deleted and
how deletion is verified: replace the vague "all test data" with a concrete list
(e.g., peers prefixed with `test-qa-`, isolated sessions, test conclusions,
test-only artifacts) and add an explicit verification step that these listed
entities no longer exist (e.g., "THEN a discovery query returns zero peers
matching prefix `test-qa-` and no sessions or conclusions with the test flag are
present"); ensure the Scenario includes both the deletion action and an
objective check (query or API result) for each entity type so compliance is
programmatically testable.
In `@openspec/specs/observability-langfuse/spec.md`:
- Around line 3-4: Replace the placeholder purpose under the "## Purpose"
heading (the line starting with "TBD: Core capabilities for integrating
Langfuse...") with a concrete scope statement that summarizes the capability's
intent, key responsibilities, and boundaries (e.g., what telemetry/tracing will
be provided, supported integrations, expected outputs, and non-goals); update
the sentence so it reads as a clear one- or two-sentence scope for the
observability-langfuse spec rather than "TBD."
In `@openspec/workspace/memories/leader/2026-05-05/current.md`:
- Around line 6-17: Add a blank line immediately after each Markdown heading
"Current Status", "Active Focus", and "Critical Context" in the file (these are
the headers in current.md) to satisfy MD022; ensure there is an empty line
between each header line and the following paragraph/list so the sections read:
"Current Status" + blank line + content, "Active Focus" + blank line + content,
and "Critical Context" + blank line + content.
In
`@openspec/workspace/memories/leader/lesson_learnt/L-001-langfuse-custom-model-tokens.md`:
- Around line 7-26: Add missing blank lines to satisfy markdownlint: insert a
blank line after the headings "The Issue", "Root Cause", and "The Fix / Lesson"
and ensure there is a blank line immediately before and after the fenced code
block containing the example update_current_generation usage (the
triple-backtick block around the python example); this will resolve MD022 and
MD031 warnings without changing the content.
In `@src/llm/api.py`:
- Around line 271-275: In the function that classifies retryable HTTP exceptions
(e.g., is_retryable_exception(exc) in src/llm/api.py), extend the HTTP status
check so that HTTP 408 is treated as retryable: update the condition that
currently returns status == 429 or (500 <= status < 600) to also return true for
status == 408 (e.g., status == 429 or status == 408 or (500 <= status < 600));
this ensures a 408 timeout will trigger the fast-failover to the fallback model
on the first retryable error.
- Around line 194-197: The ContextVars current_attempt and force_fallback are
being set (current_attempt.set(1), force_fallback.set(False)) but never
restored, so nested calls (honcho_llm_call) leak state; capture the tokens
returned by .set(...) when you initialize these vars and wrap the call logic in
a try/finally so you reset the tokens in the finally block to restore prior
task-local state; locate the setup in honcho_llm_call (and any early return
paths) and ensure every return path happens after the finally resets the tokens.
Also update _is_retryable_error to treat HTTP status == 408 as retryable in the
same status-code checks (add status == 408 alongside existing timeout/status
checks).
In `@src/llm/nous_refresh.py`:
- Around line 71-80: save_state() currently writes STATE_FILE and its parent
directory with default process umask, leaving them potentially world-readable;
ensure the directory and file are hardened by setting restrictive permissions:
make STATE_FILE.parent directory mode 0o700 (use os.chmod on STATE_FILE.parent
after mkdir) and ensure the temp file (tmp) is created with mode 0o600 (either
create securely via os.open with mode 0o600 or write then call os.chmod(tmp,
0o600)) before calling tmp.replace(STATE_FILE) so the final persisted STATE_FILE
is non-world-readable; reference save_state, STATE_FILE, STATE_FILE.parent and
tmp when applying the changes.
In `@tests/llm/test_fallback_integration.py`:
- Line 191: Replace the broad pytest.raises(Exception) with a specific test
exception: add a small custom exception class (e.g., ExpectedTestError) near the
top of tests/llm/test_fallback_integration.py, update the test to use
pytest.raises(ExpectedTestError) instead of Exception, and ensure the code path
exercised by the test (the function or method under test referenced in the test)
raises ExpectedTestError so the test only catches the intended failure.
In `@tests/llm/test_nous_refresh.py`:
- Around line 110-126: The test creates mock_client but never injects it, so
refresh_nous_credentials() ends up creating real httpx.AsyncClient instances in
refresh_access_token() and mint_agent_key(); before calling
refresh_nous_credentials(), monkeypatch httpx.AsyncClient in the refresh_mod
module to return the prepared mock_client (i.e., set
refresh_mod.httpx.AsyncClient to a lambda/factory that returns mock_client) so
both AsyncClient usages use the mock and the test becomes hermetic.
---
Outside diff comments:
In
`@openspec/changes/archive/2026-05-07-verify-honcho-mcp-tools/specs/mcp-qa-verification/spec.md`:
- Around line 11-16: Update the "Test cleanup" requirement to exclude
non-deletable peer records: change the blanket "All test data MUST be deleted"
to a constrained rule that deletes all test-specific entities (e.g., sessions,
conclusions, resources prefixed with `test-qa-`) while explicitly excluding
immutable/non-deletable peer records, and add a requirement that peers that
cannot be deleted must be tagged as test artifacts and moved to archival-safe
retention (or flagged for retention) instead of deletion; reference the "Test
cleanup" scenario, the phrase "All test data MUST be deleted", and the peer
prefix `test-qa-` when making the wording change.
In `@openspec/workspace/memories/leader/2026-05-05/archive_01.md`:
- Around line 8-29: Add a single blank line after each top-level section heading
in the markdown to satisfy markdownlint MD022: insert one empty line immediately
after the headings "## What Was Done", "## Decisions Made", "## Blockers", "##
Next Steps", and "## Key Learnings" so each heading is followed by a blank line
before the content.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: b06f5ef3-cb56-43bd-a70c-82681ed629b8
⛔ Files ignored due to path filters (1)
mcp/bun.lockis excluded by!**/*.lock
📒 Files selected for processing (55)
.dockerignore.env.templateCLAUDE.mdmcp/.dockerignoremcp/Dockerfilemcp/package.jsonmcp/run-mocked.tsopenspec/changes/archive/2026-05-05-fix-summarizer-telemetry-spans/.openspec.yamlopenspec/changes/archive/2026-05-05-fix-summarizer-telemetry-spans/design.mdopenspec/changes/archive/2026-05-05-fix-summarizer-telemetry-spans/explorations/2026-05-05-nested-langfuse-spans.mdopenspec/changes/archive/2026-05-05-fix-summarizer-telemetry-spans/proposal.mdopenspec/changes/archive/2026-05-05-fix-summarizer-telemetry-spans/specs/.gitkeepopenspec/changes/archive/2026-05-05-fix-summarizer-telemetry-spans/specs/observability-langfuse/spec.mdopenspec/changes/archive/2026-05-05-fix-summarizer-telemetry-spans/tasks.mdopenspec/changes/archive/2026-05-05-honcho-langfuse-generation-traces/.openspec.yamlopenspec/changes/archive/2026-05-05-honcho-langfuse-generation-traces/design.mdopenspec/changes/archive/2026-05-05-honcho-langfuse-generation-traces/explorations/2026-05-05-langfuse-generation-observations-openspec-gap.mdopenspec/changes/archive/2026-05-05-honcho-langfuse-generation-traces/proposal.mdopenspec/changes/archive/2026-05-05-honcho-langfuse-generation-traces/specs/observability-langfuse/spec.mdopenspec/changes/archive/2026-05-05-honcho-langfuse-generation-traces/tasks.mdopenspec/changes/archive/2026-05-07-verify-honcho-mcp-tools/.openspec.yamlopenspec/changes/archive/2026-05-07-verify-honcho-mcp-tools/design.mdopenspec/changes/archive/2026-05-07-verify-honcho-mcp-tools/explorations/2026-05-07-honcho-mcp-architecture.mdopenspec/changes/archive/2026-05-07-verify-honcho-mcp-tools/explorations/2026-05-07-honcho-philosophy-and-usecases.mdopenspec/changes/archive/2026-05-07-verify-honcho-mcp-tools/explorations/mcp-verification-report.mdopenspec/changes/archive/2026-05-07-verify-honcho-mcp-tools/proposal.mdopenspec/changes/archive/2026-05-07-verify-honcho-mcp-tools/specs/mcp-qa-verification/spec.mdopenspec/changes/archive/2026-05-07-verify-honcho-mcp-tools/tasks.mdopenspec/changes/archive/2026-05-15-llm-model-fallback-2026-05-14/design.mdopenspec/changes/archive/2026-05-15-llm-model-fallback-2026-05-14/proposal.mdopenspec/changes/archive/2026-05-15-llm-model-fallback-2026-05-14/specs/llm-model-fallback/spec.mdopenspec/changes/archive/2026-05-15-llm-model-fallback-2026-05-14/tasks.mdopenspec/specs/llm-model-fallback/spec.mdopenspec/specs/mcp-qa-verification/spec.mdopenspec/specs/observability-langfuse/spec.mdopenspec/workspace/journal/2026-05-05/agent_share_afternoon.mdopenspec/workspace/memories/leader/2026-05-05/archive_01.mdopenspec/workspace/memories/leader/2026-05-05/current.mdopenspec/workspace/memories/leader/lesson_learnt/L-001-langfuse-custom-model-tokens.mdopenspec/workspace/sessions/agent_share.mdsrc/config.pysrc/llm/api.pysrc/llm/backends/openai.pysrc/llm/credentials.pysrc/llm/nous_refresh.pysrc/llm/registry.pysrc/llm/runtime.pysrc/llm/tool_loop.pysrc/telemetry/logging.pysrc/utils/summarizer.pytests/llm/test_backends/test_nous_autorefresh.pytests/llm/test_fallback.pytests/llm/test_fallback_integration.pytests/llm/test_nous_refresh.pytests/utils/test_clients.py
✅ Files skipped from review due to trivial changes (16)
- openspec/workspace/journal/2026-05-05/agent_share_afternoon.md
- openspec/changes/archive/2026-05-05-fix-summarizer-telemetry-spans/.openspec.yaml
- openspec/changes/archive/2026-05-07-verify-honcho-mcp-tools/.openspec.yaml
- mcp/.dockerignore
- openspec/changes/archive/2026-05-05-fix-summarizer-telemetry-spans/tasks.md
- openspec/changes/archive/2026-05-05-honcho-langfuse-generation-traces/tasks.md
- openspec/changes/archive/2026-05-07-verify-honcho-mcp-tools/design.md
- openspec/changes/archive/2026-05-05-honcho-langfuse-generation-traces/.openspec.yaml
- openspec/changes/archive/2026-05-05-honcho-langfuse-generation-traces/design.md
- openspec/changes/archive/2026-05-15-llm-model-fallback-2026-05-14/design.md
- .env.template
- openspec/changes/archive/2026-05-15-llm-model-fallback-2026-05-14/tasks.md
- .dockerignore
- CLAUDE.md
- openspec/workspace/sessions/agent_share.md
- openspec/changes/archive/2026-05-05-honcho-langfuse-generation-traces/proposal.md
🚧 Files skipped from review as they are similar to previous changes (12)
- mcp/package.json
- tests/utils/test_clients.py
- mcp/Dockerfile
- openspec/changes/archive/2026-05-07-verify-honcho-mcp-tools/tasks.md
- mcp/run-mocked.ts
- src/telemetry/logging.py
- tests/llm/test_backends/test_nous_autorefresh.py
- src/config.py
- tests/llm/test_fallback.py
- src/llm/tool_loop.py
- src/llm/registry.py
- src/utils/summarizer.py
| The verification process MUST operate exclusively on test-specific entities (e.g. peers prefixed with `test-qa-`, isolated sessions). It MUST NOT mutate or delete existing production data. All test data MUST be deleted at the end. | ||
|
|
||
| #### Scenario: Test cleanup | ||
| - **WHEN** the QA sequence completes | ||
| - **THEN** all test entities (sessions, conclusions) must be explicitly deleted. |
There was a problem hiding this comment.
Clarify cleanup acceptance criteria to avoid impossible interpretation.
The requirement says all test data must be deleted, but the scenario validates only sessions/conclusions. Please tighten wording so compliance is objectively testable.
🧰 Tools
🪛 markdownlint-cli2 (0.22.1)
[warning] 18-18: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@openspec/specs/mcp-qa-verification/spec.md` around lines 16 - 20, Update the
"verification process" and "Scenario: Test cleanup" wording to enumerate exactly
which test entities must be deleted and how deletion is verified: replace the
vague "all test data" with a concrete list (e.g., peers prefixed with
`test-qa-`, isolated sessions, test conclusions, test-only artifacts) and add an
explicit verification step that these listed entities no longer exist (e.g.,
"THEN a discovery query returns zero peers matching prefix `test-qa-` and no
sessions or conclusions with the test flag are present"); ensure the Scenario
includes both the deletion action and an objective check (query or API result)
for each entity type so compliance is programmatically testable.
| ## Purpose | ||
| TBD: Core capabilities for integrating Langfuse LLM observability and tracing telemetry within Honcho. |
There was a problem hiding this comment.
Replace placeholder purpose with a concrete scope statement.
Line 4 still says TBD, which leaves this capability underspecified at the top-level and makes requirement intent harder to scan.
📝 Proposed fix
## Purpose
-TBD: Core capabilities for integrating Langfuse LLM observability and tracing telemetry within Honcho.
+Define Langfuse observability requirements for Honcho LLM calls, including generation-type tracing, model attribution from attempt planning, metadata preservation, and summarization trace shape.📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| ## Purpose | |
| TBD: Core capabilities for integrating Langfuse LLM observability and tracing telemetry within Honcho. | |
| ## Purpose | |
| Define Langfuse observability requirements for Honcho LLM calls, including generation-type tracing, model attribution from attempt planning, metadata preservation, and summarization trace shape. |
🧰 Tools
🪛 markdownlint-cli2 (0.22.1)
[warning] 3-3: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@openspec/specs/observability-langfuse/spec.md` around lines 3 - 4, Replace
the placeholder purpose under the "## Purpose" heading (the line starting with
"TBD: Core capabilities for integrating Langfuse...") with a concrete scope
statement that summarizes the capability's intent, key responsibilities, and
boundaries (e.g., what telemetry/tracing will be provided, supported
integrations, expected outputs, and non-goals); update the sentence so it reads
as a clear one- or two-sentence scope for the observability-langfuse spec rather
than "TBD."
| ## Current Status | ||
| - Successfully archived `honcho-langfuse-generation-traces`. | ||
| - Tracing infrastructure is fully operational. Both API and background workers (Deriver) are logging `GENERATION` traces correctly. | ||
| - Custom models (e.g. `lmstudio` `qwen3.5`) are actively reporting explicit `usage_details` (tokens) to Langfuse. | ||
| - Main specifications for `observability-langfuse` have been established and updated in the project repository. | ||
|
|
||
| ## Active Focus | ||
| - Standing by for new OpenSpec initiatives or orchestration tasks. | ||
|
|
||
| ## Critical Context | ||
| - Langfuse configuration relies on explicitly passing `as_type="generation"` to the `@conditional_observe` decorator in `honcho_llm_call`. | ||
| - Explicit `usage_details` must be provided to Langfuse active generation object to track tokens for custom models reliably. |
There was a problem hiding this comment.
Insert blank lines below headings for MD022 compliance.
Line 6, Line 12, and Line 15 should each be followed by a blank line.
Suggested patch
## Current Status
+
- Successfully archived `honcho-langfuse-generation-traces`.
...
## Active Focus
+
- Standing by for new OpenSpec initiatives or orchestration tasks.
## Critical Context
+
- Langfuse configuration relies on explicitly passing `as_type="generation"` to the `@conditional_observe` decorator in `honcho_llm_call`.🧰 Tools
🪛 markdownlint-cli2 (0.22.1)
[warning] 6-6: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
[warning] 12-12: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
[warning] 15-15: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@openspec/workspace/memories/leader/2026-05-05/current.md` around lines 6 -
17, Add a blank line immediately after each Markdown heading "Current Status",
"Active Focus", and "Critical Context" in the file (these are the headers in
current.md) to satisfy MD022; ensure there is an empty line between each header
line and the following paragraph/list so the sections read: "Current Status" +
blank line + content, "Active Focus" + blank line + content, and "Critical
Context" + blank line + content.
| patch("src.llm.runtime.client_for_model_config", return_value=mock_client), | ||
| patch.object(executor_mod, "backend_for_provider", side_effect=mock_backend_for_provider), | ||
| ): | ||
| with pytest.raises(Exception): |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, locate and examine the test file
git ls-files tests/llm/test_fallback_integration.pyRepository: plastic-labs/honcho
Length of output: 102
🏁 Script executed:
# Read the test file to understand the context around line 191
wc -l tests/llm/test_fallback_integration.pyRepository: plastic-labs/honcho
Length of output: 106
🏁 Script executed:
# Read the full file to understand the context
cat -n tests/llm/test_fallback_integration.pyRepository: plastic-labs/honcho
Length of output: 8005
Use a specific exception type instead of pytest.raises(Exception).
Catching Exception broadly can hide unrelated failures and weakens the test's signal. Create a dedicated test exception class and use it consistently so the test only catches the expected exception.
Suggested fix
+class MockHTTPError(Exception):
+ def __init__(self, status_code: int):
+ super().__init__(f"HTTP {status_code}")
+ self.status_code = status_code
+
class FailingBackend:
@@
async def complete(self, **kwargs):
self.call_count += 1
- exc = Exception(f"HTTP {self.status_code}")
- exc.status_code = self.status_code
- raise exc
+ raise MockHTTPError(self.status_code)
@@
async def stream(self, **kwargs):
self.call_count += 1
- exc = Exception(f"HTTP {self.status_code}")
- exc.status_code = self.status_code
- raise exc
+ raise MockHTTPError(self.status_code)
@@
- with pytest.raises(Exception):
+ with pytest.raises(MockHTTPError):
await honcho_llm_call(🧰 Tools
🪛 Ruff (0.15.12)
[warning] 191-191: Do not assert blind exception: Exception
(B017)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/llm/test_fallback_integration.py` at line 191, Replace the broad
pytest.raises(Exception) with a specific test exception: add a small custom
exception class (e.g., ExpectedTestError) near the top of
tests/llm/test_fallback_integration.py, update the test to use
pytest.raises(ExpectedTestError) instead of Exception, and ensure the code path
exercised by the test (the function or method under test referenced in the test)
raises ExpectedTestError so the test only catches the intended failure.
There was a problem hiding this comment.
🧹 Nitpick comments (2)
src/llm/api.py (1)
351-363: 💤 Low valueConsider extracting duplicated Langfuse usage update logic.
The usage-reporting blocks at lines 351-363 and 404-416 are nearly identical. Extracting to a small helper would reduce duplication and ensure consistent behavior.
♻️ Suggested helper
def _update_langfuse_usage(result: HonchoLLMCallResponse[Any]) -> None: """Report token usage to Langfuse if enabled.""" if not settings.LANGFUSE_PUBLIC_KEY: return try: from langfuse import get_client usage = {} if result.input_tokens is not None: usage["input"] = result.input_tokens if result.output_tokens is not None: usage["output"] = result.output_tokens if usage: get_client().update_current_generation(usage_details=usage) except Exception as exc: logger.debug("Failed to update Langfuse usage: %s", exc)Then replace both blocks with:
if isinstance(result, HonchoLLMCallResponse): _update_langfuse_usage(result)Also applies to: 404-416
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/llm/api.py` around lines 351 - 363, Extract the duplicated Langfuse usage-reporting logic into a helper like _update_langfuse_usage(result: HonchoLLMCallResponse) that checks settings.LANGFUSE_PUBLIC_KEY, builds the usage dict from result.input_tokens and result.output_tokens, calls get_client().update_current_generation(usage_details=usage) inside a try/except and logs failures via logger.debug; then replace both inlined blocks in src/llm/api.py (the blocks that test isinstance(result, HonchoLLMCallResponse) and perform the update) with a single call to _update_langfuse_usage(result).src/llm/nous_refresh.py (1)
169-170: ⚡ Quick winUse custom exception types instead of
RuntimeError.Per coding guidelines, exceptions should be defined in
src/exceptions.py. Consider aNousAuthErroror similar for these external API failures, enabling callers to catch and handle them specifically.♻️ Suggested approach
In
src/exceptions.py:class NousAuthError(Exception): """Raised when Nous OAuth operations fail.""" passThen in
nous_refresh.py:+from src.exceptions import NousAuthError ... - raise RuntimeError(f"Token refresh failed {resp.status_code}: {resp.text}") + raise NousAuthError(f"Token refresh failed {resp.status_code}: {resp.text}") ... - raise RuntimeError(f"Agent key mint failed {resp.status_code}: {resp.text}") + raise NousAuthError(f"Agent key mint failed {resp.status_code}: {resp.text}")Also applies to: 186-187
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/llm/nous_refresh.py` around lines 169 - 170, Define a custom NousAuthError in src/exceptions.py (class NousAuthError(Exception): ...) and import it into src/llm/nous_refresh.py, then replace the RuntimeError raises that check resp.status_code != 200 (the current raise RuntimeError(f"Token refresh failed {resp.status_code}: {resp.text}")) and the similar raise at the other occurrence (lines referenced 186-187) with raises of NousAuthError including the same contextual message and resp details so callers can catch this specific auth failure.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@src/llm/api.py`:
- Around line 351-363: Extract the duplicated Langfuse usage-reporting logic
into a helper like _update_langfuse_usage(result: HonchoLLMCallResponse) that
checks settings.LANGFUSE_PUBLIC_KEY, builds the usage dict from
result.input_tokens and result.output_tokens, calls
get_client().update_current_generation(usage_details=usage) inside a try/except
and logs failures via logger.debug; then replace both inlined blocks in
src/llm/api.py (the blocks that test isinstance(result, HonchoLLMCallResponse)
and perform the update) with a single call to _update_langfuse_usage(result).
In `@src/llm/nous_refresh.py`:
- Around line 169-170: Define a custom NousAuthError in src/exceptions.py (class
NousAuthError(Exception): ...) and import it into src/llm/nous_refresh.py, then
replace the RuntimeError raises that check resp.status_code != 200 (the current
raise RuntimeError(f"Token refresh failed {resp.status_code}: {resp.text}")) and
the similar raise at the other occurrence (lines referenced 186-187) with raises
of NousAuthError including the same contextual message and resp details so
callers can catch this specific auth failure.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: e0a1b1e3-e42b-45fe-95c1-2d6746a7925f
📒 Files selected for processing (3)
src/llm/api.pysrc/llm/nous_refresh.pytests/llm/test_fallback_integration.py
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/llm/test_fallback_integration.py
- Extract duplicated Langfuse usage reporting into _update_langfuse_usage() helper - Add NousAuthError custom exception in src/exceptions.py - Replace RuntimeError with NousAuthError in nous_refresh.py - Harden save_state() file permissions (0o600 for state file and temp) - Fix test_refresh_nous_credentials_success: properly mock httpx.AsyncClient - Fix test_track_name_updates_langfuse_span_name: expect 2 update_current_generation calls (generation + usage) - Fix update_env_key() to create .env file when missing instead of no-op - Sanitize PII in openspec docs (replace real peer IDs with placeholders)
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/llm/api.py (1)
189-190:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winUpdate stale fallback behavior text in the docstring.
Line 189-Line 190 still says fallback happens on the final retry attempt, but this implementation now fast-falls back on the first retryable failure.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/llm/api.py` around lines 189 - 190, Update the stale docstring that states the backup provider/model is used on the final retry attempt: change it to say the configured ModelConfig.fallback is used immediately on the first retryable failure (i.e., the first retry attempt) rather than only on the final retry; update any related wording in the docstring in src/llm/api.py near the ModelConfig/fallback description (search for "Backup provider/model" and "final retry attempt") so it accurately reflects the current fast-fail fallback behavior.
♻️ Duplicate comments (5)
tests/llm/test_nous_refresh.py (2)
22-25:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winMake
test_load_state_missinghermetic to ambient env vars.Line 22 can still fail when
NOUS_REFRESH_TOKENis set in CI/shell, becauseload_state()may return a token-backed dict instead of{}.✅ Minimal fix
-def test_load_state_missing(tmp_path: Path) -> None: +def test_load_state_missing( + tmp_path: Path, monkeypatch: pytest.MonkeyPatch +) -> None: """load_state returns {} when state file does not exist.""" + monkeypatch.delenv("NOUS_REFRESH_TOKEN", raising=False) with patch.object(Path, "exists", return_value=False): assert load_state() == {}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/llm/test_nous_refresh.py` around lines 22 - 25, The test test_load_state_missing can still see NOUS_REFRESH_TOKEN from the environment; make it hermetic by clearing or patching os.environ during the test so load_state() cannot read a real token. Update the test_load_state_missing to patch the environment (e.g., using unittest.mock.patch.dict on os.environ to remove or set NOUS_REFRESH_TOKEN to an empty value) while also patching Path.exists to False so load_state() is guaranteed to return {}; reference the test function name test_load_state_missing and the load_state function when making the change.
78-89:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winReplace the placeholder
passtest with a real assertion (or remove it).Line 85-Line 88 currently never validates
_find_project_root, so this test always passes regardless of behavior.✅ Minimal fix
def test_find_project_root_with_dotenv(tmp_path: Path) -> None: """_find_project_root locates directory containing .env.""" (tmp_path / ".env").touch() - # Simulate file deep in subdir deep = tmp_path / "sub" / "deep" deep.mkdir(parents=True) - fake_file = deep / "dummy.py" - with patch("src.llm.nous_refresh.Path", side_effect=lambda x: tmp_path / x if x == "__file__" else Path(x)): - # simpler: patch Path(__file__).resolve to return deep file - pass # This is complex; test at integration level may be better. - # Skip unit-level path walk test — rely on manual verification later. + assert _find_project_root(start=deep / "dummy.py") == tmp_path🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/llm/test_nous_refresh.py` around lines 78 - 89, The test test_find_project_root_with_dotenv currently contains a no-op pass; replace it with a real assertion that _find_project_root finds the tmp_path containing the .env. Inside the patch of "src.llm.nous_refresh.Path", make Path("__file__").resolve() (or the side_effect lambda) return the fake_file (deep / "dummy.py"), then call src.llm.nous_refresh._find_project_root(str(fake_file)) (or the module's _find_project_root function) and assert the result == tmp_path (or tmp_path.resolve()), ensuring the test validates that _find_project_root locates the directory containing .env; alternatively remove the whole test if you prefer not to implement the patch.src/llm/api.py (1)
213-219:⚠️ Potential issue | 🟠 Major | ⚡ Quick winGuarantee ContextVar restoration with
try/finallyaround the call flow.Line 213-Line 214 set task-local state, but
_restore_contextvars()only runs on success paths. Ifdecorated()orexecute_tool_loop()raises, state leaks into subsequent same-task calls.🔧 Suggested fix shape
_current_attempt_token = current_attempt.set(1) _force_fallback_token = force_fallback.set(False) def _restore_contextvars() -> None: current_attempt.reset(_current_attempt_token) force_fallback.reset(_force_fallback_token) - if not tools or not tool_executor: - result = await decorated() - ... - _restore_contextvars() - return result - - result = await execute_tool_loop(...) - ... - _restore_contextvars() - return result + try: + if not tools or not tool_executor: + result = await decorated() + ... + return result + + result = await execute_tool_loop(...) + ... + return result + finally: + _restore_contextvars()Also applies to: 364-387, 390-428
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/llm/api.py` around lines 213 - 219, The contextvars set by current_attempt.set and force_fallback.set must be restored even on exceptions: wrap the call flow that invokes decorated() (and similarly any execute_tool_loop() call sites) in a try/finally where the finally always calls _restore_contextvars(), so _restore_contextvars() runs on both success and error paths; apply the same try/finally pattern in the other affected blocks that set these tokens (the blocks around execute_tool_loop and the ranges analogous to lines 364-387 and 390-428) to guarantee no task-local state leaks.openspec/changes/archive/2026-05-07-verify-honcho-mcp-tools/explorations/mcp-verification-report.md (1)
4-5:⚠️ Potential issue | 🟠 Major | ⚡ Quick winRedact environment-specific identifiers in committed verification artifacts.
Line 4, Line 18, Line 46, and Line 75 still expose concrete workspace/peer/session IDs from a real environment. Please replace these with synthetic fixtures consistently across the report.
🧹 Suggested redaction pattern
-**Target Environment**: `hermes_workspace` (PTDEV) +**Target Environment**: `test-workspace` (fixture) - "workspace_id": "hermes_workspace", + "workspace_id": "test-workspace", - "peers": [ {"id": "hao-nguyen"}, {"id": "aziz"}, ... ] + "peers": [ {"id": "test-peer-1"}, {"id": "test-peer-2"}, ... ] - "sessions": [ {"id":"test-session-1"}, ... ] + "sessions": [ {"id":"session-1"}, {"id":"session-2"}, ... ]Also applies to: 18-35, 46-53, 75-82
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@openspec/changes/archive/2026-05-07-verify-honcho-mcp-tools/explorations/mcp-verification-report.md` around lines 4 - 5, The report exposes real environment identifiers (e.g., the Target Environment value "hermes_workspace" and concrete workspace/peer/session IDs found in the report) and needs redaction; locate occurrences of those environment-specific strings in the file openspec/changes/archive/2026-05-07-verify-honcho-mcp-tools/explorations/mcp-verification-report.md (including the blocks around the header and the other repeated sections) and replace them with consistent synthetic fixtures (for example "env_placeholder", "workspace_id_fixture", "peer_id_fixture", "session_id_fixture") everywhere they appear so the same placeholder is used across the report. Ensure you update all instances in the noted sections and preserve formatting/markdown while not altering other content.src/llm/nous_refresh.py (1)
235-242:⚠️ Potential issue | 🟠 Major | ⚡ Quick winFix the in-memory settings import and stop swallowing update failures silently.
Line 236 imports
honcho.config, which does not match the runtime module path used elsewhere (src.config), so the in-memory key update can silently fail every time.As per coding guidelines, "Use explicit error handling with appropriate exception types from src/exceptions.py".🔧 Suggested fix
# 6. Update in-memory settings globally (if Honcho is running) try: - from honcho.config import settings + from src.config import settings settings.LLM.NOUS_API_KEY = agent_key logger.info("In-memory settings.LLM.NOUS_API_KEY updated") - except Exception: + except ImportError as exc: # settings may not be importable in all contexts (tests, CLI) - pass + logger.debug("Could not import settings for in-memory update: %s", exc)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/llm/nous_refresh.py` around lines 235 - 242, The current import uses honcho.config and swallows all exceptions; change the import to use the runtime module path by importing settings from src.config (replace "from honcho.config import settings" with "from src.config import settings"), then update the try/except to catch and handle specific exceptions (ImportError, AttributeError and the project-specific ConfigError from src.exceptions) instead of a bare except; on failure log a clear error with the exception details via logger.error including agent_key/context and re-raise or propagate the error so the update to settings.LLM.NOUS_API_KEY (the settings.LLM.NOUS_API_KEY assignment referencing agent_key) does not fail silently.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@src/llm/api.py`:
- Around line 189-190: Update the stale docstring that states the backup
provider/model is used on the final retry attempt: change it to say the
configured ModelConfig.fallback is used immediately on the first retryable
failure (i.e., the first retry attempt) rather than only on the final retry;
update any related wording in the docstring in src/llm/api.py near the
ModelConfig/fallback description (search for "Backup provider/model" and "final
retry attempt") so it accurately reflects the current fast-fail fallback
behavior.
---
Duplicate comments:
In
`@openspec/changes/archive/2026-05-07-verify-honcho-mcp-tools/explorations/mcp-verification-report.md`:
- Around line 4-5: The report exposes real environment identifiers (e.g., the
Target Environment value "hermes_workspace" and concrete workspace/peer/session
IDs found in the report) and needs redaction; locate occurrences of those
environment-specific strings in the file
openspec/changes/archive/2026-05-07-verify-honcho-mcp-tools/explorations/mcp-verification-report.md
(including the blocks around the header and the other repeated sections) and
replace them with consistent synthetic fixtures (for example "env_placeholder",
"workspace_id_fixture", "peer_id_fixture", "session_id_fixture") everywhere they
appear so the same placeholder is used across the report. Ensure you update all
instances in the noted sections and preserve formatting/markdown while not
altering other content.
In `@src/llm/api.py`:
- Around line 213-219: The contextvars set by current_attempt.set and
force_fallback.set must be restored even on exceptions: wrap the call flow that
invokes decorated() (and similarly any execute_tool_loop() call sites) in a
try/finally where the finally always calls _restore_contextvars(), so
_restore_contextvars() runs on both success and error paths; apply the same
try/finally pattern in the other affected blocks that set these tokens (the
blocks around execute_tool_loop and the ranges analogous to lines 364-387 and
390-428) to guarantee no task-local state leaks.
In `@src/llm/nous_refresh.py`:
- Around line 235-242: The current import uses honcho.config and swallows all
exceptions; change the import to use the runtime module path by importing
settings from src.config (replace "from honcho.config import settings" with
"from src.config import settings"), then update the try/except to catch and
handle specific exceptions (ImportError, AttributeError and the project-specific
ConfigError from src.exceptions) instead of a bare except; on failure log a
clear error with the exception details via logger.error including
agent_key/context and re-raise or propagate the error so the update to
settings.LLM.NOUS_API_KEY (the settings.LLM.NOUS_API_KEY assignment referencing
agent_key) does not fail silently.
In `@tests/llm/test_nous_refresh.py`:
- Around line 22-25: The test test_load_state_missing can still see
NOUS_REFRESH_TOKEN from the environment; make it hermetic by clearing or
patching os.environ during the test so load_state() cannot read a real token.
Update the test_load_state_missing to patch the environment (e.g., using
unittest.mock.patch.dict on os.environ to remove or set NOUS_REFRESH_TOKEN to an
empty value) while also patching Path.exists to False so load_state() is
guaranteed to return {}; reference the test function name
test_load_state_missing and the load_state function when making the change.
- Around line 78-89: The test test_find_project_root_with_dotenv currently
contains a no-op pass; replace it with a real assertion that _find_project_root
finds the tmp_path containing the .env. Inside the patch of
"src.llm.nous_refresh.Path", make Path("__file__").resolve() (or the side_effect
lambda) return the fake_file (deep / "dummy.py"), then call
src.llm.nous_refresh._find_project_root(str(fake_file)) (or the module's
_find_project_root function) and assert the result == tmp_path (or
tmp_path.resolve()), ensuring the test validates that _find_project_root locates
the directory containing .env; alternatively remove the whole test if you prefer
not to implement the patch.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: dde833f4-dac0-432a-986f-f29c7101fdda
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (7)
openspec/changes/archive/2026-05-07-verify-honcho-mcp-tools/explorations/2026-05-07-honcho-philosophy-and-usecases.mdopenspec/changes/archive/2026-05-07-verify-honcho-mcp-tools/explorations/mcp-verification-report.mdsrc/exceptions.pysrc/llm/api.pysrc/llm/nous_refresh.pytests/llm/test_nous_refresh.pytests/utils/test_clients.py
✅ Files skipped from review due to trivial changes (1)
- src/exceptions.py
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/utils/test_clients.py
There was a problem hiding this comment.
♻️ Duplicate comments (1)
src/llm/nous_refresh.py (1)
78-84:⚠️ Potential issue | 🟠 Major | ⚡ Quick winState directory still created with default umask.
Lines 82 and 84 correctly chmod the temp/final file to
0o600, butSTATE_FILE.parent.mkdir(parents=True, exist_ok=True)on line 78 still uses the process umask. On shared hosts,~/.honcho/and its tmp file (briefly) remain world-readable before the chmod runs. Two small hardening steps close this:🔒 Suggested hardening
- STATE_FILE.parent.mkdir(parents=True, exist_ok=True) - # Atomic write via temp file + rename - tmp = STATE_FILE.with_suffix(".tmp") - tmp.write_text(json.dumps(state, indent=2)) - os.chmod(tmp, 0o600) - tmp.replace(STATE_FILE) - os.chmod(STATE_FILE, 0o600) + STATE_FILE.parent.mkdir(parents=True, exist_ok=True) + os.chmod(STATE_FILE.parent, 0o700) + # Atomic write via temp file + rename; create with 0600 to avoid a + # window where the tmp file is world-readable before chmod. + tmp = STATE_FILE.with_suffix(".tmp") + fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600) + with os.fdopen(fd, "w") as f: + f.write(json.dumps(state, indent=2)) + tmp.replace(STATE_FILE)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/llm/nous_refresh.py` around lines 78 - 84, STATE_FILE.parent.mkdir uses the process umask so the directory (and the temp file created inside it) can be briefly world-readable; wrap the mkdir in a temporary umask change and explicitly set the directory mode to a restrictive value (e.g. 0o700) so the dir is created with correct permissions: save old_umask = os.umask(0), call STATE_FILE.parent.mkdir(parents=True, exist_ok=True, mode=0o700) and then restore os.umask(old_umask); keep the existing tmp write/chmod/replace sequence (STATE_FILE.with_suffix and subsequent os.chmod calls) so files remain 0o600.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@src/llm/nous_refresh.py`:
- Around line 78-84: STATE_FILE.parent.mkdir uses the process umask so the
directory (and the temp file created inside it) can be briefly world-readable;
wrap the mkdir in a temporary umask change and explicitly set the directory mode
to a restrictive value (e.g. 0o700) so the dir is created with correct
permissions: save old_umask = os.umask(0), call
STATE_FILE.parent.mkdir(parents=True, exist_ok=True, mode=0o700) and then
restore os.umask(old_umask); keep the existing tmp write/chmod/replace sequence
(STATE_FILE.with_suffix and subsequent os.chmod calls) so files remain 0o600.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 8d26aa54-a672-4be2-bc8d-5ad35554ee23
📒 Files selected for processing (4)
openspec/changes/archive/2026-05-07-verify-honcho-mcp-tools/explorations/mcp-verification-report.mdsrc/llm/api.pysrc/llm/nous_refresh.pytests/llm/test_nous_refresh.py
✅ Files skipped from review due to trivial changes (1)
- openspec/changes/archive/2026-05-07-verify-honcho-mcp-tools/explorations/mcp-verification-report.md
- Introduced new capability for tracing LLM calls as Langfuse generation observations. - Updated `honcho_llm_call` to use `@conditional_observe` with `as_type="generation"`. - Modified `update_current_langfuse_observation` to set the `model` field and preserve metadata. - Added comprehensive documentation and tasks for verification and future enhancements. - Created exploration and proposal documents to formalize the integration process.
…custom models and initialize project documentation
…name passing to honcho_llm_call for improved Langfuse telemetry aggregation.
- Force fallback on first retryable failure (429/5xx/timeout) - Configurable via FALLBACK_MODEL_CONFIG__ env vars - Langfuse signal: is_fallback=true on generation span - Configurable timeout via LLM_DEFAULT_TIMEOUT=180.0 - 26 tests (23 unit + 3 integration) all passing - OpenSpec spec synced and archived
…ore, test precision, unused import
- Extract duplicated Langfuse usage reporting into _update_langfuse_usage() helper - Add NousAuthError custom exception in src/exceptions.py - Replace RuntimeError with NousAuthError in nous_refresh.py - Harden save_state() file permissions (0o600 for state file and temp) - Fix test_refresh_nous_credentials_success: properly mock httpx.AsyncClient - Fix test_track_name_updates_langfuse_span_name: expect 2 update_current_generation calls (generation + usage) - Fix update_env_key() to create .env file when missing instead of no-op - Sanitize PII in openspec docs (replace real peer IDs with placeholders)
- Extract Langfuse usage update into _update_langfuse_usage helper - Guarantee ContextVar restoration with try/finally block - Update stale docstring about fallback behavior - Fix import path in nous_refresh.py (honcho.config -> src.config) - Fix bare except with specific exceptions and logging - Make test_load_state_missing hermetic with monkeypatch - Replace no-op pass with real assertion in test_find_project_root - Redact environment identifiers in mcp-verification-report.md
849dbb9 to
53d5156
Compare
# Conflicts: # uv.lock
PR #692 Code Review: LLM Model Fallback on FailureOverall Assessment: Approve with comments ✅This is a well-architected feature with strong test coverage (950+ lines of tests). The fast-failover design is the correct choice — retrying a dead provider wastes time and tokens. The implementation is production-ready with the caveats below. ✅ Strengths
🔴 Critical Issues1. DRY Violation: Auth-refresh logic duplicated 3 times in
|
| Area | Coverage | Notes |
|---|---|---|
select_model_config_for_attempt with force_fallback |
✅ Thorough | All attempt/fallback combinations tested |
_is_retryable_error classification |
✅ Good | HTTP codes, built-in exceptions, SDK exceptions |
| Integration: primary fails → fallback succeeds | ✅ Tested | Mock backends verify call counts |
| Integration: both fail → error propagated | ✅ Tested | |
| Nous auto-refresh | ✅ Tested | test_nous_refresh.py + test_backends/test_nous_autorefresh.py |
| Streaming fallback | Integration tests only cover non-streaming path | |
| Tool loop fallback reset | No test verifies force_fallback resets per iteration |
|
| Langfuse metadata | is_fallback metadata not asserted in tests |
Recommendation: Add at least one streaming fallback test and one tool-loop iteration test.
Summary
| Category | Count |
|---|---|
| Critical (must fix before merge) | 2 |
| Medium (should fix) | 4 |
| Minor (nice to have) | 3 |
| Test gaps | 3 |
The core fallback architecture is sound. The two critical issues (DRY violation in auth-refresh, race condition in Nous refresh) should be addressed before merge. The medium issues are improvements that can follow in a subsequent PR if timeline is tight.
phuongvm
left a comment
There was a problem hiding this comment.
Hermes Agent Code Review -- PR #692
Verdict: Changes Requested (3 critical issues, 4 warnings, 5 suggestions)
What is well done
- ContextVar scoping --
force_fallbackis properly saved with a reset token (_force_fallback_token) and restored in afinallyblock. This prevents cross-call leakage. - First-failure fast failover -- The design correctly switches to fallback on the first retryable error, not after exhausting all retries. This is the right UX for rate limits and outages.
- Langfuse span metadata -- Annotating the single generation span with
is_fallbackinstead of creating separate failed-attempt spans is clean and avoids trace pollution. - Backward compatibility -- When no fallback is configured,
force_fallbackstaysFalseand the existing retry path is unchanged. - Timeout consolidation -- Replacing hardcoded
600.0(Anthropic) / SDK defaults (~60s) with a singleLLM_DEFAULT_TIMEOUTenv var is good. 180s default is reasonable. - Test coverage breadth -- 23 unit tests + 3 integration tests cover the core logic: force_fallback True/False, no-fallback config, cross-provider, final-attempt fallback.
CRITICAL
-
_is_retryable_error()catchesOSErrortoo broadlysrc/llm/api.py:308:isinstance(exc, (TimeoutError, ConnectionError, OSError))--OSErrorincludes many non-retryable errors (permission denied, disk full, etc.).ConnectionErroris already a subclass ofOSError, so theOSErrorcatch-all is redundant and dangerous.- Fix: Remove
OSErrorfrom the tuple.TimeoutErrorandConnectionErrorare sufficient for network errors. SDK-specific exceptions cover the rest.
-
Missing agent name in fallback log (spec violation)
src/llm/api.py:339-343: The fallback log message includes transport/model but does not include the agent name (deriver, dialectic, dreamer, etc.). The spec (openspec/specs/llm-model-fallback/spec.md, Requirement 3) explicitly requires:"agent=deriver, primary=<provider>/<model>, fallback=<provider>/<model>, reason=<error>".- Fix: Thread agent name through
honcho_llm_calland include it in the log message.
-
nous_refresh.pywrites to.envfile at runtimesrc/llm/nous_refresh.py: Theupdate_env_key()function modifies the.envfile on disk after refreshing credentials. In containerized deployments (Docker, k8s), the.envis baked into the image or mounted read-only -- this write will fail silently or cause issues.- Fix: Write to a runtime state file (like the OAuth state file) and have the credential loader check that file before falling back to
.env. Or make this a no-op when running in a container.
WARNINGS
-
No test for partial fallback config
- The spec (Requirement 8, last scenario) requires that when only
FALLBACK__TRANSPORTis set withoutFALLBACK__MODEL, the system ignores the incomplete config. No test verifies this edge case. - Fix: Add a test case with
fallback=ModelConfig(transport="anthropic", model=None)and verify it returns primary.
- The spec (Requirement 8, last scenario) requires that when only
-
settings.LLM.NOUS_API_KEYmutation in_call_with_autorefreshsrc/llm/backends/openai.py:~143:settings.LLM.NOUS_API_KEY = new_key-- mutating a PydanticBaseSettingsfield at runtime is fragile. If the settings class is re-instantiated or validated elsewhere, the mutation is lost.- Fix: Consider storing the refreshed key in a dedicated module-level variable or the OAuth state file, and have the settings getter check that source first.
-
Stream path does not get
_update_langfuse_usagesrc/llm/api.py:~456-462:_update_langfuse_usage(result)is only called forHonchoLLMCallResponse(non-stream). Streaming responses returnAsyncIterator[HonchoLLMCallStreamChunk]and skip usage reporting entirely.- Fix: Hook usage reporting into the stream consumer or document this as a known gap.
-
honcho_llm_callContextVar reset in try/finally- The
_restore_contextvars()function is defined but I did not see afinallyblock calling it in the visible diff. If the ContextVars are not reset on exit (exception or normal return), subsequent calls may inherit staleforce_fallback=Truestate. - Verify: Ensure the
try/finallywrapping the call body actually calls_restore_contextvars().
- The
SUGGESTIONS
-
select_model_config_for_attempt()docstring says "last attempt" but now also handles force_fallback -- the docstring was updated but could be clearer about precedence:force_fallback=Trueoverrides attempt number logic entirely. -
_is_retryable_error()usestype(exc).__name__string matching -- this is fragile. If SDK exception class names change, the check silently stops working. Consider importing the actual SDK exception types when available. -
LMStudio and Nous share
OpenAIBackend-- this is correct (they are OpenAI-compatible), but theis_nous=Trueflag onOpenAIBackendis the only distinguishing feature. Consider whether aNousBackendsubclass would be cleaner as the Nous-specific logic grows (OAuth refresh, error handling). -
PR scope: 4,342 additions in 58 files -- this PR includes not just fallback logic but also: Langfuse generation traces, MCP QA verification, Nous OAuth refresh, Docker/MCP infrastructure, and OpenSpec archival. Consider splitting into smaller PRs for easier review.
Reviewed by Hermes Agent
- Remove OSError from _is_retryable_error() — OSError is too broad and includes non-retryable errors (permission denied, disk full). ConnectionError is already a subclass of OSError, making it redundant. - Add agent name (track_name) to fallback log message per spec Requirement 3. Log now includes: agent label, primary model, fallback model, attempt number, and error type/message. - Skip .env file write in containerized environments (Docker/k8s). .env is baked into images or mounted read-only in containers. Persisted state file from save_state() is sufficient for runtime credential management.
Summary
Introduces a robust model fallback mechanism for Honcho's LLM layer. When a primary LLM provider/model returns a retryable error (HTTP 429/5xx, timeout, connection error), the system automatically fails over to a configured fallback model on the very next attempt — no need to exhaust all retries on a dead provider first.
Problem
When an LLM provider is rate-limited (429), experiencing server errors (5xx), or unreachable (timeout/connection error), Honcho would retry on the same provider up to
retry_attempts(default 3) before giving up entirely. This meant:Solution
Fast Fallback on First Failure
A
force_fallbackContextVar tracks whether the current LLM call series has hit a retryable error. On the first detectable failure:before_retry_callbackinhoncho_llm_call()detects retryable errors via_is_retryable_error()force_fallback = Trueimmediatelyplan_attempt()function selects the fallbackModelConfigon all subsequent attemptsConfigurable via Environment Variables
Each agent can define its fallback model independently using
__-delimited Pydantic env vars:# Example: LMStudio local model as fallback for the Deriver agent DERIVER_MODEL_CONFIG__FALLBACK__TRANSPORT=lmstudio DERIVER_MODEL_CONFIG__FALLBACK__MODEL=qwen/qwen3.5-9b DERIVER_MODEL_CONFIG__FALLBACK__API_KEY=lmstudio-key DERIVER_MODEL_CONFIG__FALLBACK__BASE_URL=http://localhost:1234/v1The same pattern works for all agents:
DERIVER_MODEL_CONFIG__FALLBACK__*(memory worker)DIALECTIC_MODEL_CONFIG__FALLBACK__*(query/chat)DREAMER_MODEL_CONFIG__FALLBACK__*(memory consolidation)MCP_MODEL_CONFIG__FALLBACK__*(MCP server)Observability: Langfuse Signal
When fallback is active, the single generation span (not a separate failed-attempt span) is updated with:
is_fallback: truein metadataproviderandmodelnames appear on the spanThis keeps traces clean — no pollution from failed-attempt spans.
Configurable HTTP Timeout
Added
LLMSettings.DEFAULT_TIMEOUT(env varLLM_DEFAULT_TIMEOUT, default180.0) replacing the hardcoded600.0for Anthropic and SDK defaults (~60s) for OpenAI/LMStudio clients. All provider clients now use a consistent, configurable timeout.Architecture
Fallback Decision Logic
Retryable Error Detection
_is_retryable_error()classifies these as triggers:TimeoutErrorConnectionErrorAPITimeoutErrorAPIConnectionErrorInternalServerErrorServiceUnavailableErrorRateLimitErrorFiles Changed
Core Logic
src/llm/runtime.py_force_fallbackContextVar,force_fallbackparameter toselect_model_config_for_attempt(),is_fallbackparam toupdate_current_langfuse_observation()src/llm/api.py_is_retryable_error(),before_retry_callbackwith first-failure fallback trigger,_get_attempt_plan()withis_fallbackcomputationsrc/llm/tool_loop.pyforce_fallback.set(False)per tool iterationsrc/llm/registry.pytimeout=600.0/ SDK defaults withsettings.LLM.DEFAULT_TIMEOUT(env varLLM_DEFAULT_TIMEOUT)src/config.pyLLMSettings.DEFAULT_TIMEOUT: float = 180.0Configuration
.env.templateLLM_DEFAULT_TIMEOUTenv var documentationTests
tests/llm/test_fallback.pyselect_model_config_for_attempt,plan_attempt,_is_retryable_error, cross-provider fallback (openai→anthropic, nous→openai), ContextVar lifecycletests/llm/test_fallback_integration.pyDocumentation
CLAUDE.mdUsage
Testing
Verification
False)test_update_env_key_creates_file_if_missing,test_refresh_nous_credentials_success)Summary by CodeRabbit
New Features
Configuration
Improvements
Infrastructure
Documentation & Tests