-
Notifications
You must be signed in to change notification settings - Fork 544
feat: LLM model fallback on failure — fast failover to secondary provider/model #692
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
phuongvm
wants to merge
14
commits into
plastic-labs:main
Choose a base branch
from
phuongvm:feat/llm-model-fallback
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
Show all changes
14 commits
Select commit
Hold shift + click to select a range
1bbd426
Minor fix issue to work with lmstudio, fix building of mcp
phuonglambert 87591b7
Support LMStudio
phuonglambert 52b1043
feat: add support for Nous Research LLM provider
phuongvm 6a65436
feat: implement automatic Nous API key refreshing on authentication e…
phuongvm 3d45308
feat: integrate Langfuse for LLM observability in honcho
phuongvm c8f835e
feat: add explicit token usage reporting to Langfuse generations for …
phuongvm 9353891
refactor: replace outer summarizer span decorators with direct track_…
phuongvm 4be0598
feat: document honcho-mcp architecture and verification strategy for …
phuongvm 38fb776
feat: LLM model fallback on failure (t_124ab52c)
phuongvm 2d803e9
fix: address CodeRabbit review — HTTP 408 retryable, ContextVars rest…
phuongvm a89000a
fix: address remaining CodeRabbit review items for PR #692
phuongvm 53d5156
fix: address CodeRabbit review comments
phuongvm 4cdbb29
Merge remote-tracking branch 'origin/main' into pr-692
phuongvm e47d28c
fix(llm): address 3 critical review issues from #692
phuongvm File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -12,3 +12,7 @@ docker-compose.yml.example | |
| .vscode/** | ||
| data/** | ||
| .venv | ||
| .wrangler | ||
| **/.wrangler | ||
| mcp/.wrangler | ||
| mcp/node_modules | ||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,2 @@ | ||
| .wrangler | ||
| node_modules |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,24 @@ | ||
| # --- Stage 1: Lightning Fast Dependencies --- | ||
| FROM oven/bun:1 AS builder | ||
| WORKDIR /app | ||
|
|
||
| # Copy configuration and use Bun's superior async network stack to bypass WSL MTU hangs | ||
| COPY package.json ./ | ||
| RUN bun install | ||
|
|
||
| # Copy source code | ||
| COPY . . | ||
|
|
||
| # --- Stage 2: Node Native Runtime --- | ||
| FROM node:20 | ||
| WORKDIR /app | ||
|
|
||
| # Ingest all the raw files (including node_modules) built gracefully by Bun | ||
| COPY --from=builder /app /app | ||
|
|
||
| # Setup environment | ||
| ENV CI=true | ||
|
|
||
| # Revert boot command to native NPM/Node context to bypass Wrangler's anti-Bun blocker | ||
| # Crucially, we wrap it in a shell to inject the Docker container's environment variable into Wrangler's isolated sandbox | ||
| CMD sh -c "npx wrangler dev src/index.ts --port 8787 --ip 0.0.0.0 --var HONCHO_API_URL:$HONCHO_API_URL" |
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
Oops, something went wrong.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,31 @@ | ||
| // The "Bùa Ngải" Mock execution strategy requested by the User | ||
| import { plugin } from "bun"; | ||
|
|
||
| plugin({ | ||
| name: "cloudflare-mock", | ||
| setup(build) { | ||
| // Intercept standard resolution for cloudflare:email | ||
| build.onResolve({ filter: /^cloudflare:email$/ }, () => ({ | ||
| path: "mock-cloudflare-email", | ||
| namespace: "mock", | ||
| })); | ||
|
|
||
| // Supply a dummy payload that successfully resolves but does nothing | ||
| build.onLoad({ filter: /.*/, namespace: "mock" }, () => ({ | ||
| contents: "export default {}; export const EmailMessage = class {};", | ||
| loader: "js", | ||
| })); | ||
| }, | ||
| }); | ||
|
|
||
| // We dynamically import the main index ONLY AFTER the plugin is registered | ||
| import("./src/index.ts").then((module) => { | ||
| const port = process.env.HONCHO_MCP_PORT ? Number(process.env.HONCHO_MCP_PORT) : 8787; | ||
| Bun.serve({ | ||
| fetch: module.default.fetch as any, | ||
| port: port, | ||
| }); | ||
| console.log(`[Honcho MCP] Native mock server booted up on port ${port}!`); | ||
| }).catch(err => { | ||
| console.error("Failed to dynamically load MCP:", err); | ||
| }); |
2 changes: 2 additions & 0 deletions
2
openspec/changes/archive/2026-05-05-fix-summarizer-telemetry-spans/.openspec.yaml
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,2 @@ | ||
| schema: spec-driven | ||
| created: 2026-05-05 |
32 changes: 32 additions & 0 deletions
32
openspec/changes/archive/2026-05-05-fix-summarizer-telemetry-spans/design.md
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,32 @@ | ||
| ## Context | ||
|
|
||
| Langfuse aggregates model identification and token usage at the `GENERATION` observation level. When an observation is created as a `SPAN`, it acts purely as a grouping container and its table view lacks these columns. | ||
|
|
||
| In `src/utils/summarizer.py`, the `create_short_summary` and `create_long_summary` functions are wrapped in `@conditional_observe` decorators, emitting `SPAN` observations. However, they internally call `honcho_llm_call`, which emits a nested `GENERATION` observation. Langfuse captures the model/tokens on the inner Generation, leaving the outer Span telemetry empty in the UI. | ||
|
|
||
| Other modules (like the Dialectic Agent) avoid this by not using an outer decorator, and instead passing `track_name` directly to `honcho_llm_call()`, creating a single top-level `GENERATION` named exactly what it needs to be. | ||
|
|
||
| ## Goals / Non-Goals | ||
|
|
||
| **Goals:** | ||
| - Consolidate the Langfuse telemetry for summarizer operations into single `GENERATION` observations to expose model and token data at the root level of the trace. | ||
| - Align `src/utils/summarizer.py` with the `track_name` pattern established elsewhere in the codebase. | ||
|
|
||
| **Non-Goals:** | ||
| - Modifying the behavior, logic, or model choices of the summarizer algorithms themselves. | ||
| - Altering the implementation of `honcho_llm_call` or its core telemetry structure. | ||
|
|
||
| ## Decisions | ||
|
|
||
| **1. Remove outer `@conditional_observe` from summarizers** | ||
| Instead of having a `SPAN` encompassing a `GENERATION`, we will completely remove the outer `SPAN`. | ||
| *Rationale:* Nested observations where the inner contains all the actionable metadata lead to confusing UI experiences. A single pure generation is exactly what these functions represent. | ||
|
|
||
| **2. Pass `track_name` to `honcho_llm_call`** | ||
| We will add `track_name="Create Short Summary"` and `track_name="Create Long Summary"` to their respective `honcho_llm_call` arguments. | ||
| *Rationale:* `honcho_llm_call` is already designed to consume a `track_name` and correctly name the `GENERATION` trace. This perfectly satisfies the observability requirement without redundant decorators. | ||
|
|
||
| ## Risks / Trade-offs | ||
|
|
||
| - **Risk:** Any manual metadata that the outer `@conditional_observe` might have been capturing could be lost. | ||
| - **Mitigation:** The summarizer functions do not pass any special manual context kwargs or tags to the decorator; they simply name the span. The inner generation captures all input/output payload data reliably. |
47 changes: 47 additions & 0 deletions
47
...fix-summarizer-telemetry-spans/explorations/2026-05-05-nested-langfuse-spans.md
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,47 @@ | ||
| # Exploration: Nested Langfuse Observations & Missing Telemetry Columns | ||
|
|
||
| **Date**: 2026-05-05 | ||
| **Topic**: Root cause analysis for `Create Short Summary` (and similar traces) missing model and token attribution in the Langfuse UI. | ||
|
|
||
| ## The Problem | ||
| The user noticed that the trace observation named **"Create Short Summary"** appears in Langfuse without a defined model or token usage, despite the underlying LLM call functioning correctly. They hypothesized that this might be a widespread issue for other traces. | ||
|
|
||
| ## System Analysis | ||
|
|
||
| ### How the Summarizer works | ||
| In `src/utils/summarizer.py`, the functions are structured like this: | ||
| ```python | ||
| @conditional_observe(name="Create Short Summary") | ||
| async def create_short_summary(...) -> HonchoLLMCallResponse[str]: | ||
| # ... | ||
| return await honcho_llm_call(...) | ||
| ``` | ||
|
|
||
| ### The "Nested Observation" Conflict | ||
| 1. The `@conditional_observe` decorator on the outer function creates a **SPAN** observation (since it does not specify `as_type="generation"`). | ||
| 2. Inside that function, `honcho_llm_call` executes. `honcho_llm_call` is wrapped with `@conditional_observe(name="LLM Call", as_type="generation")`. | ||
| 3. Consequently, Langfuse records a nested hierarchy: | ||
| - **SPAN**: "Create Short Summary" *(No model/tokens)* | ||
| - **GENERATION**: "LLM Call" *(Contains model/tokens)* | ||
|
|
||
| Because Langfuse UI (specifically the trace and generations tables) only aggregates model and token statistics at the **GENERATION** level, the top-level "Create Short Summary" span appears empty. | ||
|
|
||
| ### How other modules (e.g. Dialectic Agent) avoid this | ||
| In `src/dialectic/core.py`, the `Dialectic Agent` does **not** use an outer `@conditional_observe` span. Instead, it utilizes the `track_name` argument directly: | ||
| ```python | ||
| return await honcho_llm_call( | ||
| # ... | ||
| track_name="Dialectic Agent", | ||
| ) | ||
| ``` | ||
| This causes the inner `honcho_llm_call` GENERATION observation to dynamically rename itself to "Dialectic Agent", cleanly consolidating the trace into a single generation with full token/model attribution. | ||
|
|
||
| ## Scope of the Issue | ||
| A codebase-wide search reveals that ONLY the `create_short_summary` and `create_long_summary` functions suffer from this nested `@conditional_observe` pattern. | ||
|
|
||
| ## Recommended Path Forward (Actionable Fix) | ||
| We can fix this permanently and elegantly by aligning the summarizer module with the `Dialectic Agent` pattern: | ||
| 1. **Remove** `@conditional_observe(name="Create Short Summary")` and `name="Create Long Summary"` from `src/utils/summarizer.py`. | ||
| 2. **Inject** `track_name="Create Short Summary"` and `track_name="Create Long Summary"` into their respective `honcho_llm_call(...)` invocations. | ||
|
|
||
| This will collapse the nested traces into single, pure GENERATION observations that fully populate the Langfuse UI. |
22 changes: 22 additions & 0 deletions
22
openspec/changes/archive/2026-05-05-fix-summarizer-telemetry-spans/proposal.md
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,22 @@ | ||
| ## Why | ||
|
|
||
| Langfuse UI only aggregates and displays `model` and `tokens` explicitly at the `GENERATION` observation level. Currently, the `create_short_summary` and `create_long_summary` functions in `src/utils/summarizer.py` are wrapped with `@conditional_observe`, which creates a top-level `SPAN` observation. Inside them, `honcho_llm_call` creates a nested `GENERATION` observation. Because the root observation is a `SPAN`, the Langfuse dashboard does not display model and token usage for the summarizer traces at a glance, obscuring critical telemetry for these operations. | ||
|
|
||
| ## What Changes | ||
|
|
||
| - Remove the `@conditional_observe(name="Create Short Summary")` and `@conditional_observe(name="Create Long Summary")` decorators from the summarizer functions. | ||
| - Inject the names explicitly via the `track_name="Create Short Summary"` and `track_name="Create Long Summary"` parameters in the inner `honcho_llm_call` invocations. | ||
| - This collapses the traces into single, pure `GENERATION` observations that properly bubble up their telemetry in the Langfuse UI, mirroring the successful pattern used in `Dialectic Agent`. | ||
|
|
||
| ## Capabilities | ||
|
|
||
| ### New Capabilities | ||
| None. | ||
|
|
||
| ### Modified Capabilities | ||
| - `observability-langfuse`: Require that all LLM interactions, including summarizers, cleanly propagate top-level generation traces without opaque span wrappers. | ||
|
|
||
| ## Impact | ||
|
|
||
| - `src/utils/summarizer.py`: The `@conditional_observe` decorators will be removed and replaced with explicit `track_name` kwargs in the LLM calls. | ||
| - **Langfuse Telemetry**: Summarizer operations will appear as top-level `GENERATION`s instead of nested inside `SPAN`s, granting full visibility to token and model metadata. |
Empty file.
7 changes: 7 additions & 0 deletions
7
.../2026-05-05-fix-summarizer-telemetry-spans/specs/observability-langfuse/spec.md
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,7 @@ | ||
| ## MODIFIED Requirements | ||
|
|
||
| ### Requirement: Langfuse Observability Tracing | ||
|
|
||
| #### Scenario: Summarization Tracing | ||
| - **WHEN** a background task or explicit request triggers `create_short_summary` or `create_long_summary` | ||
| - **THEN** the system MUST trace it as a top-level `GENERATION` observation without nested `SPAN` wrappers to ensure accurate model and token attribution in the Langfuse UI |
11 changes: 11 additions & 0 deletions
11
openspec/changes/archive/2026-05-05-fix-summarizer-telemetry-spans/tasks.md
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,11 @@ | ||
| ## 1. Source Modification | ||
|
|
||
| - [x] 1.1 Remove `@conditional_observe(name="Create Short Summary")` from `create_short_summary` in `src/utils/summarizer.py`. | ||
| - [x] 1.2 Inject `track_name="Create Short Summary"` into the `honcho_llm_call` within `create_short_summary`. | ||
| - [x] 1.3 Remove `@conditional_observe(name="Create Long Summary")` from `create_long_summary` in `src/utils/summarizer.py`. | ||
| - [x] 1.4 Inject `track_name="Create Long Summary"` into the `honcho_llm_call` within `create_long_summary`. | ||
|
|
||
| ## 2. Verification | ||
|
|
||
| - [x] 2.1 Trigger a summary flow within the application (e.g. via agent context limits or explicit test). | ||
| - [x] 2.2 Verify in the Langfuse UI that `Create Short Summary` and `Create Long Summary` now appear as root-level `GENERATION` observations containing valid `model` and `tokens` columns, with no empty top-level SPAN wrappers. |
2 changes: 2 additions & 0 deletions
2
openspec/changes/archive/2026-05-05-honcho-langfuse-generation-traces/.openspec.yaml
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,2 @@ | ||
| schema: spec-driven | ||
| created: 2026-05-05 |
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clarify success vs error semantics in non-retryable list.
Including
200in the non-retryable error examples is misleading because a 200 response is successful and should not enter retry/error handling.🤖 Prompt for AI Agents