4 changes: 4 additions & 0 deletions .dockerignore
Original file line number Diff line number Diff line change
@@ -12,3 +12,7 @@ docker-compose.yml.example
.vscode/**
data/**
.venv
.wrangler
**/.wrangler
mcp/.wrangler
mcp/node_modules
20 changes: 18 additions & 2 deletions .env.template
Original file line number Diff line number Diff line change
@@ -89,6 +89,7 @@ LLM_OPENAI_API_KEY=your-api-key-here
# =============================================================================
# Global LLM settings
# LLM_DEFAULT_MAX_TOKENS=2500
# LLM_DEFAULT_TIMEOUT=180.0 # HTTP timeout (seconds) for all LLM provider clients
# LLM_MAX_TOOL_OUTPUT_CHARS=10000 # Max chars for tool output (~2500 tokens)
# LLM_MAX_MESSAGE_CONTENT_CHARS=2000 # Max chars per message in tool results

@@ -119,8 +120,8 @@ LLM_OPENAI_API_KEY=your-api-key-here
# DERIVER_FLUSH_ENABLED=false # Bypass batch token threshold, process work immediately
# DERIVER_MODEL_CONFIG__FALLBACK__MODEL=
# DERIVER_MODEL_CONFIG__FALLBACK__TRANSPORT=
- # DERIVER_MODEL_CONFIG__OVERRIDES__BASE_URL=
- # DERIVER_MODEL_CONFIG__OVERRIDES__API_KEY_ENV=
+ # DERIVER_MODEL_CONFIG__FALLBACK__OVERRIDES__BASE_URL=
+ # DERIVER_MODEL_CONFIG__FALLBACK__OVERRIDES__API_KEY_ENV=

# =============================================================================
# Peer Card
@@ -168,6 +169,14 @@ LLM_OPENAI_API_KEY=your-api-key-here
# Optional backup per level (must set both or neither):
# DIALECTIC_LEVELS__max__MODEL_CONFIG__FALLBACK__MODEL=gemini-2.5-pro
# DIALECTIC_LEVELS__max__MODEL_CONFIG__FALLBACK__TRANSPORT=gemini
# DIALECTIC_LEVELS__high__MODEL_CONFIG__FALLBACK__MODEL=
# DIALECTIC_LEVELS__high__MODEL_CONFIG__FALLBACK__TRANSPORT=
# DIALECTIC_LEVELS__medium__MODEL_CONFIG__FALLBACK__MODEL=
# DIALECTIC_LEVELS__medium__MODEL_CONFIG__FALLBACK__TRANSPORT=
# DIALECTIC_LEVELS__low__MODEL_CONFIG__FALLBACK__MODEL=
# DIALECTIC_LEVELS__low__MODEL_CONFIG__FALLBACK__TRANSPORT=
# DIALECTIC_LEVELS__minimal__MODEL_CONFIG__FALLBACK__MODEL=
# DIALECTIC_LEVELS__minimal__MODEL_CONFIG__FALLBACK__TRANSPORT=

# =============================================================================
# Summary
@@ -186,6 +195,9 @@ LLM_OPENAI_API_KEY=your-api-key-here
# SUMMARY_MAX_TOKENS_SHORT=1000
# SUMMARY_MAX_TOKENS_LONG=4000
# SUMMARY_MODEL_CONFIG__FALLBACK__MODEL=
# SUMMARY_MODEL_CONFIG__FALLBACK__TRANSPORT=
# SUMMARY_MODEL_CONFIG__FALLBACK__OVERRIDES__BASE_URL=
# SUMMARY_MODEL_CONFIG__FALLBACK__OVERRIDES__API_KEY_ENV=

# =============================================================================
# Dream
@@ -201,6 +213,10 @@ LLM_OPENAI_API_KEY=your-api-key-here
# DREAM_DEDUCTION_MODEL_CONFIG__OVERRIDES__BASE_URL=https://openrouter.ai/api/v1
# DREAM_INDUCTION_MODEL_CONFIG__MODEL=your-model-here
# DREAM_INDUCTION_MODEL_CONFIG__OVERRIDES__BASE_URL=https://openrouter.ai/api/v1
# DREAM_DEDUCTION_MODEL_CONFIG__FALLBACK__MODEL=
# DREAM_DEDUCTION_MODEL_CONFIG__FALLBACK__TRANSPORT=
# DREAM_INDUCTION_MODEL_CONFIG__FALLBACK__MODEL=
# DREAM_INDUCTION_MODEL_CONFIG__FALLBACK__TRANSPORT=
# DREAM_DOCUMENT_THRESHOLD=50
# DREAM_IDLE_TIMEOUT_MINUTES=60
# DREAM_MIN_HOURS_BETWEEN_DREAMS=8
53 changes: 53 additions & 0 deletions CLAUDE.md
Original file line number Diff line number Diff line change
@@ -297,3 +297,56 @@ src/
### Notes

- Always use `uv run` or `uv` to prefix any commands related to python to ensure you use the virtual environment

## LLM Model Fallback

Honcho supports automatic fallback to a secondary LLM model when the primary model fails (rate limit, timeout, API error).

### How It Works

1. **First-failure trigger**: When the primary model returns a retryable error (429, 5xx, timeout, connection error), the system immediately switches to the fallback model on the next attempt — it does NOT wait for all retries to exhaust.
2. **Per-agent configuration**: Each agent (Deriver, Dialectic, Dreamer, Summary) has independent fallback configuration.
3. **Cross-provider support**: The fallback model can use a different provider (e.g., primary=openai, fallback=lmstudio).
4. **Backward compatible**: If no fallback is configured, behavior is identical to before.
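
The first-failure trigger described above can be sketched roughly as follows. This is a minimal illustration, not Honcho's actual implementation: `ModelTarget`, `call_with_fallback`, and the small `RETRYABLE` tuple are hypothetical names introduced here, and the real retry loop may differ in backoff, logging, and error classification.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional


@dataclass
class ModelTarget:
    """Hypothetical (transport, model) pair; not Honcho's real config type."""
    transport: str
    model: str


# Assumption: a reduced stand-in for the retryable error set.
RETRYABLE = (TimeoutError, ConnectionError)


async def call_with_fallback(
    call: Callable[[ModelTarget], Awaitable[str]],
    primary: ModelTarget,
    fallback: Optional[ModelTarget],
    max_attempts: int = 3,
) -> tuple:
    """Return (result, is_fallback), switching targets after the first retryable failure."""
    target = primary
    for attempt in range(1, max_attempts + 1):
        try:
            return await call(target), target is not primary
        except RETRYABLE:
            if attempt == max_attempts:
                raise
            if fallback is not None:
                # Switch immediately instead of exhausting retries on the primary.
                target = fallback
    raise ValueError("max_attempts must be at least 1")
```

With no fallback configured, `target` never changes, so the loop simply retries the primary, which matches the backward-compatible behavior in item 4.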

### Configuration

Set these environment variables for each agent:

```bash
# Deriver
DERIVER_MODEL_CONFIG__FALLBACK__TRANSPORT=lmstudio
DERIVER_MODEL_CONFIG__FALLBACK__MODEL=qwen/qwen3.5-9b

# Dialectic (per reasoning level)
DIALECTIC_LEVELS__low__MODEL_CONFIG__FALLBACK__TRANSPORT=lmstudio
DIALECTIC_LEVELS__low__MODEL_CONFIG__FALLBACK__MODEL=qwen/qwen3.5-9b

# Summary
SUMMARY_MODEL_CONFIG__FALLBACK__TRANSPORT=lmstudio
SUMMARY_MODEL_CONFIG__FALLBACK__MODEL=qwen/qwen3.5-9b

# Dream
DREAM_DEDUCTION_MODEL_CONFIG__FALLBACK__TRANSPORT=lmstudio
DREAM_DEDUCTION_MODEL_CONFIG__FALLBACK__MODEL=qwen/qwen3.5-9b
DREAM_INDUCTION_MODEL_CONFIG__FALLBACK__TRANSPORT=lmstudio
DREAM_INDUCTION_MODEL_CONFIG__FALLBACK__MODEL=qwen/qwen3.5-9b
```
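
To make the naming scheme easier to read, here is a small sketch of how the double-underscore variable names could decompose into nested settings. It assumes `__` acts as a nesting delimiter (as is common with settings loaders); `nest_env` and `fallback_pair_ok` are hypothetical helpers written for illustration and are not part of Honcho.

```python
def nest_env(env: dict, prefix: str, delimiter: str = "__") -> dict:
    """Fold prefix-scoped, delimiter-separated variable names into a nested dict."""
    tree: dict = {}
    for name, value in env.items():
        if not name.startswith(prefix):
            continue
        keys = name[len(prefix):].split(delimiter)
        node = tree
        for key in keys[:-1]:
            node = node.setdefault(key.lower(), {})
        node[keys[-1].lower()] = value
    return tree


def fallback_pair_ok(fallback: dict) -> bool:
    """Check the 'set both or neither' rule for MODEL and TRANSPORT."""
    return ("model" in fallback) == ("transport" in fallback)


example = {
    "DERIVER_MODEL_CONFIG__FALLBACK__TRANSPORT": "lmstudio",
    "DERIVER_MODEL_CONFIG__FALLBACK__MODEL": "qwen/qwen3.5-9b",
}
config = nest_env(example, prefix="DERIVER_")
fallback = config["model_config"]["fallback"]
```

Here `fallback` holds both `transport` and `model`, so `fallback_pair_ok(fallback)` passes; setting only one of the two variables would fail the check.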

### Observability

- **Logs**: WARNING-level log emitted when fallback is activated, including primary→fallback provider/model info
- **Langfuse**: Generation span metadata includes `is_fallback: true` when fallback model is used
- **No separate span**: Only the successful attempt's span is kept (failed attempts do not create separate spans)

### Retryable Errors

The following errors trigger fast fallback:
- HTTP 429 (Too Many Requests / Rate Limit)
- HTTP 5xx (Server Errors)
- `TimeoutError`
- `ConnectionError`
- `OSError`
- SDK-specific: `APIConnectionError`, `APITimeoutError`, `InternalServerError`, `ServiceUnavailableError`, `RateLimitError`

Non-retryable errors (400, 200, ValueError, etc.) do NOT trigger fallback — they follow normal retry behavior.
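
A classification along the lines of the list above might look like the sketch below. It is illustrative only: `is_retryable` is a hypothetical helper, the SDK-specific exception classes are left out because they depend on which provider SDK is installed, and the real check in the codebase may be structured differently.

```python
from typing import Optional

# In Python 3, TimeoutError and ConnectionError are both subclasses of OSError,
# so listing OSError alone would already cover them; all three are kept here
# to mirror the documented list.
RETRYABLE_EXCEPTIONS = (TimeoutError, ConnectionError, OSError)


def is_retryable(exc: BaseException, status_code: Optional[int] = None) -> bool:
    """Return True when an error should trigger fast fallback."""
    if status_code is not None:
        # HTTP 429 and any 5xx are treated as transient.
        return status_code == 429 or 500 <= status_code <= 599
    return isinstance(exc, RETRYABLE_EXCEPTIONS)
```

Only failures reach this check; a successful response is never classified at all.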

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Clarify success vs error semantics in non-retryable list.

Including 200 in the non-retryable error examples is misleading because a 200 response is successful and should not enter retry/error handling.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@CLAUDE.md` at line 321, Update the misleading example in the line that
currently reads "Non-retryable errors (400, 200, ValueError, etc.)..." by
removing the 200 status and replacing it with appropriate non-success examples
(e.g., 4xx codes such as 400 or 404) or rewording to "Non-retryable errors
(e.g., 400, 404, ValueError)" so the phrase "Non-retryable errors" no longer
lists a 200 success code; locate the exact sentence "Non-retryable errors (400,
200, ValueError, etc.)..." and change it accordingly.

2 changes: 2 additions & 0 deletions mcp/.dockerignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
.wrangler
node_modules
24 changes: 24 additions & 0 deletions mcp/Dockerfile
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
# --- Stage 1: Lightning Fast Dependencies ---
FROM oven/bun:1 AS builder
WORKDIR /app

# Copy configuration and use Bun's superior async network stack to bypass WSL MTU hangs
COPY package.json ./
RUN bun install

# Copy source code
COPY . .

# --- Stage 2: Node Native Runtime ---
FROM node:20
WORKDIR /app

# Ingest all the raw files (including node_modules) built gracefully by Bun
COPY --from=builder /app /app

# Setup environment
ENV CI=true

# Revert boot command to native NPM/Node context to bypass Wrangler's anti-Bun blocker
# Crucially, we wrap it in a shell to inject the Docker container's environment variable into Wrangler's isolated sandbox
CMD sh -c "npx wrangler dev src/index.ts --port 8787 --ip 0.0.0.0 --var HONCHO_API_URL:$HONCHO_API_URL"
8 changes: 2 additions & 6 deletions mcp/bun.lock

Some generated files are not rendered by default.

4 changes: 4 additions & 0 deletions mcp/package.json
Original file line number Diff line number Diff line change
@@ -21,6 +21,10 @@
    "nanoid": "^5.1.7",
    "zod": "^4.3.6"
  },
  "browser": {
    "cloudflare:email": false,
    "core-js-pure/features/instance/trim": false
  },
  "devDependencies": {
    "@cloudflare/workers-types": "^4.20241002.0",
    "typescript": "^5.3.3",
31 changes: 31 additions & 0 deletions mcp/run-mocked.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,31 @@
// The "Bùa Ngải" Mock execution strategy requested by the User
import { plugin } from "bun";

plugin({
  name: "cloudflare-mock",
  setup(build) {
    // Intercept standard resolution for cloudflare:email
    build.onResolve({ filter: /^cloudflare:email$/ }, () => ({
      path: "mock-cloudflare-email",
      namespace: "mock",
    }));

    // Supply a dummy payload that successfully resolves but does nothing
    build.onLoad({ filter: /.*/, namespace: "mock" }, () => ({
      contents: "export default {}; export const EmailMessage = class {};",
      loader: "js",
    }));
  },
});

// We dynamically import the main index ONLY AFTER the plugin is registered
import("./src/index.ts").then((module) => {
  const port = process.env.HONCHO_MCP_PORT ? Number(process.env.HONCHO_MCP_PORT) : 8787;
  Bun.serve({
    fetch: module.default.fetch as any,
    port: port,
  });
  console.log(`[Honcho MCP] Native mock server booted up on port ${port}!`);
}).catch(err => {
  console.error("Failed to dynamically load MCP:", err);
});
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
schema: spec-driven
created: 2026-05-05
Original file line number Diff line number Diff line change
@@ -0,0 +1,32 @@
## Context

Langfuse aggregates model identification and token usage at the `GENERATION` observation level. When an observation is created as a `SPAN`, it acts purely as a grouping container and its table view lacks these columns.

In `src/utils/summarizer.py`, the `create_short_summary` and `create_long_summary` functions are wrapped in `@conditional_observe` decorators, emitting `SPAN` observations. However, they internally call `honcho_llm_call`, which emits a nested `GENERATION` observation. Langfuse captures the model/tokens on the inner Generation, leaving the outer Span telemetry empty in the UI.

Other modules (like the Dialectic Agent) avoid this by not using an outer decorator, and instead passing `track_name` directly to `honcho_llm_call()`, creating a single top-level `GENERATION` named exactly what it needs to be.

## Goals / Non-Goals

**Goals:**
- Consolidate the Langfuse telemetry for summarizer operations into single `GENERATION` observations to expose model and token data at the root level of the trace.
- Align `src/utils/summarizer.py` with the `track_name` pattern established elsewhere in the codebase.

**Non-Goals:**
- Modifying the behavior, logic, or model choices of the summarizer algorithms themselves.
- Altering the implementation of `honcho_llm_call` or its core telemetry structure.

## Decisions

**1. Remove outer `@conditional_observe` from summarizers**
Instead of having a `SPAN` encompassing a `GENERATION`, we will completely remove the outer `SPAN`.
*Rationale:* Nested observations where the inner contains all the actionable metadata lead to confusing UI experiences. A single pure generation is exactly what these functions represent.

**2. Pass `track_name` to `honcho_llm_call`**
We will add `track_name="Create Short Summary"` and `track_name="Create Long Summary"` to their respective `honcho_llm_call` arguments.
*Rationale:* `honcho_llm_call` is already designed to consume a `track_name` and correctly name the `GENERATION` trace. This perfectly satisfies the observability requirement without redundant decorators.

## Risks / Trade-offs

- **Risk:** Any manual metadata that the outer `@conditional_observe` might have been capturing could be lost.
- **Mitigation:** The summarizer functions do not pass any special manual context kwargs or tags to the decorator; they simply name the span. The inner generation captures all input/output payload data reliably.
Original file line number Diff line number Diff line change
@@ -0,0 +1,47 @@
# Exploration: Nested Langfuse Observations & Missing Telemetry Columns

**Date**: 2026-05-05
**Topic**: Root cause analysis for `Create Short Summary` (and similar traces) missing model and token attribution in the Langfuse UI.

## The Problem
The user noticed that the trace observation named **"Create Short Summary"** appears in Langfuse without a defined model or token usage, despite the underlying LLM call functioning correctly. They hypothesized that this might be a widespread issue for other traces.

## System Analysis

### How the Summarizer works
In `src/utils/summarizer.py`, the functions are structured like this:
```python
@conditional_observe(name="Create Short Summary")
async def create_short_summary(...) -> HonchoLLMCallResponse[str]:
    # ...
    return await honcho_llm_call(...)
```

### The "Nested Observation" Conflict
1. The `@conditional_observe` decorator on the outer function creates a **SPAN** observation (since it does not specify `as_type="generation"`).
2. Inside that function, `honcho_llm_call` executes. `honcho_llm_call` is wrapped with `@conditional_observe(name="LLM Call", as_type="generation")`.
3. Consequently, Langfuse records a nested hierarchy:
   - **SPAN**: "Create Short Summary" *(No model/tokens)*
     - **GENERATION**: "LLM Call" *(Contains model/tokens)*

Because the Langfuse UI (specifically the trace and generations tables) only aggregates model and token statistics at the **GENERATION** level, the top-level "Create Short Summary" span appears empty.

### How other modules (e.g. Dialectic Agent) avoid this
In `src/dialectic/core.py`, the `Dialectic Agent` does **not** use an outer `@conditional_observe` span. Instead, it utilizes the `track_name` argument directly:
```python
return await honcho_llm_call(
    # ...
    track_name="Dialectic Agent",
)
```
This causes the inner `honcho_llm_call` GENERATION observation to dynamically rename itself to "Dialectic Agent", cleanly consolidating the trace into a single generation with full token/model attribution.

## Scope of the Issue
A codebase-wide search reveals that ONLY the `create_short_summary` and `create_long_summary` functions suffer from this nested `@conditional_observe` pattern.

## Recommended Path Forward (Actionable Fix)
We can fix this permanently and elegantly by aligning the summarizer module with the `Dialectic Agent` pattern:
1. **Remove** the `@conditional_observe(name="Create Short Summary")` and `@conditional_observe(name="Create Long Summary")` decorators from `src/utils/summarizer.py`.
2. **Inject** `track_name="Create Short Summary"` and `track_name="Create Long Summary"` into their respective `honcho_llm_call(...)` invocations.

This will collapse the nested traces into single, pure GENERATION observations that fully populate the Langfuse UI.
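
As a rough illustration of the recommended shape, the sketch below uses a stub in place of `honcho_llm_call` and an in-memory list in place of Langfuse. Everything except the `track_name` values is an assumption made for this example: `honcho_llm_call_stub`, `OBSERVATIONS`, the `"stub-model"` value, and the word-count token figure are hypothetical and do not reflect the real signature or telemetry.

```python
import asyncio
from typing import Optional

# Hypothetical stand-in for a tracing backend: one dict per recorded observation.
OBSERVATIONS: list = []


async def honcho_llm_call_stub(prompt: str, track_name: Optional[str] = None) -> str:
    """Stub for `honcho_llm_call`: records one GENERATION, renamed by `track_name`."""
    OBSERVATIONS.append({
        "type": "GENERATION",
        "name": track_name or "LLM Call",
        "model": "stub-model",
        "tokens": len(prompt.split()),
    })
    return "summary of: " + prompt


async def create_short_summary(prompt: str) -> str:
    # No outer span decorator; the name travels through `track_name` instead.
    return await honcho_llm_call_stub(prompt, track_name="Create Short Summary")


asyncio.run(create_short_summary("hello world"))
```

After the call, `OBSERVATIONS` holds a single GENERATION entry named "Create Short Summary" that carries the model and token fields, with no empty wrapper entry above it.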
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
## Why

The Langfuse UI only aggregates and displays `model` and `tokens` at the `GENERATION` observation level. Currently, the `create_short_summary` and `create_long_summary` functions in `src/utils/summarizer.py` are wrapped with `@conditional_observe`, which creates a top-level `SPAN` observation. Inside them, `honcho_llm_call` creates a nested `GENERATION` observation. Because the root observation is a `SPAN`, the Langfuse dashboard does not display model and token usage for the summarizer traces at a glance, obscuring critical telemetry for these operations.

## What Changes

- Remove the `@conditional_observe(name="Create Short Summary")` and `@conditional_observe(name="Create Long Summary")` decorators from the summarizer functions.
- Inject the names explicitly via the `track_name="Create Short Summary"` and `track_name="Create Long Summary"` parameters in the inner `honcho_llm_call` invocations.
- This collapses the traces into single, pure `GENERATION` observations that properly bubble up their telemetry in the Langfuse UI, mirroring the successful pattern used in `Dialectic Agent`.

## Capabilities

### New Capabilities
None.

### Modified Capabilities
- `observability-langfuse`: Require that all LLM interactions, including summarizers, cleanly propagate top-level generation traces without opaque span wrappers.

## Impact

- `src/utils/summarizer.py`: The `@conditional_observe` decorators will be removed and replaced with explicit `track_name` kwargs in the LLM calls.
- **Langfuse Telemetry**: Summarizer operations will appear as top-level `GENERATION`s instead of nested inside `SPAN`s, granting full visibility to token and model metadata.
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
## MODIFIED Requirements

### Requirement: Langfuse Observability Tracing

#### Scenario: Summarization Tracing
- **WHEN** a background task or explicit request triggers `create_short_summary` or `create_long_summary`
- **THEN** the system MUST trace it as a top-level `GENERATION` observation without nested `SPAN` wrappers to ensure accurate model and token attribution in the Langfuse UI
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
## 1. Source Modification

- [x] 1.1 Remove `@conditional_observe(name="Create Short Summary")` from `create_short_summary` in `src/utils/summarizer.py`.
- [x] 1.2 Inject `track_name="Create Short Summary"` into the `honcho_llm_call` within `create_short_summary`.
- [x] 1.3 Remove `@conditional_observe(name="Create Long Summary")` from `create_long_summary` in `src/utils/summarizer.py`.
- [x] 1.4 Inject `track_name="Create Long Summary"` into the `honcho_llm_call` within `create_long_summary`.

## 2. Verification

- [x] 2.1 Trigger a summary flow within the application (e.g. via agent context limits or explicit test).
- [x] 2.2 Verify in the Langfuse UI that `Create Short Summary` and `Create Long Summary` now appear as root-level `GENERATION` observations containing valid `model` and `tokens` columns, with no empty top-level SPAN wrappers.
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
schema: spec-driven
created: 2026-05-05