
Context window overflow produces raw error messages and no auto-recovery #1750

@aheritier

Description


Problem

When a conversation approaches or exceeds the model's context window, users see raw HTTP error messages with request IDs, URLs, and JSON payloads instead of actionable guidance. There is no automatic recovery — the session is stuck until the user discovers the /compact workaround by trial and error.

Observed errors

Two distinct failure patterns have been observed from the same session:

1. Anthropic 500 Internal Server Error (opaque context overflow)

all models failed: error receiving from stream: POST "https://api.anthropic.com/v1/messages?beta=true": 500 Internal Server Error (Request-ID: req_011CYBcGTSLUALfwB2eaiNPT)
{"type":"error","error":{"type":"api_error","message":"Internal server error"},"request_id":"req_011CYBcGTSLUALfwB2eaiNPT"}

Anthropic sometimes returns a 500 instead of a clear 400 when the payload is too large. Since 500 is classified as retryable by isRetryableModelError(), we retry the same request 3 times with exponential backoff — all failing identically because the context hasn't changed.

2. Anthropic 400 thinking budget / prompt too long cascade

all models failed: error receiving from stream: POST "https://api.anthropic.com/v1/messages?beta=true": 400 Bad Request
`max_tokens` must be greater than `thinking.budget_tokens`
all models failed: error receiving from stream: POST "https://api.anthropic.com/v1/messages": 400 Bad Request
prompt is too long: 226360 tokens > 200000 maximum

This is a two-step cascade:

  • First, the Beta API path (with thinking enabled) fails because the prompt is so large that max_tokens can't accommodate the thinking budget. Anthropic validates max_tokens > thinking.budget_tokens before checking prompt length, so this is the error we get.
  • Then, the non-Beta fallback path (without thinking) sends the same oversized prompt and gets the real error: prompt is too long: 226360 tokens > 200000 maximum.

The adapter's retryFn tries to fix this by clamping max_tokens, but that doesn't help when the input itself exceeds the context limit.

How to reproduce

  1. Start a long session with many tool calls producing large outputs (file reads, search results, etc.)
  2. Continue using the session past ~90% of the model's context window (200k tokens for Claude)
  3. The next request may trigger either error pattern above
  4. The user sees the raw error and the session appears broken
  5. If the user sends any follow-up message (e.g., "continue"), the compaction check fires on the next loop iteration, because the token counts recorded from the last successful response now cross the 90% threshold. Compaction runs and the conversation resumes

Root cause

The compaction check in runtime.go uses stale token counts from the previous response:

if sess.InputTokens+sess.OutputTokens > int64(float64(contextLimit)*0.9) {
    r.Summarize(ctx, sess, "", events)
}

These counts (sess.InputTokens, sess.OutputTokens) are updated only after a successful response. If a single turn adds enough tokens (e.g., multiple large tool outputs) to push from ~88% to over 100%, the threshold doesn't trigger. The oversized request is sent to the API and fails.

Specific issues

  1. Raw error messages shown to the user — The full HTTP error with request IDs, URLs, and JSON payloads is displayed as-is in the TUI (events <- Error(err.Error()) in runtime.go). Not user-friendly and not actionable.

  2. No auto-recovery on context overflow — When the error is clearly caused by context overflow (either the 500 heuristic or the explicit prompt is too long 400), we could auto-compact and retry instead of stopping the session.

  3. Wasteful retries on 500 context overflow — A 500 caused by context overflow will fail on every retry since the context doesn't change between attempts. We burn time on 3 attempts + exponential backoff for nothing.

  4. Stale token count in compaction check — The 90% threshold relies on token counts that don't include the current turn's tool outputs. The check can miss overflow when a single turn contributes a large number of tokens.

  5. max_tokens / thinking.budget_tokens error is a red herring — When the prompt is too large, the Beta API path fails with a confusing thinking budget constraint error before the real "prompt too long" error appears on the non-Beta fallback. This masks the actual problem.
