Problem
When a conversation approaches or exceeds the model's context window, users see raw HTTP error messages with request IDs, URLs, and JSON payloads instead of actionable guidance. There is no automatic recovery: the session is stuck until the user discovers the `/compact` workaround by trial and error.
Observed errors
Two distinct failure patterns have been observed from the same session:
1. Anthropic 500 Internal Server Error (opaque context overflow)
```
all models failed: error receiving from stream: POST "https://api.anthropic.com/v1/messages?beta=true": 500 Internal Server Error (Request-ID: req_011CYBcGTSLUALfwB2eaiNPT)
{"type":"error","error":{"type":"api_error","message":"Internal server error"},"request_id":"req_011CYBcGTSLUALfwB2eaiNPT"}
```
Anthropic sometimes returns a 500 instead of a clear 400 when the payload is too large. Since 500 is classified as retryable by `isRetryableModelError()`, we retry the same request 3 times with exponential backoff, and every attempt fails identically because the context hasn't changed.
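A fail-fast heuristic for this case could look like the following sketch. The names here (`shouldRetry500`, `estimatedTokens`, `contextLimit`) are illustrative assumptions, not the actual `isRetryableModelError()` signature: a 500 stays retryable unless the request was already near the context window, in which case retrying the identical payload cannot succeed.

```go
package main

import "fmt"

// shouldRetry500 is a hypothetical helper: treat a 500 as retryable only
// when the request was comfortably under the context window. If we were
// already near the limit, assume context overflow and fail fast rather
// than burning retries on an unchanged oversized payload.
func shouldRetry500(estimatedTokens, contextLimit int64) bool {
	return float64(estimatedTokens) < float64(contextLimit)*0.9
}

func main() {
	fmt.Println(shouldRetry500(120000, 200000)) // plausibly transient: retry
	fmt.Println(shouldRetry500(226360, 200000)) // likely overflow: fail fast
}
```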
2. Anthropic 400 thinking budget / prompt too long cascade
```
all models failed: error receiving from stream: POST "https://api.anthropic.com/v1/messages?beta=true": 400 Bad Request
`max_tokens` must be greater than `thinking.budget_tokens`
```
```
all models failed: error receiving from stream: POST "https://api.anthropic.com/v1/messages": 400 Bad Request
prompt is too long: 226360 tokens > 200000 maximum
```
This is a two-step cascade:
- First, the Beta API path (with thinking enabled) fails because the prompt is so large that `max_tokens` can't accommodate the thinking budget. Anthropic validates `max_tokens > thinking.budget_tokens` before checking prompt length, so this is the error we get.
- Then, the non-Beta fallback path (without thinking) sends the same oversized prompt and gets the real error: `prompt is too long: 226360 tokens > 200000 maximum`.

The adapter's `retryFn` tries to fix this by clamping `max_tokens`, but that doesn't help when the input itself exceeds the context limit.
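Why the clamp can't help is easy to see in a sketch. All names and values below are illustrative assumptions (this is not the adapter's actual `retryFn`): the largest valid `max_tokens` is bounded by the context window minus the prompt, which is already negative once the prompt alone exceeds the limit.

```go
package main

import "fmt"

// clampMaxTokens is a hypothetical version of the clamp-style retry fix:
// it tries to pick a max_tokens that satisfies both constraints
// (max_tokens > thinking budget, prompt + max_tokens <= context limit).
// It returns 0 when no valid value exists.
func clampMaxTokens(maxTokens, thinkingBudget, contextLimit, promptTokens int64) int64 {
	remaining := contextLimit - promptTokens
	if remaining <= thinkingBudget {
		// The prompt already consumes (or exceeds) the window: no
		// max_tokens can be both > thinkingBudget and <= remaining.
		return 0
	}
	if maxTokens <= thinkingBudget {
		maxTokens = thinkingBudget + 1 // satisfy the thinking constraint
	}
	if maxTokens > remaining {
		maxTokens = remaining // stay within the context window
	}
	return maxTokens
}

func main() {
	// The observed failure: 226360-token prompt against a 200k window.
	fmt.Println(clampMaxTokens(32000, 16000, 200000, 226360)) // 0: unfixable
	// A normal case where clamping does work.
	fmt.Println(clampMaxTokens(8000, 16000, 200000, 100000)) // 16001
}
```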
How to reproduce
- Start a long session with many tool calls producing large outputs (file reads, search results, etc.)
- Continue using the session past ~90% of the model's context window (200k tokens for Claude)
- The next request may trigger either error pattern above
- The user sees the raw error and the session appears broken
- If the user sends any follow-up message (e.g., "continue"), the compaction check fires on the next loop iteration: the stale token counts from the previous successful response now cross the 90% threshold, compaction runs, and the conversation resumes
Root cause
The compaction check in `runtime.go` uses stale token counts from the previous response:
```go
if sess.InputTokens+sess.OutputTokens > int64(float64(contextLimit)*0.9) {
	r.Summarize(ctx, sess, "", events)
}
```
These counts (`sess.InputTokens`, `sess.OutputTokens`) are updated only after a successful response. If a single turn adds enough tokens (e.g., multiple large tool outputs) to push usage from ~88% to over 100%, the threshold never trips: the oversized request is sent to the API and fails.
Specific issues
- **Raw error messages shown to the user.** The full HTTP error with request IDs, URLs, and JSON payloads is displayed as-is in the TUI (`events <- Error(err.Error())` in `runtime.go`). Not user-friendly and not actionable.
- **No auto-recovery on context overflow.** When the error is clearly caused by context overflow (either the 500 heuristic or the explicit `prompt is too long` 400), we could auto-compact and retry instead of stopping the session.
- **Wasteful retries on 500 context overflow.** A 500 caused by context overflow will fail on every retry, since the context doesn't change between attempts. We burn time on 3 attempts plus exponential backoff for nothing.
- **Stale token count in compaction check.** The 90% threshold relies on token counts that don't include the current turn's tool outputs, so the check can miss overflow when a single turn contributes a large number of tokens.
- **`max_tokens`/`thinking.budget_tokens` error is a red herring.** When the prompt is too large, the Beta API path fails with a confusing thinking-budget constraint error before the real "prompt too long" error appears on the non-Beta fallback. This masks the actual problem.
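For the auto-recovery case, the explicit 400 is at least machine-detectable. The sketch below matches the observed error text from above to classify it as context overflow; the function name is a hypothetical helper, not an existing one in the codebase.

```go
package main

import (
	"fmt"
	"regexp"
)

// promptTooLongRe matches the message format observed in the 400 response:
// "prompt is too long: 226360 tokens > 200000 maximum".
var promptTooLongRe = regexp.MustCompile(`prompt is too long: (\d+) tokens > (\d+) maximum`)

// isContextOverflow is a hypothetical classifier the runtime could use to
// decide to auto-compact and retry instead of surfacing the raw error.
func isContextOverflow(errMsg string) bool {
	return promptTooLongRe.MatchString(errMsg)
}

func main() {
	msg := `POST "https://api.anthropic.com/v1/messages": 400 Bad Request
prompt is too long: 226360 tokens > 200000 maximum`
	fmt.Println(isContextOverflow(msg))
	fmt.Println(isContextOverflow("Internal server error"))
}
```

The 500 heuristic from pattern 1 would still be needed alongside this, since the opaque `api_error` body carries no usable signal on its own.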