fix(agent): gate handoff on provider token usage, not byte estimate by tlongwell-block · Pull Request #821 · block/sprout

tlongwell-block · 2026-06-02T19:29:00Z

Problem

Agents fill their context, the inference provider returns a 400 on the next
request, and the handoff never fires: the turn dies and the user's message can
be silently dropped (the batch dead-letters after retries). Root cause: the
handoff gate measured bytes (max_history_bytes * 0.75, default ~12 MiB)
while the real limit is in tokens. A typical model's token window is
exhausted long before the byte threshold is reached, so the gate was
effectively dead code, and the resulting 400 had no recovery path.
(Investigation: RESEARCH/SPROUT_HANDOFF_400_ROOT_CAUSE.md.)
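
To put a number on how unreachable the old gate was, here is a rough sketch (not code from this PR). It computes the bytes-per-token ratio the history would need for the byte gate to fire before the token window fills. The function name is hypothetical, and the inputs in the note below are the ~12 MiB default from above plus 128_000 tokens as a stand-in window.

```rust
/// Bytes-per-token ratio the history would need to average for a byte gate
/// at `byte_threshold` to fire before a window of `token_window` tokens fills.
/// Hypothetical helper for illustration only; not part of sprout-agent.
fn bytes_per_token_needed(byte_threshold: u64, token_window: u64) -> f64 {
    byte_threshold as f64 / token_window as f64
}
```

Calling bytes_per_token_needed(12 * 1024 * 1024, 128_000) gives roughly 98 bytes per token. Ordinary prose and JSON typically tokenize at a handful of bytes per token, so the window fills long before the byte gate can trip.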

Fix

Gate the handoff on the provider's reported input-token usage, which every
supported provider returns on the (non-streaming) response we already parse.

- LlmResponse gains input_tokens: Option<u64>, the cache-summed input
  total. Anthropic/Databricks sum input_tokens + cache_read + cache_creation
  (plain input_tokens excludes cached tokens); OpenAI uses prompt_tokens;
  Responses uses input_tokens. Missing usage → None.
- Session stores last_request_input_tokens paired with
  last_request_history_bytes (captured at the same instant: the history
  actually sent to that request).
- should_handoff is token-first: it projects
  measured_tokens + estimate(current_bytes - measured_bytes) against
  token_threshold = min(window * 3/4, window - max_output_tokens)
  (reserves output headroom). The projection accounts for history appended
  since the measurement (tool results, next prompt), closing the stale-usage
  gap. Estimate uses 1 byte/token, a guaranteed upper bound on tokens.
- First request / missing usage → conservative byte fallback, capped by the
  existing byte budget so it can only be more conservative than before.
- Both fields cleared on handoff (context reset) and preserved on usage: None.
- New config knob SPROUT_AGENT_MAX_CONTEXT_TOKENS (default 128_000),
  validated to exceed max_output_tokens.
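
The gate described in the list above can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: the Session struct is a pared-down stand-in, the method names record and clear and the function signatures are hypothetical, the comparison is assumed to be >=, and the fallback is assumed to compare raw bytes against min(token_threshold, byte_budget).

```rust
/// Pared-down stand-in for the agent session (hypothetical shape).
#[derive(Default)]
struct Session {
    last_request_input_tokens: Option<u64>,
    last_request_history_bytes: Option<u64>,
}

impl Session {
    /// Store usage together with the history size sent on the same request.
    /// Missing usage (None) leaves the previous pair untouched.
    fn record(&mut self, input_tokens: Option<u64>, history_bytes_sent: u64) {
        if let Some(tokens) = input_tokens {
            self.last_request_input_tokens = Some(tokens);
            self.last_request_history_bytes = Some(history_bytes_sent);
        }
    }

    /// Context reset on handoff: both fields are cleared.
    fn clear(&mut self) {
        *self = Session::default();
    }
}

/// min(window * 3/4, window - max_output_tokens), saturating at zero.
fn token_threshold(window: u64, max_output_tokens: u64) -> u64 {
    let fraction = window.saturating_mul(3) / 4;
    let headroom = window.saturating_sub(max_output_tokens);
    fraction.min(headroom)
}

/// Token-first gate with a byte fallback (assumed semantics, see lead-in).
fn should_handoff(
    session: &Session,
    current_bytes: u64,
    window: u64,
    max_output_tokens: u64,
    byte_budget: u64,
) -> bool {
    let threshold = token_threshold(window, max_output_tokens);
    match (
        session.last_request_input_tokens,
        session.last_request_history_bytes,
    ) {
        (Some(measured_tokens), Some(measured_bytes)) => {
            // History appended since the measurement, counted at
            // 1 byte/token so the projection never underestimates.
            let grown = current_bytes.saturating_sub(measured_bytes);
            measured_tokens.saturating_add(grown) >= threshold
        }
        // First request or no usage yet: treat every byte as a token,
        // capped by the existing byte budget.
        _ => current_bytes >= threshold.min(byte_budget),
    }
}
```

With a 128_000-token window and 8_192 output tokens the threshold is 96_000. A measurement of 90_000 tokens at 400_000 history bytes does not hand off on its own, but 10_000 further bytes of tool output push the projection to 100_000 and the gate fires.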

Non-streaming only (sprout-agent does not stream). No tokenizer dependency.
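
Since there is no tokenizer, the count comes entirely from the usage block each provider returns. A minimal sketch of the cache-summed total described in the Fix list, assuming the usage has already been deserialized: the Usage enum and total_input_tokens are hypothetical names, and the field names follow the shorthand used in this description rather than any provider's exact wire format.

```rust
/// Already-deserialized usage blocks (hypothetical shapes; field names follow
/// the PR description's shorthand, not the exact provider wire format).
enum Usage {
    /// Anthropic / Databricks: plain input_tokens excludes cached tokens.
    Anthropic {
        input_tokens: u64,
        cache_read: u64,
        cache_creation: u64,
    },
    /// OpenAI chat completions.
    OpenAi { prompt_tokens: u64 },
    /// Responses API.
    Responses { input_tokens: u64 },
}

/// Total input tokens for the request, or None when the provider sent no usage.
fn total_input_tokens(usage: Option<&Usage>) -> Option<u64> {
    usage.map(|u| match *u {
        Usage::Anthropic {
            input_tokens,
            cache_read,
            cache_creation,
        } => input_tokens
            .saturating_add(cache_read)
            .saturating_add(cache_creation),
        Usage::OpenAi { prompt_tokens } => prompt_tokens,
        Usage::Responses { input_tokens } => input_tokens,
    })
}
```

The Anthropic arm is the one that matters for the gate: on a heavily cached conversation the plain input_tokens figure can be a small fraction of what actually occupies the window.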

Tests

- Parser unit tests: each provider, usage present/absent, cache-summed.
- Threshold math unit tests: fraction vs output-headroom, saturation, upper-bound estimate.
- Integration regressions (proven to fail on the pre-fix logic):
  - token_usage_over_budget_triggers_handoff: usage over threshold hands off
    before the next complete(), on tiny prompts (proves token gate, not bytes).
  - stale_usage_plus_history_growth_triggers_handoff: usage under threshold
    but a large tool result grows history past it; the projection fires.
- Full cargo test -p sprout-agent green (79 unit + 19 regression), clippy
  --all-targets -D warnings clean, cargo fmt --check clean.

Reviewed by Max (independent re-apply + full verification): 9/10+ on
minimalness, elegance, correctness.

Handoff gate measured bytes (~12 MiB) while the real limit is tokens, so the token window was exhausted long before the byte threshold and the gate was dead code; the next request 400'd with no recovery.

Gate on the provider's reported input-token usage (cache-summed for Anthropic/Databricks, prompt_tokens for OpenAI). Token-first with output headroom, a measured-byte delta so history grown since the last request can't sneak past, and a conservative 1 B/tok byte fallback for the first/unknown call. Cleared on handoff, preserved on missing usage. New SPROUT_AGENT_MAX_CONTEXT_TOKENS knob (default 128k). Non-streaming only, no tokenizer dep.

79 unit + 19 regression green; clippy -D warnings + fmt clean. Live-verified end-to-end across Anthropic/OpenAI/Databricks.

Signed-off-by: npub1qyvc0c5kl4gqv2fd97fsk46tu378sqgy35vc83rvgfwne90sel7s0ed67d <011987e296fd5006292d2f930b574be47c7801048d1983c46c425d3c95f0cffd@sprout-oss.stage.blox.sqprod.co>

tlongwell-block requested a review from a team as a code owner on June 2, 2026 19:29

tlongwell-block force-pushed the eva/handoff-token-gate branch from 814bf37 to 0283732 on June 2, 2026 19:30

tlongwell-block wants to merge 1 commit into main from eva/handoff-token-gate