fix(parser): fall back to message.id when requestId is absent #16
estelledc wants to merge 2 commits into
Proxied API calls (corporate Bedrock gateways, third-party LLM relays) strip the requestId header but preserve message.id. Previously, every chunk landed in the without_req path and the line_uuid dedup never fired (each JSONL line carries a unique uuid), so streaming responses with N content-block lines got summed N times. Observed inflation on a real proxy environment: ~2.6x across 28k events / 511 sessions. Post-patch totals match the byte-for-byte reference (CCCostMonitor's analyze_usage.py) within 0.4%. Both id-spaces are globally unique per Anthropic API call, so falling back to message.id preserves all the existing canonicalization guarantees (deltaize, ghost-chunk extraction, mirror dedup). Direct-API users (where requestId is present) are unaffected. Tests: 30 existing parser tests pass; adds 3 regression tests covering same-file streaming, mirror dedup, and requestId-precedence.
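The over-count mechanism can be sketched in isolation. This is an illustrative model, not the crate's real types: one streamed response is written as N JSONL lines, each with a fresh uuid but the same cumulative usage snapshot, so deduping by per-line uuid passes every snapshot through while keying on `message.id` collapses the stream.

```rust
use std::collections::HashSet;

// Hypothetical stand-in for an assistant chunk parsed from a JSONL line.
struct Chunk {
    line_uuid: String,   // unique per JSONL line
    message_id: String,  // shared by all chunks of one API call
    output_tokens: u64,  // cumulative snapshot, identical on every line
}

// Sum output_tokens, counting each distinct key once.
fn total_by(chunks: &[Chunk], key: fn(&Chunk) -> String) -> u64 {
    let mut seen = HashSet::new();
    chunks
        .iter()
        .filter(|c| seen.insert(key(c)))
        .map(|c| c.output_tokens)
        .sum()
}

fn main() {
    // One streaming response, three content-block lines.
    let chunks: Vec<Chunk> = (0..3)
        .map(|i| Chunk {
            line_uuid: format!("uuid-{i}"),
            message_id: "msg_abc".into(),
            output_tokens: 500,
        })
        .collect();

    // line_uuid never repeats, so the dedup never fires: 3x inflation.
    assert_eq!(total_by(&chunks, |c| c.line_uuid.clone()), 1500);
    // Keying on message.id collapses the stream to a single snapshot.
    assert_eq!(total_by(&chunks, |c| c.message_id.clone()), 500);
}
```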
The previous commit fixes ~2.6x token inflation for proxied API users by falling back to message.id when requestId is absent. But existing v2 caches were populated by the buggy parser, and the high-water-mark merge would freeze those inflated values in place across upgrades. Bumping to v3 forces a one-time clean rebuild on first launch (the existing 'Cache rebuilt' banner already covers this UX).
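The version gate amounts to a small check at load time. A minimal sketch, with hypothetical names (`Cache`, `load_or_rebuild`); the real cache layout and the 'Cache rebuilt' banner live elsewhere:

```rust
const CACHE_VERSION: u32 = 3; // bumped from 2: v2 totals may be inflated

// Hypothetical on-disk cache shape.
struct Cache {
    version: u32,
    totals: Vec<u64>, // per-session token totals, schematically
}

fn load_or_rebuild(on_disk: Option<Cache>) -> Cache {
    match on_disk {
        // Same version: safe to reuse; the high-water-mark merge stays valid.
        Some(c) if c.version == CACHE_VERSION => c,
        // Missing or stale: one-time clean rebuild. The old totals are
        // discarded rather than merged, so inflated v2 values cannot be
        // frozen in place by the high-water-mark logic.
        _ => Cache { version: CACHE_VERSION, totals: Vec::new() },
    }
}

fn main() {
    let stale = Cache { version: 2, totals: vec![1_000_000] }; // inflated by the old bug
    let rebuilt = load_or_rebuild(Some(stale));
    assert_eq!(rebuilt.version, CACHE_VERSION);
    assert!(rebuilt.totals.is_empty()); // inflated values gone
}
```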
## Summary
Fixes ~2.6× token over-counting on Claude Code installs that point at a proxied API endpoint (corporate Bedrock gateways, third-party LLM relays). Direct-API users are unaffected.
## Root cause
`dedup_by_request_id` groups assistant chunks by `requestId` and deltaizes cumulative snapshots. When `requestId` is absent the event falls through to the `without_req` path, which deduplicates by `line_uuid`. But Claude Code assigns a unique `uuid` per JSONL line (every content-block line of one streaming response gets a different uuid), so this fallback never collapses anything. Modern Claude Code also writes the same final `usage` snapshot on every content-block line, so an N-block response gets summed N times.

The proxy case is the trigger: corporate gateways often strip the `requestId` request header but preserve `message.id` (it lives in the response body). On a real user's archive (511 sessions; 28,581 assistant-with-usage lines; 12,619 unique `message.id`s) this produced an inflation factor of 2.265× (token-weighted: 2.635×, since longer responses have more chunks). The 2.0.0 changelog claim "totals now match what Anthropic actually billed" silently regresses to a 2-3× over-count for these users.

## Fix
```rust
request_id = raw.request_id.or_else(|| msg.id.clone())
```

Both id-spaces (`req_…` and `msg_…`) are globally unique per Anthropic API call, so substituting `message.id` when `requestId` is missing reuses the existing canonicalization machinery (deltaize, ghost-chunk extraction, canonical-stream picker, mirror dedup) without touching any of it. Direct-API users keep `request_id` semantics exactly as before; the fallback only kicks in when the field is `None`.

## Verification
- Post-patch totals match the byte-for-byte reference within 0.4% (CCCostMonitor also keys on `message.id` in its py implementation)

## Tests
- `dedups_assistant_chunks_by_message_id_when_request_id_absent`: same-file streaming, identical cumulative snapshots
- `mirror_dedup_uses_message_id_when_request_id_absent`: parent + sub-agent files mirroring three cumulative chunks
- `prefers_request_id_over_message_id_when_both_present`: sanity check that `requestId` still wins when present

## Test plan
- `cargo test parser::` → 33 pass (30 existing + 3 new)
- No regressions in the pre-existing coverage (`captures_request_id`, `deltaized_tokens_sum_to_final_snapshot_across_streams`, etc.)

🤖 Generated with Claude Code