
fix(parser): fall back to message.id when requestId is absent#16

Open
estelledc wants to merge 2 commits into hmenzagh:main from estelledc:fix/dedup-fallback-message-id

Conversation

@estelledc

Summary

Fixes ~2.6× token over-counting on Claude Code installs that point at a proxied API endpoint (corporate Bedrock gateways, third-party LLM relays). Direct-API users are unaffected.

Root cause

dedup_by_request_id groups assistant chunks by requestId and deltaizes cumulative snapshots. When requestId is absent the event falls through to the without_req path, which deduplicates by line_uuid. But Claude Code assigns a unique uuid per JSONL line — every content-block line of one streaming response gets a different uuid — so this fallback never collapses anything. Modern Claude Code also writes the same final usage snapshot on every content-block line, so an N-block response gets summed N times.
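A minimal sketch of why the two grouping keys diverge, using illustrative types rather than the crate's actual structs: deduplicating by `line_uuid` keeps every line of a streaming response, while deduplicating by `message.id` collapses the stream to a single snapshot.

```rust
use std::collections::HashMap;
use std::hash::Hash;

// Illustrative chunk record: each JSONL line of one streaming response
// carries the same message_id and the same cumulative usage snapshot,
// but a unique line_uuid. (Names are illustrative, not the crate's types.)
struct Chunk {
    line_uuid: &'static str,
    message_id: &'static str,
    total_tokens: u64,
}

// Sum usage after deduplicating by `key`: keep the last snapshot per key.
fn total_by<K: Hash + Eq>(chunks: &[Chunk], key: impl Fn(&Chunk) -> K) -> u64 {
    let mut last: HashMap<K, u64> = HashMap::new();
    for c in chunks {
        last.insert(key(c), c.total_tokens);
    }
    last.values().sum()
}

fn main() {
    // One 3-content-block response: the final 500-token snapshot repeats
    // on every line, and every line gets its own uuid.
    let chunks = [
        Chunk { line_uuid: "u1", message_id: "msg_A", total_tokens: 500 },
        Chunk { line_uuid: "u2", message_id: "msg_A", total_tokens: 500 },
        Chunk { line_uuid: "u3", message_id: "msg_A", total_tokens: 500 },
    ];
    assert_eq!(total_by(&chunks, |c| c.line_uuid), 1500); // 3x over-count
    assert_eq!(total_by(&chunks, |c| c.message_id), 500); // correctly collapsed
}
```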

The proxy case is the trigger: corporate gateways often strip the requestId header but preserve message.id (which lives in the response body). On a real user's archive (511 sessions, 28,581 assistant-with-usage lines, 12,619 unique message.ids) this produced an inflation factor of 2.265× (token-weighted: 2.635×, since longer responses have more chunks). The 2.0.0 changelog claims "totals now match what Anthropic actually billed", but for these proxied users the totals silently regress to a 2-3× over-count.
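The event-weighted factor follows directly from those two counts; a quick sanity check (constants taken from the archive stats above):

```rust
// Event-weighted inflation: usage-bearing lines per unique message.id.
fn event_weighted_inflation(usage_lines: f64, unique_message_ids: f64) -> f64 {
    usage_lines / unique_message_ids
}

fn main() {
    // 28,581 assistant-with-usage lines vs 12,619 unique message.ids
    let f = event_weighted_inflation(28_581.0, 12_619.0);
    println!("event-weighted inflation: {f:.3}x"); // ≈ 2.265x
    assert!((f - 2.265).abs() < 0.001);
}
```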

Fix

```rust
request_id = raw.request_id.or_else(|| msg.id.clone())
```

Both id-spaces (req_… and msg_…) are globally unique per Anthropic API call, so substituting message.id when requestId is missing reuses the existing canonicalization machinery (deltaize, ghost-chunk extraction, canonical-stream picker, mirror dedup) without touching any of it. Direct-API users keep request_id semantics exactly as before — the fallback only kicks in when the field is None.
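The precedence semantics can be sketched as follows, with illustrative field and function names (the real parser's types differ):

```rust
// Illustrative event type; the real parser's structs differ.
struct RawEvent {
    request_id: Option<String>,
    message_id: Option<String>,
}

// Canonical grouping key: requestId wins when present, message.id otherwise.
fn canonical_request_id(raw: &RawEvent) -> Option<String> {
    raw.request_id.clone().or_else(|| raw.message_id.clone())
}

fn main() {
    let direct = RawEvent {
        request_id: Some("req_123".into()),
        message_id: Some("msg_abc".into()),
    };
    let proxied = RawEvent {
        request_id: None,
        message_id: Some("msg_abc".into()),
    };
    // Direct-API events keep request_id semantics exactly as before.
    assert_eq!(canonical_request_id(&direct).as_deref(), Some("req_123"));
    // The fallback only kicks in when requestId is None.
    assert_eq!(canonical_request_id(&proxied).as_deref(), Some("msg_abc"));
}
```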

Verification

| | tokens (sum of all events) |
| --- | --- |
| Pre-patch CCMeter | 4,437,331,044 |
| Post-patch CCMeter (simulated) | 1,690,796,103 |
| CCCostMonitor reference (dedup by message.id in its Python implementation) | 1,683,903,949 |
| Delta | 0.4% (residual: null-msg.id line handling differences, not algorithmic) |
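The 0.4% figure is the relative difference between the post-patch total and the reference total; a quick check (constants from the verification numbers above):

```rust
// Relative difference of a measured total against a reference total.
fn relative_delta(measured: f64, reference: f64) -> f64 {
    (measured - reference) / reference
}

fn main() {
    // Post-patch CCMeter vs CCCostMonitor reference
    let d = relative_delta(1_690_796_103.0, 1_683_903_949.0);
    println!("delta vs reference: {:.2}%", d * 100.0); // ≈ 0.41%
    assert!(d < 0.005);
}
```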

Tests

  • All 30 existing parser tests pass.
  • Adds 3 regression tests:
    • dedups_assistant_chunks_by_message_id_when_request_id_absent — same-file streaming, identical cumulative snapshots
    • mirror_dedup_uses_message_id_when_request_id_absent — parent + sub-agent files mirroring three cumulative chunks
    • prefers_request_id_over_message_id_when_both_present — sanity that requestId still wins when present

Test plan

  • cargo test parser:: → 33 pass (30 existing + 3 new)
  • Real-data sanity: post-patch totals on a 511-session proxy archive match CCCostMonitor's reference within 0.4%
  • Reviewer to confirm direct-API behavior unchanged (covered by existing captures_request_id, deltaized_tokens_sum_to_final_snapshot_across_streams, etc.)

🤖 Generated with Claude Code

estelledc added 2 commits May 13, 2026 15:39
Proxied API calls (corporate Bedrock gateways, third-party LLM relays)
strip the requestId header but preserve message.id. Previously, every
chunk landed in the without_req path and the line_uuid dedup never
fired (each JSONL line carries a unique uuid), so streaming responses
with N content-block lines got summed N times.

Observed inflation on a real proxy environment: ~2.6x across 28k
events / 511 sessions. Post-patch totals match the byte-for-byte
reference (CCCostMonitor's analyze_usage.py) within 0.4%.

Both id-spaces are globally unique per Anthropic API call, so falling
back to message.id preserves all the existing canonicalization
guarantees (deltaize, ghost-chunk extraction, mirror dedup). Direct-API
users (where requestId is present) are unaffected.

Tests: 30 existing parser tests pass; adds 3 regression tests covering
same-file streaming, mirror dedup, and requestId-precedence.

The previous commit fixes ~2.6x token inflation for proxied API users by
falling back to message.id when requestId is absent. But existing v2
caches were populated by the buggy parser, and the high-water-mark merge
would freeze those inflated values in place across upgrades.

Bumping to v3 forces a one-time clean rebuild on first launch (the
existing 'Cache rebuilt' banner already covers this UX).