Bug
Every streaming response is recorded with zero input and output tokens. Cost and quota tracking are entirely inaccurate for streamed requests.
Location: src/factory.ts:1064
// buildFactoryStream() — called for all streaming responses
const usage = { inputTokens: 0, outputTokens: 0, totalTokens: 0, cost: estimatedCost };
The token fields are hardcoded to 0. Only the pre-request cost estimate (derived from character count) is stored. No post-stream reconciliation happens.
Downstream effects:
- CreditLedger spend is based on rough estimates, not actual usage, so monthly budgets can drift significantly
- CostTracker per-provider totals are wrong for any provider used via streaming
- QuotaHook receives actualCost: estimatedCost with no token breakdown (src/factory.ts:1079)
Root cause
generateResponseStream() returns a ReadableStream<string> rather than an LLMResponse, so the token counts in the final SSE event are never extracted.
Providers do surface final usage in their SSE streams:
- Anthropic: message_start event with message.usage.input_tokens; message_delta event with usage.output_tokens
- OpenAI / Groq / Cerebras: final data: chunk with usage.prompt_tokens / usage.completion_tokens when stream_options: { include_usage: true } is set
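The two event shapes above could be normalized with a small helper like the one below. This is a minimal sketch, not the project's code: `StreamUsage`, `Provider`, and `applyUsageEvent` are hypothetical names, and it assumes each SSE `data:` payload has already been JSON-parsed. In Anthropic's stream the input count sits under `message.usage` in `message_start`; OpenAI-style chunks carry `usage: null` until the final one.

```typescript
// Hypothetical normalized shape; the real LLMResponse usage type may differ.
interface StreamUsage {
  inputTokens: number;
  outputTokens: number;
}

// Hypothetical provider tag; Groq and Cerebras are assumed to follow the OpenAI chunk shape.
type Provider = "anthropic" | "openai-compatible";

// Fold one parsed SSE payload into the running usage totals.
// Payloads that carry no usage leave the totals unchanged.
function applyUsageEvent(
  provider: Provider,
  payload: unknown,
  usage: StreamUsage,
): StreamUsage {
  if (typeof payload !== "object" || payload === null) return usage;
  const event = payload as Record<string, any>;

  if (provider === "anthropic") {
    if (event.type === "message_start") {
      const input = event.message?.usage?.input_tokens;
      if (typeof input === "number") return { ...usage, inputTokens: input };
    }
    if (event.type === "message_delta") {
      // output_tokens in message_delta is cumulative, so overwrite rather than add.
      const output = event.usage?.output_tokens;
      if (typeof output === "number") return { ...usage, outputTokens: output };
    }
    return usage;
  }

  // OpenAI-style: only the final chunk has a non-null usage object.
  const u = event.usage;
  if (u && typeof u.prompt_tokens === "number" && typeof u.completion_tokens === "number") {
    return { inputTokens: u.prompt_tokens, outputTokens: u.completion_tokens };
  }
  return usage;
}
```

Returning a new object on every call keeps the helper free of shared mutable state, so each provider's stream path can hold its own running total.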
Fix
- Pass stream_options: { include_usage: true } in OpenAI/Groq/Cerebras streaming requests
- Buffer the final SSE chunk in each provider's stream path and extract usage
- After the stream drains, call the same recordQuota / costTracker.trackCost path used by non-streaming responses
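The steps above might be wired together roughly as follows. This is a sketch under stated assumptions, not the actual change: `withUsageReporting`, `finalizeUsage`, `recordAfterDrain`, `StreamUsage`, and `UsageRecord` are hypothetical names, the `record` callback stands in for the existing recordQuota / costTracker.trackCost path (whose real signature is not shown in this issue), and `getUsage` is assumed to return whatever the provider's stream path captured from the final SSE event.

```typescript
// Hypothetical shapes; the real usage type and recorder signature may differ.
interface StreamUsage {
  inputTokens: number;
  outputTokens: number;
}

interface UsageRecord extends StreamUsage {
  totalTokens: number;
  estimated: boolean; // true when the provider reported no usage
}

// Ask OpenAI-compatible providers to append a final usage chunk to the stream.
function withUsageReporting<T extends object>(body: T) {
  return { ...body, stream: true, stream_options: { include_usage: true } };
}

// Prefer provider-reported usage; fall back to the pre-request estimate otherwise.
function finalizeUsage(
  reported: StreamUsage | undefined,
  fallback: StreamUsage,
): UsageRecord {
  const u = reported ?? fallback;
  return {
    ...u,
    totalTokens: u.inputTokens + u.outputTokens,
    estimated: reported === undefined,
  };
}

// Pass text chunks through unchanged and record usage once the source closes.
function recordAfterDrain(
  text: ReadableStream<string>,
  getUsage: () => StreamUsage | undefined,
  fallback: StreamUsage,
  record: (usage: UsageRecord) => void,
): ReadableStream<string> {
  return text.pipeThrough(
    new TransformStream<string, string>({
      transform(chunk, controller) {
        controller.enqueue(chunk);
      },
      // flush runs after the last chunk, before the readable side closes.
      flush() {
        record(finalizeUsage(getUsage(), fallback));
      },
    }),
  );
}
```

One caveat with this design: `flush` only runs when the source closes normally, so a cancelled or errored stream would need separate handling if partial usage should still be recorded.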
Acceptance criteria
- [...] inputTokens and outputTokens post-stream
- CreditLedger and CostTracker reflect real token counts for streamed requests
Found by
Codebase audit (automated): src/factory.ts:1047–1090