fix(auto-itemize): bump max_tokens to 16384 + surface truncation explicitly (#1547) #1562
Merged
…citly

Real-world invoice triggered LLM_INVALID_RESPONSE with parseError "Unterminated string in JSON at position 10483": Anthropic hit the 4096 max_tokens cap mid-stream while extracting a 40+ line German construction invoice, returning a truncated JSON that failed parsing.

Changes:
- Bump DEFAULT_MAX_TOKENS from 4096 → 16384. Stays under every supported provider's max output (OpenAI gpt-4o-mini is the binding 16K cap; Anthropic Haiku supports 64K, Gemini 65K).
- Read `finish_reason` from the chat-completions envelope. When it's 'length' (LLM was truncated by our max_tokens cap), throw a distinct LlmInvalidResponseError with a clear actionable message instead of a confusing "Unterminated string in JSON" parse failure.
- Include finishReason in error details on JSON parse failures too, so ops can spot the truncation pattern even if our explicit check misses.

Co-Authored-By: Claude backend-developer (Haiku 4.5) <noreply@anthropic.com>
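The truncation check described above could look roughly like the sketch below. It is not the code from this PR: the envelope type, the `parseItemizationResponse` helper, the error message, and the shape of `details` are all assumptions made for illustration. Only the `finish_reason === 'length'` check, the `LlmInvalidResponseError` name, and the idea of carrying `finishReason` in the error details come from the commit message.

```typescript
// Hypothetical minimal shape of a chat-completions response envelope.
interface ChatCompletionEnvelope {
  choices: Array<{
    finish_reason: string | null;
    message: { content: string | null };
  }>;
}

// Stand-in for the project's error class; the real constructor is not shown in this PR.
class LlmInvalidResponseError extends Error {
  readonly details: Record<string, unknown>;

  constructor(message: string, details: Record<string, unknown> = {}) {
    super(message);
    // Keep instanceof working even when compiled to an ES5 target.
    Object.setPrototypeOf(this, new.target.prototype);
    this.name = "LlmInvalidResponseError";
    this.details = details;
  }
}

// Hypothetical helper: check for truncation first, then parse the JSON payload.
function parseItemizationResponse(
  envelope: ChatCompletionEnvelope,
  maxTokens: number,
): unknown {
  const choice = envelope.choices[0];
  const finishReason = choice?.finish_reason ?? null;
  const content = choice?.message.content ?? "";

  // 'length' means the model stopped because it hit the max_tokens cap,
  // so the JSON is almost certainly cut off mid-document.
  if (finishReason === "length") {
    throw new LlmInvalidResponseError(
      `LLM output was truncated at max_tokens=${maxTokens}; raise the cap or send a smaller invoice.`,
      { finishReason, maxTokens },
    );
  }

  try {
    return JSON.parse(content);
  } catch (err) {
    // Still attach finishReason so truncation is visible if the explicit check misses.
    throw new LlmInvalidResponseError("LLM returned invalid JSON", {
      finishReason,
      parseError: err instanceof Error ? err.message : String(err),
    });
  }
}
```

Checking `finish_reason` before calling `JSON.parse` means a truncated response is reported as a truncation rather than as an "Unterminated string" parse error.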
🎉 This PR is included in version 2.7.0-beta.9 🎉 The release is available on GitHub release. Your semantic-release bot 📦🚀
steilerDev pushed a commit that referenced this pull request May 24, 2026
Follow-up to #1562 (which set the cap to 16384). 16384 handles 100+ line invoices on every supported provider, but exposing the knob lets operators tune up for unusually large invoices without a code change, or tune down to tighten cost ceilings.

- New optional env var LLM_MAX_TOKENS (default 16384, must be positive int)
- Threaded through AppConfig.llmMaxTokens → LlmConfig.maxTokens → RequestBodyInput.maxTokens → request body's max_tokens
- Documented in .env.example and CLAUDE.md env table
- 4 new config tests (default, custom, non-numeric reject, zero reject)
- 2 new providerProfiles tests (honors override, falls back to default)
- 5 existing AppConfig fixtures updated with llmMaxTokens

When omitted, behavior is identical to #1562: 16384 default. When set, the operator's value flows end-to-end.

Co-Authored-By: Claude backend-developer (Haiku 4.5) <noreply@anthropic.com>
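As a rough illustration of how the env var could be validated and then reach the request body, here is a minimal sketch. The function names `parseLlmMaxTokens` and `buildRequestBody`, the error messages, the `model` and `prompt` fields, and the choice to reject an empty string are assumptions. The commit message only specifies the variable name, the 16384 default, the positive-integer rule, and the field names along the path.

```typescript
const DEFAULT_MAX_TOKENS = 16384;

// Hypothetical parser: returns the default when the variable is omitted,
// and rejects anything that is not a positive integer.
function parseLlmMaxTokens(env: Record<string, string | undefined>): number {
  const raw = env["LLM_MAX_TOKENS"];
  if (raw === undefined) {
    return DEFAULT_MAX_TOKENS;
  }
  const trimmed = raw.trim();
  // Digits only: rejects "abc", "1.5", "-3", and (by assumption) the empty string.
  if (!/^\d+$/.test(trimmed)) {
    throw new Error(`LLM_MAX_TOKENS must be a positive integer, got "${raw}"`);
  }
  const value = Number(trimmed);
  if (!Number.isSafeInteger(value) || value <= 0) {
    throw new Error(`LLM_MAX_TOKENS must be a positive integer, got "${raw}"`);
  }
  return value;
}

// Hypothetical, simplified version of RequestBodyInput; the real type has more fields.
interface RequestBodyInput {
  model: string;
  prompt: string;
  maxTokens?: number;
}

// The operator's value wins when present; otherwise fall back to the default.
function buildRequestBody(input: RequestBodyInput) {
  return {
    model: input.model,
    max_tokens: input.maxTokens ?? DEFAULT_MAX_TOKENS,
    messages: [{ role: "user", content: input.prompt }],
  };
}
```

In this sketch, `parseLlmMaxTokens(process.env)` would run once at startup, and its result would be passed down as `maxTokens` wherever a request body is built.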
steilerDev added a commit that referenced this pull request May 24, 2026
…) (#1563)

Follow-up to #1562 (which set the cap to 16384). 16384 handles 100+ line invoices on every supported provider, but exposing the knob lets operators tune up for unusually large invoices without a code change, or tune down to tighten cost ceilings.

- New optional env var LLM_MAX_TOKENS (default 16384, must be positive int)
- Threaded through AppConfig.llmMaxTokens → LlmConfig.maxTokens → RequestBodyInput.maxTokens → request body's max_tokens
- Documented in .env.example and CLAUDE.md env table
- 4 new config tests (default, custom, non-numeric reject, zero reject)
- 2 new providerProfiles tests (honors override, falls back to default)
- 5 existing AppConfig fixtures updated with llmMaxTokens

When omitted, behavior is identical to #1562: 16384 default. When set, the operator's value flows end-to-end.

Co-authored-by: Frank Steiler <frank@steiler.de>
Co-authored-by: Claude backend-developer (Haiku 4.5) <noreply@anthropic.com>
Summary
Real-world invoice triggered `LLM_INVALID_RESPONSE` with parseError `Unterminated string in JSON at position 10483`. Anthropic hit our 4096 `max_tokens` cap mid-stream while extracting 40+ verbose German construction line items, returning truncated JSON that failed parsing.
Fixes the parse failure that the diagnostic logging from #1558 exposed on a real invoice.
Test plan
🤖 Generated with Claude Code