fix(auto-itemize): bump max_tokens to 16384 + surface truncation explicitly (#1547) by steilerDev · Pull Request #1562 · steilerDev/cornerstone

steilerDev · 2026-05-24T07:35:28Z

Summary

A real-world invoice triggered `LLM_INVALID_RESPONSE` with parseError `Unterminated string in JSON at position 10483`. Anthropic hit our 4096 `max_tokens` cap mid-stream while extracting 40+ verbose German construction line items and returned truncated JSON that failed to parse.

- Bump `DEFAULT_MAX_TOKENS` from 4096 → 16384. This stays within every supported provider's output limit (OpenAI gpt-4o-mini's 16K is the binding cap; Anthropic Haiku allows 64K, Gemini 65K).
- Detect `finish_reason: 'length'` in the chat-completions envelope and surface a distinct `LLM_INVALID_RESPONSE` error with the message "LLM response was truncated (hit max_tokens). Increase LLM max_tokens or shorten the invoice OCR." instead of the confusing "Unterminated string in JSON" parse failure.
- Include `finishReason` in the error details on parse failures too, so ops can spot the pattern even if the explicit check misses it.
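A minimal TypeScript sketch of the truncation check, assuming an OpenAI-style chat-completions envelope where `choices[0].finish_reason` and `choices[0].message.content` carry the result. The `LLM_INVALID_RESPONSE` code, the `LlmInvalidResponseError` name, the truncation message and the `finishReason` detail come from this PR; the class shape, the `parseLlmJson` helper and the generic parse-failure message are hypothetical and not the project's actual code.

```typescript
// Hypothetical shapes: only the envelope fields this sketch reads.
interface ChatCompletionChoice {
  finish_reason: string | null;
  message: { content: string | null };
}

interface ChatCompletionEnvelope {
  choices: ChatCompletionChoice[];
}

// Error name and code taken from the PR; this class layout is an assumption.
class LlmInvalidResponseError extends Error {
  readonly code = 'LLM_INVALID_RESPONSE';
  readonly details: Record<string, unknown>;

  constructor(message: string, details: Record<string, unknown> = {}) {
    super(message);
    this.name = 'LlmInvalidResponseError';
    this.details = details;
    // Keep `instanceof` working even when compiled to an ES5 target.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Hypothetical helper: extract the model's JSON payload, reporting truncation explicitly.
function parseLlmJson(envelope: ChatCompletionEnvelope): unknown {
  const choice = envelope.choices[0];
  const finishReason = choice?.finish_reason ?? null;

  // 'length' means generation stopped at the max_tokens cap, so the JSON is
  // cut off; say so directly instead of letting JSON.parse fail opaquely.
  if (finishReason === 'length') {
    throw new LlmInvalidResponseError(
      'LLM response was truncated (hit max_tokens). Increase LLM max_tokens or shorten the invoice OCR.',
      { finishReason },
    );
  }

  const content = choice?.message.content ?? '';
  try {
    return JSON.parse(content);
  } catch (err) {
    // Keep finishReason here too, so a truncation the explicit check missed
    // is still visible in the error details.
    throw new LlmInvalidResponseError('LLM response was not valid JSON', {
      parseError: err instanceof Error ? err.message : String(err),
      finishReason,
    });
  }
}
```

Checking `finish_reason` before calling `JSON.parse` is what turns the opaque "Unterminated string" failure into an actionable message; keeping `finishReason` on the fallback path mirrors the "even if the explicit check misses" safety net described above.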

Fixes the parse failure that the diagnostic logging from #1558 exposed on a real invoice.

Test plan

- New test asserts that `finish_reason: 'length'` produces a distinct error with the truncation message and `details.finishReason`
- Existing `max_tokens: 4096` assertion bumped to 16384
- Server typecheck passes

🤖 Generated with Claude Code

Real-world invoice triggered LLM_INVALID_RESPONSE with parseError "Unterminated string in JSON at position 10483". Anthropic hit the 4096 max_tokens cap mid-stream while extracting a 40+ line German construction invoice, returning truncated JSON that failed parsing.

Changes:

- Bump DEFAULT_MAX_TOKENS from 4096 → 16384. Stays within every supported provider's max output (OpenAI gpt-4o-mini is the binding 16K cap; Anthropic Haiku supports 64K, Gemini 65K).
- Read `finish_reason` from the chat-completions envelope. When it's 'length' (the LLM was truncated by our max_tokens cap), throw a distinct LlmInvalidResponseError with a clear, actionable message instead of a confusing "Unterminated string in JSON" parse failure.
- Include finishReason in error details on JSON parse failures too, so ops can spot the truncation pattern even if our explicit check misses.

Co-Authored-By: Claude backend-developer (Haiku 4.5) <noreply@anthropic.com>

github-actions · 2026-05-24T07:59:58Z

🎉 This PR is included in version 2.7.0-beta.9 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀


…) (#1563)

Follow-up to #1562 (which set the cap to 16384). 16384 handles 100+ line invoices on every supported provider, but exposing the knob lets operators tune up for unusually large invoices without a code change, or tune down to tighten cost ceilings.

- New optional env var LLM_MAX_TOKENS (default 16384, must be a positive int)
- Threaded through AppConfig.llmMaxTokens → LlmConfig.maxTokens → RequestBodyInput.maxTokens → the request body's max_tokens
- Documented in .env.example and the CLAUDE.md env table
- 4 new config tests (default, custom, non-numeric reject, zero reject)
- 2 new providerProfiles tests (honors override, falls back to default)
- 5 existing AppConfig fixtures updated with llmMaxTokens

When omitted, behavior is identical to #1562: a 16384 default. When set, the operator's value flows end-to-end.

Co-authored-by: Frank Steiler <frank@steiler.de>
Co-authored-by: Claude backend-developer (Haiku 4.5) <noreply@anthropic.com>
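A minimal TypeScript sketch of how a positive-integer `LLM_MAX_TOKENS` could be validated and threaded into `max_tokens`. `LLM_MAX_TOKENS`, the 16384 default, and the field names `llmMaxTokens`, `maxTokens` and `max_tokens` come from the commit message; `loadAppConfig`, `toLlmConfig`, `buildRequestBody`, the `model` field, passing the environment as a plain map, and treating an empty value as unset are illustrative assumptions, not the project's actual code.

```typescript
// Default from #1562; used whenever LLM_MAX_TOKENS is unset.
const DEFAULT_MAX_TOKENS = 16384;

// Only the fields this sketch needs; `model` is a hypothetical placeholder.
interface AppConfig { llmMaxTokens: number; }
interface LlmConfig { model: string; maxTokens?: number; }
interface RequestBodyInput { model: string; maxTokens?: number; }

// Hypothetical loader: reads LLM_MAX_TOKENS from a plain env map.
function loadAppConfig(env: Record<string, string | undefined>): AppConfig {
  const raw = env.LLM_MAX_TOKENS?.trim();
  // Assumption: unset or empty both mean "use the default".
  if (raw === undefined || raw === '') {
    return { llmMaxTokens: DEFAULT_MAX_TOKENS };
  }
  // Digits only, so "abc", "-5", "1e4" and "12.5" are rejected up front.
  const value = /^\d+$/.test(raw) ? Number(raw) : Number.NaN;
  // NaN and 0 both fail here, covering the non-numeric and zero cases.
  if (!Number.isSafeInteger(value) || value <= 0) {
    throw new Error(`LLM_MAX_TOKENS must be a positive integer, got "${raw}"`);
  }
  return { llmMaxTokens: value };
}

// AppConfig.llmMaxTokens → LlmConfig.maxTokens
function toLlmConfig(app: AppConfig, model: string): LlmConfig {
  return { model, maxTokens: app.llmMaxTokens };
}

// RequestBodyInput.maxTokens → request body max_tokens, with a default fallback.
function buildRequestBody(input: RequestBodyInput): { model: string; max_tokens: number } {
  return { model: input.model, max_tokens: input.maxTokens ?? DEFAULT_MAX_TOKENS };
}
```

For example, `buildRequestBody(toLlmConfig(loadAppConfig({ LLM_MAX_TOKENS: '32000' }), 'example-model'))` yields a body with `max_tokens: 32000`. In this sketch, validation happens when the config is loaded, so a bad value surfaces there rather than on the first LLM call, while the `??` fallback in the request builder covers callers that pass no override.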

steilerDev mentioned this pull request May 24, 2026

feat(auto-itemize): operator-configurable LLM_MAX_TOKENS env var (#1547) #1563

Merged


steilerDev merged commit 74be802 into beta May 24, 2026
29 of 32 checks passed

steilerDev deleted the fix/1547-max-tokens branch May 24, 2026 07:58

github-actions Bot added the released on @beta label May 24, 2026

steilerDev merged 1 commit into beta from fix/1547-max-tokens
