
fix(auto-itemize): bump max_tokens to 16384 + surface truncation explicitly (#1547) #1562

Merged: steilerDev merged 1 commit into beta from fix/1547-max-tokens on May 24, 2026

@steilerDev
Owner

Summary

A real-world invoice triggered `LLM_INVALID_RESPONSE` with parseError `Unterminated string in JSON at position 10483`. The Anthropic model hit our 4096 `max_tokens` cap mid-stream while extracting 40+ verbose German construction line items, so it returned truncated JSON that failed to parse.

  • Bump DEFAULT_MAX_TOKENS from 4096 to 16384. This stays within every supported provider's output limit (OpenAI gpt-4o-mini's 16K is the binding cap; Anthropic Haiku = 64K, Gemini = 65K).
  • Detect `finish_reason: 'length'` in the chat-completions envelope and surface a distinct `LLM_INVALID_RESPONSE` with the message "LLM response was truncated (hit max_tokens). Increase LLM max_tokens or shorten the invoice OCR." instead of the confusing "Unterminated string in JSON" parse failure.
  • Include `finishReason` in the error details on parse failures too, so ops can spot the pattern even if the explicit check misses it.
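
For readers without the diff at hand, the two error paths described above could look roughly like the sketch below. It is not the code from this PR: the `ChatCompletionEnvelope` type, the shape of `LlmInvalidResponseError`, the `parseLlmJson` helper, and the wording of the parse-failure message are all assumptions made for illustration. Only the `finish_reason` field, the `LLM_INVALID_RESPONSE` code, the truncation message, and `details.finishReason` come from the description.

```typescript
// Illustrative sketch only. All type and function names here are hypothetical;
// the PR description does not show the real implementation.
interface ChatCompletionEnvelope {
  choices: Array<{
    finish_reason?: string | null;
    message: { content: string | null };
  }>;
}

class LlmInvalidResponseError extends Error {
  readonly code = 'LLM_INVALID_RESPONSE';
  readonly details: Record<string, unknown>;

  constructor(message: string, details: Record<string, unknown> = {}) {
    super(message);
    this.name = 'LlmInvalidResponseError';
    this.details = details;
  }
}

function parseLlmJson(envelope: ChatCompletionEnvelope): unknown {
  const choice = envelope.choices[0];
  const finishReason = choice?.finish_reason ?? null;
  const content = choice?.message.content ?? '';

  // Check for truncation before parsing, so the caller sees an actionable
  // message rather than a JSON syntax error at some byte offset.
  if (finishReason === 'length') {
    throw new LlmInvalidResponseError(
      'LLM response was truncated (hit max_tokens). Increase LLM max_tokens or shorten the invoice OCR.',
      { finishReason },
    );
  }

  try {
    return JSON.parse(content);
  } catch (err) {
    // Carry finishReason on ordinary parse failures as well, so the pattern
    // stays visible if the explicit check above did not fire.
    throw new LlmInvalidResponseError('LLM response was not valid JSON.', {
      parseError: err instanceof Error ? err.message : String(err),
      finishReason,
    });
  }
}
```

Checking `finish_reason` before calling `JSON.parse` is the design choice that matters here: a truncated body almost always fails to parse, so the order of the two checks decides which message the operator sees.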

Fixes the parse failure that the diagnostic logging from #1558 exposed on a real invoice.

Test plan

  • New test asserts `finish_reason: 'length'` → distinct error with truncation message + `details.finishReason`
  • Existing `max_tokens: 4096` assertion bumped to 16384
  • Server typecheck passes

🤖 Generated with Claude Code

…citly

Real-world invoice triggered LLM_INVALID_RESPONSE with parseError
"Unterminated string in JSON at position 10483" — Anthropic hit the
4096 max_tokens cap mid-stream while extracting a 40+ line German
construction invoice, returning a truncated JSON that failed parsing.

Changes:
- Bump DEFAULT_MAX_TOKENS from 4096 → 16384. Stays under every supported
  provider's max output (OpenAI gpt-4o-mini is the binding 16K cap;
  Anthropic Haiku supports 64K, Gemini 65K).
- Read `finish_reason` from the chat-completions envelope. When it's
  'length' (LLM was truncated by our max_tokens cap), throw a distinct
  LlmInvalidResponseError with a clear actionable message instead of
  a confusing "Unterminated string in JSON" parse failure.
- Include finishReason in error details on JSON parse failures too, so
  ops can spot the truncation pattern even if our explicit check misses.

Co-Authored-By: Claude backend-developer (Haiku 4.5) <noreply@anthropic.com>
@steilerDev steilerDev merged commit 74be802 into beta May 24, 2026
29 of 32 checks passed
@steilerDev steilerDev deleted the fix/1547-max-tokens branch May 24, 2026 07:58
@github-actions
Contributor

🎉 This PR is included in version 2.7.0-beta.9 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

steilerDev pushed a commit that referenced this pull request May 24, 2026
Follow-up to #1562 (which set the cap to 16384). 16384 handles 100+ line
invoices on every supported provider, but exposing the knob lets operators
tune up for unusually large invoices without a code change, or tune down
to tighten cost ceilings.

- New optional env var LLM_MAX_TOKENS (default 16384, must be positive int)
- Threaded through AppConfig.llmMaxTokens → LlmConfig.maxTokens →
  RequestBodyInput.maxTokens → request body's max_tokens
- Documented in .env.example and CLAUDE.md env table
- 4 new config tests (default, custom, non-numeric reject, zero reject)
- 2 new providerProfiles tests (honors override, falls back to default)
- 5 existing AppConfig fixtures updated with llmMaxTokens

When omitted, behavior is identical to #1562 — 16384 default. When set,
the operator's value flows end-to-end.
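
A minimal sketch of the validation this commit message describes (default 16384, must be a positive integer) might look like the following. The helper name `resolveLlmMaxTokens` and the error wording are hypothetical, and treating an empty string the same as an unset variable is an assumption; only `LLM_MAX_TOKENS`, the default value, and the reject cases (non-numeric, zero) come from the text above.

```typescript
const DEFAULT_MAX_TOKENS = 16384;

// Hypothetical helper: the real config loader is not shown in this thread.
function resolveLlmMaxTokens(env: Record<string, string | undefined>): number {
  const raw = env.LLM_MAX_TOKENS?.trim();

  // Assumption: an unset or empty variable falls back to the default.
  if (raw === undefined || raw === '') {
    return DEFAULT_MAX_TOKENS;
  }

  // Digits only, which rejects values such as "abc", "16k", "1.5" and "-1".
  if (!/^\d+$/.test(raw)) {
    throw new Error(`LLM_MAX_TOKENS must be a positive integer, got "${raw}"`);
  }

  const value = Number(raw);
  if (!Number.isSafeInteger(value) || value <= 0) {
    throw new Error(`LLM_MAX_TOKENS must be a positive integer, got "${raw}"`);
  }
  return value;
}
```

In a Node.js server the helper would typically be called once at startup as `resolveLlmMaxTokens(process.env)`, with the result passed down to wherever the request body's `max_tokens` is built.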

Co-Authored-By: Claude backend-developer (Haiku 4.5) <noreply@anthropic.com>
steilerDev added a commit that referenced this pull request May 24, 2026
…) (#1563)

Co-authored-by: Frank Steiler <frank@steiler.de>
Co-authored-by: Claude backend-developer (Haiku 4.5) <noreply@anthropic.com>