feat(ai): per-provider max_tokens defaults with optional override#9703
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
There was a problem hiding this comment.
1 issue found across 12 files
Architecture diagram
sequenceDiagram
participant UI as Settings UI (ai-config.tsx)
participant Store as Jotai Store (resolvedMarimoConfigAtom)
participant CompletionUtils as completion-utils.ts
participant API as AI Endpoint (ai.py)
participant Config as ai/config.py
participant Provider as Provider (providers.py)
participant Model as pydantic-ai Model
Note over UI,Model: Per-provider max_tokens defaults with optional override
UI->>UI: User toggles checkbox or changes input
UI->>Store: NEW: Update config.ai.max_tokens
CompletionUtils->>Store: NEW: Read config.ai.max_tokens
alt Override is set (number)
CompletionUtils-->>API: NEW: Include maxTokens in request body
else Override is unset
CompletionUtils-->>API: NEW: Omit maxTokens from request body
end
API->>Config: Calls get_max_tokens(config)
Config->>Config: Returns None when unconfigured, or configured value
alt Body includes explicit max_tokens
API->>API: Use body.max_tokens
else Body omits max_tokens
API->>API: Use get_max_tokens(config) result (may be None)
end
API->>Provider: stream_completion/stream_text(max_tokens)
alt Anthropic provider
Provider->>Provider: create_model(max_tokens)
alt max_tokens is None
Provider->>Provider: Apply ANTHROPIC_DEFAULT_MAX_TOKENS (32768)
else explicit value
Provider->>Provider: Pass through as-is
end
Provider->>Model: AnthropicModel(settings={max_tokens: resolved})
else OpenAI/Google/Bedrock provider
Provider->>Provider: create_model(max_tokens)
alt max_tokens is None
Provider-->>Model: Omit settings entirely (provider default applies)
else explicit value
Provider->>Model: Pass {max_tokens: N} in settings
end
Provider->>Model: (settings={max_tokens: N} or empty)
end
Model-->>Provider: Returns model instance
Provider-->>API: Returns streaming response
Reply with feedback, questions, or to request a fix.
Re-trigger cubic
42eb503 to
e28fe6f
Compare
02ae3de to
3ab92a1
Compare
There was a problem hiding this comment.
Pull request overview
This PR removes the global DEFAULT_MAX_TOKENS=4096 behavior so providers can use their own defaults when max_tokens is unset, while keeping Anthropic’s required max_tokens field via a 32,768 default floor and adding a user-configurable override.
Changes:
- Backend:
get_max_tokens()now returnsNonewhen unset; providers omitmax_tokensin model settings unless explicitly configured (Anthropic applies a 32,768 default). - API/OpenAPI: adds optional
maxTokensto completion/chat request schemas and server request models. - Config/UI: adds
ai.max_tokensto user config schema and a Settings UI control that can set or clear the override (with “None-as-delete” support in config persistence).
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/_server/ai/test_providers.py | Adds tests for Anthropic floor behavior and omission/pass-through behavior when max_tokens is unset/set. |
| tests/_server/ai/test_ai_config.py | Updates get_max_tokens tests to expect None when unset. |
| tests/_config/test_manager.py | Adds test coverage for “None-as-delete” config save semantics. |
| packages/openapi/src/api.ts | Regenerated TS OpenAPI types to include optional maxTokens. |
| packages/openapi/api.yaml | Adds maxTokens to completion/chat request schemas. |
| marimo/_server/models/models.py | Documents “None-as-delete” semantics for user config saves. |
| marimo/_server/models/completion.py | Adds max_tokens to completion and chat request structs. |
| marimo/_server/api/endpoints/ai.py | Prefers request max_tokens when provided, otherwise falls back to config. |
| marimo/_server/ai/providers.py | Updates provider model/agent creation to accept optional max_tokens and omit settings when unset (Anthropic uses default floor). |
| marimo/_server/ai/constants.py | Replaces DEFAULT_MAX_TOKENS with ANTHROPIC_DEFAULT_MAX_TOKENS. |
| marimo/_server/ai/config.py | get_max_tokens() now returns `int |
| marimo/_config/manager.py | Implements recursive dropping of None values before writing config to disk. |
| marimo/_config/config.py | Minor signature formatting change. |
| frontend/src/core/config/config-schema.ts | Adds ai.max_tokens as nullable/optional in user config schema. |
| frontend/src/components/editor/ai/completion-utils.ts | Includes maxTokens in completion request bodies when configured. |
| frontend/src/components/app-config/user-config-form.tsx | Adjusts form debounce and onChange submit wiring. |
| frontend/src/components/app-config/ai-config.tsx | Adds UI controls (checkbox + numeric input) to set/clear ai.max_tokens. |
|
@kirangadhave I have started the AI code review. It will take a few minutes to complete. |
There was a problem hiding this comment.
4 issues found across 18 files
Architecture diagram
sequenceDiagram
participant UI as User Config UI
participant RHF as React Hook Form
participant ConfigAPI as Config Save API
participant ConfigMgr as Config Manager
participant TOML as TOML Disk Store
participant CompletionUI as AI Completion UI
participant ResolvedConfig as Resolved Config Atom
participant CompletionReq as AI Completion/Chat Request
participant AIEndpoint as AI API Endpoint (/completion | /chat)
participant Provider as Provider (Anthropic/OpenAI/etc.)
participant PydanticAI as Pydantic-AI Model
Note over UI,TOML: Configuration Save Flow
UI->>RHF: checkbox toggle (on/off)
RHF->>RHF: setValue("ai.max_tokens", 32768 or null)
RHF->>ConfigAPI: saveConfig({ ai: { max_tokens: N | null } })
ConfigAPI->>ConfigMgr: save_config(partial config)
ConfigMgr->>ConfigMgr: merge with current, call _drop_none_values()
alt max_tokens is null
ConfigMgr->>TOML: write config WITHOUT "max_tokens" key
else max_tokens is a number
ConfigMgr->>TOML: write config WITH "max_tokens": N
end
TOML-->>ConfigMgr: success
ConfigMgr-->>ConfigAPI: ok
ConfigAPI-->>UI: config saved
Note over CompletionUI,PydanticAI: AI Completion/Chat Request Flow
CompletionUI->>ResolvedConfig: read resolvedMarimoConfigAtom
ResolvedConfig-->>CompletionUI: { ai: { max_tokens: N | undefined } }
CompletionUI->>CompletionReq: POST /api/ai/completion | /chat with maxTokens (if set)
CompletionReq->>AIEndpoint: request with body.maxTokens or undefined
AIEndpoint->>AIEndpoint: resolve: body.maxTokens ?? config.max_tokens ?? None
alt Non-Anthropic provider
AIEndpoint->>Provider: stream_completion/stream_text(max_tokens=None)
Provider->>PydanticAI: create_model(max_tokens=None)
PydanticAI-->>Provider: model with settings omitting max_tokens
Provider->>PydanticAI: stream using provider default
else Anthropic provider
AIEndpoint->>Provider: stream_completion/stream_text(max_tokens=N | 32768)
Provider->>Provider: if max_tokens is None, use ANTHROPIC_DEFAULT_MAX_TOKENS
Provider->>PydanticAI: create_model(max_tokens=32768 or N)
PydanticAI-->>Provider: model with explicit max_tokens
Provider->>PydanticAI: stream with max_tokens applied
end
PydanticAI-->>Provider: stream response
Provider-->>AIEndpoint: streamed text
AIEndpoint-->>CompletionReq: streaming response
CompletionReq-->>CompletionUI: display result
Note over CompletionReq,AIEndpoint: Error Path - Anthropic Without Override
alt No override, no config, Anthropic provider
CompletionReq->>AIEndpoint: body.maxTokens undefined, config max_tokens undefined
AIEndpoint->>Provider: max_tokens=None
Provider->>Provider: set to ANTHROPIC_DEFAULT_MAX_TOKENS (32768)
Provider->>PydanticAI: create_model(max_tokens=32768)
PydanticAI-->>Provider: model with max_tokens=32768
Note over Provider: succeeds (required field satisfied)
end
Reply with feedback, questions, or to request a fix.
Re-trigger cubic
dmadisetti
left a comment
There was a problem hiding this comment.
Just some comments on language. Think it's a little awkward right now.
Looks like a few clean up things snuck into there, but nice!
There was a problem hiding this comment.
1 issue found across 1 file (changes from recent commits).
Tip: Review your code locally with the cubic CLI to iterate faster.
Re-trigger cubic
98e0c34 to
f3e473b
Compare
f3e473b to
87e0d04
Compare
Light2Dark
left a comment
There was a problem hiding this comment.
Is there a issue this is attached too? I'm wondering if we should avoid adding UI config, just keep it in the toml.
If not, imo the UI is abit verbose. I would just say "Max tokens" and the description about overriding. You should also use the component to check if it has been overrided by the backend.
| # None-as-delete: any key whose merged value is None (typically because | ||
| # the incoming config explicitly sent null) is removed from disk. Lets | ||
| # the UI clear optional scalars (e.g. ai.max_tokens) without a separate | ||
| # delete primitive. | ||
| _drop_none_values(cast(dict[str, Any], merged)) |
There was a problem hiding this comment.
We should note this for bug bash. We had issues in the past where config changes on the frontend didn't save, this whole function is a bit fragile.
There was a problem hiding this comment.
I'll add it to notes for next release
There was a problem hiding this comment.
do you think we need more tests around this?
There was a problem hiding this comment.
Maybe..
The previous issues was if you add a custom model, then refresh, it wouldn't be saved.
Same if you ticked certain models in the AI models dropdown, and then refreshed, it would disappear.
Same with deleting a custom model.
I would test ^ during bug bash
I feel there is value in UI config. Just so one can override it if they are using local models like Ollama with much lower limits. (i assume we can just add a local provider with url?)
Do you suggest a specific copy for the text? |
Only a linear issue. ID mentioned in PR desc |
Max tokens: [ disabled if override is unticked ] [ ] Override maybe something like this?
yeap, thats possible |
Adds an optional ai.max_tokens setting in Settings > AI that overrides every provider's default. Anthropic gets a 32768 floor when unset; sending null removes the key from disk.
Consolidate the two descriptions into a single recommendation per review feedback.
87e0d04 to
925418c
Compare
|
Fixed copy:
Reads much better now. Thanks @Light2Dark |
i added a PR here #9720 for the UI, feel free to use it or leave it. I had an issue where my config wasn't being saved on refresh. So it keeps showing this |
There was a problem hiding this comment.
1 issue found across 1 file (changes from recent commits).
Tip: Review your code locally with the cubic CLI to iterate faster.
Re-trigger cubic
Track the override-enabled state locally instead of deriving it from the field value, so clearing the input no longer disables it and unchecks the Override checkbox mid-edit.


Summary
Drop the blanket
DEFAULT_MAX_TOKENS = 4096so reasoning-heavy models (Kimi K2.6, DeepSeek, gpt-oss, etc.) stop truncating or hard-erroring at 4096.max_tokensfrom settings entirely when unset, letting pydantic-ai fall through to each provider's own default.32768because its API requires the field.ai.max_tokensconfig field plus a Settings > AI checkbox that, when on, applies one explicit limit to every provider (including Anthropic).Kimi K2.6 working with ~27k chars (roughly 6-7k tokens)

After UI override to 100

max_tokens, reasoning errors out at 550 charsBehavior
max_tokensomitted; provider default appliesNNNCloses MO-6314