Skip to content

feat(ai): per-provider max_tokens defaults with optional override#9703

Merged
kirangadhave merged 5 commits into
mainfrom
kg/ai-max-tokens-override
May 30, 2026
Merged

feat(ai): per-provider max_tokens defaults with optional override#9703
kirangadhave merged 5 commits into
mainfrom
kg/ai-max-tokens-override

Conversation

@kirangadhave
Copy link
Copy Markdown
Member

@kirangadhave kirangadhave commented May 27, 2026

Summary

Drop the blanket DEFAULT_MAX_TOKENS = 4096 so reasoning-heavy models (Kimi K2.6, DeepSeek, gpt-oss, etc.) stop truncating or hard-erroring at 4096.

  • Non-Anthropic providers omit max_tokens from settings entirely when unset, letting pydantic-ai fall through to each provider's own default.
  • Anthropic provider call keeps a floor of 32768 because its API requires the field.
  • Adds an optional ai.max_tokens config field plus a Settings > AI checkbox that, when on, applies one explicit limit to every provider (including Anthropic).
image

Kimi K2.6 working with ~27k chars (roughly 6-7k tokens)
image

After UI override to 100 max_tokens, reasoning errors out at 550 chars
image

Behavior

Override Anthropic Other providers
Off (default) server sends 32768 max_tokens omitted; provider default applies
On, value N N N

Closes MO-6314

@vercel
Copy link
Copy Markdown

vercel Bot commented May 27, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
marimo-docs Ready Ready Preview, Comment May 29, 2026 5:17pm

Request Review

@github-actions github-actions Bot added the bash-focus Area to focus on during release bug bash label May 27, 2026
Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 12 files

Architecture diagram
sequenceDiagram
    participant UI as Settings UI (ai-config.tsx)
    participant Store as Jotai Store (resolvedMarimoConfigAtom)
    participant CompletionUtils as completion-utils.ts
    participant API as AI Endpoint (ai.py)
    participant Config as ai/config.py
    participant Provider as Provider (providers.py)
    participant Model as pydantic-ai Model

    Note over UI,Model: Per-provider max_tokens defaults with optional override

    UI->>UI: User toggles checkbox or changes input
    UI->>Store: NEW: Update config.ai.max_tokens

    CompletionUtils->>Store: NEW: Read config.ai.max_tokens
    alt Override is set (number)
        CompletionUtils-->>API: NEW: Include maxTokens in request body
    else Override is unset
        CompletionUtils-->>API: NEW: Omit maxTokens from request body
    end

    API->>Config: Calls get_max_tokens(config)
    Config->>Config: Returns None when unconfigured, or configured value

    alt Body includes explicit max_tokens
        API->>API: Use body.max_tokens
    else Body omits max_tokens
        API->>API: Use get_max_tokens(config) result (may be None)
    end

    API->>Provider: stream_completion/stream_text(max_tokens)

    alt Anthropic provider
        Provider->>Provider: create_model(max_tokens)
        alt max_tokens is None
            Provider->>Provider: Apply ANTHROPIC_DEFAULT_MAX_TOKENS (32768)
        else explicit value
            Provider->>Provider: Pass through as-is
        end
        Provider->>Model: AnthropicModel(settings={max_tokens: resolved})
    else OpenAI/Google/Bedrock provider
        Provider->>Provider: create_model(max_tokens)
        alt max_tokens is None
            Provider-->>Model: Omit settings entirely (provider default applies)
        else explicit value
            Provider->>Model: Pass {max_tokens: N} in settings
        end
        Provider->>Model: (settings={max_tokens: N} or empty)
    end

    Model-->>Provider: Returns model instance
    Provider-->>API: Returns streaming response
Loading

Reply with feedback, questions, or to request a fix.

Re-trigger cubic

Comment thread packages/openapi/api.yaml Outdated
@kirangadhave kirangadhave added the bug Something isn't working label May 27, 2026
@kirangadhave kirangadhave force-pushed the kg/ai-max-tokens-override branch from 42eb503 to e28fe6f Compare May 28, 2026 01:38
@kirangadhave kirangadhave force-pushed the kg/ai-max-tokens-override branch from 02ae3de to 3ab92a1 Compare May 28, 2026 21:03
@kirangadhave kirangadhave marked this pull request as ready for review May 28, 2026 21:04
Copilot AI review requested due to automatic review settings May 28, 2026 21:04
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR removes the global DEFAULT_MAX_TOKENS=4096 behavior so providers can use their own defaults when max_tokens is unset, while keeping Anthropic’s required max_tokens field via a 32,768 default floor and adding a user-configurable override.

Changes:

  • Backend: get_max_tokens() now returns None when unset; providers omit max_tokens in model settings unless explicitly configured (Anthropic applies a 32,768 default).
  • API/OpenAPI: adds optional maxTokens to completion/chat request schemas and server request models.
  • Config/UI: adds ai.max_tokens to user config schema and a Settings UI control that can set or clear the override (with “None-as-delete” support in config persistence).

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
tests/_server/ai/test_providers.py Adds tests for Anthropic floor behavior and omission/pass-through behavior when max_tokens is unset/set.
tests/_server/ai/test_ai_config.py Updates get_max_tokens tests to expect None when unset.
tests/_config/test_manager.py Adds test coverage for “None-as-delete” config save semantics.
packages/openapi/src/api.ts Regenerated TS OpenAPI types to include optional maxTokens.
packages/openapi/api.yaml Adds maxTokens to completion/chat request schemas.
marimo/_server/models/models.py Documents “None-as-delete” semantics for user config saves.
marimo/_server/models/completion.py Adds max_tokens to completion and chat request structs.
marimo/_server/api/endpoints/ai.py Prefers request max_tokens when provided, otherwise falls back to config.
marimo/_server/ai/providers.py Updates provider model/agent creation to accept optional max_tokens and omit settings when unset (Anthropic uses default floor).
marimo/_server/ai/constants.py Replaces DEFAULT_MAX_TOKENS with ANTHROPIC_DEFAULT_MAX_TOKENS.
marimo/_server/ai/config.py get_max_tokens() now returns `int
marimo/_config/manager.py Implements recursive dropping of None values before writing config to disk.
marimo/_config/config.py Minor signature formatting change.
frontend/src/core/config/config-schema.ts Adds ai.max_tokens as nullable/optional in user config schema.
frontend/src/components/editor/ai/completion-utils.ts Includes maxTokens in completion request bodies when configured.
frontend/src/components/app-config/user-config-form.tsx Adjusts form debounce and onChange submit wiring.
frontend/src/components/app-config/ai-config.tsx Adds UI controls (checkbox + numeric input) to set/clear ai.max_tokens.

Comment thread marimo/_server/api/endpoints/ai.py Outdated
Comment thread frontend/src/components/app-config/user-config-form.tsx Outdated
@kirangadhave kirangadhave marked this pull request as draft May 28, 2026 21:23
@kirangadhave kirangadhave marked this pull request as ready for review May 28, 2026 21:38
@kirangadhave
Copy link
Copy Markdown
Member Author

@cubic-dev-ai

@cubic-dev-ai
Copy link
Copy Markdown
Contributor

cubic-dev-ai Bot commented May 28, 2026

@cubic-dev-ai

@kirangadhave I have started the AI code review. It will take a few minutes to complete.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 4 comments.

Comment thread frontend/src/components/app-config/ai-config.tsx Outdated
Comment thread frontend/src/components/app-config/ai-config.tsx
Comment thread frontend/src/components/editor/ai/completion-utils.ts Outdated
Comment thread frontend/src/components/editor/ai/completion-utils.ts Outdated
Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4 issues found across 18 files

Architecture diagram
sequenceDiagram
    participant UI as User Config UI
    participant RHF as React Hook Form
    participant ConfigAPI as Config Save API
    participant ConfigMgr as Config Manager
    participant TOML as TOML Disk Store
    participant CompletionUI as AI Completion UI
    participant ResolvedConfig as Resolved Config Atom
    participant CompletionReq as AI Completion/Chat Request
    participant AIEndpoint as AI API Endpoint (/completion | /chat)
    participant Provider as Provider (Anthropic/OpenAI/etc.)
    participant PydanticAI as Pydantic-AI Model

    Note over UI,TOML: Configuration Save Flow

    UI->>RHF: checkbox toggle (on/off)
    RHF->>RHF: setValue("ai.max_tokens", 32768 or null)
    RHF->>ConfigAPI: saveConfig({ ai: { max_tokens: N | null } })
    ConfigAPI->>ConfigMgr: save_config(partial config)
    ConfigMgr->>ConfigMgr: merge with current, call _drop_none_values()
    alt max_tokens is null
        ConfigMgr->>TOML: write config WITHOUT "max_tokens" key
    else max_tokens is a number
        ConfigMgr->>TOML: write config WITH "max_tokens": N
    end
    TOML-->>ConfigMgr: success
    ConfigMgr-->>ConfigAPI: ok
    ConfigAPI-->>UI: config saved

    Note over CompletionUI,PydanticAI: AI Completion/Chat Request Flow

    CompletionUI->>ResolvedConfig: read resolvedMarimoConfigAtom
    ResolvedConfig-->>CompletionUI: { ai: { max_tokens: N | undefined } }
    CompletionUI->>CompletionReq: POST /api/ai/completion | /chat with maxTokens (if set)
    CompletionReq->>AIEndpoint: request with body.maxTokens or undefined
    AIEndpoint->>AIEndpoint: resolve: body.maxTokens ?? config.max_tokens ?? None
    alt Non-Anthropic provider
        AIEndpoint->>Provider: stream_completion/stream_text(max_tokens=None)
        Provider->>PydanticAI: create_model(max_tokens=None)
        PydanticAI-->>Provider: model with settings omitting max_tokens
        Provider->>PydanticAI: stream using provider default
    else Anthropic provider
        AIEndpoint->>Provider: stream_completion/stream_text(max_tokens=N | 32768)
        Provider->>Provider: if max_tokens is None, use ANTHROPIC_DEFAULT_MAX_TOKENS
        Provider->>PydanticAI: create_model(max_tokens=32768 or N)
        PydanticAI-->>Provider: model with explicit max_tokens
        Provider->>PydanticAI: stream with max_tokens applied
    end
    PydanticAI-->>Provider: stream response
    Provider-->>AIEndpoint: streamed text
    AIEndpoint-->>CompletionReq: streaming response
    CompletionReq-->>CompletionUI: display result

    Note over CompletionReq,AIEndpoint: Error Path - Anthropic Without Override
    alt No override, no config, Anthropic provider
        CompletionReq->>AIEndpoint: body.maxTokens undefined, config max_tokens undefined
        AIEndpoint->>Provider: max_tokens=None
        Provider->>Provider: set to ANTHROPIC_DEFAULT_MAX_TOKENS (32768)
        Provider->>PydanticAI: create_model(max_tokens=32768)
        PydanticAI-->>Provider: model with max_tokens=32768
        Note over Provider: succeeds (required field satisfied)
    end
Loading

Reply with feedback, questions, or to request a fix.

Re-trigger cubic

Comment thread frontend/src/components/app-config/ai-config.tsx Outdated
Comment thread marimo/_server/models/completion.py Outdated
Comment thread frontend/src/components/editor/ai/completion-utils.ts Outdated
Comment thread frontend/src/components/app-config/ai-config.tsx Outdated
Copy link
Copy Markdown
Collaborator

@dmadisetti dmadisetti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some comments on language. Think it's a little awkward right now.

Looks like a few clean up things snuck into there, but nice!

Comment thread frontend/src/components/app-config/ai-config.tsx Outdated
Comment thread frontend/src/components/app-config/ai-config.tsx Outdated
Comment thread marimo/_config/manager.py
Comment thread frontend/src/components/app-config/ai-config.tsx Outdated
Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 1 file (changes from recent commits).

Tip: Review your code locally with the cubic CLI to iterate faster.

Re-trigger cubic

Comment thread frontend/src/components/app-config/ai-config.tsx Outdated
Copy link
Copy Markdown
Collaborator

@Light2Dark Light2Dark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a issue this is attached too? I'm wondering if we should avoid adding UI config, just keep it in the toml.

If not, imo the UI is abit verbose. I would just say "Max tokens" and the description about overriding. You should also use the component to check if it has been overrided by the backend.

Comment thread marimo/_config/manager.py
Comment on lines +403 to +407
# None-as-delete: any key whose merged value is None (typically because
# the incoming config explicitly sent null) is removed from disk. Lets
# the UI clear optional scalars (e.g. ai.max_tokens) without a separate
# delete primitive.
_drop_none_values(cast(dict[str, Any], merged))
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should note this for bug bash. We had issues in the past where config changes on the frontend didn't save, this whole function is a bit fragile.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add it to notes for next release

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you think we need more tests around this?

Copy link
Copy Markdown
Collaborator

@Light2Dark Light2Dark May 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe..

The previous issues was if you add a custom model, then refresh, it wouldn't be saved.
Same if you ticked certain models in the AI models dropdown, and then refreshed, it would disappear.
Same with deleting a custom model.

I would test ^ during bug bash

Comment thread marimo/_server/ai/constants.py
@kirangadhave
Copy link
Copy Markdown
Member Author

Is there a issue this is attached too? I'm wondering if we should avoid adding UI config, just keep it in the toml.

I feel there is value in UI config. Just so one can override it if they are using local models like Ollama with much lower limits. (i assume we can just add a local provider with url?)

If not, imo the UI is abit verbose. I would just say "Max tokens" and the description about overriding. You should also use the component to check if it has been overrided by the backend.
How would backend overide it? Only source of truth is toml. Only thing backend has is a default value for anthropic provider

Do you suggest a specific copy for the text?

@kirangadhave kirangadhave requested a review from Light2Dark May 29, 2026 05:53
@kirangadhave
Copy link
Copy Markdown
Member Author

kirangadhave commented May 29, 2026

Is there a issue this is attached too?

Only a linear issue. ID mentioned in PR desc

@Light2Dark
Copy link
Copy Markdown
Collaborator

Light2Dark commented May 29, 2026

Do you suggest a specific copy for the text?

Max tokens: [ disabled if override is unticked ] [ ] Override
We recommend leaving this off as each provider applies their own limit, except for Anthropic where we set a default. Use this to control costs or raise limits for heavy-reasoning models.

maybe something like this?

(i assume we can just add a local provider with url?)

yeap, thats possible

Adds an optional ai.max_tokens setting in Settings > AI that overrides every provider's default. Anthropic gets a 32768 floor when unset; sending null removes the key from disk.
Consolidate the two descriptions into a single recommendation per
review feedback.
@kirangadhave kirangadhave force-pushed the kg/ai-max-tokens-override branch from 87e0d04 to 925418c Compare May 29, 2026 06:18
@kirangadhave
Copy link
Copy Markdown
Member Author

Fixed copy:

Screenshot 2026-05-28 at 11 18 20 PM

Reads much better now. Thanks @Light2Dark

@Light2Dark
Copy link
Copy Markdown
Collaborator

Light2Dark commented May 29, 2026

Fixed copy

i added a PR here #9720 for the UI, feel free to use it or leave it.

I had an issue where my config wasn't being saved on refresh. So it keeps showing this
image

Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 1 file (changes from recent commits).

Tip: Review your code locally with the cubic CLI to iterate faster.

Re-trigger cubic

Comment thread frontend/src/components/app-config/ai-config.tsx
Track the override-enabled state locally instead of deriving it from the
field value, so clearing the input no longer disables it and unchecks the
Override checkbox mid-edit.
Copy link
Copy Markdown
Collaborator

@Light2Dark Light2Dark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for fixing!

@kirangadhave kirangadhave merged commit 87446a5 into main May 30, 2026
44 checks passed
@kirangadhave kirangadhave deleted the kg/ai-max-tokens-override branch May 30, 2026 19:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bash-focus Area to focus on during release bug bash bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants