fix: use max output size for completion budget#318

Open
qkunio wants to merge 1 commit into
MoonshotAI:mainfrom
qkunio:codex/max-output-size-completion-budget

@qkunio qkunio commented Jun 2, 2026

Related Issue

Resolve #306

Problem

According to the OpenAI Chat Completions API reference, max_tokens is “the maximum number of tokens that can be generated in the chat completion” (source). Kimi Code currently derives this provider request field from max_context_size instead of the configured max_output_size, which can cause OpenAI-compatible providers to reject requests with 400 errors when the context window exceeds the provider's output token limit.

What changed

This change preserves each model alias's configured output limit and uses it as the completion token cap for runtime requests. The model context size still describes the context window, while max_output_size now controls the output budget that is ultimately sent through provider-specific max_tokens handling.

This prevents OpenAI-compatible providers from using max_context_size as the request output limit when max_output_size is available.
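
As a rough illustration of the behavior described above (not the actual code in this PR), the selection of the completion budget could be sketched as follows. The `ModelAliasConfig` interface, the `resolveCompletionBudget` helper, and the numeric limits are hypothetical; only the field names `max_context_size` and `max_output_size` come from the description. Falling back to `max_context_size` when no output limit is configured is an assumption.

```typescript
// Hypothetical shape of a model alias; only the two field names are taken
// from the PR description, everything else is assumed for illustration.
interface ModelAliasConfig {
  max_context_size: number; // size of the context window, in tokens
  max_output_size?: number; // provider's output token limit, if configured
}

// Choose the value that would be sent to the provider as max_tokens.
// Assumption: when max_output_size is not configured, the context size
// is still used as the cap.
function resolveCompletionBudget(alias: ModelAliasConfig): number {
  if (alias.max_output_size !== undefined) {
    return alias.max_output_size;
  }
  return alias.max_context_size;
}

// Illustrative numbers only, not real provider limits.
const withOutputLimit: ModelAliasConfig = {
  max_context_size: 131072,
  max_output_size: 8192,
};
const withoutOutputLimit: ModelAliasConfig = { max_context_size: 131072 };

const cappedBudget = resolveCompletionBudget(withOutputLimit);
const fallbackBudget = resolveCompletionBudget(withoutOutputLimit);
```

With the first alias the request cap follows the configured output limit rather than the context window, which is the case that previously triggered 400 responses from OpenAI-compatible providers.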

Checklist

  • I have read the CONTRIBUTING document.
  • I have linked a related issue, or explained the problem above.
  • I have added tests that prove my feature works.
  • Ran gen-changesets skill, or this PR needs no changeset.
  • Ran gen-docs skill, or this PR needs no doc update.

changeset-bot Bot commented Jun 2, 2026

🦋 Changeset detected

Latest commit: 7a746a8

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 2 packages

| Name | Type |
| --- | --- |
| @moonshot-ai/agent-core | Patch |
| @moonshot-ai/kimi-code | Patch |

qkunio commented Jun 2, 2026

@kermanx could you take a look at this PR when you have time?

This fixes the max_tokens cap issue reported in #306. It seems closely related to the completion-budget propagation work in #267: for OpenAI-compatible providers, the runtime request can end up using max_context_size as the output token cap, while the configured max_output_size is the actual provider output limit.

The PR is focused on using max_output_size as the completion budget cap, and includes a changeset. Please let me know if you prefer a different fix direction.

Development

Successfully merging this pull request may close these issues.

400 "Invalid max_tokens value" error after configuring max_output_size for a DeepSeek model