fix: use max output size for completion budget by qkunio · Pull Request #318 · MoonshotAI/kimi-code

qkunio · 2026-06-02T07:23:11Z

Related Issue

Resolve #306

Problem

According to the OpenAI Chat Completions API reference, max_tokens is “the maximum number of tokens that can be generated in the chat completion” (source). Kimi Code currently derives this provider request field from max_context_size instead of the configured max_output_size, which can cause OpenAI-compatible providers to reject requests with 400 errors when the context window exceeds the provider's output token limit.

What changed

This change preserves each model alias's configured output limit and uses it as the completion token cap for runtime requests. The model context size still describes the context window, while max_output_size now controls the output budget that is ultimately sent through provider-specific max_tokens handling.

This prevents OpenAI-compatible providers from using max_context_size as the request output limit when max_output_size is available.
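
To illustrate the intended behavior, here is a minimal sketch of the budget selection, not the actual code from this PR. The `ModelAliasConfig` interface and the `resolveCompletionBudget` function are hypothetical names chosen for this example. The fallback to the context size when no output limit is configured is an assumption, based on the "when max_output_size is available" wording above.

```typescript
// Hypothetical shape of a model alias configuration. The two fields mirror
// the max_context_size and max_output_size settings discussed in this PR.
interface ModelAliasConfig {
  maxContextSize: number; // size of the context window, in tokens
  maxOutputSize?: number; // provider output limit, in tokens (optional)
}

// Choose the value that would be sent as the provider's max_tokens field.
// Assumption: when no output limit is configured, fall back to the context
// size, which matches the behavior described as the previous default.
function resolveCompletionBudget(config: ModelAliasConfig): number {
  const { maxContextSize, maxOutputSize } = config;
  if (maxOutputSize !== undefined && maxOutputSize > 0) {
    return maxOutputSize;
  }
  return maxContextSize;
}

// Illustrative numbers only, not taken from any real model definition.
const budget = resolveCompletionBudget({
  maxContextSize: 262144,
  maxOutputSize: 32768,
});
console.log(budget);
```

With a configured output limit, the request cap follows that limit rather than the much larger context window, so a provider that validates max_tokens against its own output limit has no reason to reject the request.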

Checklist

I have read the CONTRIBUTING document.
I have linked a related issue, or explained the problem above.
I have added tests that prove my feature works.
Ran gen-changesets skill, or this PR needs no changeset.
Ran gen-docs skill, or this PR needs no doc update.

changeset-bot · 2026-06-02T07:23:18Z

🦋 Changeset detected

Latest commit: 7a746a8

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 2 packages

Name	Type
@moonshot-ai/agent-core	Patch
@moonshot-ai/kimi-code	Patch

qkunio · 2026-06-02T11:26:58Z

@kermanx could you take a look at this PR when you have time?

This fixes the max_tokens cap issue reported in #306. It seems closely related to the completion-budget propagation work in #267: for OpenAI-compatible providers, the runtime request can end up using max_context_size as the output token cap, while the configured max_output_size is the actual provider output limit.

The PR is focused on using max_output_size as the completion budget cap, and includes a changeset. Please let me know if you prefer a different fix direction.

fix: use max output size for completion budget

7a746a8

qkunio wants to merge 1 commit into MoonshotAI:main from qkunio:codex/max-output-size-completion-budget
