Support max_kv_size configuration in HTTP server #1272

Open
r-bahuguna wants to merge 1 commit into ml-explore:main from r-bahuguna:feature/server-max-kv-size

r-bahuguna commented May 13, 2026

Adds --max-kv-size as a server startup argument, bringing it to feature parity
with mlx_lm.generate().

Closes #615

Changes

  • Added --max-kv-size CLI argument to mlx_lm.server
  • Passed max_kv_size to make_prompt_cache() for both primary and draft models in _serve_single
  • Passed max_kv_size into BatchGenerator for the batched request path
  • Updated test mock to include the new max_kv_size default
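
A minimal sketch of the wiring described above, assuming the flag is parsed with argparse as an optional integer and then forwarded unchanged to each cache constructor. The names build_parser, make_cache_sketch, and build_caches are hypothetical stand-ins for illustration; they are not the functions touched in this PR, and make_cache_sketch only records the limit rather than allocating a real cache.

```python
import argparse
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical, trimmed-down parser; the real server defines many more options.
    parser = argparse.ArgumentParser(description="Sketch of a server CLI")
    parser.add_argument("--model", type=str, default=None)
    parser.add_argument(
        "--max-kv-size",
        type=int,
        default=None,  # assumption: None means no limit is applied
        help="Maximum key-value cache size",
    )
    return parser


def make_cache_sketch(model_name: str, max_kv_size: Optional[int] = None) -> dict:
    # Stand-in for a prompt-cache factory: records the limit instead of allocating.
    return {"model": model_name, "max_kv_size": max_kv_size}


def build_caches(args: argparse.Namespace, draft_model: Optional[str] = None) -> list:
    # Forward the same limit to the primary model and, if present, the draft model.
    caches = [make_cache_sketch(args.model, args.max_kv_size)]
    if draft_model is not None:
        caches.append(make_cache_sketch(draft_model, args.max_kv_size))
    return caches


args = build_parser().parse_args(["--model", "primary", "--max-kv-size", "4096"])
caches = build_caches(args, draft_model="draft")
```

argparse maps --max-kv-size to args.max_kv_size automatically, so the same attribute name can be handed to both the single-request and batched paths.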

Usage

python -m mlx_lm.server --model <model> --max-kv-size 4096

Development

Successfully merging this pull request may close these issues.

Feature Request: Add max-kv-size Support to MLX HTTP Server