Support max_kv_size configuration in HTTP server by r-bahuguna · Pull Request #1272 · ml-explore/mlx-lm

r-bahuguna · 2026-05-13T13:31:29Z

Adds --max-kv-size as a server startup argument, bringing it to feature parity
with mlx_lm.generate().

Closes #615

Changes

Added --max-kv-size CLI argument to mlx_lm.server
Passed max_kv_size to make_prompt_cache() for both primary and draft models in _serve_single
Passed max_kv_size into BatchGenerator for the batched request path
Updated test mock to include the new max_kv_size default

python -m mlx_lm.server --model <model> --max-kv-size 4096

Support max_kv_size configuration in HTTP server

4dbca07