bugfix: use max_concurrent_requests for single block and linear state allocation. by pjgao · Pull Request #1496 · jd-opensource/xllm

pjgao · 2026-05-20T08:17:46Z

Summary

Cherry-pick two bugfixes from preview/qwen3.5-qwen3.6 branch to main.

Cherry-picked Commits

9b64c30 bugfix: change single block manager blocks to max concurrent requests. (#1413)
- Original Author: Yingxu Deng dengyingxu1@jd.com
- Original Commit: 005adbd1
- File: xllm/core/framework/block/block_manager_pool.cpp
- Change: SingleBlockManager num_blocks from max_seqs_per_batch + 2 → max_concurrent_requests + 2
b2900e7 bugfix: fix high concurrency linear state overflow (#1422)
- Original Author: Joey Gao 1783198484@qq.com
- Original Commit: c7ec380f
- File: xllm/core/distributed_runtime/llm_engine.cpp
- Change: num_linear_state_blocks from max_seqs_per_batch + 2 → max_concurrent_requests + 2; error messages updated accordingly

Root Cause

Both bugs size shared resources (single-block manager blocks and linear-state blocks) from max_seqs_per_batch. When max_concurrent_requests > max_seqs_per_batch, more requests can be in flight than there are reserved blocks, so under high concurrency these resources are exhausted and allocation fails with overflow errors.
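To make the root cause concrete, here is a minimal C++ sketch. `SlotPool`, `reserved_slots`, and `admit` are hypothetical names, not xllm APIs. It assumes each in-flight request holds one single-block / linear-state slot from admission until completion (even while it is not in the running batch), and treats the `+ 2` from the patch as opaque headroom.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical fixed-size pool (not an xllm class): one slot per
// in-flight request. Assumption: a request keeps its slot from
// admission until it finishes, even while outside the running batch.
class SlotPool {
 public:
  explicit SlotPool(int32_t num_slots) {
    assert(num_slots >= 0);
    free_.reserve(static_cast<std::size_t>(num_slots));
    for (int32_t i = num_slots - 1; i >= 0; --i) free_.push_back(i);
  }

  // Returns a free slot id, or std::nullopt when the pool is exhausted.
  std::optional<int32_t> allocate() {
    if (free_.empty()) return std::nullopt;
    int32_t id = free_.back();
    free_.pop_back();
    return id;
  }

  void release(int32_t id) { free_.push_back(id); }
  std::size_t num_free() const { return free_.size(); }

 private:
  std::vector<int32_t> free_;
};

// Sizing rule used by the patch: chosen basis plus 2 extra slots.
int32_t reserved_slots(int32_t basis) { return basis + 2; }

// Tries to admit `concurrent` requests; returns how many got a slot.
int32_t admit(SlotPool& pool, int32_t concurrent) {
  int32_t admitted = 0;
  for (int32_t i = 0; i < concurrent; ++i) {
    if (pool.allocate().has_value()) ++admitted;
  }
  return admitted;
}
```

With max_seqs_per_batch = 4 and 16 concurrent requests, a pool sized 4 + 2 admits only 6 requests; sizing by max_concurrent_requests (16 + 2) admits all 16 and leaves the 2 extra slots free.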

Note

Original c7ec380f also changed vlm_engine.cpp, but VLMEngine on main has been refactored and no longer contains linear-state reservation logic, so that part was dropped.
Original authorship preserved via git cherry-pick (Author field unchanged).

gemini-code-assist

Code Review

This pull request replaces the use of max_seqs_per_batch from the SchedulerConfig with a global flag FLAGS_max_concurrent_requests for KV cache capacity estimation and block manager allocation. Review feedback highlights that direct access to global flags violates Rule 10 of the repository style guide, which prefers configuration structs or constructor parameters over global state. It is recommended to add max_concurrent_requests to the SchedulerConfig or pass it via the Options struct to maintain encapsulation and follow established architectural patterns.

gemini-code-assist · 2026-05-20T08:21:02Z


-  kv_cache_cap.num_linear_state_blocks() =
-      ::xllm::SchedulerConfig::get_instance().max_seqs_per_batch() + 2;
+  kv_cache_cap.num_linear_state_blocks() = FLAGS_max_concurrent_requests + 2;


Directly accessing FLAGS_max_concurrent_requests violates the repository style guide (Rule 10), which prefers configuration structs over global flags. Since max_seqs_per_batch is already managed via SchedulerConfig, max_concurrent_requests should also be added to SchedulerConfig and accessed via ::xllm::SchedulerConfig::get_instance().max_concurrent_requests(). This keeps configuration consistent and centralized, and allows proper validation during initialization. Additionally, using std::max(FLAGS_max_concurrent_requests, ::xllm::SchedulerConfig::get_instance().max_seqs_per_batch()) would be more robust: it still reserves enough blocks if max_concurrent_requests is misconfigured to a value smaller than the batch size.

References

Do not overuse FLAGS_ global variables. Prefer passing configuration through constructor parameters or config structs. Only use global flags for top-level, process-wide settings.
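The review's suggestion (read the value from a config struct and guard with std::max) can be sketched as follows. `SchedulerConfigSketch` and `linear_state_blocks_for` are hypothetical stand-ins, not xllm's real SchedulerConfig singleton or its accessors; the `+ 2` is taken from the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a scheduler config struct. The zero values
// are placeholders, not xllm defaults; callers must set both fields.
struct SchedulerConfigSketch {
  int32_t max_seqs_per_batch = 0;
  int32_t max_concurrent_requests = 0;
};

// Number of linear-state (or single-block) slots to reserve.
// std::max means a max_concurrent_requests value below the batch size
// can never reserve fewer slots than the old max_seqs_per_batch rule.
int32_t linear_state_blocks_for(const SchedulerConfigSketch& cfg) {
  assert(cfg.max_seqs_per_batch > 0);
  return std::max(cfg.max_concurrent_requests, cfg.max_seqs_per_batch) + 2;
}
```

For example, with max_seqs_per_batch = 8, a max_concurrent_requests of 64 reserves 66 slots, while a misconfigured value of 4 still reserves 8 + 2 = 10.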

gemini-code-assist · 2026-05-20T08:21:02Z

-        /*num_blocks=*/::xllm::SchedulerConfig::get_instance()
-                .max_seqs_per_batch() +
-            2,
+        /*num_blocks=*/FLAGS_max_concurrent_requests + 2,


Directly accessing FLAGS_max_concurrent_requests here violates the style guide (Rule 10) and breaks encapsulation. This value should be passed through the BlockManagerPool::Options struct, following the pattern used for other configuration parameters like num_blocks and block_size. This makes the class easier to test and decouples it from global state.

References

Do not overuse FLAGS_ global variables. Prefer passing configuration through constructor parameters or config structs.
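One way to follow this second suggestion is to carry the value in the pool's options struct. The sketch below is hypothetical throughout: `PoolOptions`, `PoolSketch`, and `SingleBlockManagerSketch` only mirror the shape described in the review (num_blocks, block_size, plus a proposed max_concurrent_requests field) and are not xllm's BlockManagerPool::Options or SingleBlockManager.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical options struct: num_blocks and block_size as mentioned
// in the review, plus the proposed max_concurrent_requests field.
struct PoolOptions {
  int32_t num_blocks = 0;
  int32_t block_size = 0;
  int32_t max_concurrent_requests = 0;
};

// Hypothetical single-block manager that owns a fixed number of blocks.
class SingleBlockManagerSketch {
 public:
  explicit SingleBlockManagerSketch(int32_t num_blocks)
      : num_blocks_(num_blocks) {}
  int32_t num_blocks() const { return num_blocks_; }

 private:
  int32_t num_blocks_;
};

// The pool reads capacity from its options instead of a global flag,
// so a test can construct it with any value it needs.
class PoolSketch {
 public:
  explicit PoolSketch(const PoolOptions& options)
      : single_(std::make_unique<SingleBlockManagerSketch>(
            options.max_concurrent_requests + 2)) {
    assert(options.max_concurrent_requests > 0);
  }

  const SingleBlockManagerSketch& single_block_manager() const {
    return *single_;
  }

 private:
  std::unique_ptr<SingleBlockManagerSketch> single_;
};
```

Injecting the value this way keeps the pool free of global state, which is the testability benefit the review points to.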

yingxudeng · 2026-05-20T09:16:04Z

        "SCHEDULER OPTIONS",
        {"max_tokens_per_batch",
         "max_seqs_per_batch",
+         "max_concurrent_requests",


Was this parameter lost in the refactor on the main branch?

Yes, it looks like it was lost in the refactor from PR #1430.

yingxudeng · 2026-05-20T16:40:25Z


  kv_cache_cap.num_linear_state_blocks() =
-      ::xllm::SchedulerConfig::get_instance().max_seqs_per_batch() + 2;
+      ::xllm::SchedulerConfig::get_instance().max_concurrent_requests() + 2;


Does vlm_engine.cpp need a similar change?

Ah right, the qwen3.5 VLM code hasn't been merged yet. Never mind — let's revisit this after the VLM PR lands.


pjgao requested review from Clement-Wang26, DongheJin, DragonFive, JimHsiung, Kang-Meng, RobbieLeung, XuZhang99, liujinguang0125, liutongxuan, walsonyang, xiao-yu-chen, yingxudeng, yq33victor and zhang-minchao as code owners May 20, 2026 08:17

gemini-code-assist Bot reviewed May 20, 2026


yingxudeng reviewed May 20, 2026


yingxudeng changed the title ~~bugfix: use max_concurrent_requests for single block and linear state allocation~~ bugfix: use max_concurrent_requests for single block and linear state allocation. May 20, 2026

yingxudeng reviewed May 20, 2026


pjgao force-pushed the cherry-pick/concurrent-linear-bugfix branch from 404ba50 to 2989452 May 21, 2026 08:54

yingxudeng and others added 2 commits May 21, 2026 19:16

c62a89d bugfix: change single block manager blocks to max concurrent requests. (jd-opensource#1413)
Co-authored-by: kangmeng3 <kangmeng3@jd.com>

446c057 bugfix: fix high concurrency linear state overflow (jd-opensource#1422)
Co-authored-by: pjgao <gaopengju3@huawei.com>

pjgao force-pushed the cherry-pick/concurrent-linear-bugfix branch from 2989452 to 446c057 May 21, 2026 11:16

pjgao wants to merge 2 commits into jd-opensource:main from pjgao:cherry-pick/concurrent-linear-bugfix
