Relax routed_experts capture KV-connector check to a warning for P/D #45419
Draft
S1ro1 wants to merge 1 commit into
This was referenced Jun 12, 2026
--enable-return-routed-experts currently hard-rejects any KV-transfer instance. That blocks P/D disaggregation, where the routing captured on the prefill replica simply needs to be spliced into the decode response by the router/proxy (the decode replica pulls the prompt KV and never forwards the prompt, so its prompt-region rows are invalid). Routers/proxies can and now do perform this merge, so for a P/D setup this is a deployment concern, not a hard error.

Replace the ValueError for KV-transfer instances with a warning: 'You are using P/D disaggregation with routed_experts capture, for this to work your router/proxy needs to support it'. The PP>1 incompatibility is unchanged.

Note: kv_role alone cannot distinguish P/D transfer from single-instance KV offload (NixlConnector P/D itself runs as kv_both), which is why this is a warning rather than a role-gated error.

Signed-off-by: Matej Sirovatka <S1ro1@users.noreply.github.com>
eaed832 to b32231b
Background
--enable-return-routed-experts records, per token, which MoE experts that token was routed to. Today
VllmConfig.__post_init__ hard-rejects it whenever a KV connector is configured.
Why relax it
For P/D disaggregation this is not fundamentally incompatible. The decode
replica pulls the prompt KV from prefill and never runs a forward pass over the
prompt, so its prompt-region routing rows are invalid. The prefill replica,
however, returns the correct prompt-region routing, and a P/D-aware
router/proxy can splice it back into the decode response.
So whether routed-experts capture works under disaggregation is a property of the
router/proxy, not something vLLM can decide at config time.
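To make the splice concrete, here is a minimal sketch of what a P/D-aware router/proxy would do. It assumes a hypothetical layout in which routing is returned as one row of expert IDs per token position, with the prompt occupying the first num_prompt_tokens rows in both the prefill and decode responses; the function name splice_routed_experts and the RoutingRows alias are invented for illustration, and the companion router PRs may use a different representation.

```python
from typing import List

# Hypothetical layout: one row per token position; each row holds the expert IDs
# chosen for that token (e.g. flattened over MoE layers and top-k slots).
RoutingRows = List[List[int]]


def splice_routed_experts(
    prefill_rows: RoutingRows,
    decode_rows: RoutingRows,
    num_prompt_tokens: int,
) -> RoutingRows:
    """Hypothetical router-side merge of prefill and decode routing.

    Prompt-region rows come from the prefill replica (which ran the forward
    pass over the prompt); rows for generated tokens come from the decode
    replica. The decode replica's own prompt-region rows are discarded.
    """
    if num_prompt_tokens < 0:
        raise ValueError("num_prompt_tokens must be non-negative")
    if len(prefill_rows) < num_prompt_tokens:
        raise ValueError("prefill response lacks routing for the full prompt")
    if len(decode_rows) < num_prompt_tokens:
        raise ValueError("decode response is shorter than the prompt")
    prompt_part = prefill_rows[:num_prompt_tokens]
    generated_part = decode_rows[num_prompt_tokens:]
    # Copy rows so the merged result does not alias either input.
    return [list(row) for row in prompt_part] + [list(row) for row in generated_part]


# Usage: a 2-token prompt followed by 2 generated tokens. -1 marks the decode
# replica's invalid prompt-region entries in this toy example.
prefill = [[1, 5], [2, 6]]
decode = [[-1, -1], [-1, -1], [4, 0], [3, 7]]
merged = splice_routed_experts(prefill, decode, num_prompt_tokens=2)
print(merged)  # [[1, 5], [2, 6], [4, 0], [3, 7]]
```

The merge is purely positional, which is why vLLM cannot validate it at config time: only the component that sees both responses knows where the prompt ends.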
Change
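Replace the ValueError for KV-transfer instances with a warning, 'You are using P/D disaggregation with routed_experts capture, for this to work your router/proxy needs to support it'. The PP>1 incompatibility is unchanged.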
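A minimal sketch of the relaxed validation, assuming a simplified stand-in for the config: SketchConfig, validate_routed_experts, and the PP>1 error message are hypothetical and do not reproduce VllmConfig.__post_init__. Only the warning text is taken from this PR's commit message.

```python
import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger("routed_experts_sketch")

# Warning text as quoted in the commit message of this PR.
PD_WARNING = (
    "You are using P/D disaggregation with routed_experts capture, for this "
    "to work your router/proxy needs to support it"
)


@dataclass
class SketchConfig:
    """Hypothetical stand-in for the fields the check reads (not VllmConfig)."""

    enable_return_routed_experts: bool = False
    kv_transfer_config: Optional[dict] = None
    pipeline_parallel_size: int = 1


def validate_routed_experts(cfg: SketchConfig) -> None:
    """Sketch of the relaxed check: PP>1 still errors, a KV connector only warns."""
    if not cfg.enable_return_routed_experts:
        return
    if cfg.pipeline_parallel_size > 1:
        # Unchanged by this PR; the message text here is illustrative.
        raise ValueError("routed_experts capture does not support pipeline parallelism > 1")
    if cfg.kv_transfer_config is not None:
        # Previously a ValueError. kv_role cannot tell P/D apart from
        # single-instance KV offload, so the config is allowed and the
        # router/proxy is responsible for merging prefill routing.
        logger.warning(PD_WARNING)
```

With enable_return_routed_experts=True and a kv_transfer_config set, the sketch logs the warning and returns; with pipeline_parallel_size=2 it still raises ValueError.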
Note: kv_role alone cannot distinguish P/D transfer from single-instance KV offload
(NixlConnector P/D itself runs as kv_both; see docs/features/disagg_prefill.md and
docs/serving/expert_parallel_deployment.md), which is why this is a warning rather than a role-gated error.
Router/proxy support
Companion PRs add the prefill→decode
routed_experts merge to two routers, each verified end-to-end against this change on a 2-node Qwen3-30B-A3B P/D deployment
(NIXL + Mooncake) and checked against a non-disaggregated oracle under greedy
decoding.
Router PRs: vllm-project/router#184 · llm-d/llm-d-router#1627
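One way to express the oracle check is sketched below, assuming the same hypothetical one-row-per-token layout: run the same prompt with greedy decoding on a non-disaggregated instance and on the P/D pair, splice the P/D routing, and report the first token position where the two diverge. The helper name first_routing_mismatch is hypothetical, and whether the companion PRs compare exactly or allow some tolerance is not stated here; exact equality can be too strict if numerical differences between deployments flip near-tied expert choices.

```python
from typing import List, Optional

RoutingRows = List[List[int]]  # hypothetical: one row of expert IDs per token


def first_routing_mismatch(spliced: RoutingRows, oracle: RoutingRows) -> Optional[int]:
    """Return the first token index where spliced P/D routing differs from the
    non-disaggregated oracle, or None if they match exactly."""
    for i, (got, want) in enumerate(zip(spliced, oracle)):
        if got != want:
            return i
    if len(spliced) != len(oracle):
        # One side has extra rows: the first unmatched position counts as a mismatch.
        return min(len(spliced), len(oracle))
    return None


oracle = [[1, 5], [2, 6], [4, 0], [3, 7]]
print(first_routing_mismatch([[1, 5], [2, 6], [4, 0], [3, 7]], oracle))  # None
print(first_routing_mismatch([[1, 5], [9, 9], [4, 0], [3, 7]], oracle))  # 1
```

Comparing routing is only meaningful while the generated tokens also match: under greedy decoding, a single divergent token changes the inputs for every later step, so a token mismatch should be reported first.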