Relax routed_experts capture KV-connector check to a warning for P/D by S1ro1 · Pull Request #45419 · vllm-project/vllm

S1ro1 · 2026-06-12T15:20:23Z

Relax `routed_experts` capture KV-connector check to a warning for P/D

Background

--enable-return-routed-experts records, for each token, which MoE experts that
token was routed to. Today VllmConfig.__post_init__ hard-rejects it whenever the
instance is configured as a KV-transfer instance:

if self.kv_transfer_config is not None and self.kv_transfer_config.is_kv_transfer_instance:
    raise ValueError("--enable-return-routed-experts is incompatible with KV "
                     "connectors (PD disaggregation, KV cache offload).")

Why relax it

For P/D disaggregation this is not fundamentally incompatible. The decode
replica pulls the prompt KV from prefill and never runs a forward pass over the
prompt, so its prompt-region routing rows are invalid. The prefill replica,
however, returns the correct prompt-region routing, and a P/D-aware router/proxy
can splice it back in (Lp is the prompt length in tokens):

merged = concat( prefill_rows[:Lp], decode_rows[Lp:] )

So whether routed-experts capture works under disaggregation is a property of the
router/proxy, not something vLLM can decide at config time.
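
The splice above can be sketched in plain Python. This is an illustration only: `merge_routed_experts` is a hypothetical helper, and representing the capture as one list of expert IDs per token is an assumption, not the wire format used by vLLM or the companion router PRs.

```python
from typing import List, Sequence

Row = List[int]  # expert IDs selected for one token (assumed layout)


def merge_routed_experts(
    prefill_rows: Sequence[Row],
    decode_rows: Sequence[Row],
    prompt_len: int,
) -> List[Row]:
    """Take prompt-region rows from prefill and all later rows from decode."""
    if prompt_len < 0:
        raise ValueError("prompt_len must be non-negative")
    if len(prefill_rows) < prompt_len:
        raise ValueError("prefill response is missing prompt-region rows")
    if len(decode_rows) < prompt_len:
        raise ValueError("decode response is shorter than the prompt")
    # Rows [0, prompt_len) come from prefill; decode's copies of them are invalid.
    head = [list(row) for row in prefill_rows[:prompt_len]]
    # Rows [prompt_len, end) come from decode, which actually computed them.
    tail = [list(row) for row in decode_rows[prompt_len:]]
    return head + tail
```

A router would call this once per request, after both the prefill and decode responses are available, with `prompt_len` set to the number of prompt tokens.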

Change

Replace the ValueError for KV-transfer instances with a warning:

You are using P/D disaggregation with routed_experts capture, for this to work
your router/proxy needs to support it

The PP>1 incompatibility is unchanged.
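
As a rough sketch of the resulting behavior (not the actual diff): the function name, its parameters, and the logger name below are hypothetical, and the PP>1 error wording is a placeholder. Only the warning text and the fact that PP>1 stays rejected come from this PR; treating the PP>1 case as a ValueError is an assumption.

```python
import logging

logger = logging.getLogger("routed_experts_check_sketch")

# Warning text quoted from the PR description.
PD_WARNING = (
    "You are using P/D disaggregation with routed_experts capture, "
    "for this to work your router/proxy needs to support it"
)


def check_routed_experts(
    enable_return_routed_experts: bool,
    is_kv_transfer_instance: bool,
    pipeline_parallel_size: int,
) -> None:
    """Hypothetical stand-in for the config-time validation."""
    if not enable_return_routed_experts:
        return
    if pipeline_parallel_size > 1:
        # Assumed to remain a hard error; the message here is a placeholder.
        raise ValueError("routed_experts capture is incompatible with PP>1")
    if is_kv_transfer_instance:
        # Previously a ValueError; now only a warning.
        logger.warning(PD_WARNING)
```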

Note: kv_role alone cannot distinguish P/D transfer from single-instance KV
offload — NixlConnector P/D itself runs as kv_both (see
docs/features/disagg_prefill.md, docs/serving/expert_parallel_deployment.md) —
which is why this is a warning rather than a role-gated error.

Router/proxy support

Companion PRs add the prefill→decode routed_experts merge to two routers, each
verified end-to-end against this change on a 2-node Qwen3-30B-A3B P/D deployment
(NIXL + Mooncake), checked against a non-disaggregated oracle under greedy
decoding:

vllm-project/router
llm-d/llm-d-router

Router PRs: vllm-project/router#184 · llm-d/llm-d-router#1627
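
A check against a non-disaggregated oracle could look roughly like the following. This is a hedged sketch, not the test used in the companion PRs: `first_routing_mismatch` is a hypothetical helper, rows are assumed to be lists of expert IDs per token, and exact per-row equality is an assumed comparison criterion that only makes sense when both runs produce the same tokens (for example under greedy decoding).

```python
from typing import Optional, Sequence


def first_routing_mismatch(
    candidate: Sequence[Sequence[int]],
    oracle: Sequence[Sequence[int]],
) -> Optional[int]:
    """Return the first token index whose routing differs, or None if identical."""
    for index, (cand_row, oracle_row) in enumerate(zip(candidate, oracle)):
        if list(cand_row) != list(oracle_row):
            return index
    if len(candidate) != len(oracle):
        # One sequence is a strict prefix of the other.
        return min(len(candidate), len(oracle))
    return None
```

Returning an index rather than a boolean makes it easy to tell whether a mismatch falls in the prompt region or in the generated region.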

--enable-return-routed-experts currently hard-rejects any KV-transfer instance. That blocks P/D disaggregation, where the routing captured on the prefill replica simply needs to be spliced into the decode response by the router/proxy (the decode replica pulls the prompt KV and never forwards the prompt, so its prompt-region rows are invalid). Routers/proxies can and now do perform this merge, so for a P/D setup this is a deployment concern, not a hard error.

Replace the ValueError for KV-transfer instances with a warning: 'You are using P/D disaggregation with routed_experts capture, for this to work your router/proxy needs to support it'. The PP>1 incompatibility is unchanged.

Note: kv_role alone cannot distinguish P/D transfer from single-instance KV offload (NixlConnector P/D itself runs as kv_both), which is why this is a warning rather than a role-gated error.

Signed-off-by: Matej Sirovatka <S1ro1@users.noreply.github.com>

This was referenced Jun 12, 2026

Merge routed_experts across prefill/decode in P/D disaggregation vllm-project/router#184 (Draft)

Merge routed_experts across prefill/decode in the P/D sidecar llm-d/llm-d-router#1627 (Open)

mergify Bot added the kv-connector label Jun 12, 2026

S1ro1 force-pushed the feat/relax-routed-experts-kv-check branch from eaed832 to b32231b Compare June 12, 2026 15:59

aoshen02 mentioned this pull request Jun 18, 2026

[Roadmap] 2026 Q2 vLLM × RL Roadmap #41733 (Open, 17 tasks)

S1ro1 wants to merge 1 commit into vllm-project:main from S1ro1:feat/relax-routed-experts-kv-check
