Relax routed_experts capture KV-connector check to a warning for P/D #45419

Draft
S1ro1 wants to merge 1 commit into vllm-project:main from S1ro1:feat/relax-routed-experts-kv-check

S1ro1 commented Jun 12, 2026

Contributor

Relax routed_experts capture KV-connector check to a warning for P/D

Background

--enable-return-routed-experts records, per token, which MoE experts it routed
to. Today VllmConfig.__post_init__ hard-rejects it whenever a KV connector is
configured:

if self.kv_transfer_config is not None and self.kv_transfer_config.is_kv_transfer_instance:
    raise ValueError("--enable-return-routed-experts is incompatible with KV "
                     "connectors (PD disaggregation, KV cache offload).")

Why relax it

For P/D disaggregation this is not fundamentally incompatible. The decode
replica pulls the prompt KV from prefill and never forwards the prompt, so its
prompt-region routing rows are invalid — but the prefill replica returns the
correct prompt-region routing, and a P/D-aware router/proxy can splice it back in:

merged = concat( prefill_rows[:Lp], decode_rows[Lp:] )

So whether routed-experts capture works under disaggregation is a property of the
router/proxy, not something vLLM can decide at config time.
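
The splice above can be sketched in a few lines of Python. This is a minimal illustration, not the code in either router: `merge_routed_experts`, the row layout (one list of expert ids per token) and the `-1` placeholder for the invalid decode rows are assumptions made for the example. It also assumes, as the formula implies, that the decode replica returns rows indexed over the full sequence (prompt plus generated tokens).

```python
from typing import List, Sequence

# One row per token; each row lists the expert ids chosen for that token.
# Real captures may carry a per-layer dimension as well; it is omitted here.
Rows = Sequence[Sequence[int]]


def merge_routed_experts(prefill_rows: Rows, decode_rows: Rows,
                         prompt_len: int) -> List[List[int]]:
    """Take prompt-region rows from prefill and the remaining rows from decode."""
    if prompt_len < 0:
        raise ValueError("prompt_len must be non-negative")
    if len(prefill_rows) < prompt_len:
        raise ValueError("prefill response has fewer rows than prompt_len")
    if len(decode_rows) < prompt_len:
        raise ValueError("decode response has fewer rows than prompt_len")
    # merged = concat(prefill_rows[:Lp], decode_rows[Lp:])
    merged = [list(row) for row in prefill_rows[:prompt_len]]
    merged += [list(row) for row in decode_rows[prompt_len:]]
    return merged


# Hypothetical data: 3 prompt tokens, 2 generated tokens, top-2 routing.
prefill = [[0, 3], [1, 2], [4, 7]]
# -1 is a placeholder chosen here for the invalid prompt-region decode rows.
decode = [[-1, -1], [-1, -1], [-1, -1], [5, 6], [2, 0]]
merged = merge_routed_experts(prefill, decode, prompt_len=3)
```

The length checks are there because a silent short read on either side would shift every later row by one token, which is harder to notice than an explicit error.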

Change

Replace the ValueError for KV-transfer instances with a warning:

You are using P/D disaggregation with routed_experts capture, for this to work
your router/proxy needs to support it

The PP>1 incompatibility is unchanged.
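
For readers who want the shape of the new behavior without opening the diff, here is a self-contained sketch. It is not the vLLM source: `KVTransferStub` and `check_routed_experts` are hypothetical stand-ins, only the `is_kv_transfer_instance` property and the warning text come from this description, and the PP>1 check is left out.

```python
import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger("routed_experts_check")

WARNING_TEXT = (
    "You are using P/D disaggregation with routed_experts capture, "
    "for this to work your router/proxy needs to support it"
)


@dataclass
class KVTransferStub:
    # Stand-in for the KV transfer config; only the field used by the check.
    is_kv_transfer_instance: bool


def check_routed_experts(enable_return_routed_experts: bool,
                         kv_transfer_config: Optional[KVTransferStub]) -> bool:
    """Return True if the warning was emitted. Never raises for KV-transfer instances."""
    if not enable_return_routed_experts:
        return False
    if (kv_transfer_config is not None
            and kv_transfer_config.is_kv_transfer_instance):
        # Previously this branch raised ValueError; now it only warns.
        logger.warning(WARNING_TEXT)
        return True
    return False


warned = check_routed_experts(True, KVTransferStub(is_kv_transfer_instance=True))
```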

Note: kv_role alone cannot distinguish P/D transfer from single-instance KV
offload — NixlConnector P/D itself runs as kv_both (see
docs/features/disagg_prefill.md, docs/serving/expert_parallel_deployment.md) —
which is why this is a warning rather than a role-gated error.

Router/proxy support

Companion PRs add the prefill→decode routed_experts merge to two routers, each
verified end-to-end against this change on a 2-node Qwen3-30B-A3B P/D deployment
(NIXL + Mooncake), checked against a non-disaggregated oracle under greedy
decoding:

  • vllm-project/router
  • llm-d/llm-d-router

Router PRs: vllm-project/router#184 · llm-d/llm-d-router#1627
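
As a rough illustration of what "checked against a non-disaggregated oracle" can mean under greedy decoding, the sketch below compares merged rows with oracle rows token by token. It is not the harness used in the companion PRs; `first_mismatch` and the row layout are assumptions made for the example.

```python
from typing import Optional, Sequence


def first_mismatch(merged: Sequence[Sequence[int]],
                   oracle: Sequence[Sequence[int]]) -> Optional[int]:
    """Return the first token index where the rows differ, or None if identical."""
    for i, (m, o) in enumerate(zip(merged, oracle)):
        if list(m) != list(o):
            return i
    if len(merged) != len(oracle):
        # One side ended early: the mismatch starts at the shorter length.
        return min(len(merged), len(oracle))
    return None


same = first_mismatch([[0, 1], [2, 3]], [[0, 1], [2, 3]])
diff = first_mismatch([[0, 1], [9, 9]], [[0, 1], [2, 3]])
```

Reporting the index rather than a boolean makes it easy to tell whether a disagreement falls in the prompt region (a merge problem) or in the generated region.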

--enable-return-routed-experts currently hard-rejects any KV-transfer instance.
That blocks P/D disaggregation, where the routing captured on the prefill
replica simply needs to be spliced into the decode response by the router/proxy
(the decode replica pulls the prompt KV and never forwards the prompt, so its
prompt-region rows are invalid). Routers/proxies can and now do perform this
merge, so for a P/D setup this is a deployment concern, not a hard error.

Replace the ValueError for KV-transfer instances with a warning:
'You are using P/D disaggregation with routed_experts capture, for this to work
your router/proxy needs to support it'. The PP>1 incompatibility is unchanged.

Note: kv_role alone cannot distinguish P/D transfer from single-instance KV
offload (NixlConnector P/D itself runs as kv_both), which is why this is a
warning rather than a role-gated error.

Signed-off-by: Matej Sirovatka <S1ro1@users.noreply.github.com>
S1ro1 force-pushed the feat/relax-routed-experts-kv-check branch from eaed832 to b32231b on June 12, 2026 15:59