feat: add multi-prefill-pool support for modality-based routing by linzebing · Pull Request #103 · vllm-project/router

linzebing · 2026-03-05T21:55:54Z

Purpose

Enable independent scaling of text and perception (multimodal) prefill pods while sharing a single decode pool. Requests are auto-routed to the correct prefill pool based on modality detection (image_url content parts → perception, otherwise → text).
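
The detection rule described above can be sketched roughly as follows. This is an illustrative sketch only, not the code in spec.rs: the `ContentPart`, `MessageContent`, and `PrefillPool` types and the `detect_pool_for_messages` function are hypothetical stand-ins, and the real `detect_prefill_pool()` may use different types and signatures.

```rust
/// Hypothetical, simplified model of one OpenAI-style content part.
#[derive(Debug, Clone, PartialEq)]
pub enum ContentPart {
    Text(String),
    ImageUrl(String),
}

/// Hypothetical model of a message body: a plain string or a list of parts.
#[derive(Debug, Clone, PartialEq)]
pub enum MessageContent {
    Plain(String),
    Parts(Vec<ContentPart>),
}

/// The two pools named in the PR description.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PrefillPool {
    Text,
    Perception,
}

impl PrefillPool {
    /// Pool name as it might appear in a selector or metric label (assumed).
    pub fn as_str(self) -> &'static str {
        match self {
            PrefillPool::Text => "text",
            PrefillPool::Perception => "perception",
        }
    }
}

/// Route to the perception pool if any message carries an image_url part;
/// otherwise fall back to the text pool.
pub fn detect_pool_for_messages(messages: &[MessageContent]) -> PrefillPool {
    let has_image = messages.iter().any(|content| match content {
        MessageContent::Plain(_) => false,
        MessageContent::Parts(parts) => parts
            .iter()
            .any(|part| matches!(part, ContentPart::ImageUrl(_))),
    });
    if has_image {
        PrefillPool::Perception
    } else {
        PrefillPool::Text
    }
}
```

Under this sketch, a single image part anywhere in the conversation is enough to send the whole request to the perception pool, and an empty or text-only request goes to the text pool.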

Key changes:

- Add detect_prefill_pool() / detect_prefill_pool_from_json() in spec.rs
- Add parse_prefill_selectors() for named pool CLI selectors (e.g. --prefill-selector=text:app=text-prefill)
- Change DiscoveryConfig.prefill_selector to prefill_selectors map
- Add WorkerRegistry.get_prefill_workers_by_pool() with label filtering
- Update K8s service discovery PodType::Prefill to carry pool name
- Add pool-aware routing in VllmPDRouter (route_chat, route_completion, route_transparent)
- Add prefill_pool_routed_total Prometheus metric with pool label
- Add 33 unit tests covering all new functionality
- Backward compatible: no pool prefix defaults to "default" pool
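
To make the selector format and the backward-compatible default concrete, here is a minimal sketch of how a value such as `text:app=text-prefill` might be split into a pool name and a label selector, and how labels might then be matched against a pod. It is not the PR's `parse_prefill_selectors()`: the function names, the error type, the comma-separated `key=value` label syntax, and the assumption that the first `:` separates the pool name from the labels are all hypothetical.

```rust
use std::collections::BTreeMap;

/// Label key -> label value (hypothetical representation of a selector).
pub type LabelSelector = BTreeMap<String, String>;

/// Pool used when a selector carries no "pool:" prefix, per the PR description.
pub const DEFAULT_POOL: &str = "default";

/// Parse one selector value, e.g. "text:app=text-prefill" or "app=prefill".
/// Assumption: the first ':' separates the pool name from the labels.
pub fn parse_selector_entry(raw: &str) -> Result<(String, LabelSelector), String> {
    let (pool, labels_raw) = match raw.split_once(':') {
        Some((pool, rest)) => (pool.trim(), rest),
        None => (DEFAULT_POOL, raw),
    };
    if pool.is_empty() {
        return Err(format!("empty pool name in selector '{raw}'"));
    }
    let mut labels = LabelSelector::new();
    // Assumption: multiple labels are comma-separated key=value pairs.
    for pair in labels_raw.split(',') {
        let (key, value) = pair
            .split_once('=')
            .ok_or_else(|| format!("expected key=value, got '{pair}'"))?;
        let (key, value) = (key.trim(), value.trim());
        if key.is_empty() {
            return Err(format!("empty label key in selector '{raw}'"));
        }
        labels.insert(key.to_string(), value.to_string());
    }
    Ok((pool.to_string(), labels))
}

/// Build a pool -> selector map from repeated selector values.
pub fn parse_selectors(raw: &[&str]) -> Result<BTreeMap<String, LabelSelector>, String> {
    let mut pools = BTreeMap::new();
    for entry in raw {
        let (pool, labels) = parse_selector_entry(entry)?;
        if pools.insert(pool.clone(), labels).is_some() {
            return Err(format!("duplicate selector for pool '{pool}'"));
        }
    }
    Ok(pools)
}

/// A pod matches a pool when it carries every label the selector requires.
pub fn labels_match(selector: &LabelSelector, pod_labels: &LabelSelector) -> bool {
    selector.iter().all(|(key, value)| pod_labels.get(key) == Some(value))
}
```

With this sketch, passing `text:app=text-prefill` and `perception:app=perception-prefill` yields two named pools, while a legacy value such as `app=prefill` lands in the "default" pool.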

Test Plan

Tested by deploying multi-prefill on k8s.

Test Result

Both text and perception evals passed, and we verified routing worked correctly.

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing the test command.
The test results, such as pasting a before/after results comparison or e2e results.

Enable independent scaling of text and perception (multimodal) prefill pods while sharing a single decode pool. Requests are auto-routed to the correct prefill pool based on modality detection (image_url content parts → perception, otherwise → text).

Key changes:
- Add detect_prefill_pool() / detect_prefill_pool_from_json() in spec.rs
- Add parse_prefill_selectors() for named pool CLI selectors (e.g. --prefill-selector=text:app=text-prefill)
- Change DiscoveryConfig.prefill_selector to prefill_selectors map
- Add WorkerRegistry.get_prefill_workers_by_pool() with label filtering
- Update K8s service discovery PodType::Prefill to carry pool name
- Add pool-aware routing in VllmPDRouter (route_chat, route_completion, route_transparent)
- Add prefill_pool_routed_total Prometheus metric with pool label
- Add 33 unit tests covering all new functionality
- Backward compatible: no pool prefix defaults to "default" pool

Signed-off-by: linzebing <linzebing1995@gmail.com>

linzebing force-pushed the feat/multi-prefill-pool branch from f02ac69 to 63f9942 Compare March 8, 2026 02:32

linzebing marked this pull request as ready for review March 8, 2026 02:33

linzebing force-pushed the feat/multi-prefill-pool branch 3 times, most recently from 0812ce9 to 65623d0 Compare March 8, 2026 20:56

linzebing force-pushed the feat/multi-prefill-pool branch from 65623d0 to 7c5fc18 Compare March 8, 2026 21:07

linzebing wants to merge 1 commit into vllm-project:main from linzebing:feat/multi-prefill-pool
