
feat: add multi-prefill-pool support for modality-based routing #103

Open
linzebing wants to merge 1 commit into vllm-project:main from linzebing:feat/multi-prefill-pool

linzebing commented Mar 5, 2026

Contributor


Purpose

Enable independent scaling of text and perception (multimodal) prefill pods while sharing a single decode pool. Requests are auto-routed to the correct prefill pool based on modality detection (image_url content parts → perception, otherwise → text).
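The modality rule described above can be sketched as follows. This is a minimal illustration, not the code in spec.rs: the `ContentPart` and `PrefillPool` types and the `detect_pool_sketch` function are hypothetical stand-ins, and the real `detect_prefill_pool()` may operate on different request types.

```rust
/// Hypothetical, simplified model of one content part in a chat message.
/// The real request types in the router are likely richer than this.
#[allow(dead_code)]
#[derive(Debug)]
enum ContentPart {
    Text(String),
    ImageUrl(String),
}

/// Hypothetical pool identifier; the PR names the pools "text" and "perception".
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PrefillPool {
    Text,
    Perception,
}

/// Returns `Perception` if any message carries an image_url part,
/// otherwise `Text`. Each inner Vec is the content of one message.
fn detect_pool_sketch(messages: &[Vec<ContentPart>]) -> PrefillPool {
    let has_image = messages
        .iter()
        .flatten()
        .any(|part| matches!(part, ContentPart::ImageUrl(_)));

    if has_image {
        PrefillPool::Perception
    } else {
        PrefillPool::Text
    }
}
```

Under this sketch, a single image part anywhere in the conversation is enough to send the whole request to the perception pool; whether the PR applies the same rule across all messages or only the latest one is not stated in the description.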

Key changes:

  • Add detect_prefill_pool() / detect_prefill_pool_from_json() in spec.rs
  • Add parse_prefill_selectors() for named pool CLI selectors (e.g. --prefill-selector=text:app=text-prefill)
  • Change DiscoveryConfig.prefill_selector to prefill_selectors map
  • Add WorkerRegistry.get_prefill_workers_by_pool() with label filtering
  • Update K8s service discovery PodType::Prefill to carry pool name
  • Add pool-aware routing in VllmPDRouter (route_chat, route_completion, route_transparent)
  • Add prefill_pool_routed_total Prometheus metric with pool label
  • Add 33 unit tests covering all new functionality
  • Backward compatible: no pool prefix defaults to "default" pool
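To make the named-selector format concrete, here is a rough sketch of how a value such as `text:app=text-prefill` could be split into a pool name and label map, with the "default" pool used when no prefix is given. The function name `parse_selector_sketch`, its return type, and the handling of comma-separated labels are assumptions for illustration; the actual `parse_prefill_selectors()` may differ.

```rust
use std::collections::HashMap;

/// Hypothetical parser for one `--prefill-selector` value of the form
/// `[pool:]key=value[,key=value...]`. A missing pool prefix maps to "default".
/// Comma-separated labels are an assumption, not confirmed by the PR text.
fn parse_selector_sketch(arg: &str) -> Result<(String, HashMap<String, String>), String> {
    // Split off the optional pool prefix at the first ':'.
    let (pool, labels) = match arg.split_once(':') {
        Some((pool, rest)) => (pool, rest),
        None => ("default", arg),
    };
    if pool.is_empty() {
        return Err(format!("empty pool name in selector {:?}", arg));
    }

    let mut map = HashMap::new();
    for pair in labels.split(',') {
        // Each label must look like key=value.
        let (key, value) = pair
            .split_once('=')
            .ok_or_else(|| format!("expected key=value, got {:?}", pair))?;
        if key.trim().is_empty() {
            return Err(format!("empty label key in {:?}", pair));
        }
        map.insert(key.trim().to_string(), value.trim().to_string());
    }

    Ok((pool.to_string(), map))
}
```

With this shape, `app=prefill` parses to the pool `default`, which matches the backward-compatibility note in the list above, while `text:app=text-prefill` yields the pool `text` with the label `app=text-prefill`.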

Test Plan

Tested by deploying a multi-prefill-pool configuration on Kubernetes.

Test Result

Both text and perception evals passed, and we verified routing worked correctly.



linzebing force-pushed the feat/multi-prefill-pool branch from f02ac69 to 63f9942 on March 8, 2026 02:32
linzebing marked this pull request as ready for review on March 8, 2026 02:33
linzebing force-pushed the feat/multi-prefill-pool branch 3 times, most recently from 0812ce9 to 65623d0 on March 8, 2026 20:56
Signed-off-by: linzebing <linzebing1995@gmail.com>
linzebing force-pushed the feat/multi-prefill-pool branch from 65623d0 to 7c5fc18 on March 8, 2026 21:07