feat: add multi-prefill-pool support for modality-based routing #103
Open
linzebing wants to merge 1 commit into
Enable independent scaling of text and perception (multimodal) prefill pods while sharing a single decode pool. Requests are auto-routed to the correct prefill pool based on modality detection (image_url content parts → perception, otherwise → text).

Key changes:
- Add detect_prefill_pool() / detect_prefill_pool_from_json() in spec.rs
- Add parse_prefill_selectors() for named pool CLI selectors (e.g. --prefill-selector=text:app=text-prefill)
- Change DiscoveryConfig.prefill_selector to prefill_selectors map
- Add WorkerRegistry.get_prefill_workers_by_pool() with label filtering
- Update K8s service discovery PodType::Prefill to carry pool name
- Add pool-aware routing in VllmPDRouter (route_chat, route_completion, route_transparent)
- Add prefill_pool_routed_total Prometheus metric with pool label
- Add 33 unit tests covering all new functionality
- Backward compatible: no pool prefix defaults to "default" pool

Signed-off-by: linzebing <linzebing1995@gmail.com>
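To illustrate the named-pool selector format mentioned above, here is a minimal sketch of how values such as text:app=text-prefill could be parsed into a pool-to-selector map. Only the function name and the example value come from this PR; the signature, the error type, and the rule of splitting on a ':' that precedes the first '=' are assumptions made for illustration, and the actual implementation may differ.

```rust
use std::collections::HashMap;

/// Hypothetical sketch: parses selector values into a pool -> label-selector map.
/// "text:app=text-prefill" is stored under pool "text"; a value without a
/// pool prefix, such as "app=prefill", is stored under the "default" pool.
pub fn parse_prefill_selectors(args: &[&str]) -> Result<HashMap<String, String>, String> {
    let mut pools = HashMap::new();
    for &raw in args {
        let colon = raw.find(':');
        let eq = raw.find('=');
        // Assumption: a ':' counts as a pool prefix separator only when it
        // appears before the first '=' (or when there is no '=' at all).
        let (pool, selector) = match colon {
            Some(c) if eq.map_or(true, |e| c < e) => (&raw[..c], &raw[c + 1..]),
            _ => ("default", raw),
        };
        if pool.is_empty() || selector.is_empty() {
            return Err(format!("invalid prefill selector: {:?}", raw));
        }
        pools.insert(pool.to_string(), selector.to_string());
    }
    Ok(pools)
}
```

Under these assumptions, passing "text:app=text-prefill" yields an entry for pool "text", while a bare "app=prefill" lands in the "default" pool, which matches the backward-compatibility behavior described in the list.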
Purpose
Enable independent scaling of text and perception (multimodal) prefill pods while sharing a single decode pool. Requests are auto-routed to the correct prefill pool based on modality detection (image_url content parts → perception, otherwise → text).
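The routing rule can be sketched as follows. This is a simplified illustration, not the code from spec.rs: the ContentPart and Message types are hypothetical stand-ins for the real request types, and the signature of detect_prefill_pool is assumed. Only the function name and the rule itself (an image_url content part selects the perception pool, anything else selects the text pool) come from the description.

```rust
/// Hypothetical, simplified stand-in for a chat content part.
#[derive(Debug, Clone, PartialEq)]
pub enum ContentPart {
    Text(String),
    ImageUrl(String),
}

/// Hypothetical, simplified stand-in for a chat message.
#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub role: String,
    pub parts: Vec<ContentPart>,
}

/// Returns "perception" if any message carries an image_url part,
/// otherwise "text".
pub fn detect_prefill_pool(messages: &[Message]) -> &'static str {
    let has_image = messages
        .iter()
        .flat_map(|m| m.parts.iter())
        .any(|p| matches!(p, ContentPart::ImageUrl(_)));
    if has_image {
        "perception"
    } else {
        "text"
    }
}
```

In this sketch a single image_url part anywhere in the conversation is enough to send the whole request to the perception pool; a request with no messages falls through to the text pool.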
Key changes:
Test Plan
Deployed the multi-prefill-pool configuration on Kubernetes.
Test Result
Both the text and perception evals passed, and we verified that requests were routed to the correct prefill pool.