Stream router logits to disk #2
Open
fisherxue wants to merge 4 commits into
- Add set_logits_capture_fn() to FusedMoERouter ABC and BaseRouter
- Call _logits_capture_fn(router_logits) in select_experts() before top-k/softmax
- Add logits_buffer to _RoutedExpertsDeviceCache (L, N, E) float16
- Add _RoutedExpertsDiskCache: per-request temp mmap files with counter-based filenames, compacted to final .npy on completion
- Extend _RoutedExpertsCapturerReal with logits staging buffers, D2H on same async stream, and disk scatter
- Wire logits capture callback in bind_routing_capture_to_model()
- Extend extract/free functions for logits paths
- Thread num_experts and router_logits_output_dir through factory
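The capture hook described in this commit can be sketched as follows. This is a minimal illustration, not the PR's code: `ToyRouter` is a hypothetical stand-in for the router classes, NumPy replaces the real tensor library, and the top-k-then-softmax order is an assumption made only for the example. The point it shows is that the callback sees the raw logits before any top-k or softmax is applied.

```python
from typing import Callable, Optional, Tuple

import numpy as np

LogitsCaptureFn = Callable[[np.ndarray], None]


class ToyRouter:
    """Hypothetical stand-in for a router class; not the vLLM implementation."""

    def __init__(self, top_k: int) -> None:
        self.top_k = top_k
        self._logits_capture_fn: Optional[LogitsCaptureFn] = None

    def set_logits_capture_fn(self, fn: Optional[LogitsCaptureFn]) -> None:
        # Register (or clear) the hook that receives raw router logits.
        self._logits_capture_fn = fn

    def select_experts(self, router_logits: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
        # Hand the untouched logits to the hook first, before top-k/softmax.
        if self._logits_capture_fn is not None:
            self._logits_capture_fn(router_logits)

        # Pick the top_k experts per token (assumed order: top-k, then softmax).
        topk_ids = np.argsort(-router_logits, axis=-1, kind="stable")[:, : self.top_k]
        topk_logits = np.take_along_axis(router_logits, topk_ids, axis=-1)

        # Numerically stable softmax over the selected logits only.
        shifted = topk_logits - topk_logits.max(axis=-1, keepdims=True)
        weights = np.exp(shifted)
        weights /= weights.sum(axis=-1, keepdims=True)
        return weights, topk_ids
```

A caller would register something like `router.set_logits_capture_fn(buffer.append)` once, and the routing path pays only a `None` check when capture is disabled.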
- Add router_logits_paths to ModelRunnerOutput
- Add router_logits_path to EngineCoreOutput
- Thread through scheduler update_from_output() handoff
- Thread through output_processor to RequestOutput
- Add router_logits_path to RequestOutput
- Config plumbing: enable_return_router_logits + router_logits_output_dir in ModelConfig, EngineArgs, LLM(), VllmConfig validation
- gpu_model_runner: pass num_experts and output_dir to capturer init, unpack logits paths from extract function
- Add _RoutedExpertsDiskCache unit tests (write, finalize, free, path safety)
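The handoff from the batch-level output to per-request outputs might look roughly like the sketch below. The `Toy*` dataclasses and the free function are hypothetical; in particular, modelling `router_logits_paths` as a dict keyed by request id is an assumption, since the commit message does not state the field's type.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ToyModelRunnerOutput:
    """Hypothetical batch-level output; one entry per request in the step."""

    req_ids: List[str]
    # Assumption: request id -> path of that request's logits file, if any.
    router_logits_paths: Dict[str, str] = field(default_factory=dict)


@dataclass
class ToyEngineCoreOutput:
    """Hypothetical per-request output handed on to the output processor."""

    request_id: str
    router_logits_path: Optional[str] = None


def update_from_output(runner_output: ToyModelRunnerOutput) -> List[ToyEngineCoreOutput]:
    # Fan the batch-level mapping out to one output per request; requests
    # without a captured file simply carry None.
    return [
        ToyEngineCoreOutput(
            request_id=req_id,
            router_logits_path=runner_output.router_logits_paths.get(req_id),
        )
        for req_id in runner_output.req_ids
    ]
```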
aggregation, and filename collisions

1. extract_routed_experts_for_current_batch: early exits now return (None, None) instead of bare None
2. _RoutedExpertsDiskCache.finalize: chunked mmap-to-mmap copy instead of materializing the full tensor in host memory
3. RequestOutput.add: propagate router_logits_path during streaming/cumulative output aggregation
4. Disk cache filenames include PID prefix to avoid collisions across process restarts
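Items 2 and 4 can be illustrated with a short NumPy sketch. It is not the PR's `_RoutedExpertsDiskCache`: the function names, the filename pattern, the default chunk size, and the token-major `(tokens, layers, experts)` layout are all assumptions made for the example. It uses `np.lib.format.open_memmap` so the destination `.npy` is also memory-mapped, and copies a bounded number of tokens at a time instead of loading the whole staging file.

```python
import itertools
import os

import numpy as np

# Process-wide counter; combined with the PID it keeps names unique both
# within one process and across restarts that reuse the same directory.
_file_counter = itertools.count()


def next_logits_filename(output_dir: str) -> str:
    # Hypothetical naming scheme: <prefix>_<pid>_<counter>.npy
    name = f"router_logits_{os.getpid()}_{next(_file_counter)}.npy"
    return os.path.join(output_dir, name)


def finalize_chunked(
    src: np.memmap, num_tokens: int, dst_path: str, chunk_tokens: int = 1024
) -> str:
    """Copy the first num_tokens rows of src into a .npy file, chunk by chunk."""
    if num_tokens <= 0 or num_tokens > src.shape[0]:
        raise ValueError("num_tokens must be in [1, src.shape[0]]")
    if chunk_tokens <= 0:
        raise ValueError("chunk_tokens must be positive")

    # open_memmap writes the .npy header and maps the data region for writing.
    dst = np.lib.format.open_memmap(
        dst_path, mode="w+", dtype=src.dtype, shape=(num_tokens,) + src.shape[1:]
    )
    for start in range(0, num_tokens, chunk_tokens):
        stop = min(start + chunk_tokens, num_tokens)
        # Only chunk_tokens rows are resident in host memory at a time.
        dst[start:stop] = src[start:stop]
    dst.flush()
    del dst  # release the mapping before callers reopen the file
    return dst_path
```

Rows that were never written in the staging file read back as zeros, because a freshly created memory-mapped file is zero-filled; the sparse-write test named in the follow-up commit appears to rely on the same property.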
aggregation, and PID filenames

- test_extract_returns_tuple_when_capturer_disabled/is_none: verify (None, None) tuple return on disabled paths
- test_finalize_chunked_copy_does_not_load_full_tensor: verify sparse writes + zero-fill + correct final shape
- test_pid_prefix_in_filenames: verify PID in output names
- test_request_output_add_propagates_router_logits_path: verify streaming aggregation carries the path forward
- test_request_output_add_does_not_overwrite_with_none: verify existing path is preserved when next output has none
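The aggregation behaviour exercised by the last two tests can be sketched like this. `ToyRequestOutput` is a hypothetical, heavily reduced stand-in for `RequestOutput` (the real class has many more fields and its own merging rules for text and token ids); only the path-handling rule is taken from the test descriptions above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyRequestOutput:
    """Hypothetical reduced stand-in for RequestOutput."""

    request_id: str
    text: str = ""
    router_logits_path: Optional[str] = None

    def add(self, next_output: "ToyRequestOutput") -> None:
        # Simplified cumulative aggregation of streamed text.
        self.text += next_output.text
        # Carry the path forward, but never replace a known path with None.
        if next_output.router_logits_path is not None:
            self.router_logits_path = next_output.router_logits_path
```

With this rule, a path that arrives only on the final chunk still reaches the aggregated output, and a later chunk without a path leaves the earlier value intact.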
PLEASE FILL IN THE PR DESCRIPTION HERE ENSURING ALL CHECKLIST ITEMS (AT THE BOTTOM) HAVE BEEN CONSIDERED.
Purpose
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.
BEFORE SUBMITTING, PLEASE READ https://docs.vllm.ai/en/latest/contributing (anything written below this line will be removed by GitHub Actions)