Stream router logits to disk by fisherxue · Pull Request #2 · fisherxue/vllm

fisherxue · 2026-05-08T23:40:36Z

- Add set_logits_capture_fn() to FusedMoERouter ABC and BaseRouter
- Call _logits_capture_fn(router_logits) in select_experts() before top-k/softmax
- Add logits_buffer to _RoutedExpertsDeviceCache (L, N, E) float16
- Add _RoutedExpertsDiskCache: per-request temp mmap files with counter-based filenames, compacted to final .npy on completion
- Extend _RoutedExpertsCapturerReal with logits staging buffers, D2H on same async stream, and disk scatter
- Wire logits capture callback in bind_routing_capture_to_model()
- Extend extract/free functions for logits paths
- Thread num_experts and router_logits_output_dir through factory
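The capture hook described in this commit can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `ToyRouter` is a hypothetical stand-in for the real router classes, NumPy replaces device tensors, and only the method names `set_logits_capture_fn` and `select_experts` come from the commit message.

```python
from typing import Callable, Optional, Tuple

import numpy as np


class ToyRouter:
    """Hypothetical stand-in for a MoE router; not vLLM's BaseRouter."""

    def __init__(self, top_k: int) -> None:
        self.top_k = top_k
        self._logits_capture_fn: Optional[Callable[[np.ndarray], None]] = None

    def set_logits_capture_fn(self, fn: Callable[[np.ndarray], None]) -> None:
        self._logits_capture_fn = fn

    def select_experts(self, router_logits: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
        # Hand the raw logits to the hook before softmax/top-k touch them.
        if self._logits_capture_fn is not None:
            self._logits_capture_fn(router_logits)
        # Numerically stable softmax over the expert axis.
        shifted = router_logits - router_logits.max(axis=-1, keepdims=True)
        probs = np.exp(shifted)
        probs /= probs.sum(axis=-1, keepdims=True)
        # Keep the top_k experts per token, highest probability first.
        topk_ids = np.argsort(-probs, axis=-1)[:, : self.top_k]
        topk_weights = np.take_along_axis(probs, topk_ids, axis=-1)
        return topk_weights, topk_ids


captured = []
router = ToyRouter(top_k=2)
# The hook stores a float16 copy, mirroring the float16 buffer in the commit message.
router.set_logits_capture_fn(lambda logits: captured.append(logits.astype(np.float16)))

logits = np.array([[0.1, 2.0, -1.0, 0.5], [1.5, 0.0, 0.2, 3.0]], dtype=np.float32)
weights, ids = router.select_experts(logits)
```

Calling the hook before softmax and top-k means the captured values are the unnormalized logits for all experts, not only the selected ones.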

- Add router_logits_paths to ModelRunnerOutput
- Add router_logits_path to EngineCoreOutput
- Thread through scheduler update_from_output() handoff
- Thread through output_processor to RequestOutput
- Add router_logits_path to RequestOutput
- Config plumbing: enable_return_router_logits + router_logits_output_dir in ModelConfig, EngineArgs, LLM(), VllmConfig validation
- gpu_model_runner: pass num_experts and output_dir to capturer init, unpack logits paths from extract function
- Add _RoutedExpertsDiskCache unit tests (write, finalize, free, path safety)
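Since the model runner unpacks two values from the extract function, that function has to return a pair on every code path. A minimal sketch of such a two-value contract follows; `extract_for_batch`, its `enabled` flag, and the dictionary layouts are hypothetical and are not taken from the PR.

```python
from typing import Dict, List, Optional, Tuple

ExtractResult = Tuple[Optional[Dict[str, List[List[int]]]], Optional[Dict[str, str]]]


def extract_for_batch(enabled: bool) -> ExtractResult:
    """Hypothetical extract function returning (routed_experts, logits_paths)."""
    if not enabled:
        # Disabled path: still a pair, so the caller's unpacking keeps working.
        return None, None
    routed_experts = {"req-0": [[1, 3]]}
    logits_paths = {"req-0": "req-0.npy"}
    return routed_experts, logits_paths


# The caller can unpack unconditionally.
routed_experts, router_logits_paths = extract_for_batch(enabled=False)

# For contrast: unpacking a bare None raises TypeError.
try:
    a, b = None  # type: ignore[misc]
    unpack_failed = False
except TypeError:
    unpack_failed = True
```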

aggregation, and filename collisions

1. extract_routed_experts_for_current_batch: early exits now return (None, None) instead of bare None
2. _RoutedExpertsDiskCache.finalize: chunked mmap-to-mmap copy instead of materializing the full tensor in host memory
3. RequestOutput.add: propagate router_logits_path during streaming/cumulative output aggregation
4. Disk cache filenames include PID prefix to avoid collisions across process restarts
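A chunked mmap-to-mmap finalize step with PID-prefixed filenames might look like the sketch below. It assumes a 2-D float16 staging file and uses NumPy's `np.memmap` and `numpy.lib.format.open_memmap`. The helper names `new_temp_path` and `finalize_chunked`, the filename pattern, and the array layout are illustrative assumptions, not the PR's implementation.

```python
import itertools
import os
import tempfile

import numpy as np
from numpy.lib.format import open_memmap

_counter = itertools.count()


def new_temp_path(output_dir: str) -> str:
    # PID prefix plus a per-process counter (hypothetical naming scheme).
    return os.path.join(output_dir, f"{os.getpid()}_{next(_counter)}.tmp")


def finalize_chunked(tmp_path, final_path, shape, num_valid_rows,
                     dtype=np.float16, chunk_rows=1024):
    """Copy the first num_valid_rows rows (assumed >= 1) into a .npy file."""
    src = np.memmap(tmp_path, dtype=dtype, mode="r", shape=shape)
    dst = open_memmap(final_path, mode="w+", dtype=dtype,
                      shape=(num_valid_rows,) + tuple(shape[1:]))
    # Only chunk_rows rows are resident at a time, never the full tensor.
    for start in range(0, num_valid_rows, chunk_rows):
        stop = min(start + chunk_rows, num_valid_rows)
        dst[start:stop] = src[start:stop]
    dst.flush()
    del src, dst  # release the mappings before removing the temp file
    os.remove(tmp_path)
    return final_path


with tempfile.TemporaryDirectory() as out_dir:
    tmp = new_temp_path(out_dir)
    capacity, num_experts = 8, 4
    staging = np.memmap(tmp, dtype=np.float16, mode="w+",
                        shape=(capacity, num_experts))
    staging[0] = 1.0
    staging[2] = 3.0  # row 1 is never written and stays zero-filled
    staging.flush()
    del staging
    final = finalize_chunked(tmp, os.path.join(out_dir, f"{os.getpid()}_req0.npy"),
                             (capacity, num_experts), num_valid_rows=3, chunk_rows=2)
    result = np.load(final)
```

Writing the destination through `open_memmap` produces a regular `.npy` file with a valid header, so consumers can later read it with `np.load(path, mmap_mode="r")` without pulling the whole array into memory.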

aggregation, and PID filenames

- test_extract_returns_tuple_when_capturer_disabled/is_none: verify (None, None) tuple return on disabled paths
- test_finalize_chunked_copy_does_not_load_full_tensor: verify sparse writes + zero-fill + correct final shape
- test_pid_prefix_in_filenames: verify PID in output names
- test_request_output_add_propagates_router_logits_path: verify streaming aggregation carries the path forward
- test_request_output_add_does_not_overwrite_with_none: verify existing path is preserved when next output has none
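The two aggregation behaviours checked by the last two tests can be illustrated with a toy output class. `ToyRequestOutput` and its fields are hypothetical; only the `router_logits_path` attribute and the `add` method name come from the commit messages.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyRequestOutput:
    """Hypothetical stand-in; not vLLM's RequestOutput."""

    request_id: str
    token_ids: List[int] = field(default_factory=list)
    router_logits_path: Optional[str] = None

    def add(self, next_output: "ToyRequestOutput") -> None:
        # Cumulative aggregation of streamed chunks.
        self.token_ids.extend(next_output.token_ids)
        # Carry a new path forward, but never replace a known path with None.
        if next_output.router_logits_path is not None:
            self.router_logits_path = next_output.router_logits_path


# Case 1: the path arrives on a later chunk and is propagated.
agg = ToyRequestOutput("req-0", [1, 2])
agg.add(ToyRequestOutput("req-0", [3], router_logits_path="req-0.npy"))

# Case 2: a later chunk without a path does not erase the existing one.
agg.add(ToyRequestOutput("req-0", [4]))
```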

fisherxue added 4 commits May 8, 2026 19:21

fisherxue wants to merge 4 commits into main from stream-router-logits-to-disk
