[rollout] fix: support SGLang FP8 ignored layers for Qwen3.x GatedDeltaNet in rollout #6905
gem-mint wants to merge 2 commits into
Co-authored-by: OpenAI Codex <codex@openai.com>
Code Review
This pull request refactors the SGLang FP8 quantization configuration by extracting hardcoded settings into a centralized utility module (sglang_fp8_utils.py). It introduces helper functions to normalize, deduplicate, and match ignored layers from Hugging Face configurations and environment variables, and adds corresponding unit tests. Feedback on the changes highlights that _get_config_value should check for the presence of a .get() method (e.g., hasattr(config, "get")) rather than strictly checking isinstance(config, dict) to ensure compatibility with OmegaConf DictConfig objects commonly used in the codebase.
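The review point about duck-typing on `.get()` can be sketched as follows. This is a minimal illustration, not the PR's actual helper: the function name `get_config_value`, its signature, and the stand-in `DictLikeConfig` class are assumptions made for the example. `DictLikeConfig` only mimics a mapping-like config object that is not a `dict` subclass.

```python
from types import SimpleNamespace
from typing import Any


class DictLikeConfig:
    """Hypothetical stand-in for a mapping-like config that is NOT a dict subclass."""

    def __init__(self, **values: Any) -> None:
        self._values = dict(values)

    def get(self, key: str, default: Any = None) -> Any:
        return self._values.get(key, default)


def get_config_value(config: Any, key: str, default: Any = None) -> Any:
    """Read `key` from a dict, a dict-like object, or a plain attribute object."""
    if config is None:
        return default
    # Duck-type on .get() instead of isinstance(config, dict), so dict-like
    # config containers are handled the same way as plain dicts.
    if hasattr(config, "get"):
        return config.get(key, default)
    # Fall back to attribute access for plain objects.
    return getattr(config, key, default)


# A strict isinstance(config, dict) check would miss the second case below.
plain = get_config_value({"ignored_layers": ["linear_attn"]}, "ignored_layers")
dict_like = get_config_value(DictLikeConfig(ignored_layers=["linear_attn"]), "ignored_layers")
attr_obj = get_config_value(SimpleNamespace(ignored_layers=["linear_attn"]), "ignored_layers")
```

All three calls return the same list, whereas an `isinstance(config, dict)` branch would send the dict-like object down the attribute path and return the default.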
This PR has been replaced by #6906 after renaming the source branch. The code diff is unchanged.
Summary
Support for `ignored_layers`/`modules_to_not_convert` rules and `SGLANG_FP8_IGNORED_LAYERS` in verl-side FP8 weight conversion.
Motivation
SGLang 0.5.12 supports FP8 `ignored_layers`, but verl's SGLang rollout path built its FP8 config inline, and the verl-side `SGLangFP8QuantizerHelper` did not honor the same ignore rules before `update_weights`.
This can break rollout FP8 for models with small projection layers, such as Qwen GatedDeltaNet `linear_attn` projections, where block-wise FP8 expects dimensions divisible by the 128x128 weight block size.
In our Qwen3.5/3.6-35B-A3B GRPO rollout, the failure was triggered by GatedDeltaNet `linear_attn.in_proj_ba`: after tensor-parallel sharding, the local projection dimension is not divisible by SGLang block-wise FP8's `weight_block_size=[128, 128]`, so verl-side FP8 conversion must apply the same ignored-layer rules before `update_weights`.
With this change, users can pass ignored layers through the existing SGLang mechanisms, for example `SGLANG_FP8_IGNORED_LAYERS=linear_attn`.
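The failure mode and the ignore rule described above can be illustrated with a short sketch. It is not the PR's code: the helper names, the comma-separated format assumed for `SGLANG_FP8_IGNORED_LAYERS`, the substring matching rule, and the example shapes are all assumptions chosen for illustration. Only the 128x128 block size and the `linear_attn` pattern come from the description.

```python
import os
from typing import List, Optional, Sequence, Tuple

WEIGHT_BLOCK_SIZE = (128, 128)  # block size quoted in the PR description


def parse_ignored_layers(raw: Optional[str]) -> List[str]:
    """Split a comma-separated string (assumed format), strip, and deduplicate."""
    seen: List[str] = []
    for item in (raw or "").split(","):
        item = item.strip()
        if item and item not in seen:
            seen.append(item)
    return seen


def is_ignored(param_name: str, ignored: Sequence[str]) -> bool:
    """Assumed rule: a layer is ignored if any pattern is a substring of its name."""
    return any(pattern in param_name for pattern in ignored)


def fits_block_fp8(shape: Tuple[int, int],
                   block: Tuple[int, int] = WEIGHT_BLOCK_SIZE) -> bool:
    """True if both local dimensions are multiples of the FP8 weight block."""
    return shape[0] % block[0] == 0 and shape[1] % block[1] == 0


# Patterns as they might arrive from the environment (falls back to the
# value used in the smoke test when the variable is unset).
ignored = parse_ignored_layers(os.environ.get("SGLANG_FP8_IGNORED_LAYERS", "linear_attn"))

# Hypothetical local (post-sharding) shapes, not the real model dimensions.
local_weights = {
    "model.layers.0.linear_attn.in_proj_ba.weight": (16, 2048),
    "model.layers.0.mlp.down_proj.weight": (2048, 512),
}

for name, shape in local_weights.items():
    action = "keep in original dtype" if is_ignored(name, ignored) else "convert to FP8"
    print(f"{name}: block-aligned={fits_block_fp8(shape)}, {action}")
```

With the hypothetical shapes above, the `in_proj_ba` shard has a 16-row dimension that is not a multiple of 128, which is the kind of weight the ignore rule is meant to leave unquantized.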
Duplicate Check
Before opening this PR, I checked for existing open PRs with:
No duplicate open PRs were found.
Tests
Also smoke-tested on a verl v0.7.1-based Qwen3.5/3.6-35B-A3B GRPO run with SGLang 0.5.12, rollout `quantization=fp8`, and `SGLANG_FP8_IGNORED_LAYERS=linear_attn`.
AI Assistance
This PR was prepared with AI assistance. I reviewed the generated changes and validated them on a Qwen3.5/3.6-35B-A3B rollout FP8 training run.