[rollout] fix: support SGLang FP8 ignored layers for Qwen3.x GatedDeltaNet in rollout #6905

Closed
gem-mint wants to merge 2 commits into verl-project:main from gem-mint:codex/sglang-rollout-fp8-ignored-layers
@gem-mint gem-mint commented Jul 1, 2026

Summary

  • Share SGLang rollout FP8 quantization config between server launch and weight sync.
  • Preserve SGLang's ignored_layers / modules_to_not_convert rules and SGLANG_FP8_IGNORED_LAYERS in verl-side FP8 weight conversion.
  • Add unit coverage for default config behavior and ignored-layer matching.

Motivation

SGLang 0.5.12 supports FP8 ignored_layers, but verl's SGLang rollout path built its FP8 config inline and the verl-side SGLangFP8QuantizerHelper did not honor the same ignore rules before update_weights.

This can break rollout FP8 for models with small projection layers, such as Qwen GatedDeltaNet linear_attn projections, where block-wise FP8 expects dimensions divisible by the 128x128 weight block size.

In our Qwen3.5/3.6-35B-A3B GRPO rollout, the failure was triggered by GatedDeltaNet linear_attn.in_proj_ba: after tensor-parallel sharding, the local projection dimension is not divisible by SGLang block-wise FP8's weight_block_size=[128, 128], so verl-side FP8 conversion must apply the same ignored-layer rules before update_weights.
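
The divisibility constraint described above can be sketched as a small check. This is an illustration only: the function name, the `shard_dim` parameter, and the example shapes are hypothetical and are not taken from verl, SGLang, or the actual Qwen layer dimensions.

```python
def is_block_fp8_compatible(out_features, in_features, tp_size=1,
                            shard_dim=0, block=(128, 128)):
    """Return True if the local (post-sharding) weight shape is divisible
    by the FP8 weight block size in both dimensions.

    Hypothetical helper: shard_dim=0 means the output dimension is split
    across tensor-parallel ranks, shard_dim=1 means the input dimension is.
    """
    dims = [out_features, in_features]
    # The sharded dimension must first split evenly across ranks.
    if dims[shard_dim] % tp_size != 0:
        return False
    dims[shard_dim] //= tp_size
    # Block-wise quantization needs whole blocks along both dimensions.
    return dims[0] % block[0] == 0 and dims[1] % block[1] == 0


# Illustrative shapes only (not the real Qwen dimensions).
print(is_block_fp8_compatible(4096, 4096, tp_size=4))  # True: 1024 x 4096
print(is_block_fp8_compatible(64, 2048, tp_size=2))    # False: 32 x 2048
```

A layer for which this check fails is a candidate for the ignored-layer list, so that it stays in its original precision rather than being converted.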

With this change, users can pass ignored layers through the existing SGLang mechanisms, for example:

SGLANG_FP8_IGNORED_LAYERS=linear_attn
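
As a rough sketch of how such a value might be normalized and matched against parameter names on the verl side: the helper names below are hypothetical, and both the comma-separated format and the dotted-segment matching rule are assumptions made for illustration, not the confirmed behavior of this PR or of SGLang.

```python
import os


def parse_ignored_layers(raw):
    """Split a comma-separated string into a de-duplicated, ordered list.

    Assumption: the variable holds comma-separated layer name patterns.
    """
    if not raw:
        return []
    seen = []
    for item in raw.split(","):
        name = item.strip()
        if name and name not in seen:
            seen.append(name)
    return seen


def is_ignored(param_name, ignored):
    """Return True if any pattern matches a run of whole dotted segments
    of param_name (assumed matching rule, stricter than a substring test)."""
    parts = param_name.split(".")
    for pattern in ignored:
        p = pattern.split(".")
        n = len(p)
        if any(parts[i:i + n] == p for i in range(len(parts) - n + 1)):
            return True
    return False


ignored = parse_ignored_layers(os.environ.get("SGLANG_FP8_IGNORED_LAYERS", ""))
# With SGLANG_FP8_IGNORED_LAYERS=linear_attn, a parameter such as
# "model.layers.0.linear_attn.in_proj_ba.weight" would be skipped.
```

Matching on whole segments avoids accidentally skipping a layer whose name merely contains the pattern as a substring.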

Duplicate Check

Before opening this PR, I checked for existing open PRs with:

gh pr list --repo verl-project/verl --state open --search "SGLang FP8 ignored_layers"
gh pr list --repo verl-project/verl --state open --search "rollout fp8 ignored_layers"
gh pr list --repo verl-project/verl --state open --search "SGLang quantization_config ignored_layers"

No duplicate open PRs were found.

Tests

python -m py_compile \
  verl/utils/sglang/sglang_fp8_utils.py \
  verl/workers/rollout/sglang_rollout/async_sglang_server.py \
  verl/workers/rollout/sglang_rollout/sglang_rollout.py \
  tests/utils/test_sglang_fp8_utils.py
git diff --check
PYTHONIOENCODING=utf-8 python tests/special_sanity/validate_structure.py \
  --allow-files tests/test_protocol_on_cpu.py tests/test_base_config_on_cpu.py tests/test_protocol_v2_on_cpu.py
PYTHONIOENCODING=utf-8 python tests/special_sanity/check_license.py \
  --directories verl/utils/sglang verl/workers/rollout/sglang_rollout
PYTHONIOENCODING=utf-8 python tests/special_sanity/check_device_api_usage.py --directory ./verl/utils/sglang
PYTHONIOENCODING=utf-8 python tests/special_sanity/check_device_api_usage.py --directory ./verl/workers/rollout/sglang_rollout

Also smoke-tested on a verl v0.7.1-based Qwen3.5/3.6-35B-A3B GRPO run with SGLang 0.5.12, rollout quantization=fp8, and SGLANG_FP8_IGNORED_LAYERS=linear_attn.

AI Assistance

This PR was prepared with AI assistance. I reviewed the generated changes and validated them on a Qwen3.5/3.6-35B-A3B rollout FP8 training run.

Co-authored-by: OpenAI Codex <codex@openai.com>

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request refactors the SGLang FP8 quantization configuration by extracting hardcoded settings into a centralized utility module (sglang_fp8_utils.py). It introduces helper functions to normalize, deduplicate, and match ignored layers from Hugging Face configurations and environment variables, and adds corresponding unit tests. Feedback on the changes highlights that _get_config_value should check for the presence of a .get() method (e.g., hasattr(config, "get")) rather than strictly checking isinstance(config, dict) to ensure compatibility with OmegaConf DictConfig objects commonly used in the codebase.
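
The duck-typed lookup suggested in the feedback could look roughly like the sketch below. It is not the PR's actual `_get_config_value`; the signature and the attribute fallback are assumptions made for illustration.

```python
from types import SimpleNamespace


def get_config_value(config, key, default=None):
    """Read `key` from a dict, any object exposing .get(), or a plain object.

    Hypothetical stand-in for the helper discussed in the review.
    """
    if config is None:
        return default
    getter = getattr(config, "get", None)
    if callable(getter):
        # Covers dict and dict-like config objects that provide .get().
        return getter(key, default)
    # Fall back to attribute access for plain config objects.
    return getattr(config, key, default)


print(get_config_value({"ignored_layers": ["linear_attn"]}, "ignored_layers"))
print(get_config_value(SimpleNamespace(quant_method="fp8"), "quant_method"))
```

Checking for a callable `get` rather than `isinstance(config, dict)` lets one code path serve both plain dictionaries and mapping-like config wrappers.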

Comment thread verl/utils/sglang/sglang_fp8_utils.py Outdated
@gem-mint gem-mint changed the title from "fix: honor SGLang FP8 ignored layers in rollout" to "[rollout] fix: support SGLang FP8 ignored layers in rollout" Jul 1, 2026
@gem-mint gem-mint changed the title from "[rollout] fix: support SGLang FP8 ignored layers in rollout" to "[rollout] fix: support SGLang FP8 ignored layers for Qwen3.x GatedDeltaNet in rollout" Jul 1, 2026
@gem-mint gem-mint marked this pull request as ready for review July 1, 2026 03:43
Co-authored-by: OpenAI Codex <codex@openai.com>
@gem-mint gem-mint closed this Jul 1, 2026
@gem-mint gem-mint deleted the codex/sglang-rollout-fp8-ignored-layers branch July 1, 2026 05:44
gem-mint commented Jul 1, 2026

This PR has been replaced by #6906 after renaming the source branch. The code diff is unchanged.
