support Qwen3.5 FP8 #30

Merged
Jintao-Huang merged 15 commits into modelscope:main from Jintao-Huang:support_qwen3_5_fp8
Apr 15, 2026

Conversation

@Jintao-Huang
Collaborator

No description provided.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request updates the GatedDeltaNet module to support the Megatron-Core modular API by adding an explicit __init__ method and importing GatedDeltaNetSubmodules. The feedback flags two risks: the new import may break backward compatibility with Megatron-Core versions that do not provide it, and a runtime error occurs if the base class falls back to object.
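The fallback concern raised in this review can be illustrated with a minimal sketch. It is not the code from this PR: `optional_attr`, `GatedDeltaNetWrapper`, and the module path `hypothetical_mcore.gated_delta_net` are hypothetical names chosen for illustration. The idea is to tolerate a missing import at module load time, but fail with a clear message at construction time instead of letting a class that silently inherits from object misbehave later.

```python
import importlib


def optional_attr(module_name, attr_name):
    """Return module.attr, or None if the module or the attribute is missing."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, attr_name, None)


# Hypothetical module path, used only for illustration.
_Base = optional_attr("hypothetical_mcore.gated_delta_net", "GatedDeltaNet")
HAS_BASE = _Base is not None


class GatedDeltaNetWrapper(_Base if HAS_BASE else object):
    """Subclass that refuses to be built when the real base class is absent."""

    def __init__(self, *args, **kwargs):
        if not HAS_BASE:
            # Fail early with an actionable message rather than an obscure
            # TypeError or AttributeError from object.__init__ later on.
            raise ImportError(
                "GatedDeltaNet is unavailable in the installed dependency; "
                "upgrade it or disable this module."
            )
        super().__init__(*args, **kwargs)
```

With this shape, importing the module stays safe on older installations, and only code paths that actually construct the class see the error.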

Comment thread src/mcore_bridge/model/modules/gated_delta_net.py
Comment thread src/mcore_bridge/model/modules/gated_delta_net.py Outdated
@Jintao-Huang
Collaborator Author

/gemini review

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request refactors the GatedDeltaNet implementation by splitting the input projection into two separate modules, in_proj_qkvz and in_proj_ba, and integrates Transformer Engine components like TEColumnParallelLinear and TENorm. It also introduces support for FP8 scale inversion during weight conversion and updates the transformers dependency range. The review feedback identifies several critical issues in the weight conversion logic within gpt_bridge.py, including an AttributeError caused by accessing a deleted attribute, potential KeyErrors from incorrect HuggingFace state dict keys, and a logic error where scale inversion values were overwriting weight tensors instead of being assigned to the correct scale_inv keys.
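The scale-inversion bug described in this review (scale values overwriting weight tensors) can be sketched in isolation. This is a simplified illustration using nested lists rather than tensors, not the code in gpt_bridge.py. The helper names, the key prefix, and the `weight_scale_inv` suffix are assumptions, as is the convention that block-wise dequantization multiplies each weight block by its scale_inv entry.

```python
def export_fp8_pair(state_dict, prefix, weight, scale_inv=None):
    """Store a weight and, when present, its scale_inv under separate keys."""
    state_dict[f"{prefix}.weight"] = weight
    if scale_inv is not None:
        # The bug pattern would assign scale_inv to f"{prefix}.weight" here,
        # silently replacing the quantized weight with its scales.
        state_dict[f"{prefix}.weight_scale_inv"] = scale_inv
    return state_dict


def dequantize(weight, scale_inv, block=2):
    """Block-wise dequantization: w[i][j] * scale_inv[i // block][j // block]."""
    return [
        [w * scale_inv[i // block][j // block] for j, w in enumerate(row)]
        for i, row in enumerate(weight)
    ]


# Hypothetical key prefix and toy values, for illustration only.
weight = [[1.0, 2.0], [3.0, 4.0]]
scale_inv = [[0.5]]
sd = export_fp8_pair({}, "linear_attn.in_proj_qkvz", weight, scale_inv)
restored = dequantize(
    sd["linear_attn.in_proj_qkvz.weight"],
    sd["linear_attn.in_proj_qkvz.weight_scale_inv"],
)
```

Keeping the two keys distinct means a later consumer can always recover both the quantized values and their scales; a quick check that both keys exist after conversion would catch the overwrite early.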

Comment thread src/mcore_bridge/bridge/gpt_bridge.py Outdated
Comment thread src/mcore_bridge/bridge/gpt_bridge.py Outdated
Comment thread src/mcore_bridge/bridge/gpt_bridge.py Outdated
Comment thread src/mcore_bridge/bridge/gpt_bridge.py Outdated
@Jintao-Huang
Collaborator Author

/gemini review

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces support for decoupled input projections within the GatedDeltaNet module, enabling separate linear layers for QKVZ and BA components. The changes include updates to the model configuration, state dictionary conversion logic in the bridge to handle both standard and LoRA weights, and specific layer specifications for Qwen 3.5 GDN. Critical feedback identifies a potential KeyError in the bridge due to redundant prefix handling and a NameError in the GatedDeltaNet forward pass where a variable is accessed outside its conditional definition scope.
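The NameError pattern flagged in this review is easy to reproduce outside the model. The sketch below is not the GatedDeltaNet forward pass: the function names and the toy arithmetic are hypothetical stand-ins for a fused projection and two decoupled projections. It shows a variable bound on only one branch and then read unconditionally, followed by a version that consumes it inside the branch that defines it.

```python
def forward_buggy(hidden, decoupled):
    if decoupled:
        qkvz = [2.0 * x for x in hidden]
        ba = [x + 1.0 for x in hidden]
    else:
        fused = [(2.0 * x, x + 1.0) for x in hidden]
    # Bug: `fused` is bound only on the else path, so this line raises
    # UnboundLocalError (a subclass of NameError) when decoupled is True.
    qkvz = [pair[0] for pair in fused]
    ba = [pair[1] for pair in fused]
    return qkvz, ba


def forward_fixed(hidden, decoupled):
    if decoupled:
        qkvz = [2.0 * x for x in hidden]
        ba = [x + 1.0 for x in hidden]
    else:
        fused = [(2.0 * x, x + 1.0) for x in hidden]
        # Split the fused result in the same branch that defines it.
        qkvz = [pair[0] for pair in fused]
        ba = [pair[1] for pair in fused]
    return qkvz, ba
```

Both branches of `forward_fixed` bind the same names before the shared code reads them, which is the property the review asks for.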

Comment thread src/mcore_bridge/bridge/gpt_bridge.py Outdated
Comment thread src/mcore_bridge/model/modules/gated_delta_net.py
@Jintao-Huang
Collaborator Author

/gemini review

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces support for decoupled input projections in the GatedDeltaNet architecture, specifically for Qwen 3.5 models, including configuration updates and weight conversion logic for LoRA and FP8. Review feedback highlights a potential ImportError due to a top-level import of the optional transformer_engine library and a possible regression in word_embeddings export logic that may affect various models. Additionally, suggestions were provided to replace hardcoded CUDA device references with portable device selection logic to support non-GPU environments.
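Two of the points in this review, the top-level import of an optional library and the hardcoded CUDA device, can be addressed with patterns like the following. This is a hedged sketch rather than the fix that was merged: `has_module`, `pick_device`, and `load_transformer_engine` are hypothetical helper names, and the sketch assumes the optional packages are importable as `torch` and `transformer_engine`.

```python
import importlib.util


def has_module(name):
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(name) is not None


def pick_device():
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    if not has_module("torch"):
        return "cpu"
    import torch  # imported lazily so this file loads without PyTorch

    return "cuda" if torch.cuda.is_available() else "cpu"


def load_transformer_engine():
    """Import the optional dependency only when a caller actually needs it."""
    if not has_module("transformer_engine"):
        raise ImportError(
            "transformer_engine is required for this code path but is not installed."
        )
    import transformer_engine.pytorch as te

    return te
```

Deferring the import into a function keeps module import safe on machines without the optional package, and routing device choice through one helper avoids scattering "cuda" literals across the conversion code.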

Comment thread src/mcore_bridge/model/modules/gated_delta_net.py
Comment thread src/mcore_bridge/bridge/gpt_bridge.py
Comment thread src/mcore_bridge/bridge/gpt_bridge.py
Comment thread src/mcore_bridge/bridge/gpt_bridge.py
@Jintao-Huang Jintao-Huang merged commit 4a8d965 into modelscope:main Apr 15, 2026
1 check passed

3 participants