[bugfix] fix qwen3.5 gpt_bridge lora #28
Conversation
Code Review
This pull request updates the weight loading and exporting logic in gpt_bridge.py to support interleaved layouts for linear attention components, specifically affecting QKV, Z, B, and A projections. The changes include refactoring dimension calculations and updating tensor reshaping and splitting logic. Feedback identifies a critical issue where FP8 scale factors (in_scale_inv) are not interleaved, which will lead to incorrect scaling in Megatron-Core. Additionally, the reviewer suggests adding safeguards against ZeroDivisionError when num_key_heads is zero and handling edge cases in tensor slicing when a_dim is zero to prevent incorrect weight exports.
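A minimal sketch of the interleaving concern raised above: when per-head blocks of a packed projection are reordered into an interleaved layout, the per-row FP8 scale factors must be reordered with them, and a zero head count should be rejected before any division. All names (`interleave_linear_attn`, the K/V split) are illustrative, not the actual `gpt_bridge.py` API.

```python
import numpy as np

def interleave_linear_attn(weight, scale_inv, num_key_heads, key_dim, value_dim):
    """Hypothetical sketch: reorder [all-K, all-V] rows into per-head
    interleaved [K_h0, V_h0, K_h1, V_h1, ...] blocks, applying the SAME
    reordering to the per-row FP8 scale factors (scale_inv)."""
    if num_key_heads == 0:
        # Guard against ZeroDivisionError for models with no key heads.
        raise ValueError("num_key_heads must be > 0")
    k_per = key_dim // num_key_heads
    v_per = value_dim // num_key_heads
    # Split the packed projection into K and V, then group rows by head.
    k = weight[:key_dim].reshape(num_key_heads, k_per, -1)
    v = weight[key_dim:key_dim + value_dim].reshape(num_key_heads, v_per, -1)
    out = np.concatenate([k, v], axis=1).reshape(key_dim + value_dim, -1)
    # Reorder the scale factors identically; leaving them in the original
    # order would scale the wrong rows after interleaving.
    sk = scale_inv[:key_dim].reshape(num_key_heads, k_per)
    sv = scale_inv[key_dim:key_dim + value_dim].reshape(num_key_heads, v_per)
    s_out = np.concatenate([sk, sv], axis=1).reshape(-1)
    return out, s_out

# Tiny demo: 2 heads, 1 K row and 1 V row per head.
w = np.arange(12.0).reshape(4, 3)        # rows: k0, k1, v0, v1
s = np.array([1.0, 2.0, 3.0, 4.0])       # one scale per row
w_i, s_i = interleave_linear_attn(w, s, num_key_heads=2, key_dim=2, value_dim=2)
# rows become k0, v0, k1, v1 and scales follow: [1, 3, 2, 4]
```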
/gemini review
Code Review
This pull request refactors the weight conversion logic in gpt_bridge.py to support an interleaved layout for linear attention, specifically adjusting how key and value dimensions are handled across multiple heads for both standard and LoRA weights. The review identified a critical NameError where in_scale_inv is referenced at line 1271 despite its definition being removed in the same patch. Additionally, the reviewer noted that scale_inv has become an unused variable following the removal of FP8 scale export logic and suggested using an underscore to ignore the return value.
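The unused-variable suggestion above follows the common Python idiom of binding an ignored return value to an underscore. The helper below is purely illustrative (the real function and its return shape live in `gpt_bridge.py`):

```python
def split_weight_and_scale(packed):
    """Hypothetical helper returning (weight_rows, scale_inv)."""
    return packed[:-1], packed[-1]

# If the FP8 scale export logic is removed, bind the now-unused
# scale_inv to `_` instead of a named variable:
weight, _ = split_weight_and_scale([1.0, 2.0, 0.5])
```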
/gemini review
Code Review
This pull request refactors the weight loading and export logic in _set_linear_attn_state by adjusting the calculation of key_dim and value_dim and updating tensor splitting and reshaping operations to account for the number of key heads. However, the changes remove the handling of scale inversion factors (hf_scale_inv and scale_inv), which are essential for maintaining FP8 quantization support. It is recommended to restore this logic to ensure compatibility with quantized models during both weight loading and export.
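Why dropping the scale inversion factors breaks FP8 support, in one sketch: a quantized weight is only meaningful alongside its `scale_inv`, since dequantization multiplies the low-precision values back by that factor. The function name and int8 stand-in for FP8 are assumptions for illustration:

```python
import numpy as np

def dequantize_fp8(weight_q, scale_inv):
    """Hedged sketch: recover high-precision weights from a quantized tensor.
    If scale_inv is dropped during load/export, this step is impossible and
    the exported weights are silently wrong."""
    return weight_q.astype(np.float32) * scale_inv

# int8 stands in for FP8 here; the scale factor is per-tensor for simplicity.
w_q = np.array([2, 4, -6], dtype=np.int8)
w_hp = dequantize_fp8(w_q, scale_inv=0.5)   # [1.0, 2.0, -3.0]
```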
No description provided.