'Linear' object has no attribute 'ds_grads_remaining'

### Checklist

- [x] I have searched existing issues, and this is a new bug report.

### Bug Description

When training Qwen3-VL, ZeRO-1 works fine, but ZeRO-3 fails with: `AttributeError: 'Linear' object has no attribute 'ds_grads_remaining'`

### How to Reproduce

Docker image: modelscope-registry.cn-hangzhou.cr.aliyuncs.com/modelscope-repo/modelscope:ubuntu22.04-cuda12.8.1-py311-torch2.9.0-vllm0.13.0-modelscope1.33.0-swift3.12.3

I used the ms-swift Docker image as the environment directly and did not change any package with pip install.
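The image tag only pins torch, vllm, modelscope, and swift, so the versions of other packages are not visible from it. Below is a minimal sketch for listing them from inside the container. The package list is an assumption about what might be relevant to this error, not something confirmed as the cause, and `collect_versions` is just an illustrative helper name.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed list of packages that may matter for this error; adjust as needed.
PACKAGES = ["ms-swift", "deepspeed", "torch", "transformers", "peft", "accelerate"]


def collect_versions(packages=PACKAGES):
    """Return a dict mapping each package name to its installed version.

    Packages that are not installed map to the string "not installed".
    """
    report = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    # Print one "name: version" line per package for pasting into the issue.
    for name, ver in collect_versions().items():
        print(f"{name}: {ver}")
```

Running this in the unmodified container would give the exact versions that ship with the image.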

The training command is as follows:

```
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
NPROC_PER_NODE=8 \
FPS_MAX_FRAMES=160 \
FPS=4 \
    swift sft \
    --model /yke/models/Qwen3-VL-32B-Instruct \
    --train_type lora \
    --dataset train_with_prompt.jsonl \
    --torch_dtype bfloat16 \
    --dataset_num_proc 8 \
    --dataloader_num_workers 4 \
    --dataset_shuffle true \
    --split_dataset_ratio 0.1 \
    --seed 42 \
    --num_train_epochs 3 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 1 \
    --warmup_ratio 0.1 \
    --max_grad_norm 1 \
    --weight_decay 0.01 \
    --lr_scheduler_type cosine \
    --learning_rate 1e-4 \
    --target_modules all-linear \
    --lora_rank 128 \
    --lora_alpha 256 \
    --lora_dropout 0.0 \
    --eval_steps 100 \
    --save_steps 100 \
    --save_total_limit 5 \
    --logging_steps 5 \
    --max_length 40960 \
    --output_dir output \
    --attn_impl flash_attn \
    --use_liger_kernel false \
    --gradient_checkpointing true \
    --gradient_checkpointing_kwargs '{"use_reentrant": false}' \
    --num_labels 1 \
    --problem_type regression \
    --task_type seq_cls \
    --use_chat_template true \
    --deepspeed zero3
```

### Additional Information

_No response_
