'Linear' object has no attribute 'ds_grads_remaining' #8037

@vibe-viscot

Checklist

  • I have searched existing issues, and this is a new bug report.

Bug Description

When training Qwen3-VL, DeepSpeed zero1 works normally, but zero3 fails with: AttributeError: 'Linear' object has no attribute 'ds_grads_remaining'

How to Reproduce

Docker: modelscope-registry.cn-hangzhou.cr.aliyuncs.com/modelscope-repo/modelscope:ubuntu22.04-cuda12.8.1-py311-torch2.9.0-vllm0.13.0-modelscope1.33.0-swift3.12.3

I used the ms-swift Docker image directly as the environment and did not change any packages with pip install.
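
For anyone trying to reproduce this, it may help to confirm the exact package versions inside the container. The snippet below is only a sketch for collecting them: the package list is my assumption about what is relevant here (it is not taken from the image's documentation), and anything missing is reported as "not installed" rather than raising.

```python
from importlib import metadata

# Assumed list of packages relevant to this report; adjust as needed.
PACKAGES = ["torch", "deepspeed", "transformers", "peft", "accelerate", "ms-swift"]


def installed_versions(names):
    """Return a dict mapping each package name to its installed version,
    or to the string "not installed" if the package cannot be found."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    # Print one "name==version" line per package, ready to paste into the issue.
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name}=={version}")
```

Running it inside the container and pasting the output here should make it easier to compare against other environments where zero3 works.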

The training command is as follows:

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
NPROC_PER_NODE=8 \
FPS_MAX_FRAMES=160 \
FPS=4 \
    swift sft \
    --model /yke/models/Qwen3-VL-32B-Instruct \
    --train_type lora \
    --dataset train_with_prompt.jsonl \
    --torch_dtype bfloat16 \
    --dataset_num_proc 8 \
    --dataloader_num_workers 4 \
    --dataset_shuffle true \
    --split_dataset_ratio 0.1 \
    --seed 42 \
    --num_train_epochs 3 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 1 \
    --warmup_ratio 0.1 \
    --max_grad_norm 1 \
    --weight_decay 0.01 \
    --lr_scheduler_type cosine \
    --learning_rate 1e-4 \
    --target_modules all-linear \
    --lora_rank 128 \
    --lora_alpha 256 \
    --lora_dropout 0.0 \
    --eval_steps 100 \
    --save_steps 100 \
    --save_total_limit 5 \
    --logging_steps 5 \
    --max_length 40960 \
    --output_dir output \
    --attn_impl flash_attn \
    --use_liger_kernel false \
    --gradient_checkpointing true \
    --gradient_checkpointing_kwargs '{"use_reentrant": false}' \
    --num_labels 1 \
    --problem_type regression \
    --task_type seq_cls \
    --use_chat_template true \
    --deepspeed zero3

Additional Information

No response

Labels: bug (Something isn't working)
