Checklist
- I have searched existing issues, and this is a new bug report.
Bug Description
GRPO training with the Megatron backend runs to completion and the merged safetensors checkpoint is saved, but the job then segfaults during process exit, immediately after the NCCL `destroy_process_group()` warnings:
```
[2026-02-13 23:08:04] iteration 2/2 | consumed samples: 128 | elapsed time per iteration (ms): 36552.1 | memory(GiB): 52.07 | elapsed time: 3m 14s | remaining time: 0s | learning rate: 1.000000E-05 | global batch size: 64 | loss: 0.000000E+00 | reward: -6.945313E-01 | reward_std: 3.756505E-02 | frac_reward_zero_std: 9.375000E-01 | rewards/Reward/mean: -6.945312E-01 | rewards/Reward/std: 6.552404E-01 | clip_ratio/low_mean: 0.000000E+00 | clip_ratio/high_mean: 0.000000E+00 | clip_ratio/region_mean: 0.000000E+00 | completions/mean_length: 7.893281E+02 | completions/max_length: 1.117562E+03 | completions/min_length: 4.672500E+02 | clip_ratio/low_min: 0.000000E+00 | clip_ratio/high_max: 0.000000E+00 | load_balancing_loss: 1.735384E+00 | loss scale: 1.0 | grad norm: 0.064 | number of skipped iterations: 0 | number of nan iterations: 0 |
[after training is done] datetime: 2026-02-13 23:08:04
saving checkpoint at iteration 2 to qwen3_vl_30b_a3b_instruct_grpo_v3/v8-20260213-225959/checkpoint-2 in torch_dist format
Storing distributed optimizer sharded state of type fully_sharded_model_space
  successfully saved checkpoint from iteration 2 to qwen3_vl_30b_a3b_instruct_grpo_v3/v8-20260213-225959/checkpoint-2 [ t 1/4, p 1/1 ]
[INFO:swift] Successfully saved safetensors model weights in `qwen3_vl_30b_a3b_instruct_grpo_v3/v8-20260213-225959/checkpoint-2-merged`.
[INFO:swift] End time of running main: 2026-02-13 23:13:54.162693
[rank4]:[W213 23:14:00.507426805 ProcessGroupNCCL.cpp:1538] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
[rank5]:[W213 23:14:00.565532420 ProcessGroupNCCL.cpp:1538] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
[rank2]:[W213 23:14:00.114979707 ProcessGroupNCCL.cpp:1538] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
[rank0]:[W213 23:14:00.177886723 ProcessGroupNCCL.cpp:1538] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
[rank3]:[W213 23:14:01.336360975 ProcessGroupNCCL.cpp:1538] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
[rank1]:[W213 23:14:01.356616125 ProcessGroupNCCL.cpp:1538] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
!!!!!!! Segfault encountered !!!!!!!
!!!!!!! Segfault encountered !!!!!!!
```
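The warning itself points at the PyTorch shutdown docs it links: each rank should call `torch.distributed.destroy_process_group()` before exiting. As a point of reference only (a generic sketch of the documented pattern, not ms-swift's actual teardown code):

```python
import atexit

import torch.distributed as dist


def _cleanup_process_group():
    # Tear down NCCL communicators explicitly, as the linked shutdown docs
    # recommend, instead of relying on interpreter exit to do it.
    if dist.is_initialized():
        dist.destroy_process_group()


# atexit covers early returns and sys.exit() paths in the training script.
atexit.register(_cleanup_process_group)
```

Note the segfault is printed after `[INFO:swift] End time of running main`, so the crash appears to happen during interpreter/NCCL teardown rather than in the training loop itself.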
How to Reproduce
```bash
#!/bin/bash
MEGATRON_LM_PATH= \
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
NPROC_PER_NODE=8 \
PYTORCH_CUDA_ALLOC_CONF='expandable_segments:True' \
megatron rlhf \
    --rlhf_type grpo \
    --load "${LOAD_PATH}" \
    --dataset "${DATASET_PATH}" \
    --save "${SAVE_PATH}" \
    --load_safetensors false \
    --save_safetensors true \
    --merge_lora true \
    --split_dataset_ratio 0 \
    --moe_permute_fusion true \
    --tensor_model_parallel_size 4 \
    --expert_tensor_parallel_size 1 \
    --expert_model_parallel_size 4 \
    --moe_grouped_gemm true \
    --moe_shared_expert_overlap true \
    --moe_aux_loss_coeff 1e-3 \
    --max_epochs 1 \
    --global_batch_size 64 \
    --micro_batch_size 2 \
    --steps_per_generation 2 \
    --num_generations 8 \
    --external_plugins "reward_func_v3.py" \
    --reward_funcs external_me \
    --use_vllm true \
    --vllm_mode colocate \
    --vllm_gpu_memory_utilization 0.3 \
    --vllm_tensor_parallel_size 4 \
    --vllm_max_model_len 16384 \
    --max_length 8192 \
    --max_completion_length 8192 \
    --train_type lora \
    --lora_rank 128 \
    --lora_alpha 256 \
    --target_modules all-linear \
    --freeze_vit true \
    --lr 5e-5 \
    --lr_warmup_fraction 0.05 \
    --min_lr 1e-5 \
    --bf16 true \
    --save_interval 200 \
    --beta 0.00 \
    --importance_sampling_level sequence \
    --epsilon 3e-4 \
    --epsilon_high 4e-4 \
    --dynamic_sample false \
    --overlong_filter true \
    --loss_type grpo \
    --sleep_level 2 \
    --offload_model true \
    --offload_bridge false \
    --offload_optimizer true \
    --log_interval 1 \
    --recompute_granularity full \
    --recompute_method uniform \
    --recompute_num_layers 1 \
    --finetune true \
    --num_workers 8 \
    --dataset_num_proc 8 \
    --no_save_optim true \
    --no_save_rng true \
    --attention_backend flash \
    --temperature 1.0 \
    --padding_free true \
    --sequence_parallel true \
    --log_completions true \
    --tensorboard_dir "${LOG_PATH}/tensorboard" \
    2>&1 | tee "${LOG_PATH}/training_$(date +%Y%m%d_%H%M%S).log"
```
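For context, `--reward_funcs external_me` resolves to a reward registered by the `--external_plugins` file `reward_func_v3.py`, which is not included in this report. A minimal placeholder in the shape of ms-swift's GRPO plugin examples (the constant score is purely illustrative, not the real reward logic):

```python
# reward_func_v3.py -- placeholder only; the actual reward logic is not
# part of this report.
from swift.plugin import ORM, orms


class MyReward(ORM):
    def __call__(self, completions, **kwargs):
        # Return one float per completion; 0.0 stands in for real scoring.
        return [0.0 for _ in completions]


# The registration key must match the name passed to --reward_funcs.
orms['external_me'] = MyReward
```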
Additional Information
No response