
Two-stage LoRA training (SFT then DPO): how to train the same adapter #8047


@wellhowtosay

Checklist

  • I have searched existing issues, and this is a new question or discussion topic.

Question Description

Resuming the SFT LoRA checkpoint on top of the base model:

    swift rlhf \
        --rlhf_type dpo \
        --train_type lora \
        --model /cpfs_fundata/baolujia.blj/models/Qwen3-8B \
        --resume_from_checkpoint v0-20260212-195919/checkpoint-4578 \
        --resume_only_model \
        --ignore_data_skip \

versus attaching a new LoRA directly to the merged model:
    swift rlhf \
        --rlhf_type dpo \
        --train_type lora \
        --model v0-20260212-195919/checkpoint-4578-merged \
        --resume_only_model \
        --ignore_data_skip \

The training loss is completely different between the two runs.
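
For reference, below is a minimal sketch of what the two launch commands set up at the model level, written with plain transformers/peft rather than the ms-swift code path. The paths are copied from the commands above; the target_modules choice and the assumption that the DPO reference policy comes from the adapter-disabled model (as TRL's DPOTrainer does when no explicit reference model is passed) are illustrative assumptions, not something stated in this issue.

    # Illustrative sketch only -- plain transformers/peft, not the ms-swift internals.
    # Paths come from the two commands above; target_modules is an arbitrary choice
    # for the example (ms-swift applies its own LoRA defaults).
    import torch
    from transformers import AutoModelForCausalLM
    from peft import PeftModel, LoraConfig, get_peft_model

    BASE = "/cpfs_fundata/baolujia.blj/models/Qwen3-8B"
    SFT_ADAPTER = "v0-20260212-195919/checkpoint-4578"        # LoRA weights from the SFT stage
    SFT_MERGED = "v0-20260212-195919/checkpoint-4578-merged"  # base weights with the adapter merged in

    # Setup 1: --model <base> plus --resume_from_checkpoint <SFT adapter>.
    # Frozen weights = original Qwen3-8B; the entire SFT delta lives in the trainable adapter.
    policy_resumed = PeftModel.from_pretrained(
        AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.bfloat16),
        SFT_ADAPTER,
        is_trainable=True,
    )

    # Setup 2: --model <merged checkpoint>, with a brand-new LoRA created on top.
    # Frozen weights already contain the SFT delta; the fresh adapter starts as a zero
    # update, so the trainable policy begins numerically close to setup 1.
    policy_fresh = get_peft_model(
        AutoModelForCausalLM.from_pretrained(SFT_MERGED, torch_dtype=torch.bfloat16),
        LoraConfig(task_type="CAUSAL_LM", target_modules=["q_proj", "v_proj"]),
    )

    # If the DPO reference policy is obtained by disabling the adapter (TRL's DPOTrainer
    # does this when no explicit ref model is passed), the two setups use different
    # reference models, so their reference log-probs -- and hence the DPO loss -- differ.
    with policy_resumed.disable_adapter():
        pass  # reference here = raw Qwen3-8B, before SFT
    with policy_fresh.disable_adapter():
        pass  # reference here = the SFT-merged model

Under these assumptions the two trainable policies start out nearly identical (the fresh adapter is a zero update, up to merge rounding), so an immediate loss mismatch would point first at the reference policy rather than at the trainable weights.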
