
Two-stage LoRA training (SFT then DPO): how to train the same adapter #8047


@wellhowtosay

Checklist

  • I have searched existing issues, and this is a new question or discussion topic.

Question Description

Resuming the SFT LoRA checkpoint on top of the base model:

    swift rlhf \
        --rlhf_type dpo \
        --train_type lora \
        --model /cpfs_fundata/baolujia.blj/models/Qwen3-8B \
        --resume_from_checkpoint v0-20260212-195919/checkpoint-4578 \
        --resume_only_model \
        --ignore_data_skip \

versus attaching a new LoRA directly to the merged model:
    swift rlhf \
        --rlhf_type dpo \
        --train_type lora \
        --model v0-20260212-195919/checkpoint-4578-merged \
        --resume_only_model \
        --ignore_data_skip \

The training loss is completely different between the two runs.
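
For reference, below is a minimal sketch of what the two launch commands set up at the model level, written with plain transformers/peft rather than the ms-swift code path. The paths are copied from the commands above; the target_modules choice and the assumption that the DPO reference policy comes from the adapter-disabled model (as TRL's DPOTrainer does when no explicit reference model is passed) are illustrative assumptions, not something stated in this issue.

    # Illustrative sketch only -- plain transformers/peft, not the ms-swift internals.
    # Paths come from the two commands above; target_modules is an arbitrary choice
    # for the example (ms-swift applies its own LoRA defaults).
    import torch
    from transformers import AutoModelForCausalLM
    from peft import PeftModel, LoraConfig, get_peft_model

    BASE = "/cpfs_fundata/baolujia.blj/models/Qwen3-8B"
    SFT_ADAPTER = "v0-20260212-195919/checkpoint-4578"        # LoRA weights from the SFT stage
    SFT_MERGED = "v0-20260212-195919/checkpoint-4578-merged"  # base weights with the adapter merged in

    # Setup 1: --model <base> plus --resume_from_checkpoint <SFT adapter>.
    # Frozen weights = original Qwen3-8B; the entire SFT delta lives in the trainable adapter.
    policy_resumed = PeftModel.from_pretrained(
        AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.bfloat16),
        SFT_ADAPTER,
        is_trainable=True,
    )

    # Setup 2: --model <merged checkpoint>, with a brand-new LoRA created on top.
    # Frozen weights already contain the SFT delta; the fresh adapter starts as a zero
    # update, so the trainable policy begins numerically close to setup 1.
    policy_fresh = get_peft_model(
        AutoModelForCausalLM.from_pretrained(SFT_MERGED, torch_dtype=torch.bfloat16),
        LoraConfig(task_type="CAUSAL_LM", target_modules=["q_proj", "v_proj"]),
    )

    # If the DPO reference policy is obtained by disabling the adapter (TRL's DPOTrainer
    # does this when no explicit ref model is passed), the two setups use different
    # reference models, so their reference log-probs -- and hence the DPO loss -- differ.
    with policy_resumed.disable_adapter():
        pass  # reference here = raw Qwen3-8B, before SFT
    with policy_fresh.disable_adapter():
        pass  # reference here = the SFT-merged model

Under these assumptions the two trainable policies start out nearly identical (the fresh adapter is a zero update, up to merge rounding), so an immediate loss mismatch would point first at the reference policy rather than at the trainable weights.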
