I see some discrepancies between the model training script and the training details in the paper. It would be really helpful if someone from the team could clarify these:
- The script uses `qkv_proj,o_proj,gate_up_proj,down_proj,k_proj,q_proj,out_proj,v_proj` as the target LoRA modules, but Qwen2 does not have `qkv_proj` or `gate_up_proj`; instead it has separate `q_proj`, `k_proj`, `v_proj`, `gate_proj`, and `up_proj` modules. Is this a typo? What exact modules were trained with LoRA? I wish to reproduce the result and I am running into some issues with it.
- The script sets the LoRA scaling $\alpha$ to 64 (the default), but the paper mentions it as 32.
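For reference, this is a sketch of the LoRA target configuration I am assuming while trying to reproduce, using Qwen2's unfused projection names (the rank value here is my own guess, not from the paper):

```python
# Assumed LoRA targets for Qwen2: attention uses separate q/k/v projections
# (no fused qkv_proj), and the MLP uses gate_proj/up_proj (no fused gate_up_proj).
QWEN2_LORA_TARGETS = [
    "q_proj", "k_proj", "v_proj", "o_proj",   # attention projections
    "gate_proj", "up_proj", "down_proj",      # MLP projections
]

lora_config = {
    "target_modules": QWEN2_LORA_TARGETS,
    "r": 16,            # rank: hypothetical, not stated in the script or paper
    "lora_alpha": 32,   # paper's value; the script default is 64
}
```

Please let me know if these module names or the $\alpha$ value differ from what was actually used in training.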