[feat] Support ProFit: Extend DFT with Probability Threshold-based Token Filtering #7921
maybefunctionname wants to merge 5 commits into modelscope:main
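- Implement hard gating for the [DFT](https://arxiv.org/abs/2508.05629) loss in SFT training
- Add the environment variable HARD_GATING_PROBABILITY_THRESHOLD to set the probability threshold
- Implement a token-probability-based hard gating mechanism: low-probability tokens are excluded from the loss computation
- Update the documentation to explain the difference between DFT soft gating and ProFit hard gating
- Add probability threshold validation and error handling to the trainer
- Provide an example script showing how to configure hard gating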
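The threshold validation and error handling described in this commit can be sketched as follows. This is a minimal illustration, not the PR's code: the helper name `read_hard_gating_threshold` is hypothetical, and both treating an unset or empty variable as "use DFT soft gating" and requiring 0 < τ < 1 are assumptions.

```python
import os
from typing import Optional

ENV_VAR = 'HARD_GATING_PROBABILITY_THRESHOLD'


def read_hard_gating_threshold() -> Optional[float]:
    """Return the ProFit threshold, or None to keep DFT soft gating.

    Hypothetical helper. Assumptions: an unset or empty variable means
    "soft gating", and a valid threshold lies strictly between 0 and 1.
    """
    raw = os.environ.get(ENV_VAR, '').strip()
    if not raw:
        return None  # unset -> DFT soft gating
    try:
        threshold = float(raw)
    except ValueError as exc:
        raise ValueError(f'{ENV_VAR} must be a number, got {raw!r}') from exc
    # Rejects 0, 1, out-of-range values, and NaN (every comparison with NaN is False).
    if not 0.0 < threshold < 1.0:
        raise ValueError(f'{ENV_VAR} must be in (0, 1), got {threshold}')
    return threshold


# Usage: setting the variable switches the trainer to ProFit hard gating.
os.environ[ENV_VAR] = '0.1'
print(read_hard_gating_threshold())
```

Returning None instead of a sentinel such as 0.0 keeps the two modes unambiguous for whichever gating helper consumes the value.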
Summary of Changes (Gemini Code Assist)
This pull request extends the Dynamic Fine-Tuning (DFT) framework with ProFit's hard gating mechanism. Masking out low-probability tokens makes supervised fine-tuning more targeted, which can improve model generalization and help prevent overfitting to less significant tokens. Users can activate ProFit's hard gating through an environment variable while keeping DFT's original soft gating as an option.
Code Review
This pull request introduces support for ProFit, a hard-gating mechanism for token filtering, as an extension to the existing DFT loss. The implementation correctly uses an environment variable to switch between soft and hard gating. The documentation and examples have also been updated accordingly.
My main feedback concerns code duplication: the logic for applying the gating mechanism is repeated in three places. I've left comments suggesting a refactor into a single helper function to improve code quality and maintainability.
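- Extract the gating-factor computation in the DFT loss into a get_dft_gating_factor function
- Remove the duplicated gating-factor computation from trainer.py
- Handle the soft gating (DFT) and hard gating (ProFit) modes uniformly
- Keep support for the existing environment variable HARD_GATING_PROBABILITY_THRESHOLD
- Reuse the same gating-factor logic across multiple loss functions
- Simplify the code structure and improve maintainability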
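The refactor in this commit can be sketched as one gate function that every loss variant calls. This is an illustration under stated assumptions, not the PR's `get_dft_gating_factor`: `gating_factor`, `token_mean_loss`, and `sequence_sum_losses` are hypothetical names, the code works on plain per-token probabilities instead of tensors, and averaging over all supervised tokens (masked ones included in the denominator) is an assumed normalization.

```python
import math
from typing import List, Optional


def gating_factor(p: float, threshold: Optional[float]) -> float:
    """Per-token weight shared by every loss variant (hypothetical helper).

    threshold is None -> DFT soft gating: weight the loss by p itself.
        In an autograd framework this weight would be detached
        (stop-gradient), as in the DFT paper.
    threshold set     -> ProFit hard gating: keep tokens with p >= threshold,
        mask tokens with p < threshold.
    """
    if threshold is None:
        return p
    return 1.0 if p >= threshold else 0.0


def token_mean_loss(probs: List[float], threshold: Optional[float]) -> float:
    """Gated NLL averaged over all supervised tokens (assumed normalization)."""
    if not probs:
        return 0.0
    gated = [gating_factor(p, threshold) * -math.log(p) for p in probs]
    return sum(gated) / len(gated)


def sequence_sum_losses(batch: List[List[float]],
                        threshold: Optional[float]) -> List[float]:
    """Gated NLL summed per sequence, reusing the same gate."""
    return [sum(gating_factor(p, threshold) * -math.log(p) for p in seq)
            for seq in batch]


# Usage: both losses switch modes through the same helper.
probs = [0.9, 0.5, 0.05]  # made-up target-token probabilities
soft = token_mean_loss(probs, threshold=None)
hard = token_mean_loss(probs, threshold=0.1)
print(soft, hard)
```

Keeping the branch in one place means a later change to the masking rule (for example, the p >= threshold comparison) only has to be made once.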
- Fix the import path of get_dft_gating_factor
- Switch from a relative import to an absolute import
- Ensure consistent module references
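- Remove the unused os import from trainer.py
- Reorder the module imports in trainer.py for readability
- Use a single-quoted string for the hard-gating probability threshold environment variable in utils.py
- Reorganize the import statements in trainer.py to match the code style
- Fix the string quoting in the environment variable lookup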
PR type
Overview
This PR extends the existing DFT (Dynamic Fine-Tuning) implementation to support the hard gating mechanism proposed in ProFit, which directly masks low-value tokens via a probability threshold to improve model generalization on reasoning tasks.
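Written per supervised token t with target-token probability p_t and ProFit threshold τ, the two objectives differ only in the per-token weight. This is a sketch: averaging over the set T of supervised tokens is an assumed normalization, and sg(·) denotes the stop-gradient that DFT applies to its weight.

$$
\mathcal{L}_{\mathrm{DFT}} = -\frac{1}{|T|} \sum_{t \in T} \operatorname{sg}(p_t)\,\log p_t
$$

$$
\mathcal{L}_{\mathrm{ProFit}} = -\frac{1}{|T|} \sum_{t \in T} \mathbb{1}[p_t \ge \tau]\,\log p_t
$$

For example, with τ = 0.1 and natural logs, a token with p_t = 0.05 has an unweighted loss of -log 0.05 ≈ 3.00: DFT scales it down to about 0.15, while ProFit drops it entirely. A token with p_t = 0.5 contributes about 0.35 under DFT but its full ≈ 0.69 under ProFit.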
Key Improvements
DFT vs ProFit - Core Differences:
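- DFT (soft gating): weights each token's loss by its probability p (loss *= p); all tokens still contribute to gradients.
- ProFit (hard gating): uses a threshold τ to directly mask tokens with p < τ (loss *= mask), retaining only high-probability core tokens.
Advantages of ProFit: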
Implementation Details
To keep the code simple, this implementation reuses the enable_dft_loss parameter and switches modes via the environment variable HARD_GATING_PROBABILITY_THRESHOLD:
- Not set: DFT's original soft gating is used.
- Set to a threshold (e.g. 0.1): enables ProFit hard gating, masking tokens whose probability falls below the threshold.
Usage Example
References