- [Jan 12, 2026] 💻 We have implemented the EAFT loss in the ms-swift-EAFT project!
- [Jan 08, 2026] 🔥 We are honored to be featured as 🤗 HuggingFace Daily Paper #1.
- [Jan 07, 2026] 📄 Our paper is now available on arXiv and as a Hugging Face Daily Paper.
- [Jan 06, 2026] ✨ Integration: EAFT has been merged into LLaMA-Factory! You can now enable EAFT via the `use_eaft_loss` parameter.
- [Jan 06, 2026] 🚀 Code Release: Training code and scripts are now available! Please ⭐ Star this repo to stay tuned!
Supervised Fine-Tuning (SFT) is the standard paradigm for domain adaptation, yet it frequently incurs the cost of 🎯 catastrophic forgetting. In sharp contrast, on-policy Reinforcement Learning (RL) effectively preserves general capabilities.
We investigate this discrepancy and identify a fundamental distributional gap: while RL aligns with the model's internal belief, SFT forces the model to fit external supervision. This mismatch often manifests as "💥 Confident Conflicts": tokens where the ground-truth label receives low probability even though the model's predictive entropy is low. In these instances, the model is highly confident in its own prediction but is forced to learn a divergent ground truth, triggering destructive gradient updates.
To address this, we propose 🚀 Entropy-Adaptive Fine-Tuning (EAFT). Unlike methods relying solely on prediction probability, EAFT utilizes token-level entropy as a gating mechanism to distinguish between epistemic uncertainty and knowledge conflict. This allows the model to learn from uncertain samples while suppressing gradients on conflicting data.
🔬 Key Findings:
- Extensive experiments on Qwen and GLM series (4B-32B parameters) across mathematical, medical, and agentic domains
- EAFT consistently matches the downstream performance of standard SFT
- Significantly mitigates the degradation of general capabilities
- "The right balance between learning and forgetting"
Figure 1: (a) Conceptual illustration of Confident Conflicts. (b) Token-level entropy-probability landscape comparison between SFT and On-Policy Rollouts.
We identify "Confident Conflicts" (Low Probability, Low Entropy) as the primary driver of catastrophic forgetting. These occur when:
- The model is confident about its prediction (low entropy)
- But the prediction conflicts with ground truth (low probability)
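To make this concrete, here is a minimal, hypothetical sketch of how such tokens could be flagged from model logits. The function name `flag_confident_conflicts` and the thresholds `tau_p` / `tau_h` are illustrative only and are not taken from the paper.

```python
import torch
import torch.nn.functional as F

def flag_confident_conflicts(logits, labels, tau_p=0.1, tau_h=0.5):
    """Flag tokens that are low-probability AND low-entropy (illustrative thresholds).

    logits: [batch, seq, vocab] next-token logits; labels: [batch, seq] ground-truth ids.
    """
    log_probs = F.log_softmax(logits, dim=-1)                        # [B, T, V]
    # Probability the model assigns to the ground-truth token.
    target_p = log_probs.gather(-1, labels.unsqueeze(-1)).squeeze(-1).exp()
    # Predictive entropy H_t = -sum_v p_v log p_v of the model's own distribution.
    entropy = -(log_probs.exp() * log_probs).sum(dim=-1)             # [B, T]
    # "Confident conflict": a confident prediction (low entropy) that disagrees
    # with the ground truth (low probability on the label).
    return (target_p < tau_p) & (entropy < tau_h)
```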
EAFT introduces an entropy-based gating mechanism to the standard Cross-Entropy loss:

$$\mathcal{L}_{\text{EAFT}} = -\sum_{t} w_t \, \log p_\theta(y_t \mid y_{<t}, x)$$

where the weight $w_t$ is computed from the token-level predictive entropy $H_t$, so that it:

- ⚠️ Suppresses Conflicts: Down-weights gradients when the model is stubborn (Low Entropy), preventing destructive updates
- ✨ Encourages Learning: Maintains high weights when the model is uncertain/exploring (High Entropy)
Unlike methods that rely solely on prediction probability, EAFT utilizes token-level entropy to distinguish between:
- Epistemic uncertainty: Model doesn't know → Strong learning signal
- Knowledge conflict: Model is confident but wrong → Suppressed updates
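As a concrete illustration of this gating idea, below is a minimal sketch of an entropy-gated cross-entropy loss. It assumes the gate is the predictive entropy normalized by $\log |V|$ and detached from the computation graph; the exact gating function used by EAFT is defined in the paper and may differ, and the name `eaft_loss` is only for this example.

```python
import torch
import torch.nn.functional as F

def eaft_loss(logits, labels, ignore_index=-100):
    """Entropy-gated cross-entropy (sketch). logits: [B, T, V]; labels: [B, T]."""
    vocab_size = logits.size(-1)
    log_probs = F.log_softmax(logits, dim=-1)                        # [B, T, V]
    # Token-level predictive entropy, normalized to [0, 1] by log |V|.
    entropy = -(log_probs.exp() * log_probs).sum(dim=-1)             # [B, T]
    log_vocab = torch.log(torch.tensor(float(vocab_size), device=logits.device))
    gate = (entropy / log_vocab).detach()
    # Per-token cross-entropy against the ground-truth labels.
    nll = F.cross_entropy(
        logits.transpose(1, 2), labels, ignore_index=ignore_index, reduction="none"
    )                                                                # [B, T]
    mask = (labels != ignore_index).float()
    # High-entropy (uncertain) tokens keep their gradient; low-entropy conflicting
    # tokens are down-weighted, which suppresses destructive updates.
    return (gate * nll * mask).sum() / mask.sum().clamp(min=1.0)
```

Under this weighting, a token the model is already confident about contributes almost nothing to the update, while a genuinely uncertain token is trained on at close to full strength.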
🧮 Math Domain (Base: Qwen3-4B-Instruct)
| Method | AIME24 | AIME25 | GSM8K | Math Avg. | MMLU | IFEval | CLUEWSC | General Avg. |
|---|---|---|---|---|---|---|---|---|
| Qwen3-4B-Instruct | 63.3 | 47.4 | 94.3 | 68.3 | 77.1 | 81.0 | 85.2 | 81.1 |
| + SFT | 63.3 | 50.0 | 94.8 | 69.4 | 76.5 | 79.5 | 74.5 | 76.5 |
| + SFT-KL | 63.3 | 50.0 | 93.6 | 69.0 | 74.5 | 74.9 | 89.4 | 79.6 |
| + FLOW | 66.7 | 46.7 | 94.3 | 69.2 | 76.2 | 78.3 | 82.8 | 79.1 |
| + DFT | 56.7 | 40.0 | 93.9 | 63.5 | 75.9 | 77.0 | 81.4 | 78.1 |
| + TALR | 50.0 | 50.0 | 93.3 | 64.4 | 76.2 | 78.1 | 74.5 | 76.2 |
| + EAFT (Ours) | 60.0 | 53.3 | 94.5 | 69.3 | 76.6 | 80.1 | 83.7 | 80.1 |
🩺 Medical Domain (Base: Qwen3-4B-Thinking)
| Method | MedMCQA | MedQA | PubMedQA | Medical Avg. | MMLU | IFEval | CLUEWSC | General Avg. |
|---|---|---|---|---|---|---|---|---|
| Qwen3-4B-Think | 63.5 | 78.2 | 76.0 | 72.6 | 79.3 | 85.0 | 94.1 | 86.1 |
| + SFT | 63.3 | 79.5 | 78.0 | 73.6 | 78.3 | 75.3 | 90.4 | 81.3 |
| + EAFT (Ours) | 63.9 | 80.0 | 77.2 | 73.7 | 80.1 | 81.7 | 91.8 | 84.5 |
🤖 Agent Domain (Base: Qwen3-4B-Instruct)
| Method | BFCL v3 (Target) | MMLU | IFEval | CLUEWSC | General Avg. |
|---|---|---|---|---|---|
| Qwen3-4B-Inst | 60.5 | 77.1 | 81.0 | 85.2 | 81.1 |
| + SFT | 61.4 | 74.5 | 77.8 | 72.2 | 74.8 |
| + EAFT (Ours) | 60.8 | 76.1 | 78.6 | 77.7 | 77.5 |
LLaMA-Factory

- Step 1: Clone the repository
```bash
git clone https://github.com/ymxyll/LlamaFactory-EAFT.git
cd LlamaFactory-EAFT
```
- Step 2: Install dependencies
```bash
pip install -e .
```
- Step 3: Run the training script
```bash
llamafactory-cli train --config examples/extras/eaft/qwen25_05b_eaft_full.yaml
```

ms-swift

- Step 1: Clone the repository
```bash
git clone https://github.com/ymxyll/ms-swift-EAFT.git
cd ms-swift-EAFT
```
- Step 2: Install dependencies
```bash
pip install -e .
```
- Step 3: Run the training script
```bash
# megatron
bash examples/megatron/eaft.sh
# deepspeed
bash examples/train/eaft.sh
```

If you find this work helpful for your research, please consider citing our paper:
```bibtex
@article{diao2026entropy,
  title={Entropy-Adaptive Fine-Tuning: Resolving Confident Conflicts to Mitigate Forgetting},
  author={Diao, Muxi and Yang, Lele and Gong, Wuxuan and Zhang, Yutong and Yan, Zhonghao and Han, Yufei and Liang, Kongming and Xu, Weiran and Ma, Zhanyu},
  journal={arXiv preprint arXiv:2601.02151},
  year={2026}
}
```
