- [Jan 12, 2026] 💻 We have implemented the EAFT loss in the ms-swift-EAFT project!
- [Jan 08, 2026] 🔥 We are honored to be featured as 🤗 HuggingFace Daily Paper #1.
- [Jan 07, 2026] 📄 Our paper is now available on arXiv and as a Hugging Face Daily Paper.
- [Jan 06, 2026] ✨ Integration: EAFT has been merged into LLaMA-Factory! You can now enable EAFT via the `use_eaft_loss` parameter.
- [Jan 06, 2026] 🚀 Code Release: Training code and scripts are now available! Please ⭐ Star this repo to stay tuned!
Supervised Fine-Tuning (SFT) is the standard paradigm for domain adaptation, yet it frequently incurs the cost of 🎯 catastrophic forgetting. In sharp contrast, on-policy Reinforcement Learning (RL) effectively preserves general capabilities.
We investigate this discrepancy and identify a fundamental distributional gap: while RL aligns with the model's internal belief, SFT forces the model to fit external supervision. This mismatch often manifests as "💥 Confident Conflicts": tokens where the ground-truth label receives low probability even though the model's predictive entropy is low. In these instances, the model is highly confident in its own prediction but is forced to learn a divergent ground truth, triggering destructive gradient updates.
To address this, we propose 🚀 Entropy-Adaptive Fine-Tuning (EAFT). Unlike methods relying solely on prediction probability, EAFT utilizes token-level entropy as a gating mechanism to distinguish between epistemic uncertainty and knowledge conflict. This allows the model to learn from uncertain samples while suppressing gradients on conflicting data.
🔬 Key Findings:
- Extensive experiments on Qwen and GLM series (4B-32B parameters) across mathematical, medical, and agentic domains
- EAFT consistently matches the downstream performance of standard SFT
- Significantly mitigates the degradation of general capabilities
- "The right balance between learning and forgetting"
Figure 1: (a) Conceptual illustration of Confident Conflicts. (b) Token-level entropy-probability landscape comparison between SFT and On-Policy Rollouts.
We identify "Confident Conflicts" (Low Probability, Low Entropy) as the primary driver of catastrophic forgetting. These occur when:
- The model is confident about its prediction (low entropy)
- But the prediction conflicts with ground truth (low probability)
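To make this concrete, here is a minimal, hypothetical sketch of how such tokens could be flagged from model logits. The function name `flag_confident_conflicts` and the thresholds `tau_p` / `tau_h` are illustrative only and are not taken from the paper.

```python
import torch
import torch.nn.functional as F

def flag_confident_conflicts(logits, labels, tau_p=0.1, tau_h=0.5):
    """Flag tokens that are low-probability AND low-entropy (illustrative thresholds).

    logits: [batch, seq, vocab] next-token logits; labels: [batch, seq] ground-truth ids.
    """
    log_probs = F.log_softmax(logits, dim=-1)                        # [B, T, V]
    # Probability the model assigns to the ground-truth token.
    target_p = log_probs.gather(-1, labels.unsqueeze(-1)).squeeze(-1).exp()
    # Predictive entropy H_t = -sum_v p_v log p_v of the model's own distribution.
    entropy = -(log_probs.exp() * log_probs).sum(dim=-1)             # [B, T]
    # "Confident conflict": a confident prediction (low entropy) that disagrees
    # with the ground truth (low probability on the label).
    return (target_p < tau_p) & (entropy < tau_h)
```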
EAFT introduces an entropy-based gating mechanism to the standard Cross-Entropy loss:

$$\mathcal{L}_{\text{EAFT}} = -\sum_{t} w_t \, \log p_\theta(y_t \mid y_{<t}, x)$$

where the weight $w_t$ is computed from the token-level predictive entropy $H_t$, so that it:

- ⚠️ Suppresses Conflicts: Down-weights gradients when the model is stubborn (Low Entropy), preventing destructive updates
- ✨ Encourages Learning: Maintains high weights when the model is uncertain/exploring (High Entropy)
Unlike methods that rely solely on prediction probability, EAFT utilizes token-level entropy to distinguish between:
- Epistemic uncertainty: Model doesn't know → Strong learning signal
- Knowledge conflict: Model is confident but wrong → Suppressed updates
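As a concrete illustration of this gating idea, below is a minimal sketch of an entropy-gated cross-entropy loss. It assumes the gate is the predictive entropy normalized by $\log |V|$ and detached from the computation graph; the exact gating function used by EAFT is defined in the paper and may differ, and the name `eaft_loss` is only for this example.

```python
import torch
import torch.nn.functional as F

def eaft_loss(logits, labels, ignore_index=-100):
    """Entropy-gated cross-entropy (sketch). logits: [B, T, V]; labels: [B, T]."""
    vocab_size = logits.size(-1)
    log_probs = F.log_softmax(logits, dim=-1)                        # [B, T, V]
    # Token-level predictive entropy, normalized to [0, 1] by log |V|.
    entropy = -(log_probs.exp() * log_probs).sum(dim=-1)             # [B, T]
    log_vocab = torch.log(torch.tensor(float(vocab_size), device=logits.device))
    gate = (entropy / log_vocab).detach()
    # Per-token cross-entropy against the ground-truth labels.
    nll = F.cross_entropy(
        logits.transpose(1, 2), labels, ignore_index=ignore_index, reduction="none"
    )                                                                # [B, T]
    mask = (labels != ignore_index).float()
    # High-entropy (uncertain) tokens keep their gradient; low-entropy conflicting
    # tokens are down-weighted, which suppresses destructive updates.
    return (gate * nll * mask).sum() / mask.sum().clamp(min=1.0)
```

Under this weighting, a token the model is already confident about contributes almost nothing to the update, while a genuinely uncertain token is trained on at close to full strength.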
🧮 Math Domain (Base: Qwen3-4B-Instruct)
| Method | AIME24 | AIME25 | GSM8K | Math Avg. | MMLU | IFEval | CLUEWSC | General Avg. |
|---|---|---|---|---|---|---|---|---|
| Qwen3-4B-Instruct | 63.3 | 47.4 | 94.3 | 68.3 | 77.1 | 81.0 | 85.2 | 81.1 |
| + SFT | 63.3 | 50.0 | 94.8 | 69.4 | 76.5 | 79.5 | 74.5 | 76.5 |
| + SFT-KL | 63.3 | 50.0 | 93.6 | 69.0 | 74.5 | 74.9 | 89.4 | 79.6 |
| + FLOW | 66.7 | 46.7 | 94.3 | 69.2 | 76.2 | 78.3 | 82.8 | 79.1 |
| + DFT | 56.7 | 40.0 | 93.9 | 63.5 | 75.9 | 77.0 | 81.4 | 78.1 |
| + TALR | 50.0 | 50.0 | 93.3 | 64.4 | 76.2 | 78.1 | 74.5 | 76.2 |
| + EAFT (Ours) | 60.0 | 53.3 | 94.5 | 69.3 | 76.6 | 80.1 | 83.7 | 80.1 |
🩺 Medical Domain (Base: Qwen3-4B-Thinking)
| Method | MedMCQA | MedQA | PubMedQA | Medical Avg. | MMLU | IFEval | CLUEWSC | General Avg. |
|---|---|---|---|---|---|---|---|---|
| Qwen3-4B-Think | 63.5 | 78.2 | 76.0 | 72.6 | 79.3 | 85.0 | 94.1 | 86.1 |
| + SFT | 63.3 | 79.5 | 78.0 | 73.6 | 78.3 | 75.3 | 90.4 | 81.3 |
| + EAFT (Ours) | 63.9 | 80.0 | 77.2 | 73.7 | 80.1 | 81.7 | 91.8 | 84.5 |
🤖 Agent Domain (Base: Qwen3-4B-Instruct)
| Method | BFCL v3 (Target) | MMLU | IFEval | CLUEWSC | General Avg. |
|---|---|---|---|---|---|
| Qwen3-4B-Inst | 60.5 | 77.1 | 81.0 | 85.2 | 81.1 |
| + SFT | 61.4 | 74.5 | 77.8 | 72.2 | 74.8 |
| + EAFT (Ours) | 60.8 | 76.1 | 78.6 | 77.7 | 77.5 |
LLaMA-Factory

- Step 1: Clone the repository
```bash
git clone https://github.com/ymxyll/LlamaFactory-EAFT.git
cd LlamaFactory-EAFT
```
- Step 2: Install dependencies
```bash
pip install -e .
```
- Step 3: Run the training script
```bash
llamafactory-cli train --config examples/extras/eaft/qwen25_05b_eaft_full.yaml
```

ms-swift

- Step 1: Clone the repository
```bash
git clone https://github.com/ymxyll/ms-swift-EAFT.git
cd ms-swift-EAFT
```
- Step 2: Install dependencies
```bash
pip install -e .
```
- Step 3: Run the training script
```bash
# megatron
bash examples/megatron/eaft.sh
# deepspeed
bash examples/train/eaft.sh
```

If you find this work helpful for your research, please consider citing our paper:
```bibtex
@article{diao2026entropy,
  title={Entropy-Adaptive Fine-Tuning: Resolving Confident Conflicts to Mitigate Forgetting},
  author={Diao, Muxi and Yang, Lele and Gong, Wuxuan and Zhang, Yutong and Yan, Zhonghao and Han, Yufei and Liang, Kongming and Xu, Weiran and Ma, Zhanyu},
  journal={arXiv preprint arXiv:2601.02151},
  year={2026}
}
```
