Description
Identified an issue in the evaluation pipeline that leads to significantly degraded evaluation results when float16 is used together with flash_attention_2.
Problem
The evaluation code converts the model to float16 after loading the checkpoint and enables flash_attention_2 (lines 147-148 of src/modernvbert/contrastive_training/evaluate.py). However, this does not work correctly for this checkpoint and produces very poor evaluation results.
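For reference, here is a minimal sketch of the problematic pattern, assuming a standard Hugging Face Transformers loading flow; the actual code at lines 147-148 of evaluate.py may differ, and the checkpoint path is hypothetical:

```python
import torch
from transformers import AutoModel

# The checkpoint weights are stored in float32, so the model loads in
# float32, yet FlashAttention-2 is requested and the weights are then
# cast to float16 for evaluation.
model = AutoModel.from_pretrained(
    "path/to/checkpoint",                     # hypothetical path
    attn_implementation="flash_attention_2",  # only supports fp16/bf16
)
model = model.half()  # float32 -> float16 cast applied after loading
```

Casting float32-trained weights this way, combined with an attention backend the model was not trained with, is consistent with the degraded scores described below.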
Key points:
- The checkpoint parameters are stored in float32
- During evaluation, the model is:
  - loaded in float32
  - then converted to float16
  - and evaluated with flash_attention_2
- This configuration results in incorrect scores
When the checkpoint is evaluated without forcing float16 and FlashAttention, the results are significantly better and consistent.
- Training emits warnings indicating that FlashAttention2 requires float16, suggesting it was not correctly enabled during training
- This implies the model was not trained with FlashAttention2 in float16
Suggested Fixes
- Do not convert the model to float16 after loading if the checkpoint was trained in float32
- Only enable flash_attention_2 when the model was trained and stored in a compatible precision (see the sketch below)
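A minimal sketch of the suggested fix, again assuming Transformers-style loading; the checkpoint path is hypothetical and the exact integration into evaluate.py may differ:

```python
import torch
from transformers import AutoModel

checkpoint = "path/to/checkpoint"  # hypothetical path

# Load in the checkpoint's native precision (float32 here) and inspect it.
model = AutoModel.from_pretrained(checkpoint)
weight_dtype = next(model.parameters()).dtype

# FlashAttention-2 only supports fp16/bf16, so opt in only when the stored
# precision is compatible; otherwise keep the default attention backend and
# skip any .half() cast.
if weight_dtype in (torch.float16, torch.bfloat16):
    model = AutoModel.from_pretrained(
        checkpoint,
        torch_dtype=weight_dtype,
        attn_implementation="flash_attention_2",
    )
```

This keeps float32 checkpoints on the default attention path, matching the better and more consistent results observed when float16 and FlashAttention are not forced.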