
Incorrect evaluation results when converting model to float16 + FlashAttention2 after loading #4

@sjoerdgunneweg

Description

I identified an issue in the evaluation pipeline that leads to significantly degraded evaluation results when float16 is used together with flash_attention_2.

Problem

The evaluation code converts the model to float16 after loading the checkpoint and enables flash_attention_2 in lines 147-148 of
src/modernvbert/contrastive_training/evaluate.py.
However, this approach does not work correctly for this checkpoint and produces very poor evaluation results.

Key points:

  • The checkpoint parameters are stored in float32
  • During evaluation, the model is:
    • loaded in float32
    • then converted to float16
    • and evaluated with flash_attention_2

This configuration results in incorrect scores; a minimal sketch of the pattern follows.
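For reference, the problematic pattern is roughly equivalent to the following (a minimal sketch using the Hugging Face transformers API; AutoModel and the checkpoint path are placeholders, not the actual code from evaluate.py):

```python
import torch
from transformers import AutoModel

# Sketch of the problematic pattern: the checkpoint weights are stored in
# float32, but the model is cast to float16 after loading and evaluated
# with FlashAttention2.
model = AutoModel.from_pretrained(
    "path/to/checkpoint",                     # placeholder path
    attn_implementation="flash_attention_2",  # FA2 forced on
)
model = model.to(dtype=torch.float16)  # float32 -> float16 cast after loading
model.eval()
```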

When the checkpoint is evaluated without forcing float16 and FlashAttention, the results are significantly better and consistent.

  • Training emits warnings indicating that FlashAttention2 requires float16, which suggests it was never correctly enabled during training
  • This implies the model was effectively trained in float32, without FlashAttention2 in float16

Suggested Fixes

  • Do not convert the model to float16 after loading if the checkpoint was trained in float32
  • Only enable flash_attention_2 when the model was trained and stored in a compatible precision (see the sketch below)
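
A minimal sketch of the suggested behavior, assuming the evaluation code uses transformers' from_pretrained; the load_for_eval helper and its trained_dtype argument are illustrative, not the project's actual API:

```python
import torch
from transformers import AutoModel

# Hypothetical helper: keep the checkpoint's native precision and only enable
# FlashAttention2 when that precision is compatible with it (fp16/bf16).
def load_for_eval(checkpoint_path: str, trained_dtype: torch.dtype = torch.float32):
    fa2_ok = trained_dtype in (torch.float16, torch.bfloat16)
    model = AutoModel.from_pretrained(
        checkpoint_path,
        torch_dtype=trained_dtype,  # no post-hoc float16 cast
        attn_implementation="flash_attention_2" if fa2_ok else "sdpa",
    )
    return model.eval()

# For this checkpoint (trained and stored in float32): the model stays in
# float32 and falls back to SDPA attention instead of FlashAttention2.
model = load_for_eval("path/to/checkpoint")  # placeholder path
```

With a guard like this, the float32 checkpoint is evaluated in the precision it was trained in, which matches the configuration that produced the better, consistent results described above.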
