fix: scientific correctness issues suggestions in eval pipeline #7

Open
davide-beltrame wants to merge 9 commits into main from dave

@davide-beltrame

Collaborator

Take the following as suggestions. I know some of these were only missing because of testing and temporary settings, @VittorioRossi.

Fixes for accurate benchmarking:

  • Dynamic pad_token_id (was hardcoded to 126081, which broke ModernBERT)
  • Count the length-prediction forward pass in the FLOPs estimate
  • NFE uses the actual generation length, not max_new_tokens
  • Include the vocab projection in the FLOPs estimate
  • Consistent temperature=0.0 (greedy) across scripts
  • Fix a syntax error in trainer.py
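
For reference, here is a rough sketch of how the accounting points above could fit together. It is not the code in this PR: every function name is hypothetical, the FLOPs formula is a common dense-transformer approximation (2 FLOPs per multiply-add), and it assumes that the length-prediction pass costs the same as one regular forward pass and that NFE is the generated length divided by a hypothetical `tokens_per_step`.

```python
import math
from typing import Optional, Sequence


def resolve_pad_token_id(pad_token_id: Optional[int], eos_token_id: Optional[int]) -> int:
    """Take the pad id from the tokenizer's own attributes, not a hardcoded constant."""
    if pad_token_id is not None:  # explicit None check: 0 is a valid pad id
        return pad_token_id
    if eos_token_id is not None:  # assumed fallback when no pad token is defined
        return eos_token_id
    raise ValueError("tokenizer defines neither a pad token nor an EOS token")


def generated_length(token_ids: Sequence[int], pad_token_id: int) -> int:
    """Number of tokens actually generated, ignoring trailing padding."""
    n = len(token_ids)
    while n > 0 and token_ids[n - 1] == pad_token_id:
        n -= 1
    return n


def count_nfe(gen_len: int, tokens_per_step: int = 1) -> int:
    """NFE from the actual generation length (assumption: fixed tokens per step)."""
    return math.ceil(gen_len / tokens_per_step)


def forward_flops(seq_len: int, n_layers: int, d_model: int, d_ff: int, vocab_size: int) -> int:
    """Approximate FLOPs of one forward pass over a sequence of seq_len tokens."""
    attn_proj = 4 * d_model * d_model   # Q, K, V, and output projections
    attn_mix = 2 * seq_len * d_model    # QK^T scores and attention-weighted values
    mlp = 2 * d_model * d_ff            # two feed-forward matrices
    vocab_proj = d_model * vocab_size   # final projection to logits
    macs_per_token = n_layers * (attn_proj + attn_mix + mlp) + vocab_proj
    return 2 * seq_len * macs_per_token  # 2 FLOPs per multiply-add


def generation_flops(nfe: int, flops_per_pass: int, length_prediction_passes: int = 1) -> int:
    """Total FLOPs, counting the length-prediction pass(es) as extra forward passes."""
    return (nfe + length_prediction_passes) * flops_per_pass
```

With a real tokenizer, the first helper would be called as `resolve_pad_token_id(tokenizer.pad_token_id, tokenizer.eos_token_id)`, and `generated_length` would be applied to each output sequence before computing NFE.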
