fix: scientific correctness issues suggestions in eval pipeline by davide-beltrame · Pull Request #7 · giacomo-ciro/diffusion-llms

davide-beltrame · 2025-11-29T10:45:26Z

@VittorioRossi, take the following as suggestions. I know some of these were only missing because of testing and temporary settings.

Fixes for accurate benchmarking:

davide-beltrame added 9 commits November 29, 2025 11:24

fix: replication gap fixes - add skip_chat_template, configurable temperature, humaneval_base.yaml

1a1f77d

Merge branch 'main' into dave

4f47233

fix: resolve pad_token_id dynamically instead of hardcoded 126081

34116e4

fix: syntax error - extra parenthesis in trainer.py

c9e9f33

fix: remove duplicate temperature kwarg line

68137cb

fix: count length prediction forward pass in total FLOPS

30f8378

fix: NFE calculation uses actual generation length, not max_new_tokens

5057eff

fix: use temperature=0.0 (greedy) consistently, remove redundant gen_length

ed02c80

fix: include output projection to vocab in FLOPS estimation

3e9707f
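
For reviewers, here is a rough sketch of the accounting that the FLOPS and NFE commits above are aiming for. It is not the code in this PR: `ModelShape`, `forward_flops`, `nfe_for`, `total_flops`, and `tokens_per_step` are hypothetical names. It also assumes a standard dense transformer with a 4x MLP expansion, a fixed number of tokens unmasked per denoising step, and a length predictor that costs one extra forward pass over the prompt.

```python
import math
from dataclasses import dataclass


@dataclass
class ModelShape:
    # Hypothetical container; field names are illustrative, not the repo's.
    n_layers: int
    d_model: int
    vocab_size: int


def forward_flops(shape: ModelShape, seq_len: int) -> int:
    """Approximate FLOPs for one forward pass over seq_len tokens."""
    # Weight matmuls: ~12 * d_model^2 parameters per layer (assumed 4x MLP),
    # 2 FLOPs (multiply + add) per parameter per token.
    blocks = 2 * 12 * shape.d_model ** 2 * shape.n_layers * seq_len
    # Attention scores (QK^T) and weighted sum (AV): 2 * seq_len^2 * d_model each.
    attention = 2 * 2 * seq_len ** 2 * shape.d_model * shape.n_layers
    # Output projection from hidden states to vocabulary logits.
    head = 2 * shape.d_model * shape.vocab_size * seq_len
    return blocks + attention + head


def nfe_for(gen_len: int, tokens_per_step: int) -> int:
    """Denoising steps for the length actually generated (assumed schedule)."""
    return math.ceil(gen_len / tokens_per_step)


def total_flops(shape: ModelShape, prompt_len: int, gen_len: int,
                tokens_per_step: int, count_length_pass: bool = True) -> int:
    """Total FLOPs: NFE full-sequence passes plus the optional length pass."""
    nfe = nfe_for(gen_len, tokens_per_step)
    total = nfe * forward_flops(shape, prompt_len + gen_len)
    if count_length_pass:
        # Assumption: length prediction is one forward pass over the prompt.
        total += forward_flops(shape, prompt_len)
    return total
```

Passing the actual generated length as `gen_len`, rather than `max_new_tokens`, keeps both the step count and the per-pass sequence length tied to what the model really produced.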

davide-beltrame requested a review from VittorioRossi November 29, 2025 10:45
