Fix streaming validation infinite loop (#42) by vominh1919 · Pull Request #45 · PrimeIntellect-ai/OpenDiloco

vominh1919 · 2026-04-19T01:21:30Z

Fixes #42

Problem: In train_diloco_torch.py, the validation dataset is loaded with streaming=True, which yields an IterableDataset that has no __len__. As a result, the DataLoader loop in evaluate_model() has no defined end and never terminates.

Solution: Limit the streaming validation dataset to 1000 samples using eval_dataset.take(1000). This caps evaluation to a reasonable sample size for perplexity testing while avoiding the infinite loop.

Uses hasattr(eval_dataset, "take") guard so the fix is safe for non-streaming datasets (e.g., when using c4_tiny).
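The guard described above can be sketched as follows. This is a minimal illustration and not the exact diff in this PR: the helper name `limit_eval_dataset`, the `max_samples` parameter, and the `FakeStreamingDataset` stand-in are hypothetical and exist only so the snippet runs without downloading a dataset.

```python
def limit_eval_dataset(eval_dataset, max_samples=1000):
    """Cap a dataset at max_samples if it exposes .take(); otherwise return it unchanged."""
    if hasattr(eval_dataset, "take"):
        return eval_dataset.take(max_samples)
    return eval_dataset


class FakeStreamingDataset:
    """Hypothetical stand-in for a streaming dataset: endless iteration, no __len__."""

    def __init__(self, limit=None):
        self._limit = limit  # None means "never stops"

    def __iter__(self):
        i = 0
        while self._limit is None or i < self._limit:
            yield {"input_ids": [i]}
            i += 1

    def take(self, n):
        # Return a new dataset that stops after at most n samples.
        capped = n if self._limit is None else min(n, self._limit)
        return FakeStreamingDataset(limit=capped)


# The uncapped stand-in would iterate forever; the capped one stops.
capped = limit_eval_dataset(FakeStreamingDataset(), max_samples=1000)
num_samples = sum(1 for _ in capped)

# An object without .take() (here, a plain list) passes through untouched.
plain = [1, 2, 3]
unchanged = limit_eval_dataset(plain)
```

In the training script, the same check would be applied to `eval_dataset` before it is handed to the DataLoader, so the evaluation loop sees at most 1000 samples.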

vominh1919 wants to merge 1 commit into PrimeIntellect-ai:main from vominh1919:fix/42-streaming-validation-infinite-loop