Add seq4096 fp16 tok coarsen record #74

Open
takhir-iota wants to merge 1 commit into openai:main from takhir-iota:codex/non-sliding-pr

@takhir-iota
Adds a new 10-minute 8xH100 record under records/track_10min_16mb/2026-03-19_Seq4096Fp16TokCoarsen.

Summary:

  • canonical exact score: 1.18838751 val_bpb
  • canonical total size: 15,937,608 bytes
  • training recipe: seq_len=4096, TRAIN_BATCH_TOKENS=393216, tuned Muon schedule
  • export recipe: keep tok_emb.weight in fp16 and coarsen only blocks.5.
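As a rough illustration of the recipe above, the sketch below works out the batch shape implied by seq_len=4096 and TRAIN_BATCH_TOKENS=393216, and shows one way a name-based export rule could be expressed. The helper names `batch_shape` and `export_policy`, the even split across the 8 GPUs, and the reading of `blocks.5.` as a parameter-name prefix are assumptions made for illustration; the actual logic lives in the train_gpt.py snapshot and may differ.

```python
# Illustrative sketch only; not taken from train_gpt.py.
SEQ_LEN = 4096
TRAIN_BATCH_TOKENS = 393_216
NUM_GPUS = 8  # from the "8xH100" description


def batch_shape(batch_tokens: int, seq_len: int, num_gpus: int) -> tuple[int, int]:
    """Return (sequences per step, sequences per GPU), assuming an even split."""
    if batch_tokens % seq_len != 0:
        raise ValueError("batch_tokens must be a multiple of seq_len")
    seqs = batch_tokens // seq_len
    if seqs % num_gpus != 0:
        raise ValueError("sequences per step must divide evenly across GPUs")
    return seqs, seqs // num_gpus


def export_policy(param_name: str) -> str:
    """Pick an export treatment from the parameter name (hypothetical labels)."""
    if param_name == "tok_emb.weight":
        return "fp16"      # kept in half precision
    if param_name.startswith("blocks.5."):
        return "coarse"    # the only block that gets coarsened
    return "default"       # everything else uses the regular export path


seqs, per_gpu = batch_shape(TRAIN_BATCH_TOKENS, SEQ_LEN, NUM_GPUS)
print(seqs, per_gpu)
```

Matching on the full prefix `blocks.5.` (with the trailing dot) keeps a name such as `blocks.50.` from being picked up by accident.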

Repro reruns included in the record folder:

  • seed=1338: 1.18896209 val_bpb, 15,942,078 bytes
  • seed=1339: 1.19055214 val_bpb, 15,942,841 bytes
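To put the canonical run and the two reruns side by side, here is a minimal sketch that computes the mean and range of val_bpb and the smallest size headroom. The scores and byte counts are the ones listed in this description; `SIZE_CAP_BYTES = 16_000_000` is an assumption based on the `track_10min_16mb` folder name, and `summarize` is a hypothetical helper, not part of the record folder.

```python
from statistics import mean

# Assumed cap: "16mb" read as 16,000,000 bytes (not confirmed by this PR).
SIZE_CAP_BYTES = 16_000_000

# (val_bpb, total bytes) as reported in the PR description.
runs = {
    "canonical": (1.18838751, 15_937_608),
    "seed=1338": (1.18896209, 15_942_078),
    "seed=1339": (1.19055214, 15_942_841),
}


def summarize(runs: dict[str, tuple[float, int]], cap: int) -> dict[str, float]:
    """Aggregate score spread and worst-case size headroom across runs."""
    scores = [bpb for bpb, _ in runs.values()]
    sizes = [size for _, size in runs.values()]
    return {
        "mean_bpb": mean(scores),
        "bpb_range": max(scores) - min(scores),
        "max_bytes": max(sizes),
        "min_headroom": cap - max(sizes),
    }


summary = summarize(runs, SIZE_CAP_BYTES)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Under the assumed cap, all three runs fit, and the canonical run is both the best-scoring and the smallest of the three.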

Files included:

  • canonical train.log
  • exact train_gpt.py snapshot used for the run
  • submission.json
  • both rerun logs for reference
