Add seq4096 sliding-window fp16 tok coarsen record #75

Open
takhir-iota wants to merge 1 commit into openai:main from takhir-iota:codex/sliding-pr

Conversation

@takhir-iota

Adds a new 10-minute 8xH100 sliding-window record under records/track_10min_16mb/2026-03-19_Seq4096SlidingWindowFp16TokCoarsen.

Summary:

  • canonical sliding exact score: 1.17675682 val_bpb
  • canonical standard exact score: 1.18874323 val_bpb
  • canonical total size: 15,943,260 bytes
  • training recipe: seq_len=4096, TRAIN_BATCH_TOKENS=393216, tuned Muon schedule
  • export recipe: keep tok_emb.weight in fp16 and coarsen only blocks.5.
  • eval recipe: EVAL_STRIDE=64, SW_EVAL_BATCH=32
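
For readers unfamiliar with strided sliding-window evaluation, the following is a minimal sketch of how windows might be laid out so that every token position is scored exactly once while seeing as much left context as possible. It is not taken from the `train_gpt.py` snapshot in this PR; the function name `sliding_windows` and the `(start, end, score_from)` convention are hypothetical, and only `seq_len=4096` and `EVAL_STRIDE=64` come from the summary above.

```python
from typing import List, Tuple


def sliding_windows(n_tokens: int, seq_len: int, stride: int) -> List[Tuple[int, int, int]]:
    """Plan strided evaluation windows (hypothetical helper, not the PR's code).

    Each triple (start, end, score_from) means: feed positions [start, end)
    to the model, but only count the loss on positions [score_from, end).
    The first window scores everything it sees; each later window advances
    by `stride` and scores only the positions not scored before.
    """
    if seq_len <= 0 or stride <= 0 or stride > seq_len:
        raise ValueError("require 0 < stride <= seq_len")

    windows = []
    start = 0
    scored_upto = 0  # positions below this index have already been scored
    while scored_upto < n_tokens:
        end = min(start + seq_len, n_tokens)
        windows.append((start, end, scored_upto))
        scored_upto = end
        start += stride
    return windows


# Example with the settings named in the summary (the token count is made up).
windows = sliding_windows(n_tokens=10_000, seq_len=4096, stride=64)
print(len(windows), windows[:2], windows[-1])
```

With this layout, every scored position after the first window has at least `seq_len - stride` positions of context in front of it, which is the usual reason a sliding score comes out lower than a non-overlapping one. The cost is roughly `seq_len / stride` times more forward-pass work, which is presumably why the eval batch size is tuned separately.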

Repro reruns included in the record folder:

  • seed=1338: sliding 1.17675183 val_bpb, 15,949,486 bytes
  • seed=1339: sliding 1.17910456 val_bpb, 15,950,789 bytes
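
As a quick way to read the three runs together, the sketch below recomputes the seed-to-seed spread and the sliding-versus-standard gap from the figures quoted in this description. The dictionary layout and variable names are illustrative only; the numbers are copied from the summary and rerun bullets, and the canonical run's seed is not stated here, so it is labelled simply `canonical`.

```python
# Figures copied from this PR description; the structure is illustrative.
runs = {
    "canonical": {"sliding_bpb": 1.17675682, "size_bytes": 15_943_260},
    "seed=1338": {"sliding_bpb": 1.17675183, "size_bytes": 15_949_486},
    "seed=1339": {"sliding_bpb": 1.17910456, "size_bytes": 15_950_789},
}
CANONICAL_STANDARD_BPB = 1.18874323  # non-sliding score of the canonical run

sliding = [r["sliding_bpb"] for r in runs.values()]
mean_sliding = sum(sliding) / len(sliding)
spread = max(sliding) - min(sliding)  # best-to-worst gap across the three runs
sliding_gain = CANONICAL_STANDARD_BPB - runs["canonical"]["sliding_bpb"]
largest_size = max(r["size_bytes"] for r in runs.values())

print(f"mean sliding val_bpb: {mean_sliding:.8f}")
print(f"spread across runs:   {spread:.8f}")
print(f"sliding vs standard:  {sliding_gain:.8f}")
print(f"largest artifact:     {largest_size:,} bytes")
```

On these numbers the gap between standard and sliding evaluation is several times larger than the spread across the three runs, which is worth keeping in mind when comparing this record against entries scored without a sliding window.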

Files included:

  • canonical train.log
  • exact train_gpt.py snapshot used for the run
  • submission.json
  • both rerun logs for reference
