
Non-record: Stacked hyperparameter tuning + eval2048 (RTX 5090, val_bpb 1.336)#104

Open
gwelinder wants to merge 1 commit into openai:main from gwelinder:submission/stacked-hyperparams-rtx5090

Conversation

@gwelinder

Non-record submission: Stacked Hyperparameter Tuning + Eval2048

val_bpb: 1.3358 (post-quant int8+zlib) | 15.8MB artifact | RTX 5090, 20 train shards

What this is

40+ automated experiments run via an autoresearch loop on the baseline 9x512 architecture. No architecture changes; five stacked config fixes improve val_bpb by 0.027.

Key finding

WARMDOWN_ITERS=1200 is broken under the 600s wallclock budget. At ~620ms/step the run covers only ~968 steps, so the warmdown window (1200 iters) is longer than the entire run and the cosine warmdown is active from step 1. Fix: WARMDOWN_ITERS=3000. (PRs #48 and #73 flagged the same issue.)
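To make the arithmetic concrete, here is a minimal sketch of a cosine warmdown gated on the final WARMDOWN_ITERS steps. This is a hypothetical reconstruction of the schedule shape, not the exact code in train_gpt.py; it only demonstrates that a window larger than the step budget means the decay is active from the very first step:

```python
import math

def lr_multiplier(step, total_steps, warmdown_iters):
    # Cosine warmdown over the final `warmdown_iters` steps.
    # (Illustrative sketch, not the train_gpt.py implementation.)
    warmdown_start = total_steps - warmdown_iters
    if step < warmdown_start:
        return 1.0  # full LR until the warmdown window begins
    # Fraction of the warmdown window already elapsed, clipped to [0, 1].
    frac = min(1.0, (step - warmdown_start) / warmdown_iters)
    return 0.5 * (1.0 + math.cos(math.pi * frac))

total_steps = int(600 / 0.62)  # ~968 steps at ~620 ms/step, 600 s wallclock
# With WARMDOWN_ITERS=1200 > total_steps, warmdown_start is negative,
# so the LR is already below peak at step 0:
assert 1200 > total_steps
assert lr_multiplier(0, total_steps, 1200) < 1.0
```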

Stacked config

WARMDOWN_ITERS=3000, MATRIX_LR=0.06, LOGIT_SOFTCAP=15, MUON_MOMENTUM=0.99
TRAIN_BATCH_TOKENS=131072 (quarter-batch), EVAL_SEQ_LEN=2048
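For reference, a minimal sketch of pulling this stacked config from the environment. The variable names come from the list above; the parsing helper and defaults-as-stacked-values are illustrative, not copied from train_gpt.py:

```python
import os

def env_knob(name, default):
    # Read a numeric knob from the environment, keeping the default's type.
    return type(default)(os.environ.get(name, default))

config = {
    "WARMDOWN_ITERS":     env_knob("WARMDOWN_ITERS", 3000),
    "MATRIX_LR":          env_knob("MATRIX_LR", 0.06),
    "LOGIT_SOFTCAP":      env_knob("LOGIT_SOFTCAP", 15),
    "MUON_MOMENTUM":      env_knob("MUON_MOMENTUM", 0.99),
    "TRAIN_BATCH_TOKENS": env_knob("TRAIN_BATCH_TOKENS", 131072),  # quarter-batch
    "EVAL_SEQ_LEN":       env_knob("EVAL_SEQ_LEN", 2048),
}
```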

Negative results (also in the README)

  • Butterfly/Monarch MLP: 7MB artifact but 1.46 bpb
  • Reservoir random MLPs: 2.14 bpb
  • Depth recurrence (4x3=12 eff layers): 1.50 bpb
  • 6 alternative shapes: none beat 9x512

Also in train_gpt.py

  • EVAL_SEQ_LEN decoupling (train short, eval long)
  • Alias-aware serialization (shared weights stored once)
  • Mixed int6/int8 quantization (INT6_ALL_BLOCK_MATRICES env var)
  • Sliding-window eval (EVAL_STRIDE env var, batched)
  • Depth recurrence support (NUM_UNIQUE_LAYERS, NUM_RECURRENCE)
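The alias-aware serialization idea can be sketched as a dedup pass over the state dict: parameters that share storage (e.g. tied embedding and output head) are written once and recorded as aliases thereafter. The function name and the use of Python `id()` on stand-in tensors are illustrative assumptions, not the actual train_gpt.py code:

```python
def serialize_state(state):
    # Split a name->tensor mapping into unique blobs plus an alias table,
    # so shared weights are stored once. (Illustrative sketch.)
    seen = {}      # id(tensor) -> first name it appeared under
    blobs = {}     # names whose tensors are actually written out
    aliases = {}   # name -> name of the blob it shares storage with
    for name, tensor in state.items():
        key = id(tensor)
        if key in seen:
            aliases[name] = seen[key]
        else:
            seen[key] = name
            blobs[name] = tensor
    return blobs, aliases

shared = [0.0] * 4  # stand-in for a tied weight tensor
state = {"wte.weight": shared, "lm_head.weight": shared, "bias": [1.0]}
blobs, aliases = serialize_state(state)
```

On load, the alias table is replayed in reverse: each aliased name is pointed back at the single stored blob, restoring the weight tying.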

Hardware: RTX 5090, RunPod. Not 8xH100. This is a dev iteration result with interesting negative findings.

…ards)

val_bpb 1.336, 15.8MB artifact. 40+ experiments via autoresearch loop.
Key finding: baseline WARMDOWN_ITERS=1200 is broken at 600s wallclock.
Also includes negative results for butterfly MLP, reservoir MLPs, depth recurrence, and 6 iso-byte shapes.