Non-record: SwiGLU + warmdown fix + quarter batch (1x5090, 1.3281 bpb) #73

Open
NishantDahal wants to merge 1 commit into openai:main from NishantDahal:swiglu-warmdown-1x5090
Conversation

@NishantDahal

Non-record submission documenting a 10-experiment systematic exploration on 1×RTX 5090.

Best val_bpb: 1.3281 (post-quant, under 16MB artifact cap)

Key findings:

  • Discovered a warmdown schedule bug in stock train_gpt.py: the default warmdown_iters=1200, combined with the 600s wallclock budget, causes the LR to start decaying from step 1. Fixed with a time-fraction schedule (warmdown_frac=0.2), worth -0.006 bpb on its own.
  • SwiGLU activation replacing ReLU² (-0.004 bpb)
  • Quarter batch size (131K tokens), giving 4× more optimizer steps (-0.016 bpb cumulative)
  • Gradient accumulation ×2 (-0.002 bpb)
  • Negative results: weight decay (no effect), layer recurrence (harmful)
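The time-fraction warmdown fix can be sketched as follows. This is an illustrative version, not the PR's exact code: the function name `lr_scale` and the way elapsed time is passed in are assumptions; only `warmdown_frac=0.2` and the 600s budget come from the description above.

```python
def lr_scale(elapsed_s, total_s, warmdown_frac=0.2):
    """Time-fraction warmdown (sketch): hold full LR until the final
    warmdown_frac of the wallclock budget, then decay linearly to zero.
    Unlike a fixed warmdown_iters, this cannot start decaying at step 1
    when the step budget is smaller than warmdown_iters."""
    frac_done = min(elapsed_s / total_s, 1.0)
    warmdown_start = 1.0 - warmdown_frac
    if frac_done < warmdown_start:
        return 1.0
    return max(0.0, (1.0 - frac_done) / warmdown_frac)
```

For a 600s run this holds full LR for the first 480s, then decays over the last 120s.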

Total improvement: -0.035 bpb over stock baseline
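For reference, the SwiGLU change above amounts to gating one linear projection with a SiLU of another, instead of squaring a ReLU. A minimal element-wise sketch (the function names are illustrative; the actual model applies this to projected hidden states inside the MLP):

```python
import math

def silu(z):
    """SiLU(z) = z * sigmoid(z), the gate nonlinearity in SwiGLU."""
    return z / (1.0 + math.exp(-z))

def swiglu(gate, up):
    """Element-wise SwiGLU: SiLU(gate) * up, replacing ReLU(z)**2
    which uses a single projection and a hard zero cutoff."""
    return [silu(g) * u for g, u in zip(gate, up)]
```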

The score gap vs the leaderboard baseline (1.2244) is explained by hardware throughput: within the wallclock budget, a 1×5090 completes ~3,773 steps vs ~13,780 on 8×H100. The improvements themselves are hardware-agnostic and should transfer to multi-GPU runs.
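The gradient accumulation ×2 setting from the findings above can be sketched as a toy loop. Everything here is illustrative (the quadratic-loss `compute_grads` helper is hypothetical); only `accum_steps=2` reflects the actual setting:

```python
def compute_grads(params, batch):
    """Hypothetical helper: gradients of a toy squared-error loss."""
    return [2.0 * (p - t) for p, t in zip(params, batch)]

def train_step(params, batches, lr, accum_steps=2):
    """Average gradients over accum_steps micro-batches before one
    optimizer update, simulating a larger batch at fixed memory."""
    grads = [0.0 for _ in params]
    for b in range(accum_steps):
        g = compute_grads(params, batches[b])
        grads = [acc + gi / accum_steps for acc, gi in zip(grads, g)]
    return [p - lr * g for p, g in zip(params, grads)]
```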

Full experiment log and analysis in README.
