
Record: Int6 + 3x MLP + sliding window (val_bpb=1.1708) + 9 ablations#212

Open

mrdavtan wants to merge 2 commits into openai:main from mrdavtan:int6-3xMLP-pr

Conversation

@mrdavtan

Summary

val_bpb = 1.1708 — independent int6 implementation with 3x MLP expansion, currently #2 on the merged leaderboard.

  • Int6 per-row quantization ([-31,31]) + zstd-22 compression
  • 3x MLP expansion (hidden=1536) — 21.8M params in 15.2MB artifact
  • FP16 tied embedding, WD=20000, tuned LRs, sliding window eval stride=64
  • 8×H100 SXM, 12,507 steps at 48ms/step
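A minimal sketch of the per-row int6 scheme from the first bullet (symmetric range [-31, 31], one scale per weight row). Function names and the round-to-nearest choice are illustrative, not the PR's actual code, and the zstd-22 compression step over the packed codes is omitted:

```python
def quantize_row(row, qmax=31):
    """Symmetric per-row quantization of float weights to int6 codes in [-qmax, qmax]."""
    scale = max(abs(w) for w in row) / qmax or 1.0  # avoid a zero scale on all-zero rows
    q = [max(-qmax, min(qmax, round(w / scale))) for w in row]
    return q, scale

def dequantize_row(q, scale):
    """Recover approximate float weights from int6 codes and the row scale."""
    return [v * scale for v in q]

row = [0.12, -0.31, 0.05, 0.27]
q, scale = quantize_row(row)         # q = [12, -31, 5, 27]
restored = dequantize_row(q, scale)  # each entry within scale/2 of the original
```

A per-row scale keeps the quantization error proportional to each row's own magnitude, which is what lets 21.8M parameters fit the 15.2MB artifact after compression.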

What's different about this submission

This isn't just a score entry. It's accompanied by 9 controlled ablations testing techniques that top entries use but never isolate — all on the same hardware, same seed, one variable at a time.

Ablation results (6 negative findings)

| Technique | val_bpb | vs Control (1.1929) | Verdict |
|---|---|---|---|
| SWA | 1.1933 | +0.0004 | No effect at WD=1200 |
| Doc-isolated eval | 1.2015 | +0.0086 | Hurts at stride=64 (contradicts LoRA TTT) |
| Curriculum learning | 1.1942 | +0.0013 | No effect |
| Multi-token prediction | 1.1947 | +0.0018 | No effect |
| Int6 + 3x MLP | 1.1708 | -0.0221 | Best result |
| + SmearGate + BigramHash | 1.1739 | -0.019 | Hurts on top of int6 |
| Depth recurrence + Huginn | 4.34-5.58 | | Catastrophic at 7.6M scale |
| Int8 QAT (PR #145) | 1.2052 | +0.012 | Overhead exceeds recovery |

Key findings for the community

  1. Doc-isolated eval hurts at stride=64 (+0.0086) — this contradicts the LoRA TTT entry's +0.011 gain at stride=256, so a crossover must exist between stride 64 and 256.
  2. SmearGate + BigramHash don't help out of the box with int6 — they may require a specific init or an interaction with OrthoInit.
  3. Huginn eval-time scaling fails at small scale — both U-Net skips and flat loops were tested; 3 shared blocks at 7.6M params can't learn iterative refinement.
  4. SWA bf16 accumulation bug — keeping the running weight average in bf16 over thousands of steps causes catastrophic precision loss.
  5. torch.compile graph priming pitfall — pre-compiling conditional code paths causes a 50% slowdown.

See README for full analysis.
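A hedged sketch of the stride-based sliding-window evaluation referenced in the summary and in finding 1 (the record uses stride=64). Each window scores only its newest `stride` tokens, so every scored token keeps at least `ctx - stride` tokens of context; `nll_fn` is a stand-in for the model's per-token negative log-likelihood, not the PR's actual interface:

```python
import math

def sliding_window_bits(tokens, nll_fn, ctx=256, stride=64):
    """Average bits per token under sliding-window evaluation.

    nll_fn(window, i) -> negative log-likelihood (nats) of window[i] given
    window[:i]. Only the last `stride` tokens of each window are scored.
    """
    total_nll, scored = 0.0, 0
    for start in range(0, len(tokens), stride):
        end = min(start + stride, len(tokens))
        window = tokens[max(0, end - ctx):end]
        new = end - start  # tokens not already scored by an earlier window
        total_nll += sum(nll_fn(window, i) for i in range(len(window) - new, len(window)))
        scored += new
    return total_nll / scored / math.log(2)  # nats -> bits (bpb when tokens are bytes)

# Sanity check: a uniform model over 2 symbols scores exactly 1 bit per token.
uniform_nll = lambda window, i: math.log(2)
bits = sliding_window_bits(list(range(100)), uniform_nll, ctx=16, stride=4)  # -> 1.0
```

A smaller stride gives every token more context at the cost of more forward passes, which is the trade-off behind the stride-64 vs stride-256 crossover in finding 1.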

Test plan

  • Artifact under 16,000,000 bytes (15,175,136)
  • Training completes within 600s (599.98s)
  • Eval completes within 600s (80s)
  • Training log included
  • Additional seeds for statistical validation (pending compute credits)

Built with Claude Code

Independent int6 implementation with 3x MLP expansion, FP16 embed,
WD20k, sliding window eval. 21.8M params in 15.2MB artifact.
Accompanied by 9 controlled ablations with 6 negative findings.
@mrdavtan
Author

Update: 5-seed statistical validation added

| Seed | val_bpb |
|---|---|
| 31337 | 1.1703 |
| 1337 | 1.1708 |
| 2024 | 1.1712 |
| 42 | 1.1732 |
| 7 | 1.1767 |
| **Mean** | 1.1724 |
| **Std** | 0.0026 |

Gap vs baseline: 0.036 nats (threshold: 0.005) | t-stat: 44.2 | p < 0.01

All 5 runs on 8×H100 SXM (RunPod Parameter Golf template), PyTorch 2.9.1+cu128, same config, only seed varied. README and submission.json updated with full results.
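The mean and spread of the five seed runs can be re-derived from the table above with the stdlib `statistics` module (a verification sketch, not the PR's code):

```python
import math
from statistics import mean, stdev

# Seed -> val_bpb, from the 5-seed table above.
seed_bpb = {31337: 1.1703, 1337: 1.1708, 2024: 1.1712, 42: 1.1732, 7: 1.1767}

vals = list(seed_bpb.values())
m = mean(vals)                 # ~1.1724
s = stdev(vals)                # sample standard deviation, ~0.0027
se = s / math.sqrt(len(vals))  # standard error of the mean
# A one-sample t-statistic against any baseline mean b would be (b - m) / se.
```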
