
Submission: Wider MLP 3x + int6 quant + sliding window eval, val_bpb=1.1659#70

Open
jfprincz wants to merge 1 commit into openai:main from jfprincz:submission-jfprincz-1.1666

Conversation


@jfprincz jfprincz commented Mar 19, 2026

Submission: Wider MLP 3x + int6 Quantization + Sliding Window Eval

val_bpb: 1.1659 | Total size: 14,855,508 bytes (under 16MB)

Three orthogonal improvements over the naive baseline:

  1. Wider MLP (MLP_MULT=3.0) - expansion raised from 2x to 3x (hidden=1536), ~0.019 BPB improvement
  2. int6 per-row quantization on MLP + attention weights - saves ~4MB of artifact space for only +0.010 BPB degradation; artifact compressed with zstd level 22
  3. Sliding window eval (stride=256) - overlapping windows scored with batched forward_logits, ~0.033 BPB improvement at zero artifact cost
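As a rough illustration of the quantization step, a symmetric per-row int6 scheme can be sketched as below. The helper names are hypothetical; the submission's actual exporter may pack and store weights differently.

```python
import numpy as np

def quantize_int6_per_row(w: np.ndarray):
    """Symmetric per-row int6 quantization: each row gets its own scale.

    Values map to signed integers in [-31, 31] (the symmetric 6-bit
    range), so each row's max-magnitude weight determines its scale.
    """
    max_abs = np.abs(w).max(axis=1, keepdims=True)
    scale = max_abs / 31.0
    scale[scale == 0] = 1.0  # avoid divide-by-zero on all-zero rows
    q = np.clip(np.round(w / scale), -31, 31).astype(np.int8)
    return q, scale.astype(np.float16)

def dequantize_int6_per_row(q: np.ndarray, scale: np.ndarray):
    """Reconstruct approximate fp32 weights from int6 codes + scales."""
    return q.astype(np.float32) * scale.astype(np.float32)
```

Per-row scales keep the rounding error proportional to each row's own magnitude, which is what bounds the quality loss to a small BPB degradation rather than a blow-up on outlier rows.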

Run command

RUN_ID=official_v1_reach MAX_WALLCLOCK_SECONDS=600 VAL_LOSS_EVERY=0 TRAIN_LOG_EVERY=200 MATRIX_LR=0.020 SCALAR_LR=0.020 TIED_EMBED_LR=0.030 MUON_MOMENTUM=0.99 MUON_MOMENTUM_WARMUP_STEPS=1500 MUON_MOMENTUM_WARMUP_START=0.92 WARMDOWN_ITERS=3000 torchrun --standalone --nproc_per_node=8 train_gpt.py

Key metrics

Metric                 Value
Steps (10 min cap)     12,485
int6 sliding val_bpb   1.1659
Artifact size          14,855,508 bytes
Two seeds              1.16658, 1.16591 (submitted: 1338)

See README.md in the submission folder for full details.

@jfprincz jfprincz force-pushed the submission-jfprincz-1.1666 branch from 651ec21 to 2790ac0 on March 19, 2026 at 08:57
@jfprincz jfprincz changed the title from "Submission: Wider MLP 3x + int6 quant + sliding window eval, val_bpb=1.1666" to "Submission: Wider MLP 3x + int6 quant + sliding window eval, val_bpb=1.1659" on Mar 19, 2026
keshav55 added a commit to keshav55/parameter-golf that referenced this pull request Mar 19, 2026
Sliding window eval gives ~0.03 BPB improvement for free (proven by 5+ competitors):
- stride=64 with seq_len=1024 → every token scored with 960+ context
- forward_per_token_loss() method for per-token scoring
- Only counts last `stride` positions per window (full context)
- EVAL_STRIDE env var (0 = disable, default 64)

MLP 3x gives ~0.02 BPB (proven by jfprincz, PR openai#70):
- Hidden dim 1536 instead of 1024
- Needs INT6 middle layers to fit in 16MB (already implemented)
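To see why the wider MLP needs int6 to fit the 16MB cap, the extra parameter cost can be estimated. This assumes a plain two-matrix MLP at model dim 512 with no gating; the actual architecture may differ.

```python
def mlp_params(dim: int, mult: float) -> int:
    """Parameter count of a plain two-matrix MLP block:
    up-projection (dim x hidden) plus down-projection (hidden x dim)."""
    hidden = int(dim * mult)
    return 2 * dim * hidden

base = mlp_params(512, 2.0)  # hidden=1024 -> 1,048,576 params per block
wide = mlp_params(512, 3.0)  # hidden=1536 -> 1,572,864 params per block
extra_bytes_fp16 = (wide - base) * 2  # ~1 MiB more per block at fp16
```

At fp16 each block grows by about 1 MiB, which quickly exhausts the budget across layers; dropping the MLP matrices from 16 to 6 bits roughly recovers that growth.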

Updated INTEL.md with latest competitive landscape (28→70+ PRs).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
arjun-krishna1 added a commit to arjun-krishna1/parameter-golf that referenced this pull request Mar 19, 2026
…_bpb 1.1652

Stack five techniques from systematic PR analysis:
- MLP_MULT=3.0 (hidden=1536) for wider model capacity (from PR openai#70)
- int6 per-row quant on MLP+attn, fp16 tied embed passthrough (from PR openai#70)
- zstd-22 compression (from PR openai#70)
- TRAIN_SEQ_LEN=4096 for richer per-step training signal (from PR openai#65)
- Sliding window eval at stride=64 with compiled forward_logits

Mean val_bpb=1.16520 (std=0.00102, t=92.15, p<<0.001).
Three seeds: 1.16615, 1.16532, 1.16412.
Artifact: 15.6MB (under 16,000,000 byte cap).
Training: 9370 steps at 64ms/step on 8xH100 SXM.

Made-with: Cursor
manfromnowhere143 added a commit to manfromnowhere143/parameter-golf that referenced this pull request Mar 19, 2026
Every submission scoring <1.18 BPB uses these EXACT settings.
We were running defaults — now matching the winners:

  MUON_MOMENTUM:       0.95 → 0.99 (stronger smoothing)
  MATRIX_LR:           0.04 → 0.02 (halved, reduces quant gap)
  SCALAR_LR:           0.04 → 0.02 (halved)
  TIED_EMBED_LR:       0.05 → 0.03 (halved)
  WARMDOWN_ITERS:      1200 → 3000 (longer warmdown)
  MUON_WARMUP_START:   0.85 → 0.92 (higher start)
  MUON_WARMUP_STEPS:   500  → 1500 (3x longer warmup)
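The momentum warmup in that table reads as a ramp from 0.92 to 0.99 over 1500 steps. A minimal sketch, assuming a linear schedule (the actual train_gpt.py schedule may differ):

```python
def muon_momentum(step: int, warmup_steps: int = 1500,
                  start: float = 0.92, end: float = 0.99) -> float:
    """Linearly ramp Muon momentum from `start` to `end` over
    `warmup_steps` optimizer steps, then hold at `end`."""
    if step >= warmup_steps:
        return end
    return start + (end - start) * step / warmup_steps
```

Starting momentum lower and ramping it avoids over-smoothed updates early in training, when gradients change direction quickly.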

These settings are proven by PR openai#64 (1.0149), openai#66 (1.1652),
openai#70 (1.1659), openai#65 (1.1808) — all top submissions.

Applied to both v5 and v6. Both compile, 1498 lines each.
South-33 added a commit to South-33/parameter-golf that referenced this pull request Mar 19, 2026
- add a PR-audit research log entry covering the clean takeaways from pull requests openai#36 through openai#70
- promote long-context training plus matching long-context eval as a first-class clean branch based on PR openai#61 and PR openai#63
- refine mixed-precision export notes to emphasize using int6/int8 byte savings to fund wider MLP capacity, based on PR openai#65
- update the current snapshot and research thesis so future agents do not over-focus on exporter-only ideas after the broader PR sweep
xskuy pushed a commit to xskuy/parameter-golf that referenced this pull request Mar 19, 2026
Major improvements based on competition intelligence (day 2 PRs):

1. Sliding window eval (stride=256): overlapping windows give each token
   more context. Free ~0.03 bpb improvement, zero artifact cost.
   Based on PRs openai#70, openai#77, openai#65.

2. Int6 quantization: configurable WEIGHT_QUANT_BITS (default 6) and
   EMBED_QUANT_BITS (default 8). Saves ~25% artifact space vs int8,
   allowing bigger models. Based on PRs openai#78, openai#70.

3. MLP 3x expansion: MLP_MULT_NUM=3 (up from 8/3). Wider MLP gives
   ~0.019 bpb improvement. Based on PRs openai#70, openai#66.

4. Default dim=512 with LR=0.03 (best config from experiments).

5. forward_logits() helper for sliding window (avoids model.forward
   which returns loss, not logits).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
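The distinction in item 5 matters because per-token scoring needs raw logits rather than a pre-reduced scalar loss. A minimal numpy sketch of turning logits into per-position losses (illustrative only; the repo's actual forward_logits() output shape is assumed):

```python
import numpy as np

def per_token_loss_from_logits(logits: np.ndarray, targets: np.ndarray):
    """Cross-entropy in nats for each position, given next-token logits
    of shape (seq, vocab) and integer targets of shape (seq,)."""
    # Numerically stable log-softmax: subtract the per-row max first.
    z = logits - logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets]
```

With per-position losses in hand, the sliding-window evaluator can keep only the well-contexted tail of each window instead of averaging over everything.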