Draft: SOTA+ TTT + RoPE50K + EMA + Curriculum (pending H100 run) by 0xjaishy · Pull Request #223 · openai/parameter-golf

0xjaishy · 2026-03-20T15:36:03Z

SOTA+ submission: PR #198 base + 4 untried improvements

Target: sub-1.13 BPB (pending 8xH100 run)

Base: PR #198 Stack (current #1 at 1.1326 BPB)

11L, 512d, MLP 3x, SmearGate + BigramHash + OrthoInit
Mixed int6/int8 quantization + zstd-22
WD=0.04, Muon (momentum 0.99), sliding window eval (s64)
FA3 with PyTorch SDPA fallback

New techniques (none tried on the #198 stack before)

RoPE base 50K — smoother position interpolation at seq2048 (free, ~-0.002)
LAWA-EMA — exponential moving average (decay=0.995) replaces periodic SWA (~-0.002)
Context-length curriculum — seq1024 for first 60% of wallclock (60% more steps), then seq2048 (~-0.003)
Full-model SGD TTT — 1 epoch SGD (lr=3e-4) on val data before scoring (~-0.001 to -0.033)

Architecture

26.8M params, ~15.7MB artifact
All hyperparameters baked in — just torchrun --standalone --nproc_per_node=8 train_gpt.py

Expected outcome

Scenario	BPB	Note
Conservative	~1.125	TTT gain ~0.001 (overlaps SmearGate)
Moderate	~1.116	TTT gain ~0.010
Aggressive	<1.10	TTT gain ~0.033 (full effect)

Status

Local CPU smoke test (syntax, forward pass, quant roundtrip)
8xH100 SXM training run
3-seed verification

… validation script - records/track_10min_16mb/2026-03-20_AllInOne_SmearGate_Int6QAT_SlidingWindow/ - scripts/validate_submission.py (CPU checks, no CUDA) - docs/WITHOUT_GRANT.md, docs/GRANT_APPLICATION.md Made-with: Cursor

Made-with: Cursor

…nimal) Made-with: Cursor

Rebuild from the proven openai#1 submission (PR openai#198, 1.1326 BPB) and stack four untried improvements: - RoPE base 50K (smoother position interpolation at seq2048) - LAWA-EMA replacing periodic SWA (continuous exponential moving average) - Context-length curriculum (seq1024 early for 60% more steps, seq2048 late) - Full-model SGD test-time training (1 epoch, lr=3e-4, on val data) Architecture: 11L 512d MLP3x SmearGate BigramHash OrthoInit WD=0.04 Artifact: ~15.7MB (int6+zstd-22), 26.8M params, FA3 with SDPA fallback Pending 8xH100 run. Target: sub-1.13 BPB. Made-with: Cursor

Single map of GitHub vs Mac workspace; scripts are not part of the CUDA submission artifact but back up local workflow. Made-with: Cursor

…ission - Document one clone only (parameter-golf-fork); data/.venv stay local gitignored - README: sample_fineweb_tokens, Mac submission notes, prep checklist - HANDOFF: remove duplicate Desktop workspace; point to this repo only Made-with: Cursor

…DOFF validate cmd Made-with: Cursor

shivashish jaishy added 4 commits March 21, 2026 00:33

docs: link draft PR openai#223 in WITHOUT_GRANT

75fe166

Made-with: Cursor

chore: drop grant/validator extras from submission branch (keep PR mi…

90f5575

…nimal) Made-with: Cursor

0xjaishy changed the title ~~Draft: AllInOne SmearGate + Int6 QAT + Sliding Window (pending H100 run)~~ Draft: SOTA+ TTT + RoPE50K + EMA + Curriculum (pending H100 run) Mar 20, 2026

shivashish jaishy added 3 commits March 21, 2026 01:53

docs: add HANDOFF.md + local helper scripts (smoke test, shard sampling)

42b1be1

Single map of GitHub vs Mac workspace; scripts are not part of the CUDA submission artifact but back up local workflow. Made-with: Cursor

fix(scripts): validate SOTA train_gpt (mixed int6, default path); HAN…

4d22dc5

…DOFF validate cmd Made-with: Cursor

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Draft: SOTA+ TTT + RoPE50K + EMA + Curriculum (pending H100 run)#223

Draft: SOTA+ TTT + RoPE50K + EMA + Curriculum (pending H100 run)#223
0xjaishy wants to merge 7 commits intoopenai:mainfrom
0xjaishy:submission/allinone-smeargate-int6qat-slidingwindow

0xjaishy commented Mar 20, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

0xjaishy commented Mar 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

SOTA+ submission: PR #198 base + 4 untried improvements

Base: PR #198 Stack (current #1 at 1.1326 BPB)

New techniques (none tried on the #198 stack before)

Architecture

Expected outcome

Status

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

0xjaishy commented Mar 20, 2026 •

edited

Loading