
Seq2048 + FP16 Tied Embedding + Tuned LR (val_bpb 1.2067)#63

Open
yahya010 wants to merge 1 commit into openai:main from yahya010:submission/seq2048-fp16emb

Conversation


@yahya010 yahya010 commented Mar 19, 2026

Summary

Five changes stacked on the Naive Baseline, achieving mean val_bpb 1.2067 (3 seeds):

  • Sequence length 2048 (up from 1024): longer context per step outweighs the reduced step count under the wallclock cap
  • FP16 tied embedding passthrough: eliminates the ~0.007 BPB quantization gap (the dual-duty embed/unembed matrix is disproportionately sensitive to int8)
  • MLP hidden size 960 (down from 1024): trimmed so the fp16 embedding fits under the 16 MB artifact cap
  • Lower learning rates: MATRIX_LR=0.032, SCALAR_LR=0.032, TIED_EMBED_LR=0.04
  • Warmdown 3600 steps (up from 1200): ensures the LR decay actually completes before the wallclock cap ends training
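The warmdown change can be sketched as a stable-then-decay schedule. The function below is a minimal illustration, not the repo's training script: the constant-then-linear shape and the per-run step count are assumptions inferred from the numbers above.

```python
def lr_at(step, total_steps, base_lr=0.032, warmdown_steps=3600):
    """Constant LR, then linear warmdown to 0 over the final steps.

    Sketch of the schedule implied by the PR: with ~10,400 steps per run,
    warmdown_steps=3600 guarantees the decay phase actually runs before
    the wallclock cap ends training, whereas a 1200-step warmdown could
    be skipped entirely if training halts early.
    """
    decay_start = total_steps - warmdown_steps
    if step < decay_start:
        return base_lr
    # Linear decay from base_lr down to 0 across the warmdown window.
    frac = (total_steps - step) / warmdown_steps
    return base_lr * frac
```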

Results

| Seed | Steps | val_loss | val_bpb | Artifact size (bytes) |
| --- | --- | --- | --- | --- |
| 1337 | 10,408 | 2.0370 | 1.2064 | 15,632,845 |
| 42 | 10,403 | 2.0383 | 1.2072 | 15,635,682 |
| 3 | 10,375 | 2.0370 | 1.2064 | 15,633,777 |

Mean val_bpb: 1.2067 (std: 0.00044)
Improvement over baseline: 0.0353 nats (significance threshold: 0.005 nats), t = -70.69, p << 0.01
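The reported mean can be reproduced directly from the per-seed table. A minimal check (note: from the rounded table values the sample standard deviation comes out to ~0.00046; the reported 0.00044 presumably reflects the unrounded per-seed numbers):

```python
import statistics

# Per-seed val_bpb for seeds 1337, 42, 3 (from the results table).
val_bpb = [1.2064, 1.2072, 1.2064]

mean = statistics.mean(val_bpb)
stdev = statistics.stdev(val_bpb)  # sample standard deviation

print(round(mean, 4))  # 1.2067, matching the reported mean
print(stdev)           # ~0.00046 from rounded table values
```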

Hardware: 8xH100 80GB HBM3, PyTorch 2.8.0+cu128, ~57.65ms/step.
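For reference, val_bpb relates to val_loss by converting nats per token to bits, then dividing by the tokenizer's bytes-per-token ratio on the validation set. The ratio used below (~2.44 bytes per token) is back-solved from the reported numbers, not taken from the repo:

```python
import math

val_loss_nats = 2.0370    # per-token loss, seed 1337 (from the table)
bytes_per_token = 2.436   # assumption: implied by the reported val_bpb

# bits/token = nats/token / ln 2; bits/byte = bits/token / (bytes/token)
val_bpb = val_loss_nats / math.log(2) / bytes_per_token
print(round(val_bpb, 4))  # 1.2064, matching the table
```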

Test plan

  • 3 seeds on 8xH100, all under 600s wallclock
  • All artifacts under 16MB cap
  • Statistical significance p << 0.01
  • Post-quant roundtrip validation matches


Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
South-33 added a commit to South-33/parameter-golf that referenced this pull request Mar 19, 2026
- add a PR-audit research log entry covering the clean takeaways from pull requests openai#36 through openai#70
- promote long-context training plus matching long-context eval as a first-class clean branch based on PR openai#61 and PR openai#63
- refine mixed-precision export notes to emphasize using int6/int8 byte savings to fund wider MLP capacity, based on PR openai#65
- update the current snapshot and research thesis so future agents do not over-focus on exporter-only ideas after the broader PR sweep
