
Seq2048 + FP16 Tied Embedding + Tuned LR (val_bpb 1.2067)#63

Open
yahya010 wants to merge 1 commit into openai:main from yahya010:submission/seq2048-fp16emb

Conversation


@yahya010 yahya010 commented Mar 19, 2026

Summary

Five changes stacked on the Naive Baseline, achieving mean val_bpb 1.2067 (3 seeds):

  • Sequence length 2048 (up from 1024): longer context per step outweighs the reduced step count under the wallclock cap
  • FP16 tied embedding passthrough: eliminates the ~0.007 BPB quantization gap (the dual-duty embed/unembed matrix is disproportionately sensitive to int8)
  • MLP hidden size 960 (down from 1024): trimmed so the fp16 embedding fits under the 16 MB artifact cap
  • Lower learning rates: MATRIX_LR=0.032, SCALAR_LR=0.032, TIED_EMBED_LR=0.04
  • Warmdown 3600 steps (up from 1200): ensures the LR decay actually completes before the wallclock cap ends training
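The warmdown change can be sketched as a stable-then-decay schedule. The function below is a minimal illustration, not the repo's training script: the constant-then-linear shape and the per-run step count are assumptions inferred from the numbers above.

```python
def lr_at(step, total_steps, base_lr=0.032, warmdown_steps=3600):
    """Constant LR, then linear warmdown to 0 over the final steps.

    Sketch of the schedule implied by the PR: with ~10,400 steps per run,
    warmdown_steps=3600 guarantees the decay phase actually runs before
    the wallclock cap ends training, whereas a 1200-step warmdown could
    be skipped entirely if training halts early.
    """
    decay_start = total_steps - warmdown_steps
    if step < decay_start:
        return base_lr
    # Linear decay from base_lr down to 0 across the warmdown window.
    frac = (total_steps - step) / warmdown_steps
    return base_lr * frac
```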

Results

| Seed | Steps | val_loss | val_bpb | Artifact size (bytes) |
| --- | --- | --- | --- | --- |
| 1337 | 10,408 | 2.0370 | 1.2064 | 15,632,845 |
| 42 | 10,403 | 2.0383 | 1.2072 | 15,635,682 |
| 3 | 10,375 | 2.0370 | 1.2064 | 15,633,777 |

Mean val_bpb: 1.2067 (std: 0.00044)
Improvement over baseline: 0.0353 nats (significance threshold: 0.005 nats), t = -70.69, p << 0.01
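The reported mean can be reproduced directly from the per-seed table. A minimal check (note: from the rounded table values the sample standard deviation comes out to ~0.00046; the reported 0.00044 presumably reflects the unrounded per-seed numbers):

```python
import statistics

# Per-seed val_bpb for seeds 1337, 42, 3 (from the results table).
val_bpb = [1.2064, 1.2072, 1.2064]

mean = statistics.mean(val_bpb)
stdev = statistics.stdev(val_bpb)  # sample standard deviation

print(round(mean, 4))  # 1.2067, matching the reported mean
print(stdev)           # ~0.00046 from rounded table values
```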

Hardware: 8xH100 80GB HBM3, PyTorch 2.8.0+cu128, ~57.65ms/step.
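For reference, val_bpb relates to val_loss by converting nats per token to bits, then dividing by the tokenizer's bytes-per-token ratio on the validation set. The ratio used below (~2.44 bytes per token) is back-solved from the reported numbers, not taken from the repo:

```python
import math

val_loss_nats = 2.0370    # per-token loss, seed 1337 (from the table)
bytes_per_token = 2.436   # assumption: implied by the reported val_bpb

# bits/token = nats/token / ln 2; bits/byte = bits/token / (bytes/token)
val_bpb = val_loss_nats / math.log(2) / bytes_per_token
print(round(val_bpb, 4))  # 1.2064, matching the table
```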

Test plan

  • 3 seeds on 8xH100, all under 600s wallclock
  • All artifacts under 16MB cap
  • Statistical significance p << 0.01
  • Post-quant roundtrip validation matches


Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
South-33 added a commit to South-33/parameter-golf that referenced this pull request Mar 19, 2026
- add a PR-audit research log entry covering the clean takeaways from pull requests openai#36 through openai#70
- promote long-context training plus matching long-context eval as a first-class clean branch based on PR openai#61 and PR openai#63
- refine mixed-precision export notes to emphasize using int6/int8 byte savings to fund wider MLP capacity, based on PR openai#65
- update the current snapshot and research thesis so future agents do not over-focus on exporter-only ideas after the broader PR sweep
