Submission: Wider MLP 3x + int6 quant + sliding window eval, val_bpb=1.1659 #70
Open
jfprincz wants to merge 1 commit into openai:main from
Conversation
651ec21 to 2790ac0
keshav55 added a commit to keshav55/parameter-golf that referenced this pull request on Mar 19, 2026
Sliding window eval gives ~0.03 BPB free (proven by 5+ competitors):
- stride=64 with seq_len=1024 → every token scored with 960+ context
- forward_per_token_loss() method for per-token scoring
- Only counts last `stride` positions per window (full context)
- EVAL_STRIDE env var (0 = disable, default 64)

MLP 3x gives ~0.02 BPB (proven by jfprincz, PR openai#70):
- Hidden dim 1536 instead of 1024
- Needs INT6 middle layers to fit in 16MB (already implemented)

Updated INTEL.md with latest competitive landscape (28→70+ PRs).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
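The sliding-window scheme above can be sketched as follows. This is a minimal illustration, not the PR's code: `per_token_loss` stands in for the commit's `forward_per_token_loss()` and is assumed to return one loss in nats per window position; the helper names are hypothetical.

```python
import math

def sliding_window_losses(per_token_loss, tokens, seq_len=1024, stride=64):
    """Score every token exactly once, with near-full left context.

    After the first window, each window slides forward by `stride` and
    only its last `stride` positions are counted, so every scored token
    sees at least seq_len - stride (= 960 here) tokens of prior context.
    """
    n = len(tokens)
    # First window: count all positions (no earlier context exists).
    losses = list(per_token_loss(tokens[:min(seq_len, n)]))
    end = min(seq_len, n)
    while end < n:
        new_end = min(end + stride, n)
        window = tokens[new_end - seq_len:new_end]
        # Only the newly covered positions are scored in this window.
        losses.extend(list(per_token_loss(window))[-(new_end - end):])
        end = new_end
    return losses

def bits_per_byte(losses):
    # bpb = mean cross-entropy in nats / ln 2, assuming byte-level tokens.
    return sum(losses) / len(losses) / math.log(2)
```

With stride=64 this runs seq_len/stride = 16x more forward passes than non-overlapping eval, which is why it costs wall-clock time but zero artifact bytes.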
arjun-krishna1 added a commit to arjun-krishna1/parameter-golf that referenced this pull request on Mar 19, 2026
…_bpb 1.1652

Stack five techniques from systematic PR analysis:
- MLP_MULT=3.0 (hidden=1536) for wider model capacity (from PR openai#70)
- int6 per-row quant on MLP+attn, fp16 tied embed passthrough (from PR openai#70)
- zstd-22 compression (from PR openai#70)
- TRAIN_SEQ_LEN=4096 for richer per-step training signal (from PR openai#65)
- Sliding window eval at stride=64 with compiled forward_logits

Mean val_bpb=1.16520 (std=0.00102, t=92.15, p<<0.001). Three seeds: 1.16615, 1.16532, 1.16412.
Artifact: 15.6MB (under 16,000,000 byte cap). Training: 9370 steps at 64ms/step on 8xH100 SXM.

Made-with: Cursor
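The int6 per-row quantization named in the list above can be sketched like this. Only "per-row int6 with fp16 passthrough for the tied embedding" comes from the commit; the symmetric [-31, 31] level range, the rounding mode, and storing the scale as fp16 are assumptions.

```python
import numpy as np

def quant_int6_per_row(w):
    """Symmetric per-row int6 quantization sketch (codes in [-31, 31]).

    One scale per row, chosen so the row's absolute max maps to 31.
    Codes are stored in int8 here; a real exporter would bit-pack
    four 6-bit codes into three bytes before zstd compression.
    """
    scale = np.abs(w).max(axis=1, keepdims=True) / 31.0
    scale = np.where(scale == 0, 1.0, scale)  # avoid divide-by-zero on all-zero rows
    q = np.clip(np.round(w / scale), -31, 31).astype(np.int8)
    return q, scale.astype(np.float16)

def dequant_int6_per_row(q, scale):
    return q.astype(np.float32) * scale.astype(np.float32)
```

Per-row scales keep the worst-case rounding error at half a quantization step per row, which is why rows with small weights quantize much more tightly than a single global scale would allow.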
manfromnowhere143 added a commit to manfromnowhere143/parameter-golf that referenced this pull request on Mar 19, 2026
Every submission scoring <1.18 BPB uses these EXACT settings. We were running defaults; now matching the winners:

MUON_MOMENTUM: 0.95 → 0.99 (stronger smoothing)
MATRIX_LR: 0.04 → 0.02 (halved, reduces quant gap)
SCALAR_LR: 0.04 → 0.02 (halved)
TIED_EMBED_LR: 0.05 → 0.03 (halved)
WARMDOWN_ITERS: 1200 → 3000 (longer warmdown)
MUON_WARMUP_START: 0.85 → 0.92 (higher start)
MUON_WARMUP_STEPS: 500 → 1500 (3x longer warmup)

These settings are proven by PR openai#64 (1.0149), openai#66 (1.1652), openai#70 (1.1659), openai#65 (1.1808), all top submissions. Applied to both v5 and v6. Both compile, 1498 lines each.
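As a config fragment, the overrides above would look like the following. The variable names are taken from the commit message; that the training script reads them from the environment is an assumption.

```shell
# Hypothesized env-var overrides matching the settings listed above.
export MUON_MOMENTUM=0.99      # was 0.95: stronger gradient smoothing
export MATRIX_LR=0.02          # was 0.04: halved, reduces quant gap
export SCALAR_LR=0.02          # was 0.04: halved
export TIED_EMBED_LR=0.03      # was 0.05: halved
export WARMDOWN_ITERS=3000     # was 1200: longer warmdown
export MUON_WARMUP_START=0.92  # was 0.85: higher start
export MUON_WARMUP_STEPS=1500  # was 500: 3x longer warmup
```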
South-33 added a commit to South-33/parameter-golf that referenced this pull request on Mar 19, 2026
- add a PR-audit research log entry covering the clean takeaways from pull requests openai#36 through openai#70
- promote long-context training plus matching long-context eval as a first-class clean branch, based on PR openai#61 and PR openai#63
- refine mixed-precision export notes to emphasize using int6/int8 byte savings to fund wider MLP capacity, based on PR openai#65
- update the current snapshot and research thesis so future agents do not over-focus on exporter-only ideas after the broader PR sweep
xskuy pushed a commit to xskuy/parameter-golf that referenced this pull request on Mar 19, 2026
Major improvements based on competition intelligence (day 2 PRs):

1. Sliding window eval (stride=256): overlapping windows give each token more context. Free ~0.03 bpb improvement, zero artifact cost. Based on PRs openai#70, openai#77, openai#65.
2. Int6 quantization: configurable WEIGHT_QUANT_BITS (default 6) and EMBED_QUANT_BITS (default 8). Saves ~25% artifact space vs int8, allowing bigger models. Based on PRs openai#78, openai#70.
3. MLP 3x expansion: MLP_MULT_NUM=3 (up from 8/3). Wider MLP gives ~0.019 bpb improvement. Based on PRs openai#70, openai#66.
4. Default dim=512 with LR=0.03 (best config from experiments).
5. forward_logits() helper for sliding window (avoids model.forward, which returns loss, not logits).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
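The "~25% artifact space vs int8" claim in the list above follows from simple bit arithmetic when codes are tightly bit-packed. The weight count below is illustrative, not the actual model size from any PR.

```python
def packed_bytes(n_values, bits):
    """Bytes needed to bit-pack n_values codes of the given bit width."""
    return (n_values * bits + 7) // 8

n = 12_000_000  # illustrative weight count, not a real model's
int8_size = packed_bytes(n, 8)
int6_size = packed_bytes(n, 6)
savings = 1 - int6_size / int8_size
print(f"int8: {int8_size:,} B, int6: {int6_size:,} B, saved {savings:.0%}")
```

The saving is exactly 6/8 = 75% of the int8 footprint before compression; zstd on top changes the absolute numbers but not the relative ordering.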
This was referenced Mar 20, 2026
Submission: Wider MLP 3x + int6 Quantization + Sliding Window Eval
val_bpb: 1.1659 | Total size: 14,855,508 bytes (under 16MB)
Three orthogonal improvements over the naive baseline:
1. Wider MLP: 3x expansion
2. int6 quantization
3. Sliding window eval
Run command
Key metrics
See README.md in the submission folder for full details.