Add Deep14x416 KV2 non-record MLX submission (val_bpb=1.8440) #56
cschubiner wants to merge 4 commits into openai:main from
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c7ab65cd40
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
records/track_non_record_16mb/2026-03-19_Deep14x416_KV2_SP1024_MLX_750it/README.md
Outdated
@codex review
💡 Codex Review
Reviewed commit: f9ce10b899
records/track_non_record_16mb/2026-03-19_Deep14x416_KV2_SP1024_MLX_750it/README.md
Outdated
@codex review
💡 Codex Review
Reviewed commit: 91592141cf
records/track_non_record_16mb/2026-03-19_Deep14x416_KV2_SP1024_MLX_750it/README.md
Outdated
Addressed the remaining reproducibility gap in e13e8db. The record now includes
This PR adds a non-record unlimited-compute submission under records/track_non_record_16mb/. The user-facing effect is a new reproducible Apple Silicon MLX result in the repository: a deeper, narrower SP-1024 model with 14 layers at width 416 and 2 KV heads, trained locally on an Apple M5 Max for 750 steps against a 10-shard FineWeb subset. The final post-quantization roundtrip metric recorded in the included log is val_bpb=1.84404368, with an int8+zlib model payload of 12,339,367 bytes and a total submission size of 12,388,989 bytes.

The underlying motivation was to explore a simple parameter-budget trade: reduce width slightly, add depth, and use more aggressive KV sharing while staying well under the 16 MB artifact limit. This submission keeps the trainer straightforward by reusing the repository's train_gpt_mlx.py snapshot exactly and changing the runtime configuration only through environment variables. To make full validation tractable on local Apple Silicon hardware, the run also uses a larger validation batch and logit chunking; these settings affect execution efficiency, not the metric definition itself.

This PR does not fix a bug in the repo; it fills a gap in the records folder: there was no local Apple Silicon submission documenting this deeper, narrower 14x416 KV2 configuration and its measured result. The change is therefore additive only: a new record folder containing the copied training script, the exact train log, a README with command/config details, and submission.json metadata.

Validation for this PR was done by running the training job to completion locally, then checking that the copied script compiles with python -m py_compile. The included train.log contains the full training trace, the pre-quantization validation result, the compressed model size, and the final final_int8_zlib_roundtrip_exact metric.
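The environment-variable override pattern described above can be sketched roughly as follows. The variable names (`N_LAYERS`, `WIDTH`, ...) and the config fields are hypothetical illustrations, since the PR description does not list the actual variables train_gpt_mlx.py reads; only the default values (14 layers, width 416, 2 KV heads, 750 steps) come from the PR.

```python
import os
from dataclasses import dataclass


@dataclass
class RunConfig:
    # Defaults mirror the submission in the PR text; field and
    # env-var names below are hypothetical illustrations.
    n_layers: int = 14
    width: int = 416
    n_kv_heads: int = 2
    steps: int = 750


def config_from_env(defaults: RunConfig = RunConfig()) -> RunConfig:
    """Override defaults from environment variables, leaving the
    training script itself unchanged."""
    def geti(name: str, fallback: int) -> int:
        return int(os.environ.get(name, fallback))

    return RunConfig(
        n_layers=geti("N_LAYERS", defaults.n_layers),
        width=geti("WIDTH", defaults.width),
        n_kv_heads=geti("N_KV_HEADS", defaults.n_kv_heads),
        steps=geti("STEPS", defaults.steps),
    )


cfg = config_from_env()
print(cfg.n_layers, cfg.width, cfg.n_kv_heads, cfg.steps)  # → 14 416 2 750
```

Keeping every run-specific knob in the environment means the copied training script stays byte-identical to the repository snapshot, which is what makes the "reuses train_gpt_mlx.py exactly" claim checkable.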
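For reference, bits-per-byte is commonly derived from the mean next-token cross-entropy; whether train_gpt_mlx.py computes val_bpb exactly this way is an assumption, and the token and byte counts below are purely illustrative (roughly 4.3 bytes per token, with a loss chosen so the result lands near the reported 1.844).

```python
import math


def bits_per_byte(mean_loss_nats: float, n_tokens: int, n_bytes: int) -> float:
    """Convert mean cross-entropy (nats per token) over n_tokens of
    validation text into bits per byte of the raw text."""
    total_bits = mean_loss_nats * n_tokens / math.log(2)  # nats → bits
    return total_bits / n_bytes


# Illustrative numbers only; not taken from the submission's log.
print(round(bits_per_byte(5.5, 1_000_000, 4_300_000), 3))  # ≈ 1.845
```

Because the denominator is raw bytes rather than tokens, the metric is comparable across tokenizers, which is why records report bpb instead of per-token loss.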
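The 16 MB artifact check can be sketched as per-tensor int8 quantization followed by a single zlib pass over the concatenated payload. The symmetric per-tensor scaling used here is an assumption about the scheme, not necessarily what the submission pipeline does, and the random tensors stand in for real model weights.

```python
import zlib

import numpy as np

LIMIT_BYTES = 16 * 1024 * 1024  # 16 MB artifact limit


def int8_zlib_size(tensors) -> int:
    """Quantize each tensor to int8 with a symmetric per-tensor scale,
    concatenate the raw bytes, and return the zlib-compressed size."""
    payload = bytearray()
    for w in tensors:
        scale = max(float(np.abs(w).max()), 1e-8) / 127.0
        q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
        payload += q.tobytes()
    return len(zlib.compress(bytes(payload), level=9))


# Stand-in weights: 14 square blocks at width 416, echoing the model shape.
rng = np.random.default_rng(0)
tensors = [rng.normal(size=(416, 416)).astype(np.float32) for _ in range(14)]
size = int8_zlib_size(tensors)
print(size, size < LIMIT_BYTES)
```

Measuring the compressed payload (rather than the in-memory parameter count) is what the 16 MB limit actually constrains, which is why the record reports the int8+zlib size alongside val_bpb.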