
Add mirror recurrence non-record submission #57

Closed
cschubiner wants to merge 1 commit into openai:main from cschubiner:codex/parameter-golf-mlx-local-submission

Conversation

@cschubiner

Summary

  • Add a non-record Apple Silicon MLX submission exploring mirrored depth recurrence.
  • Reuse 9 unique transformer blocks across 18 logical layers while keeping the int8+zlib artifact under the 16 MB cap.
  • Include the completed train log, submission metadata, and the serialization fix needed to exclude non-tensor schedule state from export (sketched below).
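A minimal sketch of what such a serialization fix could look like, assuming MLX's tree_flatten and mx.save_safetensors APIs; export_weights is a hypothetical helper for illustration, not the submission's actual code:

import mlx.core as mx
from mlx.utils import tree_flatten

def export_weights(model, path: str) -> None:
    # Flatten the module tree into (dotted_name, value) pairs; depending on
    # how state is stored, this can include plain-Python schedule entries
    # alongside weight tensors.
    flat = tree_flatten(model.parameters())
    # Keep only real tensors; schedule lists are rebuilt from config on load.
    tensors = {name: value for name, value in flat if isinstance(value, mx.array)}
    mx.save_safetensors(path, tensors)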

Run details

  • hardware: Apple M5 Max
  • tokenizer/data: fineweb10B_sp1024, full validation split, 1 training shard
  • final int8+zlib roundtrip: val_loss 3.77618886, val_bpb 2.23647175 (see the sketch after this list)
  • bytes_total: 8040848 (compressed artifact size in bytes, under the 16 MB cap)
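An illustrative roundtrip in NumPy, assuming symmetric per-tensor int8 quantization and zlib level 9; the submission's actual packing may differ:

import zlib
import numpy as np

def int8_zlib_roundtrip(tensors):
    # Quantize each tensor to int8, count compressed bytes toward the size
    # cap, and return dequantized weights for re-evaluating validation loss.
    bytes_total = 0
    restored = {}
    for name, w in tensors.items():
        scale = max(float(np.abs(w).max()) / 127.0, 1e-12)
        q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
        bytes_total += len(zlib.compress(q.tobytes(), level=9))
        restored[name] = q.astype(np.float32) * scale
    return restored, bytes_total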

Notes

  • This is a non-record experiment, not a record-track claim.
  • The architectural idea is mirrored recurrent depth: the encoder uses blocks 0..8 and the decoder reuses them in reverse order, 8..0 (minimal sketch below).
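A minimal sketch of the schedule, using this run's shapes (9 unique blocks serving 18 logical layers); blocks is assumed to be a list of callable transformer blocks:

unique_layers = 9
encoder_schedule = list(range(unique_layers))        # blocks 0..8
decoder_schedule = list(reversed(encoder_schedule))  # blocks 8..0

def forward(x, blocks):
    # 18 logical layers from 9 sets of weights, applied palindromically;
    # each block's parameters are reused exactly twice.
    for idx in encoder_schedule + decoder_schedule:
        x = blocks[idx](x)
    return x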


@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6853b49028


Comment on lines +410 to +412
# Encoder cycles through the unique blocks in order: 0, 1, ..., unique_layers-1, 0, ...
self.encoder_schedule = [i % self.unique_layers for i in range(self.num_encoder_layers)]
# Decoder mirrors the first num_skip_weights encoder entries in reverse order.
mirrored_decoder = list(reversed(self.encoder_schedule[: self.num_skip_weights]))
# Any remaining decoder layers fall back to cyclic assignment.
trailing_decoder = [i % self.unique_layers for i in range(self.num_skip_weights, self.num_decoder_layers)]


P2: Reject UNIQUE_LAYERS values the mirror schedule cannot reach

If a caller sets UNIQUE_LAYERS above the encoder depth but below NUM_LAYERS, this schedule never visits the extra blocks. For example, with NUM_LAYERS=18, UNIQUE_LAYERS=10, encoder_schedule is 0..8 and decoder_schedule is its reverse, so block 9 is allocated, counted, and serialized but never used in GPT.__call__. That silently changes the requested architecture for any UNIQUE_LAYERS > num_layers // 2 (except the unique_layers == num_layers fallback), so this should either be validated away or the schedule should be built to actually consume all unique blocks.
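One way to implement the validation the review asks for, reusing the names from the snippet above; this is a sketch meant to slot in after the schedule is built, not the PR's code:

# Every allocated unique block must appear somewhere in the full schedule;
# otherwise the requested architecture is silently narrowed.
used = set(self.encoder_schedule) | set(mirrored_decoder) | set(trailing_decoder)
if len(used) != self.unique_layers:
    raise ValueError(
        f"UNIQUE_LAYERS={self.unique_layers} allocates transformer blocks "
        f"the mirror schedule never visits (visited: {sorted(used)})"
    )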


@cschubiner (Author)

Closing this one as superseded by #56, which is the stronger Apple Silicon MLX local submission on the same account (val_bpb=1.8440 vs 2.2365). I explored additional local recurrent and width-scaling variants from this workspace, but none beat #56.

cschubiner closed this on Mar 19, 2026