
Add mirror recurrence non-record submission #57

Closed
cschubiner wants to merge 1 commit into openai:main from cschubiner:codex/parameter-golf-mlx-local-submission

Conversation

@cschubiner

Summary

  • Add a non-record Apple Silicon MLX submission exploring mirrored depth recurrence.
  • Reuse 9 unique transformer blocks across 18 logical layers while keeping the int8+zlib artifact under the 16 MB cap.
  • Include the completed train log, submission metadata, and the serialization fix needed to exclude non-tensor schedule state from export (sketched below).
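A minimal sketch of what such a serialization fix could look like, assuming MLX's tree_flatten and mx.save_safetensors APIs; export_weights is a hypothetical helper for illustration, not the submission's actual code:

import mlx.core as mx
from mlx.utils import tree_flatten

def export_weights(model, path: str) -> None:
    # Flatten the module tree into (dotted_name, value) pairs; depending on
    # how state is stored, this can include plain-Python schedule entries
    # alongside weight tensors.
    flat = tree_flatten(model.parameters())
    # Keep only real tensors; schedule lists are rebuilt from config on load.
    tensors = {name: value for name, value in flat if isinstance(value, mx.array)}
    mx.save_safetensors(path, tensors)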

Run details

  • hardware: Apple M5 Max
  • tokenizer/data: fineweb10B_sp1024, full validation split, 1 training shard
  • final int8+zlib roundtrip: val_loss 3.77618886, val_bpb 2.23647175 (see the sketch after this list)
  • bytes_total: 8040848 (compressed artifact size in bytes, under the 16 MB cap)
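An illustrative roundtrip in NumPy, assuming symmetric per-tensor int8 quantization and zlib level 9; the submission's actual packing may differ:

import zlib
import numpy as np

def int8_zlib_roundtrip(tensors):
    # Quantize each tensor to int8, count compressed bytes toward the size
    # cap, and return dequantized weights for re-evaluating validation loss.
    bytes_total = 0
    restored = {}
    for name, w in tensors.items():
        scale = max(float(np.abs(w).max()) / 127.0, 1e-12)
        q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
        bytes_total += len(zlib.compress(q.tobytes(), level=9))
        restored[name] = q.astype(np.float32) * scale
    return restored, bytes_total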

Notes

  • This is a non-record experiment, not a record-track claim.
  • The architectural idea is mirrored recurrent depth: the encoder uses blocks 0..8 and the decoder reuses them in reverse order, 8..0 (minimal sketch below).
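A minimal sketch of the schedule, using this run's shapes (9 unique blocks serving 18 logical layers); blocks is assumed to be a list of callable transformer blocks:

unique_layers = 9
encoder_schedule = list(range(unique_layers))        # blocks 0..8
decoder_schedule = list(reversed(encoder_schedule))  # blocks 8..0

def forward(x, blocks):
    # 18 logical layers from 9 sets of weights, applied palindromically;
    # each block's parameters are reused exactly twice.
    for idx in encoder_schedule + decoder_schedule:
        x = blocks[idx](x)
    return x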


@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6853b49028


Comment on lines +410 to +412
# Encoder cycles through the unique blocks in order: 0, 1, ..., unique_layers-1, 0, ...
self.encoder_schedule = [i % self.unique_layers for i in range(self.num_encoder_layers)]
# Decoder mirrors the first num_skip_weights encoder entries in reverse order.
mirrored_decoder = list(reversed(self.encoder_schedule[: self.num_skip_weights]))
# Any remaining decoder layers fall back to cyclic assignment.
trailing_decoder = [i % self.unique_layers for i in range(self.num_skip_weights, self.num_decoder_layers)]


P2: Reject UNIQUE_LAYERS values the mirror schedule cannot reach

If a caller sets UNIQUE_LAYERS above the encoder depth but below NUM_LAYERS, this schedule never visits the extra blocks. For example, with NUM_LAYERS=18, UNIQUE_LAYERS=10, encoder_schedule is 0..8 and decoder_schedule is its reverse, so block 9 is allocated, counted, and serialized but never used in GPT.__call__. That silently changes the requested architecture for any UNIQUE_LAYERS > num_layers // 2 (except the unique_layers == num_layers fallback), so this should either be validated away or the schedule should be built to actually consume all unique blocks.
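One way to implement the validation the review asks for, reusing the names from the snippet above; this is a sketch meant to slot in after the schedule is built, not the PR's code:

# Every allocated unique block must appear somewhere in the full schedule;
# otherwise the requested architecture is silently narrowed.
used = set(self.encoder_schedule) | set(mirrored_decoder) | set(trailing_decoder)
if len(used) != self.unique_layers:
    raise ValueError(
        f"UNIQUE_LAYERS={self.unique_layers} allocates transformer blocks "
        f"the mirror schedule never visits (visited: {sorted(used)})"
    )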


@cschubiner (Author)

Closing this one as superseded by #56, which is the stronger Apple Silicon MLX local submission on the same account (val_bpb=1.8440 vs 2.2365). I explored additional local recurrent and width-scaling variants from this workspace, but none beat #56.

cschubiner closed this on Mar 19, 2026