Add mirror recurrence non-record submission #57
cschubiner wants to merge 1 commit into openai:main from
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6853b49028
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
```python
self.encoder_schedule = [i % self.unique_layers for i in range(self.num_encoder_layers)]
mirrored_decoder = list(reversed(self.encoder_schedule[: self.num_skip_weights]))
trailing_decoder = [i % self.unique_layers for i in range(self.num_skip_weights, self.num_decoder_layers)]
```
Reject UNIQUE_LAYERS values the mirror schedule cannot reach
If a caller sets UNIQUE_LAYERS above the encoder depth but below NUM_LAYERS, this schedule never visits the extra blocks. For example, with NUM_LAYERS=18 and UNIQUE_LAYERS=10, encoder_schedule covers indices 0..8 and the decoder schedule is its reverse, so block 9 is allocated, counted, and serialized but never used in GPT.__call__. That silently changes the requested architecture for any UNIQUE_LAYERS > num_layers // 2 (except the unique_layers == num_layers fallback). Either reject such values with validation, or build the schedule so it actually consumes all unique blocks.
Useful? React with 👍 / 👎.
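To make the reachability argument concrete, here is a minimal standalone sketch of the mirrored schedule and one possible validation guard. The function names and the even encoder/decoder split are assumptions for illustration; the real code lives on a module with `self.`-scoped attributes.

```python
def build_schedules(num_layers, unique_layers):
    # Assumption: encoder/decoder split num_layers roughly in half,
    # and the decoder mirrors the full encoder prefix.
    num_encoder_layers = num_layers // 2
    num_decoder_layers = num_layers - num_encoder_layers
    num_skip_weights = num_encoder_layers

    encoder_schedule = [i % unique_layers for i in range(num_encoder_layers)]
    mirrored_decoder = list(reversed(encoder_schedule[:num_skip_weights]))
    trailing_decoder = [i % unique_layers
                        for i in range(num_skip_weights, num_decoder_layers)]
    return encoder_schedule, mirrored_decoder + trailing_decoder

def validate_unique_layers(num_layers, unique_layers):
    # Hypothetical guard: reject configurations whose schedule cannot
    # reach every unique block.
    enc, dec = build_schedules(num_layers, unique_layers)
    used = set(enc) | set(dec)
    unreachable = set(range(unique_layers)) - used
    if unreachable:
        raise ValueError(
            f"UNIQUE_LAYERS={unique_layers} leaves blocks "
            f"{sorted(unreachable)} unscheduled"
        )

# Reproduces the comment's example: block 9 exists but is never scheduled.
enc, dec = build_schedules(num_layers=18, unique_layers=10)
print(sorted(set(enc) | set(dec)))  # → [0, 1, 2, 3, 4, 5, 6, 7, 8]
```

With this guard in place, the NUM_LAYERS=18, UNIQUE_LAYERS=10 configuration would fail fast at construction time instead of silently training a smaller architecture than requested.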