
RQZ-Golf v1: Depth recurrence for parameter efficiency#54

Open
TheCause wants to merge 1 commit into openai:main from TheCause:rqz-golf-v1

Conversation

@TheCause

Non-record experimental submission

Approach: Replace some unique layers with a single shared recurrent layer applied K times, saving parameters while increasing effective depth.

Architecture

  • 7 unique layers (encoder/decoder with U-Net skip connections)
  • 1 recurrent layer applied K=3 times with learned iteration embeddings
  • Effective depth: 10 layers (7 unique + 3 recurrent passes) vs 9 in the baseline
  • Residual scaling by 1/sqrt(K) for stability
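The forward pass described above can be sketched as follows. This is a minimal numpy illustration, not the actual implementation: the hidden size `D`, the `tanh` nonlinearity, and the single weight matrix `W_shared` standing in for the recurrent layer's parameters are all hypothetical placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
D, K = 16, 3  # hypothetical hidden size and number of recurrent passes

# One shared weight matrix stands in for the recurrent layer's parameters.
W_shared = rng.normal(0, 0.02, (D, D))
# Learned per-pass iteration embeddings psi_k, one vector per pass.
psi = rng.normal(0, 0.02, (K, D))

def recurrent_block(x, k_passes=K):
    """Apply the shared layer k_passes times with 1/sqrt(K) residual scaling."""
    scale = 1.0 / np.sqrt(k_passes)
    for k in range(k_passes):
        # Pass-awareness: add the iteration embedding before the shared layer.
        h = np.tanh((x + psi[k % K]) @ W_shared)
        x = x + scale * h  # scaled residual update for stability
    return x

x = rng.normal(size=(4, D))  # a batch of 4 hidden states
y = recurrent_block(x)
```

Because the same `W_shared` is reused on every pass, the three recurrent passes add depth without adding parameters beyond the single shared layer and the K iteration embeddings.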

Key ideas

  1. Depth recurrence: shared weights across K passes save ~20% of parameters
  2. Iteration embeddings: per-pass learned vector (psi_k) for pass-awareness
  3. Test-time compute: increase K at inference (K'>K) for better BPB without changing model size
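Idea 3 follows from the weight sharing: unrolling the shared layer K' > K times at inference reuses the same weights, so the parameter count is independent of K'. A minimal sketch under the same assumptions as above (hypothetical hidden size, `tanh` stand-in layer); embeddings trained for K=3 are cycled for the extra passes:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16
W_shared = rng.normal(0, 0.02, (D, D))  # shared recurrent-layer weights
psi = rng.normal(0, 0.02, (3, D))       # iteration embeddings trained for K=3

def run_recurrent(x, k_prime):
    """Unroll the shared layer k_prime times; weights are reused, so the
    parameter count does not depend on k_prime."""
    scale = 1.0 / np.sqrt(k_prime)
    for k in range(k_prime):
        # Cycle the K trained iteration embeddings when k_prime > K.
        x = x + scale * np.tanh((x + psi[k % len(psi)]) @ W_shared)
    return x

x = rng.normal(size=(2, D))
y_train = run_recurrent(x, 3)  # training-time depth, K = 3
y_test = run_recurrent(x, 5)   # test-time compute: K' = 5, same weights

# Parameter count is fixed regardless of how many passes are run.
n_params = W_shared.size + psi.size
```

Whether cycled embeddings generalize to K' > K is an empirical question the full evaluation would need to answer.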

Status

  • Preliminary baseline: 1.5283 BPB (1 shard, 1xA100)
  • RQZ-Golf architecture implemented, not yet benchmarked on full dataset
  • Requesting compute credits for full evaluation

Theoretical basis

Inspired by Universal Transformers (Dehghani et al., 2019) and Deep Equilibrium Models (Bai et al., 2019).

