[WIP] SubSixteen: Ternary QAT + Depth Recurrence + TTT (val_bpb pending)#69
Draft
TevBenji wants to merge 10 commits into openai:main from
Conversation
…ests
- Implement BitLinear class with ternary {-1, 0, +1} quantization-aware training
- Use straight-through estimator for gradient flow during backward pass
- Add AbsMedian quantization strategy with per-output-channel scaling
- Implement L1 regularization helper to encourage ternary sparsity
- Add comprehensive property-based tests using Hypothesis framework
- Validate ternary quantization roundtrip, STE gradient flow, L1 proportionality
- Verify drop-in compatibility with CastedLinear baseline
- Test numerical stability and edge cases across random dimensions
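The quantizer these tests target can be pictured with a minimal pure-Python sketch. It assumes "AbsMedian" means scaling each output channel by its median absolute weight before round-and-clip; `ternary_quantize` and `l1_penalty` are illustrative names, not the PR's actual API:

```python
from statistics import median

def ternary_quantize(row, eps=1e-8):
    # AbsMedian ternary quantization for one output channel: scale by
    # the median absolute weight, then round each entry to the nearest
    # value in {-1, 0, +1}.
    scale = median(abs(w) for w in row) + eps
    q = [max(-1, min(1, round(w / scale))) for w in row]
    return q, scale

def l1_penalty(row):
    # L1 regularization helper: pushes weights toward zero, which
    # increases ternary sparsity after quantization.
    return sum(abs(w) for w in row)

row = [0.9, -0.05, 0.4, -1.2, 0.02, 0.6]
q, scale = ternary_quantize(row)
print(q)                # [1, 0, 1, -1, 0, 1]
print(round(scale, 2))  # 0.5
# L1 proportionality: scaling the weights scales the penalty linearly.
print(l1_penalty([2 * w for w in row]) == 2 * l1_penalty(row))  # True
```

The straight-through estimator itself is the usual `w + (quantize(w) - w).detach()` trick in the PyTorch code, so the backward pass sees an identity gradient while the forward pass sees ternary weights.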
…ockGroup
- Add comprehensive property-based tests using Hypothesis for depth recurrence validation
- Test recurrent depth multiplication (Property 8): verify each shared block executes exactly M times
- Test per-loop differentiation (Property 9): validate M distinct LayerNorm instances and loop signal slices
- Test input injection x0 influence (Property 10): ensure different x0 tensors produce different outputs
- Test progressive loss generation (Property 11): verify auxiliary losses at designated loops
- Update train_gpt.py to expose RecurrentBlockGroup for testing
- Validates Requirements 3.2, 3.3, 3.4, 3.5, and 3.6 for recurrent architecture
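The execution-count property (Property 8) is easy to picture with a toy scalar version of the recurrent loop. This is illustrative only, assuming input injection means adding x0 back at the start of each loop:

```python
def run_recurrent(x0, blocks, num_loops):
    # Depth recurrence sketch: the same shared blocks execute
    # num_loops times, with the original input x0 re-injected at the
    # start of each loop so early information is never lost.
    h = x0
    calls = []
    for _ in range(num_loops):
        h = h + x0                      # input injection
        for i, block in enumerate(blocks):
            h = block(h)
            calls.append(i)
    return h, calls

blocks = [lambda h: 0.5 * h, lambda h: h + 1.0]
_, calls = run_recurrent(1.0, blocks, num_loops=3)
# Property 8: each shared block ran exactly num_loops times.
print([calls.count(i) for i in range(len(blocks))])  # [3, 3]
```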
- Add .hypothesis/ to ignore property-based test artifacts
- Add .pytest_cache/ to ignore pytest cache files
- Prevents test-generated files from being tracked in version control
…mpression
- Add test_compression_properties.py with Property 14 (ternary packing efficiency) and Property 15 (compression pipeline round-trip)
- Add test_hyperparams_properties.py with Property 12 (MLP hidden dimension validation) and Property 13 (model width expansion)
- Validate ternary quantization achieves <=1.7 bits per weight with >=50% sparsity
- Verify compression pipeline round-trip preserves model outputs within floating-point tolerance
- Ensure MLP hidden dimensions scale correctly with model_dim and mlp_mult parameters
- Update train_gpt.py to support property-based test imports and validation
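One way to hit the <=1.7 bits/weight bound that Property 14 checks is base-3 packing, five ternary weights per byte. This is a hedged sketch of the idea, not necessarily the PR's packer:

```python
def pack_ternary(weights):
    # Five ternary digits per byte: 3**5 = 243 <= 256, so each byte
    # stores five weights at 8/5 = 1.6 bits per weight.
    out = bytearray()
    for i in range(0, len(weights), 5):
        code = 0
        for w in reversed(weights[i:i + 5]):
            code = code * 3 + (w + 1)   # map {-1, 0, +1} -> {0, 1, 2}
        out.append(code)
    return bytes(out)

def unpack_ternary(packed, n):
    # Inverse of pack_ternary: peel base-3 digits back off each byte.
    vals = []
    for code in packed:
        for _ in range(5):
            vals.append(code % 3 - 1)
            code //= 3
    return vals[:n]

w = [-1, 0, 1, 1, -1, 0, 0, 1, -1, 1]
packed = pack_ternary(w)
print(unpack_ternary(packed, len(w)) == w)   # True: round-trip preserved
print(8 * len(packed) / len(w))              # 1.6 bits per weight
```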
…s and QAT schedule
- Add test_optimizer_properties.py with Property 19 validating optimizer group assignment correctness for Muon (2D matrix params) and Adam (scalar params)
- Add test_qat_schedule.py with Property 6 validating QAT state consistency before and after switchover
- Update train_gpt.py to support QAT activation and learning rate schedule validation
- Ensure all 2D BitLinear/CastedLinear weights from blocks are assigned to Muon optimizer group
- Verify embedding, loop signals, and control tensor parameters are assigned to Adam optimizer group
- Validate learning rate schedule shape maintains values in [0, 1] with proper warmup and cooldown phases
- Validates Requirements 2.1, 2.2, 7.1, 7.2, and 7.3
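The assignment rule Property 19 checks can be sketched as follows. The `blocks.` prefix test and the parameter names are assumptions for illustration, not the PR's exact predicate:

```python
def assign_optimizer_groups(named_shapes):
    # Assumed split: 2-D weight matrices inside transformer blocks go
    # to Muon; embeddings, loop signals, and scalar/1-D params go to
    # Adam.
    muon, adam = [], []
    for name, shape in named_shapes:
        if len(shape) == 2 and name.startswith("blocks."):
            muon.append(name)
        else:
            adam.append(name)
    return muon, adam

params = [
    ("blocks.0.attn.qkv.weight", (384, 128)),  # BitLinear matrix -> Muon
    ("blocks.0.norm.weight", (128,)),          # LayerNorm gain -> Adam
    ("embed.weight", (50257, 128)),            # embedding -> Adam
    ("loop_signals", (10, 48)),                # low-rank signals -> Adam
]
muon, adam = assign_optimizer_groups(params)
print(muon)   # ['blocks.0.attn.qkv.weight']
```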
…ss slow health checks
- Add comprehensive property-based tests for Test-Time Training (TTTModule) covering prefix alignment and selective parameter adaptation
- Create test_ttt_properties.py with 7 property tests validating TTT requirements including prefix fraction alignment, selective layer adaptation, and gradient flow
- Suppress HealthCheck.too_slow in bitlinear property tests to prevent flaky test failures on slower systems
- Update train_gpt.py with TTTModule implementation supporting configurable adaptation layers and prefix fractions
- Add helper functions for building minimal GPT models and generating test tokens for efficient CPU-based testing
- Add test_integration_properties.py with property-based tests for tied embeddings, seed determinism, and BPB validity
- Property 23: Verify tied embedding weight sharing (lm_head=None when tie_embeddings=True)
- Property 24: Validate seed determinism produces bitwise-identical parameters across runs
- Property 22: Ensure BPB computation validity for positive finite loss and token/byte counts
- Suppress slow health checks for hypothesis tests to improve CI performance
- Inline BitLinear implementation in train_gpt.py to reduce external dependencies
- Update train_gpt.py module docstring to clarify 1500-line constraint
- Add json import for potential metrics serialization
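Property 22's validity checks presume a BPB formula along these lines. The formula is an assumption for illustration (total cross-entropy in nats over all tokens, converted to bits, divided by byte count), not copied from the PR:

```python
import math

def bits_per_byte(mean_loss_nats, n_tokens, n_bytes):
    # BPB: mean per-token loss (nats) times token count gives total
    # nats; divide by ln(2) for bits, then by the raw byte count.
    total_bits = mean_loss_nats * n_tokens / math.log(2)
    return total_bits / n_bytes

# Sanity check: a loss of ln(2) nats/token over equal token and byte
# counts is exactly 1 bit per byte.
print(bits_per_byte(math.log(2), 100, 100))
```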
…ignore
- Add comprehensive README documenting SubSixteen approach combining ternary QAT, depth recurrence, and test-time training
- Include detailed parameter budget breakdown showing 20.7M stored params with 55M effective capacity
- Document training schedule, configuration, and command for 8×H100 reproducibility
- Add validation summary covering 28 property-based tests across all techniques
- Update .gitignore to exclude .kiro/ directory for IDE artifacts
- Remove .kiro/ entry from gitignore
- Preserve logs/ directory in ignore list
- Simplify gitignore configuration for project cleanup
- Add RunPod 8xH100 setup and training script with full hyperparameter configuration
- Add submission.json with SubSixteen model metadata and architecture description
- Add placeholder training log for 8xH100 run results
- Add complete train_gpt.py implementation featuring BitLinear ternary quantization, depth recurrence with shared blocks, test-time training, and progressive loss scheduling
- Combines three orthogonal techniques (ternary QAT, recurrent depth, TTT) targeting sub-1.10 BPB performance
phaesoo added a commit to phaesoo/parameter-golf that referenced this pull request on Mar 19, 2026
openai#77, openai#78) Analyzed techniques, ablations, and individual BPB contributions. Key finding: sliding window eval (~0.034) and int6+wider MLP (~0.029) are the dominant validated techniques. Several promising combinations remain untested across submissions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Stacks three orthogonal techniques not previously combined in a single submission:
Ternary QAT — BitLinear layers with {-1, 0, +1} weights via AbsMedian quantization + STE gradients. ~1.5 bits/weight vs 8 for int8, enabling 4-5× more parameters per byte. L1 regularization drives zero-heavy ternary distributions for better zlib ratios.
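The params-per-byte claim follows from simple storage arithmetic. At the 1.6 bits/weight that dense base-3 packing achieves, the ratio versus int8 is 5x before zlib; the "~1.5 bits" and "4-5x" figures in the text presumably reflect the additional zlib gains and overheads. Illustrative arithmetic, not measured numbers:

```python
def params_per_mib(bits_per_weight):
    # How many parameters fit in 1 MiB at a given storage cost.
    return 8 * 1024 * 1024 / bits_per_weight

ternary = params_per_mib(1.6)   # five packed ternary weights per byte
int8 = params_per_mib(8.0)      # one int8 weight per byte
print(ternary / int8)           # ~5x more parameters per byte
```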
Depth recurrence — Prelude(1) + RecurrentBlockGroup(3 shared blocks × 10 loops) + Coda(1) = 32 effective transformer layers from 5 stored blocks. Per-loop LayerNorms + low-rank signals (rank 48) differentiate iterations. Progressive loss at loops 4 and 7 for gradient flow.
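The stored-versus-effective depth arithmetic, using the configuration numbers above:

```python
def depth_counts(prelude, shared, loops, coda):
    # Stored layers are counted once; the shared group contributes
    # once per loop to effective depth.
    stored = prelude + shared + coda
    effective = prelude + shared * loops + coda
    return stored, effective

# Prelude(1) + 3 shared blocks x 10 loops + Coda(1)
print(depth_counts(1, 3, 10, 1))   # (5, 32)
```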
Test-time training — Per-document one-step SGD on the last 2 layers during eval, adapting on the first 30% of each document (the prefix). No training data is accessed during evaluation.
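A toy sketch of the eval-time protocol: adapt only the listed parameters on the document prefix, then restore them before the next document. The scalar pull-toward-prefix-mean objective is invented for illustration and is not the PR's TTTModule:

```python
def adapt_on_prefix(weights, doc, adapt_keys, prefix_frac=0.3, lr=0.1):
    # One SGD step on the document prefix, touching only adapt_keys.
    # Toy objective: pull each adapted weight toward the prefix mean.
    k = max(1, int(len(doc) * prefix_frac))
    target = sum(doc[:k]) / k
    saved = {key: weights[key] for key in adapt_keys}
    for key in adapt_keys:
        weights[key] -= lr * 2.0 * (weights[key] - target)
    return saved

weights = {"layer2": 0.0, "layer0": 5.0}
saved = adapt_on_prefix(weights, [1.0] * 10, adapt_keys=["layer2"])
print(weights["layer2"], weights["layer0"])   # 0.2 5.0 (only layer2 moved)
weights.update(saved)                         # restore before next document
print(weights["layer2"])                      # 0.0
```

Restoring from the snapshot after each document keeps adaptation strictly per-document, so no state leaks between evaluation examples.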
Configuration
Key Metrics
Validation
- train_gpt.py (1,285 lines, under 1,500 limit)
- `__main__` guarded, syntax clean