**File: `records/track_10min_16mb/2026-03-19_QAT_Ablation/README.md`** (new, 113 lines)
# 2026-03-19_QAT_Ablation

*Non-record: Does int8 quantization-aware training improve post-roundtrip val_bpb?*

**Answer: No — the overhead costs more than it recovers.**

---

## Question

The baseline loses ~0.007 BPB in the int8+zlib export step because bf16-trained weights are rounded cold onto the int8 grid. Every leaderboard entry so far attacks this gap indirectly — aggressive warmdown for tighter weight distributions, FP16 embedding bypass, or alternative quantization formats (int6). Nobody has trained directly against the int8 quantization grid.

This submission tests whether QAT (straight-through fake-quantize matching the export pipeline exactly) recovers some of that gap. The experiment isolates QAT as the only variable — baseline architecture, baseline hyperparameters, no other changes.

---

## Method

A `fake_quantize_int8_per_row` function is inserted into `CastedLinear.forward`. It matches the export pipeline's `quantize_float_tensor` exactly:
- Same `INT8_CLIP_Q = 0.9999984` percentile clipping via `torch.quantile`
- Same per-row scale: `clip_abs / 127.0`
- Same rounding: `round().clamp(-127, 127)`
- Straight-through estimator: gradients pass through as if no quantization happened
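
The bullets above can be sketched as a single function (a reconstruction from the description, not the submission's actual implementation):

```python
import torch

INT8_CLIP_Q = 0.9999984  # same percentile clip as the export pipeline

def fake_quantize_int8_per_row(w: torch.Tensor) -> torch.Tensor:
    # Per-row clipping threshold via the export pipeline's percentile
    clip_abs = torch.quantile(w.abs(), INT8_CLIP_Q, dim=1, keepdim=True)
    scale = clip_abs / 127.0                  # per-row scale
    q = (w / scale).round().clamp(-127, 127)  # snap onto the int8 grid
    w_q = q * scale                           # dequantized weights
    # Straight-through estimator: forward sees w_q, backward sees identity
    return w + (w_q - w).detach()
```

The forward pass then computes the loss against int8-roundtripped weights while the optimizer keeps updating the full-precision master copy.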

**Schedule:** QAT activates at 30% of training steps (~step 6,000). Training runs bf16-only before that to let the loss landscape stabilize.
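
The activation check amounts to a one-liner (sketch; `round` rather than `int` is used here because `0.3 * 20000` is fractionally below 6000 in binary floating point):

```python
QAT_START_FRAC = 0.3  # QAT switches on at 30% of the step budget

def qat_active(step: int, iterations: int) -> bool:
    # round(), not int(): 0.3 * 20000 == 5999.999... as a float
    return step >= round(QAT_START_FRAC * iterations)
```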

**No other changes.** Architecture is 9L×512d, all hyperparameters are baseline defaults (WARMDOWN_ITERS=1200, MATRIX_LR=0.04, etc).

---

## Results

| Metric | SlidingWindowEval (no QAT) | This run (QAT) |
|--------|---------------------------|----------------|
| Steps completed | 13,450 | 8,011 |
| step_avg | 44.6ms | 75.2ms (64.5ms pre-QAT, 77ms+ post-QAT) |
| Pre-quant val_bpb (standard eval) | 1.2196 | 1.2327 |
| **Post-quant val_bpb (sliding window)** | **1.1925** | **1.2052** |
| Artifact bytes | 15,874,829 | 15,868,103 |
| Eval time | 70s | 75s |

**Post-quant val_bpb: 1.2052 vs 1.1925 — the QAT run is 0.0127 BPB worse.**

---

## Why it didn't work

The result is **not** evidence that QAT is a bad idea. It's evidence that **exact percentile-matching QAT is too expensive for int8 in this competition format.**

### The core problem: `torch.quantile` overhead

Matching the export pipeline exactly requires `torch.quantile(w.abs(), 0.9999984, dim=1)` on every weight matrix, every forward pass. This adds **~20% per-step overhead** (64ms → 77ms after QAT activates). Over a 600-second training budget, that costs ~2,000 training steps — roughly 1B fewer training tokens.

The lost training tokens hurt more than the quantization gap recovery helps. The int8 quantization gap (~0.007 BPB) is smaller than the convergence loss from 40% fewer training steps.
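
A back-of-envelope check, using the observed step_avg values (a rough estimate only — it assumes the whole budget runs at one rate or the other; the ~95s jump between step 6000 and 6200 in the log, where the QAT graph recompiles, accounts for much of the remaining gap up to the ~2,000-step figure):

```python
# Observed per-step times from the run log (seconds)
BUDGET_S = 600.0
PRE_QAT_STEP_S = 0.0645
POST_QAT_STEP_S = 0.077
TOKENS_PER_STEP = 524_288

def lost_steps(budget=BUDGET_S, fast=PRE_QAT_STEP_S, slow=POST_QAT_STEP_S):
    # Steps achievable at each rate over the full budget
    return budget / fast - budget / slow

steps = lost_steps()                 # ~1,500 steps under this crude model
tokens = steps * TOKENS_PER_STEP     # ~0.8B tokens
```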

### Why this matters for the competition

| Approach | Per-step cost | Quant gap reduction | Net effect |
|----------|--------------|--------------------|----|
| Aggressive warmdown (WD=20000) | 0% overhead | ~0.009 BPB | **Positive** |
| FP16 tied embedding | 0% overhead, ~500KB artifact | ~0.004 BPB | **Positive** |
| Int8 QAT (this submission) | ~20% overhead → ~2000 fewer steps | ~0.003-0.006 BPB theoretical | **Negative** (overhead > recovery) |
| Int6 QAT (PRs #128, #137) | ~20% overhead | ~0.01+ BPB (larger gap) | **Likely positive** (larger gap to close) |

### When QAT would work

1. **With int6 quantization** — the quantization gap is larger (~0.01+ BPB), making the overhead worthwhile. PRs #128 and #137 confirm this with val_bpb 1.1594 and 1.1666 respectively.
2. **With `amax` instead of `torch.quantile`** — near-zero overhead, but doesn't match the export pipeline exactly. Clipping at the 0.9999984 quantile discards only the top ~0.00016% of magnitudes, so the difference from a plain abs-max may not matter in practice.
3. **With a longer training budget** — if the wallclock cap were 30 minutes instead of 10, the overhead would be amortized over more steps.
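
Option 2 can be sketched as a drop-in scale computation (illustrative; with 512-element rows, the 0.9999984 quantile interpolates almost exactly to the row max, so the two scales nearly coincide):

```python
import torch

def per_row_scale_quantile(w: torch.Tensor) -> torch.Tensor:
    # Exact match to the export pipeline's percentile clip (expensive)
    return torch.quantile(w.abs(), 0.9999984, dim=1, keepdim=True) / 127.0

def per_row_scale_amax(w: torch.Tensor) -> torch.Tensor:
    # Cheap stand-in: abs-max reduction, near-zero overhead
    return w.abs().amax(dim=1, keepdim=True) / 127.0
```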

---

## Graph priming finding

An earlier version pre-primed the QAT compiled graph during warmup (running one forward/backward pass with `_qat=True`, then resetting to `_qat=False`). This caused `torch.compile` to use a slower compilation path for the non-QAT forward pass — step_avg was 65ms from step 1, even before QAT activated. Removing the graph priming restored baseline speed for the non-QAT phase. This is a useful finding for anyone implementing conditional code paths under `torch.compile(dynamic=False, fullgraph=True)`.
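
One plausible way to avoid the slow path is to keep the flag out of the traced graph entirely and select between two separately compiled callables at the Python level (hypothetical restructuring, not the submission's code; `fake_quantize` here is a trivial stand-in):

```python
import torch
import torch.nn as nn

def fake_quantize(w: torch.Tensor) -> torch.Tensor:
    # Trivial stand-in for the real per-row int8 fake-quantize
    return w + ((w * 127).round() / 127 - w).detach()

class CastedLinearSketch(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(dim, dim) / dim**0.5)

    def forward_plain(self, x):
        return x @ self.weight.t()

    def forward_qat(self, x):
        return x @ fake_quantize(self.weight).t()

model = CastedLinearSketch(64)
# Each path gets its own compiled artifact; switching happens in Python,
# so flipping to QAT never perturbs the already-compiled non-QAT graph.
fwd_plain = torch.compile(model.forward_plain, dynamic=False, fullgraph=True)
fwd_qat = torch.compile(model.forward_qat, dynamic=False, fullgraph=True)

def forward(x, step, qat_start=6000):
    return fwd_qat(x) if step >= qat_start else fwd_plain(x)
```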

---

## Reproduction

```bash
cd /workspace
git clone https://github.com/mrdavtan/parameter-golf.git
cd parameter-golf && git checkout qat-sliding-window
python3 data/cached_challenge_fineweb.py --variant sp1024

# Set env vars
export VOCAB_SIZE=1024 NUM_LAYERS=9 MODEL_DIM=512 NUM_HEADS=8 NUM_KV_HEADS=4
export MLP_MULT=2 TIE_EMBEDDINGS=1 TRAIN_BATCH_TOKENS=524288 TRAIN_SEQ_LEN=1024
export ITERATIONS=20000 WARMDOWN_ITERS=1200 WARMUP_STEPS=20
export MAX_WALLCLOCK_SECONDS=600 TRAIN_LOG_EVERY=200 VAL_LOSS_EVERY=0
export QAT=1 EVAL_STRIDE=64 EVAL_BATCH_SEQS=32 DOC_ISOLATED_EVAL=0
export SEED=1337 RUN_ID=ablation_qat_slide64

torchrun --standalone --nproc_per_node=8 \
records/track_10min_16mb/2026-03-19_QAT_Ablation/train_gpt.py
```

Hardware: 8×H100 SXM (RunPod), PyTorch 2.9.1+cu128

---

## Acknowledgments

- `train_gpt.py` is based on the SlidingWindowEval entry (#50) by @mattqlf, which provides the sliding window evaluation infrastructure
- Analysis informed by the WarmdownQuantization entry by @samuellarson (warmdown vs QAT tradeoffs) and the LoRA TTT ablation by @samacquaviva (doc-isolated eval gains)
- Int6 QAT comparison data from PRs #128 (@rsavitt) and #137 (@abhishekgahlot2)
- Built with [Claude Code](https://claude.com/claude-code)

## Author

GitHub: [@mrdavtan](https://github.com/mrdavtan)
Date: 2026-03-20
---
logs/ablation_qat_30pct_v3.txt
val_bpb:enabled tokenizer_kind=sentencepiece tokenizer_path=./data/tokenizers/fineweb_1024_bpe.model
train_loader:dataset:fineweb10B_sp1024 train_shards:80
val_loader:shards pattern=./data/datasets/fineweb10B_sp1024/fineweb_val_*.bin tokens:62021632
qat:True (activates at 30% of iterations = step 6000)
model_params:17059912 (unique_layers:9 loops:1 effective_depth:9 lora_rank:0 lora_params:0)
world_size:8 grad_accum_steps:1
sdp_backends:cudnn=False flash=True mem_efficient=False math=False
attention_mode:gqa num_heads:8 num_kv_heads:4
tie_embeddings:True embed_lr:0.05 head_lr:0.0 matrix_lr:0.04 scalar_lr:0.04
train_batch_tokens:524288 train_seq_len:1024 iterations:20000 warmup_steps:20 max_wallclock_seconds:600.000
seed:1337
warmup_step:1/20
warmup_step:2/20
warmup_step:3/20
warmup_step:4/20
warmup_step:5/20
warmup_step:6/20
warmup_step:7/20
warmup_step:8/20
warmup_step:9/20
warmup_step:10/20
warmup_step:11/20
warmup_step:12/20
warmup_step:13/20
warmup_step:14/20
warmup_step:15/20
warmup_step:16/20
warmup_step:17/20
warmup_step:18/20
warmup_step:19/20
warmup_step:20/20
step:1/20000 train_loss:6.9370 train_time:50ms step_avg:50.13ms
step:2/20000 train_loss:16.8366 train_time:99ms step_avg:49.53ms
step:3/20000 train_loss:8.7609 train_time:155ms step_avg:51.57ms
step:4/20000 train_loss:6.6387 train_time:210ms step_avg:52.52ms
step:5/20000 train_loss:6.6117 train_time:288ms step_avg:57.54ms
step:6/20000 train_loss:7.4221 train_time:351ms step_avg:58.54ms
step:7/20000 train_loss:6.3509 train_time:413ms step_avg:58.97ms
step:8/20000 train_loss:6.1583 train_time:493ms step_avg:61.62ms
step:9/20000 train_loss:6.0680 train_time:557ms step_avg:61.92ms
step:10/20000 train_loss:5.9748 train_time:620ms step_avg:62.02ms
step:200/20000 train_loss:2.8544 train_time:12864ms step_avg:64.32ms
step:400/20000 train_loss:2.3536 train_time:25585ms step_avg:63.96ms
step:600/20000 train_loss:2.5528 train_time:38924ms step_avg:64.87ms
step:800/20000 train_loss:2.2956 train_time:52807ms step_avg:66.01ms
step:1000/20000 train_loss:2.3710 train_time:66301ms step_avg:66.30ms
step:1200/20000 train_loss:2.3861 train_time:79773ms step_avg:66.48ms
step:1400/20000 train_loss:2.4330 train_time:92897ms step_avg:66.35ms
step:1600/20000 train_loss:2.1007 train_time:105036ms step_avg:65.65ms
step:1800/20000 train_loss:2.2012 train_time:117744ms step_avg:65.41ms
step:2000/20000 train_loss:2.2521 train_time:131332ms step_avg:65.67ms
step:2200/20000 train_loss:2.0783 train_time:144313ms step_avg:65.60ms
step:2400/20000 train_loss:2.2024 train_time:157336ms step_avg:65.56ms
step:2600/20000 train_loss:2.4112 train_time:169876ms step_avg:65.34ms
step:2800/20000 train_loss:2.2358 train_time:183280ms step_avg:65.46ms
step:3000/20000 train_loss:2.2263 train_time:196108ms step_avg:65.37ms
step:3200/20000 train_loss:2.1873 train_time:209318ms step_avg:65.41ms
step:3400/20000 train_loss:2.1570 train_time:222737ms step_avg:65.51ms
step:3600/20000 train_loss:2.1152 train_time:235103ms step_avg:65.31ms
step:3800/20000 train_loss:2.2241 train_time:247545ms step_avg:65.14ms
step:4000/20000 train_loss:2.1641 train_time:259662ms step_avg:64.92ms
step:4200/20000 train_loss:2.1776 train_time:274509ms step_avg:65.36ms
step:4400/20000 train_loss:2.1126 train_time:287085ms step_avg:65.25ms
step:4600/20000 train_loss:1.9722 train_time:299308ms step_avg:65.07ms
step:4800/20000 train_loss:2.2631 train_time:311773ms step_avg:64.95ms
step:5000/20000 train_loss:2.0304 train_time:324003ms step_avg:64.80ms
step:5200/20000 train_loss:2.1743 train_time:336425ms step_avg:64.70ms
step:5400/20000 train_loss:2.1880 train_time:349255ms step_avg:64.68ms
step:5600/20000 train_loss:2.1843 train_time:362206ms step_avg:64.68ms
step:5800/20000 train_loss:2.1458 train_time:374309ms step_avg:64.54ms
qat_activated step:6000/20000
step:6000/20000 train_loss:2.2221 train_time:386982ms step_avg:64.50ms
step:6200/20000 train_loss:2.0886 train_time:481881ms step_avg:77.72ms
step:6400/20000 train_loss:2.1616 train_time:494552ms step_avg:77.27ms
step:6600/20000 train_loss:2.1236 train_time:507596ms step_avg:76.91ms
step:6800/20000 train_loss:2.1860 train_time:520990ms step_avg:76.62ms
step:7000/20000 train_loss:2.2116 train_time:534315ms step_avg:76.33ms
step:7200/20000 train_loss:2.1757 train_time:547515ms step_avg:76.04ms
step:7400/20000 train_loss:2.0941 train_time:560062ms step_avg:75.68ms
step:7600/20000 train_loss:1.9693 train_time:572584ms step_avg:75.34ms
step:7800/20000 train_loss:2.1111 train_time:584859ms step_avg:74.98ms
step:8000/20000 train_loss:2.0699 train_time:598143ms step_avg:74.77ms
step:8011/20000 val_loss:2.0814 val_bpb:1.2327 train_time:602064ms step_avg:75.15ms
stopping_early: wallclock_cap train_time:602064ms step:8011/20000
peak memory allocated: 10119 MiB reserved: 10424 MiB
Serialized model: 67224983 bytes
Code size: 63581 bytes
Total submission size: 67288564 bytes
Serialized model int8+zlib: 15804522 bytes (payload:17178912 raw_torch:17224025 payload_ratio:3.91x)
Total submission size int8+zlib: 15868103 bytes
final_eval_mode:sliding_window stride:64 batch_seqs:32 doc_isolated:False
sliding_eval [ 0.0%] 32/121134 windows running_bpb=1.203131
sliding_eval [ 1.3%] 1632/121134 windows running_bpb=1.195758
sliding_eval [ 2.7%] 3232/121134 windows running_bpb=1.197809
sliding_eval [ 4.0%] 4832/121134 windows running_bpb=1.192269
sliding_eval [ 5.3%] 6432/121134 windows running_bpb=1.204374
sliding_eval [ 6.6%] 8032/121134 windows running_bpb=1.205619
sliding_eval [ 8.0%] 9632/121134 windows running_bpb=1.207984
sliding_eval [ 9.3%] 11232/121134 windows running_bpb=1.203688
sliding_eval [ 10.6%] 12832/121134 windows running_bpb=1.201027
sliding_eval [ 11.9%] 14432/121134 windows running_bpb=1.202773
sliding_eval [ 13.2%] 16032/121134 windows running_bpb=1.211795
sliding_eval [ 14.6%] 17632/121134 windows running_bpb=1.210786
sliding_eval [ 15.9%] 19232/121134 windows running_bpb=1.212110
sliding_eval [ 17.2%] 20832/121134 windows running_bpb=1.210688
sliding_eval [ 18.5%] 22432/121134 windows running_bpb=1.209239
sliding_eval [ 19.8%] 24032/121134 windows running_bpb=1.209753
sliding_eval [ 21.2%] 25632/121134 windows running_bpb=1.210995
sliding_eval [ 22.5%] 27232/121134 windows running_bpb=1.211682
sliding_eval [ 23.8%] 28832/121134 windows running_bpb=1.217680
sliding_eval [ 25.1%] 30432/121134 windows running_bpb=1.214952
sliding_eval [ 26.4%] 32032/121134 windows running_bpb=1.216245
sliding_eval [ 27.8%] 33632/121134 windows running_bpb=1.214992
sliding_eval [ 29.1%] 35232/121134 windows running_bpb=1.214376
sliding_eval [ 30.4%] 36832/121134 windows running_bpb=1.214029
sliding_eval [ 31.7%] 38432/121134 windows running_bpb=1.214833
sliding_eval [ 33.0%] 40032/121134 windows running_bpb=1.212650
sliding_eval [ 34.4%] 41632/121134 windows running_bpb=1.211820
sliding_eval [ 35.7%] 43232/121134 windows running_bpb=1.212058
sliding_eval [ 37.0%] 44832/121134 windows running_bpb=1.211127
sliding_eval [ 38.3%] 46432/121134 windows running_bpb=1.210929
sliding_eval [ 39.7%] 48032/121134 windows running_bpb=1.210144
sliding_eval [ 41.0%] 49632/121134 windows running_bpb=1.211272
sliding_eval [ 42.3%] 51232/121134 windows running_bpb=1.212395
sliding_eval [ 43.6%] 52832/121134 windows running_bpb=1.212887
sliding_eval [ 44.9%] 54432/121134 windows running_bpb=1.212423
sliding_eval [ 46.3%] 56032/121134 windows running_bpb=1.212843
sliding_eval [ 47.6%] 57632/121134 windows running_bpb=1.211920
sliding_eval [ 48.9%] 59232/121134 windows running_bpb=1.208484
sliding_eval [ 50.2%] 60832/121134 windows running_bpb=1.208299
sliding_eval [ 51.5%] 62432/121134 windows running_bpb=1.209154
sliding_eval [ 52.9%] 64032/121134 windows running_bpb=1.209189
sliding_eval [ 54.2%] 65632/121134 windows running_bpb=1.209028
sliding_eval [ 55.5%] 67232/121134 windows running_bpb=1.207807
sliding_eval [ 56.8%] 68832/121134 windows running_bpb=1.207306
sliding_eval [ 58.1%] 70432/121134 windows running_bpb=1.206638
sliding_eval [ 59.5%] 72032/121134 windows running_bpb=1.206674
sliding_eval [ 60.8%] 73632/121134 windows running_bpb=1.206673
sliding_eval [ 62.1%] 75232/121134 windows running_bpb=1.206874
sliding_eval [ 63.4%] 76832/121134 windows running_bpb=1.206570
sliding_eval [ 64.7%] 78432/121134 windows running_bpb=1.207203
sliding_eval [ 66.1%] 80032/121134 windows running_bpb=1.207558
sliding_eval [ 67.4%] 81632/121134 windows running_bpb=1.207306
sliding_eval [ 68.7%] 83232/121134 windows running_bpb=1.208383
sliding_eval [ 70.0%] 84832/121134 windows running_bpb=1.210262
sliding_eval [ 71.4%] 86432/121134 windows running_bpb=1.209648
sliding_eval [ 72.7%] 88032/121134 windows running_bpb=1.210599
sliding_eval [ 74.0%] 89632/121134 windows running_bpb=1.210889
sliding_eval [ 75.3%] 91232/121134 windows running_bpb=1.210974
sliding_eval [ 76.6%] 92832/121134 windows running_bpb=1.210459
sliding_eval [ 78.0%] 94432/121134 windows running_bpb=1.210782
sliding_eval [ 79.3%] 96032/121134 windows running_bpb=1.210236
sliding_eval [ 80.6%] 97632/121134 windows running_bpb=1.213084
sliding_eval [ 81.9%] 99232/121134 windows running_bpb=1.213028
sliding_eval [ 83.2%] 100832/121134 windows running_bpb=1.213101
sliding_eval [ 84.6%] 102432/121134 windows running_bpb=1.212770
sliding_eval [ 85.9%] 104032/121134 windows running_bpb=1.212278
sliding_eval [ 87.2%] 105632/121134 windows running_bpb=1.211539
sliding_eval [ 88.5%] 107232/121134 windows running_bpb=1.211460
sliding_eval [ 89.8%] 108832/121134 windows running_bpb=1.212012
sliding_eval [ 91.2%] 110432/121134 windows running_bpb=1.212038
sliding_eval [ 92.5%] 112032/121134 windows running_bpb=1.211971
sliding_eval [ 93.8%] 113632/121134 windows running_bpb=1.212494
sliding_eval [ 95.1%] 115232/121134 windows running_bpb=1.212217
sliding_eval [ 96.4%] 116832/121134 windows running_bpb=1.211896
sliding_eval [ 97.8%] 118432/121134 windows running_bpb=1.212275
sliding_eval [ 99.1%] 120032/121134 windows running_bpb=1.212335
final_int8_zlib_roundtrip val_loss:2.0349 val_bpb:1.2052 eval_time:74761ms
final_int8_zlib_roundtrip_exact val_loss:2.03485247 val_bpb:1.20515425
---

**File: `records/track_10min_16mb/2026-03-19_QAT_Ablation/run_ablation.sh`** (new, 76 lines)
#!/usr/bin/env bash
# QAT Ablation — isolate QAT's effect on post-quantization val_bpb
#
# 4 runs, one variable (QAT on/off), two eval modes:
# 1. Baseline (no QAT, standard eval) — reproduces naive baseline
# 2. Baseline (no QAT, sliding eval) — reproduces SlidingWindowEval entry
# 3. QAT (sliding eval) — measures QAT's contribution
# 4. QAT (sliding eval, doc-isolated) — measures doc isolation on top
#
# Architecture: 9L×512d, default hyperparams throughout.
# No FP16 embed, no warmdown tuning, no leader hyperparams.
#
# Usage:
# cd /workspace/parameter-golf
# bash records/track_10min_16mb/2026-03-19_QAT_Ablation/run_ablation.sh

set -euo pipefail

SCRIPT="records/track_10min_16mb/2026-03-19_QAT_Ablation/train_gpt.py"

# Baseline architecture + training — all defaults
BASE="VOCAB_SIZE=1024 NUM_LAYERS=9 MODEL_DIM=512 NUM_HEADS=8 NUM_KV_HEADS=4"
BASE="$BASE MLP_MULT=2 TIE_EMBEDDINGS=1 TRAIN_BATCH_TOKENS=524288 TRAIN_SEQ_LEN=1024"
BASE="$BASE ITERATIONS=20000 WARMDOWN_ITERS=1200 WARMUP_STEPS=20"
BASE="$BASE MAX_WALLCLOCK_SECONDS=600 TRAIN_LOG_EVERY=200 VAL_LOSS_EVERY=0"
BASE="$BASE NUM_LOOPS=1 LORA_RANK=0 FP16_EMBED_EXPORT=0 SEED=1337"

echo "============================================"
echo "QAT Ablation — 4 runs, 8×H100"
echo "============================================"

# Run 1: Baseline — no QAT, standard eval (non-overlapping)
echo ""
echo ">>> Run 1/4: Baseline (no QAT, standard eval)"
env $BASE RUN_ID=ablation_baseline QAT=0 EVAL_STRIDE=0 DOC_ISOLATED_EVAL=0 \
torchrun --standalone --nproc_per_node=8 "$SCRIPT"
cp logs/ablation_baseline.txt records/track_10min_16mb/2026-03-19_QAT_Ablation/logs/ 2>/dev/null || true
echo ">>> Run 1 done: $(grep 'final_int8_zlib_roundtrip_exact' logs/ablation_baseline.txt | tail -1)"

# Run 2: No QAT, sliding eval (stride=64)
echo ""
echo ">>> Run 2/4: No QAT, sliding eval (stride=64)"
env $BASE RUN_ID=ablation_slide64 QAT=0 EVAL_STRIDE=64 EVAL_BATCH_SEQS=32 DOC_ISOLATED_EVAL=0 \
torchrun --standalone --nproc_per_node=8 "$SCRIPT"
cp logs/ablation_slide64.txt records/track_10min_16mb/2026-03-19_QAT_Ablation/logs/ 2>/dev/null || true
echo ">>> Run 2 done: $(grep 'final_int8_zlib_roundtrip_exact' logs/ablation_slide64.txt | tail -1)"

# Run 3: QAT + sliding eval (stride=64)
echo ""
echo ">>> Run 3/4: QAT + sliding eval (stride=64)"
env $BASE RUN_ID=ablation_qat_slide64 QAT=1 EVAL_STRIDE=64 EVAL_BATCH_SEQS=32 DOC_ISOLATED_EVAL=0 \
torchrun --standalone --nproc_per_node=8 "$SCRIPT"
cp logs/ablation_qat_slide64.txt records/track_10min_16mb/2026-03-19_QAT_Ablation/logs/ 2>/dev/null || true
echo ">>> Run 3 done: $(grep 'final_int8_zlib_roundtrip_exact' logs/ablation_qat_slide64.txt | tail -1)"

# Run 4: QAT + sliding eval + doc-isolated
echo ""
echo ">>> Run 4/4: QAT + sliding eval + doc-isolated"
env $BASE RUN_ID=ablation_qat_slide64_dociso QAT=1 EVAL_STRIDE=64 EVAL_BATCH_SEQS=32 DOC_ISOLATED_EVAL=1 \
torchrun --standalone --nproc_per_node=8 "$SCRIPT"
cp logs/ablation_qat_slide64_dociso.txt records/track_10min_16mb/2026-03-19_QAT_Ablation/logs/ 2>/dev/null || true
echo ">>> Run 4 done: $(grep 'final_int8_zlib_roundtrip_exact' logs/ablation_qat_slide64_dociso.txt | tail -1)"

echo ""
echo "============================================"
echo "ABLATION RESULTS"
echo "============================================"
for LOG in ablation_baseline ablation_slide64 ablation_qat_slide64 ablation_qat_slide64_dociso; do
echo "$LOG: $(grep 'final_int8_zlib_roundtrip_exact' logs/${LOG}.txt 2>/dev/null | tail -1)"
done
echo ""
echo "Expected pattern:"
echo " baseline ~1.2244 (reproduces naive baseline)"
echo " slide64 ~1.1925 (reproduces SlidingWindowEval entry)"
echo " qat+slide64 < slide64 if QAT helps"
echo " qat+slide64+doc < qat+slide64 if doc isolation helps"