iso-bpe E8 vs scalar KV quant on Mistral-7B (n=60); 24-cell baseline not a valid HQMQ comparison #33
Draft
jagmarques wants to merge 4 commits into
Kernel: experiments/kaggle/nq_iso_e8_24cell/
- Tests the Hurwitz claim that 24-cell outperforms E8 at matched bpe.
- Three calibration-free quantizers (same Hadamard + RoPE pipeline):
E8 (8-dim blocks), 24-cell Hurwitz quaternion (4-dim blocks), scalar.
- Explicit bpe accounting per project rule:
E8 1-bit=1.125, E8 2-bit=2.125, 24-cell 1-round=1.2712,
24-cell 2-round=2.4175, Scalar 2-lev=1.125, Scalar 4-lev=2.125.
- 24-cell given MORE bits at every comparison point (fairest test).
- AQUA-iso paired-chunk PPL protocol on wikitext-2-raw-v1 (NF4 weights).
- CPU smoke test exits 0 (verified python3.11).
- GPU run queued on next Kaggle T4x2 quota window.
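The bpe figures listed above can be reproduced with a few lines of arithmetic. This is a minimal sketch, not the kernel's `_bpe_info`: the helper name `bpe`, the 128-dim head, and the single fp16 scale per head are assumptions here, as is reading "E8 1-bit" as one index bit per coordinate.

```python
import math

HEAD_DIM = 128    # assumed head dimension
SCALE_BITS = 16   # assumed: one fp16 scale per head


def bpe(index_bits_per_block: float, block_dim: int) -> float:
    """Bits per element: index bits spread over the block, plus the per-head scale."""
    return index_bits_per_block / block_dim + SCALE_BITS / HEAD_DIM


budgets = {
    "E8 1-bit": bpe(8 * 1, 8),                      # 8 index bits per 8-dim block
    "E8 2-bit": bpe(8 * 2, 8),
    "24-cell 1-round": bpe(math.log2(24), 4),       # one of 24 vertices per 4-dim block
    "24-cell 2-round": bpe(2 * math.log2(24), 4),   # two vertex indices per block
    "Scalar 2-lev": bpe(math.log2(2), 1),
    "Scalar 4-lev": bpe(math.log2(4), 1),
}

for name, value in budgets.items():
    print(f"{name:16s} {value:.4f} bpe")
```

Under these assumptions the 24-cell rows come out at 1.2712 and 2.4175 bpe, slightly above the E8 and scalar points they are paired with.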
cell24_nearest_point was scaling each 4-dim block back by its L2 norm (nearest_unit * norms), giving the 24-cell 32 free fp32 magnitudes per 128-dim head. Those magnitudes were never charged in _bpe_info, silently biasing distortion down vs E8.
Fix: return only the nearest unit vertex; the caller's per-head scale sc handles dequant, exactly matching E8 accounting. Both quantizers now carry one fp16 scale per head (0.125 bpe) + lattice indices, and nothing else.
Smoke test (exit 0) confirms the fix is non-vacuous. New MSE shows 24-cell losing to E8 even at its higher bpe budget:
Low budget (E8 1.125 bpe vs 24cell 1.271 bpe): E8 MSE=0.5947 < 24cell MSE=0.6302
Med budget (E8 2.125 bpe vs 24cell 2.417 bpe): E8 MSE=0.1450 < 24cell MSE=0.6206
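A minimal sketch of what "return only the nearest unit vertex" could look like, assuming NumPy and the standard 24 unit Hurwitz quaternions (8 signed axis vectors plus 16 half-integer vectors). The function names here are hypothetical and are not the kernel's actual `cell24_nearest_point`.

```python
import itertools

import numpy as np


def hurwitz_units() -> np.ndarray:
    """The 24 unit Hurwitz quaternions, i.e. the 24-cell vertices, shape (24, 4)."""
    axis = np.concatenate([np.eye(4), -np.eye(4)])                    # +-e_i
    half = np.array(list(itertools.product((0.5, -0.5), repeat=4)))   # (+-1/2)^4
    return np.concatenate([axis, half])


def nearest_unit_vertex(blocks: np.ndarray) -> np.ndarray:
    """Snap each 4-dim block to its nearest unit vertex.

    All vertices have unit norm, so the nearest vertex is the one with the
    largest dot product. The block norm is deliberately NOT multiplied back
    in, so no uncharged per-block magnitude leaks into the reconstruction.
    """
    verts = hurwitz_units()
    idx = np.argmax(blocks @ verts.T, axis=-1)
    return verts[idx]


blocks = np.array([[3.0, 0.1, 0.0, 0.0], [5.0, 5.0, 5.0, 5.0]])
snapped = nearest_unit_vertex(blocks)
```

Every row of `snapped` has norm 1 regardless of the input norm, so any magnitude information has to come from a scale that is charged in the bpe accounting.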
…l-7B-v0.1
NousResearch/Mistral-7B-v0.1 does not exist on HF; mistralai/Mistral-7B-v0.1 is confirmed ungated (gated=False via HF API) and downloads anonymously on jagmardrop.
…ode)
Two bugs fixed per critic review:
(a) Direction-only was wrong: old code returned the nearest unit Hurwitz vertex
and discarded the per-block magnitude entirely, making it a direction-only
code, not HQMQ. Now separates r=||x|| (magnitude, br uniform bits per block)
and u=x/r (direction, nearest of 24 Hurwitz quaternions), reconstructing
x_hat = r_q * u_q per Swain et al. arXiv:2605.27646 Section 3.
(b) Broken 2-round residual dropped: the residual after round-1 has smaller norm
than a unit vertex, so adding another unit-norm codeword overshoots (59/60
segments worse). HQMQ is single-shot by construction; dropped the loop.
bpe now honestly charges magnitude bits:
HQMQ_br2: (log2(24)+2)/4 + 16/128 = 1.771 bpe
HQMQ_br3: (log2(24)+3)/4 + 16/128 = 2.021 bpe
Both sit below the E8 budget point directly above them, so E8 still wins if it
beats HQMQ on PPL at higher budget.
Smoke test updated: checks magnitude preservation, confirms old residual bug,
7/7 checks pass (exit 0).
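The magnitude/direction split and its bpe charge can be sketched as follows. This is an illustration under stated assumptions, not the kernel's code: the names `quantize_md` and `hqmq_bpe` are hypothetical, and the uniform magnitude grid over `[0, r_max]` with `r_max` supplied by the caller is an assumed choice, since the commit message does not say how the magnitude range is set.

```python
import itertools
import math

import numpy as np

# The 24 unit Hurwitz quaternions (24-cell vertices), shape (24, 4).
VERTS = np.concatenate([
    np.eye(4),
    -np.eye(4),
    np.array(list(itertools.product((0.5, -0.5), repeat=4))),
])


def quantize_md(blocks: np.ndarray, br: int, r_max: float) -> np.ndarray:
    """Reconstruct x_hat = r_q * u_q for each 4-dim block.

    r_q: block norm rounded to a uniform grid of 2**br values on [0, r_max]
         (assumed grid; the real range convention may differ).
    u_q: nearest of the 24 unit vertices. argmax over x @ v equals argmax
         over (x / r) @ v, so no explicit normalisation is needed.
    """
    r = np.linalg.norm(blocks, axis=-1)
    levels = 2 ** br - 1
    r_q = np.round(np.clip(r / r_max, 0.0, 1.0) * levels) / levels * r_max
    u_q = VERTS[np.argmax(blocks @ VERTS.T, axis=-1)]
    return r_q[..., None] * u_q


def hqmq_bpe(br: int, head_dim: int = 128, scale_bits: int = 16) -> float:
    """(log2(24) + br) bits per 4-dim block, plus one fp16 scale per head."""
    return (math.log2(24) + br) / 4 + scale_bits / head_dim


rng = np.random.default_rng(0)
x = rng.standard_normal((32, 4))            # one 128-dim head as 32 blocks
r_max = float(np.linalg.norm(x, axis=-1).max())
x_hat = quantize_md(x, br=3, r_max=r_max)
```

With this grid the reconstructed block norm differs from the true norm by at most half a grid step, and `hqmq_bpe(2)` and `hqmq_bpe(3)` round to the 1.771 and 2.021 bpe quoted above.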
Iso-bpe paired-PPL comparison of KV quantizers on Mistral-7B-v0.1, n=60 segments, Kaggle T4x2.
Clean finding (defensible): calibration-free E8 lattice substantially beats per-coordinate uniform scalar at aggressive budgets. At ~1.125 bpe E8 is +23.1% PPL vs scalar +100.7% (E8 wins 60/60 segments); at ~2.125 bpe they near-tie (E8 +1.34% vs scalar +1.67%). This corroborates the known result that E8 earns its place at low bit budgets.
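The paired summary behind numbers of this kind can be sketched as below. The helper `paired_summary` is hypothetical, the aggregation (ratio of mean per-segment PPL) is an assumption about the protocol, and the input values in the usage lines are placeholders for illustration only, not measured results.

```python
import numpy as np


def paired_summary(ppl_base, ppl_a, ppl_b) -> dict:
    """Compare two quantizers on the same segments against an unquantized baseline.

    Returns each method's relative PPL increase in percent and the number of
    segments on which method A has strictly lower PPL than method B.
    """
    base, a, b = (np.asarray(v, dtype=float) for v in (ppl_base, ppl_a, ppl_b))
    if not (base.shape == a.shape == b.shape):
        raise ValueError("all three arrays must cover the same segments")
    return {
        "a_delta_pct": 100.0 * (a.mean() / base.mean() - 1.0),
        "b_delta_pct": 100.0 * (b.mean() / base.mean() - 1.0),
        "a_wins": int(np.sum(a < b)),
        "n": int(base.size),
    }


# Placeholder inputs, NOT results from this PR.
summary = paired_summary([10.0, 20.0], [11.0, 22.0], [15.0, 30.0])
```

Because every method is scored on identical segments, the win count is a paired statistic and does not depend on how noisy individual segments are relative to each other.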
NOT a valid HQMQ / Hurwitz comparison. The kernel's 24-cell path is a naive single 24-vertex direction snap, not the published HQMQ. The published method uses a product direction code (24 Hurwitz vertices times a secondary codebook of size S, giving 24S codewords) plus median-multiplier outlier extraction, and is evaluated at 3.79-5 bpe. Our baseline is feature-stripped and sits below that budget band, so no E8-vs-HQMQ claim is made. A faithful HQMQ comparison at its published bpe is left as an open task.
Built and CPU-smoke-tested; GPU result above is from the secondary Kaggle account. Draft pending the scalar-finding review.