docs: turbo4 4.125 bpw + Jun-2026 rematch writeup; pin 3/4-bit centroid values#96

Draft
TheTom wants to merge 2 commits into main from docs/turbo4-rematch-jun2026
@TheTom (Owner) commented Jun 26, 2026

Research-repo counterpart to llama-cpp-turboquant#197. Factual fixes + a writeup + a regression test, no algorithm change (the Python reference was already correct).

Changes

  • README: turbo4 is 4.125 bpw / 3.9×, not 4.25 / 3.8× — the no-QJL 4-bit block is norm(fp16) + 64 B indices = 66 B; the old number counted a phantom rnorm. (The C/CUDA port also dropped that dead field.)
  • docs/turbo4-rematch-2026-06.md: the June-2026 turbo4 work + head-to-head vs spiritbuun/master (KLD/prefill/decode at 4.125 bpw), the centroid-port-bug story, PDL backport + fused-MMA decode, and the asymmetric q8_0-K + turbo4-V result (−26% KLD; "V is free" holds at 4-bit, breaks at 2-bit).
  • tests/test_codebook.py: pin the exact 3-bit and 4-bit optimal centroids for d=128. The suite previously pinned 1-bit/2-bit but only sanity-checked 3/4-bit — exactly the gap that let a downstream C/CUDA port drift to a mis-scaled 4-bit table (outer 0.1739 vs correct 0.2402, ~2.1× excess MSE). Now any port can be checked against these.
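The README correction can be re-derived with a few lines of arithmetic. This is a minimal sketch, not code from the repo: it assumes one block covers 128 values (which is what 64 B of 4-bit indices implies), that the compression ratio is measured against fp16 (16 bits per value), and that the phantom `rnorm` was a second fp16 field. The names `BLOCK_VALUES` and `block_stats` are hypothetical.

```python
# Assumption: one turbo4 block holds 128 values (64 B of 4-bit indices).
BLOCK_VALUES = 128
INDEX_BITS = 4
FP16_BYTES = 2  # size of the per-block norm


def block_stats(extra_bytes=0):
    """Return (block bytes, bits per weight, compression ratio vs fp16)."""
    index_bytes = BLOCK_VALUES * INDEX_BITS // 8      # 64 B of packed indices
    block_bytes = FP16_BYTES + index_bytes + extra_bytes
    bpw = block_bytes * 8 / BLOCK_VALUES
    ratio = 16 / bpw                                   # assumption: fp16 baseline
    return block_bytes, bpw, ratio


# No-QJL 4-bit block: norm(fp16) + 64 B indices.
correct = block_stats()
# Old README figure: same block plus an assumed 2-byte phantom rnorm.
old = block_stats(extra_bytes=FP16_BYTES)

print(correct[0], correct[1], round(correct[2], 1))
print(old[0], old[1], round(old[2], 1))
```

Under these assumptions the first line reproduces the 66 B / 4.125 bpw / 3.9× figures and the second reproduces the old 4.25 bpw / 3.8× figures.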

Notes

  • The Python reference codebook was correct throughout; this is documentation + a guard, not a fix to optimal_centroids.
  • pytest tests/test_codebook.py → 26 passed.
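As an illustration of how a port could be checked against pinned values, here is a hedged sketch of a table comparison. It is not the contents of tests/test_codebook.py. The helper names and the tolerance are hypothetical, only the two outer 4-bit centroid magnitudes quoted above are used, and the tables are assumed to be symmetric about zero.

```python
def max_abs_centroid_error(reference, ported):
    """Largest absolute difference between two centroid tables, compared in sorted order."""
    if len(reference) != len(ported):
        raise ValueError("centroid tables must have the same number of entries")
    return max(abs(r - p) for r, p in zip(sorted(reference), sorted(ported)))


def tables_match(reference, ported, atol=1e-4):
    """True when every ported centroid is within atol of the reference (atol is an assumed tolerance)."""
    return max_abs_centroid_error(reference, ported) <= atol


# Only the outer 4-bit centroids are quoted in this PR, so the example uses
# just those. Assumption: the table is symmetric about zero.
reference_outer = [-0.2402, 0.2402]
drifted_outer = [-0.1739, 0.1739]

print(tables_match(reference_outer, reference_outer))  # reference against itself
print(tables_match(reference_outer, drifted_outer))    # mis-scaled port
print(max_abs_centroid_error(reference_outer, drifted_outer))
```

A downstream port would pass its full table as `ported` and the pinned values from the test suite as `reference`; a mis-scaled table like the one described above fails the comparison by a wide margin.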
