Qwen3-8B faithful-HQMQ vs E8 Pareto comparison by jagmarques · Pull Request #36 · jagmarques/nexusquant

jagmarques · 2026-06-15T21:40:40Z

Summary

Extends the faithful-HQMQ Mistral kernel to Hurwitz's own evaluation model (Qwen3-8B). Measures the PPL-vs-bpe Pareto frontier for HQMQ (s96_r6 ~5 bpe, s96_r4 ~3.79 bpe) and E8 (K4V2, K3V2) under identical harness conditions.
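For reference, a minimal sketch of how a PPL-vs-bpe Pareto frontier can be extracted once each config has a measured (bpe, PPL) pair. The `Point` type, the `pareto_frontier` helper, and the config names are hypothetical and not taken from this PR, "bpe" is read here as bits per element, and the demo values are placeholders rather than results.

```python
from typing import NamedTuple


class Point(NamedTuple):
    name: str
    bpe: float  # measured bits per element, lower is better
    ppl: float  # perplexity, lower is better


def pareto_frontier(points):
    """Return the non-dominated points, sorted by increasing bpe."""
    frontier = []
    best_ppl = float("inf")
    for p in sorted(points, key=lambda q: (q.bpe, q.ppl)):
        # Keep a point only if it beats every cheaper (or equal-bpe) point on PPL.
        if p.ppl < best_ppl:
            frontier.append(p)
            best_ppl = p.ppl
    return frontier


# Placeholder values for illustration only; these are NOT measurements.
demo = [
    Point("cfg_a", 2.0, 12.0),
    Point("cfg_b", 3.0, 11.0),
    Point("cfg_c", 4.0, 11.5),  # dominated by cfg_b: more bits, worse PPL
    Point("cfg_d", 5.0, 10.0),
]
print([p.name for p in pareto_frontier(demo)])  # ['cfg_a', 'cfg_b', 'cfg_d']
```

A config is on the frontier only if no other config is at least as good on both axes, which is why both bpe and PPL need to be measured under the same harness.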

- HQMQ algorithm unchanged from spec (24*S product code, Med3x C=3, no rotation, per arXiv:2605.27646)
- Model: Qwen/Qwen3-8B with NF4 weights (fits T4 x2); all configs share the same weight baseline
- Paired PPL harness: n=60 WikiText-103 segments, prefix=1024, cont=512 tokens
- Reports measured bpe per config rather than hardcoded nominal values, so both axes of the Pareto comparison are measured
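The Med3x rule (C=3) can be sketched as follows, going by smoke test (c), which flags a block because its norm exceeds 3 times the median block norm. This is an illustration of that rule only; the helper name `med3x_outliers` and the 4-element block size are assumptions, not the PR's code.

```python
import math
import statistics


def med3x_outliers(blocks, c=3.0):
    """Return indices of blocks whose L2 norm exceeds c times the median norm."""
    norms = [math.sqrt(sum(x * x for x in block)) for block in blocks]
    threshold = c * statistics.median(norms)
    return [i for i, n in enumerate(norms) if n > threshold]


# Mirror the smoke-test setup: unit-norm blocks with one planted outlier at idx 31.
blocks = [[1.0, 0.0, 0.0, 0.0] for _ in range(31)]  # assumed 4-element blocks
blocks.append([10.5, 0.0, 0.0, 0.0])                # norm 10.50 > 3 * 1.00
print(med3x_outliers(blocks))  # [31]
```

Using the median rather than the mean keeps the threshold stable when a few blocks are very large.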

Status

GPU result pending. No PPL claim is made here. Draft until the Kaggle run jagmardrop/nq-hqmq-qwen3 completes.

Proof of work

VERIFY-WITH: python3.11 experiments/kaggle/nq_hqmq_qwen3/smoke_test.py

TEST (a): codebook size = 24*S ... PASS (S=24,48,96,192 all have 24*S unit-norm codewords)
TEST (b): round-trip bounded error ... orig_norm=5.8334, err=0.4811, ratio=0.0825 PASS
TEST (c): Med3x outlier detection ... PASS (1 block(s) flagged; planted at idx 31 norm=10.50 > 3*1.00)
TEST (d): HQMQ bpe s96_r4 ~3.79 ... base_bpe_r4=3.7925 corrected(p=0.02)=4.2866 PASS
TEST (e): HQMQ bpe s96_r6 (base formula, no outlier correction) ... base=4.2925 with_flag=4.5425 corrected(p=0.07)=5.3620 PASS
TEST (f): E8 bpe formula K4V2 and K3V2 ... K4V2=2.0000 K3V2=1.9375 PASS (K4V2=2.000 > K3V2=1.938)
All smoke tests PASSED.
exit code: 0
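The bpe figures in tests (d) and (e) are consistent with a simple accounting, sketched below. This formula is inferred from the logged numbers, not copied from smoke_test.py: it assumes one codeword index of log2(24*S) bits plus r extra bits per 4-element block, one flag bit per block, and flagged blocks stored at 16 bits per element. All constant and function names are hypothetical.

```python
import math

BLOCK_DIM = 4      # assumption: one codeword per 4-element block
FLAG_BITS = 1      # assumption: one outlier flag bit per block
OUTLIER_BITS = 16  # assumption: flagged blocks kept at 16 bits per element


def base_bpe(s, r):
    """Bits per element for a 24*s codebook plus r extra bits per block."""
    return (math.log2(24 * s) + r) / BLOCK_DIM


def corrected_bpe(s, r, p):
    """Blend quantized and outlier blocks (outlier fraction p), then add the flag cost."""
    return (1 - p) * base_bpe(s, r) + p * OUTLIER_BITS + FLAG_BITS / BLOCK_DIM


print(f"{base_bpe(96, 4):.4f}")                          # 3.7925
print(f"{corrected_bpe(96, 4, 0.02):.4f}")               # 4.2866
print(f"{base_bpe(96, 6):.4f}")                          # 4.2925
print(f"{base_bpe(96, 6) + FLAG_BITS / BLOCK_DIM:.4f}")  # 4.5425
print(f"{corrected_bpe(96, 6, 0.07):.4f}")               # 5.3620
```

Under this accounting the measured bpe depends on the outlier fraction p, which is why a nominal figure can understate the real rate.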

Kaggle push: jagmardrop/nq-hqmq-qwen3 version 1 (RUNNING at push time)
https://www.kaggle.com/code/jagmardrop/nq-hqmq-qwen3

Diff stat: 3 files changed, 806 insertions(+)

Extends the Mistral faithful-HQMQ kernel to Hurwitz's own evaluation model. Tests HQMQ s96_r6 (Hurwitz Qwen3-8B operating point, ~5 bpe) and s96_r4 (~3.79 bpe) against E8 K4V2 and K3V2 on the PPL-vs-bpe Pareto frontier. GPU result pending.

- HQMQ algorithm unchanged (24*S product code, Med3x C=3, no rotation, per spec)
- Model: Qwen/Qwen3-8B with NF4 weights to fit T4 x2
- Paired PPL harness: n>=60 WikiText-103 segments, prefix=1024, cont=512
- smoke_test.py: all 6 assertions pass (exit 0)
- Target: jagmardrop/nq-hqmq-qwen3
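A rough sketch of what a paired PPL comparison computes, assuming "paired" means every config is scored on the same segments and the same continuation tokens. The function names and the aggregation (pooled token-level PPL plus a mean per-segment NLL difference) are assumptions for illustration; the PR's harness may aggregate differently.

```python
import math


def perplexity(token_nlls):
    """exp of the mean per-token negative log-likelihood (natural log)."""
    return math.exp(sum(token_nlls) / len(token_nlls))


def paired_ppl(baseline_segments, quantized_segments):
    """Compare two configs on identical segments.

    Each argument is a list of segments; each segment is a list of per-token
    NLLs over the same continuation tokens.
    """
    if len(baseline_segments) != len(quantized_segments):
        raise ValueError("configs must be scored on the same segments")
    base_all, quant_all, deltas = [], [], []
    for base, quant in zip(baseline_segments, quantized_segments):
        if len(base) != len(quant):
            raise ValueError("paired segments must cover the same tokens")
        base_all.extend(base)
        quant_all.extend(quant)
        # Per-segment difference in mean NLL; pairing removes segment difficulty.
        deltas.append(sum(quant) / len(quant) - sum(base) / len(base))
    return {
        "ppl_baseline": perplexity(base_all),
        "ppl_quantized": perplexity(quant_all),
        "mean_delta_nll": sum(deltas) / len(deltas),
    }
```

With prefix=1024 and cont=512, each segment would contribute 512 scored tokens per config under this reading.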

Add _get_kv/_set_kv/_n_layers_kv compat shims that handle both the old key_cache/value_cache list API and the new layers[i].keys/values API introduced in transformers>=5.12. Replace all direct cache.key_cache[i] and cache.value_cache[i] accesses in clone_cache, apply_hqmq, and apply_e8_cache with these shims.

jagmarques added 2 commits June 15, 2026 23:39

jagmarques wants to merge 2 commits into main from company/hqmq-qwen3
