Qwen3-8B faithful-HQMQ vs E8 Pareto comparison #36

Draft
jagmarques wants to merge 2 commits into main from company/hqmq-qwen3

@jagmarques

Owner

Summary

Extends the faithful-HQMQ Mistral kernel to Hurwitz's own evaluation model (Qwen3-8B). Measures the PPL-vs-bpe Pareto frontier for HQMQ (s96_r6 ~5 bpe, s96_r4 ~3.79 bpe) and E8 (K4V2, K3V2) under identical harness conditions.

  • HQMQ algorithm unchanged from spec (24*S product code, Med3x C=3, no rotation, per arXiv:2605.27646)
  • Model: Qwen/Qwen3-8B with NF4 weights (fits T4 x2); all configs share the same weight baseline
  • Paired PPL harness: n=60 WikiText-103 segments, prefix=1024, cont=512 tokens
  • Reports measured bpe per config (not hardcoded nominals), so both Pareto axes, PPL and bpe, are measured rather than assumed
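
As a rough illustration of the "Med3x C=3" rule in the list above, here is a minimal sketch. It assumes the rule flags any block whose L2 norm exceeds C times the median block norm, which is consistent with the smoke-test message `norm=10.50 > 3*1.00` but is not confirmed against the kernel source. The function name `med3x_outlier_flags` and the 4-dimensional block shape are hypothetical.

```python
import math
import statistics


def med3x_outlier_flags(blocks, c=3.0):
    """Flag blocks whose L2 norm exceeds c times the median block norm.

    Assumed reading of "Med3x C=3"; the real kernel may differ.
    """
    norms = [math.sqrt(sum(x * x for x in block)) for block in blocks]
    threshold = c * statistics.median(norms)
    return [norm > threshold for norm in norms]


# 64 unit-norm blocks with one planted outlier at index 31,
# mirroring the setup described by smoke test (c).
blocks = [[1.0, 0.0, 0.0, 0.0] for _ in range(64)]
blocks[31] = [10.5, 0.0, 0.0, 0.0]

flags = med3x_outlier_flags(blocks)
flagged = [i for i, is_outlier in enumerate(flags) if is_outlier]
print(flagged)
```

Using the median rather than the mean keeps the threshold stable when a few blocks are very large, since a single planted outlier barely moves the median.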

Status

GPU result pending. No PPL claim is made here. Draft until jagmardrop/nq-hqmq-qwen3 completes.

Proof of work

VERIFY-WITH: python3.11 experiments/kaggle/nq_hqmq_qwen3/smoke_test.py

TEST (a): codebook size = 24*S ... PASS (S=24,48,96,192 all have 24*S unit-norm codewords)
TEST (b): round-trip bounded error ... orig_norm=5.8334, err=0.4811, ratio=0.0825 PASS
TEST (c): Med3x outlier detection ... PASS (1 block(s) flagged; planted at idx 31 norm=10.50 > 3*1.00)
TEST (d): HQMQ bpe s96_r4 ~3.79 ... base_bpe_r4=3.7925 corrected(p=0.02)=4.2866 PASS
TEST (e): HQMQ bpe s96_r6 (base formula, no outlier correction) ... base=4.2925 with_flag=4.5425 corrected(p=0.07)=5.3620 PASS
TEST (f): E8 bpe formula K4V2 and K3V2 ... K4V2=2.0000 K3V2=1.9375 PASS (K4V2=2.000 > K3V2=1.938)
All smoke tests PASSED.
exit code: 0
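
The HQMQ bpe figures in tests (d) and (e) can be reproduced with simple arithmetic. The sketch below is one formula that matches the logged numbers, not necessarily the one used in `smoke_test.py`. It assumes 4-dimensional blocks, a 24*S codebook index plus `radius_bits` bits per block, a 1-bit outlier flag per block, and 16-bit storage for the elements of flagged blocks. The names `base_bpe`, `flagged_bpe`, and `corrected_bpe` are hypothetical.

```python
import math

BLOCK_DIM = 4    # assumption: one codeword covers 4 elements
FP16_BITS = 16   # assumption: flagged blocks are stored at 16 bits per element


def base_bpe(s, radius_bits):
    """Bits per element for the codebook index plus the radius code."""
    return (math.log2(24 * s) + radius_bits) / BLOCK_DIM


def flagged_bpe(s, radius_bits):
    """Base bpe plus a 1-bit outlier flag per block."""
    return base_bpe(s, radius_bits) + 1 / BLOCK_DIM


def corrected_bpe(s, radius_bits, outlier_frac):
    """Flagged bpe plus the extra cost of storing outlier blocks at 16 bits."""
    base = base_bpe(s, radius_bits)
    return flagged_bpe(s, radius_bits) + outlier_frac * (FP16_BITS - base)


print(f"s96_r4 base      = {base_bpe(96, 4):.4f}")             # 3.7925
print(f"s96_r4 corrected = {corrected_bpe(96, 4, 0.02):.4f}")  # 4.2866
print(f"s96_r6 base      = {base_bpe(96, 6):.4f}")             # 4.2925
print(f"s96_r6 with_flag = {flagged_bpe(96, 6):.4f}")          # 4.5425
print(f"s96_r6 corrected = {corrected_bpe(96, 6, 0.07):.4f}")  # 5.3620
```

Under these assumptions the s96_r6 base cost is about 4.29 bpe, and the roughly 5 bpe operating point only appears once the flag bit and the outlier fraction are included.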

Kaggle push: jagmardrop/nq-hqmq-qwen3 version 1 (RUNNING at push time)
https://www.kaggle.com/code/jagmardrop/nq-hqmq-qwen3

Diff stat: 3 files changed, 806 insertions(+)

Extends the Mistral faithful-HQMQ kernel to Hurwitz's own evaluation model.
Tests HQMQ s96_r6 (Hurwitz Qwen3-8B operating point, ~5 bpe) and s96_r4 (~3.79 bpe)
against E8 K4V2 and K3V2 on the PPL-vs-bpe Pareto frontier. GPU result pending.

- HQMQ algorithm unchanged (24*S product code, Med3x C=3, no rotation, per spec)
- Model: Qwen/Qwen3-8B with NF4 weights to fit T4 x2
- Paired PPL harness: n>=60 WikiText-103 segments, prefix=1024, cont=512
- smoke_test.py: all 6 assertions pass (exit 0)
- Target: jagmardrop/nq-hqmq-qwen3

Add _get_kv/_set_kv/_n_layers_kv compat shims that handle both the old
key_cache/value_cache list API and the new layers[i].keys/values API
introduced in transformers>=5.12. Replace all direct cache.key_cache[i]
and cache.value_cache[i] accesses in clone_cache, apply_hqmq, and
apply_e8_cache with these shims.