Qwen3-8B faithful-HQMQ vs E8 Pareto comparison #36
Draft
jagmarques wants to merge 2 commits into
Extends the Mistral faithful-HQMQ kernel to Hurwitz's own evaluation model. Tests HQMQ s96_r6 (Hurwitz Qwen3-8B operating point, ~5 bpe) and s96_r4 (~3.79 bpe) against E8 K4V2 and K3V2 on the PPL-vs-bpe Pareto frontier. GPU result pending.

- HQMQ algorithm unchanged (24*S product code, Med3x C=3, no rotation, per spec)
- Model: Qwen/Qwen3-8B with NF4 weights to fit T4 x2
- Paired PPL harness: n>=60 WikiText-103 segments, prefix=1024, cont=512
- smoke_test.py: all 6 assertions pass (exit 0)
- Target: jagmardrop/nq-hqmq-qwen3
Add _get_kv/_set_kv/_n_layers_kv compat shims that handle both the old key_cache/value_cache list API and the new layers[i].keys/values API introduced in transformers>=5.12. Replace all direct cache.key_cache[i] and cache.value_cache[i] accesses in clone_cache, apply_hqmq, and apply_e8_cache with these shims.
Summary
Extends the faithful-HQMQ Mistral kernel to Hurwitz's own evaluation model (Qwen3-8B). Measures the PPL-vs-bpe Pareto frontier for HQMQ (s96_r6 ~5 bpe, s96_r4 ~3.79 bpe) and E8 (K4V2, K3V2) under identical harness conditions.
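For readers unfamiliar with the term, the Pareto frontier here is the set of configurations that no other configuration beats on both bits per element and perplexity. The sketch below is one way to extract it once measurements exist; the function name and the `(label, bpe, ppl)` tuple layout are assumptions for illustration and are not taken from the harness in this PR.

```python
from typing import List, Tuple

# (label, bits per element, perplexity); lower is better on both axes.
Point = Tuple[str, float, float]


def pareto_frontier(points: List[Point]) -> List[Point]:
    """Return the non-dominated points, ordered by increasing bpe."""
    # Sort by bpe, breaking ties by ppl so the best point at a given
    # bpe is visited first.
    ordered = sorted(points, key=lambda p: (p[1], p[2]))
    frontier: List[Point] = []
    best_ppl = float("inf")
    for label, bpe, ppl in ordered:
        # A point stays on the frontier only if it improves on the best
        # perplexity seen at any lower or equal bpe.
        if ppl < best_ppl:
            frontier.append((label, bpe, ppl))
            best_ppl = ppl
    return frontier
```

A configuration that spends more bits without lowering perplexity is dropped, which is the sense in which one scheme dominates another at a given budget.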
Status
GPU result pending. No PPL claim is made here. Draft until jagmardrop/nq-hqmq-qwen3 completes.
Proof of work