
Kaggle T4x2 Mistral RULER multi-needle paired beneficial-quant kernel (4K, n>=50)#25

Draft
jagmarques wants to merge 2 commits into main from company/gap-ruler-niah


jagmarques commented Jun 14, 2026

Owner

RULER-style multi-needle paired beneficial-quant kernels for Mistral-7B-Instruct-v0.3 at 4K context, in 8-needle and 24-needle variants. Uses the real E8 quant path (inverse_rope -> Hadamard -> per-head amax -> E8Lattice.nearest_point -> forward_rope) under a paired protocol (single prefill, then clone + quantize the KV per config), with n=50 trials and value-specific recall scoring.
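For readers unfamiliar with the lattice step, here is a minimal sketch of a nearest-point search in E8, using the standard construction E8 = D8 ∪ (D8 + 1/2). It is an illustration only: the function names `nearest_d8` and `nearest_e8` are hypothetical, it is not the repository's `E8Lattice.nearest_point`, and it leaves out the RoPE, Hadamard and per-head amax scaling steps.

```python
def nearest_d8(x):
    """Nearest point of D8: integer vectors whose coordinates sum to an even number."""
    r = [float(round(v)) for v in x]
    if int(sum(r)) % 2 == 0:
        return r
    # Odd parity: re-round the coordinate with the largest rounding error
    # in the opposite direction, which restores even parity at minimal cost.
    i = max(range(len(x)), key=lambda k: abs(x[k] - r[k]))
    r[i] += 1.0 if x[i] >= r[i] else -1.0
    return r


def nearest_e8(x):
    """Nearest point of E8, taken as the better of the D8 and D8 + 1/2 candidates."""
    if len(x) != 8:
        raise ValueError("E8 points have exactly 8 coordinates")
    a = nearest_d8(x)
    b = [v + 0.5 for v in nearest_d8([v - 0.5 for v in x])]
    dist_a = sum((p - q) ** 2 for p, q in zip(x, a))
    dist_b = sum((p - q) ** 2 for p, q in zip(x, b))
    return a if dist_a <= dist_b else b


print(nearest_e8([0.5] * 8))  # the all-halves vector is itself an E8 point
```

In a quantizer along these lines, each 8-dimensional block would presumably be divided by its scale before the search and multiplied back afterwards; how the kernel actually applies the per-head amax is not shown in this PR description.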

RESULTS (both COMPLETE, re-derived from cells, errors={}, anti-vacuous clean):

  • 8 needles: FP16 48/50, K3V2_pb0 48/50, K4V2_pb0 47/50.
  • 24 needles: FP16 49/50, K3V2_pb0 50/50, K4V2_pb0 48/50.

HONEST FRAMING: these are LOSSLESS confirmations on a multi-needle 4K task, NOT a beneficial-quant rescue result. FP16 saturates at 96-98% at 4K regardless of needle count, so there is no FP16 dead zone to rescue, and the paired McNemar test is non-discriminative (discordant counts b ≈ c). K3V2 at 50/50 on 24 needles is within noise of FP16 at 49/50, not a claimable win.
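To make the McNemar point concrete, here is a minimal sketch of the exact two-sided test on discordant pairs. It assumes b counts trials where FP16 is correct and the quantized config is wrong, and c counts the reverse; the helper name `mcnemar_exact` is hypothetical and this is not necessarily the statistic the harness computes.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the discordant pair counts b and c."""
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence either way
    k = min(b, c)
    # Binomial(n, 1/2) probability of a split at least as lopsided as (b, c), one tail.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# 24 needles, FP16 49/50 vs K3V2_pb0 50/50: the totals force b=0, c=1.
print(mcnemar_exact(0, 1))  # 1.0
```

Only the 24-needle K3V2_pb0 row pins down b and c from the totals alone (a 50/50 score leaves no quantized misses). The other rows need the per-trial outcomes from the cells.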

CONCLUSION: the rescue-bulletproofing test needs an FP16 dead zone, which requires long context (8K-32K) that OOMs on T4x2 at depth. It is A100-bound. Kept DRAFT: the harness is reusable for that A100 run; no headline claim, ties to no criterion, so not merged.

… (4K, n>=50)

Adds experiments/kaggle/nq_mistral_ruler_niah/: a RULER-style multi-needle NIAH
kernel for Mistral-7B-Instruct-v0.3 at 4K context with n=50 trials and 8 needles
per trial. Tests FP16, K3V2_pb0, and K4V2_pb0 via a paired protocol (single
prefill, clone+quantize per config) to produce defensible beneficial-quant evidence.
Results pending queued Kaggle run.

8-needle variant hit 48/50 FP16 (saturated). Bumping to 24 needles
increases retrieval interference in the same 4K context to push FP16
below saturation and expose beneficial-quant deltas.