C6: Mixtral-8x7B T4x2 AQUA-iso paired PPL + NIAH kernel (v3, memory-fit) by jagmarques · Pull Request #39 · jagmarques/nexusquant

jagmarques · 2026-06-16T06:37:12Z

What

Adds experiments/kaggle/nq_mixtral_t4fit/ - the v3 Mixtral-8x7B NF4 kernel for C6 coverage (MoE architecture, Kaggle T4x2).

Why prior versions OOMed

v1: downloaded the 93GB bf16 repo and crashed mid-download.
v2 (nq_mixtral_lean): re-prefilled per config (unpaired, n=25) and still OOMed during the 4K NIAH prefill with max_memory=15GiB per card.

v3 memory strategy

Three changes fix the OOM:

max_memory={0:"11GiB", 1:"11GiB", "cpu":"26GiB"} - caps each T4 at 11GiB, which pushes several expert blocks to CPU RAM. The ~25GB of NF4 weights are split across the 2x T4, with overflow to CPU and then disk.
offload_folder=/kaggle/working/offload + offload_state_dict=True - disk safety valve for anything that does not fit in the CPU budget either.
CPU-master pattern: after prefill, the KV master moves to CPU immediately. Per config: clone the CPU master to GPU, quantize, score the identical continuation tokens, delete the GPU clone. Only one GPU KV copy exists at a time, which is what makes AQUA-iso PAIRED deltas fit in memory (the contract requires paired deltas; v2 was unpaired).
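A minimal sketch of the CPU-master loop, assuming the KV cache is a list of per-layer (keys, values) pairs. `score_paired`, `quantize`, `score`, and `to_device` are hypothetical names, not the kernel's API. In the kernel the clone step would presumably be a torch `.to("cuda")` copy, with `torch.cuda.empty_cache()` after the delete; `copy.deepcopy` stands in here so the sketch runs without a GPU.

```python
import copy
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical layout: one (keys, values) pair per layer.
KVCache = List[Tuple[Any, Any]]


def score_paired(
    kv_master_cpu: KVCache,
    configs: Dict[str, dict],
    quantize: Callable[[KVCache, dict], KVCache],
    score: Callable[[KVCache], float],
    to_device: Callable[[Any], Any] = copy.deepcopy,
) -> Dict[str, float]:
    """Score every quantization config against the SAME prefilled KV master.

    The master is never modified, so every config sees identical inputs and the
    per-config results are paired. Each config's device copies are released
    before the next clone is made.
    """
    results: Dict[str, float] = {}
    for name, cfg in configs.items():
        # Clone the CPU master onto the device (torch: k.to("cuda")).
        kv_dev = [(to_device(k), to_device(v)) for k, v in kv_master_cpu]
        # Quantize the clone; quantize may work in place because it owns the clone.
        kv_q = quantize(kv_dev, cfg)
        # Score the same continuation tokens for every config.
        results[name] = score(kv_q)
        # Drop the device copies before cloning again (torch: then empty_cache()).
        del kv_dev, kv_q
    return results


if __name__ == "__main__":
    def round_to_step(kv: KVCache, cfg: dict) -> KVCache:
        # Toy quantizer: round every entry to a multiple of cfg["step"], in place.
        step = cfg["step"]
        for k, v in kv:
            for arr in (k, v):
                for i, x in enumerate(arr):
                    arr[i] = round(x / step) * step
        return kv

    def total(kv: KVCache) -> float:
        return sum(sum(k) + sum(v) for k, v in kv)

    master = [([1.3, 2.6], [3.1, 4.4])]
    res = score_paired(master, {"coarse": {"step": 1.0}, "fine": {"step": 0.5}},
                       round_to_step, total)
    print(res)     # {'coarse': 11.0, 'fine': 11.5}
    print(master)  # unchanged: [([1.3, 2.6], [3.1, 4.4])]
```

The toy quantizer writes into its clone, yet the master prints unchanged; that invariant is what keeps every config scored on identical inputs.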

Contract compliance (C6)

Mirror: unsloth/Mixtral-8x7B-Instruct-v0.1-bnb-4bit - verified ungated (gated=false, Apache-2.0, ~24.5GB, no HF_TOKEN).
PPL: AQUA-iso paired, n=80 segments (contract floor: 60), 1024-token prefix + 512-token continuation; reports mean +/- SEM, z, and significance at 2 sigma.
NIAH: ctx=4096 preferred, falling back to ctx=2048 on OOM. FP16 baseline gate active.
JSON saved to /kaggle/working/nq_mixtral_t4fit.json before any print statements.
No hardcoded secrets; HF_TOKEN read from env (None is fine for ungated mirror).
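The PPL statistics can be sketched as below. This assumes, as an illustration rather than the kernel's definition, that each segment's paired delta is quantized-cache PPL minus FP16-cache PPL on the same prefix and continuation. `paired_delta_stats` and `MIN_SEGMENTS` are hypothetical names; SEM is the sample standard deviation over sqrt(n), z = mean / SEM, and a delta is flagged significant when |z| >= 2.

```python
import math
import statistics
from typing import Sequence

MIN_SEGMENTS = 60  # contract floor on paired segments (the PR runs n=80)


def paired_delta_stats(ppl_fp16: Sequence[float], ppl_quant: Sequence[float]) -> dict:
    """Mean +/- SEM, z-score and 2-sigma flag for per-segment PPL deltas.

    Both sequences must come from the SAME segments in the same order, so the
    segment-to-segment variation in PPL cancels out of each delta.
    """
    if len(ppl_fp16) != len(ppl_quant):
        raise ValueError("paired inputs must have equal length")
    n = len(ppl_fp16)
    if n < MIN_SEGMENTS:
        raise ValueError(f"n={n} is below the contract floor of {MIN_SEGMENTS}")
    # Assumed delta definition: quantized-cache PPL minus FP16-cache PPL.
    deltas = [q - f for f, q in zip(ppl_fp16, ppl_quant)]
    mean = statistics.fmean(deltas)
    sem = statistics.stdev(deltas) / math.sqrt(n)  # sample SD / sqrt(n)
    z = mean / sem if sem > 0 else None  # undefined when every delta is equal
    return {
        "n": n,
        "mean": mean,
        "sem": sem,
        "z": z,
        "sig_2sigma": z is not None and abs(z) >= 2.0,
    }
```

Pairing is why the CPU-master clone matters: with independent prefills per config, the segment-to-segment PPL spread would stay in the comparison and typically inflate the SEM.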

Proof of work

VERIFY-WITH: python3 -m ast nq_mixtral_t4fit.py && python3.11 smoke_test (run inline)

AST parse: OK
LOCAL SMOKE outputs:
  imports OK
  K3V2_pb0: mean_k_err=0.1512, mean_v_err=0.3073
  K4V2_pb0: mean_k_err=0.0769, mean_v_err=0.3057
  E8 ordering check: K4 rel_err < K3 rel_err OK
  save-before-print + json round-trip: OK
  None guard: OK (K3 mean=n/a% +/-n/a%)
  LOCAL SMOKE: ALL PASS
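A sketch of the save-before-print rule and the None guard checked by the smoke test, assuming results are a dict of per-config dicts with optional `mean`/`sem` entries. Only the output path comes from the PR; `save_then_report` and `fmt_pm` are hypothetical helpers. Writing first means a later formatting error (for example, `f"{None:.2f}"` raises TypeError) cannot lose the numbers.

```python
import json
import os
from typing import Optional

OUT_PATH = "/kaggle/working/nq_mixtral_t4fit.json"  # output path from the PR


def fmt_pm(mean: Optional[float], sem: Optional[float]) -> str:
    """Render 'mean% +/-sem%', printing 'n/a' instead of failing on None."""
    m = "n/a" if mean is None else f"{mean:.2f}"
    s = "n/a" if sem is None else f"{sem:.2f}"
    return f"{m}% +/-{s}%"


def save_then_report(results: dict, path: str = OUT_PATH) -> dict:
    """Persist results as JSON first, verify the round-trip, and only then print."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "w") as fh:
        json.dump(results, fh, indent=2)  # on disk before any print can fail
    with open(path) as fh:
        reloaded = json.load(fh)
    if reloaded != results:  # e.g. tuples or non-str keys would not survive
        raise ValueError("JSON round-trip changed the results")
    for name, r in reloaded.items():
        print(f"{name} mean={fmt_pm(r.get('mean'), r.get('sem'))}")
    return reloaded
```

With `{"K3": {"mean": None, "sem": None}}` this prints `K3 mean=n/a% +/-n/a%`, the same shape as the smoke-test line above.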

Branch push: company/c6-mixtral-t4fit @ 08:35 Amsterdam (before 09:00 freeze)
Git commit: 1a08900

Kaggle push (jooand account): jooandrgomesmarques/nq-mixtral-t4fit v1
Status check: KernelWorkerStatus.RUNNING
URL: https://www.kaggle.com/code/jooandrgomesmarques/nq-mixtral-t4fit

Diff stat: 2 files changed, 801 insertions(+)

GPU result pending; merge only after the kernel output confirms NIAH hits > 0.

v1 OOMed (93GB bf16 download), v2 OOMed (15GiB/card, paired-clone on GPU). v3 fixes: max_memory={0:11GiB, 1:11GiB, cpu:26GiB} + offload_folder pushes expert blocks to CPU/disk; CPU-master pattern keeps one GPU KV copy at a time enabling AQUA-iso PAIRED deltas (n=80 >= contract floor 60). Mirror: unsloth/Mixtral-8x7B-Instruct-v0.1-bnb-4bit (24.5GB, Apache-2.0, gated=false, no HF_TOKEN). Local smoke: AST OK, E8 ordering OK, json OK.

jagmarques wants to merge 1 commit into main from company/c6-mixtral-t4fit
