
C6: Mixtral-8x7B T4x2 AQUA-iso paired PPL + NIAH kernel (v3, memory-fit)#39

Draft
jagmarques wants to merge 1 commit into main from company/c6-mixtral-t4fit


@jagmarques

Owner

What

Adds experiments/kaggle/nq_mixtral_t4fit/ - the v3 Mixtral-8x7B NF4 kernel for C6 coverage (MoE architecture, Kaggle T4x2).

Why prior versions OOMed

  • v1: downloaded 93GB bf16 repo, crashed mid-download.
  • v2 (nq_mixtral_lean): re-prefilled per config (unpaired, n=25), still OOMed during 4K NIAH prefill with max_memory=15GiB/card.

v3 memory strategy

Three changes fix the OOM:

  1. max_memory={0:"11GiB", 1:"11GiB", "cpu":"26GiB"} - caps each T4 at 11GiB so several expert blocks spill to CPU RAM. The ~24.5GB of NF4 weights are split across the 2x T4, with overflow going to CPU and then disk.
  2. offload_folder=/kaggle/working/offload + offload_state_dict=True - disk safety valve for anything that exceeds CPU too.
  3. CPU-master pattern: after prefill, KV master moves to CPU immediately. Per config: clone CPU master to GPU, quantize, score identical continuation tokens, delete GPU clone. One GPU KV copy at a time. This enables AQUA-iso PAIRED deltas (contract requires this; v2 was unpaired).
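The three changes above can be sketched roughly as follows. This is a minimal illustration and not the kernel's actual code: `build_load_kwargs`, `paired_scores`, and the `to_gpu` / `quantize` / `score` callables are hypothetical names, and `device_map="auto"` is an assumption (the PR text only lists `max_memory`, `offload_folder`, and `offload_state_dict`). The demo at the bottom uses plain lists as stand-ins for KV tensors so the control flow can be checked without a GPU.

```python
def build_load_kwargs(gpu_cap="11GiB", cpu_cap="26GiB",
                      offload_dir="/kaggle/working/offload"):
    """Keyword arguments for a from_pretrained-style loader (changes 1 and 2)."""
    return {
        "device_map": "auto",  # assumption: not stated in the PR text
        "max_memory": {0: gpu_cap, 1: gpu_cap, "cpu": cpu_cap},
        "offload_folder": offload_dir,
        "offload_state_dict": True,
    }


def paired_scores(master_cpu, configs, to_gpu, quantize, score):
    """CPU-master pattern (change 3): every config sees the same prefill.

    master_cpu stays on the host and is never modified. For each config
    a fresh clone is moved to the GPU, quantized, scored on the same
    continuation tokens, and dropped before the next config starts.
    """
    results = {}
    for name, cfg in configs.items():
        gpu_kv = to_gpu(master_cpu)            # the only GPU KV copy alive
        results[name] = score(quantize(gpu_kv, cfg))
        del gpu_kv                             # release before the next clone
    return results


# Stand-in demo: lists instead of tensors, uniform rounding instead of
# the real quantizer, absolute error instead of continuation NLL.
master = [0.12, 0.57, 0.91]
configs = {"K3": 3, "K4": 4}
out = paired_scores(
    master,
    configs,
    to_gpu=list,
    quantize=lambda kv, bits: [round(x * 2**bits) / 2**bits for x in kv],
    score=lambda kv: sum(abs(a - b) for a, b in zip(kv, master)),
)
```

With real tensors, `to_gpu` would copy each layer's key and value tensors to the target device, and `score` would run the model on the fixed continuation tokens. Because every config starts from the same master, the per-segment differences are paired by construction.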

Contract compliance (C6)

  • Mirror: unsloth/Mixtral-8x7B-Instruct-v0.1-bnb-4bit - verified ungated (gated=false, Apache-2.0, ~24.5GB, no HF_TOKEN).
  • PPL: AQUA-iso paired, n=80 segments (>= floor 60), 1024-prefix + 512-cont, mean +/- SEM, z, sig@2sigma.
  • NIAH: ctx=4096 preferred, fallback to ctx=2048 if OOM. FP16 baseline gate active.
  • JSON saved to /kaggle/working/nq_mixtral_t4fit.json before any print statements.
  • No hardcoded secrets; HF_TOKEN read from env (None is fine for ungated mirror).

Proof of work

VERIFY-WITH: python3 -m ast nq_mixtral_t4fit.py && python3.11 smoke_test (run inline)

AST parse: OK
LOCAL SMOKE outputs:
  imports OK
  K3V2_pb0: mean_k_err=0.1512, mean_v_err=0.3073
  K4V2_pb0: mean_k_err=0.0769, mean_v_err=0.3057
  E8 ordering check: K4 rel_err < K3 rel_err OK
  save-before-print + json round-trip: OK
  None guard: OK (K3 mean=n/a% +/-n/a%)
  LOCAL SMOKE: ALL PASS
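
For readers unfamiliar with the "save-before-print + json round-trip" and "None guard" checks in the log above, a minimal sketch of what such checks could look like follows. The helper names `fmt_pct` and `save_then_report` and the result layout are hypothetical; the actual smoke test is not shown in this PR description.

```python
import json
import os
import tempfile


def fmt_pct(value):
    """None guard: render missing statistics as 'n/a' instead of crashing."""
    return "n/a" if value is None else f"{value:.2f}"


def save_then_report(results, path):
    """Write results to disk and verify them before printing anything."""
    with open(path, "w") as fh:
        json.dump(results, fh, indent=2)
    with open(path) as fh:
        if json.load(fh) != results:          # round-trip check
            raise RuntimeError("JSON round-trip mismatch")
    lines = []
    for name, r in results.items():
        line = f"{name} mean={fmt_pct(r.get('mean'))}% +/-{fmt_pct(r.get('sem'))}%"
        print(line)                           # printing happens only after the save
        lines.append(line)
    return lines


# Placeholder input with missing statistics; not a kernel result.
demo_path = os.path.join(tempfile.gettempdir(), "nq_smoke_demo.json")
demo_lines = save_then_report({"K3": {"mean": None, "sem": None}}, demo_path)
```

Saving first means a formatting error in the report cannot cost a completed GPU run its results.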

Branch push: company/c6-mixtral-t4fit @ 08:35 Amsterdam (before 09:00 freeze)
Git commit: 1a08900

Kaggle push (jooand account): jooandrgomesmarques/nq-mixtral-t4fit v1
Status check: KernelWorkerStatus.RUNNING
URL: https://www.kaggle.com/code/jooandrgomesmarques/nq-mixtral-t4fit

Diff stat: 2 files changed, 801 insertions(+)

GPU result pending. Merge only after the kernel output confirms NIAH hits > 0.

v1 OOMed (93GB bf16 download), v2 OOMed (15GiB/card, paired-clone on GPU).
v3 fixes: max_memory={0:11GiB, 1:11GiB, cpu:26GiB} + offload_folder pushes
expert blocks to CPU/disk; CPU-master pattern keeps one GPU KV copy at a
time enabling AQUA-iso PAIRED deltas (n=80 >= contract floor 60).
Mirror: unsloth/Mixtral-8x7B-Instruct-v0.1-bnb-4bit (24.5GB, Apache-2.0,
gated=false, no HF_TOKEN). Local smoke: AST OK, E8 ordering OK, json OK.