C6: Mixtral-8x7B T4x2 AQUA-iso paired PPL + NIAH kernel (v3, memory-fit) #39
Draft
jagmarques wants to merge 1 commit into
v1 OOMed (93GB bf16 download), v2 OOMed (15GiB/card, paired-clone on GPU).
v3 fixes: max_memory={0:11GiB, 1:11GiB, cpu:26GiB} + offload_folder pushes
expert blocks to CPU/disk; CPU-master pattern keeps one GPU KV copy at a
time enabling AQUA-iso PAIRED deltas (n=80 >= contract floor 60).
Mirror: unsloth/Mixtral-8x7B-Instruct-v0.1-bnb-4bit (24.5GB, Apache-2.0,
gated=false, no HF_TOKEN). Local smoke: AST OK, E8 ordering OK, json OK.
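
A minimal sketch of how the v3 memory caps described above could be passed to a Hugging Face `from_pretrained` call. The helper names (`build_load_kwargs`, `total_budget_gib`, `load_model`) are hypothetical, and `device_map="auto"` is an assumption that does not appear in this PR. Whether the pre-quantized bnb-4bit checkpoint accepts CPU/disk offload without further flags is not confirmed here.

```python
# Hypothetical sketch; helper names are illustrative and not taken from the PR.
MODEL_ID = "unsloth/Mixtral-8x7B-Instruct-v0.1-bnb-4bit"


def build_load_kwargs(offload_dir="/kaggle/working/offload"):
    """Keyword arguments mirroring the v3 memory caps from the commit message."""
    return {
        "device_map": "auto",  # assumption: let accelerate place layers per the caps
        "max_memory": {0: "11GiB", 1: "11GiB", "cpu": "26GiB"},
        "offload_folder": offload_dir,  # disk spill for whatever exceeds CPU RAM
        "offload_state_dict": True,
    }


def total_budget_gib(max_memory):
    """Sum the per-device caps (all expressed in GiB) as a quick sanity check."""
    return sum(int(cap.removesuffix("GiB")) for cap in max_memory.values())


def load_model(model_id=MODEL_ID):
    """Load the checkpoint with the caps above (downloads the weights)."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(model_id, **build_load_kwargs())


if __name__ == "__main__":
    kwargs = build_load_kwargs()
    print(total_budget_gib(kwargs["max_memory"]), "GiB total budget")
```

The caps sum to 48 GiB against a checkpoint of roughly 24.5GB, so the remainder is what is left for activations, the KV cache, and loading overhead.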
What
Adds experiments/kaggle/nq_mixtral_t4fit/ - the v3 Mixtral-8x7B NF4 kernel for C6 coverage (MoE architecture, Kaggle T4x2).
Why prior versions OOMed
- nq_mixtral_lean): re-prefilled per config (unpaired, n=25), still OOMed during 4K NIAH prefill with max_memory=15GiB/card.
v3 memory strategy
Three changes fix the OOM:
- max_memory={0:"11GiB", 1:"11GiB", "cpu":"26GiB"} - pushes several expert blocks to CPU RAM. The 25GB of NF4 weights are split across 2x T4, with overflow to CPU/disk.
- offload_folder=/kaggle/working/offload + offload_state_dict=True - disk safety valve for anything that also exceeds CPU RAM.
Contract compliance (C6)
- unsloth/Mixtral-8x7B-Instruct-v0.1-bnb-4bit - verified ungated (gated=false, Apache-2.0, ~24.5GB, no HF_TOKEN).
- /kaggle/working/nq_mixtral_t4fit.json before any print statements.
Proof of work
GPU result pending - merge only after the kernel output confirms NIAH hits > 0.