Add Mixtral-8x7B MoE KV-compression kernel and results #45

Merged
jagmarques merged 1 commit into main from company/c6-mixtral-clean on Jun 19, 2026

@jagmarques

Owner

Adds K3V2/K4V2 quant-only KV-compression coverage for Mixtral-8x7B-Instruct (NF4 weights) on a single T4 via per-layer CPU/GPU streaming. PPL deltas (AQUA-iso paired, n=80) relative to the NF4-weights baseline: K3V2 +0.63%, K4V2 +0.32%. Needle retrieval at 4K context holds at 5/5 for FP16, K3V2, and K4V2.
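
For readers unfamiliar with the naming, here is a minimal sketch of what quant-only KV compression could look like. It assumes that K3V2 means 3-bit keys with 2-bit values and that a plain min-max uniform quantizer is used. Both are assumptions for illustration, and `quantize_dequantize` is a hypothetical helper, not the kernel added in this PR.

```python
def quantize_dequantize(values, bits):
    """Round-trip a group of floats through a uniform min-max quantizer.

    Illustrative only: the grouping, rounding, and storage format used by
    the actual kernel are not described in this PR.
    """
    values = list(values)
    levels = (1 << bits) - 1          # e.g. 3 bits -> 7 steps, 8 codes
    lo, hi = min(values), max(values)
    if hi == lo:                      # constant group: nothing to quantize
        return values
    scale = (hi - lo) / levels
    # Map each value to its nearest integer code, then back to a float.
    return [lo + round((v - lo) / scale) * scale for v in values]


# Assumed reading of "K3V2": keys at 3 bits, values at 2 bits.
key_group = [0.0, 0.1, 0.5, 0.9, 1.0]
value_group = [0.0, 0.1, 0.5, 0.9, 1.0]
k_hat = quantize_dequantize(key_group, bits=3)
v_hat = quantize_dequantize(value_group, bits=2)
```

With this kind of quantizer the reconstruction error per element is bounded by half a quantization step, which is why lower bit widths for values (2 bits) are the more aggressive setting.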

Closes the MoE-architecture coverage gap.
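
As a rough illustration of how a paired PPL delta like the ones quoted above can be computed, the sketch below assumes each sample contributes a mean per-token negative log-likelihood (in nats) over the same number of tokens in both runs. The function name `ppl_delta_percent` and that equal-length assumption are hypothetical and not taken from this PR's evaluation code.

```python
import math


def ppl_delta_percent(baseline_nll, compressed_nll):
    """Relative perplexity change in percent between two paired runs.

    Each list holds one mean per-token NLL (nats) per sample, with sample i
    in both lists referring to the same text. Assumes equal token counts per
    sample so that a plain average is the corpus-level mean NLL.
    """
    if len(baseline_nll) != len(compressed_nll) or not baseline_nll:
        raise ValueError("need two non-empty lists of equal length")
    n = len(baseline_nll)
    # Paired difference: average the per-sample NLL gap.
    mean_diff = sum(c - b for b, c in zip(baseline_nll, compressed_nll)) / n
    # PPL = exp(mean NLL), so the PPL ratio is exp(mean NLL difference).
    return (math.exp(mean_diff) - 1.0) * 100.0
```

Under these assumptions, a uniform NLL increase of `log(1.0063)` nats per token corresponds to a +0.63% perplexity delta.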

https://claude.ai/code/session_012T2q1cWGCTY963GGFXRZA7

@chatgpt-codex-connector

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

@jagmarques jagmarques merged commit a48f990 into main Jun 19, 2026
3 checks passed
@jagmarques jagmarques deleted the company/c6-mixtral-clean branch June 19, 2026 21:05