Add Mixtral-8x7B MoE KV-compression kernel and results by jagmarques · Pull Request #45 · jagmarques/nexusquant

jagmarques · 2026-06-19T20:57:40Z

Adds K3V2/K4V2 quant-only KV-cache compression coverage for Mixtral-8x7B-Instruct (NF4 weights) on a single T4 via per-layer CPU/GPU streaming. Perplexity (PPL) deltas relative to the NF4-weights baseline (AQUA-iso paired, n=80): K3V2 +0.63%, K4V2 +0.32%. Needle retrieval at 4K context holds at 5/5 for FP16, K3V2, and K4V2.
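The PR text does not describe the quantizer itself. To roughly illustrate what a setting like K3V2 presumably means (3-bit keys, 2-bit values; K4V2 would use 4-bit keys), here is a minimal sketch that assumes plain asymmetric min-max uniform quantization with one group per vector. It is not nexusquant's kernel. All function names (`quantize_group`, `dequantize_group`, `roundtrip`, `compress_kv`, `max_abs_error`) are hypothetical.

```python
"""Hypothetical KxVy KV-cache quantization round trip (illustration only, not nexusquant's kernel)."""
from typing import List, Tuple


def quantize_group(values: List[float], bits: int) -> Tuple[List[int], float, float]:
    """Asymmetric min-max uniform quantization of one group to `bits`-bit integer codes."""
    levels = (1 << bits) - 1  # highest code: 7 for 3-bit, 3 for 2-bit
    lo, hi = min(values), max(values)
    # Step size between adjacent codes; use 1.0 for constant groups to avoid dividing by zero.
    scale = (hi - lo) / levels if hi > lo else 1.0
    # Round to the nearest code and clamp to [0, levels] to absorb float drift.
    codes = [min(levels, max(0, round((v - lo) / scale))) for v in values]
    return codes, scale, lo


def dequantize_group(codes: List[int], scale: float, lo: float) -> List[float]:
    """Map integer codes back to approximate real values."""
    return [lo + c * scale for c in codes]


def roundtrip(values: List[float], bits: int) -> List[float]:
    """Quantize, then dequantize, one group."""
    codes, scale, lo = quantize_group(values, bits)
    return dequantize_group(codes, scale, lo)


def compress_kv(keys: List[List[float]], vals: List[List[float]],
                k_bits: int = 3, v_bits: int = 2) -> Tuple[List[List[float]], List[List[float]]]:
    """Quantize keys and values with separate bit widths (K3V2 by default)."""
    return ([roundtrip(k, k_bits) for k in keys],
            [roundtrip(v, v_bits) for v in vals])


def max_abs_error(a: List[float], b: List[float]) -> float:
    """Largest element-wise reconstruction error between two vectors."""
    return max(abs(x - y) for x, y in zip(a, b))
```

Under this scheme, each reconstructed element is off by at most half a quantization step (`scale / 2`). Because the step grows as the bit width shrinks, a 2-bit value cache is coarser than a 3- or 4-bit key cache. Passing `k_bits=4` gives the K4V2 analogue.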

Closes the MoE-architecture coverage gap.

https://claude.ai/code/session_012T2q1cWGCTY963GGFXRZA7


chatgpt-codex-connector · 2026-06-19T20:57:46Z

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

jagmarques merged commit a48f990 into main Jun 19, 2026
3 checks passed

jagmarques mentioned this pull request Jun 19, 2026 in #44: "C6: Mixtral-8x7B manual per-layer GPU streaming (bypass accelerate)" (Closed)

jagmarques deleted the company/c6-mixtral-clean branch June 19, 2026 21:05

jagmarques merged 1 commit into main from company/c6-mixtral-clean