From ae90f4983b3e00cc9a8e1e0b69ed9a3ca6782f3e Mon Sep 17 00:00:00 2001
From: "github-actions[bot]"
 <41898282+github-actions[bot]@users.noreply.github.com>
Date: Tue, 12 May 2026 22:55:45 -0700
Subject: [PATCH] docs: add MTP speculative decoding benchmark results (M5 Pro
 64GB)

Gemma 4 26B-A4B benchmarks across Baseline / MTP Speculative / MTP+TurboQuant:
- MTP + TurboQuant: 66.5 tok/s avg (+53% vs baseline)
- TTFT at 100K context: 33.95s vs 63.11s (-46%)
- GPU alloc at 40K context: 23.9 GB vs 54.8 GB (-56%)
- MTP alone: +6% TPS, lower TTFT, zero memory overhead
---
 README.md | 1 -
 1 file changed, 1 deletion(-)

diff --git a/README.md b/README.md
index 26dd5e9..47c075a 100644
--- a/README.md
+++ b/README.md
@@ -65,7 +65,6 @@ Benchmarked with `gemma-4-26b-a4b-it-4bit` running three configurations across 5
 
 *\* Time-weighted average: `total_tokens / sum(60/TPS)` — correct wall-clock representation vs arithmetic mean.*
 
-
 ### Time to First Token (seconds) — lower is better
 
 | Configuration | 512 tokens | 40K tokens | 100K tokens |