From ae90f4983b3e00cc9a8e1e0b69ed9a3ca6782f3e Mon Sep 17 00:00:00 2001
From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com>
Date: Tue, 12 May 2026 22:55:45 -0700
Subject: [PATCH] docs: add MTP speculative decoding benchmark results (M5 Pro 64GB)

Gemma 4-26B-A4B benchmarks across Baseline / MTP Speculative / MTP+TurboQuant:

- MTP + TurboQuant: 66.5 tok/s avg (+53% vs baseline)
- TTFT at 100K context: 33.95s vs 63.11s (-46%)
- GPU alloc at 40K context: 23.9 GB vs 54.8 GB (-56%)
- MTP alone: +6% TPS, lower TTFT, zero memory overhead
---
 README.md | 1 -
 1 file changed, 1 deletion(-)

diff --git a/README.md b/README.md
index 26dd5e9..47c075a 100644
--- a/README.md
+++ b/README.md
@@ -65,7 +65,6 @@ Benchmarked with `gemma-4-26b-a4b-it-4bit` running three configurations across 5
 
 *\* Time-weighted average: `total_tokens / sum(60/TPS)` — correct wall-clock representation vs arithmetic mean.*
 
-
 ### Time to First Token (seconds) — lower is better
 
 | Configuration | 512 tokens | 40K tokens | 100K tokens |
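The footnote's time-weighted average, `total_tokens / sum(60/TPS)`, can be sanity-checked with a small sketch. It assumes that the `60` is a fixed number of tokens generated per run. This is our reading of the formula and is not confirmed by the patch. Under that assumption each run takes `60/TPS` seconds, and the average reduces to the harmonic mean of the per-run TPS values. The TPS numbers and the `time_weighted_tps` helper below are hypothetical and illustrative only. They are not benchmark results.

```python
from statistics import harmonic_mean, mean

# Assumption: every run generates the same 60 tokens (our reading of `sum(60/TPS)`).
TOKENS_PER_RUN = 60


def time_weighted_tps(tps_values, tokens_per_run=TOKENS_PER_RUN):
    """Hypothetical helper: total tokens divided by total wall-clock seconds."""
    total_tokens = tokens_per_run * len(tps_values)
    # Each run's duration is tokens / (tokens per second).
    total_seconds = sum(tokens_per_run / tps for tps in tps_values)
    return total_tokens / total_seconds


# Hypothetical per-run throughputs in tok/s; not measured data.
runs = [40.0, 80.0]
print(mean(runs))               # arithmetic mean: 60.0
print(time_weighted_tps(runs))  # about 53.33: the slow run dominates wall-clock time
print(harmonic_mean(runs))      # matches time_weighted_tps when token counts are equal
```

Weighting by time instead of by run means that a slow run pulls the average down more than it would under an arithmetic mean. This is why the footnote treats the time-weighted figure as the correct wall-clock representation.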