
docs: add MTP speculative decoding benchmark results (M5 Pro 64GB) #106

Closed
solderzzc wants to merge 1 commit into main from docs/mtp-benchmarks-m5pro

@solderzzc
Member

Adds the time-weighted average TPS metric and finalized MTP speculative decoding benchmarks on M5 Pro 64GB.
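
The PR does not spell out how the time-weighted average TPS metric is computed. A minimal sketch, assuming it weights each run's tokens/s by that run's duration, which reduces to total tokens divided by total generation time. The `Run` type, the helper functions, and the sample numbers are hypothetical and only illustrate how this differs from a plain per-run mean:

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One benchmark request (hypothetical fields): tokens generated and decode time in seconds."""
    tokens: int
    seconds: float


def simple_mean_tps(runs: list[Run]) -> float:
    # Unweighted mean: every run counts equally, however long it took.
    return sum(r.tokens / r.seconds for r in runs) / len(runs)


def time_weighted_tps(runs: list[Run]) -> float:
    # Weight each run's TPS by its duration; algebraically this equals
    # total tokens / total time.
    total_time = sum(r.seconds for r in runs)
    return sum((r.tokens / r.seconds) * r.seconds for r in runs) / total_time


runs = [Run(tokens=100, seconds=2.0), Run(tokens=300, seconds=4.0)]
print(f"simple mean:   {simple_mean_tps(runs):.2f} tok/s")    # 62.50
print(f"time-weighted: {time_weighted_tps(runs):.2f} tok/s")  # 66.67
```

Under this assumption, long generations (for example, large-context runs) pull the average toward their own throughput instead of counting the same as short ones.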

Gemma 4-26B-A4B benchmarks across three configurations (Baseline, MTP Speculative, MTP + TurboQuant):
- MTP + TurboQuant: 66.5 tok/s avg (+53% vs baseline)
- TTFT at 100K context: 33.95s vs 63.11s (-46%)
- GPU alloc at 40K context: 23.9 GB vs 54.8 GB (-56%)
- MTP alone: +6% TPS, lower TTFT, zero memory overhead
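
The percentages above are relative changes against the baseline. A small sketch that recomputes two of them from the absolute figures quoted in the list; `pct_change` is a hypothetical helper, not part of the project:

```python
def pct_change(new: float, old: float) -> float:
    """Relative change of `new` versus `old`, in percent (negative means a reduction)."""
    return (new - old) / old * 100.0


# Absolute figures copied from the PR description (new value vs. baseline).
ttft = pct_change(33.95, 63.11)     # TTFT at 100K context, seconds
gpu_alloc = pct_change(23.9, 54.8)  # GPU allocation at 40K context, GB

print(f"TTFT:      {ttft:+.1f}%")       # -46.2%
print(f"GPU alloc: {gpu_alloc:+.1f}%")  # -56.4%
```

Both results round to the -46% and -56% reported above.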
Copilot AI review requested due to automatic review settings May 13, 2026 16:53
Contributor

Copilot AI left a comment

Pull request overview

Cleans up the README’s MTP speculative decoding benchmark section by removing an extra whitespace-only line so the markdown renders more consistently.

Changes:

  • Removed a stray whitespace-only line between the footnote and the next subsection header in the benchmark section.


@solderzzc solderzzc closed this May 13, 2026
@solderzzc solderzzc deleted the docs/mtp-benchmarks-m5pro branch May 13, 2026 16:55
2 participants