docs: add MTP speculative decoding benchmark results (M5 Pro 64GB) by solderzzc · Pull Request #106 · SharpAI/SwiftLM

solderzzc · 2026-05-13T16:53:37Z

Adds the time-weighted average TPS metric and finalized MTP speculative decoding benchmarks on M5 Pro 64GB.
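The PR does not spell out how the time-weighted average is computed. A minimal sketch, assuming it means total generated tokens divided by total generation time (so each run's TPS is weighted by its duration), might look like this. The function name and the sample numbers are hypothetical and are not taken from SwiftLM or from the benchmark data.

```python
def time_weighted_tps(runs):
    """Return tokens per second weighted by run duration.

    `runs` is an iterable of (tokens_generated, seconds) pairs.
    Assumption: "time-weighted" means sum(tokens) / sum(seconds),
    which equals sum(tps_i * t_i) / sum(t_i).
    """
    runs = list(runs)
    total_tokens = sum(tokens for tokens, _ in runs)
    total_seconds = sum(seconds for _, seconds in runs)
    if total_seconds <= 0:
        raise ValueError("total duration must be positive")
    return total_tokens / total_seconds


# Hypothetical runs: 50 tok/s for 10 s, then 30 tok/s for 30 s.
sample_runs = [(500, 10.0), (900, 30.0)]
weighted = time_weighted_tps(sample_runs)                       # 1400 / 40
simple_mean = sum(t / s for t, s in sample_runs) / len(sample_runs)
```

Under this assumption the longer, slower run dominates the result, whereas a plain mean of per-run TPS would treat both runs equally.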

Gemma 4-26B-A4B benchmarks across Baseline / MTP Speculative / MTP+TurboQuant:

- MTP + TurboQuant: 66.5 tok/s avg (+53% vs baseline)
- TTFT at 100K context: 33.95s vs 63.11s (-46%)
- GPU alloc at 40K context: 23.9 GB vs 54.8 GB (-56%)
- MTP alone: +6% TPS, lower TTFT, zero memory overhead
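The relative deltas quoted above can be reproduced from the paired figures. A small sketch, assuming each percentage is computed as (new - old) / old; the helper name is hypothetical:

```python
def percent_change(new, old):
    """Relative change from `old` to `new`, in percent."""
    return (new - old) / old * 100.0


# Paired figures as reported in the PR description.
ttft_delta = percent_change(33.95, 63.11)   # TTFT at 100K context, seconds
gpu_delta = percent_change(23.9, 54.8)      # GPU alloc at 40K context, GB

print(f"TTFT: {ttft_delta:.1f}%  GPU alloc: {gpu_delta:.1f}%")
```

Rounded to whole percentages, both values agree with the -46% and -56% figures listed in the description.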

Copilot

Pull request overview

Cleans up the README’s MTP speculative decoding benchmark section by removing an extra whitespace-only line so the markdown renders more consistently.

Changes:

Removed a stray whitespace-only line between the footnote and the next subsection header in the benchmark section.

Copilot AI review requested due to automatic review settings May 13, 2026 16:53

Copilot AI reviewed May 13, 2026

solderzzc closed this May 13, 2026

solderzzc deleted the docs/mtp-benchmarks-m5pro branch May 13, 2026 16:55

solderzzc wants to merge 1 commit into main from docs/mtp-benchmarks-m5pro