[Perf] Enable missing FlashInfer MoE autotuning by mmangkad · Pull Request #290 · lightseekorg/tokenspeed

mmangkad · 2026-05-28T05:07:28Z

Summary

I noticed some FlashInfer MoE backends were missing the existing first-call autotune flow. This PR enables it for FP8 Cutlass, NVFP4 Cutlass, and NVFP4 TRTLLM.
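
For reviewers unfamiliar with the pattern, a first-call autotune flow can be sketched in plain Python as below. This is an illustration only, not the tokenspeed or FlashInfer code: `FirstCallAutotuner`, `loop_sum`, and `builtin_sum` are hypothetical names, and wall-clock timing of ordinary Python functions stands in for profiling real MoE kernel tactics.

```python
import time
from typing import Callable, Dict, Hashable, Sequence


class FirstCallAutotuner:
    """Hypothetical helper: profile candidates on the first call per key, then reuse the winner."""

    def __init__(self, candidates: Sequence[Callable]):
        self.candidates = list(candidates)
        self.best: Dict[Hashable, Callable] = {}  # key (e.g. a shape) -> fastest candidate
        self.tune_count = 0  # number of tuning passes actually run

    def __call__(self, key: Hashable, *args):
        fn = self.best.get(key)
        if fn is None:
            # First call for this key: pay the tuning cost once and cache the result.
            fn = self._tune(*args)
            self.best[key] = fn
        return fn(*args)

    def _tune(self, *args) -> Callable:
        self.tune_count += 1
        timings = []
        for fn in self.candidates:
            start = time.perf_counter()
            fn(*args)
            timings.append((time.perf_counter() - start, fn))
        # Compare on elapsed time only, so functions are never compared directly.
        return min(timings, key=lambda t: t[0])[1]


def loop_sum(values):
    # Stand-in for one kernel tactic.
    total = 0
    for v in values:
        total += v
    return total


def builtin_sum(values):
    # Stand-in for an alternative tactic that computes the same result.
    return sum(values)


tuner = FirstCallAutotuner([loop_sum, builtin_sum])
data = list(range(1000))
first = tuner(len(data), data)   # triggers tuning for this key
second = tuner(len(data), data)  # reuses the cached winner
```

The point of the sketch is that a backend which never enters the tuning path on its first call keeps running an untuned default, which is the gap this PR closes for the three backends listed above.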

Test Plan

Manually validated on GB300; CI is the primary validation.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cadaca8945

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you open a pull request for review, mark a draft as ready, or comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6ccb6103d5

mmangkad requested a review from a team as a code owner May 28, 2026 05:07

chatgpt-codex-connector Bot reviewed May 28, 2026

Comment thread python/tokenspeed/runtime/layers/moe/backends/nvfp4/flashinfer_trtllm.py Outdated

Enable missing FlashInfer MoE autotuning

6ccb610

Signed-off-by: Mohammad Miadh Angkad <176301910+mmangkad@users.noreply.github.com>

mmangkad force-pushed the enable-flashinfer-moe-autotune branch from cadaca8 to 6ccb610 on May 28, 2026 05:14

chatgpt-codex-connector Bot reviewed May 28, 2026

Comment thread python/tokenspeed/runtime/layers/moe/backends/nvfp4/flashinfer_trtllm.py

Comment thread python/tokenspeed/runtime/layers/moe/backends/nvfp4/flashinfer_cutlass.py

Merge branch 'main' into enable-flashinfer-moe-autotune

bd1c207

mmangkad wants to merge 2 commits into lightseekorg:main from mmangkad-dev:enable-flashinfer-moe-autotune
