perf: Optimize inter-iteration small op by yweng0828 · Pull Request #291 · lightseekorg/tokenspeed

yweng0828 · 2026-05-28T06:01:54Z

Summary

This PR optimizes several small runtime operations that execute between iterations.

Taking Kimi2.5 + eagle3 as an example:

Before:

~160us (from the last kernel of the previous iteration to the first kernel of the next iteration)

34 kernels

After:

~85us (from the last kernel of the previous iteration to the first kernel of the next iteration)

14 kernels
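For scale, the reported numbers work out as follows. This sketch only restates the approximate figures above; the percentages are derived here and are not separate measurements from the PR.

```python
# Approximate figures quoted in the summary (Kimi2.5 + eagle3).
before_us, after_us = 160, 85           # inter-iteration gap, microseconds
before_kernels, after_kernels = 34, 14  # kernel counts as reported

gap_saved_us = before_us - after_us                  # 75 us saved per iteration
gap_reduction = gap_saved_us / before_us             # 0.46875
kernels_removed = before_kernels - after_kernels     # 20 fewer kernels
kernel_reduction = kernels_removed / before_kernels  # ~0.588

print(f"gap:     -{gap_saved_us} us ({gap_reduction:.0%} shorter)")
print(f"kernels: -{kernels_removed} ({kernel_reduction:.0%} fewer)")
```

Because this gap recurs on every iteration, the roughly 75 us saving applies per step rather than once.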

Test Plan

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 07075dffb2


chatgpt-codex-connector · 2026-05-28T06:43:48Z

+        if batch_size < self.max_bs:
            self.seq_lens_buf[batch_size:].fill_(1)


Reset padded request slots before graph replay

When CUDA graph replay pads a decode batch (padded_bs > bs) and speculative decoding is enabled, _forward_step writes drafter outputs to future_input_map[self.input_buffers.req_pool_indices_buf[:ctx.bs]], indexing the persistent buffer up to the padded size. This block now refreshes only seq_lens_buf for the tail, so after a previous, larger batch the padded rows can still hold stale pool indices of real requests, and the captured graph can overwrite those inactive requests' future_input_map entries. Restore zero-filling req_pool_indices_buf[batch_size:] whenever batch_size < self.max_bs.
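The hazard described in this review can be reproduced with a small pure-Python model. This is a sketch, not tokenspeed code: plain lists stand in for the persistent graph input buffers (the real code uses tensors and `fill_`), and `PaddedInputBuffers`, `prepare`, `graph_scatter`, and `run` are hypothetical names. It also assumes, as the suggested fix implies, that slot 0 of `future_input_map` is a dummy entry that padded rows may safely write to.

```python
from typing import List


class PaddedInputBuffers:
    """Hypothetical stand-in for persistent graph input buffers of size max_bs."""

    def __init__(self, max_bs: int) -> None:
        self.max_bs = max_bs
        self.req_pool_indices_buf: List[int] = [0] * max_bs
        self.seq_lens_buf: List[int] = [1] * max_bs

    def prepare(self, req_pool_indices: List[int], seq_lens: List[int],
                reset_pool_indices: bool) -> None:
        batch_size = len(req_pool_indices)
        # Copy the live batch into the head of the persistent buffers.
        self.req_pool_indices_buf[:batch_size] = list(req_pool_indices)
        self.seq_lens_buf[:batch_size] = list(seq_lens)
        if batch_size < self.max_bs:
            tail = self.max_bs - batch_size
            # Padded rows get a harmless sequence length (what the PR keeps) ...
            self.seq_lens_buf[batch_size:] = [1] * tail
            # ... and, with the suggested fix, point at assumed dummy slot 0.
            if reset_pool_indices:
                self.req_pool_indices_buf[batch_size:] = [0] * tail


def graph_scatter(bufs: PaddedInputBuffers, future_input_map: List[str],
                  padded_bs: int, value: str) -> None:
    # Models the replayed graph: it writes for every padded row, real or not.
    for idx in bufs.req_pool_indices_buf[:padded_bs]:
        future_input_map[idx] = value


def run(reset: bool) -> List[str]:
    bufs = PaddedInputBuffers(max_bs=4)
    future_input_map = ["empty"] * 8  # slot 0 assumed to be a dummy entry
    # Step 1: four real requests in pool slots 3..6.
    bufs.prepare([3, 4, 5, 6], [10, 11, 12, 13], reset_pool_indices=reset)
    graph_scatter(bufs, future_input_map, padded_bs=4, value="step1")
    # Step 2: only slots 3 and 4 are active, but replay is padded to 4 rows.
    bufs.prepare([3, 4], [11, 12], reset_pool_indices=reset)
    graph_scatter(bufs, future_input_map, padded_bs=4, value="step2")
    return future_input_map


print("without reset:", run(reset=False))
print("with reset:   ", run(reset=True))
```

Without the reset, slots 5 and 6 (requests no longer in the batch) are overwritten by the step-2 replay; with it, the padded rows write into slot 0 instead. Restoring the reset costs one extra small fill on the tail whenever the batch is padded, which is the kind of inter-iteration op this PR is trying to trim.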


borontion

Thanks. This is a place I also want to optimize.

yweng0828 · 2026-05-31T15:07:44Z

This PR is not ready yet (but it will be soon), and I changed its status to ready in order to trigger CI.

yweng0828 force-pushed the yweng/dev/opt_inter_iteration_small_kernel branch 3 times, most recently from 65ca7c9 to 07075df, May 28, 2026 06:32

yweng0828 marked this pull request as ready for review May 28, 2026 06:41

yweng0828 requested a review from a team as a code owner May 28, 2026 06:41

yweng0828 requested review from LorrinWWW, borontion, dongjiyingdjy, syuoni and zhyncs May 28, 2026 06:41

chatgpt-codex-connector Bot reviewed May 28, 2026


borontion approved these changes May 29, 2026


yweng0828 marked this pull request as draft May 29, 2026 02:59

yweng0828 force-pushed the yweng/dev/opt_inter_iteration_small_kernel branch from 07075df to b46047c, May 31, 2026 14:28

yweng0828 marked this pull request as ready for review May 31, 2026 15:06

Yue Weng added 2 commits May 31, 2026 22:52

opt inter-iteration small op · 4975a0e

update · ba7d7da

yweng0828 force-pushed the yweng/dev/opt_inter_iteration_small_kernel branch from d474b78 to ba7d7da, June 1, 2026 05:52

yweng0828 wants to merge 2 commits into main from yweng/dev/opt_inter_iteration_small_kernel
