[WIP] refactor + perf(spec-decode): refactor #217 + add prefill scope #304

Open
rjzhb wants to merge 15 commits into lightseekorg:main from rjzhb:feat/eagle-prefill-last-layer-skip

rjzhb (Contributor) commented May 28, 2026

WIP — follow-up to #217.

Why

#217 introduced the spec-decode draft head's "first-step reduce" optimization by
sprinkling `if ctx.draft_first_step_reduce: x.index_select(0, ctx.gather_ids)`
across 4 model files, plus flag-based scatter overrides in `comm_manager`. The
optimization works, but the if/else sites multiplied with every model added
(MLA, NextN, Qwen MTP) and made the model forwards harder to read.

This PR consolidates that pattern into two named helpers without changing
behavior, then opens up EXTEND / MIXED (prefill) on top.
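
A minimal sketch of the consolidation idea, for illustration only. The description does not name the two helpers, so `gather_first_step_rows`, `DraftCtx`, and the list-backed `Rows` class are hypothetical stand-ins; only `draft_first_step_reduce`, `gather_ids`, and the `index_select(0, ...)` call come from the text above. With PyTorch, `x` would be a tensor and `gather_ids` an integer index tensor.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class DraftCtx:
    """Hypothetical stand-in for the forward context called `ctx` above."""
    draft_first_step_reduce: bool = False
    gather_ids: List[int] = field(default_factory=list)


class Rows:
    """Tiny list-backed stand-in for a 2-D tensor (dim-0 index_select only)."""

    def __init__(self, data: Sequence[Sequence[float]]):
        self.data = [list(row) for row in data]

    def index_select(self, dim: int, ids: Sequence[int]) -> "Rows":
        if dim != 0:
            raise NotImplementedError("this sketch only supports dim 0")
        return Rows([self.data[i] for i in ids])


def gather_first_step_rows(x, ctx: DraftCtx):
    """Hypothetical helper: the one place that decides whether to reduce.

    Model forwards call this instead of repeating the if/else inline.
    """
    if ctx.draft_first_step_reduce:
        return x.index_select(0, ctx.gather_ids)
    return x


hidden = Rows([[0.0, 0.1], [1.0, 1.1], [2.0, 2.1], [3.0, 3.1]])

# Flag set: only the rows listed in gather_ids are kept.
reduced = gather_first_step_rows(hidden, DraftCtx(True, [1, 3]))

# Flag unset: the input passes through untouched.
unchanged = gather_first_step_rows(hidden, DraftCtx())
```

Each model forward then makes a single call such as `x = gather_first_step_rows(x, ctx)`, so adding a new model does not add another branch site.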

…e helpers

Signed-off-by: rjzhb <rjzhb222@163.com>
@rjzhb rjzhb requested a review from a team as a code owner May 28, 2026 22:39
@rjzhb rjzhb changed the title from "[WIP] refactor + feat(spec-decode): refactor #217 + add prefill scope" to "[WIP] refactor + perf(spec-decode): refactor #217 + add prefill scope" May 28, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0739d4f786

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread python/tokenspeed/runtime/distributed/comm_manager.py
…r.py + fix MoE token-count timing

Signed-off-by: rjzhb <rjzhb222@163.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9e79cfeefe

Comment thread python/tokenspeed/runtime/execution/model_executor.py Outdated
…first step

Signed-off-by: rjzhb <rjzhb222@163.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e04ed786f6

Comment thread python/tokenspeed/runtime/models/llama_eagle3.py Outdated
…bal_bs

Signed-off-by: rjzhb <rjzhb222@163.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0eb9c3a05c

Comment thread python/tokenspeed/runtime/execution/model_executor.py Outdated
rjzhb added 2 commits May 29, 2026 00:49
…ove readability

Signed-off-by: rjzhb <rjzhb222@163.com>
…ce prewrite extend output

Signed-off-by: rjzhb <rjzhb222@163.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1984cb563a

Comment thread python/tokenspeed/runtime/spec_decode/helper.py Outdated
…dle) when global_bs absent

Signed-off-by: rjzhb <rjzhb222@163.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 69085b1fe9

Comment thread python/tokenspeed/runtime/execution/drafter/eagle.py Outdated
rjzhb added 2 commits May 29, 2026 01:41
…) + rename dispatch + extract _post_slice_global_num_tokens

Signed-off-by: rjzhb <rjzhb222@163.com>
…-layer limitation

Signed-off-by: rjzhb <rjzhb222@163.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: edc56787fc

Comment thread python/tokenspeed/runtime/models/qwen3_5.py Outdated
…ranks mutate ctx for MoE

Signed-off-by: rjzhb <rjzhb222@163.com>
rjzhb (Contributor, Author) commented May 29, 2026

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 05bf754229

Comment thread python/tokenspeed/runtime/execution/drafter/eagle.py Outdated
…t only pure DECODE)

Signed-off-by: rjzhb <rjzhb222@163.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2be6f21a11

Comment thread python/tokenspeed/runtime/execution/drafter/eagle.py Outdated
…g bound method)

Signed-off-by: rjzhb <rjzhb222@163.com>
Comment thread python/tokenspeed/runtime/execution/drafter/eagle.py Outdated
…tSliceAttnWrapper, delete helper.py

Signed-off-by: rjzhb <rjzhb222@163.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5562fdd4b0

Comment thread python/tokenspeed/runtime/execution/drafter/eagle.py Outdated
rjzhb added 2 commits May 29, 2026 19:51
…of multi-layer drafts)

Signed-off-by: rjzhb <rjzhb222@163.com>
…HA decode catch-up trims seq_lens

Signed-off-by: rjzhb <rjzhb222@163.com>

2 participants