refactor(scheduler): unify hybrid cache model boundaries. by SimonCqk · Pull Request #243 · lightseekorg/tokenspeed

SimonCqk · 2026-05-25T03:21:48Z

Motivation

The scheduler path previously mixed scheduling logic with Mamba and paged-cache-specific lifecycle details. This
refactor makes the scheduler depend on a smaller hybrid-cache lifecycle boundary while keeping concrete family
behavior inside HybridPrefixCache.

This is a concrete module split, not a broad framework, plugin system, or universal allocator. Fuller family registry / policy abstractions are left as follow-ups once the common shape across more cache families is clearer.

Summary

Make HybridPrefixCache the concrete lifecycle boundary for MatchPrefix, typed admission ops, StepCommit, FinishRequest.
Replace phase-specific request bags/enums with typed cache ops for admission, publication, materialization,
request-local state, and worker metadata.
Split hybrid-cache implementation by responsibility:
- hybrid_prefix_cache_types.h for typed lifecycle request/result structures and recovery-plan plumbing.
- mamba_family_ops.cpp for Mamba-specific lifecycle behavior.
- paged_cache_family_ops.cpp for paged-cache group admission, match, ownership, and cleanup behavior.
Move Mamba / paged-cache admission, match augmentation, publication, metadata, and cleanup logic out of scheduler/FSM code.
Remove unused CacheCoordinator scaffold and the unused scheduler prefetch operation implementation from the build.
Add/adjust tests around durable lifecycle behavior and core cache invariants instead of intermediate phase
helpers.

Notes for Reviewers

A few follow-up-shaped pieces are deliberately not introduced in this PR to keep the size and review surface controlled:

HybridPrefixCache remains the concrete coordinator. The family-specific code is split into responsibility-
focused files, but not yet abstracted behind a generic family registry or polymorphic interface.
*_family_ops.cpp favors direct readable family logic over premature framework shape. A fuller registry/policy
layer will be introduced later if additional cache families prove a stable common lifecycle.
TreeNode attachment generalization is out of scope. This PR keeps the current concrete Mamba / paged-cache
attachment model and focuses on scheduler boundary cleanup, will be a follow up to do the generalization after this merged.
Existing prefetch state/event types remain; this PR only removes the unused scheduler prefetch operation implementation.

Test Plan & Regression Status

Focused hybrid/Mamba/PagedCache tests: 69 passed
Full scheduler C++ tests: 210 passed
Regression test report Ongoing :
- Accuracy (evalscope eval)
  - DeepSeek v4 Flash: gsm8k limit 200, accuracy 0.98.
  - Kimi-K2.5-NVFP4: gsm8k limit 200, accuracy 0.965.
  - Qwen-397B-NVFP4: gsm8k limit 200, accuracy 0.97.
- Multi-turn cache hit ratio (evalscope perf multi-turn)
  - Kimi-K2.5-NVFP4: 90.1%
  - Qwen-397B-NVFP4: 86.5%
  - DeepSeek v4 Flash: TODO
- Overlap scheduling regression: TODO

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3fb5a1eb3d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8327c214fb

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

tuanzhangCS · 2026-05-27T13:25:13Z

    std::optional<MambaChunkAllocator> mamba_allocator_{};
    std::optional<MambaHostAllocator> mamba_host_allocator_{};
    KVPrefixCache kv_prefix_cache_;
+    HybridPrefixCache hybrid_prefix_cache_;


Will kv_prefix_cache_ and hybrid_prefix_cache_ be merged into a single variable in the future?

yes, this will be addressed in the next follow-up PR. The current PR is mainly intended to keep the LOC manageable and merge the mamba & ds v4 cache models first, and avoiding introducing the KV ownership rewriting in step one, and now HybridPrefixCache already holds a reference to KVPrefixCache, so the follow-up change should be a relatively small one.

tuanzhangCS · 2026-05-27T13:34:13Z

+        },
+        cache::state::RefreshMambaCheckpoint{
+            .allocator = local_mamba_allocator.get(),
+            .checkpoint_raw_position = state.window.begin + state.window.size + tokens_this_round_,


As a follow-up: the FSM still explicitly composes family-specific ops (e.g. RefreshMambaCheckpoint, AcquireRequestLocalKV) at each StepCommit call site. Would it be possible to encapsulate these per-phase op bundles inside HybridPrefixCache (e.g. a single StepCommit(Phase::PrefillChunk, context)) so the scheduler stays agnostic when new cache families are added?

can be further folded into HybridPrefixCache as well with manageable size, I've pushed a new commit with the implementation.

Unify the scheduler-facing hybrid cache lifecycle around HybridPrefixCache so scheduler/FSM code calls lifecycle methods instead of spelling Mamba and paged-cache choreography directly. Add the family registry/recovery/lifecycle request types, RequestCacheContext, and focused diagnostics/stat helpers needed for the unified cache model. Split family-specific implementation into registry, Mamba, and paged-cache modules, while keeping concrete allocator semantics and avoiding a broad virtual family framework. Consolidate tests around durable facade behavior: request-local accounting, Mamba slot release/protection, worker metadata propagation, paged-cache matching, admission, eviction, and registry behavior. Signed-off-by: SimonCqk <cqk0100@gmail.com>

Deduplicate HybridPrefixCache recovery-plan construction and keep family-specific helpers file-local. Replace admission and worker metadata boolean selectors with explicit kinds, while keeping scheduler call sites on the lifecycle facade. Trim helper-mechanics tests in favor of durable facade/invariant coverage and remove obsolete hybrid cache design docs. Signed-off-by: SimonCqk <cqk0100@gmail.com>

Drop unused registry, slice, demand, and recovery-plan scaffolding that had no production consumers. Keep the live hybrid cache facade centered on MatchPrefix, Admit, StepCommit, FinishRequest, and Stats, and move diagnostics tests to the Stats contract. Signed-off-by: SimonCqk <cqk0100@gmail.com>

Collapse over-expanded hybrid cache test coverage into durable facade and invariant checks. Remove scaffold-oriented test files, internalize debug memory diagnostics, clean changed includes, and narrow low-level hybrid cache helpers behind the test peer where production callers do not need them. Signed-off-by: SimonCqk <cqk0100@gmail.com>

Fold the hybrid-cache correctness follow-ups into one reviewable fix commit. - refresh host-backed cache nodes when writeback completes so LRU state reflects recent writes - preserve upstream state-family paged-cache admission credit semantics after the family-ops split - restore Mamba finish-state working-slot fallback and make checkpoint refresh failures explicit Signed-off-by: SimonCqk <cqk0100@gmail.com>

Replace phase-specific hybrid cache request bags with typed operation structs for publication, admission, request-local state, and worker metadata commits. Keep scheduler-facing lifecycle calls explicit while pruning debug-only or implementation-shaped tests in favor of durable cache contract coverage. Signed-off-by: SimonCqk <cqk0100@gmail.com>

- Introduce RequestLocalCacheState managed by HybridPrefixCache, consolidating request-local KV / Mamba adjunct creation, refresh, and publish. - FSM state now holds a unique_ptr<RequestLocalCacheState>; remove LocalMambaAllocator, RefreshMambaCheckpoint, mamba_writeback_nodes. - Generalize writeback ack to PendingHostWritebackState; unify KV demote and Mamba host ack in HybridPrefixCache::CompleteHostWriteBack(). - Drop production-facing Mamba allocator view from request.h and request_cache_context.h. - Update test_mamba_cache.cpp to use HybridPrefixCacheTestPeer for test-only adjunct checks. Signed-off-by: Simon_CQK <cqk0100@gmail.com>

Signed-off-by: Simon_CQK <cqk0100@gmail.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7bc95dce4a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-01T07:30:01Z

+        }
+    };
+
+    std::vector<TreeNode*> path = CollectAncestorPathRootToLeaf(match.device.last_node);


Use the host match when assembling paged-cache hits

When a cached prefix has been written back to L2 and its device KV pages were evicted, KVPrefixCache::Match reports device.last_node at the root while host.last_node still points at the cached node. Starting the paged-cache snapshot walk from match.device.last_node therefore misses snapshots on host-only cached prefixes, cap_to_root() clears the host hit, and schedulePrefillFirstChunk will re-prefill instead of issuing the normal host→device loadback with borrowed paged-cache pages. This regresses L2 prefix reuse specifically after device eviction/writeback for requests using paged-cache adjuncts; the walk should consider the deeper usable host/device match rather than only the device path.

Useful? React with 👍 / 👎.

SimonCqk requested a review from a team as a code owner May 25, 2026 03:21

SimonCqk removed the request for review from a team May 25, 2026 03:21

SimonCqk changed the title ~~refactor(scheduler): unify hybrid cache model boundaries.~~ [WIP] refactor(scheduler): unify hybrid cache model boundaries. May 25, 2026

SimonCqk force-pushed the feat/hybrid-cache-abstraction branch 3 times, most recently from f1b2d2f to 3fb5a1e Compare May 26, 2026 03:58

chatgpt-codex-connector Bot reviewed May 26, 2026

View reviewed changes

Comment thread tokenspeed-scheduler/csrc/scheduler/outside_event_handler.cpp

SimonCqk force-pushed the feat/hybrid-cache-abstraction branch from 5cf9571 to 8327c21 Compare May 26, 2026 07:41

chatgpt-codex-connector Bot reviewed May 26, 2026

View reviewed changes

Comment thread tokenspeed-scheduler/csrc/resource/hybrid_prefix_cache/mamba_family_ops.cpp Outdated

Comment thread tokenspeed-scheduler/csrc/resource/hybrid_prefix_cache/mamba_family_ops.cpp Outdated

SimonCqk force-pushed the feat/hybrid-cache-abstraction branch 2 times, most recently from c77e7c9 to 57330b6 Compare May 26, 2026 15:20

SimonCqk changed the title ~~[WIP] refactor(scheduler): unify hybrid cache model boundaries.~~ refactor(scheduler): unify hybrid cache model boundaries. May 26, 2026

SimonCqk requested review from a team, FC-Li, XucSh, tuanzhangCS and wangbo981016 May 26, 2026 17:07

lightseek-bot assigned wangbo981016 May 27, 2026

tuanzhangCS reviewed May 27, 2026

View reviewed changes

SimonCqk force-pushed the feat/hybrid-cache-abstraction branch from 2a8e9ad to 43b2532 Compare May 27, 2026 16:35

tuanzhangCS approved these changes May 28, 2026

View reviewed changes

lightseek-bot added the high priority label May 28, 2026

SimonCqk force-pushed the feat/hybrid-cache-abstraction branch from 9ca52e1 to a7b0e84 Compare May 31, 2026 15:49

lightseek-bot assigned FC-Li Jun 1, 2026

lightseek-bot requested a review from tuanzhangCS June 1, 2026 05:19

SimonCqk added 5 commits June 1, 2026 15:24

SimonCqk added 3 commits June 1, 2026 15:24

Fix host node ref lifetime test cache state

7bc95dc

Signed-off-by: Simon_CQK <cqk0100@gmail.com>

SimonCqk force-pushed the feat/hybrid-cache-abstraction branch from a7b0e84 to 7bc95dc Compare June 1, 2026 07:25

chatgpt-codex-connector Bot reviewed Jun 1, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor(scheduler): unify hybrid cache model boundaries.#243

refactor(scheduler): unify hybrid cache model boundaries.#243
SimonCqk wants to merge 8 commits into
lightseekorg:mainfrom
SimonCqk:feat/hybrid-cache-abstraction

SimonCqk commented May 25, 2026 •

edited

Loading

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

Uh oh!

Uh oh!

tuanzhangCS May 27, 2026

Uh oh!

SimonCqk May 27, 2026

Uh oh!

tuanzhangCS May 27, 2026

Uh oh!

SimonCqk May 27, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Jun 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Conversation

SimonCqk commented May 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Motivation

Summary

Notes for Reviewers

Test Plan & Regression Status

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Uh oh!

tuanzhangCS May 27, 2026

Choose a reason for hiding this comment

Uh oh!

SimonCqk May 27, 2026

Choose a reason for hiding this comment

Uh oh!

tuanzhangCS May 27, 2026

Choose a reason for hiding this comment

Uh oh!

SimonCqk May 27, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

SimonCqk commented May 25, 2026 •

edited

Loading