Skip to content

refactor(scheduler): unify hybrid cache model boundaries.#243

Open
SimonCqk wants to merge 8 commits into
lightseekorg:mainfrom
SimonCqk:feat/hybrid-cache-abstraction
Open

refactor(scheduler): unify hybrid cache model boundaries.#243
SimonCqk wants to merge 8 commits into
lightseekorg:mainfrom
SimonCqk:feat/hybrid-cache-abstraction

Conversation

@SimonCqk
Copy link
Copy Markdown
Contributor

@SimonCqk SimonCqk commented May 25, 2026

Motivation

The scheduler path previously mixed scheduling logic with Mamba and paged-cache-specific lifecycle details. This
refactor makes the scheduler depend on a smaller hybrid-cache lifecycle boundary while keeping concrete family
behavior inside HybridPrefixCache.

This is a concrete module split, not a broad framework, plugin system, or universal allocator. Fuller family registry / policy abstractions are left as follow-ups once the common shape across more cache families is clearer.

Summary

  • Make HybridPrefixCache the concrete lifecycle boundary for MatchPrefix, typed admission ops, StepCommit, FinishRequest.
  • Replace phase-specific request bags/enums with typed cache ops for admission, publication, materialization,
    request-local state, and worker metadata.
  • Split hybrid-cache implementation by responsibility:
    • hybrid_prefix_cache_types.h for typed lifecycle request/result structures and recovery-plan plumbing.
    • mamba_family_ops.cpp for Mamba-specific lifecycle behavior.
    • paged_cache_family_ops.cpp for paged-cache group admission, match, ownership, and cleanup behavior.
  • Move Mamba / paged-cache admission, match augmentation, publication, metadata, and cleanup logic out of scheduler/FSM code.
  • Remove unused CacheCoordinator scaffold and the unused scheduler prefetch operation implementation from the build.
  • Add/adjust tests around durable lifecycle behavior and core cache invariants instead of intermediate phase
    helpers.

Notes for Reviewers

A few follow-up-shaped pieces are deliberately not introduced in this PR to keep the size and review surface controlled:

  • HybridPrefixCache remains the concrete coordinator. The family-specific code is split into responsibility-
    focused files, but not yet abstracted behind a generic family registry or polymorphic interface.
  • *_family_ops.cpp favors direct readable family logic over premature framework shape. A fuller registry/policy
    layer will be introduced later if additional cache families prove a stable common lifecycle.
  • TreeNode attachment generalization is out of scope. This PR keeps the current concrete Mamba / paged-cache
    attachment model and focuses on scheduler boundary cleanup, will be a follow up to do the generalization after this merged.
  • Existing prefetch state/event types remain; this PR only removes the unused scheduler prefetch operation implementation.

Test Plan & Regression Status

  • Focused hybrid/Mamba/PagedCache tests: 69 passed
  • Full scheduler C++ tests: 210 passed
  • Regression test report Ongoing :
    • Accuracy (evalscope eval)
      • DeepSeek v4 Flash: gsm8k limit 200, accuracy 0.98.
      • Kimi-K2.5-NVFP4: gsm8k limit 200, accuracy 0.965.
      • Qwen-397B-NVFP4: gsm8k limit 200, accuracy 0.97.
    • Multi-turn cache hit ratio (evalscope perf multi-turn)
      • Kimi-K2.5-NVFP4: 90.1%
      • Qwen-397B-NVFP4: 86.5%
      • DeepSeek v4 Flash: TODO
    • Overlap scheduling regression: TODO

@SimonCqk SimonCqk requested a review from a team as a code owner May 25, 2026 03:21
@SimonCqk SimonCqk removed the request for review from a team May 25, 2026 03:21
@SimonCqk SimonCqk changed the title refactor(scheduler): unify hybrid cache model boundaries. [WIP] refactor(scheduler): unify hybrid cache model boundaries. May 25, 2026
@SimonCqk SimonCqk force-pushed the feat/hybrid-cache-abstraction branch 3 times, most recently from f1b2d2f to 3fb5a1e Compare May 26, 2026 03:58
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3fb5a1eb3d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread tokenspeed-scheduler/csrc/scheduler/outside_event_handler.cpp
@SimonCqk SimonCqk force-pushed the feat/hybrid-cache-abstraction branch from 5cf9571 to 8327c21 Compare May 26, 2026 07:41
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8327c214fb

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread tokenspeed-scheduler/csrc/resource/hybrid_prefix_cache/mamba_family_ops.cpp Outdated
Comment thread tokenspeed-scheduler/csrc/resource/hybrid_prefix_cache/mamba_family_ops.cpp Outdated
@SimonCqk SimonCqk force-pushed the feat/hybrid-cache-abstraction branch 2 times, most recently from c77e7c9 to 57330b6 Compare May 26, 2026 15:20
@SimonCqk SimonCqk changed the title [WIP] refactor(scheduler): unify hybrid cache model boundaries. refactor(scheduler): unify hybrid cache model boundaries. May 26, 2026
@SimonCqk SimonCqk requested review from a team, FC-Li, XucSh, tuanzhangCS and wangbo981016 May 26, 2026 17:07
std::optional<MambaChunkAllocator> mamba_allocator_{};
std::optional<MambaHostAllocator> mamba_host_allocator_{};
KVPrefixCache kv_prefix_cache_;
HybridPrefixCache hybrid_prefix_cache_;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will kv_prefix_cache_ and hybrid_prefix_cache_ be merged into a single variable in the future?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, this will be addressed in the next follow-up PR. The current PR is mainly intended to keep the LOC manageable and merge the mamba & ds v4 cache models first, and avoiding introducing the KV ownership rewriting in step one, and now HybridPrefixCache already holds a reference to KVPrefixCache, so the follow-up change should be a relatively small one.

},
cache::state::RefreshMambaCheckpoint{
.allocator = local_mamba_allocator.get(),
.checkpoint_raw_position = state.window.begin + state.window.size + tokens_this_round_,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a follow-up: the FSM still explicitly composes family-specific ops (e.g. RefreshMambaCheckpoint, AcquireRequestLocalKV) at each StepCommit call site. Would it be possible to encapsulate these per-phase op bundles inside HybridPrefixCache (e.g. a single StepCommit(Phase::PrefillChunk, context)) so the scheduler stays agnostic when new cache families are added?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can be further folded into HybridPrefixCache as well with manageable size, I've pushed a new commit with the implementation.

@SimonCqk SimonCqk force-pushed the feat/hybrid-cache-abstraction branch from 2a8e9ad to 43b2532 Compare May 27, 2026 16:35
SimonCqk added 5 commits June 1, 2026 15:24
Unify the scheduler-facing hybrid cache lifecycle around HybridPrefixCache so
scheduler/FSM code calls lifecycle methods instead of spelling Mamba and
paged-cache choreography directly.

Add the family registry/recovery/lifecycle request types, RequestCacheContext,
and focused diagnostics/stat helpers needed for the unified cache model. Split
family-specific implementation into registry, Mamba, and paged-cache modules,
while keeping concrete allocator semantics and avoiding a broad virtual family
framework.

Consolidate tests around durable facade behavior: request-local accounting,
Mamba slot release/protection, worker metadata propagation, paged-cache matching,
admission, eviction, and registry behavior.

Signed-off-by: SimonCqk <cqk0100@gmail.com>
Deduplicate HybridPrefixCache recovery-plan construction and keep family-specific helpers file-local.

Replace admission and worker metadata boolean selectors with explicit kinds, while keeping scheduler call sites on the lifecycle facade.

Trim helper-mechanics tests in favor of durable facade/invariant coverage and remove obsolete hybrid cache design docs.

Signed-off-by: SimonCqk <cqk0100@gmail.com>
Drop unused registry, slice, demand, and recovery-plan scaffolding that had no production consumers. Keep the live hybrid cache facade centered on MatchPrefix, Admit, StepCommit, FinishRequest, and Stats, and move diagnostics tests to the Stats contract.

Signed-off-by: SimonCqk <cqk0100@gmail.com>
Collapse over-expanded hybrid cache test coverage into durable facade and invariant checks. Remove scaffold-oriented test files, internalize debug memory diagnostics, clean changed includes, and narrow low-level hybrid cache helpers behind the test peer where production callers do not need them.

Signed-off-by: SimonCqk <cqk0100@gmail.com>
Fold the hybrid-cache correctness follow-ups into one reviewable fix commit.

- refresh host-backed cache nodes when writeback completes so LRU state reflects recent writes

- preserve upstream state-family paged-cache admission credit semantics after the family-ops split

- restore Mamba finish-state working-slot fallback and make checkpoint refresh failures explicit

Signed-off-by: SimonCqk <cqk0100@gmail.com>
SimonCqk added 3 commits June 1, 2026 15:24
Replace phase-specific hybrid cache request bags with typed operation structs for publication, admission, request-local state, and worker metadata commits.

Keep scheduler-facing lifecycle calls explicit while pruning debug-only or implementation-shaped tests in favor of durable cache contract coverage.

Signed-off-by: SimonCqk <cqk0100@gmail.com>
- Introduce RequestLocalCacheState managed by HybridPrefixCache, consolidating
  request-local KV / Mamba adjunct creation, refresh, and publish.
- FSM state now holds a unique_ptr<RequestLocalCacheState>; remove
  LocalMambaAllocator, RefreshMambaCheckpoint, mamba_writeback_nodes.
- Generalize writeback ack to PendingHostWritebackState; unify KV demote and
  Mamba host ack in HybridPrefixCache::CompleteHostWriteBack().
- Drop production-facing Mamba allocator view from request.h and
  request_cache_context.h.
- Update test_mamba_cache.cpp to use HybridPrefixCacheTestPeer for test-only
  adjunct checks.

Signed-off-by: Simon_CQK <cqk0100@gmail.com>
Signed-off-by: Simon_CQK <cqk0100@gmail.com>
@SimonCqk SimonCqk force-pushed the feat/hybrid-cache-abstraction branch from a7b0e84 to 7bc95dc Compare June 1, 2026 07:25
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7bc95dce4a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

}
};

std::vector<TreeNode*> path = CollectAncestorPathRootToLeaf(match.device.last_node);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Use the host match when assembling paged-cache hits

When a cached prefix has been written back to L2 and its device KV pages were evicted, KVPrefixCache::Match reports device.last_node at the root while host.last_node still points at the cached node. Starting the paged-cache snapshot walk from match.device.last_node therefore misses snapshots on host-only cached prefixes, cap_to_root() clears the host hit, and schedulePrefillFirstChunk will re-prefill instead of issuing the normal host→device loadback with borrowed paged-cache pages. This regresses L2 prefix reuse specifically after device eviction/writeback for requests using paged-cache adjuncts; the walk should consider the deeper usable host/device match rather than only the device path.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants