refactor(scheduler): unify hybrid cache model boundaries.#243
Conversation
f1b2d2f to
3fb5a1e
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3fb5a1eb3d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
5cf9571 to
8327c21
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8327c214fb
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
c77e7c9 to
57330b6
Compare
| std::optional<MambaChunkAllocator> mamba_allocator_{}; | ||
| std::optional<MambaHostAllocator> mamba_host_allocator_{}; | ||
| KVPrefixCache kv_prefix_cache_; | ||
| HybridPrefixCache hybrid_prefix_cache_; |
There was a problem hiding this comment.
Will kv_prefix_cache_ and hybrid_prefix_cache_ be merged into a single variable in the future?
There was a problem hiding this comment.
yes, this will be addressed in the next follow-up PR. The current PR is mainly intended to keep the LOC manageable and merge the mamba & ds v4 cache models first, and avoiding introducing the KV ownership rewriting in step one, and now HybridPrefixCache already holds a reference to KVPrefixCache, so the follow-up change should be a relatively small one.
| }, | ||
| cache::state::RefreshMambaCheckpoint{ | ||
| .allocator = local_mamba_allocator.get(), | ||
| .checkpoint_raw_position = state.window.begin + state.window.size + tokens_this_round_, |
There was a problem hiding this comment.
As a follow-up: the FSM still explicitly composes family-specific ops (e.g. RefreshMambaCheckpoint, AcquireRequestLocalKV) at each StepCommit call site. Would it be possible to encapsulate these per-phase op bundles inside HybridPrefixCache (e.g. a single StepCommit(Phase::PrefillChunk, context)) so the scheduler stays agnostic when new cache families are added?
There was a problem hiding this comment.
can be further folded into HybridPrefixCache as well with manageable size, I've pushed a new commit with the implementation.
2a8e9ad to
43b2532
Compare
9ca52e1 to
a7b0e84
Compare
Unify the scheduler-facing hybrid cache lifecycle around HybridPrefixCache so scheduler/FSM code calls lifecycle methods instead of spelling Mamba and paged-cache choreography directly. Add the family registry/recovery/lifecycle request types, RequestCacheContext, and focused diagnostics/stat helpers needed for the unified cache model. Split family-specific implementation into registry, Mamba, and paged-cache modules, while keeping concrete allocator semantics and avoiding a broad virtual family framework. Consolidate tests around durable facade behavior: request-local accounting, Mamba slot release/protection, worker metadata propagation, paged-cache matching, admission, eviction, and registry behavior. Signed-off-by: SimonCqk <cqk0100@gmail.com>
Deduplicate HybridPrefixCache recovery-plan construction and keep family-specific helpers file-local. Replace admission and worker metadata boolean selectors with explicit kinds, while keeping scheduler call sites on the lifecycle facade. Trim helper-mechanics tests in favor of durable facade/invariant coverage and remove obsolete hybrid cache design docs. Signed-off-by: SimonCqk <cqk0100@gmail.com>
Drop unused registry, slice, demand, and recovery-plan scaffolding that had no production consumers. Keep the live hybrid cache facade centered on MatchPrefix, Admit, StepCommit, FinishRequest, and Stats, and move diagnostics tests to the Stats contract. Signed-off-by: SimonCqk <cqk0100@gmail.com>
Collapse over-expanded hybrid cache test coverage into durable facade and invariant checks. Remove scaffold-oriented test files, internalize debug memory diagnostics, clean changed includes, and narrow low-level hybrid cache helpers behind the test peer where production callers do not need them. Signed-off-by: SimonCqk <cqk0100@gmail.com>
Fold the hybrid-cache correctness follow-ups into one reviewable fix commit. - refresh host-backed cache nodes when writeback completes so LRU state reflects recent writes - preserve upstream state-family paged-cache admission credit semantics after the family-ops split - restore Mamba finish-state working-slot fallback and make checkpoint refresh failures explicit Signed-off-by: SimonCqk <cqk0100@gmail.com>
Replace phase-specific hybrid cache request bags with typed operation structs for publication, admission, request-local state, and worker metadata commits. Keep scheduler-facing lifecycle calls explicit while pruning debug-only or implementation-shaped tests in favor of durable cache contract coverage. Signed-off-by: SimonCqk <cqk0100@gmail.com>
- Introduce RequestLocalCacheState managed by HybridPrefixCache, consolidating request-local KV / Mamba adjunct creation, refresh, and publish. - FSM state now holds a unique_ptr<RequestLocalCacheState>; remove LocalMambaAllocator, RefreshMambaCheckpoint, mamba_writeback_nodes. - Generalize writeback ack to PendingHostWritebackState; unify KV demote and Mamba host ack in HybridPrefixCache::CompleteHostWriteBack(). - Drop production-facing Mamba allocator view from request.h and request_cache_context.h. - Update test_mamba_cache.cpp to use HybridPrefixCacheTestPeer for test-only adjunct checks. Signed-off-by: Simon_CQK <cqk0100@gmail.com>
Signed-off-by: Simon_CQK <cqk0100@gmail.com>
a7b0e84 to
7bc95dc
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7bc95dce4a
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| } | ||
| }; | ||
|
|
||
| std::vector<TreeNode*> path = CollectAncestorPathRootToLeaf(match.device.last_node); |
There was a problem hiding this comment.
Use the host match when assembling paged-cache hits
When a cached prefix has been written back to L2 and its device KV pages were evicted, KVPrefixCache::Match reports device.last_node at the root while host.last_node still points at the cached node. Starting the paged-cache snapshot walk from match.device.last_node therefore misses snapshots on host-only cached prefixes, cap_to_root() clears the host hit, and schedulePrefillFirstChunk will re-prefill instead of issuing the normal host→device loadback with borrowed paged-cache pages. This regresses L2 prefix reuse specifically after device eviction/writeback for requests using paged-cache adjuncts; the walk should consider the deeper usable host/device match rather than only the device path.
Useful? React with 👍 / 👎.
Motivation
The scheduler path previously mixed scheduling logic with Mamba and paged-cache-specific lifecycle details. This
refactor makes the scheduler depend on a smaller hybrid-cache lifecycle boundary while keeping concrete family
behavior inside
HybridPrefixCache.This is a concrete module split, not a broad framework, plugin system, or universal allocator. Fuller family registry / policy abstractions are left as follow-ups once the common shape across more cache families is clearer.
Summary
HybridPrefixCachethe concrete lifecycle boundary forMatchPrefix, typed admission ops,StepCommit,FinishRequest.request-local state, and worker metadata.
hybrid_prefix_cache_types.hfor typed lifecycle request/result structures and recovery-plan plumbing.mamba_family_ops.cppfor Mamba-specific lifecycle behavior.paged_cache_family_ops.cppfor paged-cache group admission, match, ownership, and cleanup behavior.CacheCoordinatorscaffold and the unused scheduler prefetch operation implementation from the build.helpers.
Notes for Reviewers
A few follow-up-shaped pieces are deliberately not introduced in this PR to keep the size and review surface controlled:
HybridPrefixCacheremains the concrete coordinator. The family-specific code is split into responsibility-focused files, but not yet abstracted behind a generic family registry or polymorphic interface.
*_family_ops.cppfavors direct readable family logic over premature framework shape. A fuller registry/policylayer will be introduced later if additional cache families prove a stable common lifecycle.
attachment model and focuses on scheduler boundary cleanup, will be a follow up to do the generalization after this merged.
Test Plan & Regression Status
69 passed210 passed0.98.0.965.0.97.