
(feat) L3 KVStore: prefetch and backup support #293

Open
ehuohz wants to merge 3 commits into lightseekorg:main from ehuohz:main

@ehuohz commented May 28, 2026

Summary

Prefetch (L3 → Host)

On request submission, query Mooncake for KV pages that already exist in L3. If the number of hits exceeds prefetch_threshold, take an async prefetch path (Submitted → Prefetching → PrefetchDone → Prefilling) instead of prefilling directly. Completed pages are inserted into the radix tree's host layer, with page ownership transferred through OwnedPages.
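A minimal sketch of the routing decision and state progression described above. The enum, both functions, and the `l3_hits` parameter are hypothetical; only the state names and `prefetch_threshold` come from the PR, and the sketch assumes "exceed" means strictly greater than.

```cpp
#include <cstddef>

// Hypothetical request states; the names mirror the PR description.
enum class ReqState { Submitted, Prefetching, PrefetchDone, Prefilling };

// Route a newly submitted request based on how many L3 (Mooncake) hits it has.
// Assumption: "exceed" means strictly greater than prefetch_threshold.
constexpr ReqState routeSubmitted(std::size_t l3_hits,
                                  std::size_t prefetch_threshold) {
  return l3_hits > prefetch_threshold ? ReqState::Prefetching   // async L3 -> host path
                                      : ReqState::Prefilling;   // direct prefill
}

// One step along the async path:
// Submitted -> Prefetching -> PrefetchDone -> Prefilling.
constexpr ReqState nextOnPrefetchPath(ReqState s) {
  switch (s) {
    case ReqState::Submitted:    return ReqState::Prefetching;
    case ReqState::Prefetching:  return ReqState::PrefetchDone;  // pages are now in the host layer
    case ReqState::PrefetchDone: return ReqState::Prefilling;    // first prefill chunk scheduled
    case ReqState::Prefilling:   break;                          // already prefilling
  }
  return s;
}
```

Keeping the threshold check as a pure function makes the "prefetch vs. direct prefill" choice easy to unit-test independently of the scheduler loop.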

Backup (Host → L3)

On WriteBackDone, emit a fire-and-forget BackUpOperation to persist host pages to Mooncake. Backup metadata is captured when the WriteBackOperation is created, while the Draining state's host node-ref is still alive.
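The ordering constraint in the backup design (capture metadata while the host node-ref is alive, use it only later) can be sketched as below. `HostNode`, `BackupSpec`, `WriteBackOp`, `BackUpOp`, `newWriteBackOp` and `onWriteBackDone` are hypothetical simplifications; per the description, the real metadata lives in `CacheOpSpec` and the ops are queued in `pending_backup_ops_`. The point is that `onWriteBackDone` reads only the snapshot, never the node.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical host-side radix node holding the data a backup needs.
struct HostNode {
  std::vector<std::uint64_t> rolling_hashes;
  std::vector<std::int32_t> host_page_ids;
};

// Hypothetical metadata snapshot (the PR keeps this in CacheOpSpec).
struct BackupSpec {
  std::vector<std::uint64_t> rolling_hashes;
  std::vector<std::int32_t> host_page_ids;
};

struct WriteBackOp { BackupSpec backup; };  // snapshot travels with the op
struct BackUpOp { BackupSpec backup; };

// Called while the request is Draining and still holds the host node-ref:
// copy the metadata now, so later stages never need the node.
WriteBackOp newWriteBackOp(const std::shared_ptr<HostNode>& node) {
  assert(node && "host node-ref must still be alive at creation time");
  return WriteBackOp{BackupSpec{node->rolling_hashes, node->host_page_ids}};
}

// Called on WriteBackDone. The node may already be released, so only the
// snapshot is used. The op is just queued: fire-and-forget, no completion wait.
void onWriteBackDone(WriteBackOp&& done, std::vector<BackUpOp>& pending_backup_ops) {
  pending_backup_ops.push_back(BackUpOp{std::move(done.backup)});
}
```

Copying at creation time costs a small vector copy per write-back, in exchange for not having to extend the node-ref's lifetime past the Draining state.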

Key changes

  • FSM: Add SchedulePrefillFirstChunkEvent::operator()(PrefetchDone&&) via templated applyFirstChunk() so prefetch-completed requests can enter prefill.
  • forward.cpp: Attempt schedulePrefetch for Submitted requests before falling through to prefill. Treat PrefetchDone requests with the same scheduling priority.
  • outside_event_handler.cpp: Transfer host page ownership into Insert() via OwnedPages; free uncompleted pages via RAII. Add a WriteBackDone hook that emits a BackUpOperation.
  • scheduler.cpp: Capture L3 backup metadata (rolling hashes, host page IDs) in CacheOpSpec during newWriteBackOperation. Drain pending_prefetch_ops_ and pending_backup_ops_ in NextExecutionPlan().
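The FSM and ownership changes listed above can be illustrated with a reduced sketch. Every type here (`OwnedPages`, `HostTree`, `Submitted`, `PrefetchDone`, `Prefilling`, the `cached_prefix_pages` field) is a hypothetical stand-in, not the PR's actual definition; only the names `OwnedPages`, `Insert()`, `applyFirstChunk()` and `SchedulePrefillFirstChunkEvent` come from the description. It shows two ideas: a move-only page owner that returns any still-owned pages to a free list on destruction (completed pages are handed to `Insert()`, uncompleted ones are freed via RAII), and one templated `applyFirstChunk()` shared by the `Submitted` and `PrefetchDone` overloads.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical move-only owner of host pages. On destruction it returns any
// pages it still owns to the free list, so nothing leaks on early exit.
class OwnedPages {
 public:
  OwnedPages(std::vector<std::int32_t> ids, std::vector<std::int32_t>* free_list)
      : ids_(std::move(ids)), free_list_(free_list) {
    assert(free_list_ != nullptr);
  }
  OwnedPages(OwnedPages&& o) noexcept : ids_(std::move(o.ids_)), free_list_(o.free_list_) {
    o.ids_.clear();  // moved-from owner must not free anything
  }
  OwnedPages(const OwnedPages&) = delete;
  OwnedPages& operator=(const OwnedPages&) = delete;
  OwnedPages& operator=(OwnedPages&&) = delete;
  ~OwnedPages() {
    for (auto id : ids_) free_list_->push_back(id);
  }
  // Give up ownership: the caller becomes responsible for these pages.
  std::vector<std::int32_t> release() { return std::exchange(ids_, {}); }

 private:
  std::vector<std::int32_t> ids_;
  std::vector<std::int32_t>* free_list_;
};

// Hypothetical host layer of the radix tree: Insert() takes ownership.
struct HostTree {
  std::vector<std::int32_t> cached;
  void Insert(OwnedPages&& pages) {
    for (auto id : pages.release()) cached.push_back(id);
  }
};

// Hypothetical FSM states.
struct Submitted { int req_id; };
struct PrefetchDone { int req_id; std::size_t prefetched_pages; };
struct Prefilling { int req_id; std::size_t cached_prefix_pages; };

// Shared first-chunk logic, written once for any state that can start prefill.
template <typename State>
Prefilling applyFirstChunk(State&& s, std::size_t cached_pages) {
  return Prefilling{s.req_id, cached_pages};
}

struct SchedulePrefillFirstChunkEvent {
  Prefilling operator()(Submitted&& s) const { return applyFirstChunk(std::move(s), 0); }
  // New overload: a prefetched request enters prefill reusing its fetched prefix.
  Prefilling operator()(PrefetchDone&& s) const {
    const std::size_t n = s.prefetched_pages;
    return applyFirstChunk(std::move(s), n);
  }
};
```

Because `Insert()` takes `OwnedPages&&`, call sites must `std::move` the pages in, which makes the ownership hand-off explicit; whatever is still held by an `OwnedPages` when it goes out of scope is released instead of leaked.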

Test Plan

  • Served kimi-k2.5 with Mooncake KVStore enabled in a dev container
  • Sent a long-generation request (long prompt, max_tokens=262144), then sent multiple different requests to fill the device and host caches and force the first request's KV pages to be evicted to L3
  • Re-sent the same first request; verified L3 prefetch path activated (KV pages fetched back from Mooncake → host → device) and output matched the original response
  • Confirmed the L3 fetch via Mooncake batch_get_into logs (439-token prompt: ⌊439/64⌋ = 6 full pages per rank; 6 × 4 TP ranks = 24 keys):
Mooncake log:
 | Requests (Success/Total): PutStart=4/4, PutEnd=4/4, PutRevoke=0/0, Get=4/4, Exist=4/4, Del=0/0, DelAll=0/0, Ping=2228/2228, CopyStart=0/0, CopyEnd=0/0, CopyRevoke=0/0, MoveStart=0/0, MoveEnd=0/0, MoveRevoke=0/0, EvictDiskReplica=0/0 | Batch Requests (Req=Success/PartialSuccess/Total, Item=Success/Total): PutStart:(Req=14/0/51, Item=1369/5238), PutEnd:(Req=14/0/14, Item=1369/1369), PutRevoke:(Req=0/0/0, Item=0/0), Get:(Req=4/0/4, Item=24/24), ExistKey:(Req=64/0/64, Item=5524/5524), QueryIp:(Req=0/0/0, Item=0/0), Clear:(Req=0/0/0, Item=0/0), CreateMoveTask:(Req=0/0), CreateCopyTask:(Req=0/0), QueryTask=(Req=0/0), FetchTasks=(Req=2228/2228), MarkTaskToComplete= (Req=0/0),  | Eviction: Success/Attempts=0/0, keys=0, size=0 B | Discard: Released/Total=0/0, StagingSize=0 B | Snapshots: Success=0, Fail=0}, ha={HA Metrics Summary: last_seq=0, applied_seq=0, lag=0, pending=0, mutation_queue=0, batch_commits=0, sync_commits=0, skipped=0, checksum_fail=0, etcd_fail=0, watch_disconn=0, state=0}

ts log:
[ts] I0528 04:36:33.135007 2732203 real_client.cpp:3556] Time taken for batch_get_into: 9814us, read store: 0us, with memory key count: 6, offload key count: 0
[ts] I0528 04:36:33.135859 2732205 real_client.cpp:3556] Time taken for batch_get_into: 10103us, read store: 0us, with memory key count: 6, offload key count: 0
[ts] I0528 04:36:33.136945 2732206 real_client.cpp:3556] Time taken for batch_get_into: 11312us, read store: 0us, with memory key count: 6, offload key count: 0
[ts] I0528 04:36:33.147481 2732204 real_client.cpp:3556] Time taken for batch_get_into: 22004us, read store: 0us, with memory key count: 6, offload key count: 0
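The key-count arithmetic in the test plan can be checked with a small compile-time calculation. It assumes a 64-token page, that only full pages are stored in L3 (so the trailing 439 − 384 = 55 tokens are not), and that each of the 4 TP ranks stores its own key per page; these are inferred from the numbers in the logs rather than stated by the PR.

```cpp
#include <cstddef>

// Assumed parameters, inferred from the test-plan numbers above.
constexpr std::size_t kPromptTokens = 439;
constexpr std::size_t kPageSize = 64;  // tokens per KV page (assumption)
constexpr std::size_t kTpSize = 4;     // TP4: one key per page per rank (assumption)

// Assumes only full pages are stored in L3, hence integer division.
constexpr std::size_t fullPages(std::size_t tokens, std::size_t page) {
  return tokens / page;
}

constexpr std::size_t kKeysPerRank = fullPages(kPromptTokens, kPageSize);            // 6
constexpr std::size_t kTotalKeys = kKeysPerRank * kTpSize;                            // 24
constexpr std::size_t kLeftoverTokens = kPromptTokens - kKeysPerRank * kPageSize;     // 55

// Each ts log line reports "with memory key count: 6" (one batch_get_into per rank),
// and the Mooncake summary reports Get:(Req=4/0/4, Item=24/24).
static_assert(kKeysPerRank == 6, "each rank's batch_get_into fetches 6 keys");
static_assert(kTotalKeys == 24, "matches Get Item=24/24 in the Mooncake log");
```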

@ehuohz requested a review from a team as a code owner May 28, 2026 07:03
@XucSh self-assigned this May 28, 2026

chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 487eba9340

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread tokenspeed-scheduler/csrc/scheduler/operations/forward.cpp
Comment thread tokenspeed-scheduler/csrc/scheduler/scheduler.cpp Outdated

chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c707d29204

Comment thread tokenspeed-scheduler/csrc/scheduler/outside_event_handler.cpp
@ehuohz force-pushed the main branch 2 times, most recently from b6586ec to e732341 May 28, 2026 08:04

chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e732341112

Comment thread python/tokenspeed/runtime/engine/event_loop.py
ehuohz added 3 commits May 28, 2026 08:25
Signed-off-by: He Zhou <zhouhe2025@gmail.com>
…ransition

Signed-off-by: He Zhou <zhouhe2025@gmail.com>
Signed-off-by: He Zhou <zhouhe2025@gmail.com>

chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 074c709457

Comment thread tokenspeed-scheduler/csrc/scheduler/operations/forward.cpp