feat(connector): add save-only pegaflow mode by xiaguan · Pull Request #300 · novitalabs/pegaflow

xiaguan · 2026-05-28T07:49:34Z

Summary

Add pegaflow.mode extra config with read_write default and save_only mode.
Make save-only mode skip Pega query/load while still advancing save metadata from absolute computed-token watermarks.
Add unit coverage for save-only scheduling/resume behavior and an E2E that combines full-hit DecodeBenchConnector + Pega save-only, then verifies a read-write Pega instance can load the saved KV.
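
For readers skimming the summary, here is a minimal sketch of the two ideas above: resolving `pegaflow.mode` with a `read_write` default, and deriving the blocks to save from an absolute computed-token watermark rather than a per-step delta. The names `PegaMode`, `parse_mode`, `should_query_cache`, and `blocks_to_save` are hypothetical and are not taken from the connector code. The sketch also assumes the extra config is a flat dict keyed by `"pegaflow.mode"` and that only full blocks are saved.

```python
from enum import Enum
from typing import Any, Mapping


class PegaMode(str, Enum):
    """Hypothetical enum mirroring the two documented mode values."""

    READ_WRITE = "read_write"
    SAVE_ONLY = "save_only"


def parse_mode(extra_config: Mapping[str, Any]) -> PegaMode:
    """Resolve `pegaflow.mode`, falling back to read_write when the key is absent."""
    raw = extra_config.get("pegaflow.mode", PegaMode.READ_WRITE.value)
    try:
        return PegaMode(raw)
    except ValueError:
        raise ValueError(f"unsupported pegaflow.mode: {raw!r}") from None


def should_query_cache(mode: PegaMode) -> bool:
    """Save-only instances skip the query/load path entirely."""
    return mode is PegaMode.READ_WRITE


def blocks_to_save(saved_blocks: int, computed_tokens: int, block_size: int) -> range:
    """Return the indices of full blocks that are computed but not yet saved.

    `computed_tokens` is an absolute watermark (total tokens computed so far
    for the request), so the result does not depend on how many tokens any
    single scheduling step contributed, or on whether the request was resumed.
    """
    full_blocks = computed_tokens // block_size
    return range(saved_blocks, max(saved_blocks, full_blocks))


mode = parse_mode({"pegaflow.mode": "save_only"})
print(mode.value, should_query_cache(mode))
print(list(blocks_to_save(saved_blocks=2, computed_tokens=80, block_size=16)))
```

Working from an absolute watermark means that a request whose tokens were reported as already computed by another connector still advances the save cursor, which is the situation the E2E test sets up with DecodeBenchConnector.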

E2E evidence

Save-only phase metrics: save_bytes=9175040 insertions=5 hits=0 load_bytes=0.
Read-write phase metrics: hits=4 load_bytes=7340032.
Save-only server log contains no cache_lookup, query_prefetch, or load; read-write phase logs cache_lookup: hit_blocks=4.
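
As a quick consistency check on the metrics above, the byte counters can be divided by the block counters. This assumes `insertions` and `hits` both count equally sized KV blocks, which the PR does not state explicitly.

```python
# Figures copied from the E2E evidence above.
save_bytes, insertions = 9_175_040, 5
load_bytes, hits = 7_340_032, 4

# Assumption: each insertion and each hit corresponds to one whole block.
assert save_bytes % insertions == 0 and load_bytes % hits == 0
bytes_per_saved_block = save_bytes // insertions
bytes_per_loaded_block = load_bytes // hits

print(bytes_per_saved_block, bytes_per_loaded_block)  # 1835008 1835008
print(bytes_per_saved_block / 2**20)  # 1.75 (MiB per block)
```

Both phases work out to the same 1,835,008 bytes (1.75 MiB) per block, so the read-write instance loaded blocks of the same size the save-only instance wrote. It loaded four of the five inserted blocks.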

Tests

uv run --extra dev ruff check python/pegaflow/connector/__init__.py python/pegaflow/connector/common.py python/pegaflow/connector/scheduler.py python/tests/test_combine_hashes.py python/tests/test_vllm_save_only_e2e.py python/tests/vllm_helpers.py
uv run --extra test pytest tests/test_vllm_save_only_e2e.py --collect-only -q -m e2e
uv run --extra test pytest tests/test_combine_hashes.py -q
PYTHONPATH=/data/pegadev/pegaflow-save-only-mode/python /data/pegadev/pegaflow/.venv/bin/python -m pytest tests/test_vllm_save_only_e2e.py -m e2e -q -s --model Qwen/Qwen3-0.6B --e2e-port 18100
pre-commit hooks, including cargo test --release

feifei-111

LGTM

## chore(release): bump version to 0.22.4

Bumps the Rust workspace, the `pegaflow-llm` Python package, the commitizen version, and the `Cargo.lock` workspace package versions from `0.22.3` to `0.22.4`.

---

## Release notes: 0.22.3 → 0.22.4

18 PRs landed on `master` since `v0.22.3` (2026-05-15). Grouped below for release notes.

### Highlights

- **Disaggregated prefill/decode over RDMA push** (#297): a brand-new vLLM v1 KV connector (`PdConnector`) plus a v2 RDMA transfer engine (`pegaflow-transfer/src/v2`). KV is pushed prefill→decode layer-by-layer via one-sided RDMA WRITE as each attention layer completes, overlapping transfer with the forward pass instead of pulling after prefill finishes (vLLM NIXL model). On H20 / Qwen3-8B the added TTFT is **2–4× lower than NIXL** across 512–16k input lengths.
- **Query leases replace query pinning** (#284, #288): the query/load/release control path moved from pin refcounts to lease-backed ownership. Query results collapse to `Loading`/`Ready` only; `Ready` carries `num_hit_blocks` plus an opaque lease that transfers scheduler→worker and is released on cleanup/failure, with a TTL sweeper reclaiming abandoned leases.
- **Save-only connector mode** (#300): new `pegaflow.mode` config; `save_only` skips Pega query/load while still advancing save metadata, so an instance can populate the cache without serving reads.

### Features

- feat(pd): RDMA push connector for disaggregated prefill/decode (#297)
- feat(connector): add save-only pegaflow mode (#300)
- feat(storage): sharded SSD cache: cache spread across multiple files, uring engine dispatches across shards, prefetch ready-blocks now ordered by requested keys (#299)
- feat(rdma): per-peer N QPs with WQE-level round-robin: new `--qps-per-peer` (default 2), round-robin at WQE level so one in-flight task saturates all QPs; handshake validates both sides agree on N (#291)
- feat(metaserver): node lifecycle fencing: heartbeat-based node tracking with per-node UUID fencing, stale nodes hidden via `--node-stale-secs` (#285, closes #222)
- feat(connector): replace query pinning with leases (#284)

### Fixes

- fix(connector): preserve non-MLA KV layout registration so cross-layer layouts (e.g. GLM-4.7-FP8) register by block stride; limit logical/physical block splitting to MLA (#295, closes #294)
- fix(numa): allocate pinned pools on GPU-local NUMA nodes instead of the full CPU NUMA set, avoiding wasted capacity on CPU-only nodes (#293)
- fix(connector): handle split physical KV blocks: group split physical rows into one logical block when FlashMLA uses smaller physical blocks (#292)
- fix(connector): allow a query lease to be consumed once per registered worker, fixing multi-worker `query lease is unknown or expired` (#288)
- fix(server): fail on invalid RDMA NICs: accept comma/space-separated `--nics`, reject empty names, propagate RDMA init failures instead of silently disabling P2P (#283, fixes #276)
- fix(connector): remove the unused scheduler pending-save request limit and save-drop accounting (#282)
- fix(connector): demote `cache_lookup_reuse` log from INFO to DEBUG to stop log spam under cache pressure (#280)

### Performance

- perf: CPU-path Criterion benchmarks + long-block save optimizations, e.g. `query_prefetch_lease/32768` ~12.3 ms → ~6.1 ms, `save_flush_unique/8192` ~21.3 ms → ~13.1 ms via reduced prefix-key cloning, ordered multi-layer save grouping, and RawBlock inline-segment allocation (#290)

### Internal / refactor / tests

- refactor(metrics): centralize histogram buckets behind a `build_buckets` helper (#298)
- refactor(core): make prefetch tasks terminal (Ready results carry RAM prefix blocks); default storage admission to no TinyLFU unless explicitly enabled (#287)
- test(server): mock vLLM gRPC E2E harness covering save/query/load/release/session contracts (#289)
- chore: tune transfer duration histogram buckets toward long-tail visibility (#281)

### Notable behavior & config changes (upgrade notes)

- **Query API**: query results are now `Loading`/`Ready` only; `Ready` exposes `num_hit_blocks` + an opaque lease. Pin/unpin refcount semantics are gone (#284, #288).
- **Release RPC**: unknown/expired leases now return `FailedPrecondition` instead of being silently accepted (#289).
- **`--nics`**: now rejects empty entries (e.g. `mlx5_0,,mlx5_1`) and fails startup on RDMA init errors rather than silently falling back to no-P2P (#283).
- **New CLI flags**: `--qps-per-peer` (default 2) (#291), `--node-stale-secs` for metaserver (#285).
- **New connector config**: `pegaflow.mode` with `read_write` (default) / `save_only` (#300).
- **Storage admission**: TinyLFU is now off unless explicitly enabled (#287).

### Full PR list

```
#298 refactor(metrics): use build_buckets helper for histogram buckets
#300 feat(connector): add save-only pegaflow mode
#299 feat(storage): add sharded SSD cache support
#297 feat(pd): RDMA push connector for disaggregated prefill/decode
#290 perf: add cpu path benchmarks and optimize long-block saves
#295 fix(connector): preserve non-MLA kv layout registration
#293 fix(numa): allocate pinned pools on GPU-local NUMA nodes
#292 fix(connector): handle split physical kv blocks
#291 feat(rdma): per-peer N QPs with WQE-level round-robin
#285 feat(metaserver): add node lifecycle fencing
#287 refactor(core): make prefetch task terminal
#289 test(server): add mock vLLM RPC E2E coverage
#288 fix(connector): allow query leases across workers
#284 feat(connector): replace query pinning with leases
#283 fix(server): fail on invalid rdma nics
#282 fix(connector): remove scheduler save limit
#281 chore: tune transfer duration buckets
#280 fix(connector): demote cache_lookup_reuse log to debug
```

xiaguan added 2 commits May 28, 2026 15:48

feat(connector): add save-only pegaflow mode

284c519

docs(connector): document save-only mode

edff91b

feifei-111 approved these changes May 29, 2026

xiaguan merged commit cf6f9cd into master May 29, 2026
12 checks passed

xiaguan deleted the feat/pegaflow-save-only-mode branch May 29, 2026 03:17

xiaguan mentioned this pull request May 29, 2026

chore(release): bump version to 0.22.4 #302

Merged

xiaguan mentioned this pull request Jun 1, 2026

feat(pd): MLA cache layout support + PD RDMA perf/stability #308

Open
