Skip to content

PASAUCMerced/StrataCache

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

11 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

StrataCache

StrataCache is a standalone tiered storage library for inference workloads. It provides a unified storage API over multiple memory/media layers (CPU memory, CXL, and future tiers) while keeping model/runtime-specific logic in adapters.

The project is designed for two concurrent goals:

  • Reuse and manage KV cache for inference engines such as vLLM.
  • Reuse the same storage plane for non-KV artifacts such as model parameter chunks used by offload/prefetch flows.

In short, StrataCache is not “KV-only cache code”; it is a generalized artifact storage substrate for LLM systems.

Why StrataCache (Motivation)

Modern inference systems move data across heterogeneous memory/media (GPU, CPU, CXL, SSD). Most implementations solve this in a task-specific way (for example KV-only), which creates duplicated logic and inconsistent policies.

StrataCache addresses this by:

  • Defining a generalized artifact abstraction (ArtifactId, metadata, bytes payload).
  • Providing a single tier coordinator (TierChain) with write-through/write-back semantics.
  • Exposing one public API (StorageEngine) used by both adapters and direct clients.

This allows KV caching and parameter offload/prefetch to share the same storage management model without key collisions or policy conflicts.

Main Use Cases

  1. vLLM KV cache reuse
  • Use StrataCacheConnectorV1 to store/load KV chunks by prefix and chunk boundaries.
  1. Parameter offload and prefetch
  • Use ParameterStoreClient on top of StorageEngine to store/load parameter chunks.
  1. Mixed workloads in one storage plane
  • Run KV connector and parameter offload together with unified tier policies.

Installation

From the stratacache/ directory:

python -m pip install -U uv
uv venv -p 3.12 .venv
source .venv/bin/activate
uv pip install -e .

vllm==0.13.0 is already included in pyproject.toml dependencies.

Minimal Run (Qwen + CPU Memory)

vllm serve Qwen/Qwen2.5-7B-Instruct \
  --port 8000 \
  --kv-transfer-config '{"kv_connector":"StrataCacheConnectorV1","kv_connector_module_path":"stratacache.adapters.vllm.connector_v1","kv_role":"kv_both"}'

Configuration (YAML)

StrataCache connector reads YAML configuration.

  • Default config file in this repo: config.yaml
  • Packaged fallback config for installed module: src/stratacache/config.yaml
  • Optional override path in vLLM config: kv_connector_extra_config["stratacache.config_path"]

If no YAML file is found, built-in defaults are used.

Connector Config Keys

Key Type Default Meaning
use_cxl bool false Enable CXL tier
writeback bool false Use write-back between CPU and CXL
cpu_capacity_mb int 61440 CPU tier capacity in MB
chunk_size int 256 Token chunk size for KV matching/store/load
bundle_layers bool true Store/load layered KV in bundled format
tensor_codec str stable Tensor payload codec
tensor_header_in_payload bool false Keep dtype/shape in payload header
save_partial_chunks bool true Save/load last partial chunk
log_stats bool true Enable connector stats logging
log_every int 50 Stats logging interval
log_min_interval_s float 2.0 Min seconds between logs
debug bool false Debug logging
cxl_dax_device str/null null CXL DAX device path
cxl_reset_metadata bool false Reset CXL metadata on init
telemetry.export bool false Enable telemetry exports

Design Overview

StrataCache separates storage responsibilities from task semantics.

Core Components

  1. Artifact model (core/)
  • Generic object identity + metadata + payload.
  1. Memory layers (backend/)
  • Per-medium implementation (CpuMemoryLayer, CxlMemoryLayer).
  1. Tier coordinator (tiering/TierChain)
  • Ordered tiers, read-through lookup, promotion, write-through/write-back propagation.
  1. Public API (engine/StorageEngine)
  • store/load/contains/delete with chain/exact/prefer access modes.
  1. Task adapters (adapters/)
  • vLLM KV connector for prefix/chunk/slot mapping logic.
  • Torch parameter client for parameter chunk workflows.

Core Logic (Read/Write Path)

Write path:

  • Client calls StorageEngine.store(...).
  • TierChain writes to selected tier and applies link policy.
  • In write-back mode, dirty tracking and background convergence are handled internally.

Read path:

  • Client calls StorageEngine.load(...).
  • TierChain scans tiers top-down (chain) or uses directed tier strategy (exact / prefer).
  • On lower-tier hit, optional promotion updates upper tiers.

KV-specific matching and token-boundary logic are adapter responsibilities, not storage-core responsibilities.

Project Layout

stratacache/
├── config.yaml
├── pyproject.toml
├── README.md
├── src/stratacache/
│   ├── core/          # artifact schema, codecs, errors, key builder
│   ├── backend/       # CPU/CXL memory layers
│   ├── tiering/       # TierChain and link policies
│   ├── writeback/     # dirty tracking and worker
│   ├── engine/        # StorageEngine public facade
│   └── adapters/
│       ├── vllm/      # KVConnector v1 integration
│       └── torch/     # parameter chunk client
└── tests/             # assert-based tests

Testing

PYTHONPATH=$(pwd)/src python tests/run_all.py

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors