Let clients mark stable prompt anchors for Skippy cache by i386 · Pull Request #793 · Mesh-LLM/mesh-llm

i386 · 2026-06-04T10:43:57Z

Long agentic prompts can now tell Skippy which token prefix is intended to stay stable across turns. The request can carry prompt_cache_anchor_tokens alongside prompt_cache_key, and Skippy records that exact prefix as a resident cache anchor while leaving the existing exact/full/generic prefix restore path in charge.

This is aimed at tool-heavy and long-context sessions where the beginning of the prompt is durable, but the tail keeps changing with tool results, scratchpad state, or retrieved context.

Why

The previous prefix-cache work made repeated prompts much cheaper when the runtime could discover and replay an exact cached prefix. That helps, but it still leaves the client unable to say: "this many tokens are the stable system/tool/session prefix; please keep this boundary hot."

For agent workloads, that boundary is often known at request construction time. Making it explicit gives Skippy a durable cache target that is independent of the changing suffix.

Before

sequenceDiagram
    participant Client
    participant Stage0
    participant Stage1
    participant Stage2

    Client->>Stage0: prompt_cache_key + full prompt
    Stage0->>Stage1: prefill chunks
    Stage1->>Stage2: prefill chunks
    Stage2-->>Stage0: decode replies
    Note over Stage0,Stage2: Cache records discovered exact/grid prefixes

    Client->>Stage0: same prefix + changed suffix
    Stage0->>Stage0: find best discovered prefix
    Stage0->>Stage1: restore prefix, then prefill suffix

The runtime could reuse what it discovered, but the client had no protocol surface for naming the stable prefix that should survive changing prompt tails.

After

sequenceDiagram
    participant Client
    participant Stage0
    participant Stage1
    participant Stage2

    Client->>Stage0: prompt_cache_key + prompt_cache_anchor_tokens=N
    Stage0->>Stage0: validate N is a strict token prefix
    Stage0->>Stage1: prefill/decode as normal
    Stage0->>Stage0: record explicit N-token anchor
    Stage0->>Stage1: decode sideband carries N
    Stage1->>Stage1: record the same anchor prefix
    Stage1->>Stage2: decode sideband carries N
    Stage2->>Stage2: record the same anchor prefix

    Client->>Stage0: same key + changed suffix + N
    Stage0->>Stage0: prefer existing exact/full/generic restore
    Stage0->>Stage1: use anchor restore only if no prefix restored

The anchor is conservative by design:

ignored without prompt_cache_key
ignored when zero or not a strict prefix of the prompt
exact-token only; no lossy approximation
existing exact/full/generic prefix restore is preferred
anchor restore is attempted only when no prefix was restored by the existing cache path
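
The rules above can be sketched as two small functions. This is an illustrative reading of the list, not the code in this PR: `CacheHints`, `effective_anchor`, `Restore`, and `choose_restore` are hypothetical names, and "strict prefix" is assumed to mean a non-zero token count smaller than the prompt length.

```rust
/// Hypothetical holder for the two request fields named in this PR.
#[derive(Debug, Clone, Default)]
pub struct CacheHints {
    pub prompt_cache_key: Option<String>,
    pub prompt_cache_anchor_tokens: Option<usize>,
}

/// Returns the anchor length to record, or `None` when the hint is ignored.
pub fn effective_anchor(hints: &CacheHints, prompt_len: usize) -> Option<usize> {
    // Ignored without prompt_cache_key.
    hints.prompt_cache_key.as_ref()?;
    let n = hints.prompt_cache_anchor_tokens?;
    // Ignored when zero or not a strict prefix of the prompt
    // (assumption: strict means shorter than the full prompt).
    if n == 0 || n >= prompt_len {
        return None;
    }
    Some(n)
}

/// Which restore path a stage would take for a request.
#[derive(Debug, PartialEq, Eq)]
pub enum Restore {
    /// Existing exact/full/generic prefix restore of this many tokens.
    Existing(usize),
    /// Fallback to the explicit anchor of this many tokens.
    Anchor(usize),
    /// Nothing to restore; prefill from scratch.
    Cold,
}

/// The existing cache path always wins; the anchor is tried only when
/// the existing path restored nothing.
pub fn choose_restore(existing: Option<usize>, anchor: Option<usize>) -> Restore {
    match (existing, anchor) {
        (Some(n), _) => Restore::Existing(n),
        (None, Some(n)) => Restore::Anchor(n),
        (None, None) => Restore::Cold,
    }
}
```

With the benchmark numbers from this PR, `choose_restore(Some(1536), Some(1600))` yields `Restore::Existing(1536)`, which matches the "existing 1536-token restore still preferred" behavior reported under Performance.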

Performance

This PR adds the control-plane and cache-policy hook for stable prompt anchors. It does not claim a wall-clock speedup for the local benchmark below, because in that run the existing multi-token replay cache already captured nearly the same prefix (a 1536-token restore against a 1600-token requested anchor).

Local sanity benchmark:

model: Qwen3 0.6B Q4_K_M
topology: 3 local Skippy stages
context: 8192
prompt: synthetic 35-line agent transcript with a shared stable prefix and changing suffix
output: max_tokens=4
cache grid: 256-token shared-prefix stride
requested anchor: 1600 tokens

Scenario	Cold TTFT	Warm TTFT median	Warm elapsed median	Observed cache behavior
Existing prefix cache	26.72s	1.53s	1.62s	1536-token generic/grid restore
Anchor hint requested	28.50s	1.63s	1.76s	anchor recorded; existing 1536-token restore still preferred

The important result here is correctness and safety: the anchor hint records cleanly and does not force a worse restore path over an already-good generic prefix restore. The expected win is in workloads where the useful stable prefix is not already captured by the grid, or where cache retention should favor a long-lived agent prefix over volatile prompt tails.
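
The 1536-token restore is what a 256-token grid would be expected to give for a 1600-token stable prefix. The arithmetic can be checked with a small sketch; `grid_floor` and `anchor_gap` are hypothetical helpers, and the assumption that grid entries sit at exact multiples of the stride is inferred from the benchmark numbers rather than taken from the implementation.

```rust
/// Largest multiple of `stride` that does not exceed `tokens`.
/// Assumes grid cache entries sit at exact multiples of the stride.
pub fn grid_floor(tokens: usize, stride: usize) -> usize {
    assert!(stride > 0, "stride must be non-zero");
    (tokens / stride) * stride
}

/// Tokens of the stable prefix that lie past the nearest grid boundary,
/// i.e. what still has to be prefilled after a grid restore.
pub fn anchor_gap(anchor: usize, stride: usize) -> usize {
    anchor - grid_floor(anchor, stride)
}
```

For this run, `grid_floor(1600, 256)` is 1536 and `anchor_gap(1600, 256)` is 64, so the grid already covered all but 64 tokens of the requested anchor. A workload whose stable prefix falls further from a grid boundary, or whose grid entries get evicted, is where the explicit anchor would have more room to help.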

Protocol

Adds an optional OpenAI-compatible request extension:

prompt_cache_anchor_tokens

It is additive on the HTTP request surface. Existing clients do not need to send it. New servers only honor it when paired with prompt_cache_key.
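
As an illustration of the additive shape, the sketch below builds a request body with and without the new field. Only `prompt_cache_key` and `prompt_cache_anchor_tokens` come from this PR; the chat-completions layout (`model`, `messages`), the placement of both fields at the top level of the body, and the helper names `json_escape` and `request_body` are assumptions made for the example.

```rust
/// Escapes a string for use inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \u00XX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Builds a hypothetical chat-completions body. The anchor field is
/// emitted only when the caller supplies it, so clients that never set
/// it produce exactly the body they produced before.
pub fn request_body(
    model: &str,
    prompt: &str,
    cache_key: &str,
    anchor_tokens: Option<u64>,
) -> String {
    let mut body = format!(
        "{{\"model\":\"{}\",\"messages\":[{{\"role\":\"user\",\"content\":\"{}\"}}],\"prompt_cache_key\":\"{}\"",
        json_escape(model),
        json_escape(prompt),
        json_escape(cache_key)
    );
    if let Some(n) = anchor_tokens {
        body.push_str(&format!(",\"prompt_cache_anchor_tokens\":{}", n));
    }
    body.push('}');
    body
}
```

A client that knows its system and tool preamble tokenizes to 1600 tokens would call `request_body(model, prompt, key, Some(1600))`; passing `None` leaves the field out entirely.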

Inside Skippy, downstream stages learn the requested anchor through the existing decode-record sideband shape, so this does not add a mesh protobuf field or a new public stream type. Mixed-version staged chains should not rely on this hint until all Skippy stages are updated, because older stages will not record or restore the anchor.

Validation

cargo fmt --all -- --check
git diff --check
just with-lld cargo test -p skippy-server --lib
just with-lld cargo clippy -p skippy-server --all-targets -- -D warnings
just with-lld cargo clippy -p openai-frontend --all-targets -- -D warnings
just with-lld cargo clippy -p mesh-llm-host-runtime --all-targets -- -D warnings
just with-lld cargo clippy -p mesh-llm --all-targets -- -D warnings
just build

Benchmark/sanity run output was captured under:

/tmp/prompt-anchor-cache-bench-20260604-203642

i386 · 2026-06-05T05:06:32Z

@michaelneale I think this one will be useful when we can classify traffic - e.g. long agent tool calls vs chat

michaelneale · 2026-06-05T05:11:43Z

yeah I think a very good idea.

i386 added 4 commits June 4, 2026 17:16

Expose prefix cache savings telemetry

9eee8ab

Implement exact full-prompt first-token cache

79a4b0a

Replay cached tokens for exact Skippy prompts

60e7068

Add explicit Skippy prompt anchor cache hints

f77cf6b

i386 marked this pull request as draft June 4, 2026 10:51

Base automatically changed from codex/skippy-multi-token-replay-cache to main June 5, 2026 05:28

i386 wants to merge 4 commits into main from codex/skippy-prompt-anchor-cache
