
feat: anthropic prompt caching across BYO API call sites #24

Merged: DozaVisuals merged 1 commit into main from claude/laughing-lovelace-9c5db7 on May 14, 2026

@DozaVisuals (Owner)

Reduces input-token billing on the user's Anthropic key by ~70% on multi-pass analysis and chat-follow-up workloads. When the same transcript text is sent across multiple calls, it now hits Anthropic's server-side prompt cache instead of being re-billed at the full input rate.
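
For intuition on how the savings scale with call count, here is a back-of-the-envelope sketch. It assumes Anthropic's documented cache pricing multipliers at the time of writing (1h cache writes at 2x the base input price, 5m writes at 1.25x, cache reads at 0.1x); check the current pricing page before relying on them. It models only the repeated prefix, not per-call text or output tokens, and the function names are illustrative.

```python
def relative_input_cost(n_calls, write_multiplier=2.0, read_multiplier=0.1):
    """Cached cost of a shared prefix over n_calls, as a fraction of the uncached cost.

    Assumes the first call writes the cache, every later call reads it before the
    TTL expires, and multipliers are relative to the base input-token price.
    """
    if n_calls < 1:
        raise ValueError("n_calls must be at least 1")
    cached = write_multiplier + (n_calls - 1) * read_multiplier
    uncached = float(n_calls)
    return cached / uncached


def savings(n_calls, **multipliers):
    """Fraction of the prefix's input cost saved; negative means caching cost more."""
    return 1.0 - relative_input_cost(n_calls, **multipliers)


if __name__ == "__main__":
    # Compare 1h-TTL writes (default) against 5m-TTL writes for a few call counts.
    for n in (1, 4, 10):
        print(n, round(savings(n), 3), round(savings(n, write_multiplier=1.25), 3))
```

Under these assumed multipliers, four passes over one transcript save about 42.5% of the transcript's input cost and ten calls about 71%, while a single call costs more than sending it uncached; the realized figure depends on how many calls share a prefix within its TTL.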

Changes

  • ai_providers/anthropic_provider.py: structured system-block support via build_cached_system_blocks() and build_cached_user_messages(). The transcript block goes first with cache_control ephemeral 1h; stable system/DNA text shares a 5m TTL block; per-call-varying system text sits after the cached prefix without its own marker, so sibling calls with different system prompts still hit the same transcript cache. Sends the extended-cache-ttl-2025-04-11 beta header only on requests that actually use cache markers, so non-cached calls stay byte-identical. Logs cache_create / cache_read / hit_rate per response and warns when a repeat prefix re-creates a cache entry instead of reading the existing one.
  • ai_providers/anthropic_provider.py: bump MODEL_OPUS to claude-opus-4-7 and MODEL_DEFAULT to claude-sonnet-4-6, ahead of the end-of-life date for the deprecated 4.0 model IDs (June 15, 2026).
  • ai_analysis.py: CACHE_TX_START / CACHE_TX_END and CACHE_DNA_START / CACHE_DNA_END sentinels. Call sites wrap their transcript / DNA text in these sentinels; the four wrapper functions (_call_ai, _call_ai_json, _call_ai_chat, _call_ai_chat_stream) extract them and dispatch on provider — Anthropic gets a structured cached system list, Ollama / OpenAI get a sentinel-stripped string with no behavior change.
  • ai_analysis.py: wrap transcript bytes at the four story-analysis passes (soundbites / beats / overview / social), generate_segment_vectors, build_story, build_story_from_vectors, the two chunked-chat-search workers, and the chunk-rerank call. Wrap the chat transcript message and the style block in the chat builder.
  • tests/test_anthropic_prompt_cache.py: 6 unit tests + 1 live integration test (auto-skips when no Anthropic key is found in ANTHROPIC_API_KEY or, via the provider config loader, in the keychain).
  • tests/test_analysis_chunking.py, tests/test_editorial_dna_v21.py, tests/test_story_builder.py: widen narrow-arity _call_ai mock stubs to accept the new call_site kwarg.
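
A minimal sketch of the sentinel-to-block flow described in the first and third bullets. Assumptions: the sentinel string values and the helper names (split_prompt, strip_sentinels, sketch_system_blocks, extra_headers) are hypothetical, since this PR does not show the real build_cached_system_blocks() signature. The block shape (type / text / cache_control with a ttl of "1h" or "5m") follows the Anthropic Messages API prompt-caching format, where longer-TTL breakpoints must come before shorter ones.

```python
import re

# Hypothetical sentinel values: the PR names CACHE_TX_START / CACHE_TX_END and
# CACHE_DNA_START / CACHE_DNA_END but does not show their actual strings.
CACHE_TX_START, CACHE_TX_END = "<<CACHE_TX>>", "<</CACHE_TX>>"
CACHE_DNA_START, CACHE_DNA_END = "<<CACHE_DNA>>", "<</CACHE_DNA>>"

EXTENDED_TTL_BETA = "extended-cache-ttl-2025-04-11"


def _extract(text, start, end):
    """Return (inner, text_without_span); inner is None if the sentinels are absent."""
    pattern = re.compile(re.escape(start) + r"(.*?)" + re.escape(end), re.DOTALL)
    match = pattern.search(text)
    if match is None:
        return None, text
    return match.group(1), text[: match.start()] + text[match.end():]


def split_prompt(prompt):
    """Split a sentinel-wrapped prompt into (transcript, dna, remaining_text)."""
    transcript, rest = _extract(prompt, CACHE_TX_START, CACHE_TX_END)
    dna, rest = _extract(rest, CACHE_DNA_START, CACHE_DNA_END)
    return transcript, dna, rest.strip()


def strip_sentinels(prompt):
    """Plain-string form for providers without prompt caching (e.g. Ollama / OpenAI)."""
    for marker in (CACHE_TX_START, CACHE_TX_END, CACHE_DNA_START, CACHE_DNA_END):
        prompt = prompt.replace(marker, "")
    return prompt


def sketch_system_blocks(transcript, stable, varying):
    """Build system blocks longest-TTL first so the transcript prefix is shared."""
    blocks = []
    if transcript:
        blocks.append({"type": "text", "text": transcript,
                       "cache_control": {"type": "ephemeral", "ttl": "1h"}})
    if stable:
        blocks.append({"type": "text", "text": stable,
                       "cache_control": {"type": "ephemeral", "ttl": "5m"}})
    if varying:
        # No cache_control: text after the last breakpoint does not alter the
        # cached prefix, so calls with different varying text still share it.
        blocks.append({"type": "text", "text": varying})
    return blocks


def extra_headers(blocks):
    """Send the extended-TTL beta header only when some block carries a cache marker."""
    if any("cache_control" in block for block in blocks):
        return {"anthropic-beta": EXTENDED_TTL_BETA}
    return {}
```

In this sketch a call site would pass the stable system prompt plus the extracted DNA as stable and the remaining per-call text as varying. Putting the 1h transcript block ahead of the 5m block satisfies the longer-TTL-first rule, and a request with no markers gets no extra header, which mirrors the "byte-identical" goal above.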
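
The per-response cache logging in the first bullet can be approximated as follows. The usage field names (input_tokens, cache_creation_input_tokens, cache_read_input_tokens) are those reported by the Anthropic Messages API; the hit_rate formula, the prefix key, the logger name, and the PrefixWatcher class are assumptions for illustration, not the provider's actual code.

```python
import logging

log = logging.getLogger("anthropic_cache_sketch")  # hypothetical logger name


def _field(usage, name):
    """Read a token count from a usage dict or SDK usage object; None counts as 0."""
    value = usage.get(name) if isinstance(usage, dict) else getattr(usage, name, None)
    return value or 0


def cache_stats(usage):
    """Assumed hit_rate: cache-read tokens over all input tokens for the request."""
    created = _field(usage, "cache_creation_input_tokens")
    read = _field(usage, "cache_read_input_tokens")
    uncached = _field(usage, "input_tokens")  # tokens after the last cache breakpoint
    total = created + read + uncached
    return {
        "cache_create": created,
        "cache_read": read,
        "hit_rate": read / total if total else 0.0,
    }


class PrefixWatcher:
    """Flags a prefix that was seen before but was written to the cache again."""

    def __init__(self):
        self._seen = set()

    def observe(self, prefix_key, stats):
        repeat = prefix_key in self._seen
        self._seen.add(prefix_key)
        log.info("cache_create=%d cache_read=%d hit_rate=%.2f",
                 stats["cache_create"], stats["cache_read"], stats["hit_rate"])
        if repeat and stats["cache_create"] > 0 and stats["cache_read"] == 0:
            log.warning("prefix %s re-created a cache entry instead of reading it",
                        prefix_key)
            return True
        return False
```

A prefix key could be, for example, a hash of the transcript text. Note that such a warning can also fire legitimately when an entry has simply expired past its TTL between calls.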

Test results: 373 passed and 18 failed, all 18 failures pre-existing (before this change: 363 passed, 19 failed).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@cloudflare-workers-and-pages: Cloudflare Workers deployment for doza-assist (latest commit 6bc62f5, updated May 14 2026, 06:27 PM UTC): ❌ Deployment failed.

DozaVisuals merged commit b4a4b17 into main on May 14, 2026 (2 of 3 checks passed).