feat: anthropic prompt caching across BYO API call sites by DozaVisuals · Pull Request #24 · DozaVisuals/doza-assist

DozaVisuals · 2026-05-14T18:44:18Z

Reduces input-token billing on the user's Anthropic key by ~70% on multi-pass analysis and chat-followup workloads. When the same transcript text is sent across multiple calls, later calls now read it from Anthropic's server-side prompt cache at the discounted cache-read rate instead of paying the full input price again.
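The savings arithmetic can be made concrete. Below is a minimal sketch of how input cost scales when N calls share one cached prefix.

- The price multipliers are relative to the base input price: 1.25x for a 5-minute cache write, 2x for a 1-hour write, and 0.1x for a cache read. These are Anthropic's published figures at the time of writing and should be checked against current pricing.
- The function names are hypothetical.
- The token counts are illustrative. They are not measurements from this PR.

```python
from typing import Optional

# Price multipliers relative to the base input-token price, per Anthropic's
# prompt-caching docs at the time of writing (assumption: verify before use).
CACHE_WRITE_5M = 1.25  # writing a prefix with the default 5-minute TTL
CACHE_WRITE_1H = 2.0   # writing a prefix with the 1-hour TTL
CACHE_READ = 0.1       # reading a previously cached prefix


def input_cost_units(prefix_tokens: int, suffix_tokens: int, calls: int,
                     write_multiplier: Optional[float] = CACHE_WRITE_1H) -> float:
    """Total input cost, in base-price token units, for `calls` requests that
    share one prefix and each add `suffix_tokens` of uncached text.

    write_multiplier=None models no caching at all. Otherwise the first call
    writes the prefix and every later call reads it (assumes all calls land
    inside the TTL window).
    """
    if calls < 1:
        raise ValueError("calls must be >= 1")
    if write_multiplier is None:
        return float(calls * (prefix_tokens + suffix_tokens))
    first_write = write_multiplier * prefix_tokens
    later_reads = (calls - 1) * CACHE_READ * prefix_tokens
    return first_write + later_reads + calls * suffix_tokens


def savings(prefix_tokens: int, suffix_tokens: int, calls: int,
            write_multiplier: float = CACHE_WRITE_1H) -> float:
    """Fraction of input cost saved versus sending everything uncached."""
    uncached = input_cost_units(prefix_tokens, suffix_tokens, calls, None)
    cached = input_cost_units(prefix_tokens, suffix_tokens, calls, write_multiplier)
    return 1 - cached / uncached


if __name__ == "__main__":
    # Illustrative only: a 10k-token transcript plus 500 tokens of per-call text.
    for n in (1, 4, 10):
        print(n, round(savings(10_000, 500, n), 3))
```

With these illustrative numbers, the saving is about -95% for a single call, about 40% for four passes, and about 68% for ten calls. Under this simple model, a 1h write pays for itself only from the third call onward. A figure near 70% therefore implies several reuses of the same transcript within the TTL.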

Changes

ai_providers/anthropic_provider.py: add structured system-block support via build_cached_system_blocks() and build_cached_user_messages(). The transcript block goes first, marked cache_control ephemeral with a 1h TTL. The stable system prompt and DNA share a single block with a 5m TTL. Per-call-varying system text sits after the cached prefix with no marker of its own, so sibling calls with different system prompts still hit the same transcript cache. The extended-cache-ttl-2025-04-11 beta header is sent only on requests that actually carry cache markers, so non-cached requests stay byte-identical to what was sent before this change. Each response logs cache_create / cache_read / hit_rate, and a warning is emitted when a repeated prefix creates a new cache entry instead of reading the existing one.
ai_providers/anthropic_provider.py: bump MODEL_OPUS to claude-opus-4-7 and MODEL_DEFAULT to claude-sonnet-4-6, ahead of the June 15, 2026 end-of-life for the 4.0 model IDs.
ai_analysis.py: add CACHE_TX_START / CACHE_TX_END and CACHE_DNA_START / CACHE_DNA_END sentinels. Call sites wrap their transcript / DNA text in these sentinels. The four wrapper functions (_call_ai, _call_ai_json, _call_ai_chat, _call_ai_chat_stream) extract them and dispatch on provider: Anthropic gets a structured, cached system list, while Ollama / OpenAI get a sentinel-stripped string with no behavior change.
ai_analysis.py: wrap the transcript text at these call sites:
- the four story-analysis passes (soundbites / beats / overview / social);
- generate_segment_vectors, build_story, and build_story_from_vectors;
- the two chunked-chat-search workers;
- the chunk-rerank call.

In the chat builder, wrap the chat transcript message and the style block.
tests/test_anthropic_prompt_cache.py: 6 unit tests plus 1 live integration test. The live test auto-skips when no Anthropic key is available from ANTHROPIC_API_KEY or from the keychain via the provider config loader.
tests/test_analysis_chunking.py, tests/test_editorial_dna_v21.py, tests/test_story_builder.py: widen narrow-arity _call_ai mock stubs to accept the new call_site kwarg.
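The block layout from the anthropic_provider.py entry can be sketched as follows. This assumes the Messages API's list-of-text-blocks form for `system`.

- `sketch_cached_system_blocks` and `sketch_extra_headers` are hypothetical names. They are not the PR's build_cached_system_blocks(), whose signature is not shown.
- Only the TTLs, the block ordering, and the beta header name come from the description.
- One API detail matters here: when 1h and 5m cache breakpoints are mixed, the 1h ones must come first. This ordering satisfies that rule.

```python
from typing import Any, Dict, List, Optional

# Beta header name as given in the PR description.
EXTENDED_TTL_BETA = "extended-cache-ttl-2025-04-11"


def sketch_cached_system_blocks(transcript: Optional[str],
                                stable_system: Optional[str],
                                varying_system: Optional[str]) -> List[Dict[str, Any]]:
    """Hypothetical stand-in for build_cached_system_blocks()."""
    blocks: List[Dict[str, Any]] = []
    if transcript:
        # Most widely shared text first, on the 1h TTL; longer-TTL
        # breakpoints must precede shorter-TTL ones in the prompt.
        blocks.append({"type": "text", "text": transcript,
                       "cache_control": {"type": "ephemeral", "ttl": "1h"}})
    if stable_system:
        # Stable system prompt + DNA share one 5m-TTL block.
        blocks.append({"type": "text", "text": stable_system,
                       "cache_control": {"type": "ephemeral", "ttl": "5m"}})
    if varying_system:
        # No cache_control: billed as normal input. It sits after the cached
        # prefix, so calls that differ only here still share that prefix.
        blocks.append({"type": "text", "text": varying_system})
    return blocks


def sketch_extra_headers(blocks: List[Dict[str, Any]]) -> Dict[str, str]:
    """Add the beta header only when some block carries a cache marker,
    so uncached requests go out exactly as before."""
    if any("cache_control" in block for block in blocks):
        return {"anthropic-beta": EXTENDED_TTL_BETA}
    return {}
```

With the anthropic Python SDK, the result would be passed to `client.messages.create(...)` as `system=blocks` together with `extra_headers=sketch_extra_headers(blocks)`. Two sibling passes that differ only in `varying_system` produce identical first two blocks, and that is what lets them share the cache entries.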
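The per-response cache logging in anthropic_provider.py can be sketched from the usage counters the Messages API returns: `input_tokens`, `cache_creation_input_tokens`, and `cache_read_input_tokens`. `input_tokens` counts only the tokens that were neither written to nor read from the cache.

`CacheUsageLogger` is hypothetical, not the PR's code, and makes three further assumptions:

- It takes `usage` as a plain mapping rather than an SDK object.
- It defines hit_rate as cache-read tokens over all input tokens.
- It detects repeats with a SHA-256 hash of the prefix text.

```python
import hashlib
import logging
from typing import Mapping, Optional, Set, Tuple

log = logging.getLogger("anthropic_cache_sketch")


class CacheUsageLogger:
    """Hypothetical per-response cache logger; not the PR's implementation."""

    def __init__(self) -> None:
        self._seen: Set[str] = set()  # hashes of prefixes already sent

    def record(self, prefix_text: str,
               usage: Mapping[str, Optional[int]]) -> Tuple[float, bool]:
        created = usage.get("cache_creation_input_tokens") or 0
        read = usage.get("cache_read_input_tokens") or 0
        uncached = usage.get("input_tokens") or 0
        total = created + read + uncached
        hit_rate = read / total if total else 0.0
        log.info("cache_create=%d cache_read=%d hit_rate=%.2f",
                 created, read, hit_rate)

        # Same prefix seen before, yet the cache was written instead of read.
        key = hashlib.sha256(prefix_text.encode("utf-8")).hexdigest()
        repeat_miss = key in self._seen and created > 0 and read == 0
        if repeat_miss:
            log.warning("prefix %s re-created its cache entry (TTL expired, "
                        "or something earlier in the request changed)", key[:12])
        self._seen.add(key)
        return hit_rate, repeat_miss
```

A re-creation can be legitimate, for example after the TTL lapses. That is why this sketch logs a warning rather than raising.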
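The sentinel handling in ai_analysis.py can be sketched as a pair of helpers:

- One splits a system string into (transcript, DNA, remaining text) for the Anthropic path.
- The other strips the markers but leaves the wrapped text in place for Ollama / OpenAI.

Only the constant names come from the PR. The sentinel values and the helper names are hypothetical.

```python
import re
from typing import Optional, Tuple

# Hypothetical sentinel values; only the constant names come from the PR.
CACHE_TX_START, CACHE_TX_END = "<<CACHE_TX>>", "<</CACHE_TX>>"
CACHE_DNA_START, CACHE_DNA_END = "<<CACHE_DNA>>", "<</CACHE_DNA>>"
_ALL_SENTINELS = (CACHE_TX_START, CACHE_TX_END, CACHE_DNA_START, CACHE_DNA_END)


def _pull(text: str, start: str, end: str) -> Tuple[Optional[str], str]:
    """Remove the first start...end span; return (inner text, remaining text)."""
    pattern = re.compile(re.escape(start) + r"(.*?)" + re.escape(end), re.DOTALL)
    match = pattern.search(text)
    if match is None:
        return None, text
    return match.group(1), text[:match.start()] + text[match.end():]


def split_cache_sentinels(system: str) -> Tuple[Optional[str], Optional[str], str]:
    """Anthropic path: return (transcript, dna, remaining system text)."""
    transcript, rest = _pull(system, CACHE_TX_START, CACHE_TX_END)
    dna, rest = _pull(rest, CACHE_DNA_START, CACHE_DNA_END)
    return transcript, dna, rest.strip()


def strip_cache_sentinels(text: str) -> str:
    """Ollama / OpenAI path: drop the markers, keep the wrapped text in place."""
    for marker in _ALL_SENTINELS:
        text = text.replace(marker, "")
    return text
```

On the Anthropic branch, the extracted pieces would then feed the cached system-block layout described in the anthropic_provider.py entry. Every other provider would receive `strip_cache_sentinels(system)`. That matches the "no behavior change" claim only if the stripped string equals what those call sites sent before the sentinels were added.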
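The mock stubs needed widening because of how Python handles undeclared keywords. A stub with a fixed positional signature raises TypeError as soon as the code under test passes a keyword the stub does not declare. Here is a minimal illustration. The stub names, the two-argument shape, and the call_site value are hypothetical; only the call_site keyword itself comes from the PR.

```python
# Hypothetical shapes: the real _call_ai signature is not shown in the PR,
# only that it gained a call_site keyword argument.
def narrow_stub(system, prompt):
    return "{}"


def widened_stub(system, prompt, *args, **kwargs):
    # Accepts (and ignores) call_site plus any future keyword arguments.
    return "{}"


def call_with_site(fn):
    # Mimics a call site that now tags each request with where it came from.
    return fn("system text", "user prompt", call_site="story.beats")
```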

Test results: 373 passed and 18 failed, all failures pre-existing. Before this change: 363 passed, 19 failed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

cloudflare-workers-and-pages · 2026-05-14T18:44:24Z

Deploying with Cloudflare Workers


Status	Name	Latest Commit	Updated (UTC)
❌ Deployment failed	doza-assist	`6bc62f5`	May 14 2026, 06:27 PM

DozaVisuals merged commit b4a4b17 into main May 14, 2026
2 of 3 checks passed

DozaVisuals merged 1 commit into main from claude/laughing-lovelace-9c5db7
