feat: anthropic prompt caching across BYO API call sites #24
Merged
Reduces input-token billing on the user's Anthropic key by ~70% on multi-pass analysis and chat-followup workloads. Same transcript text sent across multiple calls now hits the server-side cache instead of re-billing.

Changes

- `ai_providers/anthropic_provider.py`: structured system-block support via `build_cached_system_blocks()` and `build_cached_user_messages()`. Transcript block goes first with `cache_control` ephemeral 1h; stable system/DNA share a 5m TTL block; per-call-varying system text sits after the cached prefix without its own marker, so sibling calls with different system prompts still hit the same transcript cache. Sends the `extended-cache-ttl-2025-04-11` beta header only on requests that actually use cache markers, so non-cached calls stay byte-identical. Logs `cache_create` / `cache_read` / `hit_rate` per response and warns when a repeat prefix re-creates a cache entry instead of reading the existing one.
- `ai_providers/anthropic_provider.py`: bump `MODEL_OPUS` to `claude-opus-4-7` and `MODEL_DEFAULT` to `claude-sonnet-4-6`, ahead of the 4.0 ID deprecation EOL on June 15 2026.
- `ai_analysis.py`: `CACHE_TX_START` / `CACHE_TX_END` and `CACHE_DNA_START` / `CACHE_DNA_END` sentinels. Call sites wrap their transcript / DNA text in these sentinels; the four wrapper functions (`_call_ai`, `_call_ai_json`, `_call_ai_chat`, `_call_ai_chat_stream`) extract them and dispatch on provider: Anthropic gets a structured cached system list, Ollama / OpenAI get a sentinel-stripped string with no behavior change.
- `ai_analysis.py`: wrap transcript bytes at the four story-analysis passes (soundbites / beats / overview / social), `generate_segment_vectors`, `build_story`, `build_story_from_vectors`, the two chunked-chat-search workers, and the chunk-rerank call. Wrap the chat transcript message + the style block in the chat builder.
- `tests/test_anthropic_prompt_cache.py`: 6 unit tests + 1 live integration test (auto-skips without an Anthropic key in `ANTHROPIC_API_KEY` or the keychain via the provider config loader).
- `tests/test_analysis_chunking.py`, `tests/test_editorial_dna_v21.py`, `tests/test_story_builder.py`: widen narrow-arity `_call_ai` mock stubs to accept the new `call_site` kwarg.

Test results: 373 passed, 18 failed, all 18 failures pre-existing (before this change: 363 passed, 19 failed).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
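How much caching saves depends on how many calls reuse the same transcript prefix. The rough sketch below, which is not part of this PR, compares the billed cost of a shared prefix with and without caching. The multipliers are assumptions taken from Anthropic's published prompt-caching pricing at the time of writing: a 1h cache write costs 2x the base input price and a cache read costs 0.1x. `relative_prefix_cost` is a hypothetical helper.

```python
# Assumed price multipliers relative to the base input-token price, taken from
# Anthropic's published prompt-caching pricing at the time of writing.
# Treat them as assumptions and check current pricing before relying on them.
CACHE_WRITE_1H = 2.0
CACHE_READ = 0.1


def relative_prefix_cost(n_calls: int, write_multiplier: float = CACHE_WRITE_1H) -> float:
    """Billed cost of a shared prefix over n_calls, relative to sending it uncached.

    Assumes the first call writes the cache entry and every later call reads it
    before it expires. Output tokens and the uncached suffix are ignored.
    """
    if n_calls < 1:
        raise ValueError("n_calls must be at least 1")
    cached_cost = write_multiplier + CACHE_READ * (n_calls - 1)
    return cached_cost / n_calls
```

Under these assumptions, four analysis passes over one transcript give about 0.575 of the uncached cost. Ten calls, for example four passes plus six chat follow-ups, give about 0.29. A prefix that is used only once costs 2.0x, so caching pays off only when a prefix is actually reused.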
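The system-block layout in the Changes list (transcript first with a 1h marker, stable system/DNA under one 5m marker, per-call text unmarked at the end) can be sketched as follows. This is a minimal illustration, not the PR's `build_cached_system_blocks()`. Its real signature isn't shown, so `build_system_blocks_sketch` and its parameters are hypothetical. The block shape (`type`, `text`, `cache_control` with `ttl`) follows the Anthropic Messages API.

```python
from __future__ import annotations

from typing import Any


def build_system_blocks_sketch(
    transcript: str,
    stable_system: str = "",
    dna: str = "",
    varying_system: str = "",
) -> list[dict[str, Any]]:
    """Hypothetical layout: [transcript (1h)] [system + DNA (5m)] [per-call text]."""
    blocks: list[dict[str, Any]] = []
    if transcript:
        # Largest, longest-lived content first; Anthropic requires 1h-TTL
        # breakpoints to appear before 5m-TTL ones.
        blocks.append({
            "type": "text",
            "text": transcript,
            "cache_control": {"type": "ephemeral", "ttl": "1h"},
        })
    stable = "\n\n".join(part for part in (stable_system, dna) if part)
    if stable:
        # Stable system prompt and DNA share a single 5m breakpoint.
        blocks.append({
            "type": "text",
            "text": stable,
            "cache_control": {"type": "ephemeral", "ttl": "5m"},
        })
    if varying_system:
        # No cache_control: this text differs per call, so it sits after the
        # last breakpoint and does not change the cached prefix before it.
        blocks.append({"type": "text", "text": varying_system})
    return blocks
```

Two passes that differ only in `varying_system` produce identical leading blocks, so the second pass can read the transcript entry that the first one wrote.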
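Sending the `extended-cache-ttl-2025-04-11` beta header only when a request carries cache markers comes down to checking the outgoing blocks. The sketch below assumes `system` is either a plain string or a list of content blocks, and that each message's `content` is a string or a block list. `cache_headers` is a hypothetical name, not the PR's code.

```python
from __future__ import annotations

from typing import Any

# Beta header value quoted in the PR description.
EXTENDED_TTL_BETA = "extended-cache-ttl-2025-04-11"


def _has_cache_marker(content: Any) -> bool:
    # Plain-string content cannot carry cache_control; only block lists can.
    if not isinstance(content, list):
        return False
    return any(isinstance(block, dict) and "cache_control" in block for block in content)


def cache_headers(system: Any, messages: list[dict[str, Any]]) -> dict[str, str]:
    """Return the beta header only when some block is actually marked for caching."""
    marked = _has_cache_marker(system) or any(
        _has_cache_marker(message.get("content")) for message in messages
    )
    return {"anthropic-beta": EXTENDED_TTL_BETA} if marked else {}
```

The result could be passed as `extra_headers=` to `client.messages.create(...)` in the `anthropic` Python SDK. An empty dict adds nothing, so uncached requests go out exactly as before.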
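The per-response `cache_create` / `cache_read` / `hit_rate` logging and the repeat-prefix warning can be derived from the `usage` object that the Messages API returns, specifically its `input_tokens`, `cache_creation_input_tokens` and `cache_read_input_tokens` fields. In the sketch below, the hashing scheme, the in-process `_seen_prefixes` set and both function names are assumptions about how such a check might work, not the PR's implementation. A re-creation can also be legitimate, for example after a 5m entry has expired.

```python
from __future__ import annotations

import hashlib
import json
import logging
from typing import Any, Optional

log = logging.getLogger("prompt_cache_sketch")

# Keys of cached prefixes this process has already sent (hypothetical).
_seen_prefixes: set[str] = set()


def cached_prefix_key(system_blocks: list[dict[str, Any]]) -> Optional[str]:
    """Hash the system blocks up to and including the last cache_control marker."""
    marked = [i for i, block in enumerate(system_blocks) if "cache_control" in block]
    if not marked:
        return None  # nothing cached, nothing to track
    prefix = system_blocks[: marked[-1] + 1]
    return hashlib.sha256(json.dumps(prefix, sort_keys=True).encode("utf-8")).hexdigest()


def log_cache_usage(system_blocks: list[dict[str, Any]], usage: Any) -> tuple[float, bool]:
    """Log cache stats from a Messages API usage object; return (hit_rate, recreated)."""
    create = getattr(usage, "cache_creation_input_tokens", 0) or 0
    read = getattr(usage, "cache_read_input_tokens", 0) or 0
    uncached = getattr(usage, "input_tokens", 0) or 0
    total = create + read + uncached
    hit_rate = read / total if total else 0.0
    log.info("cache_create=%d cache_read=%d hit_rate=%.2f", create, read, hit_rate)

    recreated = False
    key = cached_prefix_key(system_blocks)
    if key is not None:
        # Same prefix seen before, yet the API wrote a new entry and read nothing.
        recreated = key in _seen_prefixes and create > 0 and read == 0
        if recreated:
            log.warning("repeat prefix re-created its cache entry (cache_create=%d)", create)
        _seen_prefixes.add(key)
    return hit_rate, recreated
```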
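The sentinel mechanism in `ai_analysis.py` lets call sites mark transcript and DNA text once, after which the wrappers decide per provider what to send. The sketch below shows one way to do that extraction and dispatch. The PR names the `CACHE_*` constants but not their values, so the marker strings are hypothetical, as is `prepare_system`. For simplicity, the sketch handles only the first pair of each sentinel inside a single system string, and it treats all text outside the sentinels as unmarked per-call text.

```python
from __future__ import annotations

import re
from typing import Any, Union

# Hypothetical marker strings: the PR names these constants but not their values.
CACHE_TX_START, CACHE_TX_END = "<<CACHE_TX>>", "<</CACHE_TX>>"
CACHE_DNA_START, CACHE_DNA_END = "<<CACHE_DNA>>", "<</CACHE_DNA>>"
_ALL_MARKERS = (CACHE_TX_START, CACHE_TX_END, CACHE_DNA_START, CACHE_DNA_END)


def _extract(text: str, start: str, end: str) -> tuple[str, str]:
    """Return (wrapped text, text with the whole wrapped span removed)."""
    match = re.search(re.escape(start) + r"(.*?)" + re.escape(end), text, flags=re.DOTALL)
    if match is None:
        return "", text
    return match.group(1), text[: match.start()] + text[match.end():]


def strip_sentinels(text: str) -> str:
    """Ollama / OpenAI path: drop only the markers, keep the wrapped text in place."""
    for marker in _ALL_MARKERS:
        text = text.replace(marker, "")
    return text


def prepare_system(provider: str, system: str) -> Union[str, list[dict[str, Any]]]:
    """Dispatch on provider: a plain string, or cached blocks for Anthropic."""
    if provider != "anthropic":
        return strip_sentinels(system)
    transcript, rest = _extract(system, CACHE_TX_START, CACHE_TX_END)
    dna, rest = _extract(rest, CACHE_DNA_START, CACHE_DNA_END)
    blocks: list[dict[str, Any]] = []
    if transcript:
        blocks.append({"type": "text", "text": transcript,
                       "cache_control": {"type": "ephemeral", "ttl": "1h"}})
    if dna:
        blocks.append({"type": "text", "text": dna,
                       "cache_control": {"type": "ephemeral", "ttl": "5m"}})
    if rest.strip():
        # Everything outside the sentinels is treated as per-call text (no marker).
        blocks.append({"type": "text", "text": rest.strip()})
    return blocks
```

Because the non-Anthropic path removes only the marker strings, those providers receive the same characters the call site wrote, minus the markers.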
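Widening a narrow-arity mock means the stub no longer breaks when callers pass the new `call_site` keyword. The real `_call_ai` signature isn't shown in this PR, so the `system` / `prompt` parameters and both stub names below are placeholders for illustration.

```python
from __future__ import annotations

from typing import Any, Optional


def narrow_stub(system: str, prompt: str) -> str:
    # Fixed-arity stub: a caller passing call_site=... raises TypeError.
    return "{}"


def widened_stub(system: str, prompt: str, *args: Any,
                 call_site: Optional[str] = None, **kwargs: Any) -> str:
    # Accepts the new call_site kwarg (and any later additions) and ignores them.
    return "{}"
```

Accepting `**kwargs` as well as the named `call_site` keeps the stubs working if the wrappers gain further keyword arguments later.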
Deploying with

| Status | Name | Latest Commit | Updated (UTC) |
|---|---|---|---|
| ❌ Deployment failed | doza-assist | 6bc62f5 | May 14 2026, 06:27 PM |