
[python] update anthropic llmobs tests for new cache ttl metrics#6480

Draft
Yun-Kim wants to merge 3 commits into main from yunkim/llmobs-anthropic-ttl-cache-metrics

@Yun-Kim commented Mar 12, 2026

Motivation

Account for the additional cache-creation TTL breakdown metrics (ephemeral_5m_input_tokens and ephemeral_1h_input_tokens) in the Anthropic integration.

Changes

Workflow

  1. ⚠️ Create your PR as draft ⚠️
  2. Work on your PR until the CI passes
  3. Mark it as ready for review
    • Test logic is modified? -> Get a review from the RFC owner.
    • Framework is modified, or non-obvious usage of it -> get a review from the R&P team

🚀 Once your PR is reviewed and the CI green, you can merge it!

🛟 #apm-shared-testing 🛟

Reviewer checklist

  • Anything other than tests/ or manifests/ is modified? I have approval from the R&P team
  • A docker base image is modified?
    • the relevant build-XXX-image label is present
  • A scenario is added, removed or renamed?

@Yun-Kim requested a review from a team as a code owner March 12, 2026 14:22

github-actions bot commented Mar 12, 2026

CODEOWNERS have been resolved as:

manifests/python.yml                                                    @DataDog/apm-python @DataDog/asm-python
tests/integration_frameworks/llm/anthropic/test_anthropic_llmobs.py     @DataDog/ml-observability

@sabrenner left a comment

I would:

  1. Update the manifest to mark these tests as missing_feature, so that we can land these test changes.
  2. Update dd-trace-py to use the most recent system-tests hash (if that's still the process). Your PR should then pass, and these missing_feature tests will XPASS.
  3. Once the feature is released, add its version to the manifest for these tests.

@Yun-Kim requested review from a team as code owners March 12, 2026 19:47
@Yun-Kim requested review from gnufede and quinna-h and removed request for a team March 12, 2026 19:47

datadog-prod-us1-4 bot commented Mar 12, 2026

⚠️ Tests

⚠️ Warnings

🧪 52 Tests failed

tests.integration_frameworks.llm.anthropic.test_anthropic_llmobs.TestAnthropicLlmObsMessages.test_create_content_block[False, anthropic-js@0.71.0] from system_tests_suite (Datadog)
assert {'_dd': {'spa...00, ...}, ...} == {'_dd': <ANY>...Y>, ...}, ...}
  Omitting 9 identical items, use -vv to show
  Differing items:
  {'metrics': {'cache_read_input_tokens': 0, 'cache_write_input_tokens': 0, 'input_tokens': 28, 'output_tokens': 100, ...}} != {'metrics': {'cache_read_input_tokens': <ANY>, 'cache_write_input_tokens': <ANY>, 'ephemeral_1h_input_tokens': <ANY>, 'ephemeral_5m_input_tokens': <ANY>, ...}}
  Full diff:
    {
  -  '_dd': <ANY>,
  -  'duration': <ANY>,
  +  '_dd': {'span_id': '956636476188409651',
  +          'trace_id': '69b319bf000000000d46a7ce7e643f33'},
...
tests.integration_frameworks.llm.anthropic.test_anthropic_llmobs.TestAnthropicLlmObsMessages.test_create_content_block[False, anthropic-js@0.71.0] from system_tests_suite (Datadog)
assert {'_dd': {'spa...00, ...}, ...} == {'_dd': <ANY>...Y>, ...}, ...}
  Omitting 9 identical items, use -vv to show
  Differing items:
  {'metrics': {'cache_read_input_tokens': 0, 'cache_write_input_tokens': 0, 'input_tokens': 28, 'output_tokens': 100, ...}} != {'metrics': {'cache_read_input_tokens': <ANY>, 'cache_write_input_tokens': <ANY>, 'ephemeral_1h_input_tokens': <ANY>, 'ephemeral_5m_input_tokens': <ANY>, ...}}
  Full diff:
    {
  -  '_dd': <ANY>,
  -  'duration': <ANY>,
  +  '_dd': {'span_id': '937739240664233781',
  +          'trace_id': '69b31a03000000000d0384defbbc7735'},
...
tests.integration_frameworks.llm.anthropic.test_anthropic_llmobs.TestAnthropicLlmObsMessages.test_create_content_block[False, anthropic-py@0.75.0] from system_tests_suite (Datadog)
assert {'_dd': {'apm...00, ...}, ...} == {'_dd': <ANY>...Y>, ...}, ...}
  Omitting 9 identical items, use -vv to show
  Differing items:
  {'metrics': {'cache_read_input_tokens': 0, 'cache_write_input_tokens': 0, 'input_tokens': 28, 'output_tokens': 100, ...}} != {'metrics': {'cache_read_input_tokens': <ANY>, 'cache_write_input_tokens': <ANY>, 'ephemeral_1h_input_tokens': <ANY>, 'ephemeral_5m_input_tokens': <ANY>, ...}}
  Full diff:
    {
  -  '_dd': <ANY>,
  -  'duration': <ANY>,
  +  '_dd': {'apm_trace_id': '69b319bb000000009eafebae9303f2ac',
  +          'span_id': '2820139080436260687',
...
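A minimal sketch of why these assertions fail, assuming the test compares the span's metrics against an expected dict whose values are unittest.mock.ANY matchers (the <ANY> repr in the output is consistent with mock.ANY, but the test source is not shown here). Both dicts are simplified to the keys visible in the truncated output, every expected value is modeled as ANY, and missing_keys is a hypothetical helper.

```python
from unittest import mock

# Simplified, hypothetical subset of the metrics the tracer reported
# (keys and values as visible in the CI output; other keys were elided).
actual_metrics = {
    "cache_read_input_tokens": 0,
    "cache_write_input_tokens": 0,
    "input_tokens": 28,
    "output_tokens": 100,
}

# Simplified expected metrics: every value is mock.ANY, which compares equal
# to anything, so only the key sets decide the result. The two ephemeral_*
# keys are the new cache-creation TTL breakdown the updated test requires.
expected_metrics = {
    "cache_read_input_tokens": mock.ANY,
    "cache_write_input_tokens": mock.ANY,
    "ephemeral_1h_input_tokens": mock.ANY,
    "ephemeral_5m_input_tokens": mock.ANY,
    "input_tokens": mock.ANY,
    "output_tokens": mock.ANY,
}


def missing_keys(actual, expected):
    """Hypothetical helper: expected keys that the actual metrics lack."""
    return sorted(set(expected) - set(actual))


# Dict equality requires identical key sets, so ANY cannot hide absent keys.
print(actual_metrics == expected_metrics)  # False
print(missing_keys(actual_metrics, expected_metrics))
# ['ephemeral_1h_input_tokens', 'ephemeral_5m_input_tokens']

# Hypothetical: once the integration reports the TTL breakdown (the values
# here are placeholders), the same comparison succeeds.
updated_metrics = {
    **actual_metrics,
    "ephemeral_1h_input_tokens": 0,
    "ephemeral_5m_input_tokens": 0,
}
assert updated_metrics == expected_metrics
```

Because mock.ANY relaxes only the value check, the failures suggest the tracer does not yet emit the TTL breakdown keys at all, rather than emitting wrong values for them.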

ℹ️ Info

No other issues found

❄️ No new flaky tests detected

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: a33f54b

@Yun-Kim marked this pull request as draft March 12, 2026 21:36

Labels

None yet

Projects

None yet

4 participants