refactor(metrics): replace swap-and-drain delta collection with in-place iteration#3420
Merged
cijothomas merged 6 commits intoopen-telemetry:mainfrom Apr 25, 2026
Conversation
This was referenced Mar 14, 2026
cijothomas
reviewed
Mar 14, 2026
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3420 +/- ##
=====================================
Coverage 83.2% 83.3%
=====================================
Files 128 128
Lines 25086 25177 +91
=====================================
+ Hits 20896 20978 +82
- Misses 4190 4199 +9 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
cijothomas
reviewed
Mar 14, 2026
bryantbiggs
added a commit
to bryantbiggs/opentelemetry-rust
that referenced
this pull request
Mar 14, 2026
Add bound_count field to TrackerEntry to track live bound instrument handles. Entries with bound_count > 0 are never evicted during delta collection, ensuring bound handles always point to a live tracker. Moved from open-telemetry#3420 per review feedback — bound_count belongs with the bound instruments feature, not the delta collection refactor.
cijothomas
reviewed
Mar 14, 2026
c055494 to
0e888e0
Compare
bryantbiggs
added a commit
to bryantbiggs/opentelemetry-rust
that referenced
this pull request
Mar 14, 2026
Add bound_count field to TrackerEntry to track live bound instrument handles. Entries with bound_count > 0 are never evicted during delta collection, ensuring bound handles always point to a live tracker. Moved from open-telemetry#3420 per review feedback — bound_count belongs with the bound instruments feature, not the delta collection refactor.
bryantbiggs
added a commit
to bryantbiggs/opentelemetry-rust
that referenced
this pull request
Mar 15, 2026
Add bound_count field to TrackerEntry to track live bound instrument handles. Entries with bound_count > 0 are never evicted during delta collection, ensuring bound handles always point to a live tracker. Moved from open-telemetry#3420 per review feedback — bound_count belongs with the bound instruments feature, not the delta collection refactor.
cijothomas
reviewed
Mar 15, 2026
0e888e0 to
1f5fd4c
Compare
Contributor
Author
|
CI integration test failure ( |
Contributor
Author
|
Opened #3424 to fix the flaky integration test that showed up in CI here. |
lalitb
reviewed
Mar 23, 2026
lalitb
reviewed
Mar 23, 2026
cijothomas
reviewed
Apr 19, 2026
…ace iteration Replace the two-HashMap swap-and-drain pattern in collect_and_reset with in-place iteration using TrackerEntry status tracking. TrackerEntry wraps each aggregator with has_been_updated (AtomicBool) and bound_count (AtomicUsize) fields. collect_and_reset now: - Iterates under a read lock (no write lock in steady state) - Exports only entries updated since last collection - Evicts stale unbound entries under a write lock with TOCTOU re-check A new drain_and_reset method preserves the old map-clearing behavior for Observable/async instruments that need staleness detection. This eliminates O(n) write-lock acquisitions on the hot path per collection cycle when attribute sets are reused (the common case). Splits from open-telemetry#3392. Refs open-telemetry#2328.
Per review feedback, bound_count belongs in the bound instruments PR since it's not used by the delta collection refactor alone.
Remove redundant has_no_attribute_value field from ValueMap — the no_attribute_tracker.has_been_updated flag now serves the same purpose. Rewrite changelog entry to focus on user-facing behavior: cardinality overflow recovery now requires 2 collect cycles.
… HashMap keys When a measurement arrives with attributes in non-sorted order, measure() inserts two keys into the trackers HashMap for the same logical attribute set (original order + sorted canonical order), both pointing to the same Arc<TrackerEntry>. Adds a targeted test for collect_and_reset stale eviction confirming that both keys are removed, leaving no zombie entries in the map.
485f6e6 to
4b47197
Compare
The no_attribute_tracker correctly used Release/Acquire, but per-attribute trackers used Relaxed. Since concurrent measure() and collect_and_reset() can both hold RwLock read locks (which don't synchronize with each other), the flag is the sole ordering point for aggregator data visibility. Relaxed is insufficient on weakly-ordered architectures (ARM).
bryantbiggs
added a commit
to bryantbiggs/opentelemetry-rust
that referenced
this pull request
Apr 25, 2026
Add bound_count field to TrackerEntry to track live bound instrument handles. Entries with bound_count > 0 are never evicted during delta collection, ensuring bound handles always point to a live tracker. Moved from open-telemetry#3420 per review feedback — bound_count belongs with the bound instruments feature, not the delta collection refactor.
bryantbiggs
added a commit
to bryantbiggs/opentelemetry-rust
that referenced
this pull request
Apr 29, 2026
Add bound_count field to TrackerEntry to track live bound instrument handles. Entries with bound_count > 0 are never evicted during delta collection, ensuring bound handles always point to a live tracker. Moved from open-telemetry#3420 per review feedback — bound_count belongs with the bound instruments feature, not the delta collection refactor.
bryantbiggs
added a commit
to bryantbiggs/opentelemetry-rust
that referenced
this pull request
Apr 30, 2026
Add bound_count field to TrackerEntry to track live bound instrument handles. Entries with bound_count > 0 are never evicted during delta collection, ensuring bound handles always point to a live tracker. Moved from open-telemetry#3420 per review feedback — bound_count belongs with the bound instruments feature, not the delta collection refactor.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Splits from #3392, per maintainer feedback. Refs #2328.
Summary
collect_and_resetwith in-place iteration usingTrackerEntrystatus trackingTrackerEntrywraps each aggregator withhas_been_updated(AtomicBool) to track whether measurements occurred since the last collection cyclehas_no_attribute_value— theno_attribute_tracker.has_been_updatedflag now serves the same purposecollect_and_resetnow iterates under a read lock (no write lock in steady state), exports only updated entries, and evicts stale entries under a write lock with TOCTOU re-checkdrain_and_resetfor Observable/async instruments that need true staleness detection via map clearingUser-facing behavior change
Recovery from cardinality overflow now requires 2 collect cycles instead of 1 — the first cycle marks entries as stale (no updates), the second cycle evicts them. This trade-off enables the in-place iteration pattern that avoids write-lock contention in steady state.
Locking trade-offs (discussed with @cijothomas and @utpilla in #3392)
Test plan
counter_aggregation_overflow_deltatests updated for two-phase eviction semanticscollect_readonly)cargo fmt --all -- --checkpassescargo clippy -p opentelemetry_sdk --features metrics -- -D warningspasses