
perf(metrics): partition ValueMap into per-thread shards to reduce lock contention#3473

Open
souyang-dev wants to merge 4 commits into open-telemetry:main from souyang-dev:metrics-lock-rebase

Conversation


@souyang-dev souyang-dev commented Apr 28, 2026

The metrics ValueMap is protected by a RwLock, which can be a source of contention under high concurrency. Updating a histogram is even more expensive, because a mutex lock must be acquired after the read lock before a bucket can be updated.
A previous attempt to shard the ValueMap (#1564) partitioned metrics updates by attribute set, so updates to the same attribute set still landed on the same shard; it was therefore not effective at reducing lock contention when most updates target a small set of attributes.

Fixes #
This PR partitions the single HashMap into shards. Each thread updates a different shard, selected by thread id, to minimize lock contention. If the number of shards is >= the number of CPUs, there is effectively no lock contention when updating metrics.
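The per-thread shard selection can be sketched as follows; `NUM_SHARDS`, `NEXT_SHARD`, and `shard_index` are illustrative names, not the PR's actual identifiers:

```rust
use std::cell::Cell;
use std::sync::atomic::{AtomicUsize, Ordering};

// Illustrative shard count; the PR sizes this relative to CPU count.
const NUM_SHARDS: usize = 16;

// Hands out shard indices round-robin as threads first touch the map.
static NEXT_SHARD: AtomicUsize = AtomicUsize::new(0);

thread_local! {
    // Each thread caches its shard index so the choice is made once.
    static MY_SHARD: Cell<Option<usize>> = Cell::new(None);
}

fn shard_index() -> usize {
    MY_SHARD.with(|slot| match slot.get() {
        Some(i) => i,
        None => {
            let i = NEXT_SHARD.fetch_add(1, Ordering::Relaxed) % NUM_SHARDS;
            slot.set(Some(i));
            i
        }
    })
}
```

With at least as many shards as active threads, every thread lands on its own shard and lock acquisitions from different threads never collide.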

Here are benchmark results (metrics_histogram, metrics) on AMD EPYC 9555 CPUs (255 CPU threads).
The numbers in the tables are updates per second, in millions. "shards=16" is this PR with 16 shards; "shards=32" is this PR with 32 shards.

Counter update throughput (cargo run --release --package stress --bin metrics -- N)

| number of threads | 1  | 2   | 4   | 8  | 16  |
|-------------------|----|-----|-----|----|-----|
| upstream          | 10 | 5.7 | 6.8 | 6  | 5.5 |
| shards=16         | 10 | 22  | 32  | 40 | 43  |
| shards=32         | 10 | 22  | 31  | 48 | 100 |

Histogram update throughput (cargo run --release --package stress --bin metrics_histogram -- N)

| number of threads | 1   | 2   | 4   | 8  | 16  |
|-------------------|-----|-----|-----|----|-----|
| upstream          | 9.5 | 4.8 | 5.2 | 6  | 5.5 |
| shards=16         | 9.5 | 19  | 28  | 40 | 42  |
| shards=32         | 9.5 | 19  | 28  | 45 | 84  |

Changes

Two changes are made:
1. Partition ValueMap into shards and let different threads update different shards to minimize lock contention.
2. Implement a merge function for aggregators, so the collector can merge the shards.
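The merge step in point 2 can be sketched for a sum aggregator; the `merge_shards` helper and the `String`-keyed map are simplifications of the real attribute-set keys, not code from the PR:

```rust
use std::collections::HashMap;

// At collection time the collector folds every shard's
// attribute-set -> sum map into one map before export.
// Sums are additive, so merging is plain addition per key.
fn merge_shards(shards: Vec<HashMap<String, u64>>) -> HashMap<String, u64> {
    let mut merged = HashMap::new();
    for shard in shards {
        for (attrs, sum) in shard {
            *merged.entry(attrs).or_insert(0) += sum;
        }
    }
    merged
}
```

Other aggregations need their own combine rule (min/max for gauges, bucket-wise addition for histograms), which is why each aggregator implements its own merge.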

Merge requirement checklist

  • [x] CONTRIBUTING guidelines followed
  • [x] Unit tests added/updated (if applicable)
  • Appropriate CHANGELOG.md files updated for non-trivial, user-facing changes
  • Changes in public API reviewed (if applicable)

@souyang-dev souyang-dev requested a review from a team as a code owner April 28, 2026 21:22

linux-foundation-easycla Bot commented Apr 28, 2026

CLA Signed

The committers listed above are authorized under a signed CLA.


codecov Bot commented Apr 28, 2026

Codecov Report

❌ Patch coverage is 93.90756% with 29 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.9%. Comparing base (c992379) to head (ff20752).

Files with missing lines Patch % Lines
...-sdk/src/metrics/internal/exponential_histogram.rs 92.5% 19 Missing ⚠️
opentelemetry-sdk/src/metrics/internal/mod.rs 94.9% 10 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##            main   #3473     +/-   ##
=======================================
+ Coverage   83.7%   83.9%   +0.1%     
=======================================
  Files        126     126             
  Lines      25386   25784    +398     
=======================================
+ Hits       21255   21635    +380     
- Misses      4131    4149     +18     

☔ View full report in Codecov by Sentry.

@souyang-dev souyang-dev force-pushed the metrics-lock-rebase branch 2 times, most recently from 30be6f1 to be5ab0b Compare April 28, 2026 22:02
@souyang-dev souyang-dev changed the title Partition ValueMap into per-thread shards to increase metrics update … perf(metrics): partition ValueMap into per-thread shards to reduce lock contention Apr 28, 2026
@souyang-dev souyang-dev force-pushed the metrics-lock-rebase branch 4 times, most recently from 7ab756e to 072ba19 Compare April 28, 2026 22:37
@souyang-dev souyang-dev force-pushed the metrics-lock-rebase branch 3 times, most recently from f7aa92f to 6af0215 Compare April 29, 2026 01:28
@scottgerring scottgerring reopened this Apr 30, 2026
count: AtomicUsize::new(0),
config,
cardinality_limit,
all_shards_cardinality_limit: per_shard_capacity.saturating_mul(n),
scottgerring (Member) commented:
Pinging @cijothomas because metrics is really his area :) !

I'm not sure about this part but it might be a gap in my mental model. The metrics SDK spec defines aggregation_cardinality_limit as "the maximum number of data points allowed to be emitted in a collection cycle by a single instrument" and says that, for a given metric, the cardinality limit is a "hard limit" on the number of metric points collected during a collection cycle:

https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/sdk.md#cardinality-limits

It looks like with this change, the effective limit becomes roughly configured_limit * num_shards where num_shards is the smaller of CPU count or 64, so a configured limit of 2000 can produce many more than 2000 points.

I think sharding needs to remain an internal implementation detail and preserve the configured stream cardinality limit. The user-visible impact is that a limit chosen to cap SDK memory usage and downstream time-series cardinality would no longer provide that cap; the same configuration could produce very different numbers of exported series depending on CPU count.
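The multiplication the comment describes can be made concrete (illustrative arithmetic; `worst_case_series` is not a function in the PR):

```rust
// If each of `num_shards` shards independently admits up to the
// configured limit of attribute sets, the worst-case number of
// exported series is the product of the two.
fn worst_case_series(configured_limit: usize, num_shards: usize) -> usize {
    configured_limit.saturating_mul(num_shards)
}
```

So a configured limit of 2000 on a 64-shard map could, in the worst case, emit up to 128,000 series in one collection cycle.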

Member commented:
Previous sharding attempts were also blocked on this same problem. It is non-trivial to solve with a sharding approach, unfortunately. We'd need to solve the contention issue differently, in a way where the user consciously opts in to this model. (I'll write up what I have in mind soon.)

souyang-dev (Author) replied:

Thank you @scottgerring for the comments.
I have updated the patch to enforce the cardinality limit across all shards.
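One way to keep the limit global while retaining shards is a shared atomic admission counter; this is a hedged sketch of the idea, not necessarily how the updated patch implements it:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// A single counter shared by all shards; a shard may only insert a
// new attribute set after reserving a slot here, so the configured
// limit holds across the whole instrument, not per shard.
struct SharedLimit {
    count: AtomicUsize,
    limit: usize,
}

impl SharedLimit {
    fn try_admit(&self) -> bool {
        // fetch_update retries on contention; returns Err when the
        // closure yields None, i.e. the limit has been reached.
        self.count
            .fetch_update(Ordering::Relaxed, Ordering::Relaxed, |c| {
                if c < self.limit { Some(c + 1) } else { None }
            })
            .is_ok()
    }
}
```

Updates to already-admitted attribute sets never touch the counter, so the hot path stays contention-free; only the first observation of a new series pays for the shared atomic.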

@scottgerring scottgerring requested a review from cijothomas April 30, 2026 14:55
@souyang-dev souyang-dev force-pushed the metrics-lock-rebase branch 3 times, most recently from 46111fc to f85e882 Compare May 1, 2026 00:36
@souyang-dev souyang-dev force-pushed the metrics-lock-rebase branch from afaf92e to f787b10 Compare May 1, 2026 04:57
@souyang-dev souyang-dev requested a review from cijothomas May 1, 2026 05:14
@souyang-dev souyang-dev force-pushed the metrics-lock-rebase branch from f787b10 to 9640833 Compare May 4, 2026 05:48
…ck contention

Two changes are made:
1. Partition ValueMap into shards and let different threads update different shards to minimize lock contention.
2. Implement a merge function for aggregators, so the collector can merge the shards.
Comply with the OTel spec: aggregation_cardinality_limit is "the maximum number of data points allowed to be emitted in a collection cycle by a single instrument".
Use only 1 shard if the aggregator needs to preserve the order of updates, so all updates are serialized.
@souyang-dev souyang-dev force-pushed the metrics-lock-rebase branch from 9640833 to 901e587 Compare May 4, 2026 20:28
@souyang-dev souyang-dev requested a review from lalitb May 4, 2026 20:29
1. Hold a shard write lock to protect the flag swap and delta collection.

2. Downscale both the src and dest histograms to the same scale before merging.
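The downscale step in point 2 follows the exponential-histogram rule that lowering the scale by `d` merges `2^d` adjacent buckets, which is an arithmetic right shift of each bucket index (a sketch; `downscale_index` is an illustrative name, not the PR's code):

```rust
// Maps a bucket index at a higher scale to its index at a scale
// lowered by `scale_delta`. The arithmetic right shift is a floor
// division by 2^scale_delta and is correct for negative indices too.
fn downscale_index(index: i32, scale_delta: u32) -> i32 {
    index >> scale_delta
}
```

Both histograms are first brought to the smaller of the two scales; bucket counts whose indices map to the same downscaled index are then summed.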

5 participants