Adds six benchmark groups to measure the existing SimpleMapStore's algorithm complexity before the inverted-index replacement in PR #175:

- insert/batch_size: O(B) insert scaling across 10–5000 items
- insert/num_agg_ids: lock overhead across 1–200 aggregation IDs
- query/range_store_size: O(W·log W + k) range query across 100–5000 windows
- query/exact_store_size: O(1) HashMap lookup verified across store sizes
- store_analyze/num_agg_ids: O(A) earliest-timestamp scan across 10–1000 IDs
- concurrent_reads/thread_count: write-lock serialisation with 1–8 threads

Both Global and PerKey lock strategies are profiled in each group. Results land in target/criterion/ as HTML reports.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Apply rustfmt to simple_store_bench.rs (alignment, closure brace style)
- Add dummy benches/simple_store_bench.rs stub to Dockerfile dep-cache layer so cargo can parse the [[bench]] manifest entry during docker build

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Increase parameter ranges (batch: 100→50k, windows: 500→50k, agg IDs: 1→1k/5k, threads: 1→16)
- Add DatasketchesKLLAccumulator k=200 variant to insert and range query benchmarks to expose realistic sketch clone cost
- KLL results show ~10x overhead on range queries vs trivial SumAccumulator

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…_bench

Same fix as asap-query-engine/Dockerfile — the workspace Cargo.toml for asap-query-engine declares [[bench]] simple_store_bench, so Cargo requires the file to exist even during dependency-only builds.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
milindsrivastava1997 approved these changes on Mar 20, 2026.
Summary
Adds a Criterion benchmark suite to profile the existing `SimpleMapStore` before the replacement in #175. Both `PerKey` and `Global` lock strategies are covered, and both `SumAccumulator` (trivial baseline) and `DatasketchesKLLAccumulator` k=200 (realistic sketch) are benchmarked to expose clone cost during queries.

Algorithm complexity
| Operation | Complexity |
| --- | --- |
| `insert_precomputed_output_batch` | O(B) |
| `query_precomputed_output` (range) | O(W·log W + k) |
| `query_precomputed_output_exact` | O(1) |
| `get_earliest_timestamp` (analyze) | O(A) |

Space: O(A · W · L · sketch_size) — sketches are heap-allocated per entry, no sharing across windows.
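The O(W·log W + k) range path can be sketched with a simplified stand-in for the per-agg-ID window map (all names here are hypothetical, not the store's real API; the real query also deep-clones each accumulator, which is where the KLL cost appears):

```rust
use std::collections::HashMap;

// Hypothetical, simplified stand-in for the per-agg-ID window map:
// window start timestamp -> accumulated value.
fn query_range(windows: &HashMap<u64, f64>, start: u64, end: u64) -> Vec<(u64, f64)> {
    // O(W): every window key must be scanned to find those in range.
    let mut keys: Vec<u64> = windows
        .keys()
        .copied()
        .filter(|&w| w >= start && w < end)
        .collect();
    // O(W·log W): HashMap keys carry no order, so matches must be sorted.
    keys.sort_unstable();
    // O(k): copy the k matching entries out; with real sketch
    // accumulators this step is a deep clone and dominates the cost.
    keys.into_iter().map(|w| (w, windows[&w])).collect()
}

fn main() {
    let mut windows = HashMap::new();
    for w in 0..100u64 {
        windows.insert(w * 10, w as f64);
    }
    let hits = query_range(&windows, 100, 150);
    assert_eq!(hits, vec![(100, 10.0), (110, 11.0), (120, 12.0), (130, 13.0), (140, 14.0)]);
    println!("{} windows in range", hits.len());
}
```

An ordered structure (e.g. a BTreeMap keyed by window start) would avoid the per-query sort, which is part of what the inverted-index replacement in #175 is meant to address.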
Concurrency: both strategies serialize readers — `query_precomputed_output` takes a write lock to update `read_counts`, so concurrent reads degrade linearly with thread count.

Benchmarks
| Group | Accumulators |
| --- | --- |
| `insert/batch_size` | sum, kll |
| `insert/num_agg_ids` | sum |
| `query/range_store_size` | sum, kll |
| `query/exact_store_size` | sum |
| `store_analyze/num_agg_ids` | sum |
| `concurrent_reads/thread_count` | sum |

Accumulator types:
- `sum` — `SumAccumulator` (single `f64`, ~0 clone cost, lock/sort baseline)
- `kll` — `DatasketchesKLLAccumulator` k=200 (sketch populated with 20 values, realistic clone cost)

How to run
cargo bench -p query_engine_rust --bench simple_store_bench    # HTML reports → target/criterion/

Benchmark results
Measured on this machine (Linux, optimized build). All times are median of 100 samples.
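The ~10× `kll` overhead reported below comes from cloning, not from the lookup itself. The shape of that cost can be illustrated with a toy trait (these types are illustrative stand-ins, not the crate's real accumulator API): a sum clones one `f64`, while a sketch-like accumulator clones its whole internal buffer on every query.

```rust
// Toy accumulator trait mirroring the clone-on-query pattern.
trait Accumulator {
    fn update(&mut self, v: f64);
    fn clone_boxed(&self) -> Box<dyn Accumulator>;
    fn estimate(&self) -> f64;
}

// `sum`-style: a single f64, so clone_boxed is effectively free.
struct SumAcc(f64);

impl Accumulator for SumAcc {
    fn update(&mut self, v: f64) { self.0 += v; }
    fn clone_boxed(&self) -> Box<dyn Accumulator> { Box::new(SumAcc(self.0)) }
    fn estimate(&self) -> f64 { self.0 }
}

// Sketch-style toy: retains a buffer of samples the way a KLL sketch
// retains compactor levels; clone_boxed copies the whole buffer, O(buffer).
struct ToySketchAcc { samples: Vec<f64> }

impl Accumulator for ToySketchAcc {
    fn update(&mut self, v: f64) { self.samples.push(v); }
    fn clone_boxed(&self) -> Box<dyn Accumulator> {
        Box::new(ToySketchAcc { samples: self.samples.clone() })
    }
    fn estimate(&self) -> f64 {
        self.samples.iter().sum::<f64>() / self.samples.len().max(1) as f64
    }
}

fn main() {
    let mut s = SumAcc(0.0);
    let mut k = ToySketchAcc { samples: Vec::new() };
    for v in 1..=20 {
        s.update(v as f64);
        k.update(v as f64);
    }
    // Each range-query hit pays one clone_boxed per matching window.
    assert_eq!(s.clone_boxed().estimate(), 210.0);
    assert_eq!(k.clone_boxed().estimate(), 10.5);
}
```

A range query returning k windows pays k of these clones, which is why the overhead scales with result size rather than store size.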
insert/batch_size — O(B), throughput roughly constant
`kll` stays close to `sum` at small batches, growing to ~2.5× at 50 000 items due to sketch construction cost (20 updates per sketch).

insert/num_agg_ids — fixed 1 000-item batch, spread across N agg IDs
Per-key latency grows 2.6× from 1 → 1 000 agg IDs; global stays nearly flat (~1.6×). At 1 000 agg IDs global is 2.3× faster — each agg ID in per-key hits a separate DashMap shard and acquires a separate RwLock.
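The two locking layouts can be contrasted with stdlib types (a plain sharded `Vec<RwLock<…>>` standing in for DashMap's internal sharding; every name here is a hypothetical simplification). Note that in both layouts the query path takes a *write* lock so it can bump `read_counts` — the detail behind the serialized readers seen in the concurrent_reads group:

```rust
use std::collections::HashMap;
use std::sync::RwLock;

struct Entry {
    value: f64,
    read_count: u64,
}

// "Global" strategy: one RwLock guarding every agg ID.
struct GlobalStore {
    inner: RwLock<HashMap<String, Entry>>,
}

impl GlobalStore {
    fn query(&self, agg_id: &str) -> Option<f64> {
        // A write lock on a read path: read_count must be bumped,
        // so concurrent "readers" serialize behind each other.
        let mut map = self.inner.write().unwrap();
        map.get_mut(agg_id).map(|e| { e.read_count += 1; e.value })
    }
}

// "Per-key" strategy: fixed shards, each with its own RwLock
// (a stdlib stand-in for DashMap's sharding).
struct ShardedStore {
    shards: Vec<RwLock<HashMap<String, Entry>>>,
}

impl ShardedStore {
    fn new(n: usize) -> Self {
        Self { shards: (0..n).map(|_| RwLock::new(HashMap::new())).collect() }
    }

    fn shard(&self, agg_id: &str) -> &RwLock<HashMap<String, Entry>> {
        // Toy hash -> shard index; a batch spread over many agg IDs
        // touches many shards and acquires many separate locks.
        let h: usize = agg_id.bytes().map(|b| b as usize).sum();
        &self.shards[h % self.shards.len()]
    }

    fn insert(&self, agg_id: &str, value: f64) {
        let mut map = self.shard(agg_id).write().unwrap();
        map.insert(agg_id.to_string(), Entry { value, read_count: 0 });
    }

    fn query(&self, agg_id: &str) -> Option<f64> {
        // Same problem per shard: bumping read_count needs a write lock.
        let mut map = self.shard(agg_id).write().unwrap();
        map.get_mut(agg_id).map(|e| { e.read_count += 1; e.value })
    }
}

fn main() {
    let global = GlobalStore { inner: RwLock::new(HashMap::new()) };
    global.inner.write().unwrap().insert("agg-1".to_string(), Entry { value: 1.0, read_count: 0 });
    assert_eq!(global.query("agg-1"), Some(1.0));

    let sharded = ShardedStore::new(8);
    sharded.insert("agg-1", 42.0);
    assert_eq!(sharded.query("agg-1"), Some(42.0));
    assert_eq!(sharded.query("missing"), None);
}
```

Sharding helps only when distinct keys land on distinct shards and the critical section is read-mostly; since the query path here mutates state, neither layout allows parallel reads.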
query/range_store_size — O(W·log W + k), KLL clone cost dominates
The `sum` baseline confirms O(W·log W) sorting behaviour. KLL adds a consistent ~10× overhead — `clone_boxed_core()` on a k=200 sketch dominates the query cost at every scale. With real sketches, range queries scale far worse than the sort alone suggests. Lock strategy makes no practical difference for range queries at any scale.

query/exact_store_size — O(1) confirmed, flat regardless of store size
Flat ~290–305 ns across 500 → 50 000 windows. O(1) HashMap lookup confirmed. Lock strategy irrelevant.
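The flat profile follows from the exact path being a single keyed lookup (sketched here with a hypothetical simplified signature): one hash plus one copy, independent of how many windows the store holds.

```rust
use std::collections::HashMap;

// Exact query: one hash lookup + one copy out, O(1) in window count.
// Key is a hypothetical (agg_id, window_start) composite.
fn query_exact(windows: &HashMap<(String, u64), f64>, agg_id: &str, window: u64) -> Option<f64> {
    windows.get(&(agg_id.to_string(), window)).copied()
}

fn main() {
    let mut windows = HashMap::new();
    windows.insert(("agg-1".to_string(), 100), 3.5);
    assert_eq!(query_exact(&windows, "agg-1", 100), Some(3.5));
    assert_eq!(query_exact(&windows, "agg-1", 200), None);
}
```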
store_analyze/num_agg_ids — global is 58–130× faster
Global maintains a pre-built `HashMap` that is cloned under one Mutex in nanoseconds. PerKey must iterate all DashMap shards with per-shard locking — O(A) with a high constant. The gap remains 2–3 orders of magnitude across all scales.

concurrent_reads/thread_count — both strategies serialize (5 000 windows)
Both strategies degrade linearly with thread count (~2× per doubling, confirming full serialisation). The write lock taken during `query_precomputed_output` (to update `read_counts`) eliminates any concurrency benefit from either design. Global and per-key are indistinguishable at scale.

Test plan
`cargo build --benches` passes with no new warnings.

🤖 Generated with Claude Code