Rename SimpleMapStore impls to Legacy* (PR #175 split, part a)#220
Merged
milindsrivastava1997 merged 1 commit intosimple_store_optfrom Mar 23, 2026
Merged
Conversation
Move `SimpleMapStoreGlobal` → `LegacySimpleMapStoreGlobal` and `SimpleMapStorePerKey` → `LegacySimpleMapStorePerKey` under a new `simple_map_store/legacy/` submodule, in preparation for introducing optimised replacements that will reclaim the original names (PR #175 part b). - `legacy/global.rs` / `legacy/per_key.rs`: original implementations, renamed with the `Legacy` prefix throughout (struct, impl, log messages) - `legacy/mod.rs`: re-exports both legacy types - `simple_map_store/mod.rs`: references legacy module; `SimpleMapStore` enum now wraps `LegacySimpleMapStoreGlobal` / `LegacySimpleMapStorePerKey` - `benches/simple_store_bench.rs`: doc comment updated to reflect that the bench profiles the legacy store implementation Public API (`SimpleMapStore`, `Store`) is unchanged. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
milindsrivastava1997
approved these changes
Mar 23, 2026
zzylol
added a commit
that referenced
this pull request
Mar 24, 2026
#221) * Rename SimpleMapStore impls to Legacy* and move to legacy/ submodule (#220) Move `SimpleMapStoreGlobal` → `LegacySimpleMapStoreGlobal` and `SimpleMapStorePerKey` → `LegacySimpleMapStorePerKey` under a new `simple_map_store/legacy/` submodule, in preparation for introducing optimised replacements that will reclaim the original names (PR #175 part b). - `legacy/global.rs` / `legacy/per_key.rs`: original implementations, renamed with the `Legacy` prefix throughout (struct, impl, log messages) - `legacy/mod.rs`: re-exports both legacy types - `simple_map_store/mod.rs`: references legacy module; `SimpleMapStore` enum now wraps `LegacySimpleMapStoreGlobal` / `LegacySimpleMapStorePerKey` - `benches/simple_store_bench.rs`: doc comment updated to reflect that the bench profiles the legacy store implementation Public API (`SimpleMapStore`, `Store`) is unchanged. Co-authored-by: zz_y <zz_y@node0.zz-y-296227.softmeasure-pg0.wisc.cloudlab.us> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> * Fix cargo fmt formatting in simple_map_store/mod.rs Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: zz_y <zz_y@node0.zz-y-296227.softmeasure-pg0.wisc.cloudlab.us> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
zzylol
added a commit
that referenced
this pull request
Mar 24, 2026
…220) Move `SimpleMapStoreGlobal` → `LegacySimpleMapStoreGlobal` and `SimpleMapStorePerKey` → `LegacySimpleMapStorePerKey` under a new `simple_map_store/legacy/` submodule, in preparation for introducing optimised replacements that will reclaim the original names (PR #175 part b). - `legacy/global.rs` / `legacy/per_key.rs`: original implementations, renamed with the `Legacy` prefix throughout (struct, impl, log messages) - `legacy/mod.rs`: re-exports both legacy types - `simple_map_store/mod.rs`: references legacy module; `SimpleMapStore` enum now wraps `LegacySimpleMapStoreGlobal` / `LegacySimpleMapStorePerKey` - `benches/simple_store_bench.rs`: doc comment updated to reflect that the bench profiles the legacy store implementation Public API (`SimpleMapStore`, `Store`) is unchanged. Co-authored-by: zz_y <zz_y@node0.zz-y-296227.softmeasure-pg0.wisc.cloudlab.us> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
zzylol
added a commit
that referenced
this pull request
Mar 24, 2026
…220) Move `SimpleMapStoreGlobal` → `LegacySimpleMapStoreGlobal` and `SimpleMapStorePerKey` → `LegacySimpleMapStorePerKey` under a new `simple_map_store/legacy/` submodule, in preparation for introducing optimised replacements that will reclaim the original names (PR #175 part b). - `legacy/global.rs` / `legacy/per_key.rs`: original implementations, renamed with the `Legacy` prefix throughout (struct, impl, log messages) - `legacy/mod.rs`: re-exports both legacy types - `simple_map_store/mod.rs`: references legacy module; `SimpleMapStore` enum now wraps `LegacySimpleMapStoreGlobal` / `LegacySimpleMapStorePerKey` - `benches/simple_store_bench.rs`: doc comment updated to reflect that the bench profiles the legacy store implementation Public API (`SimpleMapStore`, `Store`) is unchanged. Co-authored-by: zz_y <zz_y@node0.zz-y-296227.softmeasure-pg0.wisc.cloudlab.us> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
milindsrivastava1997
pushed a commit
that referenced
this pull request
Mar 27, 2026
* Rename SimpleMapStore impls to Legacy* and move to legacy/ submodule (#220) Move `SimpleMapStoreGlobal` → `LegacySimpleMapStoreGlobal` and `SimpleMapStorePerKey` → `LegacySimpleMapStorePerKey` under a new `simple_map_store/legacy/` submodule, in preparation for introducing optimised replacements that will reclaim the original names (PR #175 part b). - `legacy/global.rs` / `legacy/per_key.rs`: original implementations, renamed with the `Legacy` prefix throughout (struct, impl, log messages) - `legacy/mod.rs`: re-exports both legacy types - `simple_map_store/mod.rs`: references legacy module; `SimpleMapStore` enum now wraps `LegacySimpleMapStoreGlobal` / `LegacySimpleMapStorePerKey` - `benches/simple_store_bench.rs`: doc comment updated to reflect that the bench profiles the legacy store implementation Public API (`SimpleMapStore`, `Store`) is unchanged. Co-authored-by: zz_y <zz_y@node0.zz-y-296227.softmeasure-pg0.wisc.cloudlab.us> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> * Fix cargo fmt formatting in simple_map_store/mod.rs Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Add Store correctness contract test suite Defines a run_contract_suite() function that tests every observable behaviour of a Store implementation: - Empty-store edge cases (range query, exact query, earliest timestamp) - Single insert: range query hit/miss, exact query hit/miss (wrong start, wrong end) - Batch insert: count correctness, chronological ordering guaranteed - Partial range filtering (windows outside query range excluded) - Aggregation-ID isolation (inserts into agg 1 not visible to agg 2) - Earliest-timestamp tracking: global minimum, per agg-ID - Cleanup — CircularBuffer: oldest window evicted, newest 8 retained - Cleanup — ReadBased: evicted after threshold reads, unread window kept - Concurrency: 8-thread concurrent inserts (no data loss), 8-thread concurrent reads (each returns full result set) Two test entry points exercise both existing implementations: contract_per_key — LockStrategy::PerKey (reference) contract_global — LockStrategy::Global Adding a new Store implementation requires only a new #[test] function that calls run_contract_suite() with the new factory. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Extend store contract tests: all accumulator types, keyed entries, DeltaSet exclusion - Add SerializableToSink import so clone-fidelity tests compile on concrete types - Clone fidelity tests for all 11 accumulator types: SumAccumulator, MinMaxAccumulator, DatasketchesKLLAccumulator, IncreaseAccumulator, MultipleSumAccumulator, MultipleMinMaxAccumulator, SetAggregatorAccumulator, DeltaSetAggregatorAccumulator, CountMinSketchAccumulator, CountMinSketchWithHeapAccumulator, HydraKllSketchAccumulator - Three keyed-entry tests: grouping by key, coexistence of keyed/unkeyed, multiple keys per window - DeltaSetAggregator cleanup exclusion test - Concurrency tests: concurrent inserts (8 threads) and concurrent reads Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Fix cargo fmt violations in store_correctness_tests Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Remove unused assert_clone_fidelity function (clippy dead_code) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Fix extra blank line (cargo fmt) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Replace SimpleStore with inverted index (label -> BTreeMap<Time>) * Expand INDEX_DESIGN.md with full theoretical complexity analysis Add separate time and space complexity tables with named variables (A, L, N, k, m, V), per-operation breakdowns, and a note on Arc-sharing eliminating deep copies on the read path. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Apply three VictoriaMetrics-inspired optimizations to SimpleMapStore 1. Label interning (MetricID: u32) Introduce InternTable in common.rs that assigns a compact u32 to each unique Option<KeyByLabelValues> on first insert. All internal maps (label_map, window_to_ids) use MetricID as key — O(1) hash/compare instead of O(label_bytes). Label strings stored once; resolved back to KeyByLabelValues only when building the returned TimestampedBucketsMap. 2. Time-epoch partitioning + O(1) rotation cleanup Replace the single BTreeMap-per-label with epoch-partitioned storage: BTreeMap<EpochID, EpochData>. epoch_capacity = num_aggregates_to_retain (set on first insert). When the current epoch reaches capacity, a new epoch is opened and the oldest is dropped — O(1) drop vs the previous O(k·m) BTreeSet walk + targeted BTreeMap removals. ReadBased cleanup scans read_counts then calls EpochData::remove_windows on each epoch. 3. Sorted Vec posting lists Replace HashSet<Option<KeyByLabelValues>> in window_to_labels with Vec<MetricID> maintained in sorted order via partition_point + insert. Cache-friendly iteration for exact queries; enables merge-intersection for future label-predicate pushdown. Both per_key.rs and global.rs updated. Shared types extracted to common.rs. Public Store trait interface and all 329 existing tests are unchanged. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Fix Dockerfile bench stub and apply cargo fmt - Add dummy benches/simple_map_store_benchmark.rs in Dockerfile dep-caching layer so cargo can parse the manifest (bench entry in Cargo.toml) - Run cargo fmt to fix import ordering and line-wrapping in common.rs, global.rs, and per_key.rs Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Add legacy store for benchmark comparison - Save old time-primary SimpleMapStorePerKey as LegacySimpleMapStorePerKey (deprecated, kept only for benchmarking) - Expose it as pub mod per_key_legacy in the store module - Add old_vs_new/* benchmark group comparing legacy vs current on insert, range_query, exact_query, concurrent_reads, and scaling Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Fix clippy errors: type_complexity, unused_mut, collapsible_if, deprecated - Add MetricBucketMap type alias in common.rs to fix type_complexity - Use MetricBucketMap in global.rs and per_key.rs - Remove unused `mut` from `results` in global.rs - Collapse nested `if` into single condition in per_key.rs - Add #[allow(deprecated)] to impl blocks in per_key_legacy.rs - Add #![allow(deprecated)] to benchmark file for legacy store usage Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Three further index optimizations: time-primary scan, flat sealed epochs, O(1) flat storage 1. Time-primary range scan (common.rs MutableEpoch): - window_to_ids: HashMap → BTreeMap, enabling O(log N + actual_matches) range scan - Range queries no longer visit every label; only labels with data in matched windows - For high-cardinality sparse data: O(L·log N) → O(log N + actual_matches) 2. Immutable sealed epochs (common.rs SealedEpoch): - On epoch rotation, MutableEpoch is sealed into a flat sorted Vec<(TimestampRange, MetricID, Arc<Agg>)> - Sorted by (TimestampRange, MetricID): windows contiguous, cache-friendly linear scan - Range queries use partition_point (binary search) then linear scan — no pointer chasing - min_tr/max_tr precomputed for O(1) epoch-skip check 3. Flat primary storage (MutableEpoch): - label_map: HashMap<MetricID, BTreeMap<TimestampRange, Vec<Arc<Agg>>>> replaced by data: HashMap<(MetricID, TimestampRange), Vec<Arc<Agg>>> - O(1) insert and point lookup; no nested BTreeMap traversal per_key.rs / global.rs updated: - StoreKeyData/PerKeyState now hold current_epoch: MutableEpoch + sealed_epochs: BTreeMap<EpochID, SealedEpoch> - maybe_rotate_epoch calls MutableEpoch::seal() and inserts into sealed_epochs - cleanup_read_based updated for current + sealed epoch split Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Make MutableEpoch insert O(1) amortized: append-only raw buffer MutableEpoch is now a plain append-only Vec (no BTreeMap, no sorted Vec maintenance during writes). All sorting is deferred to seal() — paid once at epoch rotation, not on every insert. Mirrors VictoriaMetrics' rawRows → in-memory part pipeline. Insert path: Vec::push + HashSet::insert + 2 scalar min/max updates = O(1). seal() path: sort_unstable_by_key = O(M log M), called once per epoch. Query on active epoch: linear scan O(M), bounded by epoch_capacity × L. Acceptable since sealed epochs hold most historical data and use binary search. Replace min_tr/max_tr (TimestampRange) with min_start/max_end (u64) in both MutableEpoch and SealedEpoch for accurate epoch-skip bounds. Callers updated to use time_bounds() -> Option<(u64, u64)>. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Add window_to_ids exact-lookup index to MutableEpoch: O(M) → O(m) MutableEpoch now maintains a secondary index: window_to_ids: HashMap<TimestampRange, Vec<(MetricID, Arc<Agg>)>> On insert: Arc::clone into window_to_ids (refcount bump only, no data copy) + Vec::push — O(1) amortized, no change to insert complexity. exact_query: HashMap lookup O(1) + iterate m entries O(m), no raw scan. Previously O(M) linear scan of all raw entries. range_query_into: unchanged linear scan of raw (O(M), bounded). seal(): unchanged — window_to_ids dropped, raw is sorted in-place. remove_windows: also removes from window_to_ids. Memory: one extra Arc pointer (8 bytes) per inserted entry — cheap. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * update * refactor merge simple store benchmarks * refactor move legacy simple stores into module * refactor: optimize MutableEpoch insert path (columnar storage, lazy index, batch hoisting) - Columnar storage: three parallel arrays (windows_col, metric_ids_col, aggregates_col) instead of row tuples; range scan hot loop touches only windows_col - Lazy window_to_ids index: built on first exact_query after a write batch, invalidated cheaply (= None) on insert — zero index maintenance on hot path - Monotonic ingest fast path: skip windows_set HashSet probe for consecutive same-window inserts (Opt 3) - with_capacity hint: MutableEpoch::with_capacity uses previous epoch len to avoid Vec reallocation during next epoch fill - Batch metadata hoisting in global/per_key: config lookup, metrics.insert, earliest_ts update, items_inserted count moved from per-item to per-group Results: ~2x insert, ~3x range query (Sum), ~27x range query (KLL) vs legacy. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * docs: update INDEX_DESIGN to reflect columnar MutableEpoch/SealedEpoch design Replace the old EpochData/BTreeMap description with the current implementation: - MutableEpoch: columnar storage (3 parallel arrays), lazy window_to_ids index, monotonic ingest fast path, incremental bounds tracking - SealedEpoch: flat sorted Vec with binary-search range/exact queries - Updated complexity table (insert O(1) amortized, range O(log N + k) on sealed) - Updated query mechanics to reflect write-lock for exact query (lazy index build) - Remove asap-query-engine/docs/simple_store_insert_profile.md (profiling scratch doc) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix: match legacy per-window CircularBuffer eviction semantics The epoch-based rotation was dropping an entire epoch (epoch_capacity windows) when the retention limit was exceeded. Legacy evicts exactly (total - retention_limit) windows, which can be fewer than one full epoch. Fix: after sealing, compute total distinct windows across all epochs in O(E) using SealedEpoch::distinct_window_count() (precomputed at seal time, updated in remove_windows). Evict the minimum number of oldest windows from the oldest sealed epoch(s) needed to reach retention_limit, dropping the epoch only when fully emptied. Also adds the eviction check to the non-rotation path so the 9th window (still in the current partial epoch) triggers eviction of the 1st window. Fixes: contract_global and contract_per_key (both now pass). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * style: apply cargo fmt Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix: extract GroupedBatch type alias to satisfy clippy::type_complexity Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix: restore legacy store files to simple_store_opt baseline These files were modified during early development on this branch but those changes were superseded by the new global.rs/per_key.rs stores. Reset to simple_store_opt versions so they don't appear as unintended changes in the PR. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix: update legacy stores to return Arc<dyn AggregateCore> and fix mod ordering Legacy global/per_key stores were returning Box from clone_boxed_core() but TimestampedBucket now expects Arc. Add .into() at the 4 call sites. Also reorder mod.rs to put `mod common` before `pub mod legacy` for cargo fmt. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix: declare global and per_key as modules so common.rs items are reachable Without pub mod global and pub mod per_key in mod.rs, the compiler sees all types in common.rs as dead code and fails under -D warnings. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: zz_y <zz_y@node0.zz-y-296227.softmeasure-pg0.wisc.cloudlab.us> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Split from #175 per reviewer request: this PR contains only the renames and corresponding bench changes.
SimpleMapStoreGlobal→LegacySimpleMapStoreGlobalintosimple_map_store/legacy/global.rsSimpleMapStorePerKey→LegacySimpleMapStorePerKeyintosimple_map_store/legacy/per_key.rssimple_map_store/legacy/mod.rsre-exporting both typessimple_map_store/mod.rs:SimpleMapStoreenum now wraps theLegacy*types via the newlegacysubmodulebenches/simple_store_bench.rsdoc comment to reflect it benchmarks the legacy storePublic API (
SimpleMapStore,Store) is unchanged — all existing callers compile without modification.Part (b) (the new optimised inverted-index implementation) will follow in a separate PR.
Test plan
cargo check -p query_engine_rustpassescargo test -p query_engine_rustpasses (store correctness contract tests from Store correctness contract test suite #212)cargo bench -p query_engine_rust --bench simple_store_benchstill works🤖 Generated with Claude Code