Skip to content

Rename SimpleMapStore impls to Legacy* (PR #175 split, part a)#220

Merged
milindsrivastava1997 merged 1 commit intosimple_store_optfrom
rename-legacy-simple-map-store
Mar 23, 2026
Merged

Rename SimpleMapStore impls to Legacy* (PR #175 split, part a)#220
milindsrivastava1997 merged 1 commit intosimple_store_optfrom
rename-legacy-simple-map-store

Conversation

@zzylol
Copy link
Copy Markdown
Contributor

@zzylol zzylol commented Mar 23, 2026

Summary

Split from #175 per reviewer request: this PR contains only the renames and corresponding bench changes.

  • Move SimpleMapStoreGlobalLegacySimpleMapStoreGlobal into simple_map_store/legacy/global.rs
  • Move SimpleMapStorePerKeyLegacySimpleMapStorePerKey into simple_map_store/legacy/per_key.rs
  • Add simple_map_store/legacy/mod.rs re-exporting both types
  • Update simple_map_store/mod.rs: SimpleMapStore enum now wraps the Legacy* types via the new legacy submodule
  • Update benches/simple_store_bench.rs doc comment to reflect it benchmarks the legacy store

Public API (SimpleMapStore, Store) is unchanged — all existing callers compile without modification.

Part (b) (the new optimised inverted-index implementation) will follow in a separate PR.

Test plan

  • cargo check -p query_engine_rust passes
  • cargo test -p query_engine_rust passes (store correctness contract tests from Store correctness contract test suite #212)
  • Existing benchmark invocation cargo bench -p query_engine_rust --bench simple_store_bench still works

🤖 Generated with Claude Code

Move `SimpleMapStoreGlobal` → `LegacySimpleMapStoreGlobal` and
`SimpleMapStorePerKey` → `LegacySimpleMapStorePerKey` under a new
`simple_map_store/legacy/` submodule, in preparation for introducing
optimised replacements that will reclaim the original names (PR #175 part b).

- `legacy/global.rs` / `legacy/per_key.rs`: original implementations,
  renamed with the `Legacy` prefix throughout (struct, impl, log messages)
- `legacy/mod.rs`: re-exports both legacy types
- `simple_map_store/mod.rs`: references legacy module; `SimpleMapStore`
  enum now wraps `LegacySimpleMapStoreGlobal` / `LegacySimpleMapStorePerKey`
- `benches/simple_store_bench.rs`: doc comment updated to reflect that
  the bench profiles the legacy store implementation

Public API (`SimpleMapStore`, `Store`) is unchanged.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@milindsrivastava1997 milindsrivastava1997 merged commit c92cc94 into simple_store_opt Mar 23, 2026
@milindsrivastava1997 milindsrivastava1997 deleted the rename-legacy-simple-map-store branch March 23, 2026 22:19
zzylol added a commit that referenced this pull request Mar 24, 2026
#221)

* Rename SimpleMapStore impls to Legacy* and move to legacy/ submodule (#220)

Move `SimpleMapStoreGlobal` → `LegacySimpleMapStoreGlobal` and
`SimpleMapStorePerKey` → `LegacySimpleMapStorePerKey` under a new
`simple_map_store/legacy/` submodule, in preparation for introducing
optimised replacements that will reclaim the original names (PR #175 part b).

- `legacy/global.rs` / `legacy/per_key.rs`: original implementations,
  renamed with the `Legacy` prefix throughout (struct, impl, log messages)
- `legacy/mod.rs`: re-exports both legacy types
- `simple_map_store/mod.rs`: references legacy module; `SimpleMapStore`
  enum now wraps `LegacySimpleMapStoreGlobal` / `LegacySimpleMapStorePerKey`
- `benches/simple_store_bench.rs`: doc comment updated to reflect that
  the bench profiles the legacy store implementation

Public API (`SimpleMapStore`, `Store`) is unchanged.

Co-authored-by: zz_y <zz_y@node0.zz-y-296227.softmeasure-pg0.wisc.cloudlab.us>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* Fix cargo fmt formatting in simple_map_store/mod.rs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: zz_y <zz_y@node0.zz-y-296227.softmeasure-pg0.wisc.cloudlab.us>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
zzylol added a commit that referenced this pull request Mar 24, 2026
…220)

Move `SimpleMapStoreGlobal` → `LegacySimpleMapStoreGlobal` and
`SimpleMapStorePerKey` → `LegacySimpleMapStorePerKey` under a new
`simple_map_store/legacy/` submodule, in preparation for introducing
optimised replacements that will reclaim the original names (PR #175 part b).

- `legacy/global.rs` / `legacy/per_key.rs`: original implementations,
  renamed with the `Legacy` prefix throughout (struct, impl, log messages)
- `legacy/mod.rs`: re-exports both legacy types
- `simple_map_store/mod.rs`: references legacy module; `SimpleMapStore`
  enum now wraps `LegacySimpleMapStoreGlobal` / `LegacySimpleMapStorePerKey`
- `benches/simple_store_bench.rs`: doc comment updated to reflect that
  the bench profiles the legacy store implementation

Public API (`SimpleMapStore`, `Store`) is unchanged.

Co-authored-by: zz_y <zz_y@node0.zz-y-296227.softmeasure-pg0.wisc.cloudlab.us>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
zzylol added a commit that referenced this pull request Mar 24, 2026
…220)

Move `SimpleMapStoreGlobal` → `LegacySimpleMapStoreGlobal` and
`SimpleMapStorePerKey` → `LegacySimpleMapStorePerKey` under a new
`simple_map_store/legacy/` submodule, in preparation for introducing
optimised replacements that will reclaim the original names (PR #175 part b).

- `legacy/global.rs` / `legacy/per_key.rs`: original implementations,
  renamed with the `Legacy` prefix throughout (struct, impl, log messages)
- `legacy/mod.rs`: re-exports both legacy types
- `simple_map_store/mod.rs`: references legacy module; `SimpleMapStore`
  enum now wraps `LegacySimpleMapStoreGlobal` / `LegacySimpleMapStorePerKey`
- `benches/simple_store_bench.rs`: doc comment updated to reflect that
  the bench profiles the legacy store implementation

Public API (`SimpleMapStore`, `Store`) is unchanged.

Co-authored-by: zz_y <zz_y@node0.zz-y-296227.softmeasure-pg0.wisc.cloudlab.us>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
milindsrivastava1997 pushed a commit that referenced this pull request Mar 27, 2026
* Rename SimpleMapStore impls to Legacy* and move to legacy/ submodule (#220)

Move `SimpleMapStoreGlobal` → `LegacySimpleMapStoreGlobal` and
`SimpleMapStorePerKey` → `LegacySimpleMapStorePerKey` under a new
`simple_map_store/legacy/` submodule, in preparation for introducing
optimised replacements that will reclaim the original names (PR #175 part b).

- `legacy/global.rs` / `legacy/per_key.rs`: original implementations,
  renamed with the `Legacy` prefix throughout (struct, impl, log messages)
- `legacy/mod.rs`: re-exports both legacy types
- `simple_map_store/mod.rs`: references legacy module; `SimpleMapStore`
  enum now wraps `LegacySimpleMapStoreGlobal` / `LegacySimpleMapStorePerKey`
- `benches/simple_store_bench.rs`: doc comment updated to reflect that
  the bench profiles the legacy store implementation

Public API (`SimpleMapStore`, `Store`) is unchanged.

Co-authored-by: zz_y <zz_y@node0.zz-y-296227.softmeasure-pg0.wisc.cloudlab.us>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* Fix cargo fmt formatting in simple_map_store/mod.rs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Add Store correctness contract test suite

Defines a run_contract_suite() function that tests every observable
behaviour of a Store implementation:

  - Empty-store edge cases (range query, exact query, earliest timestamp)
  - Single insert: range query hit/miss, exact query hit/miss (wrong start,
    wrong end)
  - Batch insert: count correctness, chronological ordering guaranteed
  - Partial range filtering (windows outside query range excluded)
  - Aggregation-ID isolation (inserts into agg 1 not visible to agg 2)
  - Earliest-timestamp tracking: global minimum, per agg-ID
  - Cleanup — CircularBuffer: oldest window evicted, newest 8 retained
  - Cleanup — ReadBased: evicted after threshold reads, unread window kept
  - Concurrency: 8-thread concurrent inserts (no data loss),
    8-thread concurrent reads (each returns full result set)

Two test entry points exercise both existing implementations:
  contract_per_key  — LockStrategy::PerKey (reference)
  contract_global   — LockStrategy::Global

Adding a new Store implementation requires only a new #[test] function
that calls run_contract_suite() with the new factory.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Extend store contract tests: all accumulator types, keyed entries, DeltaSet exclusion

- Add SerializableToSink import so clone-fidelity tests compile on concrete types
- Clone fidelity tests for all 11 accumulator types: SumAccumulator,
  MinMaxAccumulator, DatasketchesKLLAccumulator, IncreaseAccumulator,
  MultipleSumAccumulator, MultipleMinMaxAccumulator, SetAggregatorAccumulator,
  DeltaSetAggregatorAccumulator, CountMinSketchAccumulator,
  CountMinSketchWithHeapAccumulator, HydraKllSketchAccumulator
- Three keyed-entry tests: grouping by key, coexistence of keyed/unkeyed, multiple keys per window
- DeltaSetAggregator cleanup exclusion test
- Concurrency tests: concurrent inserts (8 threads) and concurrent reads

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Fix cargo fmt violations in store_correctness_tests

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Remove unused assert_clone_fidelity function (clippy dead_code)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Fix extra blank line (cargo fmt)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Replace SimpleStore with inverted index (label -> BTreeMap<Time>)

* Expand INDEX_DESIGN.md with full theoretical complexity analysis

Add separate time and space complexity tables with named variables (A,
L, N, k, m, V), per-operation breakdowns, and a note on Arc-sharing
eliminating deep copies on the read path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Apply three VictoriaMetrics-inspired optimizations to SimpleMapStore

1. Label interning (MetricID: u32)
   Introduce InternTable in common.rs that assigns a compact u32 to each
   unique Option<KeyByLabelValues> on first insert. All internal maps
   (label_map, window_to_ids) use MetricID as key — O(1) hash/compare
   instead of O(label_bytes). Label strings stored once; resolved back to
   KeyByLabelValues only when building the returned TimestampedBucketsMap.

2. Time-epoch partitioning + O(1) rotation cleanup
   Replace the single BTreeMap-per-label with epoch-partitioned storage:
   BTreeMap<EpochID, EpochData>. epoch_capacity = num_aggregates_to_retain
   (set on first insert). When the current epoch reaches capacity, a new
   epoch is opened and the oldest is dropped — O(1) drop vs the previous
   O(k·m) BTreeSet walk + targeted BTreeMap removals. ReadBased cleanup
   scans read_counts then calls EpochData::remove_windows on each epoch.

3. Sorted Vec posting lists
   Replace HashSet<Option<KeyByLabelValues>> in window_to_labels with
   Vec<MetricID> maintained in sorted order via partition_point + insert.
   Cache-friendly iteration for exact queries; enables merge-intersection
   for future label-predicate pushdown.

Both per_key.rs and global.rs updated. Shared types extracted to common.rs.
Public Store trait interface and all 329 existing tests are unchanged.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Fix Dockerfile bench stub and apply cargo fmt

- Add dummy benches/simple_map_store_benchmark.rs in Dockerfile dep-caching
  layer so cargo can parse the manifest (bench entry in Cargo.toml)
- Run cargo fmt to fix import ordering and line-wrapping in common.rs,
  global.rs, and per_key.rs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Add legacy store for benchmark comparison

- Save old time-primary SimpleMapStorePerKey as LegacySimpleMapStorePerKey
  (deprecated, kept only for benchmarking)
- Expose it as pub mod per_key_legacy in the store module
- Add old_vs_new/* benchmark group comparing legacy vs current on
  insert, range_query, exact_query, concurrent_reads, and scaling

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Fix clippy errors: type_complexity, unused_mut, collapsible_if, deprecated

- Add MetricBucketMap type alias in common.rs to fix type_complexity
- Use MetricBucketMap in global.rs and per_key.rs
- Remove unused `mut` from `results` in global.rs
- Collapse nested `if` into single condition in per_key.rs
- Add #[allow(deprecated)] to impl blocks in per_key_legacy.rs
- Add #![allow(deprecated)] to benchmark file for legacy store usage

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Three further index optimizations: time-primary scan, flat sealed epochs, O(1) flat storage

1. Time-primary range scan (common.rs MutableEpoch):
   - window_to_ids: HashMap → BTreeMap, enabling O(log N + actual_matches) range scan
   - Range queries no longer visit every label; only labels with data in matched windows
   - For high-cardinality sparse data: O(L·log N) → O(log N + actual_matches)

2. Immutable sealed epochs (common.rs SealedEpoch):
   - On epoch rotation, MutableEpoch is sealed into a flat sorted Vec<(TimestampRange, MetricID, Arc<Agg>)>
   - Sorted by (TimestampRange, MetricID): windows contiguous, cache-friendly linear scan
   - Range queries use partition_point (binary search) then linear scan — no pointer chasing
   - min_tr/max_tr precomputed for O(1) epoch-skip check

3. Flat primary storage (MutableEpoch):
   - label_map: HashMap<MetricID, BTreeMap<TimestampRange, Vec<Arc<Agg>>>> replaced by
     data: HashMap<(MetricID, TimestampRange), Vec<Arc<Agg>>>
   - O(1) insert and point lookup; no nested BTreeMap traversal

per_key.rs / global.rs updated:
   - StoreKeyData/PerKeyState now hold current_epoch: MutableEpoch + sealed_epochs: BTreeMap<EpochID, SealedEpoch>
   - maybe_rotate_epoch calls MutableEpoch::seal() and inserts into sealed_epochs
   - cleanup_read_based updated for current + sealed epoch split

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Make MutableEpoch insert O(1) amortized: append-only raw buffer

MutableEpoch is now a plain append-only Vec (no BTreeMap, no sorted Vec
maintenance during writes). All sorting is deferred to seal() — paid once
at epoch rotation, not on every insert. Mirrors VictoriaMetrics' rawRows
→ in-memory part pipeline.

Insert path: Vec::push + HashSet::insert + 2 scalar min/max updates = O(1).
seal() path: sort_unstable_by_key = O(M log M), called once per epoch.

Query on active epoch: linear scan O(M), bounded by epoch_capacity × L.
Acceptable since sealed epochs hold most historical data and use binary search.

Replace min_tr/max_tr (TimestampRange) with min_start/max_end (u64) in
both MutableEpoch and SealedEpoch for accurate epoch-skip bounds.
Callers updated to use time_bounds() -> Option<(u64, u64)>.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Add window_to_ids exact-lookup index to MutableEpoch: O(M) → O(m)

MutableEpoch now maintains a secondary index:
  window_to_ids: HashMap<TimestampRange, Vec<(MetricID, Arc<Agg>)>>

On insert: Arc::clone into window_to_ids (refcount bump only, no data copy)
  + Vec::push — O(1) amortized, no change to insert complexity.

exact_query: HashMap lookup O(1) + iterate m entries O(m), no raw scan.
  Previously O(M) linear scan of all raw entries.

range_query_into: unchanged linear scan of raw (O(M), bounded).
seal(): unchanged — window_to_ids dropped, raw is sorted in-place.
remove_windows: also removes from window_to_ids.

Memory: one extra Arc pointer (8 bytes) per inserted entry — cheap.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* update

* refactor merge simple store benchmarks

* refactor move legacy simple stores into module

* refactor: optimize MutableEpoch insert path (columnar storage, lazy index, batch hoisting)

- Columnar storage: three parallel arrays (windows_col, metric_ids_col,
  aggregates_col) instead of row tuples; range scan hot loop touches only
  windows_col
- Lazy window_to_ids index: built on first exact_query after a write batch,
  invalidated cheaply (= None) on insert — zero index maintenance on hot path
- Monotonic ingest fast path: skip windows_set HashSet probe for consecutive
  same-window inserts (Opt 3)
- with_capacity hint: MutableEpoch::with_capacity uses previous epoch len to
  avoid Vec reallocation during next epoch fill
- Batch metadata hoisting in global/per_key: config lookup, metrics.insert,
  earliest_ts update, items_inserted count moved from per-item to per-group

Results: ~2x insert, ~3x range query (Sum), ~27x range query (KLL) vs legacy.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs: update INDEX_DESIGN to reflect columnar MutableEpoch/SealedEpoch design

Replace the old EpochData/BTreeMap description with the current implementation:
- MutableEpoch: columnar storage (3 parallel arrays), lazy window_to_ids index,
  monotonic ingest fast path, incremental bounds tracking
- SealedEpoch: flat sorted Vec with binary-search range/exact queries
- Updated complexity table (insert O(1) amortized, range O(log N + k) on sealed)
- Updated query mechanics to reflect write-lock for exact query (lazy index build)
- Remove asap-query-engine/docs/simple_store_insert_profile.md (profiling scratch doc)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: match legacy per-window CircularBuffer eviction semantics

The epoch-based rotation was dropping an entire epoch (epoch_capacity
windows) when the retention limit was exceeded. Legacy evicts exactly
(total - retention_limit) windows, which can be fewer than one full epoch.

Fix: after sealing, compute total distinct windows across all epochs in
O(E) using SealedEpoch::distinct_window_count() (precomputed at seal time,
updated in remove_windows). Evict the minimum number of oldest windows
from the oldest sealed epoch(s) needed to reach retention_limit, dropping
the epoch only when fully emptied.

Also adds the eviction check to the non-rotation path so the 9th window
(still in the current partial epoch) triggers eviction of the 1st window.

Fixes: contract_global and contract_per_key (both now pass).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* style: apply cargo fmt

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: extract GroupedBatch type alias to satisfy clippy::type_complexity

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: restore legacy store files to simple_store_opt baseline

These files were modified during early development on this branch but
those changes were superseded by the new global.rs/per_key.rs stores.
Reset to simple_store_opt versions so they don't appear as unintended
changes in the PR.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: update legacy stores to return Arc<dyn AggregateCore> and fix mod ordering

Legacy global/per_key stores were returning Box from clone_boxed_core() but
TimestampedBucket now expects Arc. Add .into() at the 4 call sites. Also
reorder mod.rs to put `mod common` before `pub mod legacy` for cargo fmt.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: declare global and per_key as modules so common.rs items are reachable

Without pub mod global and pub mod per_key in mod.rs, the compiler sees all
types in common.rs as dead code and fails under -D warnings.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: zz_y <zz_y@node0.zz-y-296227.softmeasure-pg0.wisc.cloudlab.us>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants