bench: windowed aggregation benchmarks + scalability tests (1–16 workers)#231
Merged
milindsrivastava1997 merged 30 commits intomainfrom Apr 6, 2026
Merged
bench: windowed aggregation benchmarks + scalability tests (1–16 workers)#231milindsrivastava1997 merged 30 commits intomainfrom
milindsrivastava1997 merged 30 commits intomainfrom
Conversation
4c740c5 to
a145bd7
Compare
07e69dc to
439dcf2
Compare
Implements a single-node multi-threaded precompute engine as a new module and binary target within QueryEngineRust. The engine ingests Prometheus remote write samples, buffers them per-series with out-of-order handling, detects closed tumbling/sliding windows via event-time watermarks, feeds samples into accumulator wrappers for all existing sketch types, and emits PrecomputedOutput directly to the store. New modules: config, series_buffer, window_manager, accumulator_factory, series_router, worker, output_sink, and the PrecomputeEngine orchestrator. The binary supports embedded store + query HTTP server for single-process deployment. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Handle top-level aggregation types (DatasketchesKLL, Sum, Min, Max, etc.) directly in the factory match, fixing the fallback to Sum that broke quantile queries. Also preserve the K parameter in KllAccumulatorUpdater::reset() instead of hardcoding 200. Add test_e2e_precompute binary that validates the full ingest -> precompute -> store -> query pipeline end-to-end. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sends 10 series × 100 samples in a single HTTP request to the raw-mode engine and verifies all 1000 samples land in the store. Prints client RTT and per-series e2e_latency_us at debug level. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sends 1000 requests × 10000 samples (50 distinct series) to the raw-mode engine and polls until all samples are stored. Reports both send throughput and e2e throughput including drain time. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…dows Previously each sample was assigned to only one window via window_start_for(), which is incorrect for sliding windows where window_size > slide_interval. Added window_starts_containing() that returns all window starts whose range covers the timestamp, and use it in the worker aggregation loop. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements a single-node multi-threaded precompute engine as a new module and binary target within QueryEngineRust. The engine ingests Prometheus remote write samples, buffers them per-series with out-of-order handling, detects closed tumbling/sliding windows via event-time watermarks, feeds samples into accumulator wrappers for all existing sketch types, and emits PrecomputedOutput directly to the store. New modules: config, series_buffer, window_manager, accumulator_factory, series_router, worker, output_sink, and the PrecomputeEngine orchestrator. The binary supports embedded store + query HTTP server for single-process deployment. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Handle top-level aggregation types (DatasketchesKLL, Sum, Min, Max, etc.) directly in the factory match, fixing the fallback to Sum that broke quantile queries. Also preserve the K parameter in KllAccumulatorUpdater::reset() instead of hardcoding 200. Add test_e2e_precompute binary that validates the full ingest -> precompute -> store -> query pipeline end-to-end. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…n doc Fix ghost accumulator bug where late samples passing the allowed_lateness check could create orphaned accumulators for already-closed windows. Add configurable LateDataPolicy (Drop/ForwardToStore) to control behavior. Drop prevents ghost windows; ForwardToStore emits mini-accumulators for query-time merge. Also add precompute engine design document. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Extract format-agnostic routing logic into route_decoded_samples() helper and register a VictoriaMetrics remote write endpoint at /api/v1/import alongside the existing Prometheus endpoint at /api/v1/write. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Route each sample to exactly 1 pane (sub-window of size slide_interval) instead of W overlapping window accumulators. When a window closes, merge its constituent panes via AggregateCore::merge_with(). This reduces per-sample accumulator updates from W to 1 for sliding windows. - Add snapshot_accumulator() to AccumulatorUpdater trait (9 implementations) - Add pane_start_for(), panes_for_window(), slide_interval_ms() to WindowManager - Replace active_windows HashMap with active_panes BTreeMap in worker - Rewrite sample routing and window close logic with pane merging - Extract PrecomputeEngine to engine.rs, ingest handlers to ingest_handler.rs - Update design doc with pane-based architecture Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
439dcf2 to
e5b6abc
Compare
Implements a single-node multi-threaded precompute engine as a new module and binary target within QueryEngineRust. The engine ingests Prometheus remote write samples, buffers them per-series with out-of-order handling, detects closed tumbling/sliding windows via event-time watermarks, feeds samples into accumulator wrappers for all existing sketch types, and emits PrecomputedOutput directly to the store. New modules: config, series_buffer, window_manager, accumulator_factory, series_router, worker, output_sink, and the PrecomputeEngine orchestrator. The binary supports embedded store + query HTTP server for single-process deployment. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Handle top-level aggregation types (DatasketchesKLL, Sum, Min, Max, etc.) directly in the factory match, fixing the fallback to Sum that broke quantile queries. Also preserve the K parameter in KllAccumulatorUpdater::reset() instead of hardcoding 200. Add test_e2e_precompute binary that validates the full ingest -> precompute -> store -> query pipeline end-to-end. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…n doc Fix ghost accumulator bug where late samples passing the allowed_lateness check could create orphaned accumulators for already-closed windows. Add configurable LateDataPolicy (Drop/ForwardToStore) to control behavior. Drop prevents ghost windows; ForwardToStore emits mini-accumulators for query-time merge. Also add precompute engine design document. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Extract format-agnostic routing logic into route_decoded_samples() helper and register a VictoriaMetrics remote write endpoint at /api/v1/import alongside the existing Prometheus endpoint at /api/v1/write. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add tumbling and sliding window throughput benchmarks to E2E test
(Tumbling 10s, Sliding 30s/10s W=3, Sliding 60s/10s W=6)
- Each benchmark uses NoopOutputSink to isolate worker throughput
- Fix cargo fmt formatting in window_manager.rs and worker.rs
- Pin sketchlib-rust to rev 663a1df for Chapter enum compatibility
- Fix CMS updater field access (self.acc.inner.{row_num,col_num,sketch})
- Fix SimpleEngine::new missing promsketch_store arg in E2E test
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Parameterize run_single_bench with num_workers and add run_scalability_benchmark() that measures throughput across 1, 2, 4, 8, 16 workers with sliding 30s/10s Sum (W=3). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Use tokio::task::spawn_blocking to build request payloads in parallel across 8 threads, and send HTTP requests concurrently via 8 tokio tasks (matching the pattern used in run_single_bench). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The 10M-sample throughput test needs ~94s to fully drain through the store, exceeding the previous 60s deadline. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
a145bd7 to
1648f1e
Compare
Base automatically changed from
pr/C-multi-connector-pane-sliding-window
to
main
April 6, 2026 14:10
Keep PR's benchmark additions (windowed aggregation benchmarks, scalability tests, multi-threaded throughput test, 120s drain timeout) while incorporating main's pane-based sliding window architecture updates in the design doc. Also fix duplicate [[bin]] entries in Cargo.toml. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove extra `None` arg from SimpleEngine::new (PromSketchStore param was removed on main) - Fix AggregationConfig::new call to match current 17-arg signature (window fields are now direct values, not Option wrappers) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The function takes 17 args — was passing only 16 (missing the value_column: Option<String> parameter). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Use .div_ceil() instead of manual ceiling division - Allow too_many_arguments on run_single_bench Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
milindsrivastava1997
approved these changes
Apr 6, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
This is PR D of 6 stacked PRs splitting #162
Stacking order:
Test plan
cargo benchruns windowed aggregation benchmarks without errors🤖 Generated with Claude Code