bench: windowed aggregation benchmarks + scalability tests (1–16 workers) by zzylol · Pull Request #231 · ProjectASAP/ASAPQuery

zzylol · 2026-03-25T10:52:48Z

Summary

Adds Criterion-based windowed aggregation benchmarks for the precompute engine
Adds scalability benchmark varying worker count from 1 to 16 to characterise throughput scaling
Makes throughput test multi-threaded for realistic load generation
Increases throughput test drain timeout to 120s for slower machines

This is PR D of 6 stacked PRs splitting #162

Stacking order:

PR A → main: Core engine (merge first)
PR B → PR A: E2E tests + sliding window fix
PR C → PR B: Multi-connector ingest + pane-based sliding window
PR D (this) → PR C: Benchmarks + scalability tests
PR E → PR D: Correctness tests + comparison test
PR F → PR E: Refactoring + benchmarking binary

Test plan

cargo bench runs windowed aggregation benchmarks without errors
Scalability benchmark outputs throughput numbers for 1–16 workers

🤖 Generated with Claude Code

Implements a single-node multi-threaded precompute engine as a new module and binary target within QueryEngineRust. The engine ingests Prometheus remote write samples, buffers them per-series with out-of-order handling, detects closed tumbling/sliding windows via event-time watermarks, feeds samples into accumulator wrappers for all existing sketch types, and emits PrecomputedOutput directly to the store. New modules: config, series_buffer, window_manager, accumulator_factory, series_router, worker, output_sink, and the PrecomputeEngine orchestrator. The binary supports embedded store + query HTTP server for single-process deployment. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Handle top-level aggregation types (DatasketchesKLL, Sum, Min, Max, etc.) directly in the factory match, fixing the fallback to Sum that broke quantile queries. Also preserve the K parameter in KllAccumulatorUpdater::reset() instead of hardcoding 200. Add test_e2e_precompute binary that validates the full ingest -> precompute -> store -> query pipeline end-to-end. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Sends 10 series × 100 samples in a single HTTP request to the raw-mode engine and verifies all 1000 samples land in the store. Prints client RTT and per-series e2e_latency_us at debug level. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Sends 1000 requests × 10000 samples (50 distinct series) to the raw-mode engine and polls until all samples are stored. Reports both send throughput and e2e throughput including drain time. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…dows Previously each sample was assigned to only one window via window_start_for(), which is incorrect for sliding windows where window_size > slide_interval. Added window_starts_containing() that returns all window starts whose range covers the timestamp, and use it in the worker aggregation loop. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Implements a single-node multi-threaded precompute engine as a new module and binary target within QueryEngineRust. The engine ingests Prometheus remote write samples, buffers them per-series with out-of-order handling, detects closed tumbling/sliding windows via event-time watermarks, feeds samples into accumulator wrappers for all existing sketch types, and emits PrecomputedOutput directly to the store. New modules: config, series_buffer, window_manager, accumulator_factory, series_router, worker, output_sink, and the PrecomputeEngine orchestrator. The binary supports embedded store + query HTTP server for single-process deployment. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Handle top-level aggregation types (DatasketchesKLL, Sum, Min, Max, etc.) directly in the factory match, fixing the fallback to Sum that broke quantile queries. Also preserve the K parameter in KllAccumulatorUpdater::reset() instead of hardcoding 200. Add test_e2e_precompute binary that validates the full ingest -> precompute -> store -> query pipeline end-to-end. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…n doc Fix ghost accumulator bug where late samples passing the allowed_lateness check could create orphaned accumulators for already-closed windows. Add configurable LateDataPolicy (Drop/ForwardToStore) to control behavior. Drop prevents ghost windows; ForwardToStore emits mini-accumulators for query-time merge. Also add precompute engine design document. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Extract format-agnostic routing logic into route_decoded_samples() helper and register a VictoriaMetrics remote write endpoint at /api/v1/import alongside the existing Prometheus endpoint at /api/v1/write. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Route each sample to exactly 1 pane (sub-window of size slide_interval) instead of W overlapping window accumulators. When a window closes, merge its constituent panes via AggregateCore::merge_with(). This reduces per-sample accumulator updates from W to 1 for sliding windows. - Add snapshot_accumulator() to AccumulatorUpdater trait (9 implementations) - Add pane_start_for(), panes_for_window(), slide_interval_ms() to WindowManager - Replace active_windows HashMap with active_panes BTreeMap in worker - Rewrite sample routing and window close logic with pane merging - Extract PrecomputeEngine to engine.rs, ingest handlers to ingest_handler.rs - Update design doc with pane-based architecture Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Implements a single-node multi-threaded precompute engine as a new module and binary target within QueryEngineRust. The engine ingests Prometheus remote write samples, buffers them per-series with out-of-order handling, detects closed tumbling/sliding windows via event-time watermarks, feeds samples into accumulator wrappers for all existing sketch types, and emits PrecomputedOutput directly to the store. New modules: config, series_buffer, window_manager, accumulator_factory, series_router, worker, output_sink, and the PrecomputeEngine orchestrator. The binary supports embedded store + query HTTP server for single-process deployment. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Handle top-level aggregation types (DatasketchesKLL, Sum, Min, Max, etc.) directly in the factory match, fixing the fallback to Sum that broke quantile queries. Also preserve the K parameter in KllAccumulatorUpdater::reset() instead of hardcoding 200. Add test_e2e_precompute binary that validates the full ingest -> precompute -> store -> query pipeline end-to-end. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…n doc Fix ghost accumulator bug where late samples passing the allowed_lateness check could create orphaned accumulators for already-closed windows. Add configurable LateDataPolicy (Drop/ForwardToStore) to control behavior. Drop prevents ghost windows; ForwardToStore emits mini-accumulators for query-time merge. Also add precompute engine design document. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Extract format-agnostic routing logic into route_decoded_samples() helper and register a VictoriaMetrics remote write endpoint at /api/v1/import alongside the existing Prometheus endpoint at /api/v1/write. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Add tumbling and sliding window throughput benchmarks to E2E test (Tumbling 10s, Sliding 30s/10s W=3, Sliding 60s/10s W=6) - Each benchmark uses NoopOutputSink to isolate worker throughput - Fix cargo fmt formatting in window_manager.rs and worker.rs - Pin sketchlib-rust to rev 663a1df for Chapter enum compatibility - Fix CMS updater field access (self.acc.inner.{row_num,col_num,sketch}) - Fix SimpleEngine::new missing promsketch_store arg in E2E test Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Parameterize run_single_bench with num_workers and add run_scalability_benchmark() that measures throughput across 1, 2, 4, 8, 16 workers with sliding 30s/10s Sum (W=3). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Use tokio::task::spawn_blocking to build request payloads in parallel across 8 threads, and send HTTP requests concurrently via 8 tokio tasks (matching the pattern used in run_single_bench). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The 10M-sample throughput test needs ~94s to fully drain through the store, exceeding the previous 60s deadline. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Keep PR's benchmark additions (windowed aggregation benchmarks, scalability tests, multi-threaded throughput test, 120s drain timeout) while incorporating main's pane-based sliding window architecture updates in the design doc. Also fix duplicate [[bin]] entries in Cargo.toml. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Remove extra `None` arg from SimpleEngine::new (PromSketchStore param was removed on main) - Fix AggregationConfig::new call to match current 17-arg signature (window fields are now direct values, not Option wrappers) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The function takes 17 args — was passing only 16 (missing the value_column: Option<String> parameter). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Use .div_ceil() instead of manual ceiling division - Allow too_many_arguments on run_single_bench Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

zzylol force-pushed the pr/D-benchmarks-scalability branch from 4c740c5 to a145bd7 Compare March 25, 2026 11:32

zzylol force-pushed the pr/C-multi-connector-pane-sliding-window branch from 07e69dc to 439dcf2 Compare March 25, 2026 11:32

zzylol requested a review from milindsrivastava1997 March 25, 2026 14:44

zzylol and others added 18 commits March 31, 2026 14:32

update

b749ac3

fix: remove duplicate [[bin]] entries in Cargo.toml from rebase merge

8df4927

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

style: apply cargo fmt

3ea0d58

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fix: use std::io::Error::other per clippy lint

bf4c213

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fix: remove unnecessary cast per clippy lint

b7e1b02

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Minor wording fix in precompute engine design doc

fc34f14

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Clarify single-machine multi-threaded architecture in design doc

10e4161

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Run cargo fmt

e903783

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

zzylol force-pushed the pr/C-multi-connector-pane-sliding-window branch from 439dcf2 to e5b6abc Compare March 31, 2026 20:25

zzylol and others added 8 commits March 31, 2026 15:27

Add scalability benchmark varying worker count (1-16)

6797dde

Parameterize run_single_bench with num_workers and add run_scalability_benchmark() that measures throughput across 1, 2, 4, 8, 16 workers with sliding 30s/10s Sum (W=3). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Increase throughput test drain timeout from 60s to 120s

1648f1e

The 10M-sample throughput test needs ~94s to fully drain through the store, exceeding the previous 60s deadline. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

zzylol force-pushed the pr/D-benchmarks-scalability branch from a145bd7 to 1648f1e Compare March 31, 2026 20:30

Base automatically changed from pr/C-multi-connector-pane-sliding-window to main April 6, 2026 14:10

zz_y and others added 4 commits April 6, 2026 12:43

fix: add missing value_column arg to AggregationConfig::new

a84ccd7

The function takes 17 args — was passing only 16 (missing the value_column: Option<String> parameter). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix: resolve clippy lints in test_e2e_precompute

72ad226

- Use .div_ceil() instead of manual ceiling division - Allow too_many_arguments on run_single_bench Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

milindsrivastava1997 approved these changes Apr 6, 2026

View reviewed changes

milindsrivastava1997 merged commit 9ce1570 into main Apr 6, 2026
6 checks passed

milindsrivastava1997 deleted the pr/D-benchmarks-scalability branch April 6, 2026 20:22

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

bench: windowed aggregation benchmarks + scalability tests (1–16 workers)#231

bench: windowed aggregation benchmarks + scalability tests (1–16 workers)#231
milindsrivastava1997 merged 30 commits intomainfrom
pr/D-benchmarks-scalability

zzylol commented Mar 25, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

zzylol commented Mar 25, 2026

Summary

This is PR D of 6 stacked PRs splitting #162

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants