Skip to content

bench: windowed aggregation benchmarks + scalability tests (1–16 workers)#231

Merged
milindsrivastava1997 merged 30 commits intomainfrom
pr/D-benchmarks-scalability
Apr 6, 2026
Merged

bench: windowed aggregation benchmarks + scalability tests (1–16 workers)#231
milindsrivastava1997 merged 30 commits intomainfrom
pr/D-benchmarks-scalability

Conversation

@zzylol
Copy link
Copy Markdown
Contributor

@zzylol zzylol commented Mar 25, 2026

Summary

  • Adds Criterion-based windowed aggregation benchmarks for the precompute engine
  • Adds scalability benchmark varying worker count from 1 to 16 to characterise throughput scaling
  • Makes throughput test multi-threaded for realistic load generation
  • Increases throughput test drain timeout to 120s for slower machines

This is PR D of 6 stacked PRs splitting #162

Stacking order:

  • PR A → main: Core engine (merge first)
  • PR B → PR A: E2E tests + sliding window fix
  • PR C → PR B: Multi-connector ingest + pane-based sliding window
  • PR D (this) → PR C: Benchmarks + scalability tests
  • PR E → PR D: Correctness tests + comparison test
  • PR F → PR E: Refactoring + benchmarking binary

Test plan

  • cargo bench runs windowed aggregation benchmarks without errors
  • Scalability benchmark outputs throughput numbers for 1–16 workers

🤖 Generated with Claude Code

@zzylol zzylol force-pushed the pr/D-benchmarks-scalability branch from 4c740c5 to a145bd7 Compare March 25, 2026 11:32
@zzylol zzylol force-pushed the pr/C-multi-connector-pane-sliding-window branch from 07e69dc to 439dcf2 Compare March 25, 2026 11:32
zzylol and others added 18 commits March 31, 2026 14:32
Implements a single-node multi-threaded precompute engine as a new module
and binary target within QueryEngineRust. The engine ingests Prometheus
remote write samples, buffers them per-series with out-of-order handling,
detects closed tumbling/sliding windows via event-time watermarks, feeds
samples into accumulator wrappers for all existing sketch types, and emits
PrecomputedOutput directly to the store.

New modules: config, series_buffer, window_manager, accumulator_factory,
series_router, worker, output_sink, and the PrecomputeEngine orchestrator.
The binary supports embedded store + query HTTP server for single-process
deployment.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Handle top-level aggregation types (DatasketchesKLL, Sum, Min, Max,
etc.) directly in the factory match, fixing the fallback to Sum that
broke quantile queries. Also preserve the K parameter in
KllAccumulatorUpdater::reset() instead of hardcoding 200.

Add test_e2e_precompute binary that validates the full ingest ->
precompute -> store -> query pipeline end-to-end.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sends 10 series × 100 samples in a single HTTP request to the raw-mode
engine and verifies all 1000 samples land in the store. Prints client
RTT and per-series e2e_latency_us at debug level.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sends 1000 requests × 10000 samples (50 distinct series) to the
raw-mode engine and polls until all samples are stored. Reports both
send throughput and e2e throughput including drain time.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…dows

Previously each sample was assigned to only one window via
window_start_for(), which is incorrect for sliding windows where
window_size > slide_interval. Added window_starts_containing() that
returns all window starts whose range covers the timestamp, and use
it in the worker aggregation loop.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements a single-node multi-threaded precompute engine as a new module
and binary target within QueryEngineRust. The engine ingests Prometheus
remote write samples, buffers them per-series with out-of-order handling,
detects closed tumbling/sliding windows via event-time watermarks, feeds
samples into accumulator wrappers for all existing sketch types, and emits
PrecomputedOutput directly to the store.

New modules: config, series_buffer, window_manager, accumulator_factory,
series_router, worker, output_sink, and the PrecomputeEngine orchestrator.
The binary supports embedded store + query HTTP server for single-process
deployment.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Handle top-level aggregation types (DatasketchesKLL, Sum, Min, Max,
etc.) directly in the factory match, fixing the fallback to Sum that
broke quantile queries. Also preserve the K parameter in
KllAccumulatorUpdater::reset() instead of hardcoding 200.

Add test_e2e_precompute binary that validates the full ingest ->
precompute -> store -> query pipeline end-to-end.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…n doc

Fix ghost accumulator bug where late samples passing the allowed_lateness
check could create orphaned accumulators for already-closed windows. Add
configurable LateDataPolicy (Drop/ForwardToStore) to control behavior.
Drop prevents ghost windows; ForwardToStore emits mini-accumulators for
query-time merge. Also add precompute engine design document.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Extract format-agnostic routing logic into route_decoded_samples() helper
and register a VictoriaMetrics remote write endpoint at /api/v1/import
alongside the existing Prometheus endpoint at /api/v1/write.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Route each sample to exactly 1 pane (sub-window of size slide_interval)
instead of W overlapping window accumulators. When a window closes, merge
its constituent panes via AggregateCore::merge_with(). This reduces
per-sample accumulator updates from W to 1 for sliding windows.

- Add snapshot_accumulator() to AccumulatorUpdater trait (9 implementations)
- Add pane_start_for(), panes_for_window(), slide_interval_ms() to WindowManager
- Replace active_windows HashMap with active_panes BTreeMap in worker
- Rewrite sample routing and window close logic with pane merging
- Extract PrecomputeEngine to engine.rs, ingest handlers to ingest_handler.rs
- Update design doc with pane-based architecture

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@zzylol zzylol force-pushed the pr/C-multi-connector-pane-sliding-window branch from 439dcf2 to e5b6abc Compare March 31, 2026 20:25
zzylol and others added 8 commits March 31, 2026 15:27
Implements a single-node multi-threaded precompute engine as a new module
and binary target within QueryEngineRust. The engine ingests Prometheus
remote write samples, buffers them per-series with out-of-order handling,
detects closed tumbling/sliding windows via event-time watermarks, feeds
samples into accumulator wrappers for all existing sketch types, and emits
PrecomputedOutput directly to the store.

New modules: config, series_buffer, window_manager, accumulator_factory,
series_router, worker, output_sink, and the PrecomputeEngine orchestrator.
The binary supports embedded store + query HTTP server for single-process
deployment.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Handle top-level aggregation types (DatasketchesKLL, Sum, Min, Max,
etc.) directly in the factory match, fixing the fallback to Sum that
broke quantile queries. Also preserve the K parameter in
KllAccumulatorUpdater::reset() instead of hardcoding 200.

Add test_e2e_precompute binary that validates the full ingest ->
precompute -> store -> query pipeline end-to-end.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…n doc

Fix ghost accumulator bug where late samples passing the allowed_lateness
check could create orphaned accumulators for already-closed windows. Add
configurable LateDataPolicy (Drop/ForwardToStore) to control behavior.
Drop prevents ghost windows; ForwardToStore emits mini-accumulators for
query-time merge. Also add precompute engine design document.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Extract format-agnostic routing logic into route_decoded_samples() helper
and register a VictoriaMetrics remote write endpoint at /api/v1/import
alongside the existing Prometheus endpoint at /api/v1/write.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add tumbling and sliding window throughput benchmarks to E2E test
  (Tumbling 10s, Sliding 30s/10s W=3, Sliding 60s/10s W=6)
- Each benchmark uses NoopOutputSink to isolate worker throughput
- Fix cargo fmt formatting in window_manager.rs and worker.rs
- Pin sketchlib-rust to rev 663a1df for Chapter enum compatibility
- Fix CMS updater field access (self.acc.inner.{row_num,col_num,sketch})
- Fix SimpleEngine::new missing promsketch_store arg in E2E test

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Parameterize run_single_bench with num_workers and add
run_scalability_benchmark() that measures throughput across
1, 2, 4, 8, 16 workers with sliding 30s/10s Sum (W=3).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Use tokio::task::spawn_blocking to build request payloads in parallel
across 8 threads, and send HTTP requests concurrently via 8 tokio tasks
(matching the pattern used in run_single_bench).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The 10M-sample throughput test needs ~94s to fully drain through the
store, exceeding the previous 60s deadline.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@zzylol zzylol force-pushed the pr/D-benchmarks-scalability branch from a145bd7 to 1648f1e Compare March 31, 2026 20:30
Base automatically changed from pr/C-multi-connector-pane-sliding-window to main April 6, 2026 14:10
zz_y and others added 4 commits April 6, 2026 12:43
Keep PR's benchmark additions (windowed aggregation benchmarks,
scalability tests, multi-threaded throughput test, 120s drain timeout)
while incorporating main's pane-based sliding window architecture
updates in the design doc. Also fix duplicate [[bin]] entries in
Cargo.toml.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove extra `None` arg from SimpleEngine::new (PromSketchStore
  param was removed on main)
- Fix AggregationConfig::new call to match current 17-arg signature
  (window fields are now direct values, not Option wrappers)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The function takes 17 args — was passing only 16 (missing the
value_column: Option<String> parameter).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Use .div_ceil() instead of manual ceiling division
- Allow too_many_arguments on run_single_bench

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@milindsrivastava1997 milindsrivastava1997 merged commit 9ce1570 into main Apr 6, 2026
6 checks passed
@milindsrivastava1997 milindsrivastava1997 deleted the pr/D-benchmarks-scalability branch April 6, 2026 20:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants