refactor: eliminate code duplication in precompute engine + bench_precompute_sketch binary by zzylol · Pull Request #233 · ProjectASAP/ASAPQuery

zzylol · 2026-03-25T10:53:08Z

Summary

Refactors precompute engine to eliminate code and data duplication across worker/engine modules
Fixes design doc: corrects second-tier merge worker design (had correctness issues)
Adds bench_precompute_sketch binary for standalone precompute engine benchmarking
Adds integration test: e2e_precompute_equivalence in tests/

This is PR F of 6 stacked PRs splitting #162

Stacking order:

PR A → main: Core engine (merge first)
PR B → PR A: E2E tests + sliding window fix
PR C → PR B: Multi-connector ingest + pane-based sliding window
PR D → PR C: Benchmarks + scalability tests
PR E → PR D: Correctness tests + comparison test
PR F (this) → PR E: Refactoring + benchmarking binary (final)

Test plan

cargo build --bin bench_precompute_sketch succeeds
cargo test integration test in tests/e2e_precompute_equivalence.rs passes

🤖 Generated with Claude Code

accumulator_factory.rs: - Add impl_clone_accumulator_methods! macro to replace 16 identical take_accumulator/snapshot_accumulator bodies across 8 updater structs - Add config_is_keyed() free fn for static keyed check without allocating an updater (replaces throwaway-updater pattern used 4+ times) - Add kll_k_param(), cms_params(), hydra_kll_params() helpers to eliminate 4 copy-pasted parameter extraction blocks in the factory - Fix latent bug: SingleSubpopulation/KLL arm now checks "K" (capital) before "k", consistent with all other arms - Add tests: test_config_is_keyed, test_kll_k_param_capital_k worker.rs: - AggregationState.config: AggregationConfig -> Arc<AggregationConfig> eliminating N×M deep config copies across series×aggregations - Worker.agg_configs and Worker::new param updated to Arc map; matching_agg_configs returns owned Arc clones (removes lifetime tie) - Add apply_sample() free fn replacing 3 copies of is_keyed dispatch - Add merge_panes_for_window() free fn replacing identical ~40-line pane-merge loop in both process_samples and flush_all - Fix ForwardToStore path: is_keyed() and extract_key_from_series() each called once instead of twice engine.rs: - Wrap aggregation configs in Arc once at engine startup; per-worker agg_configs.clone() is now N pointer bumps, not N full deep copies Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Rewrites the "Optional second-tier merge workers" section to fix seven correctness issues identified in review: 1. Deadlock (critical): workers with no matching series never emitted WindowCompletion. Fix: introduce per_agg_watermark per worker, advanced by both data arrival and the flush timer (wall-clock floor), so every worker emits WindowCompletion for every aggregation on every flush cycle regardless of whether it has matching series. 2. WindowCompletion routing (critical): routing partials by (agg_id, key, window) while routing key-less completions by (agg_id, window) required an O(M) broadcast of completions to all merge workers. Fix: route ALL messages — partials and completions — by hash(agg_id, window_start, window_end) only, co-locating all keys for a window on one merge worker and eliminating the broadcast. 3. Conflicting struct definitions (critical): two contradictory WindowCompletion structs were present (one with key, one without). Fix: adopt a single key-less definition; explain the rationale. 4. Merge-tier watermark undefined (significant): PendingWindowState had no eviction bound. Fix: define merge_watermark as the min of worker_watermark_ms values received via completions; add force-eviction after eviction_grace_period_ms for lagging workers. 5. Late data re-finalization (significant): "re-open the merged result" was unspecified after a window was already written to the store. Fix: late corrections remain store appends with query-time merge (same as non-merge-tier path); canonical finalized output is never mutated. 6. Message ordering (minor): partials and completions sent through separate channels had no ordering guarantee. Fix: both go through the same MPSC channel from a given first-tier worker; partials always enqueued before the completion for the same window. 7. State structure (minor): per-key completed_sources tracking was incompatible with key-less completions (required O(keys) scan). Fix: split state into window_completions (window-level) and pending_partials (key-level) maps; completion check is O(1), finalization scan is O(keys per window) done once at close. Also updates the rollout strategy to note that per_agg_watermark and flush-timer-driven WindowCompletion emission must be deployed in first-tier workers before any merge worker is started. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…arking Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

zzylol and others added 4 commits March 25, 2026 06:32

feat: add bench_precompute_sketch binary for precompute engine benchm…

fcf5331

…arking Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

add test

decbcc8

zzylol force-pushed the pr/F-refactor-bench-binary branch from 2372141 to decbcc8 Compare March 25, 2026 11:32

zzylol force-pushed the pr/E-correctness-tests-comparison branch from 8c04bc5 to 47d279e Compare March 25, 2026 11:32

zzylol requested a review from milindsrivastava1997 March 25, 2026 14:45

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor: eliminate code duplication in precompute engine + bench_precompute_sketch binary#233

refactor: eliminate code duplication in precompute engine + bench_precompute_sketch binary#233
zzylol wants to merge 4 commits intopr/E-correctness-tests-comparisonfrom
pr/F-refactor-bench-binary

zzylol commented Mar 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

zzylol commented Mar 25, 2026

Summary

This is PR F of 6 stacked PRs splitting #162

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant