
Add accuracy/performance CI and benchmarks infrastructure #248

Merged
milindsrivastava1997 merged 3 commits into main from ci/accuracy-performance
Apr 2, 2026

Conversation

Contributor

@zzylol zzylol commented Mar 28, 2026

Summary

  • accuracy_performance.yml: Full e2e eval — builds ASAP service images once (planner-rs, summary-ingest, query-engine) with a sha-<short-sha> tag, spins up the quickstart stack, waits for Arroyo pipeline + sketch ingestion, runs PromQL suite against both Prometheus (baseline) and ASAPQuery, compares accuracy and latency.
  • benchmarks/docker-compose.yml: Compose override replacing GHCR images with ${ASAP_IMAGE_TAG} so CI always tests the latest committed code.
  • benchmarks/queries/promql_suite.json: 14-query fixed suite covering avg/sum/max/min/quantile at p50/p90/p95/p99, with and without grouping.
  • benchmarks/scripts/: compare.py, run_baseline.py, run_asap.py, wait_for_stack.sh, ingest_wait.sh (with null-data handling, Arroyo state detection, 600s timeout for UDF compilation).
  • asap-summary-ingest/Dockerfile: Switch FROM to ghcr.io base image so builds work in any job without a local sketchdb-base image.
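The image-swapping override in benchmarks/docker-compose.yml could look roughly like this (a sketch — the actual service names and registry paths in the repo may differ):

```yaml
# benchmarks/docker-compose.yml (illustrative sketch)
# Layered on top of the quickstart compose file so CI runs the
# freshly built images instead of published GHCR tags.
services:
  planner-rs:
    image: ghcr.io/example/planner-rs:${ASAP_IMAGE_TAG}
  summary-ingest:
    image: ghcr.io/example/asap-summary-ingest:${ASAP_IMAGE_TAG}
  query-engine:
    image: ghcr.io/example/query-engine:${ASAP_IMAGE_TAG}
```

CI would export `ASAP_IMAGE_TAG=sha-<short-sha>` and start the stack with both files, e.g. `docker compose -f quickstart/docker-compose.yml -f benchmarks/docker-compose.yml up -d` (paths assumed).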

Pass/fail policy: no query failures allowed; ASAP-native relative error >5% warns (does not fail); latency regressions are warn-only on ephemeral GH runners.
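The pass/fail policy above can be sketched as a small classifier (illustrative only — the real compare.py is likely structured differently; the field names `baseline`, `asap`, `approximate`, and `error` are assumptions, while the 5% threshold and warn-only semantics come from the policy text):

```python
# Sketch of compare.py-style accuracy classification, assuming each result
# pairs a Prometheus baseline value with an ASAPQuery value for one query.
REL_ERR_WARN = 0.05  # >5% relative error on sketch-based queries warns, does not fail

def relative_error(baseline: float, asap: float) -> float:
    """Relative error vs. the Prometheus baseline; 0 when both are zero."""
    if baseline == 0.0:
        return 0.0 if asap == 0.0 else float("inf")
    return abs(asap - baseline) / abs(baseline)

def classify(results):
    """Split results into hard failures and accuracy warnings.

    Any query failure fails the job; the relative-error threshold is
    enforced only for sketch-based approximate queries (quantiles),
    never for exact aggregations (avg/sum/max/min).
    """
    failures, warnings = [], []
    for r in results:
        if r.get("error"):
            failures.append(r["query"])
        elif r.get("approximate"):
            if relative_error(r["baseline"], r["asap"]) > REL_ERR_WARN:
                warnings.append(r["query"])
    return failures, warnings
```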

Test plan

  • Accuracy/performance workflow builds Docker images and runs H2O groupby accuracy check within error bounds
  • relative-regression job runs and reports latency comparison table
  • workflow_dispatch works with runner: ubuntu-latest
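The `workflow_dispatch` trigger with a runner input could be declared roughly as follows (a sketch — the actual workflow may name the input and jobs differently):

```yaml
# accuracy_performance.yml excerpt (illustrative sketch)
on:
  workflow_dispatch:
    inputs:
      runner:
        description: "Runner label to execute on"
        default: ubuntu-latest

jobs:
  accuracy:
    # Fall back to ubuntu-latest when triggered without inputs
    runs-on: ${{ inputs.runner || 'ubuntu-latest' }}
```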

🤖 Generated with Claude Code

zzylol and others added 2 commits March 31, 2026 14:20
Clarifies that the field marks sketch-based approximate queries (quantiles), where accuracy-threshold enforcement applies, as opposed to exact aggregations (avg/sum/max/min), where it does not.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
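A suite entry carrying that flag might look like this (the field names and query are hypothetical — the commit only describes what the flag means):

```json
{
  "name": "latency_p99_by_service",
  "promql": "histogram_quantile(0.99, sum by (service, le) (rate(request_latency_bucket[5m])))",
  "approximate": true
}
```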
Local/Cloudlab builds default to sketchdb-base:latest (built locally); CI overrides this via the BASE_IMAGE build arg to pull from GHCR, fixing the installation break reported earlier.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
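The defaulted-build-arg pattern described in that commit is typically written like this (a sketch — image names and build steps are illustrative):

```dockerfile
# asap-summary-ingest/Dockerfile (illustrative sketch)
# Local/Cloudlab builds use the locally built base image by default;
# CI passes --build-arg BASE_IMAGE=ghcr.io/<org>/sketchdb-base:latest
# so no local sketchdb-base image is required.
ARG BASE_IMAGE=sketchdb-base:latest
FROM ${BASE_IMAGE}
# ... build steps unchanged ...
```

Note that `ARG` must precede `FROM` for the substitution to apply to the base image.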
@milindsrivastava1997 milindsrivastava1997 merged commit cebb798 into main Apr 2, 2026
12 checks passed
@milindsrivastava1997 milindsrivastava1997 deleted the ci/accuracy-performance branch April 2, 2026 19:50