Sketchlib KLL Integration#257

Merged
milindsrivastava1997 merged 8 commits into main from sketchlib-kll
Apr 2, 2026

@GnaneshGnani
Contributor

Integrates sketchlib-rust for KLL quantile sketches (single KllSketch and multi-key HydraKllSketch), completing the sketchlib migration alongside CMS and CMWH. The query engine follows the same backend-enum pattern as earlier PRs, here as KllBackend. The Arroyo UDF templates (datasketcheskll_, hydrakll_) are sketchlib-only (no dsrs dual path), matching the sketchlib default for all sketch types. This branch merges current main and resolves conflicts in the fidelity harness, CMS UDF impl_mode rendering, and tooling defaults.

Changes

New Files

asap-common/sketch-core/src/kll_sketchlib.rs

  • Thin integration layer over sketchlib-rust KLL
  • Type alias: SketchlibKll = KLL
  • Helpers: new_sketchlib_kll, sketchlib_kll_update, sketchlib_kll_quantile, sketchlib_kll_merge, bytes_from_sketchlib_kll, sketchlib_kll_from_bytes

Modified Files

asap-common/sketch-core/src/lib.rs

  • pub mod kll_sketchlib;

asap-common/sketch-core/src/kll.rs

  • KllBackend enum (Legacy / Sketchlib)
  • KllSketch holds backend: KllBackend; sketch_bytes(), count(), update, get_quantile, merge, merge_refs, and msgpack serde all dispatch through the backend
  • Clone / Debug for the sketchlib path are implemented via a byte round-trip where needed

asap-common/sketch-core/src/config.rs

  • DEFAULT_IMPL_MODE, DEFAULT_CMS_IMPL, DEFAULT_KLL_IMPL, DEFAULT_CMWH_IMPL all Sketchlib (aligned with production defaults)

asap-query-engine/src/precompute_operators/datasketches_kll_accumulator.rs

  • Uses KllSketch::sketch_bytes() / msgpack path compatible with both backends

asap-summary-ingest/templates/udfs/datasketcheskll_.rs.j2

  • Dependencies: sketchlib-rust, arroyo-udf-plugin, rmp-serde, serde (no dsrs)
  • Single implementation path: KLL::init_kll, SketchInput::F64, msgpack KllSketchData wire format

asap-summary-ingest/templates/udfs/hydrakll_.rs.j2

  • Same sketchlib-only approach; xxhash-rust for row hashes; nested KLL grid serialized as HydraKllSketchData

asap-summary-ingest/run_arroyosketch.py

  • --sketch_kll_impl now defaults to sketchlib (must match QueryEngine)
  • Removed impl_mode injection for datasketcheskll_ / hydrakll_ (templates no longer use it)

asap-tools/experiments/experiment_utils/services/arroyo.py

  • Default sketch_kll_impl set to sketchlib for experiment runs

asap-common/sketch-core/src/bin/sketchlib_fidelity.rs

  • Extends main’s CMS/CMWH harness with run_kll_once, run_hydra_kll_once
  • Per-sketch mode labels in output: cms_mode, cmwh_mode, kll_mode (from --cms-impl, --cmwh-impl, --kll-impl)

asap-common/sketch-core/report.md

  • Documents CLI matrix for all sketch types; tables for CMS, CMWH, KLL, HydraKLL (legacy vs sketchlib-rust)

asap-common/sketch-core/Cargo.toml

  • Pinned dsrs / sketchlib-rust revs consistent with main

Technical Approach

Backend abstraction (QueryEngine / sketch-core)

pub enum KllBackend {
    Legacy(KllDoubleSketch),
    Sketchlib(SketchlibKll),
}

pub struct KllSketch {
    pub k: u16,
    pub backend: KllBackend,
}
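
The dispatch pattern above can be sketched in isolation. The block below is a minimal illustration, not the code in kll.rs: LegacyStub and SketchlibStub are hypothetical stand-ins that store every value exactly (the real variants wrap KllDoubleSketch and SketchlibKll), and both the boolean constructor flag and the rule that merge rejects mismatched backends or mismatched k are assumptions made for the example.

```rust
// Hypothetical stand-ins for the two backends. They keep all values exactly,
// which a real KLL sketch does not do; they exist only to show the dispatch.
#[derive(Debug, Clone, Default)]
struct LegacyStub {
    values: Vec<f64>,
}

#[derive(Debug, Clone, Default)]
struct SketchlibStub {
    values: Vec<f64>,
}

#[derive(Debug, Clone)]
enum KllBackend {
    Legacy(LegacyStub),
    Sketchlib(SketchlibStub),
}

#[derive(Debug, Clone)]
struct KllSketch {
    k: u16,
    backend: KllBackend,
}

impl KllSketch {
    // Assumption: a boolean selects the backend; the real code reads a config default.
    fn new(k: u16, use_sketchlib: bool) -> Self {
        let backend = if use_sketchlib {
            KllBackend::Sketchlib(SketchlibStub::default())
        } else {
            KllBackend::Legacy(LegacyStub::default())
        };
        KllSketch { k, backend }
    }

    // Every public method matches on the backend once and forwards the call.
    fn values(&self) -> &Vec<f64> {
        match &self.backend {
            KllBackend::Legacy(s) => &s.values,
            KllBackend::Sketchlib(s) => &s.values,
        }
    }

    fn update(&mut self, value: f64) {
        match &mut self.backend {
            KllBackend::Legacy(s) => s.values.push(value),
            KllBackend::Sketchlib(s) => s.values.push(value),
        }
    }

    fn count(&self) -> usize {
        self.values().len()
    }

    // Exact quantile over the stored values (nearest-rank on a sorted copy).
    fn get_quantile(&self, q: f64) -> Option<f64> {
        let mut sorted = self.values().clone();
        if sorted.is_empty() {
            return None;
        }
        sorted.sort_by(|a, b| a.total_cmp(b));
        let idx = (q.clamp(0.0, 1.0) * (sorted.len() - 1) as f64).round() as usize;
        Some(sorted[idx])
    }

    // Assumption: sketches merge only when k and the backend variant agree.
    fn merge(&mut self, other: &KllSketch) -> Result<(), String> {
        if self.k != other.k {
            return Err(format!("k mismatch: {} vs {}", self.k, other.k));
        }
        match (&mut self.backend, &other.backend) {
            (KllBackend::Legacy(a), KllBackend::Legacy(b)) => {
                a.values.extend_from_slice(&b.values)
            }
            (KllBackend::Sketchlib(a), KllBackend::Sketchlib(b)) => {
                a.values.extend_from_slice(&b.values)
            }
            _ => return Err("cannot merge sketches with different backends".to_string()),
        }
        Ok(())
    }
}
```

Keeping the match inside KllSketch means callers such as the accumulator never name a concrete backend type, so switching the default is a configuration change rather than a call-site change.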

Wire format (unchanged contract with Arroyo)

pub struct KllSketchData {
    pub k: u16,
    pub sketch_bytes: Vec<u8>,
}

Both backends read/write this struct via MessagePack; sketch bytes are dsrs- or sketchlib-specific internally.

UDFs

Production UDFs always emit sketchlib KLL bytes. Legacy KllBackend remains in sketch-core for tests (force_legacy_mode_for_tests), fidelity binaries, and deserialization of historical data.

Testing

# Unit tests (legacy backends in sketch-core tests via ctor)
cargo test -p sketch-core
cargo test -p query_engine_rust

# Query engine with sketchlib backends
cargo test -p query_engine_rust --features sketchlib-tests

cargo fmt --check

# Fidelity (defaults: all sketchlib)
cargo run -p sketch-core --bin sketchlib_fidelity
cargo run -p sketch-core --bin sketchlib_fidelity -- --cms-impl legacy --kll-impl legacy --cmwh-impl legacy

# UDF validation (Arroyo must be running, e.g. quickstart arroyo service)
cd asap-summary-ingest
python3 validate_udfs.py --all_udfs --template_dir templates \
  --arroyo_url http://localhost:5115/api/v1

Quickstart (local images, KLL path) — optional local-only edits

To exercise quantile queries against images built from this branch:

  • In asap-quickstart/docker-compose.yml, use build instead of image for asap-planner-rs, asap-summary-ingest (context ../asap-summary-ingest, dockerfile: Dockerfile), and queryengine (context .., dockerfile: asap-query-engine/Dockerfile). Keep arroyo on the published image.
  • In asap-quickstart/config/controller-config.yaml, use quantile by (...) (0.5, sensor_reading) style queries and sketch_parameters.DatasketchesKLL.K: 20.
  • Regenerate Grafana JSON if needed: python3 generate_dashboards.py
  • docker compose build then docker compose up, and compare http://localhost:8088/api/v1/query vs Prometheus.

Fidelity Results

KllSketch (absolute rank error)

  • k=20, n=200k: sketchlib slightly higher error than legacy at q=0.5 / 0.9; still small in absolute terms
  • k=50 / k=200: Both backends achieve low rank error; sketchlib comparable at high k
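
For readers unfamiliar with the metric, absolute rank error can be computed as below. This is a sketch under an assumed definition (the distance between the requested quantile q and the true normalized rank of the value the sketch returned); the function names are hypothetical and the fidelity binary may compute it differently.

```rust
/// Fraction of samples that are <= `value`. `sorted` must be ascending and non-empty.
fn normalized_rank(sorted: &[f64], value: f64) -> f64 {
    assert!(!sorted.is_empty(), "need at least one sample");
    // partition_point returns the index of the first element for which the
    // predicate is false, i.e. the number of samples <= value.
    let at_or_below = sorted.partition_point(|x| *x <= value);
    at_or_below as f64 / sorted.len() as f64
}

/// Absolute rank error of `estimate` as an answer to the quantile query `q`.
fn abs_rank_error(sorted: &[f64], q: f64, estimate: f64) -> f64 {
    (normalized_rank(sorted, estimate) - q).abs()
}
```

As an example, over the samples 1.0 through 1000.0 an estimate of 510.0 for q=0.5 has true rank 0.51, so its absolute rank error is 0.01.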

HydraKllSketch

  • Mean / max rank errors remain on the order of 0.01–0.06 for typical grid sizes; acceptable for per-key quantile summaries

See asap-common/sketch-core/report.md for full tables.

@milindsrivastava1997 milindsrivastava1997 merged commit deaa9a9 into main Apr 2, 2026
19 checks passed
@milindsrivastava1997 milindsrivastava1997 deleted the sketchlib-kll branch April 2, 2026 12:46