ASAPSketchLib

ASAPSketchLib is a Rust sketch library with reusable sketch building blocks, sketch implementations, and orchestration frameworks.

Supported Sketches

| Goal | Use This | When to pick it | Pandas/Polars equivalent (exact, unbounded memory) |
| --- | --- | --- | --- |
| Frequency estimation | `CountMin`, `Count Sketch` | You need fast approximate counts for high-volume keys. | `df.groupby("key").size()` / `df.group_by("key").agg(pl.len())`; exact, but O(distinct keys) memory |
| Cardinality estimation | `HyperLogLog` (`Regular`, `DataFusion`, `HIP`) | You need approximate distinct counts with bounded memory. | `df["col"].nunique()` / `df["col"].n_unique()`; exact, but O(n) memory |
| Quantiles/distribution | `KLL`, `DDSketch` | You need percentile/latency summaries over streams. | `df["col"].quantile(0.99)`; exact, but requires storing all values |
| Advanced use cases (frameworks) | see Advanced Use Cases | Hierarchical subpopulation queries, multi-sketch coordination, or sliding-window aggregation over streams. | No direct equivalent; sketches are the only practical solution at stream scale |

Full sketch status and API details: APIs Index.
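
For comparison, the exact computations named in the last table column can be written in plain Rust with only the standard library. This is an illustrative baseline, not part of ASAPSketchLib; the function names `exact_counts`, `exact_distinct`, and `exact_quantile` are made up for this example. Each one keeps state proportional to the number of distinct keys or stored values, which is the cost the sketches are meant to avoid.

```rust
use std::collections::{HashMap, HashSet};

/// Exact per-key counts: one map entry per distinct key.
fn exact_counts(keys: &[u64]) -> HashMap<u64, u64> {
    let mut counts = HashMap::new();
    for &k in keys {
        *counts.entry(k).or_insert(0) += 1;
    }
    counts
}

/// Exact distinct count: one set entry per distinct value.
fn exact_distinct(values: &[u64]) -> usize {
    values.iter().copied().collect::<HashSet<u64>>().len()
}

/// Exact quantile by nearest rank: every value is copied and sorted.
/// Returns None for empty input or q outside [0, 1].
fn exact_quantile(values: &[u64], q: f64) -> Option<u64> {
    if values.is_empty() || !(0.0..=1.0).contains(&q) {
        return None;
    }
    let mut sorted = values.to_vec();
    sorted.sort_unstable();
    // Nearest rank: smallest value with at least q * n values at or below it.
    let rank = ((q * sorted.len() as f64).ceil() as usize).max(1);
    Some(sorted[rank - 1])
}

fn main() {
    let stream: [u64; 8] = [101, 202, 303, 101, 404, 202, 505, 101];
    let counts = exact_counts(&stream);
    println!("count(101) = {}", counts[&101u64]);
    println!("distinct = {}", exact_distinct(&stream));
    println!("median = {:?}", exact_quantile(&stream, 0.5));
}
```

On this eight-element stream the baseline reports a count of 3 for key 101, 5 distinct values, and a nearest-rank median of 202.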

Quick Start

A minimal example that estimates the number of unique users with HyperLogLog:

use asap_sketchlib::{DataFusion, HyperLogLog, SketchInput};

let mut hll = HyperLogLog::<DataFusion>::default();

// Simulate a stream of user IDs (with duplicates)
for user_id in [101, 202, 303, 101, 404, 202, 505, 101] {
    hll.insert(&SketchInput::U64(user_id));
}

let unique_users = hll.estimate();
println!("estimated unique users: {unique_users}"); // ≈ 5

To validate the repo quickly:

cargo test

Common dev commands:

cargo build --all-targets
cargo test --all-features
cargo bench

Why ASAPSketchLib (vs Apache DataSketches)

Performance is the primary motivation for this library:

Performance-focused implementations with cache-friendly flat counter arrays, row-major layouts, and direct slice access in core sketch paths.
FastPath mode computes a single hash and derives row indices via bit masking, reducing hashing overhead relative to independent-hash modes.
Native Rust: no JNI/FFI bridge. Memory layout, allocation, and hashing stay within the Rust implementation.
Rust-first API: typed inputs (SketchInput) and largely consistent insert/estimate/merge patterns across the main sketches, with pluggable hashing via SketchHasher.
Built-in framework layer (Hydra, HashSketchEnsemble, ExponentialHistogram, UnivMon) included in the same crate, including hash-reuse support for coordinated sketch collections.
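
The single-hash idea and the flat row-major layout mentioned above can be illustrated with a toy Count-Min structure. This is not the library's implementation: the `ToyCountMin` type, its 4 x 1024 shape, the use of `DefaultHasher`, and the way bit fields are sliced from the hash are all assumptions made for illustration. Each key is hashed once, every row reads a different 10-bit field of that 64-bit hash, and all counters live in one contiguous vector.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

const ROWS: usize = 4;
const COLS_LOG2: u32 = 10; // 1024 columns per row
const COLS: usize = 1 << COLS_LOG2;
const MASK: u64 = (COLS as u64) - 1;

/// Toy Count-Min structure (hypothetical, for illustration only).
struct ToyCountMin {
    // Flat row-major storage: counter (row, col) lives at row * COLS + col.
    counters: Vec<u64>,
}

impl ToyCountMin {
    fn new() -> Self {
        Self { counters: vec![0; ROWS * COLS] }
    }

    /// Hash the key exactly once per operation.
    fn hash_once<K: Hash>(key: &K) -> u64 {
        let mut h = DefaultHasher::new();
        key.hash(&mut h);
        h.finish()
    }

    /// Row `row` uses its own 10-bit field of the hash, selected by shift and mask.
    fn slot(hash: u64, row: usize) -> usize {
        let col = ((hash >> (row as u32 * COLS_LOG2)) & MASK) as usize;
        row * COLS + col
    }

    fn insert<K: Hash>(&mut self, key: &K) {
        let hash = Self::hash_once(key);
        for row in 0..ROWS {
            self.counters[Self::slot(hash, row)] += 1;
        }
    }

    /// Count-Min estimate: minimum over rows, never below the true count.
    fn estimate<K: Hash>(&self, key: &K) -> u64 {
        let hash = Self::hash_once(key);
        (0..ROWS)
            .map(|row| self.counters[Self::slot(hash, row)])
            .min()
            .unwrap_or(0)
    }

    /// Merge by element-wise addition; both sides share shape and hashing.
    fn merge(&mut self, other: &ToyCountMin) {
        for (a, b) in self.counters.iter_mut().zip(&other.counters) {
            *a += *b;
        }
    }
}

fn main() {
    let mut left = ToyCountMin::new();
    let mut right = ToyCountMin::new();
    for user_id in [101u64, 202, 303, 101] {
        left.insert(&user_id);
    }
    for user_id in [404u64, 202, 505, 101] {
        right.insert(&user_id);
    }
    left.merge(&right);
    println!("estimate(101) >= 3: {}", left.estimate(&101u64) >= 3);
}
```

The example also shows the insert/estimate/merge shape in miniature: two partial structures built over different parts of a stream are combined by adding counters, and the merged estimate for a key is at least its true combined count.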

When DataSketches may be a better fit:

You need its broader algorithm catalog: CPC sketch, Theta/Tuple sketches with set operators (Union, Intersection, Difference), REQ quantiles sketch, VarOpt/Reservoir sampling, or FM85.
You need cross-language binary compatibility with existing DataSketches deployments in Java, C++, or Python.
You need long-running production maturity and an Apache-governed release cycle.

Algorithms this library provides that DataSketches does not: UnivMon (universal monitoring), Hydra (hierarchical subpopulation sketching), FoldCMS/FoldCS (memory-efficient windowed sketching), and NitroBatch.

Choosing Between Sketches for the Same Goal

Several sketches address the same analytical goal with different trade-offs. For example, CountMin and Count Sketch both estimate frequencies; the HyperLogLog variants (Regular, DataFusion, HIP) all estimate cardinality; and KLL and DDSketch both answer quantile queries.

The best current approach is to profile each candidate sketch against a representative sample of your actual data and compare error rates, memory usage, and insert throughput for your specific key distribution and stream volume. The APIs Index lists the status and caveats for each sketch.
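
The error-rate part of such a comparison can be set up with a small harness like the one below. The `FrequencyEstimator` trait and the `BucketCounter` stand-in are hypothetical and are not ASAPSketchLib types; in practice you would wrap the sketches you are comparing behind a similar interface and feed them a sample of your real stream. The harness builds exact counts as ground truth and reports the mean absolute error over distinct keys. Memory usage and throughput would need separate measurement, for example with `cargo bench`.

```rust
use std::collections::HashMap;

/// Hypothetical interface for anything that counts keys approximately.
trait FrequencyEstimator {
    fn insert(&mut self, key: u64);
    fn estimate(&self, key: u64) -> u64;
}

/// Stand-in estimator: a single row of counters indexed by key % width.
/// Keys that share a bucket are over-counted.
struct BucketCounter {
    buckets: Vec<u64>,
}

impl BucketCounter {
    fn new(width: usize) -> Self {
        assert!(width > 0, "width must be positive");
        Self { buckets: vec![0; width] }
    }

    fn index(&self, key: u64) -> usize {
        (key % self.buckets.len() as u64) as usize
    }
}

impl FrequencyEstimator for BucketCounter {
    fn insert(&mut self, key: u64) {
        let i = self.index(key);
        self.buckets[i] += 1;
    }

    fn estimate(&self, key: u64) -> u64 {
        self.buckets[self.index(key)]
    }
}

/// Feed the sample to the estimator and return the mean absolute error
/// against exact counts, averaged over distinct keys.
fn mean_abs_error<E: FrequencyEstimator>(est: &mut E, sample: &[u64]) -> f64 {
    let mut exact: HashMap<u64, u64> = HashMap::new();
    for &key in sample {
        est.insert(key);
        *exact.entry(key).or_insert(0) += 1;
    }
    if exact.is_empty() {
        return 0.0;
    }
    let total: u64 = exact
        .iter()
        .map(|(&k, &c)| est.estimate(k).abs_diff(c))
        .sum();
    total as f64 / exact.len() as f64
}

fn main() {
    let sample = [1u64, 2, 3, 1, 5, 2, 9, 1];
    for width in [4, 16] {
        let err = mean_abs_error(&mut BucketCounter::new(width), &sample);
        println!("width {width}: mean abs error {err}");
    }
}
```

With this sample, the 4-bucket configuration has colliding keys and a mean absolute error of 2, while the 16-bucket configuration has no collisions and an error of 0. The same loop applied to real sketches and a real sample gives a like-for-like error comparison at each memory budget.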

A detailed comparison guide with benchmark data across sketch types and workloads is planned.
