Skip to content

Sketchlib rust migration#161

Closed
GnaneshGnani wants to merge 20 commits intomainfrom
sketchlib-rust-migration
Closed

Sketchlib rust migration#161
GnaneshGnani wants to merge 20 commits intomainfrom
sketchlib-rust-migration

Conversation

@GnaneshGnani
Copy link
Copy Markdown
Contributor

#124 #137

Core sketch types

  • Introduce CountMinBackend, KllBackend, and CountMinWithHeapBackend enums with Legacy and Sketchlib variants
  • Add a backend field to choose between the two implementations
  • Add #[serde(skip)] on backend fields; wire format remains matrix-based for compatibility

Sketch-core

  • CountMin: Add from_legacy_matrix(), sketch(), sketch_mut() accessors
  • KLL: Add sketch_bytes(), count(); update merge, merge_refs, serialization, and Clone
  • CountMinWithHeap: Add from_legacy_matrix(), sketch_mut(), sketch_matrix(), topk_heap_items() accessors
  • HydraKllSketch: Uses refactored KllSketch (no structural changes)
  • Add #[ctor::ctor] to force legacy mode in tests

Query engine

  • Update CountMin, CountMinWithHeap, and DatasketchesKLL accumulators to use the new APIs

UDF templates

  • Update type aliases in CountMin UDFs: countminsketch_count.rs.j2, countminsketch_sum.rs.j2, countminsketchwithheap_topk.rs.j2
  • Add sketchlib support to KLL UDFs: datasketcheskll_.rs.j2, hydrakll_.rs.j2

Testing

  • All sketch-core and query-engine tests pass
  • Backend selection via SKETCH_CORE_CMS_IMPL, SKETCH_CORE_CMWH_IMPL, SKETCH_CORE_KLL_IMPL (default: sketchlib)
  • Arroyo UDF backend selection via ARROYO_SKETCH_CMS_IMPL, ARROYO_SKETCH_CMWH_IMPL, ARROYO_SKETCH_KLL_IMPL (default: sketchlib)

@milindsrivastava1997
Copy link
Copy Markdown
Contributor

FYI, I've seen this. Will need some time to understand and review the changes.

@milindsrivastava1997 milindsrivastava1997 self-requested a review March 10, 2026 17:56
@milindsrivastava1997
Copy link
Copy Markdown
Contributor

Just thinking out loud here. At a very high-level. I think the PR content looks okay but (a) it's mixing multiple different objectives, (b) is difficult to review because it's so large and touches many different files.

Some thoughts about how we can split this into more manageable reviewable chunks:

  • One PR for changes to Rust CI
  • One PR for just introducing the concept of ImplMode
  • One PR for adding sketchlib ImplMode for different sketches to both asap-sketch-ingest and asap-query-engine

Pls share your thoughts @zzylol @GnaneshGnani

@GnaneshGnani
Copy link
Copy Markdown
Contributor Author

GnaneshGnani commented Mar 19, 2026

@milindsrivastava1997 - This sounds good. I would suggest two PRs for adding sketchlib to asap-sketch-ingest and asap-query-engine - one for each

@milindsrivastava1997
Copy link
Copy Markdown
Contributor

@GnaneshGnani I would actually like to keep the changes to the same sketch in asap-sketch-ingest and asap-query-engine in the same PR. Instead of splitting by module, you can split by sketch. So you can have 1 PR per sketch or if that's too much overhead, we can do 1 PR for the first sketch and then another for all the remaining ones. I think this is better just so that the repo is in a runnable state at all times.

@GnaneshGnani
Copy link
Copy Markdown
Contributor Author

@milindsrivastava1997, is this ok?

PR 1: Rust CI changes
PR 2: ImplMode changes
PR 3: Count-Min Sketch
PR 4: Count-Min-With-Heap
PR 5: KLL

@milindsrivastava1997
Copy link
Copy Markdown
Contributor

Sounds great @GnaneshGnani

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Change ArroyoSketch UDF sketches to use sketchlib-rust Use sketchlib-rust sketches

2 participants