Claude/consolidate cms frequency pr sb5 kk#104
Open
c-dickens wants to merge 15 commits into apache:master from
leerho
approved these changes
Feb 24, 2026
Consolidates the Count-Min Sketch frequency estimation characterization from PRs #1, #2, and #3 into a single clean profile. The profile sweeps across sketch widths (256-4096) with constant load factor (distinct/width ≈ 4), runs adaptive trials per width, and uses KLL sketches to track error distribution quantiles (median, p75, p90, p95, max) for both absolute and relative error metrics against theoretical bounds. https://claude.ai/code/session_01RmEdWmm6vYXY3XevAAsWVe
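For readers unfamiliar with the quantity being profiled, here is a minimal, self-contained Count-Min Sketch with the textbook error bound. It is a sketch under stated assumptions, not the DataSketches implementation: the class name, the hash family, and the `error_bound` helper are all hypothetical, and the bound used is the standard one (overestimate at most `(e / width) * N` with probability at least `1 - exp(-depth)`), which may differ from the formula the profile compares against.

```python
import math
import random


class CountMinSketch:
    """Illustrative Count-Min Sketch for integer items (not the library class)."""

    PRIME = (1 << 61) - 1  # modulus for the (a*x + b) mod p hash family

    def __init__(self, width, depth, seed):
        rng = random.Random(seed)
        self.width = width
        self.depth = depth
        self.total = 0  # N: total weight inserted so far
        # One (a, b) pair per row, drawn from the seed.
        self.coeffs = [
            (rng.randrange(1, self.PRIME), rng.randrange(self.PRIME))
            for _ in range(depth)
        ]
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, row, item):
        a, b = self.coeffs[row]
        return ((a * item + b) % self.PRIME) % self.width

    def update(self, item, weight=1):
        self.total += weight
        for row in range(self.depth):
            self.table[row][self._index(row, item)] += weight

    def estimate(self, item):
        # The minimum over rows never underestimates for non-negative weights.
        return min(
            self.table[row][self._index(row, item)] for row in range(self.depth)
        )

    def error_bound(self):
        # Textbook bound: estimate - truth <= (e / width) * N
        # with probability at least 1 - exp(-depth).
        return math.e / self.width * self.total


# Load factor of about 4 distinct items per bucket, as in the description.
width = 256
truth = {item: 1 + (item % 5) for item in range(4 * width)}
sketch = CountMinSketch(width=width, depth=4, seed=1)
for item, count in truth.items():
    sketch.update(item, count)
abs_errors = [sketch.estimate(item) - count for item, count in truth.items()]
```

Each entry of `abs_errors` is the absolute point-query error for one item; the profile feeds values like these into a quantile sketch and compares them with the bound.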
Remove all relative error tracking. Single KLL sketch tracks absolute error distribution only. 170 lines → 100 lines. https://claude.ai/code/session_01RmEdWmm6vYXY3XevAAsWVe
Both width and stream length are now fixed parameters rather than sweep axes. Runs N trials and reports one row of results with KLL quantiles on absolute point query error. https://claude.ai/code/session_01RmEdWmm6vYXY3XevAAsWVe
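The single-row output described above could look roughly like the following. This is a hedged illustration: the column layout, the function names, and the quantile levels are assumptions, and an exact nearest-rank quantile over a sorted list stands in for the KLL sketch that the profile actually uses.

```python
import math


def nearest_rank_quantile(sorted_values, q):
    """Nearest-rank quantile of a pre-sorted list, for 0 <= q <= 1."""
    if not sorted_values:
        raise ValueError("no values")
    rank = max(1, math.ceil(q * len(sorted_values)))
    return sorted_values[rank - 1]


def summary_row(width, stream_length, num_trials, bound, errors, levels):
    """Build one tab-separated row: fixed parameters, bound, then quantiles.

    The column order here is hypothetical, not the profile's real schema.
    """
    ordered = sorted(errors)
    fields = [width, stream_length, num_trials, bound]
    fields += [nearest_rank_quantile(ordered, q) for q in levels]
    return "\t".join(str(field) for field in fields)


# Made-up error values, purely to show the shape of the row.
row = summary_row(
    width=1024,
    stream_length=4096,
    num_trials=10,
    bound=11,
    errors=[0, 0, 1, 1, 2, 3, 5, 8],
    levels=(0.5, 0.75, 1.0),
)
```

With a fixed width and stream length there is nothing to sweep, so one row per run is enough for the plotting script to read.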
Bar chart of absolute error quantiles from KLL sketch with theoretical bound overlay. Reads single-row TSV. https://claude.ai/code/session_01RmEdWmm6vYXY3XevAAsWVe
This line referenced a CMake target 'common' that doesn't exist in the project, causing cmake configuration to fail. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implements per-item error characterisation using KLL quantile sketches across trials with a fixed cached stream. Includes C++ profile, Python plotting script (log-log SVG with sigma bands and theoretical bounds), and bound violation tracking. Also comments out tdigest accuracy profiles that depend on unavailable ddsketch.hpp. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
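The trial structure in this commit (one cached stream, a fresh sketch seed per trial, errors grouped per true frequency, bound violations counted) can be sketched as follows. Everything here is illustrative: the function names are hypothetical, plain lists stand in for the per-frequency KLL sketches, the hash is Python's built-in tuple hash rather than the library's, and the bound is the textbook `(e / width) * N`.

```python
import math
import random


def point_query_errors(stream_counts, width, depth, seed):
    """Build one sketch over the fixed stream; return absolute error per item."""
    rng = random.Random(seed)
    salts = [rng.getrandbits(64) for _ in range(depth)]  # only these vary per trial
    table = [[0] * width for _ in range(depth)]

    def index(row, item):
        # Hashing a tuple of ints is deterministic across Python runs.
        return hash((salts[row], item)) % width

    for item, count in stream_counts.items():
        for row in range(depth):
            table[row][index(row, item)] += count
    return {
        item: min(table[row][index(row, item)] for row in range(depth)) - count
        for item, count in stream_counts.items()
    }


def run_trials(stream_counts, width, depth, num_trials):
    """Re-sketch the same stream with different seeds and track violations."""
    bound = math.e / width * sum(stream_counts.values())
    errors_by_frequency = {}  # true frequency -> errors pooled across trials
    violations = 0
    queries = 0
    for trial in range(num_trials):
        errors = point_query_errors(stream_counts, width, depth, seed=trial)
        for item, error in errors.items():
            errors_by_frequency.setdefault(stream_counts[item], []).append(error)
            queries += 1
            if error > bound:
                violations += 1
    return errors_by_frequency, violations, queries, bound


# The stream is built once and reused by every trial.
fixed_stream = {item: 1 + (item % 7) for item in range(200)}
by_frequency, violations, queries, bound = run_trials(
    fixed_stream, width=64, depth=3, num_trials=5
)
```

Keeping the stream fixed means the spread in `by_frequency` comes only from the hash seeds, which is the source of randomness the theoretical guarantee is stated over.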
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- C++ profile now outputs 27 quantile levels from 0.0 to 1.0, including the original 7 sigma levels plus dense intermediate coverage for richer slice analysis
- Main plot updated to show min/max bands alongside sigma-level bands, using percentile labels instead of sigma notation (not normally distributed)
- Add frequency slice plot script that shows the full error quantile function at selected true frequencies

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Reshape C++ quantile output to focus on the upper tail where CMS error behavior is interesting: sparse below median, dense 90-99% with per-mille resolution in the 99-100% range (30 levels total)
- Rewrite slice plot as empirical CDF: log(absolute error) on x-axis, quantile level on y-axis, with theoretical bound as vertical line
- Main band plot unchanged (selects 9 levels from wider set)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
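An upper-tail-weighted set of quantile levels of the kind described above might be built like this. The specific levels are an assumption for illustration only: the commit message does not list the profile's 30 levels, and this sketch produces 24.

```python
def upper_tail_levels():
    """Quantile levels that are sparse in the body and dense in the upper tail.

    Hypothetical layout; the profile's real level set is not shown in the PR text.
    """
    sparse = [0.0, 0.25, 0.5, 0.75]                    # coarse coverage of the body
    percent = [p / 100 for p in range(90, 100)]        # 0.90, 0.91, ..., 0.99
    per_mille = [m / 1000 for m in range(991, 1001)]   # 0.991, 0.992, ..., 1.000
    return sparse + percent + per_mille


levels = upper_tail_levels()
```

Concentrating the levels near 1.0 puts the resolution where a Count-Min Sketch's overestimates are largest, since most point queries see small error and only a thin tail approaches the bound.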
aab73aa to 9c3d597
Rewrite Experiment 1 section to be precise about:
- Fixed stream across trials (only CMS seed varies)
- stdout for TSV data, stderr for diagnostics
- Dense upper-tail quantile levels (not symmetric sigma)
- Correct uv invocation (uv run script.py, not uv run python)
- Both plot specs (band plot + error CDF slice plot)
- Bound violation tracking
- Makefile targets

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
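The stdout/stderr split listed above is a general convention, and a minimal Python illustration of it follows. The actual profile is C++, so the function name, column names, and message text here are hypothetical.

```python
import io
import sys


def emit_results(header, rows, out=None, log=None):
    """Write TSV data to `out` and progress messages to `log`.

    Defaults are resolved at call time so that redirected streams are honoured.
    """
    out = sys.stdout if out is None else out
    log = sys.stderr if log is None else log
    print("\t".join(header), file=out)
    for number, row in enumerate(rows, start=1):
        print("\t".join(str(value) for value in row), file=out)
        print(f"wrote row {number} of {len(rows)}", file=log)


# Capture both streams in memory to show that they stay separate.
data, diagnostics = io.StringIO(), io.StringIO()
emit_results(
    ["true_freq", "q50", "q99"],  # made-up column names
    [(1, 0, 3), (2, 1, 4)],        # made-up values
    out=data,
    log=diagnostics,
)
```

Because diagnostics never reach stdout, redirecting stdout to a file yields a clean TSV that a plotting script can read without filtering.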
- Restore tdigest accuracy profile includes and registrations to match upstream master (our PR should not modify unrelated code)
- Remove plot_cms_frequency_slice.py (out of scope for this PR)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This was an earlier experiment superseded by cms_point_query_profile. Only the point query profile is needed for this PR. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
plot_cms_frequency.py reads a different TSV format that the current profile doesn't produce. The active plotting script is cpp/scripts/plot_cms_point_query.py. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Initial point query profile and plots