[specdec_bench] Stratify --num_requests across categories #1389
milesial wants to merge 2 commits into NVIDIA:main from
Conversation
When the dataset has a `category` column with >1 distinct categories and ``--num_requests N`` is below the dataset size, take ceil(N / num_categories) rows from each category and round-robin interleave them so any prefix is balanced. Falls back to the existing ``range(N)`` slice when category metadata is absent or there's only one category.

Fixes a sampling bug where SPEED-Bench parquet files are sorted by category, so e.g. ``--num_requests 64`` on throughput_8k pulls 64 high_entropy prompts and zero from low_entropy / mixed.

Signed-off-by: Alexandre Milesi <milesial@users.noreply.github.com>
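For concreteness, a minimal sketch of the selection described above, assuming a Hugging Face `datasets.Dataset` with a `category` column; the helper name `stratified_indices` and the exact structure are illustrative, not the PR's actual diff:

```python
import math
from collections import defaultdict


def stratified_indices(categories, n):
    """Pick n row indices balanced across categories, round-robin interleaved.

    `categories` is the per-row category value (e.g. dataset["category"]),
    so every prefix of the returned index list is roughly balanced.
    """
    by_cat = defaultdict(list)
    for idx, cat in enumerate(categories):
        by_cat[cat].append(idx)

    # Take ceil(n / num_categories) rows from each category ...
    per_cat = math.ceil(n / len(by_cat))
    clamped = [idxs[:per_cat] for idxs in by_cat.values()]

    # ... then round-robin interleave: one index per category per pass.
    interleaved = []
    for i in range(per_cat):
        for idxs in clamped:
            if i < len(idxs):
                interleaved.append(idxs[i])
    return interleaved[:n]
```

The caller would then do something like `dataset.select(stratified_indices(dataset["category"], N))`, falling back to `dataset.select(range(N))` when the `category` column is absent or has a single distinct value.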
📝 Walkthrough: SPEEDBench dataset loading now only truncates when num_samples is set and smaller than the dataset. When truncating, if a multi-valued `category` column is present, rows are drawn per category and interleaved (stratified sampling); otherwise the existing head slice is used.

Changes: Stratified Dataset Sampling
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks: ✅ Passed checks (6 passed)
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@examples/specdec_bench/specdec_bench/datasets/speed.py`:
- Around line 765-777: The interleaving logic can produce fewer than n items
when some categories have fewer than per_cat rows; update the loop that builds
interleaved (currently using per_cat and cat_samples) to backfill by repeatedly
cycling through cat_samples and appending the next available item from each
category until interleaved has length n or all samples are exhausted, i.e.,
maintain per-category indices (or pop from lists) and in a while
len(interleaved) < n loop iterate over cat_samples appending one item per
non-empty category, then return dataset.select(interleaved[:n]) to guarantee
num_samples.
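A hedged sketch of that backfill, reusing the `cat_samples` / `interleaved` names from the comment (the helper name and surrounding code are assumptions; the actual file is not shown in this thread):

```python
def backfill_interleave(cat_samples, n):
    """Round-robin over per-category index lists until n indices are collected
    or every category is exhausted, so short categories cannot cause under-delivery."""
    positions = [0] * len(cat_samples)  # per-category cursor
    interleaved = []
    while len(interleaved) < n:
        progressed = False
        for i, samples in enumerate(cat_samples):
            if len(interleaved) >= n:
                break
            if positions[i] < len(samples):
                interleaved.append(samples[positions[i]])
                positions[i] += 1
                progressed = True
        if not progressed:  # every category exhausted: n exceeds the dataset size
            break
    return interleaved[:n]
```

With this, `dataset.select(backfill_interleave(cat_samples, n))` returns exactly `n` rows whenever `n` does not exceed the dataset size.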
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: f07d27e0-5726-4c6e-ae35-bffedcb82b90
📒 Files selected for processing (1)
examples/specdec_bench/specdec_bench/datasets/speed.py
Codecov Report: ✅ All modified and coverable lines are covered by tests.
Additional details and impacted files:
@@ Coverage Diff @@
## main #1389 +/- ##
==========================================
+ Coverage 76.90% 77.29% +0.39%
==========================================
Files 471 471
Lines 50565 50565
==========================================
+ Hits 38885 39083 +198
+ Misses 11680 11482 -198
Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
h-guo18 left a comment
Please take a look at CodeRabbit’s comment above. Other than that, LGTM.
Replace the two-step (per_cat clamp + interleave) selection with a single-pass round-robin walk over the per-category index lists. Smaller categories continue contributing until exhausted; larger categories then fill the remainder. This guarantees exactly ``n`` rows whenever ``n`` does not exceed the total dataset size, whereas the previous code could under-deliver if any category had fewer than ``ceil(n / num_categories)`` rows. The first ``n`` rows are still balanced (each category contributes the same ``ceil``-based ``per_cat`` quantity); the only change is what happens past the exhaustion point of the smallest category. ``math`` is no longer needed.

Signed-off-by: Alexandre Milesi <milesial@users.noreply.github.com>
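A minimal sketch of that single-pass walk, borrowing the `_stratified_select` name from the review comment below; the body is an assumption based on this description (using the Hugging Face `Dataset.select` / `column_names` API), not the merged diff:

```python
from collections import defaultdict


def _stratified_select(dataset, n):
    """Select n rows, round-robin across categories when category metadata exists."""
    if "category" not in dataset.column_names:
        return dataset.select(range(n))

    by_cat = defaultdict(list)
    for idx, cat in enumerate(dataset["category"]):
        by_cat[cat].append(idx)
    if len(by_cat) <= 1:
        return dataset.select(range(n))

    # Single-pass round-robin walk: smaller categories contribute until
    # exhausted, larger ones fill the remainder, so exactly n indices come
    # back whenever n <= len(dataset). No ceil / math needed.
    queues = [list(idxs) for idxs in by_cat.values()]
    picked = []
    while len(picked) < n and any(queues):
        for q in queues:
            if q and len(picked) < n:
                picked.append(q.pop(0))
    return dataset.select(picked)
```

Popping from the front keeps each category's original row order, so the balanced-prefix property from the first commit is preserved.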
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@examples/specdec_bench/specdec_bench/datasets/speed.py`:
- Around line 745-772: In _stratified_select, handle non-positive n early: if n
<= 0 return dataset.select([]) (or an empty index sequence) before any category
processing so the round-robin loop never appends indices for n==0 or negative n;
locate the function _stratified_select and add this early guard immediately
after the docstring (before checking "category" in dataset.column_names) so all
subsequent logic assumes n > 0.
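A sketch of that guard, added to the same assumed `_stratified_select` sketch as above, right after the docstring:

```python
def _stratified_select(dataset, n):
    """Select n rows, stratified across categories when possible."""
    if n <= 0:
        # Early guard: nothing to select for n == 0 or negative n, so the
        # round-robin loop below never appends any indices.
        return dataset.select([])
    ...  # category checks and the round-robin walk follow unchanged
```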
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 38099a4c-3298-464c-901c-7a504580c435
📒 Files selected for processing (1)
examples/specdec_bench/specdec_bench/datasets/speed.py