
feat(encoding): numeric compression levels (1-22) API#61

Closed
polaz wants to merge 18 commits into main from feat/#21-feat-numeric-compression-levels-1-22-api

Conversation


@polaz polaz commented Apr 4, 2026

Summary

  • Add CompressionLevel::Level(i32) variant and from_level() constructor for a C-zstd-compatible numeric level API
  • Port level parameter table (levels 1–22) with per-level window size, hash/chain configuration, search depth, and lazy depth
  • Support negative levels for ultra-fast compression mode
  • Source-size-aware selection: set_source_size_hint() on Matcher trait, FrameCompressor, and StreamingEncoder — caps matcher/window parameters from ceil(log2(src_size)) with a MIN_WINDOW_LOG floor
  • Update CLI to accept zstd-compatible numeric levels with --store flag for uncompressed frames

Technical Details

Each numeric level resolves to a LevelParams struct containing the matcher backend, window log, hash fill step, lazy depth, and hash-chain config. Named variants map exactly to their numeric equivalents: Fastest=1, Default=3, Better=7, Best=11. Level 0 = default (level 3), matching C zstd semantics.
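The named-to-numeric mapping described above can be sketched as follows. This is an illustrative stand-in, not the crate's actual definition; the real CompressionLevel carries more variants and a const fn constructor.

```rust
// Illustrative stand-in for the level mapping; the crate's real enum
// and constructor may differ in detail.
#[derive(Debug, PartialEq)]
enum CompressionLevel {
    Level(i32),
}

const DEFAULT_LEVEL: i32 = 3;

fn from_level(n: i32) -> CompressionLevel {
    match n {
        // Level 0 means "use the default", matching C zstd semantics.
        0 => CompressionLevel::Level(DEFAULT_LEVEL),
        n => CompressionLevel::Level(n),
    }
}

// Named presets map exactly onto numeric levels.
fn numeric_equivalent(name: &str) -> i32 {
    match name {
        "Fastest" => 1,
        "Default" => 3,
        "Better" => 7,
        "Best" => 11,
        _ => DEFAULT_LEVEL,
    }
}
```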

For small inputs, set_source_size_hint() adjusts window_log and HC table sizes proportionally to source size, including src_size == 0 (clamped to minimum window), avoiding multi-MiB allocations for tiny payloads. StreamingEncoder::set_pledged_content_size() also forwards the pledged size as matcher hint.

Window sizes are capped at 64 MiB (window_log=26) to stay within the decoder's 100 MiB limit.
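The size-hint clamp can be sketched as below. MIN_WINDOW_LOG here is an assumed floor and the function is a simplified model of the behavior described above, not the crate's actual code.

```rust
// Sketch of the source-size-aware window clamp. MIN_WINDOW_LOG is an
// assumed constant; the crate's real floor may differ.
const MIN_WINDOW_LOG: u32 = 10;
const MAX_WINDOW_LOG: u32 = 26; // 64 MiB cap, within the decoder's 100 MiB limit

fn effective_window_log(level_window_log: u32, src_size: u64) -> u32 {
    // ceil(log2(src_size)), floored at MIN_WINDOW_LOG; src_size == 0 or 1
    // clamps straight to the minimum window.
    let src_log = if src_size <= 1 {
        MIN_WINDOW_LOG
    } else {
        (64 - (src_size - 1).leading_zeros()).max(MIN_WINDOW_LOG)
    };
    level_window_log.min(src_log).min(MAX_WINDOW_LOG)
}
```

A 4 KiB input at a level whose table window_log is 24 would thus get a 12-bit window, avoiding multi-MiB allocations for tiny payloads.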

Known Limitations

  • Levels 4 (greedy) and 13+ (btopt/btultra) are approximated with the closest available backend — compression ratios won't match C zstd at those levels
  • Dfast backend parameters (hash bits, search depth) are not yet per-level configurable — levels 2–4 share the same hash table size

Test Plan

  • from_level(n) matches named variants for n=1,3,7,11
  • All 22 levels produce valid output and roundtrip correctly (with source-size hint applied in sampled tiny-input loop)
  • Negative levels roundtrip correctly
  • Sampled level validity: non-empty output + roundtrip for [1,3,7,11]
  • Source-size hint: small input roundtrip, window reduction, streaming+pledged, all levels with tiny input
  • Streaming encoder works with numeric levels
  • Out-of-range levels are clamped safely on the default matcher path (MatchGeneratorDriver)
  • cargo nextest run --workspace, cargo build --workspace, cargo fmt --all
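The out-of-range clamping checked above can be modeled with a saturating clamp. The bounds mirror the PR's MIN_LEVEL/MAX_LEVEL constants but are assumptions here, not taken from the crate.

```rust
// Illustrative saturating clamp for out-of-range levels; the bounds are
// assumed, mirroring the MIN_LEVEL/MAX_LEVEL constants named in the PR.
const MIN_LEVEL: i32 = -131072;
const MAX_LEVEL: i32 = 22;

fn clamp_level(n: i32) -> i32 {
    // i32::clamp saturates, so even i32::MIN cannot overflow here.
    n.clamp(MIN_LEVEL, MAX_LEVEL)
}
```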

Closes #21

Summary by CodeRabbit

  • New Features

    • Support numeric zstd-compatible compression levels (including negative "ultra-fast") and a new --store flag; APIs let callers provide an optional source-size hint which can reduce advertised window size.
  • Bug Fixes

    • CLI now enforces compression-level bounds at parse time and rejects out-of-range values.
  • Documentation

    • README updated: numeric-level examples, default clarified as 0, and mapping guidance.
  • Tests

    • Expanded coverage for numeric levels, roundtrips, clamping, and size-hint effects.

Compatibility Note

  • CompressionLevel::Level(i32) expands the public enum shape and is semver-breaking for downstream crates that exhaustively match on CompressionLevel without a wildcard arm.
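Downstream crates that match exhaustively can stay source-compatible by adding a wildcard arm, as in this sketch (a stand-in enum, not the crate's real definition):

```rust
// Stand-in enum illustrating the semver note above; the real
// CompressionLevel lives in zstd/src/encoding/mod.rs.
enum CompressionLevel {
    Uncompressed,
    Fastest,
    Default,
    Better,
    Best,
    Level(i32), // new variant added by this PR
}

fn describe(level: &CompressionLevel) -> &'static str {
    match level {
        CompressionLevel::Uncompressed => "store",
        CompressionLevel::Fastest => "level 1",
        CompressionLevel::Default => "level 3",
        CompressionLevel::Better => "level 7",
        CompressionLevel::Best => "level 11",
        // Wildcard arm keeps the match compiling when variants are added.
        _ => "numeric level",
    }
}
```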

Copilot AI review requested due to automatic review settings April 4, 2026 11:05

coderabbitai bot commented Apr 4, 2026

Warning

Rate limit exceeded

@polaz has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 8 minutes and 14 seconds before requesting another review.


⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: b001da5e-10a7-41d3-96af-64cc8ff7fe7f

📥 Commits

Reviewing files that changed from the base of the PR and between dcbdbd1 and 734f748.

📒 Files selected for processing (1)
  • zstd/src/encoding/mod.rs
📝 Walkthrough

Walkthrough

Adds numeric Zstandard-compatible compression levels (signed i32: 1–22 plus negative ultra-fast), plumbing to resolve level parameters, per-frame source-size hints propagated to matchers and encoders, routing of Level(_) through compressed encode paths, CLI support for signed levels and --store, and extended tests and README examples.

Changes

| Cohort / File(s) | Summary |
|---|---|
| **Docs**: README.md | Replace named presets with numeric Zstd-compatible semantics (0 default, 1–22, negative ultra-fast); add CompressionLevel::from_level(n) example and update feature checklist. |
| **CLI**: cli/src/main.rs | --level changed to i32 (clamped to MIN_LEVEL..=MAX_LEVEL), new --store: bool flag; Commands::Compress and compress(...) signature updated to accept (level: i32, store: bool); help text and tests adjusted. |
| **Public API**: zstd/src/encoding/mod.rs | CompressionLevel now derives Debug, adds Level(i32), constants MIN_LEVEL/MAX_LEVEL/DEFAULT_LEVEL, and pub const fn from_level(i32) -> Self; Matcher trait gains set_source_size_hint default method. |
| **Level resolver & matcher entrypoints**: zstd/src/encoding/match_generator.rs | Introduce LevelParams/LEVEL_TABLE-driven resolution for levels 1–22 and negative levels; compute effective params optionally adjusted by source_size_hint; add MatchGeneratorDriver::set_source_size_hint and make reset(level) apply resolved params. |
| **Matcher internals (Dfast / HC)**: zstd/src/encoding/match_generator.rs | Dfast stores hash_bits and exposes set_hash_bits() to rebuild tables; Dfast helpers become instance methods; HC sizing/logs derived from resolved params; pooled buffer normalization to slice_size. |
| **Frame compressor**: zstd/src/encoding/frame_compressor.rs | Add pub fn set_source_size_hint(&mut self, size: u64); route CompressionLevel::Level(_) through the encoded/compressed block path; forward hint into matcher before compress; minor test assertion formatting tweaks and test adjustments. |
| **Streaming encoder**: zstd/src/encoding/streaming_encoder.rs | Add pub fn set_source_size_hint(&mut self, size: u64) -> Result<(), Error>; forward pledged size to matcher; treat Level(_) as supported, allocate via matcher get_next_space(), and route Level(_) into compress_block_encoded. |
| **Tests**: zstd/src/tests/roundtrip_integrity.rs, other zstd tests | Add extensive tests verifying from_level mapping, numeric levels (1..=22), negative ultrafast levels, clamping/out-of-range normalization, source-size-hint effects on windows, and streaming roundtrips. |
| **Small doc/test tweak**: README.md, CLI tests | Add CompressionLevel::from_level(7) usage example and update checklist entries; update CLI tests to reflect parse-time clamping and --store behavior. |

Sequence Diagram

sequenceDiagram
    participant CLI as CLI / User
    participant Main as CLI main
    participant Resolver as LevelResolver
    participant Encoder as StreamingEncoder / FrameCompressor
    participant Matcher as MatchGenerator
    participant Backend as CompressionBackend

    CLI->>Main: compress(file, level = n, --store?)
    Main->>Resolver: CompressionLevel::from_level(n)
    alt store == true
        Main->>Encoder: call compress(..., level = Uncompressed, store = true)
        Encoder->>Backend: write stored (uncompressed) blocks
        Backend-->>Encoder: stored output
        Encoder-->>Main: finished
    else store == false
        Main->>Encoder: call compress(..., level = Level(n), store = false)
        Encoder->>Resolver: resolve level params (n)
        Resolver-->>Encoder: LevelParams (backend, window_log, hashes, ...)
        Encoder->>Matcher: set_source_size_hint(size?)
        Encoder->>Matcher: reset(params)
        Matcher->>Backend: configure tables (hash_bits, slice_size, ...)
        Encoder->>Backend: compress_block_encoded(using matcher)
        Backend-->>Encoder: compressed block
        Encoder-->>Main: emit compressed output
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐇
I nibble tables, count each byte,
Twenty-two steps from swift to tight.
Hints I whisper, matchers leap —
Windows shrink while carrots sleep.
Hop, compress, and dream in flight.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The PR title accurately describes the main feature: adding a numeric compression levels (1-22) API with CompressionLevel::from_level() support. |
| Linked Issues check | ✅ Passed | The PR comprehensively addresses all requirements from issue #21: numeric level API via CompressionLevel::Level(i32) and from_level(), parameter tables for levels 1-22, source-size-aware selection via set_source_size_hint(), negative level support, and extensive roundtrip testing confirming functionality. |
| Out of Scope Changes check | ✅ Passed | All changes align with issue #21 objectives: numeric API, parameter tables, source-size hints, CLI integration, and roundtrip tests; no unrelated modifications detected. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, above the required 80.00% threshold. |




codecov bot commented Apr 4, 2026

Codecov Report

❌ Patch coverage is 96.17225% with 8 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| zstd/src/encoding/match_generator.rs | 97.36% | 4 Missing ⚠️ |
| zstd/src/encoding/streaming_encoder.rs | 92.68% | 3 Missing ⚠️ |
| zstd/src/encoding/mod.rs | 88.88% | 1 Missing ⚠️ |



Copilot AI left a comment


Pull request overview

Adds a zstd-compatible numeric compression level API to the encoder and updates internal tuning to be level-driven, plus adjusts the CLI/docs to expose numeric levels.

Changes:

  • Introduces CompressionLevel::Level(i32) plus CompressionLevel::from_level() and derives Debug for better diagnostics.
  • Adds a per-level parameter table (levels 1–22 + negative “ultra-fast”) and routes matcher configuration through resolved LevelParams.
  • Updates encoder/CLI paths to recognize numeric levels and expands tests/docs to cover the new API.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

Show a summary per file

| File | Description |
|---|---|
| zstd/src/encoding/mod.rs | Adds numeric level variant/constructor and documents level semantics/mappings. |
| zstd/src/encoding/match_generator.rs | Implements level parameter table and resolves numeric levels into backend/window/config choices. |
| zstd/src/encoding/streaming_encoder.rs | Treats CompressionLevel::Level(_) as a compressing mode in streaming paths. |
| zstd/src/encoding/frame_compressor.rs | Treats CompressionLevel::Level(_) as a compressing mode and updates debug formatting/tests. |
| zstd/src/tests/roundtrip_integrity.rs | Adds roundtrip + equivalence tests for numeric levels (including negative/out-of-range). |
| cli/src/main.rs | Switches CLI to accept numeric levels (including negative) and maps CLI level to library levels. |
| README.md | Documents numeric level usage and updates feature list accordingly. |


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cli/src/main.rs`:
- Around line 37-55: The CLI flag `level: i32` is currently interpreted so that
`compress()` special-cases 0 into `CompressionLevel::Uncompressed`, which makes
`--level 0` emit a raw frame instead of zstd level 0; change the logic so that
`level == 0` maps to the zstd level via `CompressionLevel::from_level(0)` (or
equivalent) and remove the special-case that converts 0 to
`CompressionLevel::Uncompressed`; if you need an uncompressed/raw mode, add a
separate flag (e.g., `--raw` or `--uncompressed`) and have `compress()` check
that flag to return `CompressionLevel::Uncompressed` instead of overloading the
numeric `level` field.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: ed87d76e-03be-48a9-96ac-25559edee973

📥 Commits

Reviewing files that changed from the base of the PR and between cd31f5e and 2610970.

📒 Files selected for processing (7)
  • README.md
  • cli/src/main.rs
  • zstd/src/encoding/frame_compressor.rs
  • zstd/src/encoding/match_generator.rs
  • zstd/src/encoding/mod.rs
  • zstd/src/encoding/streaming_encoder.rs
  • zstd/src/tests/roundtrip_integrity.rs


Copilot AI left a comment


Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cli/src/main.rs`:
- Around line 51-66: The CLI currently restricts --level to -5..=22 which
prevents using the full CompressionLevel numeric API; update the arg parser for
the level field in main.rs (the #[arg(...)] on level: i32) so it does not
hardcode -5: either remove the .range(...) and accept any i32 then
validate/convert with CompressionLevel::from_level() at compress-time, or set
the range to use the library constants
(CompressionLevel::MIN_LEVEL..=CompressionLevel::MAX_LEVEL) so the parser
accepts the full allowed range; also ensure the --store flag handling (store:
bool) continues to ignore the level value at runtime by short-circuiting before
validating/converting the level when store is true.

In `@zstd/src/encoding/match_generator.rs`:
- Around line 76-113: The resolver always picks a single preset from LEVEL_TABLE
for a given CompressionLevel, ignoring zstd's source-size classes; update the
API and logic so resolve_level_params can consider input size (e.g., add a
src_size parameter or a CompressionSizeHint) and use that hint to choose the
appropriate size-class variant of the level preset (either by consulting a small
mapping of clevel size classes or by selecting alternate entries/adjusted
window_log/target_len for small inputs) before returning LevelParams; modify
call sites to pass the source-size hint and update any tests to cover small vs
large input behavior (refer to LEVEL_TABLE and
resolve_level_params/CompressionLevel for where to apply the change).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: a287a2da-3082-4842-9514-0272b8164c32

📥 Commits

Reviewing files that changed from the base of the PR and between 2610970 and 4cb6828.

📒 Files selected for processing (2)
  • cli/src/main.rs
  • zstd/src/encoding/match_generator.rs


Copilot AI left a comment


Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
cli/src/main.rs (1)

120-145: ⚠️ Potential issue | 🟠 Major

Forward the known input size into StreamingEncoder.

This path already reads the file length for progress reporting, but it never calls set_source_size_hint() (or set_pledged_content_size()), so the CLI still skips the new size-aware window/table selection on the main compression path. Tiny files at higher levels will keep paying the full matcher footprint unnecessarily.

Suggested fix
         let mut encoder =
             structured_zstd::encoding::StreamingEncoder::new(temporary_output, compression_level);
+        encoder
+            .set_source_size_hint(source_size as u64)
+            .wrap_err("failed to configure source size hint")?;
         std::io::copy(&mut encoder_input, &mut encoder).wrap_err("streaming compression failed")?;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cli/src/main.rs` around lines 120 - 145, In compress(), after computing
source_size and before streaming data into the encoder, pass the known input
size to the zstd encoder so it can use size-aware window/table selection: when
you construct the StreamingEncoder
(structured_zstd::encoding::StreamingEncoder::new(...)) call
set_source_size_hint(source_size) (or set_pledged_content_size(source_size) if
that API is used) on the encoder instance (the variable named encoder) before
calling std::io::copy(&mut encoder_input, &mut encoder); this forwards the file
length you already computed into the encoder and enables the new size-aware
behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@zstd/src/encoding/match_generator.rs`:
- Around line 121-142: The small-input path adjusts LevelParams in
adjust_params_for_source_size but only clamps HC table logs, leaving the DFAST
tables at full size; update either adjust_params_for_source_size to reduce the
effective hash table size for MatcherBackend::Dfast (e.g. derive a reduced dfast
hash bits value from src_log/params.window_log) or modify
DfastMatchGenerator::ensure_hash_tables to consult the adjusted LevelParams
(instead of always allocating 1 << DFAST_HASH_BITS) and allocate smaller
short/long tables when the source is tiny; reference
adjust_params_for_source_size, LevelParams, MatcherBackend::Dfast,
DfastMatchGenerator::ensure_hash_tables and DFAST_HASH_BITS so the dfast
allocations are proportional to the hinted small-input params.
- Around line 348-350: The resolved window_log can make max_window_size smaller
than the normal block/slice size, so clamp self.slice_size to not exceed
max_window_size when you compute params (use the computed max_window_size to set
self.slice_size = min(self.base_slice_size, max_window_size)) and ensure
commit_space() / get_next_space() honor this reduced slice size when allocating
or reusing pooled vectors; this prevents commit_space() from passing a too-large
block to MatchGenerator::reserve / DfastMatchGenerator::add_data /
HcMatchGenerator::add_data and avoids the assertion failure.

In `@zstd/src/tests/roundtrip_integrity.rs`:
- Around line 664-672: Update the out_of_range_level_clamped test to also
exercise the i32::MIN edge case: call roundtrip_at_level(&data,
CompressionLevel::from_level(i32::MIN)) and assert equality with data just like
the other cases so the test covers the overflow-saturation path in
CompressionLevel::from_level rather than only large negative values like
-200000; keep the existing positive out-of-range case too.
- Around line 698-730: The test only checks roundtrip and output length but must
also assert that the size hint changed the encoder's emitted window size; after
producing with_hint, parse the frame header (or use the decoder's/window
inspector API) to read the emitted window_size and assert it's smaller than the
full level-11 window (e.g. < 16 * 1024 * 1024) and different from the no_hint
frame; update source_size_hint_reduces_window_for_small_input to call
FrameCompressor::set_source_size_hint, compress, then extract and assert the
frame's window_size() (or decoder.window_size()) reflects the hint so the hint
plumbing is actually exercised.

---

Outside diff comments:
In `@cli/src/main.rs`:
- Around line 120-145: In compress(), after computing source_size and before
streaming data into the encoder, pass the known input size to the zstd encoder
so it can use size-aware window/table selection: when you construct the
StreamingEncoder (structured_zstd::encoding::StreamingEncoder::new(...)) call
set_source_size_hint(source_size) (or set_pledged_content_size(source_size) if
that API is used) on the encoder instance (the variable named encoder) before
calling std::io::copy(&mut encoder_input, &mut encoder); this forwards the file
length you already computed into the encoder and enables the new size-aware
behavior.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 3d7de7cb-33f1-4d41-bcc0-47bcb28d6a76

📥 Commits

Reviewing files that changed from the base of the PR and between 4cb6828 and 301f37b.

📒 Files selected for processing (6)
  • cli/src/main.rs
  • zstd/src/encoding/frame_compressor.rs
  • zstd/src/encoding/match_generator.rs
  • zstd/src/encoding/mod.rs
  • zstd/src/encoding/streaming_encoder.rs
  • zstd/src/tests/roundtrip_integrity.rs


@sw-release-bot sw-release-bot bot left a comment


⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'structured-zstd vs C FFI'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.15.

| Benchmark suite | Current: 734f748 | Previous: bbd826e | Ratio |
|---|---|---|---|
| compress/fastest/low-entropy-1m/matrix/c_ffi | 0.188 ms | 0.157 ms | 1.20 |
| compress/default/large-log-stream/matrix/pure_rust | 108.565 ms | 92.164 ms | 1.18 |
| compress/best/large-log-stream/matrix/c_ffi | 15.011 ms | 10.602 ms | 1.42 |
| decompress/fastest/high-entropy-1m/c_stream/matrix/c_ffi | 0.035 ms | 0.026 ms | 1.35 |
| decompress/default/high-entropy-1m/c_stream/matrix/c_ffi | 0.036 ms | 0.027 ms | 1.33 |
| decompress/better/high-entropy-1m/c_stream/matrix/c_ffi | 0.035 ms | 0.026 ms | 1.35 |

This comment was automatically generated by workflow using github-action-benchmark.

CC: @polaz

polaz added 4 commits April 4, 2026 16:52
- Add CompressionLevel::Level(i32) variant and from_level() constructor
- Port C zstd level parameter table (levels 1-22) with per-level
  window size, hash/chain config, search depth, and lazy depth
- Negative levels (-1..-131072) select ultra-fast mode via Simple
  backend with progressively coarser hash insertion
- Level 0 maps to default (level 3), matching C zstd semantics
- Named variants map exactly to their numeric equivalents:
  Fastest=1, Default=3, Better=7, Best=11
- Refactor MatchGeneratorDriver::reset() to use centralized LevelParams
  instead of per-variant hardcoded constants
- Update CLI to accept zstd-compatible numeric levels (-5..22)
- Add 11 roundtrip tests covering equivalence, all 22 levels,
  negative levels, monotonic ratio, streaming, and clamping

Closes #21
- Use saturating_abs() instead of (-n) cast to prevent overflow
  when n == i32::MIN in negative level resolution
- Use CompressionLevel::MIN_LEVEL constant instead of hardcoded 131072
- CLI level 0 now maps to default compression (level 3), consistent
  with CompressionLevel::from_level(0) and C zstd semantics
- Add --store flag for uncompressed zstd frames (replaces old level 0
  = Uncompressed behavior)
- Remove "C zstd numbering" claim from CLI help text since the CLI
  extends standard numbering with --store
Source-size-aware selection (issue #21 requirement):
- Add Matcher::set_source_size_hint() trait method (default no-op)
- MatchGeneratorDriver stores hint and adjusts window_log + HC table
  sizes for small inputs, following C zstd source-size-class behavior
- FrameCompressor::set_source_size_hint() passes through to matcher
- StreamingEncoder::set_pledged_content_size() also sets the hint
- StreamingEncoder::set_source_size_hint() for hint without enforcement
- For small inputs, window and hash tables are capped proportionally
  to source size, avoiding multi-MB allocations for tiny payloads

Review fixes:
- Derive Level(0) index from DEFAULT_LEVEL constant (Copilot #5)
- Tighten test doc comment to match strict assertion (Copilot #6)
- Widen CLI level range to MIN_LEVEL..=MAX_LEVEL (CodeRabbit #7)
- Use saturating_abs() for negative level overflow (Copilot #1, prev)
- pass source size from CLI into StreamingEncoder before writing
- resize dfast hash tables based on hinted window and clamp slice size to window
- strengthen roundtrip tests for i32::MIN clamping and hinted frame window assertions

polaz commented Apr 4, 2026

@coderabbitai Fixed outside-diff item: compress() now forwards the known source_size into StreamingEncoder via set_source_size_hint() before streaming io::copy.

@polaz polaz force-pushed the feat/#21-feat-numeric-compression-levels-1-22-api branch from 301f37b to 625c1f0 Compare April 4, 2026 14:01

coderabbitai bot commented Apr 4, 2026


🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find the compress function in the CLI main.rs and show the relevant context
rg -n "set_source_size_hint\|source_size\|io::copy\|StreamingEncoder" --type rust cli/src/main.rs -A2 -B2

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Show the full CLI main.rs to find the compress function
cat cli/src/main.rs

Length of output: 21261


🏁 Script executed:

#!/bin/bash
# Check recent commits on the PR branch
git log --oneline -10

# Also double-check the specific encoder block in compress()
rg -n "encoder\|source_size_hint\|source_size" cli/src/main.rs

Length of output: 757


@polaz — from the current state of the branch (301f37b) I can see in cli/src/main.rs, the encoder closure still looks like this (no hint call):

let mut encoder =
    structured_zstd::encoding::StreamingEncoder::new(temporary_output, compression_level);
std::io::copy(&mut encoder_input, &mut encoder).wrap_err("streaming compression failed")?;
encoder.finish().wrap_err("failed to finalize zstd frame")

The source_size value (computed from file metadata) is only used for ProgressMonitor and the final log ratio — there's no encoder.set_source_size_hint(source_size as u64) call before the io::copy. It's possible your fix is in a local commit that hasn't been pushed yet, or was pushed after the sandbox snapshot was taken.

Once the fix lands, the encoder block should look roughly like:

let mut encoder =
    structured_zstd::encoding::StreamingEncoder::new(temporary_output, compression_level);
encoder
    .set_source_size_hint(source_size as u64)
    .wrap_err("failed to set source size hint")?;
std::io::copy(&mut encoder_input, &mut encoder).wrap_err("streaming compression failed")?;
encoder.finish().wrap_err("failed to finalize zstd frame")

Could you double-check whether the commit containing that change has been pushed to the PR branch?


🧠 Learnings used
Learnt from: polaz
Repo: structured-world/structured-zstd PR: 48
File: zstd/src/encoding/match_generator.rs:1554-1559
Timestamp: 2026-04-02T12:03:04.619Z
Learning: In `zstd/src/encoding/match_generator.rs`, `HcMatchGenerator::insert_position` has an early-return guard at `abs_pos >= u32::MAX` that disables all further HC insertions for streams >4 GiB, degrading `CompressionLevel::Better` to repcodes-only. This is a known limitation tracked in issue `#51`; the proper fix (table rebasing) is deferred. A TODO comment and a limitation note on `CompressionLevel::Better` doc were added in PR `#48`. Do not re-flag this as a new bug.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-03T16:36:03.209Z
Learning: In `zstd/src/decoding/frame_decoder.rs`, `FrameDecoderState::reset()` appears asymmetric to `new()` in its pre-allocation, but the asymmetry is intentional: `decoder_scratch.reset(window_size)` internally calls `DecodeBuffer::reset()` which already reserves `window_size`. Therefore `reset()` only needs to call `scratch.buffer.reserve(frame_content_size)` on top of that for the FCS delta, rather than explicitly reserving `window_size` again as `new()` does. Do not re-flag this as a bug or asymmetry.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-01T06:49:34.233Z
Learning: In the `structured-zstd` codebase (`zstd/src/encoding/match_generator.rs`), `CompressionLevel::Default` uses the `DfastMatchGenerator` backend (with `pick_lazy_match` lazy evaluation), while `CompressionLevel::Better` uses the `HcMatchGenerator` (hash chain backend) which inserts positions before lazy evaluation inside its own `start_matching` loop. These two backends are independent; review comments about Dfast internals (e.g. position seeding order in `pick_lazy_match`) do not apply to the Better level's implementation path.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 53
File: zstd/src/encoding/frame_compressor.rs:280-287
Timestamp: 2026-04-02T16:31:58.167Z
Learning: In `structured-zstd`, `CompressionLevel::Best` intentionally uses `MatcherBackend::HashChain` (not a binary-tree/DUBT backend) due to memory constraints: a BT matcher for a 16 MiB window requires ~128–256 MB of tables (C zstd level 11 uses chainLog=25 → 256 MB), which is impractical. Best is distinguished from Better by deeper HC search (32 vs 16 candidates), larger hash/chain tables (2M/1M vs 1M/512K), and higher target match length (128 vs 48), keeping memory under 20 MB. It achieves a proportional compression gain (~0.27% better than Better on corpus) comparable to C zstd's L7→L11 delta (~0.14%). A proper BT backend is deferred as a future optimization. Do not flag Best's use of HashChain as missing a BT implementation.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-01T06:49:40.012Z
Learning: In the structured-zstd codebase (`zstd/src/encoding/match_generator.rs`), `CompressionLevel::Default` uses the `DfastMatchGenerator` backend (with `pick_lazy_match` for lazy evaluation), while `CompressionLevel::Better` uses the `HcMatchGenerator` (hash chain backend) which inserts positions before lazy evaluation inside its own `start_matching` loop. These are distinct backends with separate lazy-seeding paths.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 53
File: zstd/src/tests/roundtrip_integrity.rs:498-509
Timestamp: 2026-04-02T22:26:07.979Z
Learning: In `structured-zstd` (`zstd/src/tests/roundtrip_integrity.rs`), `best_level_does_not_regress_vs_better` uses a `<=` (not strict `<`) assertion because the `repeat_offset_fixture(b"HelloWorld", ...)` input is simple enough that HC saturates at both Better (16 candidates) and Best (32 candidates) search depths, producing identical compressed sizes (~30243 bytes). Strict `<` would be a false positive on this fixture. The strict `Best < Better` quality assertion lives in `cross_validation::best_level_beats_better_on_corpus_proxy` on the decodecorpus sample. Do not re-flag the `<=` as a weakened guard.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 53
File: zstd/src/tests/roundtrip_integrity.rs:479-523
Timestamp: 2026-04-02T19:19:44.750Z
Learning: In `structured-zstd` (`zstd/src/tests/roundtrip_integrity.rs`), the per-level roundtrip test suites (Better, Best) share a common `level_roundtrip_suite!` macro that expands the same 7-test matrix (compressible, random, multi-block, streaming, edge cases, repeat offsets, large literals) for each level via module-scoped macro invocations. Do not suggest re-inlining these as separate flat test functions — the macro was added intentionally to eliminate cross-level drift.


Copilot AI left a comment


Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated no new comments.


@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (2)
zstd/src/tests/roundtrip_integrity.rs (1)

748-764: ⚠️ Potential issue | 🟡 Minor

This test doesn't lock in the automatic hint plumbing yet.

If set_pledged_content_size() stops forwarding to set_source_size_hint(), this still passes because level 11 roundtrips fine without the hint. Please assert an observable encoder outcome, such as comparing the advertised window_size() against that of a no-hint streaming frame.

Suggested assertion shape
 fn streaming_pledged_size_uses_source_hint() {
     use crate::encoding::StreamingEncoder;
     use crate::io::Write;
 
     let data = generate_compressible(9602, 2 * 1024); // 2 KiB
+    let no_hint = {
+        let mut encoder = StreamingEncoder::new(Vec::new(), CompressionLevel::from_level(11));
+        encoder.write_all(&data).unwrap();
+        encoder.finish().unwrap()
+    };
     let mut encoder = StreamingEncoder::new(Vec::new(), CompressionLevel::from_level(11));
     encoder.set_pledged_content_size(data.len() as u64).unwrap();
     encoder.write_all(&data).unwrap();
     let compressed = encoder.finish().unwrap();
+
+    let no_hint_window = crate::decoding::frame::read_frame_header(no_hint.as_slice())
+        .unwrap()
+        .0
+        .window_size()
+        .unwrap();
+    let hinted_window = crate::decoding::frame::read_frame_header(compressed.as_slice())
+        .unwrap()
+        .0
+        .window_size()
+        .unwrap();
+    assert!(hinted_window <= no_hint_window);
+    assert!(hinted_window < 16 * 1024 * 1024);
 
     let mut decoder = StreamingDecoder::new(compressed.as_slice()).unwrap();
     let mut result = Vec::new();
     decoder.read_to_end(&mut result).unwrap();
     assert_eq!(data, result, "Pledged-size streaming must roundtrip");
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@zstd/src/tests/roundtrip_integrity.rs` around lines 748 - 764, The test
streaming_pledged_size_uses_source_hint() only checks roundtrip and can pass
even if set_pledged_content_size() no longer forwards to set_source_size_hint();
update the test to assert an observable encoder behavior by capturing the
encoder's advertised window size (call window_size() or equivalent after writing
but before finish) and compare it to the window size produced when you omit the
pledged-size hint (create a second StreamingEncoder without calling
set_pledged_content_size() and write the same data) to ensure
set_pledged_content_size() actually affects set_source_size_hint() and changes
the encoder framing; reference StreamingEncoder::new,
set_pledged_content_size(), set_source_size_hint(), and window_size() to locate
the relevant code paths.
cli/src/main.rs (1)

51-68: ⚠️ Potential issue | 🟡 Minor

--store still doesn't fully ignore --level.

The help text says the level is ignored, but the clap range check still rejects inputs like --store --level 23 before compress() can short-circuit. Either move level validation out of the parser, or tighten the doc string so it only promises runtime ignore after successful parsing.

One minimal fix is to make the help text match the current behavior
-        /// When set, --level is ignored and the input is wrapped in a
-        /// raw zstd frame without any compression.
+        /// When set, the parsed `--level` value is ignored and the input is
+        /// wrapped in a raw zstd frame without any compression.
+        /// Values outside the supported range are still rejected by the CLI parser.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cli/src/main.rs` around lines 51 - 68, The CLI currently enforces a clap
range validator on the level argument so values like `--store --level 23` are
rejected during parsing even though the code promises to ignore level at
runtime; update the argument handling so parsing does not fail when --store is
set — e.g., make `level` an Option<i32> or remove the range validator and
perform range validation inside your runtime logic (in compress() or the
function that dispatches compression) and skip that validation when `store` is
true; reference the `level` and `store` arg definitions and the `compress()`
call to implement the runtime-only validation or adjust the help text to state
parsing may still validate level.
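The runtime-only validation path the prompt describes could look like the following minimal sketch. The bound constants, the `resolve_level` helper, and the default of 3 are illustrative stand-ins based on the PR description, not the crate's actual CLI code; clap wiring is omitted.

```rust
// Hypothetical sketch: validate --level at runtime so --store can bypass
// the range check entirely. MIN_LEVEL/MAX_LEVEL are assumed bounds.
const MIN_LEVEL: i32 = -131072;
const MAX_LEVEL: i32 = 22;

fn resolve_level(level: Option<i32>, store: bool) -> Result<Option<i32>, String> {
    if store {
        // --store wraps input in a raw frame; any parsed level is ignored.
        return Ok(None);
    }
    // Unset level falls back to the default (level 3, per the PR summary).
    let level = level.unwrap_or(3);
    if (MIN_LEVEL..=MAX_LEVEL).contains(&level) {
        Ok(Some(level))
    } else {
        Err(format!("level {level} outside {MIN_LEVEL}..={MAX_LEVEL}"))
    }
}

fn main() {
    // --store ignores even an out-of-range level.
    assert_eq!(resolve_level(Some(23), true), Ok(None));
    // Without --store, out-of-range levels are rejected at runtime.
    assert!(resolve_level(Some(23), false).is_err());
    assert_eq!(resolve_level(None, false), Ok(Some(3)));
}
```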

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: dd4a6344-209b-45f2-9c75-97a89e7d4e62

📥 Commits

Reviewing files that changed from the base of the PR and between 301f37b and 625c1f0.

📒 Files selected for processing (7)
  • README.md
  • cli/src/main.rs
  • zstd/src/encoding/frame_compressor.rs
  • zstd/src/encoding/match_generator.rs
  • zstd/src/encoding/mod.rs
  • zstd/src/encoding/streaming_encoder.rs
  • zstd/src/tests/roundtrip_integrity.rs

- make pledged-size streaming test assert observable window-size effect
- clarify CLI help: --store skips compression but parse-time level validation remains
- add regression test for --store with out-of-range --level

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cli/src/main.rs`:
- Around line 419-455: Tests use hardcoded numeric limits (e.g., 23 and -131073)
that duplicate the production bounds; update the test inputs to derive
out-of-range and boundary values from CompressionLevel constants instead. In the
tests cli_rejects_unsupported_compression_level_at_parse_time,
cli_rejects_too_negative_compression_level, and
cli_store_still_validates_level_range_at_parse_time, replace the literal "23"
and "-131073" with expressions based on CompressionLevel::MAX_LEVEL and
CompressionLevel::MIN_LEVEL (e.g., (CompressionLevel::MAX_LEVEL + 1).to_string()
for an unsupported-high value and (CompressionLevel::MIN_LEVEL - 1).to_string()
for an unsupported-low value) so the parser-boundary test inputs track the
actual API limits.
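The suggested fix amounts to computing the boundary strings from the level constants rather than hardcoding them. A self-contained sketch, where `MIN_LEVEL`/`MAX_LEVEL` stand in for `CompressionLevel::MIN_LEVEL`/`MAX_LEVEL` (the -131072 floor matches the -131073 too-low literal the review cites; the real constants may differ):

```rust
// Illustrative stand-ins for the crate's CompressionLevel bounds.
const MIN_LEVEL: i32 = -131072;
const MAX_LEVEL: i32 = 22;

fn main() {
    // Derive out-of-range test inputs so they track the API limits
    // automatically if the bounds ever move.
    let too_high = (MAX_LEVEL + 1).to_string();
    let too_low = (MIN_LEVEL - 1).to_string();
    assert_eq!(too_high, "23");
    assert_eq!(too_low, "-131073");
}
```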

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: ed8f871b-6068-40ea-af4f-0248064f2eb6

📥 Commits

Reviewing files that changed from the base of the PR and between 625c1f0 and ec9a61a.

📒 Files selected for processing (2)
  • cli/src/main.rs
  • zstd/src/tests/roundtrip_integrity.rs

@polaz polaz requested a review from Copilot April 4, 2026 15:04

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@zstd/src/encoding/mod.rs`:
- Around line 127-129: from_level currently always returns
CompressionLevel::Level(level) which breaks the "3 equals Default" expectation;
change the const fn from_level(level: i32) to return CompressionLevel::Default
when level == 3 and CompressionLevel::Level(level) otherwise so that calls like
from_level(3) produce CompressionLevel::Default; update the match/if logic in
from_level to implement this check while preserving const fn semantics.
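The requested normalization can be sketched as follows. The enum shape follows the PR description (other variants elided); mapping 0 to Default alongside 3 reflects the summary's "Level 0 = default (level 3)" semantics and is an assumption about the intended behavior.

```rust
#[derive(Debug, PartialEq, Eq)]
enum CompressionLevel {
    Default,
    Level(i32),
}

impl CompressionLevel {
    const fn from_level(level: i32) -> Self {
        match level {
            // Level 3 is the canonical default; level 0 selects the
            // default per C zstd semantics (assumed per the PR summary).
            0 | 3 => CompressionLevel::Default,
            n => CompressionLevel::Level(n),
        }
    }
}

fn main() {
    assert_eq!(CompressionLevel::from_level(3), CompressionLevel::Default);
    assert_eq!(CompressionLevel::from_level(0), CompressionLevel::Default);
    assert_eq!(CompressionLevel::from_level(7), CompressionLevel::Level(7));
}
```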

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: d9e37710-37d8-44f2-a8f3-d4314d310d08

📥 Commits

Reviewing files that changed from the base of the PR and between c37f6cf and 6aa7618.

📒 Files selected for processing (1)
  • zstd/src/encoding/mod.rs


Copilot AI left a comment


Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated no new comments.


Copilot AI left a comment


Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

@polaz
Member Author

polaz commented Apr 4, 2026

@coderabbitai resume and full review

@coderabbitai

coderabbitai bot commented Apr 4, 2026

@polaz — resuming automatic reviews and triggering a full review of PR #61 now.


🧠 Learnings used
Learnt from: polaz
Repo: structured-world/structured-zstd PR: 53
File: zstd/src/tests/roundtrip_integrity.rs:498-509
Timestamp: 2026-04-02T22:26:07.979Z
Learning: In `structured-zstd` (`zstd/src/tests/roundtrip_integrity.rs`), `best_level_does_not_regress_vs_better` uses a `<=` (not strict `<`) assertion because the `repeat_offset_fixture(b"HelloWorld", ...)` input is simple enough that HC saturates at both Better (16 candidates) and Best (32 candidates) search depths, producing identical compressed sizes (~30243 bytes). Strict `<` would be a false positive on this fixture. The strict `Best < Better` quality assertion lives in `cross_validation::best_level_beats_better_on_corpus_proxy` on the decodecorpus sample. Do not re-flag the `<=` as a weakened guard.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 48
File: zstd/src/encoding/match_generator.rs:1554-1559
Timestamp: 2026-04-02T12:03:04.619Z
Learning: In `zstd/src/encoding/match_generator.rs`, `HcMatchGenerator::insert_position` has an early-return guard at `abs_pos >= u32::MAX` that disables all further HC insertions for streams >4 GiB, degrading `CompressionLevel::Better` to repcodes-only. This is a known limitation tracked in issue `#51`; the proper fix (table rebasing) is deferred. A TODO comment and a limitation note on `CompressionLevel::Better` doc were added in PR `#48`. Do not re-flag this as a new bug.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 53
File: zstd/src/tests/roundtrip_integrity.rs:479-523
Timestamp: 2026-04-02T19:19:44.750Z
Learning: In `structured-zstd` (`zstd/src/tests/roundtrip_integrity.rs`), the per-level roundtrip test suites (Better, Best) share a common `level_roundtrip_suite!` macro that expands the same 7-test matrix (compressible, random, multi-block, streaming, edge cases, repeat offsets, large literals) for each level via module-scoped macro invocations. Do not suggest re-inlining these as separate flat test functions — the macro was added intentionally to eliminate cross-level drift.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 58
File: .github/workflows/ci.yml:31-32
Timestamp: 2026-04-03T11:24:17.021Z
Learning: In `structured-world/structured-zstd`, the CI Clippy lint job (`.github/workflows/ci.yml`) uses two separate steps: (1) the required gate: `cargo clippy -p structured-zstd --features hash,std,dict_builder -- -D warnings`, and (2) a separate bench-internals step: `cargo clippy -p structured-zstd --features hash,std,dict_builder,bench_internals -- -D warnings`. The `bench_internals` feature must NOT replace the primary gate — it runs as an additional step only.

Learnt from: CR
Repo: structured-world/structured-zstd PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-03-26T15:15:32.318Z
Learning: Applies to **/*.rs : Code must pass `cargo clippy -p structured-zstd --features hash,std,dict_builder -- -D warnings` (excluding `rustc-dep-of-std` and `fuzz_exports` features)

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 48
File: zstd/src/encoding/match_generator.rs:1531-1535
Timestamp: 2026-04-01T15:33:17.229Z
Learning: In `zstd/src/encoding/match_generator.rs`, `HcMatchGenerator::insert_position` intentionally casts `abs_pos: usize` to `u32` (wrapping on streams > 4 GiB). This is safe by design: (1) the chain table slot is computed as `(pos_u32 as usize) & chain_mask`, so only the low bits matter and wrapping leaves the addressed slot unchanged; (2) `chain_candidates` filters any recovered `cur as usize` value against `history_abs_start` — a wrapped (small) value is always less than the current sliding-window start and is discarded. Do not flag this cast as a bug.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-01T06:49:34.233Z
Learning: In the `structured-zstd` codebase (`zstd/src/encoding/match_generator.rs`), `CompressionLevel::Default` uses the `DfastMatchGenerator` backend (with `pick_lazy_match` lazy evaluation), while `CompressionLevel::Better` uses the `HcMatchGenerator` (hash chain backend) which inserts positions before lazy evaluation inside its own `start_matching` loop. These two backends are independent; review comments about Dfast internals (e.g. position seeding order in `pick_lazy_match`) do not apply to the Better level's implementation path.

Learnt from: CR
Repo: structured-world/structured-zstd PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-03-26T15:15:32.318Z
Learning: Avoid suggesting architectural changes that diverge too far from upstream, as this is a fork of ruzstd (KillingSpark/zstd-rs)

Learnt from: CR
Repo: structured-world/structured-zstd PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-03-26T15:15:32.318Z
Learning: Review ONLY code within the PR's diff. For issues found outside the diff, suggest creating a separate issue.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 53
File: zstd/src/encoding/frame_compressor.rs:280-287
Timestamp: 2026-04-02T16:31:58.167Z
Learning: In `structured-zstd`, `CompressionLevel::Best` intentionally uses `MatcherBackend::HashChain` (not a binary-tree/DUBT backend) due to memory constraints: a BT matcher for a 16 MiB window requires ~128–256 MB of tables (C zstd level 11 uses chainLog=25 → 256 MB), which is impractical. Best is distinguished from Better by deeper HC search (32 vs 16 candidates), larger hash/chain tables (2M/1M vs 1M/512K), and higher target match length (128 vs 48), keeping memory under 20 MB. It achieves a proportional compression gain (~0.27% better than Better on corpus) comparable to C zstd's L7→L11 delta (~0.14%). A proper BT backend is deferred as a future optimization. Do not flag Best's use of HashChain as missing a BT implementation.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-01T06:49:40.012Z
Learning: In the structured-zstd codebase (`zstd/src/encoding/match_generator.rs`), `CompressionLevel::Default` uses the `DfastMatchGenerator` backend (with `pick_lazy_match` for lazy evaluation), while `CompressionLevel::Better` uses the `HcMatchGenerator` (hash chain backend) which inserts positions before lazy evaluation inside its own `start_matching` loop. These are distinct backends with separate lazy-seeding paths.
✅ Actions performed

Reviews resumed.


Full review triggered.


Copilot AI left a comment


Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.


Copilot AI left a comment


Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.


Copilot AI left a comment


Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated no new comments.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@zstd/src/encoding/mod.rs`:
- Around line 169-177: The set_source_size_hint hook is currently ignored by the
built-in non-driver matchers; update NoDictionaryMatcher and TinyMatcher to
override set_source_size_hint(&mut self, size: u64) and store/apply the hint so
FrameCompressor and StreamingEncoder forwards are honored; specifically, add an
implementation in the NoDictionaryMatcher and TinyMatcher types that records the
provided size (and adjusts any internal sizing decisions or exposes it to the
match selection logic used by find_matches/next_match), ensuring the matchers no
longer drop hints forwarded by FrameCompressor and StreamingEncoder.
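The pattern the comment asks for is a default no-op hook on the matcher trait, overridden by matchers that actually size state from the hint. A minimal sketch; the trait and type names follow the PR description, while the internals (storing the hint in an `Option<u64>`) are illustrative:

```rust
trait Matcher {
    /// Default is a no-op: matchers that don't size state from the
    /// source size may ignore the hint.
    fn set_source_size_hint(&mut self, _size: u64) {}
}

struct TinyMatcher {
    // Hypothetical field: recorded so window/table sizing can consult it.
    source_size_hint: Option<u64>,
}

impl Matcher for TinyMatcher {
    fn set_source_size_hint(&mut self, size: u64) {
        // Honor hints forwarded by FrameCompressor / StreamingEncoder
        // instead of silently dropping them.
        self.source_size_hint = Some(size);
    }
}

fn main() {
    let mut m = TinyMatcher { source_size_hint: None };
    m.set_source_size_hint(4096);
    assert_eq!(m.source_size_hint, Some(4096));
}
```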

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 32870fe6-8472-477f-90cb-e1168ac43493

📥 Commits

Reviewing files that changed from the base of the PR and between fec2b3f and dcbdbd1.

📒 Files selected for processing (2)
  • zstd/src/encoding/match_generator.rs
  • zstd/src/encoding/mod.rs


Copilot AI left a comment


Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

Comment on lines 537 to +550
     fn get_next_space(&mut self) -> Vec<u8> {
-        self.vec_pool.pop().unwrap_or_else(|| {
-            let mut space = alloc::vec![0; self.slice_size];
-            space.resize(space.capacity(), 0);
-            space
-        })
+        if let Some(mut space) = self.vec_pool.pop() {
+            if space.len() > self.slice_size {
+                space.truncate(self.slice_size);
+            }
+            if space.capacity() > self.slice_size {
+                space.shrink_to(self.slice_size);
+            }
+            if space.len() < self.slice_size {
+                space.resize(self.slice_size, 0);
+            }
+            return space;
+        }
+        alloc::vec![0; self.slice_size]

Copilot AI Apr 4, 2026


get_next_space() now calls shrink_to(self.slice_size) when reusing a buffer from vec_pool. This can trigger reallocations (and memcpy of up to slice_size bytes) in a hot path, especially when alternating between hinted small windows and normal windows, which risks a measurable compression throughput regression. Consider avoiding unconditional shrinking here (e.g., only shrink when capacity is far above slice_size, or perform capacity trimming once at reset() when the slice size decreases) so the pool continues to amortize allocations.

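Copilot's suggestion of shrinking only when capacity is far above the target could be sketched as follows. This is a free-function approximation of the pooled method, and the 2x threshold is an illustrative policy, not the crate's code:

```rust
fn get_next_space(pool: &mut Vec<Vec<u8>>, slice_size: usize) -> Vec<u8> {
    if let Some(mut space) = pool.pop() {
        // Trim only when the reused buffer's capacity is more than double
        // what we need, so the pool keeps amortizing allocations in the
        // common case of stable or mildly varying slice sizes.
        if space.capacity() > 2 * slice_size {
            space.truncate(slice_size);
            space.shrink_to(slice_size);
        }
        space.clear();
        space.resize(slice_size, 0);
        return space;
    }
    vec![0; slice_size]
}

fn main() {
    // A grossly oversized buffer is trimmed before reuse.
    let mut pool = vec![Vec::with_capacity(1024)];
    let buf = get_next_space(&mut pool, 100);
    assert_eq!(buf.len(), 100);
    assert!(buf.capacity() <= 1024);

    // A modestly oversized buffer (capacity <= 2x) is reused as-is,
    // avoiding the shrink/memcpy in the hot path.
    let mut pool = vec![vec![0u8; 150]];
    let buf = get_next_space(&mut pool, 100);
    assert_eq!(buf.len(), 100);
}
```

An alternative placement, also mentioned in the review, is to trim once in reset() when the slice size decreases, keeping get_next_space() allocation-free.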
@polaz
Member Author

polaz commented Apr 4, 2026

Closing to reopen as a clean squashed PR with rewritten history.



Development

Successfully merging this pull request may close these issues.

feat: numeric compression levels (1-22) API

2 participants