Conversation
…leaf arrays Arrow's PrimitiveArray::slice() trims the ScalarBuffer to start at offset 0 but keeps the NullBuffer's internal bit-offset pointing into the original bitmap (e.g. bit-offset=3 for the second list element). The previous code called ArrayDataBuilder::null_bit_buffer(), which clears the pre-existing NullBuffer (self.nulls = None), and then build() reconstructs a new NullBuffer from the raw bytes using the *array* offset (0), not the bitmap's bit-offset. This causes validate_nulls() to count nulls at bits 0..N instead of the correct bit-offset..bit-offset+N range, triggering a null_count mismatch panic for any List<nullable T> whose inner values contain actual nulls.

Fix: replace the .null_count() + .null_bit_buffer() calls with a single .nulls(Some(effective.clone())) call, which stores the NullBuffer directly without destroying its bit-offset.

Closes PLT-1047

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
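The miscount described in the commit message can be illustrated with a std-only sketch. This models only the bitmap arithmetic, not the arrow crate itself; `count_set_bits` is a hypothetical helper. Reading validity bits at the array offset (0) instead of the bitmap's bit-offset yields a different valid/null count for the same slice:

```rust
// Std-only model of the bug: a validity bitmap for 8 values, where a
// sliced child array of length 3 starts at bit-offset 3 in the bitmap.
fn count_set_bits(bitmap: &[u8], start: usize, len: usize) -> usize {
    (start..start + len)
        .filter(|&i| bitmap[i / 8] & (1u8 << (i % 8)) != 0)
        .count()
}

fn main() {
    // Bitmap 0b0011_0111 (LSB first): elements 0,1,2,4,5 valid; 3,6,7 null.
    let bitmap = [0b0011_0111u8];
    let (bit_offset, len) = (3, 3); // the sliced child covers elements 3..6

    // Correct: read validity at bit_offset..bit_offset+len -> 2 valid, 1 null.
    let valid_correct = count_set_bits(&bitmap, bit_offset, len);
    // Buggy path: rebuild from raw bytes at the array offset (0) -> 3 valid, 0 nulls.
    let valid_buggy = count_set_bits(&bitmap, 0, len);

    // The disagreement in null counts is exactly what validate_nulls() panics on.
    println!("correct valid = {valid_correct}, buggy valid = {valid_buggy}");
}
```

The two counts disagree (2 valid vs. 3 valid), which is the `null_count` mismatch that `validate_nulls()` turns into a panic.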
Pull request overview
Fixes a panic in ArrowDigesterCore::traverse_leaf when hashing list arrays whose sliced child (leaf) arrays have a NullBuffer with a non-zero bit-offset (e.g., List<nullable T> where inner elements contain actual nulls). The change preserves the NullBuffer bit-offset when rebuilding ArrayData, and adds regression tests covering the reported scenarios.
Changes:
- Rebuild leaf `ArrayData` using `ArrayDataBuilder::nulls(Some(NullBuffer))` to preserve the `NullBuffer` bit-offset during hashing.
- Add regression tests ensuring `hash_array` does not panic for `List<nullable>` and `LargeList<nullable>` with inner nulls.
- Add a correctness test asserting that different inner-null placements produce different hashes.
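The intent of the regression and correctness tests above can be sketched std-only. `hash_leaf` here is a hypothetical stand-in for the crate's hashing (not its real API): repeated calls must be deterministic, and the same values with nulls in different positions must digest differently.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical stand-in: hash a leaf as values plus per-element validity.
fn hash_leaf(values: &[i32], valid: &[bool]) -> u64 {
    let mut h = DefaultHasher::new();
    for (v, ok) in values.iter().zip(valid) {
        ok.hash(&mut h);           // every slot contributes its validity flag
        if *ok {
            v.hash(&mut h);        // only valid slots contribute their value
        }
    }
    h.finish()
}

fn main() {
    let values = [1, 2, 3];
    let a = hash_leaf(&values, &[true, false, true]);
    let b = hash_leaf(&values, &[false, true, true]);

    assert_eq!(a, hash_leaf(&values, &[true, false, true])); // deterministic
    assert_ne!(a, b); // null placement affects the hash
    println!("ok");
}
```

This mirrors why the bug mattered beyond the panic: if validity is read at the wrong bit-offset, the hash would silently cover the wrong null positions.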
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `src/arrow_digester_core.rs` | Preserves `NullBuffer` bit-offset when rebuilding leaf arrays with combined nulls, to prevent validation panics and incorrect null interpretation. |
| `tests/arrow_digester.rs` | Adds PLT-1047 regression + correctness tests for list arrays with nullable inner elements. |
```rust
/// Regression test: `List<nullable>` with nulls at the outer level and inner level
/// must not panic and must produce a deterministic hash.
#[test]
```
The doc comment for this test says it covers nulls at both the outer list level and inner element level, but the constructed array has no outer-level nulls (no None list entries). Either add an outer None entry to the input data or adjust the comment to match what the test actually exercises.
Good catch! The doc comment was indeed inaccurate — the array only contained inner-level nulls with no outer None list entries. I've updated the comment to say "nulls in the inner elements" so it accurately describes what the test exercises. No data changes were needed since the inner-null determinism scenario is the one we want to confirm.
The doc comment claimed the test covered outer-level nulls too, but the array had no None list entries. Corrected the comment to say "inner elements" only.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Review round summary

One change made in response to Copilot's review: corrected the regression test's doc comment to describe inner-element nulls only, matching the data the test actually constructs.
All 49 integration tests + 17 unit tests continue to pass.
Summary
Fixes PLT-1047 — `ArrowDigester::hash_array` panics on `List<nullable T>` when inner elements contain actual null values.

Root cause: `ArrayDataBuilder::null_bit_buffer()` silently clears the builder's pre-existing `nulls` field (`self.nulls = None`). Arrow's `PrimitiveArray::slice()` produces arrays where the `ScalarBuffer` is trimmed to start at element 0 but the `NullBuffer` retains its original bit-offset (e.g. 3 for the second list element). When `build()` then reconstructs the `NullBuffer` from the raw bytes, it uses the array's data offset (0) rather than the bitmap's own bit-offset (3), so `validate_nulls()` counts nulls at bits 0..N instead of the correct range and panics with a `null_count` mismatch.

Fix: Replace `.null_count(n).null_bit_buffer(Some(raw_buf))` with `.nulls(Some(effective.clone()))`. This stores the `NullBuffer` directly without discarding its internal bit-offset, so both validation and downstream hash operations see the correct null positions.

Test plan

- `list_with_nullable_inner_elements_does_not_panic` — reproduces the exact panic from the issue report
- `list_with_nullable_inner_elements_is_deterministic` — ensures the hash is stable across calls
- `large_list_with_nullable_inner_elements_does_not_panic` — same fix applies to `LargeList`
- `list_nullable_inner_null_placement_affects_hash` — different null positions yield different hashes (correctness)

🤖 Generated with Claude Code
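The builder behavior behind the fix can be modeled with a std-only sketch. `MockBuilder` and `MockNullBuffer` are invented stand-ins, not the real arrow-rs types; they only model the property the PR relies on — that handing the builder raw bytes loses the bit-offset, while storing the null buffer whole preserves it:

```rust
// Std-only model (NOT the arrow-rs API) of the two builder paths.
#[derive(Clone)]
struct MockNullBuffer {
    bytes: Vec<u8>,
    bit_offset: usize, // where this array's validity bits start in `bytes`
    len: usize,
}

struct MockBuilder {
    nulls: Option<MockNullBuffer>,
}

impl MockBuilder {
    // Models `.null_bit_buffer()`: only raw bytes survive, so the rebuilt
    // buffer ends up anchored at the array offset (0), not the bit-offset.
    fn null_bit_buffer(mut self, bytes: Vec<u8>, len: usize) -> Self {
        self.nulls = Some(MockNullBuffer { bytes, bit_offset: 0, len });
        self
    }

    // Models `.nulls(Some(effective.clone()))`: the buffer is stored whole,
    // bit-offset included.
    fn nulls(mut self, n: Option<MockNullBuffer>) -> Self {
        self.nulls = n;
        self
    }
}

fn main() {
    let effective = MockNullBuffer { bytes: vec![0b0011_0111], bit_offset: 3, len: 3 };

    let lossy = MockBuilder { nulls: None }
        .null_bit_buffer(effective.bytes.clone(), effective.len);
    let preserved = MockBuilder { nulls: None }.nulls(Some(effective.clone()));

    assert_eq!(lossy.nulls.unwrap().bit_offset, 0);     // offset lost
    assert_eq!(preserved.nulls.unwrap().bit_offset, 3); // offset kept
    println!("ok");
}
```

Under this model, the one-line swap in the PR is exactly the move from the lossy path to the preserving one.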
list_with_nullable_inner_elements_does_not_panic— reproduces the exact panic from the issue reportlist_with_nullable_inner_elements_is_deterministic— ensures hash is stable across callslarge_list_with_nullable_inner_elements_does_not_panic— same fix applies toLargeListlist_nullable_inner_null_placement_affects_hash— different null positions yield different hashes (correctness)🤖 Generated with Claude Code