Skip to content

fix(digester): preserve NullBuffer bit-offset when rebuilding sliced leaf arrays (PLT-1047)#12

Merged
eywalker merged 2 commits intomainfrom
eywalker/plt-1047-bug-rust-starfix-panics-on-hash_array-with-listnullable-leaf
Mar 20, 2026
Merged

fix(digester): preserve NullBuffer bit-offset when rebuilding sliced leaf arrays (PLT-1047)#12
eywalker merged 2 commits intomainfrom
eywalker/plt-1047-bug-rust-starfix-panics-on-hash_array-with-listnullable-leaf

Conversation

@kurodo3
Copy link
Copy Markdown
Contributor

@kurodo3 kurodo3 Bot commented Mar 20, 2026

Summary

Fixes PLT-1047 — ArrowDigester::hash_array panics on List<nullable T> when inner elements contain actual null values.

  • Root cause: ArrayDataBuilder::null_bit_buffer() silently clears the builder's pre-existing nulls field (self.nulls = None). Arrow's PrimitiveArray::slice() produces arrays where the ScalarBuffer is trimmed to start at element 0 but the NullBuffer retains its original bit-offset (e.g. 3 for the second list element). When build() then reconstructs the NullBuffer from the raw bytes, it uses the array's data offset (0) rather than the bitmap's own bit-offset (3), so validate_nulls() counts nulls at bits 0..N instead of the correct range and panics with a null_count mismatch.

  • Fix: Replace .null_count(n).null_bit_buffer(Some(raw_buf)) with .nulls(Some(effective.clone())). This stores the NullBuffer directly without discarding its internal bit-offset, so both validation and downstream hash operations see the correct null positions.

Test plan

  • New test list_with_nullable_inner_elements_does_not_panic — reproduces the exact panic from the issue report
  • New test list_with_nullable_inner_elements_is_deterministic — ensures hash is stable across calls
  • New test large_list_with_nullable_inner_elements_does_not_panic — same fix applies to LargeList
  • New test list_nullable_inner_null_placement_affects_hash — different null positions yield different hashes (correctness)
  • All 49 existing tests continue to pass

🤖 Generated with Claude Code

…leaf arrays

Arrow's PrimitiveArray::slice() trims the ScalarBuffer to start at
offset 0 but keeps the NullBuffer's internal bit-offset pointing into
the original bitmap (e.g. bit-offset=3 for the second list element).

The previous code called ArrayDataBuilder::null_bit_buffer(), which
clears the pre-existing NullBuffer (self.nulls = None) and then
build() reconstructs a new NullBuffer from the raw bytes using the
*array* offset (0), not the bitmap's bit-offset.  This causes
validate_nulls() to count nulls at bits 0..N instead of the correct
bit-offset..bit-offset+N range, triggering a null_count mismatch
panic for any List<nullable T> whose inner values contain actual nulls.

Fix: replace the .null_count() + .null_bit_buffer() calls with a
single .nulls(Some(effective.clone())) call, which stores the
NullBuffer directly without destroying its bit-offset.

Closes PLT-1047

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes a panic in ArrowDigesterCore::traverse_leaf when hashing list arrays whose sliced child (leaf) arrays have a NullBuffer with a non-zero bit-offset (e.g., List<nullable T> where inner elements contain actual nulls). The change preserves the NullBuffer bit-offset when rebuilding ArrayData, and adds regression tests covering the reported scenarios.

Changes:

  • Rebuild leaf ArrayData using ArrayDataBuilder::nulls(Some(NullBuffer)) to preserve NullBuffer bit-offset during hashing.
  • Add regression tests ensuring hash_array does not panic for List<nullable> and LargeList<nullable> with inner nulls.
  • Add correctness test asserting different inner-null placements produce different hashes.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
src/arrow_digester_core.rs Preserves NullBuffer bit-offset when rebuilding leaf arrays with combined nulls to prevent validation panics and incorrect null interpretation.
tests/arrow_digester.rs Adds PLT-1047 regression + correctness tests for list arrays with nullable inner elements.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread tests/arrow_digester.rs Outdated
Comment on lines +1650 to +1652
/// Regression test: `List<nullable>` with nulls at the outer level and inner level
/// must not panic and must produce a deterministic hash.
#[test]
Copy link

Copilot AI Mar 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The doc comment for this test says it covers nulls at both the outer list level and inner element level, but the constructed array has no outer-level nulls (no None list entries). Either add an outer None entry to the input data or adjust the comment to match what the test actually exercises.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! The doc comment was indeed inaccurate — the array only contained inner-level nulls with no outer None list entries. I've updated the comment to say "nulls in the inner elements" so it accurately describes what the test exercises. No data changes were needed since the inner-null determinism scenario is the one we want to confirm.

The doc comment claimed the test covered outer-level nulls too, but
the array had no None list entries. Corrected the comment to say
"inner elements" only.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@kurodo3
Copy link
Copy Markdown
Contributor Author

kurodo3 Bot commented Mar 20, 2026

Review round summary

One change made in response to Copilot's review:

  • tests/arrow_digester.rs — corrected doc comment on list_with_nullable_inner_elements_is_deterministic: The original comment said the test covered "nulls at the outer level and inner level", but the array only contained inner nulls (no None list entries). Updated the comment to accurately say "nulls in the inner elements". No test data or assertions changed.

All 49 integration tests + 17 unit tests continue to pass.

@eywalker eywalker merged commit e401599 into main Mar 20, 2026
5 checks passed
@eywalker eywalker deleted the eywalker/plt-1047-bug-rust-starfix-panics-on-hash_array-with-listnullable-leaf branch March 20, 2026 07:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants