Conversation
…leaf arrays Arrow's PrimitiveArray::slice() trims the ScalarBuffer to start at offset 0 but keeps the NullBuffer's internal bit-offset pointing into the original bitmap (e.g. bit-offset=3 for the second list element). The previous code called ArrayDataBuilder::null_bit_buffer(), which clears the pre-existing NullBuffer (self.nulls = None), and then build() reconstructs a new NullBuffer from the raw bytes using the *array* offset (0), not the bitmap's bit-offset. This causes validate_nulls() to count nulls at bits 0..N instead of the correct bit-offset..bit-offset+N range, triggering a null_count mismatch panic for any List<nullable T> whose inner values contain actual nulls.

Fix: replace the .null_count() + .null_bit_buffer() calls with a single .nulls(Some(effective.clone())) call, which stores the NullBuffer directly without destroying its bit-offset.

Closes PLT-1047

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
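The miscount described in the commit message can be illustrated with a std-only sketch. This models only the bitmap arithmetic, not the arrow crate itself; `count_set_bits` is a hypothetical helper. Reading validity bits at the array offset (0) instead of the bitmap's bit-offset yields a different valid/null count for the same slice:

```rust
// Std-only model of the bug: a validity bitmap for 8 values, where a
// sliced child array of length 3 starts at bit-offset 3 in the bitmap.
fn count_set_bits(bitmap: &[u8], start: usize, len: usize) -> usize {
    (start..start + len)
        .filter(|&i| bitmap[i / 8] & (1u8 << (i % 8)) != 0)
        .count()
}

fn main() {
    // Bitmap 0b0011_0111 (LSB first): elements 0,1,2,4,5 valid; 3,6,7 null.
    let bitmap = [0b0011_0111u8];
    let (bit_offset, len) = (3, 3); // the sliced child covers elements 3..6

    // Correct: read validity at bit_offset..bit_offset+len -> 2 valid, 1 null.
    let valid_correct = count_set_bits(&bitmap, bit_offset, len);
    // Buggy path: rebuild from raw bytes at the array offset (0) -> 3 valid, 0 nulls.
    let valid_buggy = count_set_bits(&bitmap, 0, len);

    // The disagreement in null counts is exactly what validate_nulls() panics on.
    println!("correct valid = {valid_correct}, buggy valid = {valid_buggy}");
}
```

The two counts disagree (2 valid vs. 3 valid), which is the `null_count` mismatch that `validate_nulls()` turns into a panic.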
Pull request overview
Fixes a panic in ArrowDigesterCore::traverse_leaf when hashing list arrays whose sliced child (leaf) arrays have a NullBuffer with a non-zero bit-offset (e.g., List<nullable T> where inner elements contain actual nulls). The change preserves the NullBuffer bit-offset when rebuilding ArrayData, and adds regression tests covering the reported scenarios.
Changes:
- Rebuild leaf `ArrayData` using `ArrayDataBuilder::nulls(Some(NullBuffer))` to preserve the `NullBuffer` bit-offset during hashing.
- Add regression tests ensuring `hash_array` does not panic for `List<nullable>` and `LargeList<nullable>` with inner nulls.
- Add a correctness test asserting that different inner-null placements produce different hashes.
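The intent of the regression and correctness tests above can be sketched std-only. `hash_leaf` here is a hypothetical stand-in for the crate's hashing (not its real API): repeated calls must be deterministic, and the same values with nulls in different positions must digest differently.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical stand-in: hash a leaf as values plus per-element validity.
fn hash_leaf(values: &[i32], valid: &[bool]) -> u64 {
    let mut h = DefaultHasher::new();
    for (v, ok) in values.iter().zip(valid) {
        ok.hash(&mut h);           // every slot contributes its validity flag
        if *ok {
            v.hash(&mut h);        // only valid slots contribute their value
        }
    }
    h.finish()
}

fn main() {
    let values = [1, 2, 3];
    let a = hash_leaf(&values, &[true, false, true]);
    let b = hash_leaf(&values, &[false, true, true]);

    assert_eq!(a, hash_leaf(&values, &[true, false, true])); // deterministic
    assert_ne!(a, b); // null placement affects the hash
    println!("ok");
}
```

This mirrors why the bug mattered beyond the panic: if validity is read at the wrong bit-offset, the hash would silently cover the wrong null positions.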
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `src/arrow_digester_core.rs` | Preserves `NullBuffer` bit-offset when rebuilding leaf arrays with combined nulls, to prevent validation panics and incorrect null interpretation. |
| `tests/arrow_digester.rs` | Adds PLT-1047 regression + correctness tests for list arrays with nullable inner elements. |
```rust
/// Regression test: `List<nullable>` with nulls at the outer level and inner level
/// must not panic and must produce a deterministic hash.
#[test]
```
The doc comment for this test says it covers nulls at both the outer list level and inner element level, but the constructed array has no outer-level nulls (no None list entries). Either add an outer None entry to the input data or adjust the comment to match what the test actually exercises.
Good catch! The doc comment was indeed inaccurate — the array only contained inner-level nulls with no outer None list entries. I've updated the comment to say "nulls in the inner elements" so it accurately describes what the test exercises. No data changes were needed since the inner-null determinism scenario is the one we want to confirm.
The doc comment claimed the test covered outer-level nulls too, but the array had no None list entries. Corrected the comment to say "inner elements" only.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Review round summary

One change made in response to Copilot's review: corrected the regression test's doc comment to describe inner-element nulls only, matching the data the test actually constructs.
All 49 integration tests + 17 unit tests continue to pass.
Summary
Fixes PLT-1047 — `ArrowDigester::hash_array` panics on `List<nullable T>` when inner elements contain actual null values.

Root cause: `ArrayDataBuilder::null_bit_buffer()` silently clears the builder's pre-existing `nulls` field (`self.nulls = None`). Arrow's `PrimitiveArray::slice()` produces arrays where the `ScalarBuffer` is trimmed to start at element 0 but the `NullBuffer` retains its original bit-offset (e.g. 3 for the second list element). When `build()` then reconstructs the `NullBuffer` from the raw bytes, it uses the array's data offset (0) rather than the bitmap's own bit-offset (3), so `validate_nulls()` counts nulls at bits 0..N instead of the correct range and panics with a `null_count` mismatch.

Fix: Replace `.null_count(n).null_bit_buffer(Some(raw_buf))` with `.nulls(Some(effective.clone()))`. This stores the `NullBuffer` directly without discarding its internal bit-offset, so both validation and downstream hash operations see the correct null positions.

Test plan

- `list_with_nullable_inner_elements_does_not_panic` — reproduces the exact panic from the issue report
- `list_with_nullable_inner_elements_is_deterministic` — ensures the hash is stable across calls
- `large_list_with_nullable_inner_elements_does_not_panic` — same fix applies to `LargeList`
- `list_nullable_inner_null_placement_affects_hash` — different null positions yield different hashes (correctness)

🤖 Generated with Claude Code
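The builder behavior behind the fix can be modeled with a std-only sketch. `MockBuilder` and `MockNullBuffer` are invented stand-ins, not the real arrow-rs types; they only model the property the PR relies on — that handing the builder raw bytes loses the bit-offset, while storing the null buffer whole preserves it:

```rust
// Std-only model (NOT the arrow-rs API) of the two builder paths.
#[derive(Clone)]
struct MockNullBuffer {
    bytes: Vec<u8>,
    bit_offset: usize, // where this array's validity bits start in `bytes`
    len: usize,
}

struct MockBuilder {
    nulls: Option<MockNullBuffer>,
}

impl MockBuilder {
    // Models `.null_bit_buffer()`: only raw bytes survive, so the rebuilt
    // buffer ends up anchored at the array offset (0), not the bit-offset.
    fn null_bit_buffer(mut self, bytes: Vec<u8>, len: usize) -> Self {
        self.nulls = Some(MockNullBuffer { bytes, bit_offset: 0, len });
        self
    }

    // Models `.nulls(Some(effective.clone()))`: the buffer is stored whole,
    // bit-offset included.
    fn nulls(mut self, n: Option<MockNullBuffer>) -> Self {
        self.nulls = n;
        self
    }
}

fn main() {
    let effective = MockNullBuffer { bytes: vec![0b0011_0111], bit_offset: 3, len: 3 };

    let lossy = MockBuilder { nulls: None }
        .null_bit_buffer(effective.bytes.clone(), effective.len);
    let preserved = MockBuilder { nulls: None }.nulls(Some(effective.clone()));

    assert_eq!(lossy.nulls.unwrap().bit_offset, 0);     // offset lost
    assert_eq!(preserved.nulls.unwrap().bit_offset, 3); // offset kept
    println!("ok");
}
```

Under this model, the one-line swap in the PR is exactly the move from the lossy path to the preserving one.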
list_with_nullable_inner_elements_does_not_panic— reproduces the exact panic from the issue reportlist_with_nullable_inner_elements_is_deterministic— ensures hash is stable across callslarge_list_with_nullable_inner_elements_does_not_panic— same fix applies toLargeListlist_nullable_inner_null_placement_affects_hash— different null positions yield different hashes (correctness)🤖 Generated with Claude Code