fix: use BitVec<u8, Lsb0> for platform-independent bit ordering by eywalker · Pull Request #7 · nauticalab/starfix

eywalker · 2026-03-07T02:59:25Z

Summary

Validity bitmap: Changed from default BitVec (BitVec<usize, Lsb0>) to BitVec<u8, Lsb0> — fixes platform-dependent word size (usize is 4 bytes on 32-bit, 8 bytes on 64-bit)
Boolean value packing: Changed from BitVec<u8, Msb0> to BitVec<u8, Lsb0> — consistent with Arrow's native bit ordering
Bitmap length serialization: Cast len() to u64 before to_le_bytes() for stable 8-byte output regardless of platform

All bit vectors now use BitVec<u8, Lsb0>, matching Arrow's native validity bitmap layout.

Test plan

All 32 lib tests pass (including updated byte-level boolean packing tests)
All 19 integration tests pass (golden hash values updated)
13 ignored tests (pre-existing, for known unfixed bugs) unchanged

🤖 Generated with Claude Code

…ordering The validity bitmap and boolean value packing previously used inconsistent BitVec parameterizations: - Validity bitmap: default BitVec (BitVec<usize, Lsb0>) — platform-dependent word size (4 bytes on 32-bit, 8 bytes on 64-bit) - Boolean packing: BitVec<u8, Msb0> — fixed size but non-Arrow bit order This changes both to BitVec<u8, Lsb0>, which: 1. Matches Arrow's native validity bitmap layout (LSB-first, u8 words) 2. Is platform-independent (u8 words are always 1 byte) 3. Uses consistent bit ordering throughout the codebase 4. Casts bitmap length to u64 before to_le_bytes() for stable serialization Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Copilot

Pull request overview

This PR makes Arrow digest hashing deterministic across platforms by standardizing bitmap storage to BitVec<u8, Lsb0>, aligning boolean bit-packing with Arrow’s native LSB-first ordering, and ensuring bitmap length serialization is stable.

Changes:

Switch validity bitmaps from default BitVec (usize-backed) to BitVec<u8, Lsb0> for platform-independent raw storage.
Change boolean value packing from Msb0 to Lsb0 to match Arrow’s bit ordering.
Serialize bitmap bit-length as u64 before hashing to ensure consistent 8-byte output.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File	Description
`src/arrow_digester_core.rs`	Standardizes bitmap representation and boolean packing; updates bitmap-length hashing for cross-platform stability.
`tests/arrow_digester.rs`	Updates golden digests to reflect the new stable bit ordering/serialization.

Comments suppressed due to low confidence (1)

src/arrow_digester_core.rs:338

Boolean hashing for non-nullable arrays only feeds the packed bytes into the digest. Because the last byte is padded, different boolean lengths can produce identical byte streams (e.g., len=1 and len=8 with the same first bit), leading to guaranteed hash collisions. Prefix the boolean bit-length (or element count) into the digest before updating with bit_vec.as_raw_slice() so arrays with different lengths cannot collide.

                        for i in 0..bool_array.len() {
                            bit_vec.push(bool_array.value(i));
                        }

                        data_digest.update(bit_vec.as_raw_slice());

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-07T03:02:47Z

+
+

There are two extra blank lines before the use crate::arrow_digester_core::{...}; import; please remove them to keep the test module formatting consistent.

Suggested change

Copilot · 2026-03-07T03:02:48Z

            DigestBufferType::Nullable(null_bit_digest, data_digest) => {
-                final_digest.update(null_bit_digest.len().to_le_bytes());
+                final_digest.update((null_bit_digest.len() as u64).to_le_bytes());
                for &word in null_bit_digest.as_raw_slice() {
                    final_digest.update(word.to_be_bytes());
                }


Now that the validity bitmap uses BitVec<u8, Lsb0>, iterating each byte and calling to_be_bytes() is redundant and adds per-byte update overhead. You can update the digest in one call with the raw slice (and drop the clippy::big_endian_bytes expect) to keep this path simpler and faster.

eywalker · 2026-03-07T03:51:23Z

This is superseded by PR #9

Copilot AI review requested due to automatic review settings March 7, 2026 02:59

Copilot started reviewing on behalf of eywalker March 7, 2026 02:59 View session

Copilot AI reviewed Mar 7, 2026

View reviewed changes

eywalker closed this Mar 7, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: use BitVec<u8, Lsb0> for platform-independent bit ordering#7

fix: use BitVec<u8, Lsb0> for platform-independent bit ordering#7
eywalker wants to merge 1 commit intomainfrom
fix/bitvec-lsb0-platform-independent

eywalker commented Mar 7, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Mar 7, 2026

Uh oh!

Copilot AI Mar 7, 2026

Uh oh!

eywalker commented Mar 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

eywalker commented Mar 7, 2026

Summary

Test plan

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Mar 7, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 7, 2026

Choose a reason for hiding this comment

Uh oh!

eywalker commented Mar 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants