diff --git a/CLAUDE.md b/CLAUDE.md new file mode 100644 index 0000000..44548be --- /dev/null +++ b/CLAUDE.md @@ -0,0 +1,14 @@ +# Project Instructions + +## Formatting + +Always run `cargo fmt` before committing. Formatting is enforced in CI via GitHub Actions. + +## Test-Driven Development + +When implementing new features or fixing bugs: + +1. Write tests first that check the desired behavior. +2. Verify the new tests fail (confirming they catch the issue / check the right thing). +3. Implement the fix or feature. +4. Verify all previously failing tests now pass. diff --git a/docs/byte-layout-spec.md b/docs/byte-layout-spec.md new file mode 100644 index 0000000..40abf27 --- /dev/null +++ b/docs/byte-layout-spec.md @@ -0,0 +1,995 @@ +# Starfix Byte Layout Specification + +This document describes the **exact byte-level serialization** used by Starfix to compute deterministic hashes of Apache Arrow schemas and record batches. Every byte fed into SHA-256 is specified here, making it possible to implement a compatible hasher in any language. + +All multi-byte integers use **little-endian** byte order unless explicitly stated otherwise. + +--- + +## 1. Output Format + +Every Starfix hash is **35 bytes**: + +``` +[version: 3 bytes] [SHA-256 digest: 32 bytes] +``` + +The version prefix is currently `0x00 0x00 0x01` (version 0.0.1). + +When displayed as hex, a hash looks like: + +``` +000001 <64 hex chars of SHA-256> +``` + +--- + +## 2. Schema Serialization + +### 2.1 Canonical JSON String + +The schema is serialized as a **compact JSON string** (no whitespace) of an object where: + +- **Keys** are field names, sorted alphabetically (via `BTreeMap`). +- **Values** are objects with keys `"data_type"` and `"nullable"`, with JSON keys sorted alphabetically within every nested object (recursively). 
Because all JSON object keys are sorted recursively, the key order is always `"data_type"` before `"nullable"` (and `"data_type"` before `"name"` before `"nullable"` for struct children).

#### Type Canonicalization

Before serialization, these logical equivalence classes are collapsed:

| Arrow type(s) | Canonical JSON form |
|----------------------------|-------------------------------|
| `Binary`, `LargeBinary` | `"LargeBinary"` |
| `Utf8`, `LargeUtf8` | `"LargeUtf8"` |
| `List(f)`, `LargeList(f)` | `{"LargeList": <element>}` |
| `Dictionary(k, v)` | canonical form of `v` |

#### Nested Type Serialization

**Struct fields** are serialized as:
```json
{"Struct": [<child>, <child>, ...]}
```
Each child object: `{"data_type": ..., "name": "<field name>", "nullable": <bool>}`.

**List / LargeList elements** are serialized as:
```json
{"LargeList": {"data_type": ..., "nullable": <bool>}}
```
Note: the Arrow-internal field name (typically `"item"`) is **omitted** — only `data_type` and `nullable` are included.

**Primitive types** use Arrow's built-in serde:
- `"Int32"`, `"Boolean"`, `"Float64"`, `"LargeBinary"`, `"LargeUtf8"`, etc.
- `{"Decimal128": [38, 5]}`, `{"Time32": "Second"}`, etc.

### 2.2 Schema Digest

```
schema_digest = SHA-256(canonical_json_string_bytes)
```

The UTF-8 bytes of the JSON string are fed directly into SHA-256. The result is 32 bytes.

### 2.3 Concrete Example

Schema: `{name: LargeUtf8 nullable, age: Int32 non-nullable}`

Canonical JSON string (compact, keys sorted):
```
{"age":{"data_type":"Int32","nullable":false},"name":{"data_type":"LargeUtf8","nullable":true}}
```

Note: `"age"` comes before `"name"` alphabetically, and `"data_type"` comes before `"nullable"`.

```
schema_digest = SHA-256(b'{"age":{"data_type":"Int32","nullable":false},"name":{"data_type":"LargeUtf8","nullable":true}}')
```

---

## 3. Field Data Serialization

The schema is recursively decomposed into a `BTreeMap` of entries.
**Leaf fields** and **list intermediate nodes** get their own entries. **Struct fields are transparent** — they do not create entries themselves; instead, their null validity is AND-propagated to descendant entries, and their children are recursively traversed.

Each entry has a **digest buffer** containing up to three **optional** components:

| Component | Present when | Purpose |
|-----------|-------------|---------|
| `null_bits` (BitVec) | field is nullable | Tracks which elements are valid vs null |
| `structural` (SHA-256) | entry is a list type (`List` or `LargeList`) | Accumulates element counts (structure) |
| `data` (SHA-256) | leaf fields and list-leaf entries | Accumulates leaf data bytes |

There are five entry types:

| Entry type | `null_bits` | `structural` | `data` | Example |
|------------|:-----------:|:------------:|:------:|---------|
| **data-only** | — | — | yes | Non-nullable leaf field (e.g., `Int32`) |
| **validity + data** | yes | — | yes | Nullable leaf field |
| **validity-only** | yes | — | — | Nullable parent whose descendants have their own entries |
| **structural-only** | — | yes | — | Non-nullable list whose value type is a struct or nested list |
| **list_leaf** | optional | yes | yes | List whose value type is a leaf (e.g., `List<Int32>`) |

**Naming convention**: Struct adds `/fieldname` to the path. List adds a trailing `/`. Nested lists add `//`, etc.

This separation of structural information from leaf data ensures that list element boundaries are hashed independently from the values they contain. For example, `[[1,2],[3]]` and `[[1],[2,3]]` differ in their structural digest (element counts `[2,1]` vs `[1,2]`) even though their leaf data digest is identical (`[1,2,3]`).
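This separation can be cross-checked with a short sketch. The helper below is illustrative only (it is not Starfix code); it hashes a list-of-Int32 column's element counts and leaf values into two separate digests, using the byte rules detailed later (counts as u64 LE, values as i32 LE):

```python
import hashlib
import struct

def list_digests(rows):
    """Hash element counts (structural) and leaf values (data) separately."""
    structural = hashlib.sha256()
    data = hashlib.sha256()
    for row in rows:
        structural.update(struct.pack("<Q", len(row)))  # count as u64 LE
        for v in row:
            data.update(struct.pack("<i", v))           # value as i32 LE
    return structural.hexdigest(), data.hexdigest()

s1, d1 = list_digests([[1, 2], [3]])
s2, d2 = list_digests([[1], [2, 3]])
assert d1 == d2   # identical leaf data: [1, 2, 3] in both cases
assert s1 != s2   # different element counts: [2, 1] vs [1, 2]
```

Because the data digests collide by construction, only the structural digest distinguishes the two layouts — which is exactly why both components are kept.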
+ +### 3.1 Fixed-Size Types + +**Types**: `Int8`, `UInt8`, `Int16`, `UInt16`, `Int32`, `UInt32`, `Int64`, `UInt64`, `Float16`, `Float32`, `Float64`, `Date32`, `Date64`, `Time32(*)`, `Time64(*)`, `Decimal32`, `Decimal64`, `Decimal128`, `Decimal256`, `FixedSizeBinary(n)`. + +| Type | Bytes per element | +|------|-------------------| +| Int8 / UInt8 | 1 | +| Int16 / UInt16 / Float16 | 2 | +| Int32 / UInt32 / Float32 / Date32 / Decimal32 / Time32 | 4 | +| Int64 / UInt64 / Float64 / Date64 / Decimal64 / Time64 | 8 | +| Decimal128 | 16 | +| Decimal256 | 32 | +| FixedSizeBinary(n) | n | + +**Non-nullable path**: The entire contiguous byte buffer (all elements concatenated, little-endian) is fed into the data digest in a single update. + +**Nullable path**: +1. For each element `i`, push `is_valid(i)` (true=1, false=0) into the validity `BitVec`. +2. For each **valid** element, feed its little-endian bytes into the data digest. +3. **Null elements are skipped entirely** — no data bytes are fed. + +If a nullable field has no actual nulls (null buffer absent), all elements are marked valid and the entire buffer is fed in one update (same as non-nullable data path). + +### 3.2 Boolean Type + +Boolean values are **bit-packed** using **LSB-first** (`Lsb0`) ordering into bytes. + +**Non-nullable**: All values are packed sequentially into a `BitVec`, then the raw bytes are fed into the data digest. + +**Nullable**: +1. Extend the validity `BitVec` as usual. +2. Only **valid** values are packed (nulls are skipped). +3. The packed bytes are fed into the data digest. + +**Example**: `[true, NULL, false, true]` (nullable, 4 elements) +- Validity bits: `[1, 0, 1, 1]` +- Data bits (valid only): `[true, false, true]` → Lsb0 packed: `00000_1_0_1` = `0x05` +- Bytes fed to data digest: `[0x05]` + +### 3.3 Variable-Length Types (Binary, String) + +**Types**: `Binary`, `LargeBinary`, `Utf8`, `LargeUtf8`. 
Each element is serialized as:
```
[length as u64 little-endian: 8 bytes] [raw bytes: length bytes]
```

The length prefix is **always `u64`** (8 bytes, little-endian) regardless of the Arrow offset type.

**Non-nullable**: For each element, feed `(len as u64).to_le_bytes()` then the raw bytes.

**Nullable**:
1. Extend the validity `BitVec`.
2. For valid elements: feed length prefix + raw bytes.
3. For null elements: **skip entirely** — no bytes fed to data digest.

### 3.4 List Types (Record-Batch Path)

**Types**: `List(field)`, `LargeList(field)`.

List columns are **recursively decomposed** into separate BTreeMap entries. A list creates an intermediate entry at `path/` (path + delimiter). The value type is then recursively traversed to create further entries.

**Decomposition by value type:**

- **`List<leaf>`** (e.g., `List<Int32>`): The entry at `path/` is a **list-leaf** with both structural and data digests. List lengths go to structural; leaf values go to data.
- **`List<Struct<...>>`**: The entry at `path/` is **structural-only** (list lengths). The struct is transparent, and each struct child creates its own entry at `path//childname`.
- **`List<List<...>>`**: The entry at `path/` is structural-only. The inner list creates another entry at `path//`, and so on recursively.

**Nullable list columns**: The column-level entry at `path` (without trailing `/`) is **validity-only**, recording which rows are null vs valid. Null list elements are not traversed — no structural or data bytes are written for them.

**Traversal**: For each non-null list element, write the sub-array length (u64 LE) to the structural digest at `path/`, then recurse into the sub-array using the value type.
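The path rules for nested lists can be sketched as a tiny generator (a hypothetical illustration, not crate code): each list level appends one `/`, intermediate levels are structural-only, and the innermost level is a list-leaf when the value type is a leaf:

```python
def list_entry_paths(depth, path="col"):
    """Entry paths for a non-nullable list nested `depth` levels deep
    whose innermost value type is a leaf (e.g., Int32)."""
    entries = []
    for level in range(depth):
        path += "/"  # each list level adds a trailing delimiter
        kind = "list-leaf" if level == depth - 1 else "structural-only"
        entries.append((path, kind))
    return entries

# List<Int32>       -> [("col/", "list-leaf")]
# List<List<Int32>> -> [("col/", "structural-only"), ("col//", "list-leaf")]
```

A nullable column would additionally contribute a validity-only entry at the bare path `col`, as described above.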
+ +#### Concrete Example: Structural vs Leaf Separation + +For `LargeList` (non-nullable) with data `[[1,2],[3]]`: + +The single entry at `col/` is a list-leaf: + +``` +structural digest receives: + 02 00 00 00 00 00 00 00 (element 0: 2 items, u64 LE) + 01 00 00 00 00 00 00 00 (element 1: 1 item, u64 LE) + +data digest receives: + 01 00 00 00 (1 as i32 LE) + 02 00 00 00 (2 as i32 LE) + 03 00 00 00 (3 as i32 LE) +``` + +Compare with `[[1],[2,3]]`: same data digest but different structural digest — so the final hashes differ. + +### 3.5 Struct Types (Record-Batch Path) + +Struct fields are **transparent** in the record-batch path — they do not create a BTreeMap entry. Instead: + +1. **Children are traversed** in alphabetical order by field name. +2. **Struct-level nulls are AND-propagated** to all descendant entries. If a struct row is null, none of its children's data is hashed for that row, and the null is reflected in each descendant's effective validity. +3. Each child is recursively decomposed (leaf → data entry, list → structural entry, nested struct → recurse further). + +**Example**: A struct field `address` with children `city` (LargeUtf8) and `zip` (Int32) creates two leaf entries: `address/city` and `address/zip`. No entry exists for `address` itself. + +### 3.6 Dictionary-Encoded Arrays + +Dictionary arrays are **resolved to their plain equivalent** before hashing. The dictionary is unpacked so that the data stream is identical to a non-dictionary array with the same logical values. + +--- + +## 4. Field Digest Finalization + +After all record batches have been fed, each entry's digest buffer is finalized and fed into the **final combining digest**. Each entry may have up to three optional components, written in this fixed order (skipping absent components): + +``` +1. null_bits (if present — nullable entries only) +2. structural (if present — list entries only) +3. 
data (if present — leaf and list-leaf entries only) +``` + +### 4.1 Data-Only Entry + +``` +final_digest.update( SHA-256(data_bytes).finalize() ) // 32 bytes +``` + +### 4.2 Validity + Data Entry (Nullable Leaf) + +``` +final_digest.update( bit_count.to_le_bytes() ) // 8 bytes (u64 LE) +for each word in validity_bitvec.as_raw_slice(): // each word is u8 (1 byte) + final_digest.update( word.to_le_bytes() ) // 1 byte per word (u8, LE is trivial) +final_digest.update( SHA-256(data_bytes).finalize() ) // 32 bytes +``` + +### 4.3 Validity-Only Entry + +``` +final_digest.update( bit_count.to_le_bytes() ) // 8 bytes (u64 LE) +for each word in validity_bitvec.as_raw_slice(): + final_digest.update( word.to_le_bytes() ) // 1 byte per word (u8) +``` + +No structural or data digest is written. + +### 4.4 Structural-Only Entry + +``` +final_digest.update( SHA-256(structural_bytes).finalize() ) // 32 bytes (element counts) +``` + +### 4.5 List-Leaf Entry (Structural + Data) + +``` +final_digest.update( SHA-256(structural_bytes).finalize() ) // 32 bytes (element counts) +final_digest.update( SHA-256(data_bytes).finalize() ) // 32 bytes (leaf values) +``` + +If nullable, prepend null_bits before structural: + +``` +final_digest.update( bit_count.to_le_bytes() ) // 8 bytes (u64 LE) +for each word in validity_bitvec.as_raw_slice(): + final_digest.update( word.to_le_bytes() ) // 1 byte per word (u8) +final_digest.update( SHA-256(structural_bytes).finalize() ) // 32 bytes +final_digest.update( SHA-256(data_bytes).finalize() ) // 32 bytes +``` + +**Validity BitVec details** (applies to all entries with `null_bits`): +- Storage type: `u8` (1 byte per word). +- Bit order: `Lsb0` (least significant bit first within each word). +- `bit_count` = total number of elements (valid + null), serialized as `u64` little-endian (8 bytes). +- Each storage word is serialized as `u8` little-endian (trivially 1 byte). +- The last word may have unused high bits (zero-padded). + +--- + +## 5. 
Final Combining Digest + +The final hash is computed by feeding into a fresh SHA-256: + +``` +final_digest = SHA-256() + +// 1. Schema digest (32 bytes) +final_digest.update( schema_digest ) + +// 2. Field digests in alphabetical order of field path +for field_path in sorted(field_paths): + finalize field's DigestBufferType into final_digest (see Section 4) + +raw_hash = final_digest.finalize() // 32 bytes +output = [0x00, 0x00, 0x01] ++ raw_hash // 35 bytes +``` + +--- + +## 6. `hash_array` API + +The `hash_array` function hashes a single array (without a schema context). It uses the **same recursive decomposition** as the record-batch path, ensuring consistent hashing regardless of which API is used: + +``` +final_digest = SHA-256() + +// 1. Type metadata (canonical JSON string) +canonical_type = data_type_to_value(effective_data_type) +json_string = JSON.serialize(canonical_type) // compact, keys sorted +final_digest.update( json_string.as_bytes() ) + +// 2. Build BTreeMap entries from the type tree (same as record-batch path) +fields = extract_type_entries(effective_data_type, nullable, root_path="") + +// 3. Traverse and populate entries +traverse_and_update(effective_data_type, nullable, effective_array, "", fields) + +// 4. Finalize all entries into the digest (same order as record-batch finalize) +for (_, entry) in fields: + finalize_digest(final_digest, entry) // see Section 4 + +raw_hash = final_digest.finalize() // 32 bytes +output = [0x00, 0x00, 0x01] ++ raw_hash // 35 bytes +``` + +Dictionary arrays are resolved to their value type before hashing. + +--- + +## 7. 
Worked Examples + +### Example A: Simple Two-Column Table + +**Schema**: `{age: Int32 non-nullable, name: LargeUtf8 nullable}` + +**Data** (1 record batch, 2 rows): + +| age | name | +|-----|---------| +| 25 | "Alice" | +| 30 | NULL | + +#### Step 1: Schema Digest + +Canonical JSON (compact): +``` +{"age":{"data_type":"Int32","nullable":false},"name":{"data_type":"LargeUtf8","nullable":true}} +``` + +``` +schema_digest = SHA-256("{"age":{"data_type":"Int32","nullable":false},"name":{"data_type":"LargeUtf8","nullable":true}}") +``` + +#### Step 2: Field "age" (Int32, non-nullable) + +Values: `[25, 30]` + +Little-endian bytes: +- 25 as i32 LE: `19 00 00 00` +- 30 as i32 LE: `1e 00 00 00` + +Data fed to digest: `19 00 00 00 1e 00 00 00` (8 bytes, one contiguous slice) + +``` +age_data_digest = SHA-256(0x19000000_1e000000) +``` + +Finalization into final_digest (non-nullable): +``` +final_digest.update( age_data_digest.finalize() ) // 32 bytes +``` + +#### Step 3: Field "name" (LargeUtf8, nullable) + +Values: `["Alice", NULL]` + +**Validity bits** (Lsb0 in u8 words): +- Element 0 ("Alice"): valid → bit = 1 +- Element 1 (NULL): null → bit = 0 +- BitVec contents: bits `[1, 0]`, bit_count = 2 +- As u8 (Lsb0): bit 0 = 1, bit 1 = 0 → binary `0000_0001` = 1 +- `as_raw_slice()` = `[1_u8]` + +Validity serialization: +``` +bit_count LE: 02 00 00 00 00 00 00 00 (2 as u64 little-endian) +word 0 LE: 01 (1 as u8) +``` + +**Data bytes** (only valid elements): +- "Alice": length 5 as u64 LE = `05 00 00 00 00 00 00 00`, then UTF-8 bytes `41 6c 69 63 65` +- NULL: skipped entirely + +``` +name_data_digest = SHA-256(0x0500000000000000_416c696365) +``` + +Finalization into final_digest (nullable): +``` +final_digest.update( 0x0200000000000000 ) // bit count (u64 LE) +final_digest.update( 0x01 ) // word 0 (u8) +final_digest.update( name_data_digest.finalize() ) // 32 bytes +``` + +#### Step 4: Final Combination + +Fields in alphabetical order: `age`, then `name`. 
```
final_digest = SHA-256()
final_digest.update( schema_digest ) // 32 bytes
final_digest.update( age_data_digest.finalize() ) // 32 bytes (non-nullable)
final_digest.update( 0x0200000000000000 ) // name bit count (u64 LE)
final_digest.update( 0x01 ) // name validity word (u8)
final_digest.update( name_data_digest.finalize() ) // 32 bytes
raw_hash = final_digest.finalize()
output = 0x000001 ++ raw_hash
```

---

### Example B: Boolean Array with Nulls (hash_array API)

**Array**: `BooleanArray [true, NULL, false, true]` (nullable)

#### Step 1: Type Metadata

Canonical type JSON: `"Boolean"` (9 bytes as UTF-8, including the surrounding quotes)

```
final_digest.update(b'"Boolean"')
```

Note: `serde_json::to_string` of a JSON string value includes the surrounding quotes.

#### Step 2: Data

**Validity bits** (Lsb0 in u8):
- `[1, 0, 1, 1]` → bits: b0=1, b1=0, b2=1, b3=1
- As u8 (Lsb0): binary `0000_1101` = 13
- `as_raw_slice()` = `[13_u8]`

**Data bits** (Lsb0 packed, valid values only):
- Valid values: `[true, false, true]` (3 values)
- Lsb0 packing: bit0=true(1), bit1=false(0), bit2=true(1), bits3-7=0
- Byte: `00000101` = `0x05`

```
data_digest = SHA-256(0x05)
```

#### Step 3: Finalization

```
final_digest = SHA-256()
final_digest.update(b'"Boolean"') // type metadata
final_digest.update( 0x0400000000000000 ) // 4 bits (bit count as u64 LE)
final_digest.update( 0x0D ) // 13 as u8
final_digest.update( data_digest.finalize() ) // 32 bytes
raw_hash = final_digest.finalize()
output = 0x000001 ++ raw_hash
```

---

### Example C: Non-Nullable Int32 Array (hash_array API)

**Array**: `Int32Array [1, 2, 3]` (non-nullable)

#### Step 1: Type Metadata

Canonical type JSON: `"Int32"` (7 bytes: `22 49 6e 74 33 32 22`).

`data_type_to_value` for `Int32` produces the JSON string value `"Int32"`; serializing that value with `serde_json::to_string` includes the surrounding quotes, yielding the 7-byte sequence above.

```
final_digest.update(b'"Int32"') // 7 bytes: 22 49 6e 74 33 32 22
```

#### Step 2: Data

Values as i32 LE bytes:
- 1: `01 00 00 00`
- 2: `02 00 00 00`
- 3: `03 00 00 00`

Entire buffer fed as one slice: `01 00 00 00 02 00 00 00 03 00 00 00` (12 bytes)

```
data_digest = SHA-256(0x010000000200000003000000)
```

#### Step 3: Finalization (non-nullable)

```
final_digest = SHA-256()
final_digest.update(b'"Int32"') // 7 bytes
final_digest.update( data_digest.finalize() ) // 32 bytes
raw_hash = final_digest.finalize()
output = 0x000001 ++ raw_hash
```

---

### Example D: Binary Array (hash_array API)

**Array**: `BinaryArray [b"hi", b""]` (non-nullable)

#### Step 1: Type Metadata

`Binary` is canonicalized to `LargeBinary`.

```
final_digest.update(b'"LargeBinary"') // 13 bytes
```

#### Step 2: Data

Each element: `[u64 LE length] [raw bytes]`

- `b"hi"`: length 2 → `02 00 00 00 00 00 00 00` + `68 69`
- `b""`: length 0 → `00 00 00 00 00 00 00 00` (no raw bytes)

```
data_digest = SHA-256(0x0200000000000000_6869_0000000000000000)
```

#### Step 3: Finalization (non-nullable)

```
final_digest = SHA-256()
final_digest.update(b'"LargeBinary"')
final_digest.update( data_digest.finalize() )
raw_hash = final_digest.finalize()
output = 0x000001 ++ raw_hash
```

---

### Example E: Column-Order Independence

Two record batches with the same logical data but different column orders must produce identical hashes.
+ +**Batch 1** (columns: x, y): +``` +Schema: {x: Int32 non-nullable, y: Boolean nullable} +x: [10] +y: [true] +``` + +**Batch 2** (columns: y, x): +``` +Schema: {y: Boolean nullable, x: Int32 non-nullable} +y: [true] +x: [10] +``` + +Both produce the same canonical schema JSON: +``` +{"x":{"data_type":"Int32","nullable":false},"y":{"data_type":"Boolean","nullable":true}} +``` + +Both produce the same field digests (fields processed alphabetically: `x` then `y`): +- Field `x`: `SHA-256(0x0a000000)` (10 as i32 LE) +- Field `y`: validity `[1]` (1 bit, 1 word), data `0x01` (true packed Lsb0) + +Therefore `hash_record_batch(batch1) == hash_record_batch(batch2)`. + +--- + +### Example F: Type Equivalence (Utf8 vs LargeUtf8) + +**Array 1**: `StringArray ["ab"]` (non-nullable, Arrow type `Utf8`) +**Array 2**: `LargeStringArray ["ab"]` (non-nullable, Arrow type `LargeUtf8`) + +Both produce the same type metadata: `"LargeUtf8"` (after canonicalization). + +Both produce the same data bytes: +``` +02 00 00 00 00 00 00 00 (length 2 as u64 LE) +61 62 ("ab" as UTF-8) +``` + +Therefore `hash_array(array1) == hash_array(array2)`. 
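The non-nullable string path from Examples C, D, and F can be reproduced in a few lines of Python. This is a sketch (hashlib-based, not the reference implementation) covering only the non-nullable leaf case; both `Utf8` and `LargeUtf8` inputs reach it with the same canonical type string, which is what makes Example F's equality hold:

```python
import hashlib
import struct

def hash_string_array(values):
    """Non-nullable Utf8/LargeUtf8 array via the hash_array layout:
    type metadata, then the 32-byte data digest, under a version prefix."""
    data = hashlib.sha256()
    for s in values:
        raw = s.encode("utf-8")
        data.update(struct.pack("<Q", len(raw)))  # length prefix as u64 LE
        data.update(raw)                          # raw UTF-8 bytes
    final = hashlib.sha256()
    final.update(b'"LargeUtf8"')   # canonical type JSON, quotes included
    final.update(data.digest())    # 32-byte data digest
    return bytes([0x00, 0x00, 0x01]) + final.digest()

h = hash_string_array(["ab"])
assert len(h) == 35  # 3-byte version prefix + 32-byte SHA-256
```

Because the `Utf8`/`LargeUtf8` distinction disappears before any byte is hashed, a caller cannot observe it in the output.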
+ +--- + +### Example G: Nullable Int32 Array with Nulls (hash_array API) + +**Array**: `Int32Array [Some(42), None, Some(-7), Some(0)]` (nullable) + +#### Step 1: Type Metadata + +``` +final_digest.update(b'"Int32"') // 7 bytes +``` + +#### Step 2: Data + +**Validity bits** (Lsb0 in u8): +- `[1, 0, 1, 1]` → bits: b0=1, b1=0, b2=1, b3=1 +- As u8 (Lsb0): binary `0000_1101` = 13 +- bit_count = 4 + +**Data bytes** (only valid elements): +- 42 as i32 LE: `2a 00 00 00` +- -7 as i32 LE: `f9 ff ff ff` +- 0 as i32 LE: `00 00 00 00` + +``` +data_digest = SHA-256(0x2a000000_f9ffffff_00000000) +``` + +#### Step 3: Finalization (nullable) + +``` +final_digest = SHA-256() +final_digest.update(b'"Int32"') // type metadata +final_digest.update( 0x0400000000000000 ) // 4 bits (bit count as u64 LE) +final_digest.update( 0x0D ) // 13 as u8 +final_digest.update( data_digest.finalize() ) // 32 bytes +raw_hash = final_digest.finalize() +output = 0x000001 ++ raw_hash +``` + +--- + +### Example H: Nullable String Array with Nulls (hash_array API) + +**Array**: `StringArray [Some("hello"), None, Some("world"), Some("")]` (nullable, Arrow type `Utf8`) + +#### Step 1: Type Metadata + +`Utf8` is canonicalized to `LargeUtf8`. 
```
final_digest.update(b'"LargeUtf8"') // 11 bytes
```

#### Step 2: Data

**Validity bits** (Lsb0 in u8):
- `[1, 0, 1, 1]` → 0b1101 = 13
- bit_count = 4

**Data bytes** (only valid elements, null skipped entirely):
- `"hello"`: `05 00 00 00 00 00 00 00` (len=5 as u64 LE) + `68 65 6c 6c 6f`
- `"world"`: `05 00 00 00 00 00 00 00` (len=5 as u64 LE) + `77 6f 72 6c 64`
- `""`: `00 00 00 00 00 00 00 00` (len=0 as u64 LE, no raw bytes)

```
data_digest = SHA-256(len+"hello" + len+"world" + len+"")
```

#### Step 3: Finalization (nullable)

```
final_digest = SHA-256()
final_digest.update(b'"LargeUtf8"')
final_digest.update( 0x0400000000000000 ) // bit_count=4 as u64 LE
final_digest.update( 0x0D ) // validity=13 as u8
final_digest.update( data_digest.finalize() ) // 32 bytes
raw_hash = final_digest.finalize()
output = 0x000001 ++ raw_hash
```

---

### Example I: Empty Table (no data, schema only)

**Schema**: `{a: Int32 non-nullable, b: Boolean nullable}`

When no record batches are fed (i.e., `finalize()` is called immediately after construction), the field digests still exist — they just contain no data.
+ +#### Schema Digest + +``` +schema_json = '{"a":{"data_type":"Int32","nullable":false},"b":{"data_type":"Boolean","nullable":true}}' +schema_digest = SHA-256(schema_json) +``` + +#### Field "a" (Int32, non-nullable) + +No data was fed, so: +``` +a_data_digest = SHA-256("") // SHA-256 of empty input +``` + +#### Field "b" (Boolean, nullable) + +No data was fed: +- `bit_count` = 0 (no elements, BitVec is empty) +- `as_raw_slice()` = `[]` (no words) +- Data digest = SHA-256 of empty input + +#### Final Combination + +``` +final_digest = SHA-256() +final_digest.update( schema_digest ) // 32 bytes +final_digest.update( SHA-256("").finalize() ) // field "a" (non-nullable, 32 bytes) +final_digest.update( 0x0000000000000000 ) // field "b" bit_count=0 (u64 LE) +// no validity words (raw_slice is empty for 0-length BitVec) +final_digest.update( SHA-256("").finalize() ) // field "b" data (32 bytes) +output = 0x000001 ++ final_digest.finalize() +``` + +--- + +### Example J: Multi-Batch Streaming (batch-split independence) + +**Schema**: `{v: Int32 non-nullable}` + +Feeding two batches must produce the same hash as feeding one combined batch: + +- **Batch 1**: `v = [1, 2]` +- **Batch 2**: `v = [3]` +- **Combined**: `v = [1, 2, 3]` + +Because the internal SHA-256 state is incremental: +``` +update(01 00 00 00 02 00 00 00) // from batch 1 +update(03 00 00 00) // from batch 2 +``` +is identical to: +``` +update(01 00 00 00 02 00 00 00 03 00 00 00) // single combined batch +``` + +#### Manual Computation + +``` +schema_json = '{"v":{"data_type":"Int32","nullable":false}}' +schema_digest = SHA-256(schema_json) + +v_data_digest = SHA-256(0x010000000200000003000000) + +final_digest = SHA-256() +final_digest.update( schema_digest ) +final_digest.update( v_data_digest.finalize() ) +output = 0x000001 ++ final_digest.finalize() +``` + +Therefore `hash(batch1 + batch2) == hash(combined)`. 
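Example J's split-independence claim reduces to a property of incremental hashing, which is easy to check directly (illustrative sketch):

```python
import hashlib
import struct

def int32_digest(batches):
    """Feed i32 LE values batch by batch into one running SHA-256."""
    d = hashlib.sha256()
    for batch in batches:
        for v in batch:
            d.update(struct.pack("<i", v))  # 4 bytes per value, LE
    return d.hexdigest()

# Two batches vs one combined batch: same byte stream, same digest.
assert int32_digest([[1, 2], [3]]) == int32_digest([[1, 2, 3]])
```

Note that only the batch *boundaries* are invisible; reordering the values themselves changes the byte stream and therefore the digest.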
---

### Example K: Struct Column in a Record Batch

**Schema**: `{person: Struct<age: Int32, name: LargeUtf8> non-nullable}`

**Data** (2 rows):

| person.age | person.name |
|------------|-------------|
| 25 | "Alice" |
| 30 | "Bob" |

In the record-batch path, the struct is **decomposed into leaf fields**: `person/age` and `person/name`. Each is hashed independently.

#### Step 1: Schema Digest

Canonical JSON:
```
{"person":{"data_type":{"Struct":[{"data_type":"Int32","name":"age","nullable":false},{"data_type":"LargeUtf8","name":"name","nullable":false}]},"nullable":false}}
```

#### Step 2: Leaf field "person/age" (Int32, non-nullable)

```
age_data_digest = SHA-256(0x19000000_1e000000) // [25, 30] as i32 LE
```

#### Step 3: Leaf field "person/name" (LargeUtf8, non-nullable)

```
name_data_digest = SHA-256(
  0x0500000000000000 "Alice" // len=5 u64 LE + UTF-8
  0x0300000000000000 "Bob"   // len=3 u64 LE + UTF-8
)
```

#### Step 4: Final Combination

Fields alphabetically: `person/age`, `person/name`.

```
final_digest = SHA-256()
final_digest.update( schema_digest ) // 32 bytes
final_digest.update( age_data_digest.finalize() ) // 32 bytes (non-nullable)
final_digest.update( name_data_digest.finalize() ) // 32 bytes (non-nullable)
output = 0x000001 ++ final_digest.finalize()
```

---

### Example L: Struct Array via hash_array (non-nullable, decomposed)

**Array**: `StructArray [{a: 1, b: true}, {a: 2, b: false}]`

Children: `a: Int32 non-null`, `b: Boolean non-null`. Struct is non-nullable.

`hash_array` uses the same recursive decomposition as the record-batch path. Struct is transparent — no BTreeMap entry for the struct itself. Children become separate entries.
+ +#### Step 1: Type Metadata + +Canonical type JSON (struct fields sorted alphabetically, keys sorted): +``` +{"Struct":[{"data_type":"Int32","name":"a","nullable":false},{"data_type":"Boolean","name":"b","nullable":false}]} +``` + +#### Step 2: Decomposed Entries + +BTreeMap entries (sorted by key): `"a"`, `"b"` + +**Entry "a"** (Int32, non-nullable → data-only): +``` +data_a = SHA-256(0x01000000_02000000) // [1, 2] as i32 LE +``` + +**Entry "b"** (Boolean, non-nullable → data-only): +``` +// [true, false] → Lsb0: bit0=1, bit1=0 → 0x01 +data_b = SHA-256(0x01) +``` + +#### Step 3: Finalization + +Each entry is non-nullable → no null_bits, no structural, just data.finalize(). + +``` +final_digest = SHA-256() +final_digest.update( type_json_bytes ) // type metadata +final_digest.update( data_a.finalize() ) // entry "a": 32 bytes +final_digest.update( data_b.finalize() ) // entry "b": 32 bytes +output = 0x000001 ++ final_digest.finalize() +``` + +--- + +### Example M: Nullable Struct Array via hash_array (struct-level nulls, decomposed) + +**Array**: `StructArray [Some({a: 10, b: "x"}), None, Some({a: 30, b: "z"})]` + +Children: `a: Int32 non-null`, `b: LargeUtf8 non-null`. Struct is **nullable**. + +Row 1 is a null struct. Struct is transparent — its null is AND-propagated to children for data hashing. Since children are non-nullable per their Field definitions, their entries have no null_bits — but null rows are skipped in the data stream. 
#### Step 1: Type Metadata

```
{"Struct":[{"data_type":"Int32","name":"a","nullable":false},{"data_type":"LargeUtf8","name":"b","nullable":false}]}
```

#### Step 2: Decomposed Entries (with struct-null propagation)

BTreeMap entries (sorted by key): `"a"`, `"b"`

**Entry "a"** (Int32, non-nullable → data-only):
- Struct nulls propagated: rows 0, 2 valid → data: `[10, 30]`

```
data_a = SHA-256(0x0a000000_1e000000) // [10, 30] as i32 LE
```

**Entry "b"** (LargeUtf8, non-nullable → data-only):
- Struct nulls propagated: rows 0, 2 valid → data: `"x"`, `"z"`

```
data_b = SHA-256(
  0x0100000000000000 "x" // len=1 + "x"
  0x0100000000000000 "z" // len=1 + "z"
)
```

#### Step 3: Finalization

Each entry is non-nullable → no null_bits, no structural, just data.finalize().

```
final_digest = SHA-256()
final_digest.update( type_json_bytes ) // type metadata
final_digest.update( data_a.finalize() ) // entry "a": 32 bytes
final_digest.update( data_b.finalize() ) // entry "b": 32 bytes
output = 0x000001 ++ final_digest.finalize()
```

---

### Example N: List-of-Struct in a Record Batch (Recursive Decomposition)

**Schema**: `{items: LargeList<Struct<id: Int32, label: LargeUtf8>> nullable}`

**Data** (2 rows):

| items |
|-------|
| `[{id: 1, label: "a"}, {id: 2, label: "b"}]` |
| `[{id: 3, label: "c"}]` |

The list-of-struct column is **recursively decomposed** into four BTreeMap entries:

| Path | Entry type | Components |
|------|-----------|------------|
| `items` | validity-only | null_bits: `[V, V]` (2 bits) |
| `items/` | structural-only | list lengths: `[2, 1]` |
| `items//id` | data-only | leaf values: `[1, 2, 3]` as i32 LE |
| `items//label` | data-only | leaf values: `len+"a"`, `len+"b"`, `len+"c"` |

Note the path naming: `items` (column) → `items/` (list adds `/`) → `items//id` (struct adds `/id`, producing `//` because parent ends in `/`).
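That path chain can be spelled out mechanically (illustrative snippet, not crate code):

```python
# Column path, then list delimiter, then struct child names.
col = "items"
lst = col + "/"               # list appends a bare "/"
id_path = lst + "/" + "id"    # struct appends "/" + child name
label_path = lst + "/" + "label"

assert (lst, id_path, label_path) == ("items/", "items//id", "items//label")
```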
+ +#### Step 1: Schema Digest + +Canonical JSON (element type omits Arrow-internal field name "item"): +``` +{"items":{"data_type":{"LargeList":{"data_type":{"Struct":[{"data_type":"Int32","name":"id","nullable":false},{"data_type":"LargeUtf8","name":"label","nullable":false}]},"nullable":false}},"nullable":true}} +``` + +#### Step 2: Traversal + +The top-down recursive traversal processes each row: + +**Row 0** (valid list, 2 elements): +- `items` entry: push `valid` to null_bits +- `items/` entry: write `2_u64.to_le_bytes()` to structural +- Recurse into sub-array `[{id:1, label:"a"}, {id:2, label:"b"}]`: + - Struct is transparent — recurse into children (sorted: "id", "label"): + - `items//id` entry: write `1_i32.to_le_bytes()`, `2_i32.to_le_bytes()` to data + - `items//label` entry: write `len+"a"`, `len+"b"` to data + +**Row 1** (valid list, 1 element): +- `items` entry: push `valid` to null_bits +- `items/` entry: write `1_u64.to_le_bytes()` to structural +- Recurse into sub-array `[{id:3, label:"c"}]`: + - `items//id` entry: write `3_i32.to_le_bytes()` to data + - `items//label` entry: write `len+"c"` to data + +#### Step 3: Final Combination + +Entries are finalized in BTreeMap (alphabetical) order: + +``` +final_digest = SHA-256() +final_digest.update( schema_digest ) // 32 bytes + +// Entry "items" (validity-only) +final_digest.update( 0x0200000000000000 ) // bit_count=2 (u64 LE) +final_digest.update( 0x03 ) // validity word: 0b11 = 3 (u8) + +// Entry "items/" (structural-only) +items_structural = SHA-256( + 0x0200000000000000 // row 0: 2 elements + 0x0100000000000000 // row 1: 1 element +) +final_digest.update( items_structural.finalize() ) // 32 bytes + +// Entry "items//id" (data-only) +id_data = SHA-256( + 0x01000000 // 1 as i32 LE + 0x02000000 // 2 as i32 LE + 0x03000000 // 3 as i32 LE +) +final_digest.update( id_data.finalize() ) // 32 bytes + +// Entry "items//label" (data-only) +label_data = SHA-256( + 0x0100000000000000 0x61 // len=1 + "a" + 
0x0100000000000000 0x62 // len=1 + "b" + 0x0100000000000000 0x63 // len=1 + "c" +) +final_digest.update( label_data.finalize() ) // 32 bytes + +output = 0x000001 ++ final_digest.finalize() +``` + +--- + +## 8. Platform Considerations + +- **Integer sizes**: All length prefixes use `u64` (8 bytes, LE). Validity bitmaps use `BitVec` (1 byte per word). Bit counts use `u64` (8 bytes, LE). Hashes are **platform-independent**. +- **Byte order**: All values use little-endian. Validity words are `u8` (1 byte, so endianness is trivial). Bit counts use little-endian. +- **Floating point**: IEEE 754 representation is hashed directly. `NaN` values with different bit patterns produce different hashes. `+0.0` and `-0.0` produce different hashes. diff --git a/docs/design-spec.md b/docs/design-spec.md index 5ad83c6..1f809b4 100644 --- a/docs/design-spec.md +++ b/docs/design-spec.md @@ -21,7 +21,8 @@ The hash algorithm is parameterized via Rust's `digest::Digest` trait. The publi |------|-----------| | **Logical equivalence** | Two Arrow structures represent the same data regardless of physical layout choices (encoding, column order, batch splits). | | **Validity bitmap** | A bit vector where `1` = valid, `0` = null, tracked per nullable field. | -| **Data digest** | A running hash of the non-null data bytes for a single field. | +| **Data digest** | A running hash of the non-null leaf data bytes for a single field. | +| **Structural digest** | A running hash of element counts for list-type fields, separating structure from leaf data. | | **Schema digest** | A hash of the canonicalized JSON representation of the schema. | | **Field path** | A `/`-separated path for nested struct fields (e.g., `address/city`). 
|
@@ -69,35 +70,38 @@ Because the top-level is a `BTreeMap`, field names are automatica
```json
{
  "age": {"data_type": "Int32", "nullable": false},
- "name": {"data_type": "Utf8", "nullable": true}
+ "name": {"data_type": "LargeUtf8", "nullable": true}
}
```
-### 4.2 Data Type Serialization
+### 4.2 Data Type Serialization (`data_type_to_value`)
+
+All data type serialization goes through `data_type_to_value`, which produces a canonical JSON representation. The output is recursively key-sorted via `sort_json_value` before returning.
#### Primitive types
Serialized using Arrow's built-in serde, producing strings like `"Int32"`, `"Boolean"`, `"Float64"`, or objects like `{"Decimal128": [38, 5]}`, `{"Time32": "Second"}`.
#### Logical type equivalence classes
-For fully logical hashing, certain types that differ only in physical representation are canonicalized to a single form in the schema:
+Certain types that differ only in physical representation (offset width) are canonicalized to a single form:
| Types in equivalence class | Canonical form in schema |
|---|---|
| `Binary`, `LargeBinary` | `"LargeBinary"` |
| `Utf8`, `LargeUtf8` | `"LargeUtf8"` |
-| `List(field)`, `LargeList(field)` | `{"LargeList": <element>}` |
+| `List(field)`, `LargeList(field)` | `{"LargeList": <element>}` |
+| `Dictionary(key_type, value_type)` | Recursive `data_type_to_value(value_type)` |
The "large" variant is always the canonical form because it is the superset representation.
#### Nested types
- **Struct**: `{"Struct": [<children>]}` — inner fields are **sorted alphabetically by field name** before serialization.
-- **List / LargeList**: `{"LargeList": <element>}` (canonicalized to large variant).
-- **FixedSizeList**: `{"FixedSizeList": [<element>, <size>]}`.
+- **List / LargeList**: `{"LargeList": <element>}` (canonicalized to large variant). The element type uses `element_type_to_value` which omits the Arrow-internal field name (e.g., `"item"`), including only `data_type` and `nullable`.
+- **FixedSizeList**: `{"FixedSizeList": [<element>, <size>]}`.
Also uses `element_type_to_value` (no field name).
- **Map**: `{"Map": [<entries>, <sorted>]}`.
-Each inner field object has the form:
+**Inner field object** (for struct children, map entries):
```json
{
  "data_type": <type>,
@@ -106,6 +110,14 @@ Each inner field object has the form:
}
```
+**Element type object** (for list/fixed-size-list items):
+```json
+{
+  "data_type": <type>,
+  "nullable": <bool>
+}
+```
+
All JSON objects have their keys sorted recursively via `sort_json_value` to ensure deterministic serialization.
### 4.3 Schema Digest Computation
@@ -116,13 +128,33 @@ schema_digest = SHA256(canonical_json_string)
---
-## 5. Data Serialization (Byte Layout)
+## 5. DigestBufferType
+
+Each entry in the BTreeMap has a `DigestBufferType` struct with three **optional** components:
+
+```rust
+struct DigestBufferType<D: Digest> {
+    null_bits: Option<BitVec<u8, Lsb0>>, // Present for nullable entries
+    structural: Option<D>,               // Present for list-type entries
+    data: Option<D>,                     // Present for leaf and list-leaf entries
+}
+```
+
+- **`null_bits`**: Validity bitmap. Present for nullable fields, absent for non-nullable.
+- **`structural`**: A separate running digest for list element counts. Present for list-type entries. Separates structure (how elements are partitioned into lists) from leaf data.
+- **`data`**: The running digest for actual data bytes (leaf values). Present for leaf and list-leaf entries, absent for validity-only and structural-only entries.
+
+There are four entry types, constructed via dedicated constructors:
+- **`new_data_only(nullable)`**: Leaf field (e.g., `Int32`). Has `data`, optionally `null_bits`.
+- **`new_structural_only(nullable)`**: List intermediate node above a struct or nested list. Has `structural`, optionally `null_bits`.
+- **`new_list_leaf(nullable)`**: List whose value type is a leaf (e.g., `List<Int32>`). Has `structural` + `data`, optionally `null_bits`.
+- **`new_validity_only()`**: Nullable parent whose descendants have their own entries. Has `null_bits` only.
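As a cross-language illustration, the four constructors can be sketched in Python. This is a hypothetical mirror of the Rust struct above, not Starfix code; `hashlib.sha256()` stands in for the running digest `D`:

```python
from dataclasses import dataclass
from typing import Any, Optional
import hashlib

@dataclass
class DigestBuffer:
    """Sketch of DigestBufferType: three optional components per entry."""
    null_bits: Optional[list]  # validity bits (present iff nullable)
    structural: Optional[Any]  # running digest of list element counts
    data: Optional[Any]        # running digest of leaf data bytes

def new_data_only(nullable: bool) -> DigestBuffer:
    # Leaf field: data digest, plus null bits if nullable.
    return DigestBuffer([] if nullable else None, None, hashlib.sha256())

def new_structural_only(nullable: bool) -> DigestBuffer:
    # List node above a struct or nested list: structural digest only.
    return DigestBuffer([] if nullable else None, hashlib.sha256(), None)

def new_list_leaf(nullable: bool) -> DigestBuffer:
    # List whose value type is a leaf: structural + data digests.
    return DigestBuffer([] if nullable else None, hashlib.sha256(), hashlib.sha256())

def new_validity_only() -> DigestBuffer:
    # Nullable parent whose descendants have their own entries.
    return DigestBuffer([], None, None)
```

Each constructor leaves absent components as `None`, matching the "written only if present" finalization rule.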
+
+---
-Each field is hashed independently. The field's digest buffer is one of:
-- `NonNullable(D)` — a single running digest for data bytes.
-- `Nullable(BitVec, D)` — a validity bitmap (`BitVec`) plus a running data digest.
+## 6. Data Serialization (Byte Layout)
-### 5.1 Fixed-Size Types
+### 6.1 Fixed-Size Types
**Types:** `Int8`, `UInt8`, `Int16`, `UInt16`, `Int32`, `UInt32`, `Int64`, `UInt64`, `Float16`, `Float32`, `Float64`, `Date32`, `Date64`, `Time32(*)`, `Time64(*)`, `Decimal32`, `Decimal64`, `Decimal128`, `Decimal256`, `FixedSizeBinary(n)`.
@@ -138,18 +170,18 @@ Each field is hashed independently. The field's digest buffer is one of:
| Decimal256 | 32 | Little-endian |
| FixedSizeBinary(n) | n | Raw bytes |
-**Non-nullable path:** The entire buffer slice (accounting for offset) is fed into the digest in one call.
+**Non-nullable path:** The entire buffer slice (accounting for offset) is fed into the data digest in one call.
**Nullable path:**
1. Extend the validity bitmap with `is_valid(i)` for each element.
2. For each valid element, feed its little-endian bytes into the data digest.
3. Null elements are **skipped** — no data bytes are fed (null information is captured solely by the validity bitmap).
-### 5.2 Boolean Type
+### 6.2 Boolean Type
-Boolean values are **bit-packed** using MSB-first (`Msb0`) ordering into bytes.
+Boolean values are **bit-packed** into bytes via `BitVec` using LSB-first (`Lsb0`) ordering with `u8` storage words.
-**Non-nullable path:** All values are packed sequentially.
+**Non-nullable path:** All values are packed sequentially into a `BitVec`, and the raw backing bytes are fed into the data digest.
**Nullable path:**
1. Extend the validity bitmap.
2.
**Example:** `[true, NULL, false, true]` (nullable)
- Validity bitmap: `[1, 0, 1, 1]`
-- Data bits (valid only): `[true, false, true]` → Msb0 packed: `1010_0000` = `0xA0`
+- Data bits (valid only): `[true, false, true]` → Lsb0 packed: bit0=1, bit1=0, bit2=1 → `0000_0101` = `0x05`
+
+**Example:** `[true, false, true]` (non-nullable)
+- Lsb0 packed: bit0=1, bit1=0, bit2=1 → `0000_0101` = `0x05`
-### 5.3 Variable-Length Types (Binary, String)
+### 6.3 Variable-Length Types (Binary, String)
**Types:** `Binary`, `LargeBinary`, `Utf8`, `LargeUtf8`.
@@ -171,67 +206,67 @@ Each element is serialized as:
The length prefix is **always u64** (8 bytes, little-endian) regardless of the offset type (`i32` for `Binary`/`Utf8`, `i64` for `LargeBinary`/`LargeUtf8`). This ensures cross-platform stability and logical equivalence between small/large variants.
-**Non-nullable path:** For each element, feed `len.to_le_bytes()` (u64) then the raw bytes.
+**Non-nullable path:** For each element, feed `(value.len() as u64).to_le_bytes()` then the raw bytes.
**Nullable path:**
1. Extend the validity bitmap.
2. For valid elements: feed length prefix + raw bytes.
3. For null elements: **skip entirely** — no sentinel bytes. Null information is captured by the validity bitmap.
-### 5.4 List Types
+### 6.4 List Types (Record-Batch Path)
**Types:** `List(field)`, `LargeList(field)`.
-Each list element (a sub-array) is serialized as:
-```
-[sub-array length as u64 little-endian (8 bytes)] [recursive serialization of sub-array elements]
-```
+List columns are **recursively decomposed** into separate BTreeMap entries. A list creates an intermediate entry at `path/` (path + delimiter). The value type is then recursively traversed.
-The sub-array length prefix prevents collisions between differently-partitioned lists (e.g., `[[1,2],[3]]` vs `[[1],[2,3]]`).
+**Decomposition by value type:**
+- **`List<leaf>`** (e.g., `List<Int32>`): Entry at `path/` is a **list-leaf** with both structural and data digests.
+- **`List<Struct<...>>`**: Entry at `path/` is **structural-only**. The struct is transparent, and each struct child creates its own entry at `path//childname`.
+- **`List<List<...>>`**: Entry at `path/` is structural-only. The inner list creates another entry at `path//`.
-**Nullable path:** Same as other types — extend validity bitmap, skip null list entries.
+**Nullable list columns:** A **validity-only** entry is created at `path` (without trailing `/`), recording which rows are null vs valid. Null list elements are not traversed.
-The sub-array elements are hashed recursively using the same `array_digest_update` dispatch, so nested lists and nested structs within lists follow the same rules.
+**Traversal:** For each non-null list element, write the sub-array length (u64 LE) to the structural digest at `path/`, then recurse into the sub-array.
-### 5.5 Struct Types
+### 6.5 Struct Types (Record-Batch Path)
-Struct fields are **not hashed as a composite** — instead, each leaf field within the struct is extracted and hashed independently under its own field path (e.g., `address/city`, `address/zip`). The field paths are stored in a `BTreeMap`, so they are always processed in alphabetical order.
+Struct fields are **transparent** — they do not create a BTreeMap entry. Instead:
-This design means:
-- Struct field order in the Arrow schema does not affect the hash.
-- Each leaf field maintains its own independent validity bitmap and data digest.
+1. **Children are traversed** in alphabetical order by field name.
+2. **Struct-level nulls are AND-propagated** to all descendant entries via `combine_nulls`. If a struct row is null, none of its children's data is hashed for that row.
+3. Each child is recursively decomposed (leaf → data entry, list → structural entry, nested struct → recurse further).
-### 5.6 Dictionary-Encoded Arrays
+**Path naming:** Struct adds `/fieldname` to the path. Combined with list's trailing `/`, this produces paths like `items//id` (list `/` + struct `/id`).
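The AND-propagation in step 2 amounts to a per-row conjunction of parent and child validity. A minimal sketch (the name mirrors `combine_nulls`; the list-of-bools signature is illustrative):

```python
def combine_nulls(parent_valid: list[bool], child_valid: list[bool]) -> list[bool]:
    """A child row is effectively valid only if the struct row is also valid."""
    return [p and c for p, c in zip(parent_valid, child_valid)]
```

For example, a struct validity of `[True, False, True]` combined with a child validity of `[True, True, False]` leaves only row 0 valid, so only row 0's data bytes reach that child's digest.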
-Dictionary-encoded arrays are **resolved to their plain equivalent** before hashing. The dictionary is unpacked so that the resulting data stream is identical to what a non-dictionary-encoded array with the same logical values would produce. +### 6.6 Dictionary-Encoded Arrays + +Dictionary-encoded arrays are **resolved to their plain equivalent** before hashing. The dictionary is unpacked using Arrow's `cast` kernel so that the resulting data stream is identical to what a non-dictionary-encoded array with the same logical values would produce. This ensures that `DictionaryArray(indices=[0,1,0], dict=["a","b"])` produces the same hash as `StringArray(["a","b","a"])`. --- -## 6. Final Digest Assembly +## 7. Final Digest Assembly -### 6.1 Field Digest Finalization +### 7.1 Field Digest Finalization -Each field's digest buffer is finalized and fed into the combined final digest: +Each entry's `DigestBufferType` is finalized and fed into the combined final digest via `finalize_digest`. Each component is written only if present: -**Non-nullable field:** -``` -feed: SHA256_finalize(data_digest) // 32 bytes ``` +// If nullable (null_bits is Some): +feed: validity_bitmap_length as u64 LE // 8 bytes (number of bits) +feed: validity_bitmap raw bytes (LE) // ceil(length/8) bytes (u8 words, to_le_bytes is identity for u8) -**Nullable field:** -``` -feed: validity_bitmap_length as u64 LE // 8 bytes (number of bits) -feed: validity_bitmap words (BE bytes) // ceil(length/8) bytes, each u8 word in big-endian -feed: SHA256_finalize(data_digest) // 32 bytes +// If list type (structural is Some): +feed: SHA256_finalize(structural_digest) // 32 bytes + +// If leaf/list-leaf (data is Some): +feed: SHA256_finalize(data_digest) // 32 bytes ``` -The validity bitmap is serialized as: -1. The bit count (number of elements seen) as `u64` little-endian. -2. The raw backing storage words, each converted to big-endian bytes. +The validity bitmap uses `BitVec` storage. 
Each `u8` word is serialized via `to_le_bytes()` (identity for single-byte words). The bit count (not byte count) is written as the length prefix. -### 6.2 Combined Final Digest +### 7.2 Combined Final Digest ``` final_digest = SHA256( @@ -244,7 +279,7 @@ final_digest = SHA256( Fields are iterated from the `BTreeMap` which maintains alphabetical ordering by field path. -### 6.3 Version Prefix +### 7.3 Version Prefix The public `ArrowDigester` prepends a 3-byte version prefix to the final digest: @@ -254,139 +289,86 @@ output = [0x00, 0x00, 0x01] || final_digest // 3 + 32 = 35 bytes total --- -## 7. Standalone `hash_array` Function +## 8. Standalone `hash_array` Function -`hash_array` hashes a single array without a full schema context. Its digest is: +`hash_array` hashes a single array without a full schema context. It uses the **same recursive decomposition** as the record-batch path (`extract_type_entries` + `traverse_and_update`), ensuring consistent hashing regardless of which API is used. ``` final = SHA256( - canonical_json(data_type) // data type metadata - || finalized_field_digest // nullable or non-nullable, same rules as above + serde_json::to_string(data_type_to_value(effective_type)) // canonical type JSON string + || for each BTreeMap entry: finalize_digest(entry) // same decomposition as record-batch ) ``` -The data type is serialized using the same `data_type_to_value` logic (with type canonicalization) and then `serde_json::to_string`. +If the input is a dictionary array, it is first resolved to its plain value type via `cast`. The effective type is then serialized using `data_type_to_value` (with type canonicalization and recursive key sorting), converted to a JSON string, and fed into the digest before the decomposed field entries. --- -## 8. Invariants and Guarantees +## 9. 
Schema Equality in `update()`
+
+When `update(record_batch)` is called, the record batch's schema is compared against the digester's schema **logically** — both schemas are serialized via `serialized_schema()` (which uses `data_type_to_value` with type canonicalization) and the resulting strings are compared. This means:
+- Column order doesn't matter (both are sorted by `BTreeMap`).
+- `Utf8` vs `LargeUtf8`, `Binary` vs `LargeBinary`, `List` vs `LargeList` are treated as equivalent.
+- Dictionary types are canonicalized to their value types.
+
+---
+
+## 10. Invariants and Guarantees
1. **Column-order independence:** Top-level fields are sorted alphabetically via `BTreeMap`.
-2. **Struct field-order independence:** Struct children are sorted by name during schema serialization and field extraction.
+2. **Struct field-order independence:** Struct children are sorted by name during schema serialization and during recursive decomposition into BTreeMap entries.
3. **Batch-split independence:** Streaming `update()` calls produce the same hash as a single combined batch.
4. **Encoding independence:** Dictionary-encoded arrays are resolved before hashing.
5. **Physical type independence:** `Binary`/`LargeBinary`, `Utf8`/`LargeUtf8`, `List`/`LargeList` are canonicalized to their large variants in the schema and use identical data serialization.
-6. **Platform independence:** All length prefixes use `u64` (8 bytes LE), all numeric values use little-endian byte order.
+6. **Platform independence:** All length prefixes use `u64` (8 bytes LE), all numeric values use little-endian byte order, validity bitmaps use `BitVec` (u8-width words, not platform-dependent `usize`).
7. **Null handling consistency:** Null values are tracked solely via the validity bitmap. No sentinel bytes are fed into the data digest for any type.
-8.
**Non-null arrays with/without validity bitmap:** An array with all valid values produces the same data digest whether or not a validity bitmap is present (nulls simply mean bits are not pushed and values are not fed, and all-valid arrays feed the same bytes). - ---- - -## 9. Known Issues and Required Fixes - -The following issues have been identified in the current implementation that must be fixed to achieve the guarantees above: - -### 9.1 Struct Fields Not Sorted in Schema Serialization - -**File:** `arrow_digester_core.rs`, `data_type_to_value()` (line ~206) - -**Issue:** Struct inner fields are collected into a `Vec` in their original order. Two schemas with the same struct fields in different order will produce different schema hashes. - -**Fix:** Sort the fields iterator by field name before collecting into the Vec. - -### 9.2 `inner_field_to_value` Not Recursively Sorted - -**File:** `arrow_digester_core.rs`, `inner_field_to_value()` (line ~232) - -**Issue:** The JSON object produced by `serde_json::json!` has non-deterministic key order. While `sort_json_value` is applied at the top level in `serialized_schema`, it is NOT applied to the output of `data_type_to_value`/`inner_field_to_value`. - -**Fix:** Apply `sort_json_value` recursively in `data_type_to_value` before returning. - -### 9.3 Binary Length Prefix Uses Platform-Dependent `usize` - -**File:** `arrow_digester_core.rs`, `hash_binary_array()` (line ~518) - -**Issue:** `value.len().to_le_bytes()` produces 4 bytes on 32-bit and 8 bytes on 64-bit platforms. - -**Fix:** Cast to `u64` before calling `to_le_bytes()`: `(value.len() as u64).to_le_bytes()`. - -### 9.4 `NULL_BYTES` Sentinel in Binary/String Nullable Paths - -**File:** `arrow_digester_core.rs`, `hash_binary_array()` (line ~536), `hash_string_array()` (line ~579) - -**Issue:** Null values feed `b"NULL"` into the data digest, but `hash_fixed_size_array` skips nulls entirely. 
Since null information is already captured in the validity bitmap, the sentinel is redundant and inconsistent. - -**Fix:** Remove `data_digest.update(NULL_BYTES)` from the null branches. Skip null values entirely, matching the fixed-size type behavior. - -### 9.5 No Type Canonicalization for Binary/Utf8/List Variants - -**File:** `arrow_digester_core.rs`, `data_type_to_value()` and `serialized_schema()` - -**Issue:** `Binary` and `LargeBinary` serialize to different JSON strings, causing logically equivalent schemas to hash differently. - -**Fix:** In `data_type_to_value`, map `Binary` → `LargeBinary`, `Utf8` → `LargeUtf8`, `List` → `LargeList` before serialization. - -### 9.6 Dictionary-Encoded Arrays Not Supported - -**File:** `arrow_digester_core.rs`, `array_digest_update()` (line ~437) - -**Issue:** Dictionary-encoded arrays hit `todo!()` and panic. - -**Fix:** Resolve dictionary arrays to their plain value arrays using Arrow's `take` kernel or equivalent, then recursively hash the result. - -### 9.7 Schema Equality Check in `update()` Too Strict - -**File:** `arrow_digester_core.rs`, `update()` (line ~61) - -**Issue:** `*record_batch.schema() == self.schema` uses strict Arrow schema equality which includes column order. This prevents streaming batches with different column orders. - -**Fix:** Compare schemas logically (same set of fields with same types and nullability, regardless of order). +8. **Non-null arrays with/without validity bitmap:** An array with all valid values produces the same data digest whether or not a validity bitmap is present. --- -## 10. Comprehensive Test Plan +## 11. Comprehensive Test Plan -### 10.1 Column-Order Independence Tests +### 11.1 Column-Order Independence Tests - **Top-level column reorder:** Two record batches with columns `[a, b, c]` vs `[c, a, b]` with same data produce identical hashes. - **Schema-only column reorder:** Two schemas with same fields in different order produce identical schema hashes. 
- **Streaming with reordered batches:** Feed batch1 with order `[a, b]`, batch2 with order `[b, a]` — should produce same hash as feeding both in order `[a, b]`. -### 10.2 Struct Field-Order Independence Tests +### 11.2 Struct Field-Order Independence Tests - **Flat struct reorder:** `Struct({x: Int32, y: Utf8})` vs `Struct({y: Utf8, x: Int32})` with same data produce identical hashes. - **Nested struct reorder:** Deeply nested structs with shuffled field orders at every level. - **Schema hash with reordered struct fields:** Verify schema digest is identical. -### 10.3 Dictionary Encoding Equivalence Tests +### 11.3 Dictionary Encoding Equivalence Tests - **String dictionary vs plain:** `DictionaryArray` vs `StringArray` with same logical values. - **Integer dictionary vs plain:** Dictionary-encoded integers vs plain integer array. - **Dictionary with nulls:** Dictionary arrays containing null entries match plain arrays with same nulls. - **Nested dictionary:** List of dictionary-encoded strings vs list of plain strings. -### 10.4 Binary/Utf8/List Size Variant Equivalence Tests +### 11.4 Binary/Utf8/List Size Variant Equivalence Tests - **Binary vs LargeBinary:** Same byte data in both produces identical hash. - **Utf8 vs LargeUtf8:** Same string data produces identical hash. - **List vs LargeList:** Same list data produces identical hash. - **Schema equivalence:** Schema with `Binary` field hashes same as schema with `LargeBinary` field (same name, same nullability). -### 10.5 Null Handling Tests +### 11.5 Null Handling Tests -- **No sentinel bytes:** Verify that null values in binary/string arrays don't feed any extra bytes into the data digest (after fix). +- **No sentinel bytes:** Verify that null values in binary/string arrays don't feed any extra bytes into the data digest. - **All-null array:** Array of all nulls produces a hash that depends only on the validity bitmap. 
- **All-valid nullable vs non-nullable:** Array with all valid values produces same data digest whether schema says nullable or not. - **Mixed nulls across batches:** First batch all nulls, second batch all valid — same as single combined batch. - **Null at different positions:** `[1, NULL, 3]` vs `[NULL, 1, 3]` produce different hashes. -### 10.6 Batch Splitting Independence Tests +### 11.6 Batch Splitting Independence Tests - **Two batches vs one:** Already tested, but extend to more types and edge cases. - **Many small batches:** Split into single-row batches vs one large batch. - **Empty batches:** Inserting empty batches between data batches doesn't change the hash. -### 10.7 Edge Cases +### 11.7 Edge Cases - **Empty table:** Schema-only hash (no data). - **Zero-length arrays:** Arrays with length 0 for each type. @@ -398,7 +380,7 @@ The following issues have been identified in the current implementation that mus - **Unicode strings:** Strings with multi-byte UTF-8 characters. - **Sliced arrays:** Arrays created via `array.slice(offset, length)` should hash the same as a fresh array with the same values. -### 10.8 Collision Resistance Tests +### 11.8 Collision Resistance Tests - **Binary partition collision:** `[[0x01, 0x02], [0x03]]` vs `[[0x01], [0x02, 0x03]]` (already tested). - **String partition collision:** `["ab", "c"]` vs `["a", "bc"]` (already tested). @@ -406,12 +388,12 @@ The following issues have been identified in the current implementation that mus - **Null vs zero:** `[NULL]` vs `[0]` produce different hashes. - **Empty vs null:** `[Some("")]` vs `[None]` for string type. -### 10.9 Regression / Golden Value Tests +### 11.9 Regression / Golden Value Tests - Maintain golden hash values for a comprehensive schema with data, verified against manually computed expected bytes. - Byte-level verification tests (already partially present) for each data type confirming exact bytes fed into the digest. 
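A golden-value harness only needs the output shape the byte-layout spec defines: SHA-256 over the schema digest followed by each finalized entry, prefixed with the 3-byte version. A minimal Python sketch (the helper name `combine` is illustrative, not a Starfix API):

```python
import hashlib

VERSION_PREFIX = bytes([0x00, 0x00, 0x01])  # version 0.0.1

def combine(schema_digest: bytes, entry_digests: list[bytes]) -> bytes:
    """Assemble the 35-byte output: version prefix + SHA-256 of
    (schema digest || finalized entry digests, in BTreeMap order)."""
    h = hashlib.sha256()
    h.update(schema_digest)
    for d in entry_digests:
        h.update(d)
    return VERSION_PREFIX + h.digest()
```

Golden tests can then pin the full 35-byte value for a fixed schema and batch, and re-derive it from manually computed digests.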
-### 10.10 Cross-Type Distinction Tests +### 11.10 Cross-Type Distinction Tests - **Float32 vs Float64:** Same numeric value (e.g., `1.5`) in different float types produces different hashes (schema distinguishes them). - **Int32 vs Int64:** Same integer value in different integer types produces different hashes. diff --git a/docs/implementation-plan.md b/docs/implementation-plan.md new file mode 100644 index 0000000..bc9a4e9 --- /dev/null +++ b/docs/implementation-plan.md @@ -0,0 +1,434 @@ +# Implementation Plan: Complete Stable Logical Hashing + +This plan addresses all identified gaps in the Starfix hashing implementation, organized into tiers by priority. Each item follows the project's TDD workflow: write failing tests first, then implement. + +**Files primarily affected:** +- `src/arrow_digester_core.rs` — core implementation +- `tests/arrow_digester.rs` — integration tests +- `tests/digest_bytes.rs` — byte-level specification conformance tests +- `docs/byte-layout-spec.md` — specification updates + +--- + +## Tier 1 — Blocks Production Use + +### 1.1 Implement `Timestamp` data hashing + +**Current state:** `todo!()` in `array_digest_update` for `DataType::Timestamp`. Schema serialization already works (falls through to Arrow serde: `{"Timestamp":["Nanosecond","UTC"]}`). + +**Implementation:** Timestamp is always `i64` (8 bytes LE), regardless of unit or timezone. + +```rust +DataType::Timestamp(_, _) => Self::hash_fixed_size_array(effective_array, digest, 8), +``` + +**Design decision — Timezone equivalence:** +Arrow's serde serializes `Timestamp(Nanosecond, Some("UTC"))` as `{"Timestamp":["Nanosecond","UTC"]}` and `Timestamp(Nanosecond, None)` as `{"Timestamp":["Nanosecond",null]}`. These naturally produce different schema hashes, which means **two columns with the same epoch values but different timezone annotations will hash differently** (because their schemas differ). This is the correct behavior — timezone is part of the logical type identity. 
**No special handling needed.** + +However, there is a subtler question: should `Timestamp(Nanosecond, Some("UTC"))` and `Timestamp(Nanosecond, Some("Etc/UTC"))` hash the same? They refer to the same timezone but have different string representations. **Recommended decision: do NOT normalize timezone strings.** Timezone alias resolution is complex, locale-dependent, and outside Starfix's scope. Document this as a known limitation. + +**Tests:** +- `Timestamp(Nanosecond, Some("UTC"))` basic hashing (hash_array) +- `Timestamp(Microsecond, None)` with nulls +- Different units with same raw value produce different schema hashes (schema difference) +- Same unit, same data, different timezone strings produce different hashes +- Byte-level test in `digest_bytes.rs` + +**Spec update:** Add Section 3.7 for Timestamp, or extend Section 3.1 with a note that Timestamp/Duration are 8-byte fixed-size types. + +--- + +### 1.2 Implement `Duration` data hashing + +**Current state:** `todo!()` in `array_digest_update` for `DataType::Duration`. Schema serialization works (`{"Duration":"Millisecond"}`). + +**Implementation:** Duration is always `i64` (8 bytes LE). + +```rust +DataType::Duration(_) => Self::hash_fixed_size_array(effective_array, digest, 8), +``` + +**Design decision:** None needed. The unit is encoded in the schema JSON, so different Duration units produce different schema hashes. Data is just raw i64 bytes. + +**Tests:** +- `Duration(Millisecond)` basic hashing +- Different units produce different schema hashes +- Byte-level test + +--- + +### 1.3 Implement `Interval` data hashing + +**Current state:** `todo!()` in `array_digest_update` for `DataType::Interval`. 
+
+**Implementation:** Element size depends on the IntervalUnit variant:
+
+```rust
+DataType::Interval(unit) => {
+    let size = match unit {
+        IntervalUnit::YearMonth => 4,    // i32
+        IntervalUnit::DayTime => 8,      // i32 + i32 packed as i64
+        IntervalUnit::MonthDayNano => 16, // i32 + i32 + i64
+    };
+    Self::hash_fixed_size_array(effective_array, digest, size);
+}
+```
+
+**Design decision:** None needed. Schema serialization (`{"Interval":"MonthDayNano"}`) already differentiates variants. Each variant has a fixed physical size, so `hash_fixed_size_array` works directly.
+
+**Tests:**
+- One test per IntervalUnit variant
+- `MonthDayNano` with nulls
+- Different interval units produce different schema hashes
+- Byte-level test for `YearMonth` (simplest, 4-byte)
+
+---
+
+### 1.4 Implement `FixedSizeList` data hashing
+
+**Current state:** `todo!()` in `array_digest_update` for `DataType::FixedSizeList`. Schema normalization and serialization already work correctly (`{"FixedSizeList": [<element>, size]}`). Normalization recurses into the inner field but does **not** collapse `FixedSizeList` → `LargeList`.
+
+**Design decision — Should `FixedSizeList(Int32, 3)` be equivalent to `LargeList(Int32)`?**
+**Recommended: No.** They are semantically different types (fixed-length vs variable-length). A `FixedSizeList` guarantees every element has exactly N items; a `LargeList` does not. Keep them as distinct types in the hash. This is consistent with how FixedSizeBinary is already handled (kept separate from LargeBinary).
+
+**Implementation:** `FixedSizeList` is conceptually a list where every element has exactly `size` items. For hashing, we can treat it like `LargeList` but without structural size prefixes (since all sizes are identical and encoded in the schema).
+
+However, for consistency with `LargeList`, we should still use structural hashing with the fixed size.
This ensures that if a user ever needs to compare a `FixedSizeList` hash against a manually reconstructed one, the logic is consistent.
+
+**Alternative (simpler):** Treat `FixedSizeList(field, n)` as a flat buffer of `n * element_size` bytes per row. This only works for fixed-size inner types. For variable-size inner types (e.g., `FixedSizeList(Utf8, 3)`), we must recurse.
+
+**Recommended approach:** Reuse `hash_list_array` logic by casting `FixedSizeListArray` to `LargeListArray`. Arrow's `cast` supports this. This is the simplest and most consistent approach.
+
+```rust
+DataType::FixedSizeList(field, _) => {
+    let as_large_list = cast(effective_array, &DataType::LargeList(Arc::clone(field)))
+        .expect("Failed to cast FixedSizeList to LargeList");
+    Self::hash_list_array(
+        as_large_list.as_any().downcast_ref::<LargeListArray>()
+            .expect("Failed to downcast to LargeListArray"),
+        field.data_type(),
+        digest,
+    );
+}
+```
+
+**Design decision — Normalization update needed?** If we cast at hash time, we should also normalize `FixedSizeList` → `LargeList` in `normalize_data_type` to keep schema and data hashing consistent. But then `FixedSizeList` and `LargeList` with the same element type would be logically equivalent (same hash), which loses the fixed-size guarantee in the hash. **Decision needed from project owner:**
+- **(A)** Normalize `FixedSizeList(f, n)` → `LargeList(f)` — treats them as equivalent (like Utf8/LargeUtf8)
+- **(B)** Keep separate — `FixedSizeList` and `LargeList` always hash differently (different schema JSON)
+- **(C)** Keep schema separate but use same data hashing logic (cast at data time, don't normalize schema) — this is the recommended approach
+
+If **(C)**: schema JSON stays as `{"FixedSizeList":[..., n]}` (preserving the size), but data hashing uses LargeList logic internally. This means two arrays with identical data but different types (`FixedSizeList` vs `LargeList`) produce different hashes (because their schemas differ), which is correct.

**Tests:**
- `FixedSizeList(Int32, 2)` basic hashing
- `FixedSizeList(LargeUtf8, 3)` with variable-length inner type
- Nullable `FixedSizeList` with null elements
- Verify `FixedSizeList(Int32, 2)` ≠ `LargeList(Int32)` (if option B/C chosen)
- Byte-level test

---

### 1.5 Implement `Map` data hashing

**Current state:** `todo!()` in `array_digest_update` for `DataType::Map`. Schema normalization and serialization work (`{"Map":[<entries>, sorted]}`).

**Background:** A `Map` in Arrow is physically stored as a list of key-value structs (`List<Struct<key, value>>`). The Arrow `MapArray` wraps a `ListArray` of `StructArray` entries.

**Design decision — Should `Map` be normalized to `LargeList<Struct<key, value>>`?**
**Recommended: No.** `Map` has semantic meaning (key-value pairs, optional sort guarantee) that `LargeList` does not. The `sorted` flag is part of the schema JSON and should affect the hash. Keep `Map` as a distinct type.

**Implementation:** Treat `Map` as a list of structs. Use the same approach as `LargeList`:

```rust
DataType::Map(field, _sorted) => {
    // Map is physically stored as a list of key-value structs
    let map_array = effective_array.as_any()
        .downcast_ref::<MapArray>()
        .expect("Failed to downcast to MapArray");
    // Reinterpret as list of entries
    // MapArray provides .entries() as StructArray and offsets
    // Hash like a LargeList<Struct<key, value>>
    // ...
}
```

Concretely, `MapArray` exposes `keys()`, `values()`, and offsets. The cleanest path is to extract the underlying `ListArray` and hash it:

```rust
DataType::Map(field, _) => {
    // MapArray is backed by a ListArray of Struct entries
    let map_array = effective_array.as_any()
        .downcast_ref::<MapArray>()
        .expect("Failed to downcast to MapArray");
    Self::hash_list_array(
        // MapArray derefs to its inner ListArray representation
        // We may need to access the underlying storage
        ...,
        field.data_type(),
        digest,
    );
}
```

**Note:** The exact API depends on Arrow's `MapArray` internals.
May need to construct a `LargeListArray` from the Map's offsets and entries struct. Check `arrow::array::MapArray` API.

**Tests:**
- Simple `Map` with 2 rows
- Nullable Map with null entries
- Verify `Map` ≠ `LargeList<Struct<key, value>>` (different schema hashes)
- Byte-level test

---

### 1.6 Add multi-word validity bitmap test

**Current state:** All existing tests use arrays with ≤ 8 elements, so validity bitmaps always fit in a single `u8` word. No test verifies correct behavior across word boundaries.

**Implementation:** No code change needed — just add tests.

**Tests:**
- Array with 9 elements (null at position 8 → triggers second u8 word)
- Array with 16 elements (nulls spanning exactly 2 full words)
- Array with 20 elements (partial third word, verifying zero-padding of unused high bits)
- All three as byte-level tests in `digest_bytes.rs` to verify exact word serialization

---

## Tier 2 — Robustness

### 2.1 Implement `Null` type

**Current state:** `todo!()` in `array_digest_update` for `DataType::Null`.

**Design decision:** A `Null` column has no data — every element is null. The only information to hash is the validity bitmap (all zeros) and the count.

**Implementation:**
```rust
DataType::Null => {
    // Null type: no data bytes. Only push null bits (all false).
    if let Some(ref mut null_bits) = digest.null_bits {
        null_bits.extend(repeat_n(false, effective_array.len()));
    }
    // No data to feed into digest.data — intentionally empty.
}
```

**Tests:**
- `NullArray` with 3 elements via hash_array
- Nullable vs non-nullable Null column in record batch
- Byte-level test: verify only validity bits (all 0s) and empty data digest

---

### 2.2 Add nullable list element tests

**Current state:** No test creates a `LargeListArray` where some list entries themselves are NULL (not list *values* being null, but entire list entries absent).
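
For the `[[1,2], NULL, [3]]` case, the spec routes bytes into three separated streams. A std-only sketch (the helper names are illustrative; assumptions: validity bits pack LSB-first into `u8` words prefixed by the bit count as u64 LE, structural sizes are u64 LE, leaf data is i32 LE):

```rust
/// Bit-count prefix (u64 LE) followed by LSB-first packed u8 words.
fn validity_stream(bits: &[bool]) -> Vec<u8> {
    let mut out = (bits.len() as u64).to_le_bytes().to_vec();
    for chunk in bits.chunks(8) {
        let mut word = 0u8;
        for (i, &bit) in chunk.iter().enumerate() {
            if bit {
                word |= 1 << i;
            }
        }
        out.push(word);
    }
    out
}

/// One u64 LE length per NON-null row, plus the flattened i32 LE values.
fn list_streams(rows: &[Option<&[i32]>]) -> (Vec<u8>, Vec<u8>) {
    let mut structural = Vec::new();
    let mut data = Vec::new();
    for row in rows.iter().flatten() {
        structural.extend_from_slice(&(row.len() as u64).to_le_bytes());
        for v in *row {
            data.extend_from_slice(&v.to_le_bytes());
        }
    }
    (structural, data)
}

fn main() {
    let rows: [Option<&[i32]>; 3] = [Some(&[1, 2]), None, Some(&[3])];
    let validity: Vec<bool> = rows.iter().map(Option::is_some).collect();

    // validity = [1, 0, 1] -> count 3 (u64 LE), then one word 0b0000_0101.
    assert_eq!(validity_stream(&validity), vec![3, 0, 0, 0, 0, 0, 0, 0, 5]);

    let (structural, data) = list_streams(&rows);
    // The null row contributes neither a size nor data: only sizes 2 and 1.
    assert_eq!(structural.len(), 16);
    // The data stream holds only [1, 2, 3].
    assert_eq!(data, vec![1, 0, 0, 0, 2, 0, 0, 0, 3, 0, 0, 0]);
}
```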

**Tests:**
- `LargeList` with data `[[1,2], NULL, [3]]` — verify null list entry is skipped (no structural size, no data)
- Byte-level test verifying exact bytes: validity = `[1, 0, 1]`, structural receives only 2 sizes, data receives only `[1,2,3]`

---

### 2.3 Document metadata exclusion in spec

**Current state:** Arrow Field/Schema metadata (`HashMap<String, String>`) is silently ignored. `normalize_field()` drops metadata. This is correct but undocumented.

**Changes:**
- Add to `docs/byte-layout-spec.md` Section 2.1: "Arrow field metadata and schema metadata are **excluded** from the hash. Only field names, data types (recursively), and nullability are included. This means two schemas that differ only in metadata produce identical hashes."
- Add a test: two schemas identical except for metadata → same hash

---

### 2.4 Add property-based test: column reorder invariance

**Current state:** Column order independence is tested with 2 fixed examples. A property test would strengthen this.

**Design decision:** Use `proptest` or `quickcheck` crate? **Recommend `proptest`** — more flexible, better shrinking.

**Tests:**
- Generate random schemas with 2-10 fields of supported types
- Generate random data matching schema
- Shuffle column order → hash must be identical
- This would also serve as a crash test for unsupported types (should not panic for supported types)

**Note:** This is a `dev-dependency` addition. Keep it behind a feature flag if desired.

---

## Tier 3 — Completeness

### 3.1 Implement `Union` types (Dense and Sparse)

**Current state:** `todo!()` in `array_digest_update` for `DataType::Union`.

**Design decision — This is the hardest type to hash correctly:**

A Union contains multiple child arrays and a type_ids buffer that says which child each row comes from. Dense unions also have an offsets buffer.

Options:
- **(A) Resolve to concrete values:** For each row, look up the active child + offset, extract the value, hash it. This is like dictionary resolution. Simple but loses the "which variant" information.
- **(B) Hash type_ids + child data separately:** Feed `type_ids` as a fixed-size array, then hash each child independently. This preserves variant identity.
- **(C) Hash compositely:** For each row, hash `(type_id, value_bytes)`. This is the most collision-resistant.

**Recommended: (C)** — hash `type_id` byte followed by value bytes for each row. This ensures that a union value `Int32(5)` hashes differently from `Float32(5.0)` even if they happen to have similar byte representations.

**Implementation sketch:**
```rust
DataType::Union(fields, mode) => {
    let union_array = effective_array.as_any()
        .downcast_ref::<UnionArray>()
        .expect("Failed to downcast to UnionArray");
    for i in 0..union_array.len() {
        let type_id = union_array.type_id(i);
        digest.data_mut().update(type_id.to_le_bytes());
        let child = union_array.value(i);
        // Hash the single-element child value
        // Need a way to hash a single scalar — possibly slice the child array
        ...
    }
}
```

**Complexity:** High. Union hashing requires per-element dispatch. Defer if not needed for initial production use.

**Tests:**
- SparseUnion with Int32 and Utf8 children
- DenseUnion with nulls (if Union supports nulls — it depends on Arrow version)
- Byte-level test

---

### 3.2 Implement `RunEndEncoded`

**Current state:** `todo!()` in `array_digest_update` for `DataType::RunEndEncoded`.

**Design decision:** RunEndEncoded is a compression format. Like Dictionary, the logical values are what matter.

**Recommended:** Resolve/decode to the plain array equivalent and hash that. Arrow should support `cast()` from REE to plain arrays.
+ +```rust +DataType::RunEndEncoded(_, values_field) => { + let plain = cast(effective_array, values_field.data_type()) + .expect("Failed to decode RunEndEncoded"); + Self::array_digest_update(values_field.data_type(), plain.as_ref(), digest); +} +``` + +**Design decision:** Should REE normalize in the schema? **Recommended: Yes** — normalize `RunEndEncoded(run_ends, values)` → `normalize_data_type(values.data_type())`. This treats REE as a pure encoding optimization, like Dictionary. + +**Tests:** +- REE Int32 array hashes same as plain Int32 array +- REE with runs of different lengths + +--- + +### 3.3 Implement View types (`BinaryView`, `Utf8View`) + +**Current state:** `todo!()` at lines 533, 541. + +**Implementation:** View types are logically equivalent to their non-view counterparts. Normalize in both schema and data: + +**Schema normalization** (add to `normalize_data_type`): +```rust +DataType::Utf8View => DataType::LargeUtf8, +DataType::BinaryView => DataType::LargeBinary, +``` + +**Data hashing** (add to normalization block at top of `array_digest_update`): +```rust +DataType::Utf8View => { + normalized_type = DataType::LargeUtf8; + cast_array = cast(array, &normalized_type).expect("Failed to cast Utf8View to LargeUtf8"); + (&normalized_type, cast_array.as_ref()) +} +DataType::BinaryView => { + normalized_type = DataType::LargeBinary; + cast_array = cast(array, &normalized_type).expect("Failed to cast BinaryView to LargeBinary"); + (&normalized_type, cast_array.as_ref()) +} +``` + +**Tests:** +- `Utf8View ["hello"]` hashes same as `LargeUtf8 ["hello"]` +- `BinaryView` hashes same as `LargeBinary` +- Schema equivalence test + +--- + +### 3.4 Implement `ListView` / `LargeListView` + +**Current state:** `todo!()` at lines 542, 554. 

**Implementation:** Normalize to `LargeList` (same logical semantics, different physical layout):

**Schema normalization:**
```rust
DataType::ListView(field) | DataType::LargeListView(field) => {
    DataType::LargeList(Arc::new(normalize_field(field)))
}
```

**Data hashing:** Cast to `LargeList` at the normalization block in `array_digest_update`.

**Tests:**
- `ListView` hashes same as `LargeList`
- With nulls

---

### 3.5 Add fuzz testing for panic detection

**Implementation:** Add a fuzz target that generates random `RecordBatch` instances from random schemas (using only supported types) and ensures `hash_record_batch` never panics.

**Tool:** `cargo-fuzz` (libFuzzer-based); `cargo-afl` is an alternative.

**Scope:** Generate schemas with 1-20 fields, types drawn from supported set, 0-100 rows, random null patterns.

---

## Execution Order

Recommended implementation sequence (respecting dependencies):

1. **1.1–1.3** (Timestamp, Duration, Interval) — independent, trivial implementations
2. **1.6** (multi-word validity test) — test-only, no code changes
3. **2.1** (Null type) — trivial
4. **2.2** (nullable list test) — test-only
5. **2.3** (document metadata exclusion) — docs-only
6. **3.3** (View types) — simple normalization + cast
7. **3.4** (ListView) — simple normalization + cast
8. **1.4** (FixedSizeList) — needs design decision on normalization
9. **1.5** (Map) — moderate complexity, needs Arrow API exploration
10. **3.2** (RunEndEncoded) — needs design decision on normalization
11. **3.1** (Union) — highest complexity
12. **2.4** (property tests) — after all types implemented
13. **3.5** (fuzz testing) — after all types implemented

Items 1-7 can likely be done in a single PR. Items 8-11 may warrant individual PRs due to design decisions. Items 12-13 are infrastructure additions.
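
The collision-resistance argument behind option (C) for Union hashing (item 3.1) can be demonstrated without Arrow: an `Int32` and a `Float32` value can share identical little-endian bytes, and only the per-row `type_id` prefix keeps their hashed streams apart. A std-only sketch (`encode_row` and the type ids 0/1 are illustrative, not the crate's API):

```rust
// Sketch: per-row (type_id, value_bytes) encoding, as recommended in (C).
fn encode_row(type_id: i8, value_bytes: &[u8]) -> Vec<u8> {
    let mut out = type_id.to_le_bytes().to_vec();
    out.extend_from_slice(value_bytes);
    out
}

fn main() {
    // 5.0f32 has bit pattern 0x40A00000, i.e. the i32 value 1_084_227_584:
    // their little-endian byte representations are identical.
    let f = 5.0f32.to_le_bytes();
    let i = 1_084_227_584i32.to_le_bytes();
    assert_eq!(f, i);

    // Without the prefix the two union rows would feed identical bytes
    // into the digest; the type_id prefix separates them.
    assert_ne!(encode_row(0, &f), encode_row(1, &i));
}
```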
+ +--- + +## Python Bindings + +The Python interface should be provided via **PyO3 bindings** to the Rust library (not a parallel pure-Python implementation). This lives in the separate `nauticalab/starfix-python` repository. + +**TODO:** +- Configure PyO3/maturin build for the starfix crate +- Expose `ArrowDigester`, `hash_array`, `hash_record_batch`, `hash_table` to Python +- Use `arrow-rs` ↔ `pyarrow` interop via `arrow::pyarrow` feature or `pyo3-arrow` +- Publish to PyPI as `starfix` + +--- + +## Open Design Decisions Summary + +| # | Question | Recommendation | Impact | +|---|----------|---------------|--------| +| 1 | Should timezone strings be normalized (e.g., "UTC" == "Etc/UTC")? | **No** — document as known limitation | Low risk | +| 2 | Should `FixedSizeList` normalize to `LargeList`? | **No** — keep schema separate, use same data hashing logic (option C) | Affects schema equivalence | +| 3 | Should `Map` normalize to `LargeList`? | **No** — keep as distinct type | Affects schema equivalence | +| 4 | Should `RunEndEncoded` normalize to its value type? | **Yes** — treat as encoding optimization like Dictionary | Affects schema equivalence | +| 5 | Should View types normalize to Large equivalents? | **Yes** — `Utf8View`→`LargeUtf8`, etc. | Affects schema equivalence | +| 6 | How should Union be hashed? | **(C)** — type_id + value bytes per row | Affects hash format | +| 7 | Should metadata affect the hash? | **No** — current behavior is correct, just document it | Documentation only | diff --git a/src/arrow_digester_core.rs b/src/arrow_digester_core.rs index 5dde5a6..d834a99 100644 --- a/src/arrow_digester_core.rs +++ b/src/arrow_digester_core.rs @@ -3,41 +3,144 @@ clippy::todo, reason = "First iteration of code, will add proper error handling later. 
Allow for unsupported data types for now" )] -use std::{collections::BTreeMap, iter::repeat_n}; +use std::{collections::BTreeMap, iter::repeat_n, sync::Arc}; use arrow::{ array::{ - Array, BinaryArray, BooleanArray, GenericBinaryArray, GenericListArray, GenericStringArray, - LargeBinaryArray, LargeListArray, LargeStringArray, ListArray, OffsetSizeTrait, - RecordBatch, StringArray, StructArray, + make_array, Array, BooleanArray, GenericBinaryArray, GenericStringArray, LargeBinaryArray, + LargeListArray, LargeStringArray, OffsetSizeTrait, RecordBatch, StructArray, }, + buffer::NullBuffer, + compute::cast, datatypes::{DataType, Schema}, }; use arrow_schema::Field; use bitvec::prelude::*; use digest::Digest; -const NULL_BYTES: &[u8] = b"NULL"; - const DELIMITER_FOR_NESTED_FIELD: &str = "/"; #[derive(Clone)] -enum DigestBufferType { - NonNullable(D), - Nullable(BitVec, D), // Where first digest is for the bull bits, while the second is for the actual data +struct DigestBufferType { + null_bits: Option>, + structural: Option, + data: Option, +} + +impl DigestBufferType { + /// Create a buffer for a leaf field (data + optional `null_bits`). + fn new_data_only(nullable: bool) -> Self { + Self { + null_bits: nullable.then(BitVec::::new), + structural: None, + data: Some(D::new()), + } + } + + /// Create a buffer for a list-level-only entry (structural + optional `null_bits`, no data). + fn new_structural_only(nullable: bool) -> Self { + Self { + null_bits: nullable.then(BitVec::::new), + structural: Some(D::new()), + data: None, + } + } + + /// Create a buffer for a leaf that is itself a list type (structural + data + optional `null_bits`). + fn new_list_leaf(nullable: bool) -> Self { + Self { + null_bits: nullable.then(BitVec::::new), + structural: Some(D::new()), + data: Some(D::new()), + } + } + + /// Create a buffer for a column-level nullable entry (`null_bits` only). 
+ fn new_validity_only() -> Self { + Self { + null_bits: Some(BitVec::::new()), + structural: None, + data: None, + } + } + + /// Get a mutable reference to the data digest, panicking if absent. + #[expect(clippy::panic, reason = "Const fn cannot use expect/unwrap")] + const fn data_mut(&mut self) -> &mut D { + match &mut self.data { + Some(d) => d, + None => panic!("data digest not present on this entry"), + } + } +} + +/// Recursively normalize a `DataType` to its canonical large equivalent. +/// +/// - `Utf8` → `LargeUtf8` +/// - `Binary` → `LargeBinary` +/// - `List(field)` → `LargeList(normalized_field)` +/// - `Dictionary(_, value_type)` → `normalize_data_type(value_type)` +/// - `Struct`, `LargeList`, `FixedSizeList`, `Map` have their inner fields normalized recursively. +fn normalize_data_type(data_type: &DataType) -> DataType { + match data_type { + DataType::Utf8 => DataType::LargeUtf8, + DataType::Binary => DataType::LargeBinary, + DataType::List(field) | DataType::LargeList(field) => { + DataType::LargeList(Arc::new(normalize_field(field))) + } + DataType::Struct(fields) => DataType::Struct( + fields + .iter() + .map(|f| Arc::new(normalize_field(f))) + .collect(), + ), + DataType::FixedSizeList(field, size) => { + DataType::FixedSizeList(Arc::new(normalize_field(field)), *size) + } + DataType::Map(field, sorted) => DataType::Map(Arc::new(normalize_field(field)), *sorted), + DataType::Dictionary(_, value_type) => normalize_data_type(value_type), + other => other.clone(), + } +} + +/// Normalize a single field: keep name and nullability, normalize the data type recursively. +fn normalize_field(field: &Field) -> Field { + Field::new( + field.name(), + normalize_data_type(field.data_type()), + field.is_nullable(), + ) +} + +/// Normalize all fields in a schema to their canonical large equivalents. 
+fn normalize_schema(schema: &Schema) -> Schema { + Schema::new( + schema + .fields() + .iter() + .map(|f| Arc::new(normalize_field(f))) + .collect::>(), + ) } #[derive(Clone)] pub struct ArrowDigesterCore { - schema: Schema, schema_digest: Vec, + serialized_schema: String, fields_digest_buffer: BTreeMap>, } impl ArrowDigesterCore { - /// Create a new instance of `ArrowDigesterCore` with the schema which will be enforce through each update. - pub fn new(schema: Schema) -> Self { - // Hash the schema first + /// Create a new instance of `ArrowDigesterCore` with the schema, which will be enforced through each update. + #[expect( + clippy::shadow_reuse, + reason = "Intentional: shadow input with normalized version so all downstream code uses canonical types" + )] + pub fn new(schema: &Schema) -> Self { + // Normalize the schema so all internal state uses canonical large types + let schema = normalize_schema(schema); + + // Hash the normalized schema let schema_digest = Self::hash_schema(&schema); // Flatten all nested fields into a single map, this allows us to hash each field individually and efficiently @@ -46,96 +149,109 @@ impl ArrowDigesterCore { Self::extract_fields_name(field, "", &mut fields_digest_buffer); }); + let serialized_schema = Self::serialized_schema(&schema); + // Store it in the new struct for now Self { - schema, schema_digest, + serialized_schema, fields_digest_buffer, } } /// Hash a record batch and update the internal digests. 
pub fn update(&mut self, record_batch: &RecordBatch) { - // Verify schema matches assert!( - *record_batch.schema() == self.schema, + Self::serialized_schema(record_batch.schema().as_ref()) == self.serialized_schema, "Record batch schema does not match ArrowDigester schema" ); - // Iterate through each field and update its digest - self.fields_digest_buffer - .iter_mut() - .for_each(|(field_name, digest)| { - // Determine if field name is nested - let field_name_hierarchy = field_name - .split(DELIMITER_FOR_NESTED_FIELD) - .collect::>(); - - if field_name_hierarchy.len() == 1 { - Self::array_digest_update( - record_batch - .schema() - .field_with_name(field_name) - .expect("Failed to get field with name") - .data_type(), - record_batch - .column_by_name(field_name) - .expect("Failed to get column by name"), - digest, - ); - } else { - Self::update_nested_field( - &field_name_hierarchy, - 0, - record_batch - .column_by_name( - field_name_hierarchy - .first() - .expect("Failed to get field name at idx 0, list is empty!"), - ) - .expect("Failed to get column by name") - .as_any() - .downcast_ref::() - .expect("Failed to downcast to StructArray"), - digest, - ); - } - }); + let schema = record_batch.schema(); + for col_idx in 0..record_batch.num_columns() { + let field = schema.field(col_idx); + let array = record_batch.column(col_idx); + let path = field.name().to_owned(); + + Self::traverse_and_update( + field.data_type(), + field.is_nullable(), + array.as_ref(), + &path, + None, // no ancestor struct nulls at top level + &mut self.fields_digest_buffer, + ); + } } - /// Hash an array directly without needing to create an `ArrowDigester` instance on the user side - /// For hash array, we don't have a schema to hash, however we do have field data type. - /// So similar to schema, we will hash based on datatype to encode the metadata information into the digest..... + /// Hash an array directly without needing to create an `ArrowDigester` instance on the user side. 
+ /// Unlike full table hashing, we don't have a schema to hash; however, we do have the field data type. + /// Similar to schema hashing, we hash based on the data type to encode metadata information into the digest. + /// + /// Uses the same recursive decomposition as the record-batch path so that data hashing + /// is consistent regardless of which API is used. /// /// # Panics /// /// This function will panic if JSON serialization of the data type fails. /// pub fn hash_array(array: &dyn Array) -> Vec { + // Resolve dictionary arrays to their plain value type + let (effective_type, resolved_array); + let effective_array: &dyn Array = + if let DataType::Dictionary(_, value_type) = array.data_type() { + resolved_array = cast(array, value_type.as_ref()) + .expect("Failed to cast dictionary to plain array"); + effective_type = value_type.as_ref().clone(); + resolved_array.as_ref() + } else { + effective_type = array.data_type().clone(); + array + }; + + // Normalize to canonical large types + let normalized_type = normalize_data_type(&effective_type); + let mut final_digest = D::new(); - let data_type_serialized = serde_json::to_string(&array.data_type()) + // Use canonical type serialization for metadata (data_type_to_value also normalizes, + // but we pass the already-normalized type for consistency) + let canonical_type = Self::data_type_to_value(&normalized_type); + let data_type_serialized = serde_json::to_string(&canonical_type) .expect("Failed to serialize data type to string"); - // Update the digest buffer with the array metadata and field data + // Update the digest with array metadata final_digest.update(data_type_serialized); - // Now we update it with the actual array data - let mut digest_buffer = if array.is_nullable() { - DigestBufferType::Nullable(BitVec::new(), D::new()) - } else { - DigestBufferType::NonNullable(D::new()) - }; - Self::array_digest_update(array.data_type(), array, &mut digest_buffer); - Self::finalize_digest(&mut final_digest, 
digest_buffer); + // Build BTreeMap entries from the type tree (same decomposition as record-batch path) + let mut fields = BTreeMap::new(); + Self::extract_type_entries( + &effective_type, + effective_array.is_nullable(), + "", + &mut fields, + ); + + // Traverse and populate entries + Self::traverse_and_update( + &effective_type, + effective_array.is_nullable(), + effective_array, + "", + None, + &mut fields, + ); + + // Finalize all entries into the digest (same order as record-batch finalize) + for (_, digest) in fields { + Self::finalize_digest(&mut final_digest, digest); + } - // Finalize and return the digest final_digest.finalize().to_vec() } - /// Hash record batch directly without needing to create an `ArrowDigester` instance on the user side. + /// Hash a record batch directly without needing to create an `ArrowDigester` instance on the user side. pub fn hash_record_batch(record_batch: &RecordBatch) -> Vec { - let mut digester = Self::new(record_batch.schema().as_ref().clone()); + let mut digester = Self::new(record_batch.schema().as_ref()); digester.update(record_batch); digester.finalize() } @@ -157,28 +273,27 @@ impl ArrowDigesterCore { final_digest.finalize().to_vec() } - #[expect( - clippy::big_endian_bytes, - reason = "Use for bit packing the null_bit_values" - )] /// Finalize a single field digest into the final digest. - /// Helpers to reduce code duplication. + /// Helper to reduce code duplication. 
fn finalize_digest(final_digest: &mut D, digest: DigestBufferType) { - match digest { - DigestBufferType::NonNullable(data_digest) => { - final_digest.update(data_digest.finalize()); - } - DigestBufferType::Nullable(null_bit_digest, data_digest) => { - final_digest.update(null_bit_digest.len().to_le_bytes()); - for &word in null_bit_digest.as_raw_slice() { - final_digest.update(word.to_be_bytes()); - } - final_digest.update(data_digest.finalize()); + // Null bits first (if nullable) + if let Some(null_bit_vec) = &digest.null_bits { + final_digest.update((null_bit_vec.len() as u64).to_le_bytes()); + for &word in null_bit_vec.as_raw_slice() { + final_digest.update(word.to_le_bytes()); } } + // Structural digest (if list type) — sizes separated from leaf data + if let Some(structural) = digest.structural { + final_digest.update(structural.finalize()); + } + // Data/leaf digest (if present) + if let Some(data) = digest.data { + final_digest.update(data.finalize()); + } } - /// Serialize the schema into a `BTreeMap` for field name and its digest. + /// Serialize the schema into a canonical JSON string keyed by field name. /// /// # Panics /// This function will panic if JSON serialization of the schema fails. @@ -200,34 +315,41 @@ impl ArrowDigesterCore { /// Convert a `DataType` to a JSON value, recursively converting any inner `Field` /// references to only include `name`, `data_type`, and `nullable`. + /// + /// Types are first normalized via `normalize_data_type` (Utf8→LargeUtf8, Binary→LargeBinary, + /// List→LargeList, Dictionary→value type) so the JSON always reflects canonical forms. 
fn data_type_to_value(data_type: &DataType) -> serde_json::Value { - match data_type { + // Normalize first so all downstream serialization uses canonical types + let canonical = normalize_data_type(data_type); + let value = match &canonical { DataType::Struct(fields) => { - let fields_json: Vec = fields + let mut sorted_fields: Vec<_> = fields.iter().collect(); + sorted_fields.sort_by_key(|f| f.name().clone()); + let fields_json: Vec = sorted_fields .iter() .map(|f| Self::inner_field_to_value(f)) .collect(); serde_json::json!({ "Struct": fields_json }) } - DataType::List(field) => { - serde_json::json!({ "List": Self::inner_field_to_value(field) }) - } + // After normalization, all list types are LargeList DataType::LargeList(field) => { - serde_json::json!({ "LargeList": Self::inner_field_to_value(field) }) + serde_json::json!({ "LargeList": Self::element_type_to_value(field) }) } DataType::FixedSizeList(field, size) => { - serde_json::json!({ "FixedSizeList": [Self::inner_field_to_value(field), size] }) + serde_json::json!({ "FixedSizeList": [Self::element_type_to_value(field), size] }) } DataType::Map(field, sorted) => { serde_json::json!({ "Map": [Self::inner_field_to_value(field), sorted] }) } - // For all non-nested types, Arrow's default serde is sufficient + // For all non-nested types (including LargeUtf8, LargeBinary after normalization), + // Arrow's default serde is sufficient other => serde_json::to_value(other).expect("Failed to serialize data type"), - } + }; + Self::sort_json_value(value) } - /// Convert an inner field (e.g., list item, struct child) to a JSON value - /// with only `name`, `data_type`, and `nullable`. + /// Convert an inner field (e.g., struct child) to a JSON value + /// with `name`, `data_type`, and `nullable`. 
fn inner_field_to_value(field: &Field) -> serde_json::Value { serde_json::json!({ "name": field.name(), @@ -236,6 +358,15 @@ impl ArrowDigesterCore { }) } + /// Convert a container element field (e.g., list item) to a JSON value + /// with only `data_type` and `nullable`, omitting the Arrow-internal field name. + fn element_type_to_value(field: &Field) -> serde_json::Value { + serde_json::json!({ + "data_type": Self::data_type_to_value(field.data_type()), + "nullable": field.is_nullable(), + }) + } + /// Recursively sort all JSON object keys for deterministic serialization. fn sort_json_value(value: serde_json::Value) -> serde_json::Value { match value { @@ -255,191 +386,407 @@ impl ArrowDigesterCore { } } - /// Serialize the schema into a `BTreeMap` for field name and its digest. + /// Hash the schema by serializing it to a canonical JSON string and computing its digest. pub fn hash_schema(schema: &Schema) -> Vec { // Hash the entire thing to the digest D::digest(Self::serialized_schema(schema)).to_vec() } - /// Recursive function to update nested field digests (structs within structs). - fn update_nested_field( - field_name_hierarchy: &[&str], - current_level: usize, - array: &StructArray, - digest: &mut DigestBufferType, + /// Top-down recursive traversal that routes data to `BTreeMap` entries. 
+ fn traverse_and_update( + data_type: &DataType, + nullable: bool, + array: &dyn Array, + path: &str, + ancestor_struct_nulls: Option<&NullBuffer>, + fields: &mut BTreeMap>, ) { - let current_level_plus_one = current_level - .checked_add(1) - .expect("Field nesting level overflow"); - - if field_name_hierarchy - .len() - .checked_sub(1) - .expect("field_name_hierarchy underflow") - == current_level_plus_one - { - let array_data = array - .column_by_name( - field_name_hierarchy - .last() - .expect("Failed to get field name at idx 0, list is empty!"), - ) - .expect("Failed to get column by name"); - // Base case, it should be a non-struct field - Self::array_digest_update(array_data.data_type(), array_data.as_ref(), digest); + // Normalize small variants + let (normalized_type, cast_array); + let (effective_type, effective_array): (&DataType, &dyn Array) = match data_type { + DataType::Utf8 => { + normalized_type = DataType::LargeUtf8; + cast_array = cast(array, &normalized_type).expect("cast Utf8"); + (&normalized_type, cast_array.as_ref()) + } + DataType::Binary => { + normalized_type = DataType::LargeBinary; + cast_array = cast(array, &normalized_type).expect("cast Binary"); + (&normalized_type, cast_array.as_ref()) + } + DataType::List(field) => { + normalized_type = DataType::LargeList(Arc::clone(field)); + cast_array = cast(array, &normalized_type).expect("cast List"); + (&normalized_type, cast_array.as_ref()) + } + DataType::Dictionary(_, value_type) => { + cast_array = cast(array, value_type.as_ref()).expect("cast Dict"); + (value_type.as_ref(), cast_array.as_ref()) + } + _ => (data_type, array), + }; + + let canonical = normalize_data_type(effective_type); + + match &canonical { + DataType::LargeList(value_field) => { + Self::traverse_list( + effective_array, + value_field, + nullable, + path, + ancestor_struct_nulls, + fields, + ); + } + DataType::Struct(struct_fields) => { + Self::traverse_struct( + effective_array, + struct_fields, + nullable, + path, + 
ancestor_struct_nulls, + fields, + ); + } + _ => { + Self::traverse_leaf( + effective_type, + effective_array, + path, + ancestor_struct_nulls, + fields, + ); + } + } + } + + fn traverse_list( + array: &dyn Array, + value_field: &Field, + nullable: bool, + path: &str, + ancestor_struct_nulls: Option<&NullBuffer>, + fields: &mut BTreeMap>, + ) { + let list_array = array + .as_any() + .downcast_ref::() + .expect("downcast to LargeListArray"); + + // If the field is nullable, record column/field-level validity at `path` + if nullable { + if let Some(entry) = fields.get_mut(path) { + if let Some(ref mut null_bits) = entry.null_bits { + let effective_nulls = + Self::combine_nulls(list_array.nulls(), ancestor_struct_nulls); + match &effective_nulls { + Some(nb) => { + for i in 0..list_array.len() { + null_bits.push(nb.is_valid(i)); + } + } + None => null_bits.extend(repeat_n(true, list_array.len())), + } + } + } + } + + let list_path = format!("{path}{DELIMITER_FOR_NESTED_FIELD}"); + + // Determine effective null buffer (field null AND ancestor struct null) + let effective_nulls = Self::combine_nulls(list_array.nulls(), ancestor_struct_nulls); + + // For each row, write structural info and recurse into non-null elements + for i in 0..list_array.len() { + let is_valid = effective_nulls.as_ref().is_none_or(|nb| nb.is_valid(i)); + if is_valid { + let sub_array = list_array.value(i); + let sub_len = sub_array.len() as u64; + + // Write list length to structural digest at list_path + if let Some(entry) = fields.get_mut(&list_path) { + if let Some(ref mut structural) = entry.structural { + structural.update(sub_len.to_le_bytes()); + } + } + + // Recurse into the sub-array using the ORIGINAL value type + // (not canonical) so traverse_and_update can normalize internally. 
+                let original_value_type = sub_array.data_type();
+                Self::traverse_and_update(
+                    original_value_type,
+                    value_field.is_nullable(),
+                    sub_array.as_ref(),
+                    &list_path,
+                    None, // list elements don't have ancestor struct nulls
+                    fields,
+                );
+            }
+        }
+    }
+
+    fn traverse_struct(
+        array: &dyn Array,
+        _struct_fields: &arrow_schema::Fields,
+        nullable: bool,
+        path: &str,
+        ancestor_struct_nulls: Option<&NullBuffer>,
+        fields: &mut BTreeMap<String, DigestBufferType<D>>,
+    ) {
+        let struct_array = array
+            .as_any()
+            .downcast_ref::<StructArray>()
+            .expect("downcast to StructArray");
+
+        // Combine struct's own nulls with ancestor nulls (AND propagation)
+        let combined_nulls = if nullable {
+            Self::combine_nulls(struct_array.nulls(), ancestor_struct_nulls)
         } else {
-            // Recursive case, it should be a struct field
-            let next_array = array
-                .column_by_name(
-                    field_name_hierarchy
-                        .get(current_level_plus_one)
-                        .expect("Failed to get field name at current level"),
-                )
-                .expect("Failed to get column by name")
-                .as_any()
-                .downcast_ref::<StructArray>()
-                .expect("Failed to downcast to StructArray");
-
-            Self::update_nested_field(
-                field_name_hierarchy,
-                current_level_plus_one,
-                next_array,
-                digest,
+            ancestor_struct_nulls.cloned()
+        };
+
+        // Use the ORIGINAL struct array's fields (not the canonical ones from
+        // the type tree) so that data_type matches the actual child array.
+        // traverse_and_update will normalize types internally.
+        let original_fields = struct_array.fields();
+        let mut sorted_children: Vec<(usize, &Field)> = original_fields
+            .iter()
+            .enumerate()
+            .map(|(i, f)| (i, f.as_ref()))
+            .collect();
+        sorted_children.sort_by_key(|(_, f)| f.name().clone());
+
+        for (idx, child_field) in sorted_children {
+            let child_array = struct_array.column(idx);
+            let child_path = Self::construct_field_name_hierarchy(path, child_field.name());
+
+            Self::traverse_and_update(
+                child_field.data_type(),
+                child_field.is_nullable(),
+                child_array.as_ref(),
+                &child_path,
+                combined_nulls.as_ref(),
+                fields,
             );
         }
     }
+    fn traverse_leaf(
+        data_type: &DataType,
+        array: &dyn Array,
+        path: &str,
+        ancestor_struct_nulls: Option<&NullBuffer>,
+        fields: &mut BTreeMap<String, DigestBufferType<D>>,
+    ) {
+        let entry = fields
+            .get_mut(path)
+            .expect("entry must exist for leaf path");
+
+        // Compute effective validity (own nulls AND ancestor struct nulls)
+        let effective_nulls = Self::combine_nulls(array.nulls(), ancestor_struct_nulls);
+
+        // Handle null_bits
+        if let Some(ref mut null_bits) = entry.null_bits {
+            match &effective_nulls {
+                Some(nb) => {
+                    for i in 0..array.len() {
+                        null_bits.push(nb.is_valid(i));
+                    }
+                }
+                None => null_bits.extend(repeat_n(true, array.len())),
+            }
+        }
+
+        // Hash leaf data with combined null buffer
+        if let Some(effective) = &effective_nulls {
+            let child_data = array.to_data();
+            let null_count = effective.null_count();
+            let new_data = child_data
+                .into_builder()
+                .null_count(null_count)
+                .null_bit_buffer(Some(effective.clone().into_inner().into_inner()))
+                .build()
+                .expect("rebuild array with combined null buffer");
+            let combined_array = make_array(new_data);
+            Self::hash_leaf_data(data_type, combined_array.as_ref(), entry);
+        } else {
+            Self::hash_leaf_data(data_type, array, entry);
+        }
+    }
+
+    /// Hash leaf data into the entry's data digest, without modifying `null_bits`
+    /// (which are already handled by `traverse_leaf`).
+    fn hash_leaf_data(data_type: &DataType, array: &dyn Array, entry: &mut DigestBufferType<D>) {
+        // Save and restore null_bits so array_digest_update's handle_null_bits
+        // pushes don't pollute the real null_bits (which traverse_leaf manages).
+        // We keep null_bits in place during the call so hash functions use
+        // the null-aware code path (checking array.nulls() to skip null values).
+        let saved = entry.null_bits.take();
+        // Put a temporary empty bitvec so hash functions use the null-aware path
+        // when the array actually has nulls
+        if array.nulls().is_some() {
+            entry.null_bits = Some(BitVec::<u8, Lsb0>::new());
+        }
+        Self::array_digest_update(data_type, array, entry);
+        // Restore the real null_bits
+        entry.null_bits = saved;
+    }
+
+    fn combine_nulls(
+        own_nulls: Option<&NullBuffer>,
+        ancestor_nulls: Option<&NullBuffer>,
+    ) -> Option<NullBuffer> {
+        match (own_nulls, ancestor_nulls) {
+            (Some(own), Some(ancestor)) => Some(NullBuffer::new(own.inner() & ancestor.inner())),
+            (Some(own), None) => Some(own.clone()),
+            (None, Some(ancestor)) => Some(ancestor.clone()),
+            (None, None) => None,
+        }
+    }
+
     #[expect(
         clippy::too_many_lines,
         reason = "Comprehensive match on all data types"
     )]
+    #[expect(
+        clippy::unreachable,
+        reason = "Small types are normalized to large equivalents; List/Struct are handled by traverse_and_update"
+    )]
     fn array_digest_update(
         data_type: &DataType,
         array: &dyn Array,
         digest: &mut DigestBufferType<D>,
     ) {
-        match data_type {
+        // Normalize small variants to their large equivalents so every code path
+        // goes through a single canonical representation. The cast only widens
+        // offsets (i32 → i64). These variables extend the lifetime of cast
+        // results. They are only initialized (and read) in branches that perform
+        // a cast; the default branch never touches them.
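The AND-propagation performed by `combine_nulls` can be sanity-checked with plain `Vec<bool>` standing in for Arrow's `NullBuffer` — a simplified model, not the crate's API:

```rust
// Simplified model of combine_nulls: None means "no null buffer, all valid";
// otherwise a slot is valid only if BOTH bitmaps say it is.
fn combine(own: Option<&[bool]>, ancestor: Option<&[bool]>) -> Option<Vec<bool>> {
    match (own, ancestor) {
        (Some(o), Some(a)) => Some(o.iter().zip(a).map(|(x, y)| *x && *y).collect()),
        (Some(o), None) => Some(o.to_vec()),
        (None, Some(a)) => Some(a.to_vec()),
        (None, None) => None,
    }
}

fn main() {
    // Row 1 is null at the ancestor struct level, so the child value at row 1
    // is treated as null even though its own bitmap marks it valid.
    let combined = combine(Some(&[true, true, false]), Some(&[true, false, true]));
    assert_eq!(combined, Some(vec![true, false, false]));
    // Neither side has a bitmap: every slot stays valid.
    assert!(combine(None, None).is_none());
}
```

This is why a struct-level null forces the leaf's `null_bits` entry to record `false` regardless of the child array's own validity.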
+        let (normalized_type, cast_array);
+        let (effective_type, effective_array): (&DataType, &dyn Array) = match data_type {
+            DataType::Utf8 => {
+                normalized_type = DataType::LargeUtf8;
+                cast_array =
+                    cast(array, &normalized_type).expect("Failed to cast Utf8 to LargeUtf8");
+                (&normalized_type, cast_array.as_ref())
+            }
+            DataType::Binary => {
+                normalized_type = DataType::LargeBinary;
+                cast_array =
+                    cast(array, &normalized_type).expect("Failed to cast Binary to LargeBinary");
+                (&normalized_type, cast_array.as_ref())
+            }
+            DataType::List(field) => {
+                normalized_type = DataType::LargeList(Arc::clone(field));
+                cast_array =
+                    cast(array, &normalized_type).expect("Failed to cast List to LargeList");
+                (&normalized_type, cast_array.as_ref())
+            }
+            _ => (data_type, array),
+        };
+
+        match effective_type {
             DataType::Null => todo!(),
             DataType::Boolean => {
                 // Bool Array is stored a bit differently, so we can't use the standard fixed buffer approach
-                let bool_array = array
+                let bool_array = effective_array
                     .as_any()
                     .downcast_ref::<BooleanArray>()
                     .expect("Failed to downcast to BooleanArray");
-                match digest {
-                    DigestBufferType::NonNullable(data_digest) => {
-                        // We want to bit pack the boolean values into bytes for hashing
-                        let mut bit_vec = BitVec::<u8, Msb0>::with_capacity(bool_array.len());
-                        for i in 0..bool_array.len() {
+                if let Some(ref mut null_bits) = digest.null_bits {
+                    // Handle null bits first
+                    Self::handle_null_bits(bool_array, null_bits);
+
+                    // Handle the data — only valid bits
+                    let mut bit_vec = BitVec::<u8, Lsb0>::with_capacity(bool_array.len());
+                    for i in 0..bool_array.len() {
+                        if bool_array.is_valid(i) {
                             bit_vec.push(bool_array.value(i));
                         }
-
-                        data_digest.update(bit_vec.as_raw_slice());
                     }
-                    DigestBufferType::Nullable(null_bit_vec, data_digest) => {
-                        // Handle null bits first
-                        Self::handle_null_bits(bool_array, null_bit_vec);
-
-                        // Handle the data
-                        let mut bit_vec = BitVec::<u8, Msb0>::with_capacity(bool_array.len());
-                        for i in 0..bool_array.len() {
-                            // We only want the valid bits, for null we will discard from the hash since that is already capture by null_bits
-                            if bool_array.is_valid(i) {
-                                bit_vec.push(bool_array.value(i));
-                            }
-                        }
-                        data_digest.update(bit_vec.as_raw_slice());
+                    digest.data_mut().update(bit_vec.as_raw_slice());
+                } else {
+                    // Non-nullable: pack all boolean values
+                    let mut bit_vec = BitVec::<u8, Lsb0>::with_capacity(bool_array.len());
+                    for i in 0..bool_array.len() {
+                        bit_vec.push(bool_array.value(i));
                     }
+                    digest.data_mut().update(bit_vec.as_raw_slice());
                 }
             }
-            DataType::Int8 | DataType::UInt8 => Self::hash_fixed_size_array(array, digest, 1),
+            DataType::Int8 | DataType::UInt8 => {
+                Self::hash_fixed_size_array(effective_array, digest, 1);
+            }
             DataType::Int16 | DataType::UInt16 | DataType::Float16 => {
-                Self::hash_fixed_size_array(array, digest, 2);
+                Self::hash_fixed_size_array(effective_array, digest, 2);
             }
             DataType::Int32
             | DataType::UInt32
             | DataType::Float32
             | DataType::Date32
             | DataType::Decimal32(_, _) => {
-                Self::hash_fixed_size_array(array, digest, 4);
+                Self::hash_fixed_size_array(effective_array, digest, 4);
             }
             DataType::Int64
             | DataType::UInt64
             | DataType::Float64
             | DataType::Date64
             | DataType::Decimal64(_, _) => {
-                Self::hash_fixed_size_array(array, digest, 8);
+                Self::hash_fixed_size_array(effective_array, digest, 8);
             }
             DataType::Timestamp(_, _) => todo!(),
-            DataType::Time32(_) => Self::hash_fixed_size_array(array, digest, 4),
-            DataType::Time64(_) => Self::hash_fixed_size_array(array, digest, 8),
+            DataType::Time32(_) => Self::hash_fixed_size_array(effective_array, digest, 4),
+            DataType::Time64(_) => Self::hash_fixed_size_array(effective_array, digest, 8),
             DataType::Duration(_) => todo!(),
             DataType::Interval(_) => todo!(),
-            DataType::Binary => Self::hash_binary_array(
-                array
-                    .as_any()
-                    .downcast_ref::<BinaryArray>()
-                    .expect("Failed to downcast to BinaryArray"),
-                digest,
-            ),
+            // Small variants are normalized above — these arms are unreachable
+            DataType::Binary | DataType::Utf8 | DataType::List(_) => {
+
unreachable!("Normalized to Large variant at the top of array_digest_update") + } DataType::FixedSizeBinary(element_size) => { - Self::hash_fixed_size_array(array, digest, *element_size); + Self::hash_fixed_size_array(effective_array, digest, *element_size); } DataType::LargeBinary => Self::hash_binary_array( - array + effective_array .as_any() .downcast_ref::() .expect("Failed to downcast to LargeBinaryArray"), digest, ), DataType::BinaryView => todo!(), - DataType::Utf8 => Self::hash_string_array( - array - .as_any() - .downcast_ref::() - .expect("Failed to downcast to StringArray"), - digest, - ), DataType::LargeUtf8 => Self::hash_string_array( - array + effective_array .as_any() .downcast_ref::() .expect("Failed to downcast to LargeStringArray"), digest, ), DataType::Utf8View => todo!(), - DataType::List(field) => { - Self::hash_list_array( - array - .as_any() - .downcast_ref::() - .expect("Failed to downcast to ListArray"), - field.data_type(), - digest, - ); - } DataType::ListView(_) => todo!(), DataType::FixedSizeList(_, _) => todo!(), - DataType::LargeList(field) => { - Self::hash_list_array( - array - .as_any() - .downcast_ref::() - .expect("Failed to downcast to LargeListArray"), - field.data_type(), - digest, - ); + // List and Struct types are handled by the recursive decomposition path + // (traverse_and_update → traverse_list / traverse_struct). They should + // never reach array_digest_update directly. 
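The `Lsb0` bit-packing that the `Boolean` arm relies on (and that the tests below assert, e.g. `[true, false, true]` hashing as `0x05`) can be reproduced without the `bitvec` crate. An illustrative packer:

```rust
// Pack booleans least-significant-bit first, as BitVec<u8, Lsb0> does:
// element i lands in bit (i % 8) of byte (i / 8).
fn pack_lsb0(bits: &[bool]) -> Vec<u8> {
    let mut out = vec![0u8; bits.len().div_ceil(8)];
    for (i, &b) in bits.iter().enumerate() {
        if b {
            out[i / 8] |= 1u8 << (i % 8);
        }
    }
    out
}

fn main() {
    // Valid values of the nullable test column [true, None, false, true]
    // with the null skipped: [true, false, true] -> 0b0000_0101.
    assert_eq!(pack_lsb0(&[true, false, true]), vec![0x05]);
    // Non-nullable test column [false, true, false] -> 0b0000_0010.
    assert_eq!(pack_lsb0(&[false, true, false]), vec![0x02]);
}
```

Note the bit order is the opposite of the earlier `Msb0` behavior (where the same inputs packed to `0xA0` and `0x40`), which is why the expected digests in the tests changed.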
+ DataType::LargeList(_) | DataType::Struct(_) => { + unreachable!( + "List and Struct types are decomposed by traverse_and_update; \ + they should not reach array_digest_update" + ) } DataType::LargeListView(_) => todo!(), - DataType::Struct(_) => todo!(), DataType::Union(_, _) => todo!(), - DataType::Dictionary(_, _) => todo!(), + DataType::Dictionary(_, value_type) => { + let resolved = cast(effective_array, value_type.as_ref()) + .expect("Failed to cast dictionary to plain array"); + Self::array_digest_update(value_type.as_ref(), resolved.as_ref(), digest); + } DataType::Decimal128(_, _) => { - Self::hash_fixed_size_array(array, digest, 16); + Self::hash_fixed_size_array(effective_array, digest, 16); } DataType::Decimal256(_, _) => { - Self::hash_fixed_size_array(array, digest, 32); + Self::hash_fixed_size_array(effective_array, digest, 32); } DataType::Map(_, _) => todo!(), DataType::RunEndEncoded(_, _) => todo!(), @@ -455,55 +802,59 @@ impl ArrowDigesterCore { let array_data = array.to_data(); let element_size_usize = element_size as usize; - // Get the slice with offset accounted for if there is any + // Get the slice with offset and length accounted for + let start = array_data + .offset() + .checked_mul(element_size_usize) + .expect("Offset multiplication overflow"); + let end = start + .checked_add( + array_data + .len() + .checked_mul(element_size_usize) + .expect("Length multiplication overflow"), + ) + .expect("End position overflow"); let slice = array_data .buffers() .first() .expect("Unable to get first buffer to determine offset") .as_slice() - .get( - array_data - .offset() - .checked_mul(element_size_usize) - .expect("Offset multiplication overflow").., - ) + .get(start..end) .expect("Failed to get buffer slice for FixedSizeBinaryArray"); - match digest_buffer { - DigestBufferType::NonNullable(data_digest) => { - // No nulls, we can hash the entire buffer directly - data_digest.update(slice); - } - DigestBufferType::Nullable(null_bits, 
data_digest) => { - // Handle null bits first - Self::handle_null_bits(array, null_bits); - - match array_data.nulls() { - Some(null_buffer) => { - // There are nulls, so we need to incrementally hash each value - for i in 0..array_data.len() { - if null_buffer.is_valid(i) { - let data_pos = i - .checked_mul(element_size_usize) - .expect("Data position multiplication overflow"); - let end_pos = data_pos - .checked_add(element_size_usize) - .expect("End position addition overflow"); - - data_digest.update( - slice - .get(data_pos..end_pos) - .expect("Failed to get data_slice"), - ); - } + if let Some(ref mut null_bits) = digest_buffer.null_bits { + // Handle null bits first + Self::handle_null_bits(array, null_bits); + + match array_data.nulls() { + Some(null_buffer) => { + // There are nulls, so we need to incrementally hash each value + for i in 0..array_data.len() { + if null_buffer.is_valid(i) { + let data_pos = i + .checked_mul(element_size_usize) + .expect("Data position multiplication overflow"); + let end_pos = data_pos + .checked_add(element_size_usize) + .expect("End position addition overflow"); + + digest_buffer.data_mut().update( + slice + .get(data_pos..end_pos) + .expect("Failed to get data_slice"), + ); } } - None => { - // No nulls, we can hash the entire buffer directly - data_digest.update(slice); - } + } + None => { + // No nulls, we can hash the entire buffer directly + digest_buffer.data_mut().update(slice); } } + } else { + // No nulls, we can hash the entire buffer directly + digest_buffer.data_mut().update(slice); } } @@ -511,42 +862,16 @@ impl ArrowDigesterCore { array: &GenericBinaryArray, digest: &mut DigestBufferType, ) { - match digest { - DigestBufferType::NonNullable(data_digest) => { - for i in 0..array.len() { - let value = array.value(i); - data_digest.update(value.len().to_le_bytes()); - data_digest.update(value); - } - } - DigestBufferType::Nullable(null_bit_vec, data_digest) => { - // Deal with the null bits first - if let 
Some(null_buf) = array.nulls() { - // We would need to iterate through the null buffer and push it into the null_bit_vec - for i in 0..array.len() { - null_bit_vec.push(null_buf.is_valid(i)); - } - - for i in 0..array.len() { - if null_buf.is_valid(i) { - let value = array.value(i); - data_digest.update(value.len().to_le_bytes()); - data_digest.update(value); - } else { - data_digest.update(NULL_BYTES); - } - } - } else { - // All valid, therefore we can extend the bit vector with all true values - null_bit_vec.extend(repeat_n(true, array.len())); + if let Some(ref mut null_bits) = digest.null_bits { + Self::handle_null_bits(array, null_bits); + } - // Deal with the data - for i in 0..array.len() { - let value = array.value(i); - data_digest.update(value.len().to_le_bytes()); - data_digest.update(value); - } - } + let null_buf = array.nulls(); + for i in 0..array.len() { + if null_buf.is_none_or(|nb| nb.is_valid(i)) { + let value = array.value(i); + digest.data_mut().update((value.len() as u64).to_le_bytes()); + digest.data_mut().update(value); } } } @@ -555,112 +880,124 @@ impl ArrowDigesterCore { array: &GenericStringArray, digest: &mut DigestBufferType, ) { - match digest { - DigestBufferType::NonNullable(data_digest) => { - for i in 0..array.len() { - let value = array.value(i); - data_digest.update((value.len() as u64).to_le_bytes()); - data_digest.update(value.as_bytes()); - } - } - DigestBufferType::Nullable(null_bit_vec, data_digest) => { - // Deal with the null bits first - Self::handle_null_bits(array, null_bit_vec); - - match array.nulls() { - Some(null_buf) => { - for i in 0..array.len() { - if null_buf.is_valid(i) { - let value = array.value(i); - data_digest.update((value.len() as u64).to_le_bytes()); - data_digest.update(value.as_bytes()); - } else { - data_digest.update(NULL_BYTES); - } - } - } - None => { - for i in 0..array.len() { - let value = array.value(i); - data_digest.update((value.len() as u64).to_le_bytes()); - 
data_digest.update(value.as_bytes()); - } - } - } - } + if let Some(ref mut null_bits) = digest.null_bits { + Self::handle_null_bits(array, null_bits); } - } - fn hash_list_array( - array: &GenericListArray, - field_data_type: &DataType, - digest: &mut DigestBufferType, - ) { - match digest { - // Wildcard `_` avoids binding so `digest` remains usable below - DigestBufferType::NonNullable(_) => { - for i in 0..array.len() { - let sub = array.value(i); - // Prefix sub-array element count to prevent cross-boundary collisions. - // Without this [[1,2],[3]] and [[1],[2,3]] produce identical byte streams. - // sub.len() returns usize, avoiding the non-primitive OffsetSizeTrait cast. - Self::update_data_digest(digest, (sub.len() as u64).to_le_bytes()); - Self::array_digest_update(field_data_type, sub.as_ref(), digest); - } - } - DigestBufferType::Nullable(bit_vec, _) => { - // Deal with null bits first; NLL ends bit_vec borrow after this call - Self::handle_null_bits(array, bit_vec); - - match array.nulls() { - Some(null_buf) => { - for i in 0..array.len() { - if null_buf.is_valid(i) { - let sub = array.value(i); - Self::update_data_digest(digest, (sub.len() as u64).to_le_bytes()); - Self::array_digest_update(field_data_type, sub.as_ref(), digest); - } - } - } - None => { - for i in 0..array.len() { - let sub = array.value(i); - Self::update_data_digest(digest, (sub.len() as u64).to_le_bytes()); - Self::array_digest_update(field_data_type, sub.as_ref(), digest); - } - } - } + let null_buf = array.nulls(); + for i in 0..array.len() { + if null_buf.is_none_or(|nb| nb.is_valid(i)) { + let value = array.value(i); + digest.data_mut().update((value.len() as u64).to_le_bytes()); + digest.data_mut().update(value.as_bytes()); } } } - /// Internal recursive function to extract field names from nested structs effectively flattening the schema. - /// The format is `parent__child__grandchild__etc`... for nested fields and will be stored in `fields_digest_buffer`. 
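The cross-boundary collision that the removed `hash_list_array` comment warned about ("[[1,2],[3]] and [[1],[2,3]] produce identical byte streams") is the reason every variable-length value above is written with a `u64` little-endian length prefix. A self-contained demonstration (illustrative helpers, not the crate's API):

```rust
// Serialize a list of byte strings two ways and show that only the
// length-prefixed form distinguishes different groupings of the same bytes.
fn concat_only(lists: &[&[u8]]) -> Vec<u8> {
    lists.concat()
}

fn length_prefixed(lists: &[&[u8]]) -> Vec<u8> {
    let mut out = Vec::new();
    for l in lists {
        // u64 little-endian length prefix, as in the digester.
        out.extend_from_slice(&(l.len() as u64).to_le_bytes());
        out.extend_from_slice(l);
    }
    out
}

fn main() {
    let a: [&[u8]; 2] = [&[1, 2], &[3]];
    let b: [&[u8]; 2] = [&[1], &[2, 3]];
    // Naive concatenation collides: both flatten to [1, 2, 3]...
    assert_eq!(concat_only(&a), concat_only(&b));
    // ...while length prefixes keep the byte streams distinct.
    assert_ne!(length_prefixed(&a), length_prefixed(&b));
}
```

The same reasoning applies to strings and binary values: hashing `value.len()` before `value` makes `["ab", "c"]` and `["a", "bc"]` produce different digests.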
+ /// Recursively extract field entries from the type tree. + /// + /// - **List**: creates a structural-only entry at `path/`, then recurses into + /// the value type. If the column field is nullable, also creates a + /// validity-only entry at the field path (before the `/`). + /// - **Struct**: transparent — recurses into each child field with `path/childname`. + /// No entry for the struct itself. Struct null propagation is handled at + /// traversal time. + /// - **Leaf (non-list, non-struct)**: creates a data entry at the current path. fn extract_fields_name( field: &Field, parent_field_name: &str, fields_digest_buffer: &mut BTreeMap>, ) { - // Check if field is a nested type of struct - if let DataType::Struct(fields) = field.data_type() { - // We will add fields in alphabetical order - fields.into_iter().for_each(|field_inner| { - Self::extract_fields_name( - field_inner, - Self::construct_field_name_hierarchy(parent_field_name, field.name()).as_str(), - fields_digest_buffer, - ); - }); - } else { - // Base case, just add the the combine field name to the map - fields_digest_buffer.insert( - Self::construct_field_name_hierarchy(parent_field_name, field.name()), - if field.is_nullable() { - DigestBufferType::Nullable(BitVec::new(), D::new()) - } else { - DigestBufferType::NonNullable(D::new()) - }, - ); + let path = Self::construct_field_name_hierarchy(parent_field_name, field.name()); + Self::extract_type_entries( + field.data_type(), + field.is_nullable(), + &path, + fields_digest_buffer, + ); + } + + /// Core recursive type walker — creates `BTreeMap` entries based on the type tree. + /// + /// `nullable` reflects whether the current position is nullable (from the `Field`). 
+    fn extract_type_entries(
+        data_type: &DataType,
+        nullable: bool,
+        path: &str,
+        fields_digest_buffer: &mut BTreeMap<String, DigestBufferType<D>>,
+    ) {
+        let canonical = normalize_data_type(data_type);
+
+        match &canonical {
+            DataType::Struct(fields) => {
+                // Struct is transparent — no entry, just recurse into children.
+                for child_field in fields {
+                    let child_path = Self::construct_field_name_hierarchy(path, child_field.name());
+                    Self::extract_type_entries(
+                        child_field.data_type(),
+                        child_field.is_nullable(),
+                        &child_path,
+                        fields_digest_buffer,
+                    );
+                }
+            }
+            DataType::LargeList(value_field) | DataType::List(value_field) => {
+                // For a nullable field that is a list, create a validity-only entry
+                // at the field path (column-level or field-level null tracking).
+                if nullable {
+                    fields_digest_buffer
+                        .insert(path.to_owned(), DigestBufferType::new_validity_only());
+                }
+
+                // List level: create entry at path + "/"
+                let list_path = format!("{path}{DELIMITER_FOR_NESTED_FIELD}");
+                let inner_type = value_field.data_type();
+                let inner_canonical = normalize_data_type(inner_type);
+
+                match &inner_canonical {
+                    DataType::Struct(_) => {
+                        // List<Struct<...>>: list entry is structural-only,
+                        // struct children become separate entries
+                        fields_digest_buffer.insert(
+                            list_path.clone(),
+                            DigestBufferType::new_structural_only(value_field.is_nullable()),
+                        );
+                        // Recurse into the struct's children
+                        Self::extract_type_entries(
+                            inner_type,
+                            value_field.is_nullable(),
+                            &list_path,
+                            fields_digest_buffer,
+                        );
+                    }
+                    DataType::LargeList(_) | DataType::List(_) => {
+                        // List<List<...>>: list entry is structural-only,
+                        // recurse into the inner list
+                        fields_digest_buffer.insert(
+                            list_path.clone(),
+                            DigestBufferType::new_structural_only(value_field.is_nullable()),
+                        );
+                        Self::extract_type_entries(
+                            inner_type,
+                            value_field.is_nullable(),
+                            &list_path,
+                            fields_digest_buffer,
+                        );
+                    }
+                    _ => {
+                        // List<primitive>: list entry is both structural + data (leaf)
+                        fields_digest_buffer.insert(
+                            list_path,
+
DigestBufferType::new_list_leaf(value_field.is_nullable()), + ); + } + } + } + _ => { + // Leaf type (non-struct, non-list): create data entry + fields_digest_buffer + .insert(path.to_owned(), DigestBufferType::new_data_only(nullable)); + } } } @@ -672,15 +1009,7 @@ impl ArrowDigesterCore { } } - /// Write bytes directly into the data digest portion of the buffer, bypassing null-bit tracking. - /// Used to write length prefixes that sit in the data stream but are not nullable values. - fn update_data_digest(digest: &mut DigestBufferType, data: impl AsRef<[u8]>) { - match digest { - DigestBufferType::NonNullable(d) | DigestBufferType::Nullable(_, d) => d.update(data), - } - } - - fn handle_null_bits(array: &dyn Array, null_bit_vec: &mut BitVec) { + fn handle_null_bits(array: &dyn Array, null_bit_vec: &mut BitVec) { match array.nulls() { Some(null_buf) => { // We would need to iterate through the null buffer and push it into the null_bit_vec @@ -714,10 +1043,10 @@ mod tests { array::{ ArrayRef, BinaryArray, BooleanArray, Date32Array, Date64Array, Decimal128Array, Decimal32Array, FixedSizeBinaryBuilder, Float16Array, Float32Array, Float64Array, - Int16Array, Int32Array, Int64Array, Int8Array, LargeBinaryArray, LargeListBuilder, - LargeStringArray, ListBuilder, PrimitiveBuilder, RecordBatch, StringArray, StructArray, - Time32SecondArray, Time64MicrosecondArray, UInt16Array, UInt32Array, UInt64Array, - UInt8Array, + Int16Array, Int32Array, Int64Array, Int8Array, LargeBinaryArray, LargeListArray, + LargeListBuilder, LargeStringArray, ListBuilder, PrimitiveBuilder, RecordBatch, + StringArray, StructArray, Time32SecondArray, Time64MicrosecondArray, UInt16Array, + UInt32Array, UInt64Array, UInt8Array, }, datatypes::Int32Type, }; @@ -727,8 +1056,9 @@ mod tests { use pretty_assertions::assert_eq; use sha2::{Digest as _, Sha256}; - use crate::arrow_digester_core::{ArrowDigesterCore, DigestBufferType}; + use crate::arrow_digester_core::ArrowDigesterCore; use 
arrow::array::{Decimal256Array, Decimal64Array}; + use arrow::buffer::OffsetBuffer; use arrow_buffer::i256; #[expect( @@ -870,7 +1200,7 @@ mod tests { ), ]); - let mut digester = ArrowDigesterCore::::new(schema.clone()); + let mut digester = ArrowDigesterCore::::new(&schema); let field_names: Vec<&String> = digester.fields_digest_buffer.keys().collect(); assert_eq!(field_names.len(), 3); @@ -920,7 +1250,7 @@ mod tests { // Check the digest assert_eq!( encode(digester.finalize()), - "9841aab2dfeb637872d41422d33fca1e939f06b8fa0dcec66ff3782592cf9565" + "9b52ad7430dea81b35f14a04d828b2424080fbc210570081c6e6cb62b6566c42" ); } @@ -928,10 +1258,10 @@ mod tests { #[test] fn digest_bool_nullable_bytes() { - // [true, None, false, true] — valid values bit-packed Msb0, null skipped + // [true, None, false, true] — valid values bit-packed Lsb0, null skipped let array = BooleanArray::from(vec![Some(true), None, Some(false), Some(true)]); let schema = Schema::new(vec![Field::new("col", DataType::Boolean, true)]); - let mut digester = ArrowDigesterCore::::new(schema); + let mut digester = ArrowDigesterCore::::new(&schema); digester.update( &RecordBatch::try_new( Arc::new(Schema::new(vec![Field::new( @@ -944,11 +1274,9 @@ mod tests { .unwrap(), ); - let DigestBufferType::Nullable(null_bit_vec, data_digest) = - &digester.fields_digest_buffer["col"] - else { - panic!("Expected Nullable buffer"); - }; + let buf = &digester.fields_digest_buffer["col"]; + let null_bit_vec = buf.null_bits.as_ref().expect("Expected nullable"); + let data_digest = buf.data.as_ref().expect("Expected data digest"); assert_eq!(null_bit_vec.len(), 4); assert!(null_bit_vec[0], "index 0 (true) should be valid"); @@ -956,10 +1284,10 @@ mod tests { assert!(null_bit_vec[2], "index 2 (false) should be valid"); assert!(null_bit_vec[3], "index 3 (true) should be valid"); - // Valid values [true, false, true] packed Msb0 into one byte: - // bit0=1, bit1=0, bit2=1 → 1010_0000 = 0xA0 + // Valid values [true, false, true] 
packed Lsb0 into one byte: + // bit0=1, bit1=0, bit2=1 → 0000_0101 = 0x05 let mut manual = Sha256::new(); - manual.update([0xA0_u8]); + manual.update([0x05_u8]); assert_eq!(data_digest.clone().finalize(), manual.finalize()); } @@ -968,7 +1296,7 @@ mod tests { // [false, true, false] — all values bit-packed, no nulls let array = BooleanArray::from(vec![false, true, false]); let schema = Schema::new(vec![Field::new("col", DataType::Boolean, false)]); - let mut digester = ArrowDigesterCore::::new(schema); + let mut digester = ArrowDigesterCore::::new(&schema); digester.update( &RecordBatch::try_new( Arc::new(Schema::new(vec![Field::new( @@ -981,14 +1309,13 @@ mod tests { .unwrap(), ); - let DigestBufferType::NonNullable(data_digest) = &digester.fields_digest_buffer["col"] - else { - panic!("Expected NonNullable buffer"); - }; + let buf = &digester.fields_digest_buffer["col"]; + assert!(buf.null_bits.is_none(), "Expected non-nullable"); + let data_digest = buf.data.as_ref().expect("Expected data digest"); - // [false, true, false] packed Msb0: bit0=0, bit1=1, bit2=0 → 0100_0000 = 0x40 + // [false, true, false] packed Lsb0: bit0=0, bit1=1, bit2=0 → 0000_0010 = 0x02 let mut manual = Sha256::new(); - manual.update([0x40_u8]); + manual.update([0x02_u8]); assert_eq!(data_digest.clone().finalize(), manual.finalize()); } @@ -999,7 +1326,7 @@ mod tests { // [10, None, -3] — valid bytes: 0x0A, 0xFD let array = Int8Array::from(vec![Some(10_i8), None, Some(-3_i8)]); let schema = Schema::new(vec![Field::new("col", DataType::Int8, true)]); - let mut digester = ArrowDigesterCore::::new(schema); + let mut digester = ArrowDigesterCore::::new(&schema); digester.update( &RecordBatch::try_new( Arc::new(Schema::new(vec![Field::new("col", DataType::Int8, true)])), @@ -1008,11 +1335,9 @@ mod tests { .unwrap(), ); - let DigestBufferType::Nullable(null_bit_vec, data_digest) = - &digester.fields_digest_buffer["col"] - else { - panic!("Expected Nullable buffer"); - }; + let buf = 
&digester.fields_digest_buffer["col"]; + let null_bit_vec = buf.null_bits.as_ref().expect("Expected nullable"); + let data_digest = buf.data.as_ref().expect("Expected data digest"); assert_eq!(null_bit_vec.len(), 3); assert!(null_bit_vec[0]); @@ -1030,7 +1355,7 @@ mod tests { // [1, 2, 255] let array = UInt8Array::from(vec![1_u8, 2_u8, 255_u8]); let schema = Schema::new(vec![Field::new("col", DataType::UInt8, false)]); - let mut digester = ArrowDigesterCore::::new(schema); + let mut digester = ArrowDigesterCore::::new(&schema); digester.update( &RecordBatch::try_new( Arc::new(Schema::new(vec![Field::new("col", DataType::UInt8, false)])), @@ -1039,10 +1364,9 @@ mod tests { .unwrap(), ); - let DigestBufferType::NonNullable(data_digest) = &digester.fields_digest_buffer["col"] - else { - panic!("Expected NonNullable buffer"); - }; + let buf = &digester.fields_digest_buffer["col"]; + assert!(buf.null_bits.is_none(), "Expected non-nullable"); + let data_digest = buf.data.as_ref().expect("Expected data digest"); let mut manual = Sha256::new(); manual.update([0x01_u8, 0x02_u8, 0xFF_u8]); @@ -1058,7 +1382,7 @@ mod tests { // -512 LE = 00 fe let array = Int16Array::from(vec![Some(1000_i16), None, Some(-512_i16)]); let schema = Schema::new(vec![Field::new("col", DataType::Int16, true)]); - let mut digester = ArrowDigesterCore::::new(schema); + let mut digester = ArrowDigesterCore::::new(&schema); digester.update( &RecordBatch::try_new( Arc::new(Schema::new(vec![Field::new("col", DataType::Int16, true)])), @@ -1067,11 +1391,9 @@ mod tests { .unwrap(), ); - let DigestBufferType::Nullable(null_bit_vec, data_digest) = - &digester.fields_digest_buffer["col"] - else { - panic!("Expected Nullable buffer"); - }; + let buf = &digester.fields_digest_buffer["col"]; + let null_bit_vec = buf.null_bits.as_ref().expect("Expected nullable"); + let data_digest = buf.data.as_ref().expect("Expected data digest"); assert_eq!(null_bit_vec.len(), 3); assert!(null_bit_vec[0]); @@ -1089,7 +1411,7 @@ 
mod tests { // [100, 200, 65535] let array = UInt16Array::from(vec![100_u16, 200_u16, 0xFFFF_u16]); let schema = Schema::new(vec![Field::new("col", DataType::UInt16, false)]); - let mut digester = ArrowDigesterCore::::new(schema); + let mut digester = ArrowDigesterCore::::new(&schema); digester.update( &RecordBatch::try_new( Arc::new(Schema::new(vec![Field::new( @@ -1102,10 +1424,9 @@ mod tests { .unwrap(), ); - let DigestBufferType::NonNullable(data_digest) = &digester.fields_digest_buffer["col"] - else { - panic!("Expected NonNullable buffer"); - }; + let buf = &digester.fields_digest_buffer["col"]; + assert!(buf.null_bits.is_none(), "Expected non-nullable"); + let data_digest = buf.data.as_ref().expect("Expected data digest"); let mut manual = Sha256::new(); manual.update(100_u16.to_le_bytes()); @@ -1125,7 +1446,7 @@ mod tests { half::f16::from_f32(-0.5), ]); let schema = Schema::new(vec![Field::new("col", DataType::Float16, false)]); - let mut digester = ArrowDigesterCore::::new(schema); + let mut digester = ArrowDigesterCore::::new(&schema); digester.update( &RecordBatch::try_new( Arc::new(Schema::new(vec![Field::new( @@ -1138,10 +1459,9 @@ mod tests { .unwrap(), ); - let DigestBufferType::NonNullable(data_digest) = &digester.fields_digest_buffer["col"] - else { - panic!("Expected NonNullable buffer"); - }; + let buf = &digester.fields_digest_buffer["col"]; + assert!(buf.null_bits.is_none(), "Expected non-nullable"); + let data_digest = buf.data.as_ref().expect("Expected data digest"); let mut manual = Sha256::new(); manual.update(half::f16::from_f32(1.0).to_le_bytes()); @@ -1162,7 +1482,7 @@ mod tests { let schema = Schema::new(vec![Field::new("int32_col", DataType::Int32, true)]); - let mut digester = ArrowDigesterCore::::new(schema); + let mut digester = ArrowDigesterCore::::new(&schema); digester.update( &RecordBatch::try_new( @@ -1176,13 +1496,12 @@ mod tests { .unwrap(), ); - let DigestBufferType::Nullable(null_bit_vec, data_digest) = digester + let buf 
= digester .fields_digest_buffer .get("int32_col") - .expect("int32_col field should exist in digest buffer") - else { - panic!("Expected a Nullable digest buffer for int32_col"); - }; + .expect("int32_col field should exist in digest buffer"); + let null_bit_vec = buf.null_bits.as_ref().expect("Expected nullable"); + let data_digest = buf.data.as_ref().expect("Expected data digest"); // The null bit vector should be [true, false, true, true] for [Some(42), None, Some(-7), Some(0)] assert_eq!(null_bit_vec.len(), 4); @@ -1208,7 +1527,7 @@ mod tests { // [0, None, u32::MAX] let array = UInt32Array::from(vec![Some(0_u32), None, Some(u32::MAX)]); let schema = Schema::new(vec![Field::new("col", DataType::UInt32, true)]); - let mut digester = ArrowDigesterCore::::new(schema); + let mut digester = ArrowDigesterCore::::new(&schema); digester.update( &RecordBatch::try_new( Arc::new(Schema::new(vec![Field::new("col", DataType::UInt32, true)])), @@ -1217,11 +1536,9 @@ mod tests { .unwrap(), ); - let DigestBufferType::Nullable(null_bit_vec, data_digest) = - &digester.fields_digest_buffer["col"] - else { - panic!("Expected Nullable buffer"); - }; + let buf = &digester.fields_digest_buffer["col"]; + let null_bit_vec = buf.null_bits.as_ref().expect("Expected nullable"); + let data_digest = buf.data.as_ref().expect("Expected data digest"); assert_eq!(null_bit_vec.len(), 3); assert!(null_bit_vec[0]); @@ -1243,7 +1560,7 @@ mod tests { // 2.5f32 LE: 00 00 20 40 let array = Float32Array::from(vec![Some(1.0_f32), None, Some(2.5_f32)]); let schema = Schema::new(vec![Field::new("col", DataType::Float32, true)]); - let mut digester = ArrowDigesterCore::::new(schema); + let mut digester = ArrowDigesterCore::::new(&schema); digester.update( &RecordBatch::try_new( Arc::new(Schema::new(vec![Field::new( @@ -1256,11 +1573,9 @@ mod tests { .unwrap(), ); - let DigestBufferType::Nullable(null_bit_vec, data_digest) = - &digester.fields_digest_buffer["col"] - else { - panic!("Expected Nullable 
buffer"); - }; + let buf = &digester.fields_digest_buffer["col"]; + let null_bit_vec = buf.null_bits.as_ref().expect("Expected nullable"); + let data_digest = buf.data.as_ref().expect("Expected data digest"); assert_eq!(null_bit_vec.len(), 3); assert!(null_bit_vec[0]); @@ -1283,7 +1598,7 @@ mod tests { .with_precision_and_scale(9, 2) .unwrap(); let schema = Schema::new(vec![Field::new("col", DataType::Decimal32(9, 2), true)]); - let mut digester = ArrowDigesterCore::::new(schema); + let mut digester = ArrowDigesterCore::::new(&schema); digester.update( &RecordBatch::try_new( Arc::new(Schema::new(vec![Field::new( @@ -1296,11 +1611,9 @@ mod tests { .unwrap(), ); - let DigestBufferType::Nullable(null_bit_vec, data_digest) = - &digester.fields_digest_buffer["col"] - else { - panic!("Expected Nullable buffer"); - }; + let buf = &digester.fields_digest_buffer["col"]; + let null_bit_vec = buf.null_bits.as_ref().expect("Expected nullable"); + let data_digest = buf.data.as_ref().expect("Expected data digest"); assert_eq!(null_bit_vec.len(), 3); assert!(null_bit_vec[0]); @@ -1320,7 +1633,7 @@ mod tests { .with_precision_and_scale(9, 2) .unwrap(); let schema = Schema::new(vec![Field::new("col", DataType::Decimal32(9, 2), false)]); - let mut digester = ArrowDigesterCore::::new(schema); + let mut digester = ArrowDigesterCore::::new(&schema); digester.update( &RecordBatch::try_new( Arc::new(Schema::new(vec![Field::new( @@ -1333,10 +1646,9 @@ mod tests { .unwrap(), ); - let DigestBufferType::NonNullable(data_digest) = &digester.fields_digest_buffer["col"] - else { - panic!("Expected NonNullable buffer"); - }; + let buf = &digester.fields_digest_buffer["col"]; + assert!(buf.null_bits.is_none(), "Expected non-nullable"); + let data_digest = buf.data.as_ref().expect("Expected data digest"); let mut manual = Sha256::new(); manual.update(0_i32.to_le_bytes()); @@ -1352,7 +1664,7 @@ mod tests { // [i64::MIN, None, 9_876_543_210] let array = Int64Array::from(vec![Some(i64::MIN), None, 
Some(9_876_543_210_i64)]); let schema = Schema::new(vec![Field::new("col", DataType::Int64, true)]); - let mut digester = ArrowDigesterCore::::new(schema); + let mut digester = ArrowDigesterCore::::new(&schema); digester.update( &RecordBatch::try_new( Arc::new(Schema::new(vec![Field::new("col", DataType::Int64, true)])), @@ -1361,11 +1673,9 @@ mod tests { .unwrap(), ); - let DigestBufferType::Nullable(null_bit_vec, data_digest) = - &digester.fields_digest_buffer["col"] - else { - panic!("Expected Nullable buffer"); - }; + let buf = &digester.fields_digest_buffer["col"]; + let null_bit_vec = buf.null_bits.as_ref().expect("Expected nullable"); + let data_digest = buf.data.as_ref().expect("Expected data digest"); assert_eq!(null_bit_vec.len(), 3); assert!(null_bit_vec[0]); @@ -1383,7 +1693,7 @@ mod tests { // [0, None, u64::MAX] let array = UInt64Array::from(vec![Some(0_u64), None, Some(u64::MAX)]); let schema = Schema::new(vec![Field::new("col", DataType::UInt64, true)]); - let mut digester = ArrowDigesterCore::::new(schema); + let mut digester = ArrowDigesterCore::::new(&schema); digester.update( &RecordBatch::try_new( Arc::new(Schema::new(vec![Field::new("col", DataType::UInt64, true)])), @@ -1392,11 +1702,9 @@ mod tests { .unwrap(), ); - let DigestBufferType::Nullable(null_bit_vec, data_digest) = - &digester.fields_digest_buffer["col"] - else { - panic!("Expected Nullable buffer"); - }; + let buf = &digester.fields_digest_buffer["col"]; + let null_bit_vec = buf.null_bits.as_ref().expect("Expected nullable"); + let data_digest = buf.data.as_ref().expect("Expected data digest"); assert_eq!(null_bit_vec.len(), 3); assert!(null_bit_vec[0]); @@ -1416,7 +1724,7 @@ mod tests { // [1.0, -0.5, π] let array = Float64Array::from(vec![1.0_f64, -0.5_f64, f64::consts::PI]); let schema = Schema::new(vec![Field::new("col", DataType::Float64, false)]); - let mut digester = ArrowDigesterCore::::new(schema); + let mut digester = ArrowDigesterCore::::new(&schema); digester.update( 
&RecordBatch::try_new( Arc::new(Schema::new(vec![Field::new( @@ -1429,10 +1737,9 @@ mod tests { .unwrap(), ); - let DigestBufferType::NonNullable(data_digest) = &digester.fields_digest_buffer["col"] - else { - panic!("Expected NonNullable buffer"); - }; + let buf = &digester.fields_digest_buffer["col"]; + assert!(buf.null_bits.is_none(), "Expected non-nullable"); + let data_digest = buf.data.as_ref().expect("Expected data digest"); let mut manual = Sha256::new(); manual.update(1.0_f64.to_le_bytes()); @@ -1451,7 +1758,7 @@ mod tests { .with_precision_and_scale(18, 3) .unwrap(); let schema = Schema::new(vec![Field::new("col", DataType::Decimal64(18, 3), true)]); - let mut digester = ArrowDigesterCore::::new(schema); + let mut digester = ArrowDigesterCore::::new(&schema); digester.update( &RecordBatch::try_new( Arc::new(Schema::new(vec![Field::new( @@ -1464,11 +1771,9 @@ mod tests { .unwrap(), ); - let DigestBufferType::Nullable(null_bit_vec, data_digest) = - &digester.fields_digest_buffer["col"] - else { - panic!("Expected Nullable buffer"); - }; + let buf = &digester.fields_digest_buffer["col"]; + let null_bit_vec = buf.null_bits.as_ref().expect("Expected nullable"); + let data_digest = buf.data.as_ref().expect("Expected data digest"); assert_eq!(null_bit_vec.len(), 3); assert!(null_bit_vec[0]); @@ -1488,7 +1793,7 @@ mod tests { .with_precision_and_scale(18, 3) .unwrap(); let schema = Schema::new(vec![Field::new("col", DataType::Decimal64(18, 3), false)]); - let mut digester = ArrowDigesterCore::::new(schema); + let mut digester = ArrowDigesterCore::::new(&schema); digester.update( &RecordBatch::try_new( Arc::new(Schema::new(vec![Field::new( @@ -1501,10 +1806,9 @@ mod tests { .unwrap(), ); - let DigestBufferType::NonNullable(data_digest) = &digester.fields_digest_buffer["col"] - else { - panic!("Expected NonNullable buffer"); - }; + let buf = &digester.fields_digest_buffer["col"]; + assert!(buf.null_bits.is_none(), "Expected non-nullable"); + let data_digest = 
buf.data.as_ref().expect("Expected data digest"); let mut manual = Sha256::new(); manual.update(0_i64.to_le_bytes()); @@ -1520,7 +1824,7 @@ mod tests { // Days since Unix epoch: [0, None, 19000] let array = Date32Array::from(vec![Some(0_i32), None, Some(19000_i32)]); let schema = Schema::new(vec![Field::new("col", DataType::Date32, true)]); - let mut digester = ArrowDigesterCore::::new(schema); + let mut digester = ArrowDigesterCore::::new(&schema); digester.update( &RecordBatch::try_new( Arc::new(Schema::new(vec![Field::new("col", DataType::Date32, true)])), @@ -1529,11 +1833,9 @@ mod tests { .unwrap(), ); - let DigestBufferType::Nullable(null_bit_vec, data_digest) = - &digester.fields_digest_buffer["col"] - else { - panic!("Expected Nullable buffer"); - }; + let buf = &digester.fields_digest_buffer["col"]; + let null_bit_vec = buf.null_bits.as_ref().expect("Expected nullable"); + let data_digest = buf.data.as_ref().expect("Expected data digest"); assert_eq!(null_bit_vec.len(), 3); assert!(null_bit_vec[0]); @@ -1551,7 +1853,7 @@ mod tests { // Milliseconds since Unix epoch: [0, None, 1_000_000] let array = Date64Array::from(vec![Some(0_i64), None, Some(1_000_000_i64)]); let schema = Schema::new(vec![Field::new("col", DataType::Date64, true)]); - let mut digester = ArrowDigesterCore::::new(schema); + let mut digester = ArrowDigesterCore::::new(&schema); digester.update( &RecordBatch::try_new( Arc::new(Schema::new(vec![Field::new("col", DataType::Date64, true)])), @@ -1560,11 +1862,9 @@ mod tests { .unwrap(), ); - let DigestBufferType::Nullable(null_bit_vec, data_digest) = - &digester.fields_digest_buffer["col"] - else { - panic!("Expected Nullable buffer"); - }; + let buf = &digester.fields_digest_buffer["col"]; + let null_bit_vec = buf.null_bits.as_ref().expect("Expected nullable"); + let data_digest = buf.data.as_ref().expect("Expected data digest"); assert_eq!(null_bit_vec.len(), 3); assert!(null_bit_vec[0]); @@ -1588,7 +1888,7 @@ mod tests { 
DataType::Time32(TimeUnit::Second), true, )]); - let mut digester = ArrowDigesterCore::::new(schema); + let mut digester = ArrowDigesterCore::::new(&schema); digester.update( &RecordBatch::try_new( Arc::new(Schema::new(vec![Field::new( @@ -1601,11 +1901,9 @@ mod tests { .unwrap(), ); - let DigestBufferType::Nullable(null_bit_vec, data_digest) = - &digester.fields_digest_buffer["col"] - else { - panic!("Expected Nullable buffer"); - }; + let buf = &digester.fields_digest_buffer["col"]; + let null_bit_vec = buf.null_bits.as_ref().expect("Expected nullable"); + let data_digest = buf.data.as_ref().expect("Expected data digest"); assert_eq!(null_bit_vec.len(), 3); assert!(null_bit_vec[0]); @@ -1627,7 +1925,7 @@ mod tests { DataType::Time64(TimeUnit::Microsecond), true, )]); - let mut digester = ArrowDigesterCore::::new(schema); + let mut digester = ArrowDigesterCore::::new(&schema); digester.update( &RecordBatch::try_new( Arc::new(Schema::new(vec![Field::new( @@ -1640,11 +1938,9 @@ mod tests { .unwrap(), ); - let DigestBufferType::Nullable(null_bit_vec, data_digest) = - &digester.fields_digest_buffer["col"] - else { - panic!("Expected Nullable buffer"); - }; + let buf = &digester.fields_digest_buffer["col"]; + let null_bit_vec = buf.null_bits.as_ref().expect("Expected nullable"); + let data_digest = buf.data.as_ref().expect("Expected data digest"); assert_eq!(null_bit_vec.len(), 3); assert!(null_bit_vec[0]); @@ -1667,7 +1963,7 @@ mod tests { .with_precision_and_scale(38, 5) .unwrap(); let schema = Schema::new(vec![Field::new("col", DataType::Decimal128(38, 5), true)]); - let mut digester = ArrowDigesterCore::::new(schema); + let mut digester = ArrowDigesterCore::::new(&schema); digester.update( &RecordBatch::try_new( Arc::new(Schema::new(vec![Field::new( @@ -1680,11 +1976,9 @@ mod tests { .unwrap(), ); - let DigestBufferType::Nullable(null_bit_vec, data_digest) = - &digester.fields_digest_buffer["col"] - else { - panic!("Expected Nullable buffer"); - }; + let buf = 
&digester.fields_digest_buffer["col"]; + let null_bit_vec = buf.null_bits.as_ref().expect("Expected nullable"); + let data_digest = buf.data.as_ref().expect("Expected data digest"); assert_eq!(null_bit_vec.len(), 3); assert!(null_bit_vec[0]); @@ -1711,7 +2005,7 @@ mod tests { .with_precision_and_scale(76, 10) .unwrap(); let schema = Schema::new(vec![Field::new("col", DataType::Decimal256(76, 10), true)]); - let mut digester = ArrowDigesterCore::::new(schema); + let mut digester = ArrowDigesterCore::::new(&schema); digester.update( &RecordBatch::try_new( Arc::new(Schema::new(vec![Field::new( @@ -1724,11 +2018,9 @@ mod tests { .unwrap(), ); - let DigestBufferType::Nullable(null_bit_vec, data_digest) = - &digester.fields_digest_buffer["col"] - else { - panic!("Expected Nullable buffer"); - }; + let buf = &digester.fields_digest_buffer["col"]; + let null_bit_vec = buf.null_bits.as_ref().expect("Expected nullable"); + let data_digest = buf.data.as_ref().expect("Expected data digest"); assert_eq!(null_bit_vec.len(), 3); assert!(null_bit_vec[0]); @@ -1753,7 +2045,7 @@ mod tests { let array = builder.finish(); let schema = Schema::new(vec![Field::new("col", DataType::FixedSizeBinary(4), true)]); - let mut digester = ArrowDigesterCore::::new(schema); + let mut digester = ArrowDigesterCore::::new(&schema); digester.update( &RecordBatch::try_new( Arc::new(Schema::new(vec![Field::new( @@ -1766,11 +2058,9 @@ mod tests { .unwrap(), ); - let DigestBufferType::Nullable(null_bit_vec, data_digest) = - &digester.fields_digest_buffer["col"] - else { - panic!("Expected Nullable buffer"); - }; + let buf = &digester.fields_digest_buffer["col"]; + let null_bit_vec = buf.null_bits.as_ref().expect("Expected nullable"); + let data_digest = buf.data.as_ref().expect("Expected data digest"); assert_eq!(null_bit_vec.len(), 3); assert!(null_bit_vec[0]); @@ -1789,11 +2079,11 @@ mod tests { #[test] fn digest_binary_nullable_bytes() { // [b"hello", None, b"world"] - // Valid entries: (length as 
usize LE) ++ bytes. - // Null entries contribute the sentinel b"NULL" to the data digest. + // Valid entries: (length as u64 LE) ++ bytes. + // Null entries are skipped entirely in the data digest. let array = BinaryArray::from(vec![Some(b"hello".as_ref()), None, Some(b"world".as_ref())]); let schema = Schema::new(vec![Field::new("col", DataType::Binary, true)]); - let mut digester = ArrowDigesterCore::::new(schema); + let mut digester = ArrowDigesterCore::::new(&schema); digester.update( &RecordBatch::try_new( Arc::new(Schema::new(vec![Field::new("col", DataType::Binary, true)])), @@ -1802,11 +2092,9 @@ mod tests { .unwrap(), ); - let DigestBufferType::Nullable(null_bit_vec, data_digest) = - &digester.fields_digest_buffer["col"] - else { - panic!("Expected Nullable buffer"); - }; + let buf = &digester.fields_digest_buffer["col"]; + let null_bit_vec = buf.null_bits.as_ref().expect("Expected nullable"); + let data_digest = buf.data.as_ref().expect("Expected data digest"); assert_eq!(null_bit_vec.len(), 3); assert!(null_bit_vec[0]); @@ -1814,10 +2102,10 @@ mod tests { assert!(null_bit_vec[2]); let mut manual = Sha256::new(); - manual.update(5_usize.to_le_bytes()); // len("hello") + manual.update(5_u64.to_le_bytes()); // len("hello") manual.update(b"hello"); - manual.update(b"NULL"); // null sentinel - manual.update(5_usize.to_le_bytes()); // len("world") + // null entry skipped — no sentinel bytes + manual.update(5_u64.to_le_bytes()); // len("world") manual.update(b"world"); assert_eq!(data_digest.clone().finalize(), manual.finalize()); } @@ -1827,7 +2115,7 @@ mod tests { // [b"ab", b"cde"] — all valid, length prefix is usize LE let array = LargeBinaryArray::from(vec![b"ab".as_ref(), b"cde".as_ref()]); let schema = Schema::new(vec![Field::new("col", DataType::LargeBinary, false)]); - let mut digester = ArrowDigesterCore::::new(schema); + let mut digester = ArrowDigesterCore::::new(&schema); digester.update( &RecordBatch::try_new( Arc::new(Schema::new(vec![Field::new( 
@@ -1840,15 +2128,14 @@ mod tests { .unwrap(), ); - let DigestBufferType::NonNullable(data_digest) = &digester.fields_digest_buffer["col"] - else { - panic!("Expected NonNullable buffer"); - }; + let buf = &digester.fields_digest_buffer["col"]; + assert!(buf.null_bits.is_none(), "Expected non-nullable"); + let data_digest = buf.data.as_ref().expect("Expected data digest"); let mut manual = Sha256::new(); - manual.update(2_usize.to_le_bytes()); + manual.update(2_u64.to_le_bytes()); manual.update(b"ab"); - manual.update(3_usize.to_le_bytes()); + manual.update(3_u64.to_le_bytes()); manual.update(b"cde"); assert_eq!(data_digest.clone().finalize(), manual.finalize()); } @@ -1859,10 +2146,10 @@ mod tests { fn digest_utf8_nullable_bytes() { // ["foo", None, "ba"] // Valid entries: (length as u64 LE) ++ UTF-8 bytes. - // Null entries contribute the sentinel b"NULL" to the data digest. + // Null entries are skipped entirely in the data digest. let array = StringArray::from(vec![Some("foo"), None, Some("ba")]); let schema = Schema::new(vec![Field::new("col", DataType::Utf8, true)]); - let mut digester = ArrowDigesterCore::::new(schema); + let mut digester = ArrowDigesterCore::::new(&schema); digester.update( &RecordBatch::try_new( Arc::new(Schema::new(vec![Field::new("col", DataType::Utf8, true)])), @@ -1871,11 +2158,9 @@ mod tests { .unwrap(), ); - let DigestBufferType::Nullable(null_bit_vec, data_digest) = - &digester.fields_digest_buffer["col"] - else { - panic!("Expected Nullable buffer"); - }; + let buf = &digester.fields_digest_buffer["col"]; + let null_bit_vec = buf.null_bits.as_ref().expect("Expected nullable"); + let data_digest = buf.data.as_ref().expect("Expected data digest"); assert_eq!(null_bit_vec.len(), 3); assert!(null_bit_vec[0]); @@ -1885,7 +2170,7 @@ mod tests { let mut manual = Sha256::new(); manual.update(3_u64.to_le_bytes()); // len("foo") manual.update(b"foo"); - manual.update(b"NULL"); // null sentinel + // null entry skipped — no sentinel bytes 
manual.update(2_u64.to_le_bytes()); // len("ba") manual.update(b"ba"); assert_eq!(data_digest.clone().finalize(), manual.finalize()); @@ -1896,7 +2181,7 @@ mod tests { // ["x", "yz"] — all valid, length prefix is u64 LE let array = LargeStringArray::from(vec!["x", "yz"]); let schema = Schema::new(vec![Field::new("col", DataType::LargeUtf8, false)]); - let mut digester = ArrowDigesterCore::::new(schema); + let mut digester = ArrowDigesterCore::::new(&schema); digester.update( &RecordBatch::try_new( Arc::new(Schema::new(vec![Field::new( @@ -1909,10 +2194,9 @@ mod tests { .unwrap(), ); - let DigestBufferType::NonNullable(data_digest) = &digester.fields_digest_buffer["col"] - else { - panic!("Expected NonNullable buffer"); - }; + let buf = &digester.fields_digest_buffer["col"]; + assert!(buf.null_bits.is_none(), "Expected non-nullable"); + let data_digest = buf.data.as_ref().expect("Expected data digest"); let mut manual = Sha256::new(); manual.update(1_u64.to_le_bytes()); @@ -1924,10 +2208,9 @@ mod tests { // ── List / LargeList ───────────────────────────────────── // - // Each outer element is prefixed by its inner element count (u64 LE), then the - // raw bytes of the inner array (no length limit — the implementation hashes from - // the element's offset to the end of the shared child buffer). - // Using a single outer element avoids buffer-bleed from preceding elements. + // With recursive decomposition, a non-nullable List column + // creates a single entry at "col/" (list_leaf) with structural (element counts), + // data (leaf values), and null_bits (item nullability). 
#[test] fn digest_list_non_nullable_bytes() { @@ -1945,7 +2228,7 @@ mod tests { DataType::List(Arc::clone(&item_field)), false, )]); - let mut digester = ArrowDigesterCore::::new(schema); + let mut digester = ArrowDigesterCore::::new(&schema); digester.update( &RecordBatch::try_new( Arc::new(Schema::new(vec![Field::new( @@ -1958,18 +2241,33 @@ mod tests { .unwrap(), ); - let DigestBufferType::NonNullable(data_digest) = &digester.fields_digest_buffer["col"] - else { - panic!("Expected NonNullable buffer"); - }; + // Non-nullable column → no "col" entry; list_leaf entry at "col/" + let buf = &digester.fields_digest_buffer["col/"]; + // Items are nullable → null_bits present (all valid in this case) + let null_bit_vec = buf.null_bits.as_ref().expect("Expected nullable items"); + assert_eq!(null_bit_vec.len(), 3); + assert!(null_bit_vec.iter().all(|b| *b), "All items should be valid"); - // sub-array has 3 elements at offset 0 → raw buffer slice from byte 0 - let mut manual = Sha256::new(); - manual.update(3_u64.to_le_bytes()); // element count prefix - manual.update(10_i32.to_le_bytes()); - manual.update(20_i32.to_le_bytes()); - manual.update(30_i32.to_le_bytes()); - assert_eq!(data_digest.clone().finalize(), manual.finalize()); + let structural_digest = buf + .structural + .as_ref() + .expect("Expected structural digest for list"); + let data_digest = buf.data.as_ref().expect("Expected data digest"); + + // Structural digest: element count (sizes separated from leaf data) + let mut manual_structural = Sha256::new(); + manual_structural.update(3_u64.to_le_bytes()); + assert_eq!( + structural_digest.clone().finalize(), + manual_structural.finalize() + ); + + // Data/leaf digest: only the raw leaf values + let mut manual_data = Sha256::new(); + manual_data.update(10_i32.to_le_bytes()); + manual_data.update(20_i32.to_le_bytes()); + manual_data.update(30_i32.to_le_bytes()); + assert_eq!(data_digest.clone().finalize(), manual_data.finalize()); } #[test] @@ -1988,7 +2286,7 
@@ mod tests { DataType::LargeList(Arc::clone(&item_field)), false, )]); - let mut digester = ArrowDigesterCore::::new(schema); + let mut digester = ArrowDigesterCore::::new(&schema); digester.update( &RecordBatch::try_new( Arc::new(Schema::new(vec![Field::new( @@ -2001,16 +2299,481 @@ mod tests { .unwrap(), ); - let DigestBufferType::NonNullable(data_digest) = &digester.fields_digest_buffer["col"] - else { - panic!("Expected NonNullable buffer"); - }; + // Non-nullable column → no "col" entry; list_leaf entry at "col/" + let buf = &digester.fields_digest_buffer["col/"]; + let null_bit_vec = buf.null_bits.as_ref().expect("Expected nullable items"); + assert_eq!(null_bit_vec.len(), 3); + assert!(null_bit_vec.iter().all(|b| *b), "All items should be valid"); - let mut manual = Sha256::new(); - manual.update(3_u64.to_le_bytes()); - manual.update(1_i32.to_le_bytes()); - manual.update(2_i32.to_le_bytes()); - manual.update(3_i32.to_le_bytes()); - assert_eq!(data_digest.clone().finalize(), manual.finalize()); + let structural_digest = buf + .structural + .as_ref() + .expect("Expected structural digest for list"); + let data_digest = buf.data.as_ref().expect("Expected data digest"); + + // Structural digest: element count (sizes separated from leaf data) + let mut manual_structural = Sha256::new(); + manual_structural.update(3_u64.to_le_bytes()); + assert_eq!( + structural_digest.clone().finalize(), + manual_structural.finalize() + ); + + // Data/leaf digest: only the raw leaf values + let mut manual_data = Sha256::new(); + manual_data.update(1_i32.to_le_bytes()); + manual_data.update(2_i32.to_le_bytes()); + manual_data.update(3_i32.to_le_bytes()); + assert_eq!(data_digest.clone().finalize(), manual_data.finalize()); + } + + #[test] + fn digest_buffer_type_structural_only() { + let buf = super::DigestBufferType::::new_structural_only(true); + assert!(buf.null_bits.is_some()); + assert!(buf.structural.is_some()); + assert!(buf.data.is_none()); + } + + #[test] + fn 
digest_buffer_type_data_only() { + let buf = super::DigestBufferType::::new_data_only(false); + assert!(buf.null_bits.is_none()); + assert!(buf.structural.is_none()); + assert!(buf.data.is_some()); + } + + #[test] + fn digest_buffer_type_list_leaf() { + let buf = super::DigestBufferType::::new_list_leaf(true); + assert!(buf.null_bits.is_some()); + assert!(buf.structural.is_some()); + assert!(buf.data.is_some()); + } + + #[test] + fn digest_buffer_type_validity_only() { + let buf = super::DigestBufferType::::new_validity_only(); + assert!(buf.null_bits.is_some()); + assert!(buf.structural.is_none()); + assert!(buf.data.is_none()); + } + + #[test] + fn extract_fields_list_of_struct() { + // List> + let schema = Schema::new(vec![Field::new( + "x", + DataType::LargeList(Arc::new(Field::new( + "item", + DataType::Struct( + vec![ + Field::new("a", DataType::Int32, false), + Field::new("b", DataType::LargeUtf8, false), + ] + .into(), + ), + false, + ))), + true, // column is nullable + )]); + + let digester = ArrowDigesterCore::::new(&schema); + let field_names: Vec<&String> = digester.fields_digest_buffer.keys().collect(); + + // Should have: "x" (validity-only), "x/" (structural), "x//a" (data), "x//b" (data) + assert_eq!( + field_names.len(), + 4, + "Expected 4 entries, got: {field_names:?}" + ); + assert!(field_names.contains(&&"x".to_owned())); + assert!(field_names.contains(&&"x/".to_owned())); + assert!(field_names.contains(&&"x//a".to_owned())); + assert!(field_names.contains(&&"x//b".to_owned())); + } + + #[test] + fn extract_fields_nested_list_struct_list() { + // x: Nullable, b: Struct>, h: Int32>>>> + let schema = Schema::new(vec![Field::new( + "x", + DataType::LargeList(Arc::new(Field::new( + "item", + DataType::Struct( + vec![ + Field::new("a", DataType::Int32, true), + Field::new( + "b", + DataType::Struct( + vec![ + Field::new( + "g", + DataType::LargeList(Arc::new(Field::new( + "item", + DataType::Int32, + false, + ))), + true, + ), + Field::new("h", 
DataType::Int32, false), + ] + .into(), + ), + false, + ), + ] + .into(), + ), + false, + ))), + true, + )]); + + let digester = ArrowDigesterCore::::new(&schema); + let field_names: Vec<&String> = digester.fields_digest_buffer.keys().collect(); + + // Expected entries: "x", "x/", "x//a", "x//b/g", "x//b/g/", "x//b/h" + assert_eq!( + field_names.len(), + 6, + "Expected 6 entries, got: {field_names:?}" + ); + assert!(field_names.contains(&&"x".to_owned())); + assert!(field_names.contains(&&"x/".to_owned())); + assert!(field_names.contains(&&"x//a".to_owned())); + assert!(field_names.contains(&&"x//b/g".to_owned())); + assert!(field_names.contains(&&"x//b/g/".to_owned())); + assert!(field_names.contains(&&"x//b/h".to_owned())); + } + + #[test] + fn recursive_list_struct_decomposition() { + use crate::arrow_digester_core::normalize_schema; + + // Schema: x: Nullable, + // b: Struct< + // g: Nullable>, + // h: Int32 + // > + // >>> + let g_field = Field::new( + "g", + DataType::LargeList(Arc::new(Field::new("item", DataType::Int32, false))), + true, // g is nullable + ); + let h_field = Field::new("h", DataType::Int32, false); + let b_field = Field::new( + "b", + DataType::Struct(vec![g_field.clone(), h_field.clone()].into()), + false, // b is non-nullable + ); + let a_field = Field::new("a", DataType::Int32, true); // a is nullable + let struct_type = DataType::Struct(vec![a_field.clone(), b_field.clone()].into()); + let item_field = Field::new("item", struct_type, false); + let x_field = Field::new( + "x", + DataType::LargeList(Arc::new(item_field.clone())), + true, // column is nullable + ); + let schema = Schema::new(vec![x_field]); + + // Build the data: + // Row 0: [{a: 1, b: {g: [10, 20], h: 100}}, {a: null, b: {g: [30], h: 200}}] + // Row 1: null + // Row 2: [{a: 3, b: {g: null, h: 300}}, {a: 4, b: {g: [], h: 400}}, {a: 5, b: {g: [50], h: 500}}] + + // Inner g values: [10, 20, 30, 50] (across all non-null g lists) + let g_values = Int32Array::from(vec![10, 20, 
30, 50]); + // g list offsets: elem0=[10,20](len2), elem1=[30](len1), elem2=null, elem3=[](len0), elem4=[50](len1) + // For 5 struct elements, g has offsets [0, 2, 3, 3, 3, 4] + // with validity [true, true, false, true, true] + let g_list = LargeListArray::new( + Arc::new(Field::new("item", DataType::Int32, false)), + OffsetBuffer::new(vec![0_i64, 2, 3, 3, 3, 4].into()), + Arc::new(g_values) as ArrayRef, + Some(vec![true, true, false, true, true].into()), // g null at struct element 2 + ); + + let h_values = Int32Array::from(vec![100, 200, 300, 400, 500]); + + let b_struct = StructArray::from(vec![ + (Arc::new(g_field), Arc::new(g_list) as ArrayRef), + (Arc::new(h_field), Arc::new(h_values) as ArrayRef), + ]); + + let a_values = Int32Array::from(vec![Some(1), None, Some(3), Some(4), Some(5)]); + + let inner_struct = StructArray::from(vec![ + (Arc::new(a_field), Arc::new(a_values) as ArrayRef), + (Arc::new(b_field), Arc::new(b_struct) as ArrayRef), + ]); + + // Outer list: Row 0 has 2 elements, Row 1 is null, Row 2 has 3 elements + // Offsets: [0, 2, 2, 5] (row 1 is null but offset still present) + let outer_list = LargeListArray::new( + Arc::new(item_field), + OffsetBuffer::new(vec![0_i64, 2, 2, 5].into()), + Arc::new(inner_struct) as ArrayRef, + Some(vec![true, false, true].into()), // row 1 is null + ); + + let batch = RecordBatch::try_new( + Arc::new(schema.clone()), + vec![Arc::new(outer_list) as ArrayRef], + ) + .unwrap(); + + // ── Compute expected hash manually ── + // BTreeMap entries (in sorted order): + // "x" → null_bits: V,I,V (3 bits) + // "x/" → structural: [2, 3] + // "x//a" → null_bits: V,I,V,V,V (5 bits), data: [1, 3, 4, 5] as i32 LE + // "x//b/g" → null_bits: V,V,I,V,V (5 bits) + // "x//b/g/" → structural: [2, 1, 0, 1], data: [10, 20, 30, 50] as i32 LE + // "x//b/h" → data: [100, 200, 300, 400, 500] as i32 LE + + let schema_digest = Sha256::digest( + ArrowDigesterCore::::serialized_schema(&normalize_schema(&schema)).as_bytes(), + ); + + let mut 
final_digest = Sha256::new(); + final_digest.update(schema_digest); + + // Entry "x": null_bits V,I,V → bit_count=3, validity=0b101=5 + final_digest.update(3_u64.to_le_bytes()); + final_digest.update(5_u8.to_le_bytes()); + + // Entry "x/": structural only [2, 3] + let mut x_structural = Sha256::new(); + x_structural.update(2_u64.to_le_bytes()); + x_structural.update(3_u64.to_le_bytes()); + final_digest.update(x_structural.finalize()); + + // Entry "x//a": null_bits V,I,V,V,V → bit_count=5, validity=0b11101=29 + // data: [1, 3, 4, 5] as i32 LE + final_digest.update(5_u64.to_le_bytes()); + final_digest.update(29_u8.to_le_bytes()); + let mut xa_data = Sha256::new(); + xa_data.update(1_i32.to_le_bytes()); + xa_data.update(3_i32.to_le_bytes()); + xa_data.update(4_i32.to_le_bytes()); + xa_data.update(5_i32.to_le_bytes()); + final_digest.update(xa_data.finalize()); + + // Entry "x//b/g": null_bits V,V,I,V,V → bit_count=5, validity=0b11011=27 + final_digest.update(5_u64.to_le_bytes()); + final_digest.update(27_u8.to_le_bytes()); + + // Entry "x//b/g/": structural [2, 1, 0, 1], data [10, 20, 30, 50] as i32 LE + let mut xbg_structural = Sha256::new(); + xbg_structural.update(2_u64.to_le_bytes()); + xbg_structural.update(1_u64.to_le_bytes()); + xbg_structural.update(0_u64.to_le_bytes()); + xbg_structural.update(1_u64.to_le_bytes()); + final_digest.update(xbg_structural.finalize()); + let mut xbg_data = Sha256::new(); + xbg_data.update(10_i32.to_le_bytes()); + xbg_data.update(20_i32.to_le_bytes()); + xbg_data.update(30_i32.to_le_bytes()); + xbg_data.update(50_i32.to_le_bytes()); + final_digest.update(xbg_data.finalize()); + + // Entry "x//b/h": data only [100, 200, 300, 400, 500] as i32 LE + let mut h_leaf_data = Sha256::new(); + h_leaf_data.update(100_i32.to_le_bytes()); + h_leaf_data.update(200_i32.to_le_bytes()); + h_leaf_data.update(300_i32.to_le_bytes()); + h_leaf_data.update(400_i32.to_le_bytes()); + h_leaf_data.update(500_i32.to_le_bytes()); + 
final_digest.update(h_leaf_data.finalize()); + + let expected_hash = final_digest.finalize().to_vec(); + + let mut digester = ArrowDigesterCore::::new(&schema); + digester.update(&batch); + + let actual_hash = digester.finalize(); + + assert_eq!( + encode(&actual_hash), + encode(&expected_hash), + "Recursive list/struct decomposition hash mismatch" + ); + } + + #[expect( + clippy::too_many_lines, + reason = "Test builds multiple complex batches for batch-split independence verification" + )] + #[test] + fn recursive_list_struct_batch_split_independence() { + // Same schema and data as recursive_list_struct_decomposition, + // split into two batches: rows 0-1 and row 2. + // Verify: hash(batch1 + batch2) == hash(combined) + + let g_field = Field::new( + "g", + DataType::LargeList(Arc::new(Field::new("item", DataType::Int32, false))), + true, + ); + let h_field = Field::new("h", DataType::Int32, false); + let b_field = Field::new( + "b", + DataType::Struct(vec![g_field.clone(), h_field.clone()].into()), + false, + ); + let a_field = Field::new("a", DataType::Int32, true); + let struct_type = DataType::Struct(vec![a_field.clone(), b_field.clone()].into()); + let item_field = Field::new("item", struct_type, false); + let x_field = Field::new("x", DataType::LargeList(Arc::new(item_field.clone())), true); + let schema = Arc::new(Schema::new(vec![x_field])); + + // ── Build combined batch (all 3 rows) ── + let g_values = Int32Array::from(vec![10, 20, 30, 50]); + let g_list = LargeListArray::new( + Arc::new(Field::new("item", DataType::Int32, false)), + OffsetBuffer::new(vec![0_i64, 2, 3, 3, 3, 4].into()), + Arc::new(g_values) as ArrayRef, + Some(vec![true, true, false, true, true].into()), + ); + let h_values = Int32Array::from(vec![100, 200, 300, 400, 500]); + let b_struct = StructArray::from(vec![ + (Arc::new(g_field.clone()), Arc::new(g_list) as ArrayRef), + (Arc::new(h_field.clone()), Arc::new(h_values) as ArrayRef), + ]); + let a_values = 
Int32Array::from(vec![Some(1), None, Some(3), Some(4), Some(5)]); + let inner_struct = StructArray::from(vec![ + (Arc::new(a_field.clone()), Arc::new(a_values) as ArrayRef), + (Arc::new(b_field.clone()), Arc::new(b_struct) as ArrayRef), + ]); + let outer_list = LargeListArray::new( + Arc::new(item_field.clone()), + OffsetBuffer::new(vec![0_i64, 2, 2, 5].into()), + Arc::new(inner_struct) as ArrayRef, + Some(vec![true, false, true].into()), + ); + let combined_batch = + RecordBatch::try_new(Arc::clone(&schema), vec![Arc::new(outer_list) as ArrayRef]) + .unwrap(); + + // ── Build batch 1: rows 0-1 ── + let g_values_1 = Int32Array::from(vec![10, 20, 30]); + let g_list_1 = LargeListArray::new( + Arc::new(Field::new("item", DataType::Int32, false)), + OffsetBuffer::new(vec![0_i64, 2, 3].into()), + Arc::new(g_values_1) as ArrayRef, + Some(vec![true, true].into()), + ); + let h_values_1 = Int32Array::from(vec![100, 200]); + let b_struct_1 = StructArray::from(vec![ + (Arc::new(g_field.clone()), Arc::new(g_list_1) as ArrayRef), + (Arc::new(h_field.clone()), Arc::new(h_values_1) as ArrayRef), + ]); + let a_values_1 = Int32Array::from(vec![Some(1), None]); + let inner_struct_1 = StructArray::from(vec![ + (Arc::new(a_field.clone()), Arc::new(a_values_1) as ArrayRef), + (Arc::new(b_field.clone()), Arc::new(b_struct_1) as ArrayRef), + ]); + let outer_list_1 = LargeListArray::new( + Arc::new(item_field.clone()), + OffsetBuffer::new(vec![0_i64, 2, 2].into()), + Arc::new(inner_struct_1) as ArrayRef, + Some(vec![true, false].into()), + ); + let batch1 = RecordBatch::try_new( + Arc::clone(&schema), + vec![Arc::new(outer_list_1) as ArrayRef], + ) + .unwrap(); + + // ── Build batch 2: row 2 ── + let g_values_2 = Int32Array::from(vec![50]); + let g_list_2 = LargeListArray::new( + Arc::new(Field::new("item", DataType::Int32, false)), + OffsetBuffer::new(vec![0_i64, 0, 0, 1].into()), + Arc::new(g_values_2) as ArrayRef, + Some(vec![false, true, true].into()), + ); + let h_values_2 = 
Int32Array::from(vec![300, 400, 500]); + let b_struct_2 = StructArray::from(vec![ + (Arc::new(g_field), Arc::new(g_list_2) as ArrayRef), + (Arc::new(h_field), Arc::new(h_values_2) as ArrayRef), + ]); + let a_values_2 = Int32Array::from(vec![Some(3), Some(4), Some(5)]); + let inner_struct_2 = StructArray::from(vec![ + (Arc::new(a_field), Arc::new(a_values_2) as ArrayRef), + (Arc::new(b_field), Arc::new(b_struct_2) as ArrayRef), + ]); + let outer_list_2 = LargeListArray::new( + Arc::new(item_field), + OffsetBuffer::new(vec![0_i64, 3].into()), + Arc::new(inner_struct_2) as ArrayRef, + Some(vec![true].into()), + ); + let batch2 = RecordBatch::try_new( + Arc::clone(&schema), + vec![Arc::new(outer_list_2) as ArrayRef], + ) + .unwrap(); + + // ── Compare ── + let mut single = ArrowDigesterCore::<Sha256>::new(schema.as_ref()); + single.update(&combined_batch); + let single_hash = single.finalize(); + + let mut split = ArrowDigesterCore::<Sha256>::new(schema.as_ref()); + split.update(&batch1); + split.update(&batch2); + let split_hash = split.finalize(); + + assert_eq!( + encode(&single_hash), + encode(&split_hash), + "Batch split independence failed for recursive list/struct decomposition" + ); + } + + #[test] + fn hash_array_list_of_struct() { + // Verify hash_array works with List<Struct> using the same recursive + // decomposition as the record-batch path. 
+ let inner_struct = StructArray::from(vec![ + ( + Arc::new(Field::new("a", DataType::Int32, false)), + Arc::new(Int32Array::from(vec![1, 2, 3])) as ArrayRef, + ), + ( + Arc::new(Field::new("b", DataType::Int32, false)), + Arc::new(Int32Array::from(vec![10, 20, 30])) as ArrayRef, + ), + ]); + + let list_array = LargeListArray::new( + Arc::new(Field::new( + "item", + DataType::Struct( + vec![ + Field::new("a", DataType::Int32, false), + Field::new("b", DataType::Int32, false), + ] + .into(), + ), + false, + )), + OffsetBuffer::new(vec![0_i64, 2, 3].into()), + Arc::new(inner_struct) as ArrayRef, + Some(vec![true, true].into()), + ); + + let hash1 = ArrowDigesterCore::<Sha256>::hash_array(&list_array); + let hash2 = ArrowDigesterCore::<Sha256>::hash_array(&list_array); + assert_eq!(hash1, hash2, "hash_array should be deterministic"); + assert_eq!( + hash1.len(), + 32, + "core hash_array should return 32 bytes (SHA-256)" + ); } } diff --git a/src/lib.rs b/src/lib.rs index a3745ff..685bcaf 100644 --- a/src/lib.rs +++ b/src/lib.rs @@ -10,15 +10,15 @@ use crate::arrow_digester_core::ArrowDigesterCore; const VERSION_BYTES: [u8; 3] = [0_u8, 0_u8, 1_u8]; // Version 0.0.1 -/// Maps `ArrowDigesterCore` to a `sha_256` digester + versioning. +/// Maps `ArrowDigesterCore` to a SHA-256 digester with version prefix. #[derive(Clone)] pub struct ArrowDigester { digester: ArrowDigesterCore<Sha256>, } impl ArrowDigester { - /// Create a new instance of `ArrowDigester` with SHA256 as the digester with the schema which will be enforce through each update. - pub fn new(schema: Schema) -> Self { + /// Create a new instance of `ArrowDigester` with SHA-256 as the digest algorithm. The schema will be enforced on each update. + pub fn new(schema: &Schema) -> Self { Self { digester: ArrowDigesterCore::<Sha256>::new(schema), } @@ -34,17 +34,17 @@ impl ArrowDigester { Self::prepend_version_bytes(self.digester.finalize()) } - /// Function to hash an Array in one go. + /// Hash an array in one go. 
pub fn hash_array(array: &dyn Array) -> Vec<u8> { Self::prepend_version_bytes(ArrowDigesterCore::<Sha256>::hash_array(array)) } - /// Function to hash a complete `RecordBatch` in one go. + /// Hash a complete `RecordBatch` in one go. pub fn hash_record_batch(record_batch: &RecordBatch) -> Vec<u8> { Self::prepend_version_bytes(ArrowDigesterCore::<Sha256>::hash_record_batch(record_batch)) } - /// Function to hash schema only. + /// Hash a schema only. pub fn hash_schema(schema: &Schema) -> Vec<u8> { Self::prepend_version_bytes(ArrowDigesterCore::<Sha256>::hash_schema(schema)) } diff --git a/src/pyarrow.rs b/src/pyarrow.rs index 03277ba..0477b65 100644 --- a/src/pyarrow.rs +++ b/src/pyarrow.rs @@ -67,10 +67,10 @@ pub struct InternalPyArrowDigester { #[uniffi::export] impl InternalPyArrowDigester { - /// Create a new instance of `PyArrowDigester` with SHA256 as the digester with the schema which will be enforce through each update + /// Create a new instance of `PyArrowDigester` with SHA-256 as the digest algorithm. The schema will be enforced on each update. /// /// # Panics - /// The pointer must be a valid Arrow schema from Python's pyarrow, if failed to convert, it will panic + /// The pointer must be a valid Arrow schema from Python's pyarrow. Panics if conversion fails. #[uniffi::constructor] pub fn new(schema_ptr: u64) -> Self { @@ -81,7 +81,7 @@ impl InternalPyArrowDigester { Schema::try_from(&ffi_schema).expect("Failed to convert FFI schema to Arrow schema") }; Self { - digester: Arc::new(Mutex::new(ArrowDigester::new(schema))), + digester: Arc::new(Mutex::new(ArrowDigester::new(&schema))), } } @@ -117,7 +117,7 @@ impl InternalPyArrowDigester { /// Consume the digester and finalize the hash computation /// /// # Panics - /// If failed to acquire lock on digester + /// Panics if it fails to acquire the lock on the digester. 
pub fn finalize(&self) -> Vec<u8> { self.digester .lock() diff --git a/tests/arrow_digester.rs b/tests/arrow_digester.rs index 303e258..602ac26 100644 --- a/tests/arrow_digester.rs +++ b/tests/arrow_digester.rs @@ -7,8 +7,9 @@ mod tests { array::{ ArrayRef, BinaryArray, BooleanArray, Date32Array, Date64Array, Decimal32Array, Decimal64Array, DictionaryArray, Float32Array, Float64Array, Int16Array, Int32Array, - Int64Array, Int8Array, LargeBinaryArray, LargeListArray, LargeStringArray, ListArray, - RecordBatch, StringArray, StructArray, Time32MillisecondArray, Time32SecondArray, + Int64Array, Int8Array, LargeBinaryArray, LargeListArray, LargeListBuilder, + LargeStringArray, LargeStringBuilder, ListArray, ListBuilder, RecordBatch, StringArray, + StringBuilder, StructArray, Time32MillisecondArray, Time32SecondArray, Time64MicrosecondArray, Time64NanosecondArray, UInt16Array, UInt32Array, UInt64Array, UInt8Array, }, @@ -72,8 +73,8 @@ mod tests { // Empty Table Hashing Check assert_eq!( - encode(ArrowDigester::new(schema.clone()).finalize()), - "0000019c75bd0c40bd2fb15e878418c151c0b792c966476b35ded7d0f6fd1922cf5a00" + encode(ArrowDigester::new(&schema).finalize()), + "0000015955baf5303c8545360b2f0a253065e9d83d91cd44f0bc947c1904dfd9d09aac" ); let batch = RecordBatch::try_new( @@ -129,7 +130,7 @@ mod tests { // Hash the record batch assert_eq!( encode(ArrowDigester::hash_record_batch(&batch)), - "00000199f7ba7f6c7ec30ad487996c2b3eb6f0e1c750c318a32b09afcdfdce7de8c08e" + "000001487059003be1a84dbe29ba6e90ea50798a76d22e46e221b6a0c332421dc4062e" ); } @@ -139,7 +140,7 @@ mod tests { let hash = hex::encode(ArrowDigester::hash_array(&bool_array)); assert_eq!( hash, - "000001f9abeb37d9395f359b48a379f0a8467c572b19ecc6cae9fa85e1bf627a52a8f3" + "00000185a9c99eba7bcfd9b14fd529b9534f2289319779270aa4a072f117cf90a6ac8b" ); } @@ -150,7 +151,7 @@ mod tests { let hash = hex::encode(ArrowDigester::hash_array(&int_array)); assert_eq!( hash, - 
"00000127f2411e6839eb1e3fe706ac3f01e704c7b46357360fb2ddb8a08ec98e8ba4fa" + "0000018330f9b8796b9434cbf7bc028c18c58a2a739b980acf9995ce1e5d60b43b0138" ); } @@ -161,7 +162,7 @@ mod tests { let hash = hex::encode(ArrowDigester::hash_array(&time_array)); assert_eq!( hash, - "0000019000b74aa80f685103a8cafc7e113aa8f33ccc0c94ea3713318d2cc2f3436baa" + "000001aba70469e596c735ec13c3d60a9db2d0e5515eb864f07ad5d24572b35f23eacc" ); } @@ -172,7 +173,7 @@ mod tests { let hash = hex::encode(ArrowDigester::hash_array(&time_array)); assert_eq!( hash, - "00000195f12143d789f364a3ed52f7300f8f91dc21fbe00c34aed798ca8fd54182dea3" + "000001c96d705b1278f9ffe1b31fb307408768f14d961c44028a1d0f778dd61786ee26" ); } @@ -199,10 +200,10 @@ mod tests { let hash = hex::encode(ArrowDigester::hash_array(&binary_array)); assert_eq!( hash, - "000001466801efd880d2acecd6c78915b5c2a51476870f9116912834d79de43a000071" + "0000018dc3a0e479d1335553546c8f23c36d75335cbd34805a6f96c5d5225b347fbc57" ); - // Test large binary array with same data to ensure consistency + // Large binary array with same data should produce identical hash (type canonicalization) let large_binary_array = LargeBinaryArray::from(vec![ Some(b"hello".as_ref()), None, @@ -210,7 +211,7 @@ mod tests { Some(b"".as_ref()), ]); - assert_ne!( + assert_eq!( hex::encode(ArrowDigester::hash_array(&large_binary_array)), hash ); @@ -263,14 +264,14 @@ mod tests { let hash = hex::encode(ArrowDigester::hash_array(&string_array)); assert_eq!( hash, - "000001811f2407a0d2e90ef9688514d37cd92225242e7614f02ef5ef36abcae73ca374" + "0000016255bde0141ebf26e08c31c96f6112e5e21d101ab8bb90d77f2c3eec02c62d3c" ); - // Test large string array with same data to ensure consistency + // Large string array with same data should produce identical hash (type canonicalization) let large_string_array = LargeStringArray::from(vec![Some("hello"), None, Some("world"), Some("")]); - assert_ne!( + assert_eq!( hex::encode(ArrowDigester::hash_array(&large_string_array)), hash ); @@ -289,7 
+290,7 @@ mod tests { let hash = hex::encode(ArrowDigester::hash_array(&list_array)); assert_eq!( hash, - "00000114b8faee7c56d2a94d77095db599152df41aaf4d11e485035eebc94e8981f769" + "000001dc359d563a1ed210eb271b314612ea8343f0a0b0955b9053a9eb47962d27163c" ); // Collision test: [[1, 2], [3]] vs [[1], [2, 3]] @@ -324,7 +325,7 @@ mod tests { assert_eq!( encode(ArrowDigester::hash_array(&decimal32_array)), - "000001ef29250615f9d6ab34672c3b11dfa2dcda6e8e6164bc55899c13887f17705f5d" + "0000014f015bd5c4b6ce6e939a8c890333f3e110c2c28ef8014aafd352f8373791e547" ); // Test Decimal64 (precision 10-18) @@ -338,7 +339,7 @@ mod tests { .unwrap(); assert_eq!( encode(ArrowDigester::hash_array(&decimal64_array)), - "000001efa4ed72641051233889c07775366cbf2e56eb4b0fcfd46653f5741e81786f08" + "000001dc08c7b9c583edecec36bc5dee21cd2edec9f402a651014fea5f8834d16ad737" ); // Test Decimal128 (precision 19-38) @@ -352,7 +353,7 @@ mod tests { .unwrap(); assert_eq!( hex::encode(ArrowDigester::hash_array(&decimal128_array)), - "00000155cc4d81a048dbca001ca8581673a5a6c93efd870d358df211a545c2af9b658d" + "0000011e3b33d28771b3593fd5dc4b68af8091a1ba9cd493ade374e7368e213bef244e" ); } @@ -424,12 +425,12 @@ mod tests { let batch2 = RecordBatch::try_new(Arc::clone(&schema), vec![uids2, fake_data2]).unwrap(); // Hash both record batches - let mut digester = ArrowDigester::new((*schema).clone()); + let mut digester = ArrowDigester::new(schema.as_ref()); digester.update(&batch1); digester.update(&batch2); assert_eq!( encode(digester.finalize()), - "0000018aa41f456395dc1d26c8d82895d6c81ed9453c1bb3f401fee637131baa60553e" + "0000019f5fa370d315a4b4f2314be7b7284a0549b70ad4e21e584fdebf441ad02f44f0" ); } @@ -507,7 +508,7 @@ mod tests { .unwrap(); // Hash batches incrementally - let mut digester_batches = ArrowDigester::new((*schema).clone()); + let mut digester_batches = ArrowDigester::new(schema.as_ref()); digester_batches.update(&batch1); digester_batches.update(&batch2); let hash_batches = 
encode(digester_batches.finalize()); @@ -522,7 +523,7 @@ mod tests { ) .unwrap(); - let mut digester_single = ArrowDigester::new((*schema).clone()); + let mut digester_single = ArrowDigester::new(schema.as_ref()); digester_single.update(&combined_batch); let hash_single = encode(digester_single.finalize()); @@ -559,7 +560,7 @@ mod tests { .unwrap(); // Hash batches incrementally - let mut digester_batches = ArrowDigester::new((*schema).clone()); + let mut digester_batches = ArrowDigester::new(schema.as_ref()); digester_batches.update(&batch1); digester_batches.update(&batch2); let hash_batches = encode(digester_batches.finalize()); @@ -588,7 +589,7 @@ mod tests { ) .unwrap(); - let mut digester_single = ArrowDigester::new((*schema).clone()); + let mut digester_single = ArrowDigester::new(schema.as_ref()); digester_single.update(&combined_batch); let hash_single = encode(digester_single.finalize()); @@ -603,7 +604,7 @@ mod tests { /// Two schemas with the same struct fields in different order should produce identical schema hashes. /// Bug: `data_type_to_value()` preserves struct field insertion order in the JSON Vec. #[test] - #[ignore = "Bug: struct fields not sorted in data_type_to_value (Issue 1)"] + fn struct_field_order_in_schema_should_not_affect_hash() { let schema1 = Schema::new(vec![Field::new( "my_struct", @@ -640,7 +641,7 @@ mod tests { /// Record batches with struct columns whose inner fields are reordered should produce identical hashes. 
#[test] - #[ignore = "Bug: struct fields not sorted in data_type_to_value (Issue 1)"] + fn struct_field_order_in_record_batch_should_not_affect_hash() { let schema1 = Arc::new(Schema::new(vec![Field::new( "s", @@ -667,8 +668,7 @@ mod tests { )])); let ints = Arc::new(Int32Array::from(vec![1, 2, 3])) as ArrayRef; - let bools = - Arc::new(BooleanArray::from(vec![Some(true), Some(false), None])) as ArrayRef; + let bools = Arc::new(BooleanArray::from(vec![Some(true), Some(false), None])) as ArrayRef; let struct1 = StructArray::from(vec![ ( @@ -692,10 +692,8 @@ mod tests { ), ]); - let batch1 = - RecordBatch::try_new(schema1, vec![Arc::new(struct1) as ArrayRef]).unwrap(); - let batch2 = - RecordBatch::try_new(schema2, vec![Arc::new(struct2) as ArrayRef]).unwrap(); + let batch1 = RecordBatch::try_new(schema1, vec![Arc::new(struct1) as ArrayRef]).unwrap(); + let batch2 = RecordBatch::try_new(schema2, vec![Arc::new(struct2) as ArrayRef]).unwrap(); assert_eq!( encode(ArrowDigester::hash_record_batch(&batch1)), @@ -707,7 +705,7 @@ mod tests { // ── Issue 5: Type canonicalization (Binary/LargeBinary, Utf8/LargeUtf8, List/LargeList) ── #[test] - #[ignore = "Bug: no type canonicalization for Binary vs LargeBinary (Issue 5)"] + fn binary_and_large_binary_schema_should_hash_equal() { let schema1 = Schema::new(vec![Field::new("col", DataType::Binary, true)]); let schema2 = Schema::new(vec![Field::new("col", DataType::LargeBinary, true)]); @@ -720,7 +718,7 @@ mod tests { } #[test] - #[ignore = "Bug: no type canonicalization for Utf8 vs LargeUtf8 (Issue 5)"] + fn utf8_and_large_utf8_schema_should_hash_equal() { let schema1 = Schema::new(vec![Field::new("col", DataType::Utf8, true)]); let schema2 = Schema::new(vec![Field::new("col", DataType::LargeUtf8, true)]); @@ -733,7 +731,7 @@ mod tests { } #[test] - #[ignore = "Bug: no type canonicalization for List vs LargeList (Issue 5)"] + fn list_and_large_list_schema_should_hash_equal() { let list_field = Field::new("item", 
DataType::Int32, true); let schema1 = Schema::new(vec![Field::new( @@ -755,19 +753,74 @@ mod tests { } #[test] - #[ignore = "Bug: no type canonicalization for Binary vs LargeBinary in hash_array (Issue 5)"] - fn binary_and_large_binary_array_should_hash_equal() { - let bin = BinaryArray::from(vec![ - Some(b"hello".as_ref()), + fn list_and_large_list_array_should_hash_equal() { + let list = ListArray::from_iter_primitive::<Int32Type, _, _>(vec![ + Some(vec![Some(1), Some(2)]), None, - Some(b"world".as_ref()), + Some(vec![Some(3)]), ]); - let large_bin = LargeBinaryArray::from(vec![ - Some(b"hello".as_ref()), + let large_list = LargeListArray::from_iter_primitive::<Int32Type, _, _>(vec![ + Some(vec![Some(1), Some(2)]), None, - Some(b"world".as_ref()), + Some(vec![Some(3)]), ]); + assert_eq!( + encode(ArrowDigester::hash_array(&list)), + encode(ArrowDigester::hash_array(&large_list)), + "List and LargeList arrays with same data should produce same hash" + ); + } + + #[test] + fn list_and_large_list_record_batch_should_hash_equal() { + let list_field = Field::new("item", DataType::Int32, true); + let schema1 = Arc::new(Schema::new(vec![Field::new( + "col", + DataType::List(Box::new(list_field.clone()).into()), + true, + )])); + let schema2 = Arc::new(Schema::new(vec![Field::new( + "col", + DataType::LargeList(Box::new(list_field).into()), + true, + )])); + + let batch1 = RecordBatch::try_new( + schema1, + vec![ + Arc::new(ListArray::from_iter_primitive::<Int32Type, _, _>(vec![ + Some(vec![Some(10), Some(20)]), + None, + ])) as ArrayRef, + ], + ) + .unwrap(); + + let batch2 = RecordBatch::try_new( + schema2, + vec![ + Arc::new(LargeListArray::from_iter_primitive::<Int32Type, _, _>( + vec![Some(vec![Some(10), Some(20)]), None], + )) as ArrayRef, + ], + ) + .unwrap(); + + assert_eq!( + encode(ArrowDigester::hash_record_batch(&batch1)), + encode(ArrowDigester::hash_record_batch(&batch2)), + "List and LargeList record batches with same data should produce same hash" + ); + } + + #[test] + fn 
binary_and_large_binary_array_should_hash_equal() { + let bin = BinaryArray::from(vec![Some(b"hello".as_ref()), None, Some(b"world".as_ref())]); + let large_bin = + LargeBinaryArray::from(vec![Some(b"hello".as_ref()), None, Some(b"world".as_ref())]); + assert_eq!( encode(ArrowDigester::hash_array(&bin)), encode(ArrowDigester::hash_array(&large_bin)), @@ -776,7 +829,7 @@ mod tests { } #[test] - #[ignore = "Bug: no type canonicalization for Utf8 vs LargeUtf8 in hash_array (Issue 5)"] + fn utf8_and_large_utf8_array_should_hash_equal() { let arr = StringArray::from(vec![Some("hello"), None, Some("world")]); let large_arr = LargeStringArray::from(vec![Some("hello"), None, Some("world")]); @@ -789,7 +842,35 @@ mod tests { } #[test] - #[ignore = "Bug: no type canonicalization for Binary vs LargeBinary in hash_record_batch (Issue 5)"] + fn utf8_and_large_utf8_record_batch_should_hash_equal() { + let schema1 = Arc::new(Schema::new(vec![Field::new("col", DataType::Utf8, true)])); + let schema2 = Arc::new(Schema::new(vec![Field::new( + "col", + DataType::LargeUtf8, + true, + )])); + + let batch1 = RecordBatch::try_new( + schema1, + vec![Arc::new(StringArray::from(vec![Some("abc"), None])) as ArrayRef], + ) + .unwrap(); + + let batch2 = RecordBatch::try_new( + schema2, + vec![Arc::new(LargeStringArray::from(vec![Some("abc"), None])) as ArrayRef], + ) + .unwrap(); + + assert_eq!( + encode(ArrowDigester::hash_record_batch(&batch1)), + encode(ArrowDigester::hash_record_batch(&batch2)), + "Utf8 and LargeUtf8 record batches with same data should produce same hash" + ); + } + + #[test] + fn binary_and_large_binary_record_batch_should_hash_equal() { let schema1 = Arc::new(Schema::new(vec![Field::new("col", DataType::Binary, true)])); let schema2 = Arc::new(Schema::new(vec![Field::new( @@ -800,19 +881,13 @@ mod tests { let batch1 = RecordBatch::try_new( schema1, - vec![Arc::new(BinaryArray::from(vec![ - Some(b"abc".as_ref()), - None, - ])) as ArrayRef], + 
vec![Arc::new(BinaryArray::from(vec![Some(b"abc".as_ref()), None])) as ArrayRef], ) .unwrap(); let batch2 = RecordBatch::try_new( schema2, - vec![Arc::new(LargeBinaryArray::from(vec![ - Some(b"abc".as_ref()), - None, - ])) as ArrayRef], + vec![Arc::new(LargeBinaryArray::from(vec![Some(b"abc".as_ref()), None])) as ArrayRef], ) .unwrap(); @@ -823,10 +898,184 @@ mod tests { ); } + // ── Deep nested type normalization ────────────────────────────────── + + #[test] + fn list_of_utf8_vs_large_list_of_large_utf8_array_should_hash_equal() { + // List(Utf8) vs LargeList(LargeUtf8) — normalization must be recursive + let list = { + let mut builder = ListBuilder::new(StringBuilder::new()); + builder.values().append_value("hello"); + builder.values().append_value("world"); + builder.append(true); + builder.values().append_value("foo"); + builder.append(true); + builder.finish() + }; + + let large_list = { + let mut builder = LargeListBuilder::new(LargeStringBuilder::new()); + builder.values().append_value("hello"); + builder.values().append_value("world"); + builder.append(true); + builder.values().append_value("foo"); + builder.append(true); + builder.finish() + }; + + assert_eq!( + encode(ArrowDigester::hash_array(&list)), + encode(ArrowDigester::hash_array(&large_list)), + "List(Utf8) and LargeList(LargeUtf8) should produce same hash" + ); + } + + #[test] + fn list_of_utf8_vs_large_list_of_large_utf8_schema_should_hash_equal() { + let schema1 = Schema::new(vec![Field::new( + "col", + DataType::List(Arc::new(Field::new("item", DataType::Utf8, true))), + true, + )]); + let schema2 = Schema::new(vec![Field::new( + "col", + DataType::LargeList(Arc::new(Field::new("item", DataType::LargeUtf8, true))), + true, + )]); + + assert_eq!( + encode(ArrowDigester::hash_schema(&schema1)), + encode(ArrowDigester::hash_schema(&schema2)), + "List(Utf8) and LargeList(LargeUtf8) schemas should be logically equivalent" + ); + } + + #[test] + fn 
struct_with_list_utf8_vs_large_variants_record_batch_should_hash_equal() { + // Struct({items: List(Utf8), name: Utf8}) vs Struct({items: LargeList(LargeUtf8), name: LargeUtf8}) + let schema1 = Arc::new(Schema::new(vec![Field::new( + "s", + DataType::Struct( + vec![ + Field::new( + "items", + DataType::List(Arc::new(Field::new("item", DataType::Utf8, true))), + true, + ), + Field::new("name", DataType::Utf8, true), + ] + .into(), + ), + false, + )])); + + let schema2 = Arc::new(Schema::new(vec![Field::new( + "s", + DataType::Struct( + vec![ + Field::new( + "items", + DataType::LargeList(Arc::new(Field::new( + "item", + DataType::LargeUtf8, + true, + ))), + true, + ), + Field::new("name", DataType::LargeUtf8, true), + ] + .into(), + ), + false, + )])); + + // Build struct with List(Utf8) + let list1 = { + let mut builder = ListBuilder::new(StringBuilder::new()); + builder.values().append_value("a"); + builder.values().append_value("b"); + builder.append(true); + builder.values().append_value("c"); + builder.append(true); + builder.finish() + }; + let names1 = StringArray::from(vec![Some("Alice"), Some("Bob")]); + let struct1 = StructArray::from(vec![ + ( + Arc::new(Field::new( + "items", + DataType::List(Arc::new(Field::new("item", DataType::Utf8, true))), + true, + )), + Arc::new(list1) as ArrayRef, + ), + ( + Arc::new(Field::new("name", DataType::Utf8, true)), + Arc::new(names1) as ArrayRef, + ), + ]); + + // Build struct with LargeList(LargeUtf8) + let list2 = { + let mut builder = LargeListBuilder::new(LargeStringBuilder::new()); + builder.values().append_value("a"); + builder.values().append_value("b"); + builder.append(true); + builder.values().append_value("c"); + builder.append(true); + builder.finish() + }; + let names2 = LargeStringArray::from(vec![Some("Alice"), Some("Bob")]); + let struct2 = StructArray::from(vec![ + ( + Arc::new(Field::new( + "items", + DataType::LargeList(Arc::new(Field::new("item", DataType::LargeUtf8, true))), + true, + )), + 
Arc::new(list2) as ArrayRef, + ), + ( + Arc::new(Field::new("name", DataType::LargeUtf8, true)), + Arc::new(names2) as ArrayRef, + ), + ]); + + let batch1 = RecordBatch::try_new(schema1, vec![Arc::new(struct1) as ArrayRef]).unwrap(); + let batch2 = RecordBatch::try_new(schema2, vec![Arc::new(struct2) as ArrayRef]).unwrap(); + + assert_eq!( + encode(ArrowDigester::hash_record_batch(&batch1)), + encode(ArrowDigester::hash_record_batch(&batch2)), + "Struct with List(Utf8) should hash same as Struct with LargeList(LargeUtf8)" + ); + } + + #[test] + fn streaming_with_type_equivalent_schemas_should_succeed() { + // Create digester with Utf8 schema, feed batch with LargeUtf8 schema + let schema_utf8 = Schema::new(vec![Field::new("col", DataType::Utf8, true)]); + + let mut digester = ArrowDigester::new(&schema_utf8); + + let batch = RecordBatch::try_new( + Arc::new(Schema::new(vec![Field::new( + "col", + DataType::LargeUtf8, + true, + )])), + vec![Arc::new(LargeStringArray::from(vec![Some("hello"), None])) as ArrayRef], + ) + .unwrap(); + + digester.update(&batch); // Should NOT panic — schemas are logically equivalent + let _hash = encode(digester.finalize()); + } + // ── Issue 6: Dictionary-encoded array equivalence ─────────────────── #[test] - #[ignore = "Bug: Dictionary arrays hit todo!() panic (Issue 6)"] + fn dictionary_utf8_should_hash_same_as_plain_string() { let plain = StringArray::from(vec![Some("apple"), Some("banana"), Some("apple")]); @@ -842,13 +1091,12 @@ mod tests { } #[test] - #[ignore = "Bug: Dictionary arrays hit todo!() panic (Issue 6)"] + fn dictionary_int_values_should_hash_same_as_plain() { let plain = StringArray::from(vec![Some("x"), Some("y"), Some("x")]); - let dict: DictionaryArray = vec![Some("x"), Some("y"), Some("x")] - .into_iter() - .collect(); + let dict: DictionaryArray = + vec![Some("x"), Some("y"), Some("x")].into_iter().collect(); assert_eq!( encode(ArrowDigester::hash_array(&plain)), @@ -858,13 +1106,12 @@ mod tests { } #[test] - 
#[ignore = "Bug: Dictionary arrays hit todo!() panic (Issue 6)"] + fn dictionary_with_nulls_should_hash_same_as_plain() { let plain = StringArray::from(vec![Some("a"), None, Some("b"), None]); - let dict: DictionaryArray = vec![Some("a"), None, Some("b"), None] - .into_iter() - .collect(); + let dict: DictionaryArray = + vec![Some("a"), None, Some("b"), None].into_iter().collect(); assert_eq!( encode(ArrowDigester::hash_array(&plain)), @@ -877,14 +1124,14 @@ mod tests { /// Feeding a batch with reordered columns into a digester should not panic. #[test] - #[ignore = "Bug: update() uses strict schema equality including column order (Issue 7)"] + fn streaming_update_with_reordered_columns_should_succeed() { let schema = Schema::new(vec![ Field::new("a", DataType::Int32, false), Field::new("b", DataType::Boolean, true), ]); - let mut digester = ArrowDigester::new(schema); + let mut digester = ArrowDigester::new(&schema); // Batch with columns in DIFFERENT order: [b, a] let reordered_schema = Arc::new(Schema::new(vec![ @@ -908,7 +1155,7 @@ mod tests { /// A digester fed batches with different column orders should produce the same hash /// as one fed batches in the original order. 
#[test] - #[ignore = "Bug: update() uses strict schema equality including column order (Issue 7)"] + fn streaming_reordered_columns_produce_same_hash() { let schema_ab = Schema::new(vec![ Field::new("a", DataType::Int32, false), @@ -934,12 +1181,12 @@ mod tests { .unwrap(); // Digester fed batch in original order [a, b] - let mut digester1 = ArrowDigester::new(schema_ab.clone()); + let mut digester1 = ArrowDigester::new(&schema_ab); digester1.update(&batch_ab); let hash1 = encode(digester1.finalize()); // Digester fed batch in reversed order [b, a] - let mut digester2 = ArrowDigester::new(schema_ab); + let mut digester2 = ArrowDigester::new(&schema_ab); digester2.update(&batch_ba); let hash2 = encode(digester2.finalize()); diff --git a/tests/digest_bytes.rs b/tests/digest_bytes.rs index 5c6016f..f64a8b6 100644 --- a/tests/digest_bytes.rs +++ b/tests/digest_bytes.rs @@ -1,2 +1,912 @@ +/// Manual byte-level verification tests for the Starfix hashing specification. +/// +/// Each test in this module manually computes the expected SHA-256 hash by +/// feeding the exact bytes described in `docs/byte-layout-spec.md` into a +/// fresh SHA-256 hasher, then asserts that the library produces the identical +/// result. This serves as both a conformance check and a reference +/// implementation for anyone porting Starfix to another language. 
#[cfg(test)] -mod tests {} +mod tests { + #![expect(clippy::unwrap_used, reason = "Okay in test")] + #![expect(clippy::redundant_clone, reason = "Clones for clarity in test setup")] + #![expect(clippy::absolute_paths, reason = "One-off use in test")] + + use std::sync::Arc; + + use arrow::array::{ + ArrayRef, BinaryArray, BooleanArray, Int32Array, LargeListArray, LargeStringArray, + RecordBatch, StringArray, StructArray, + }; + use arrow::buffer::NullBuffer; + use arrow_schema::{DataType, Field, Schema}; + use sha2::{Digest as _, Sha256}; + use starfix::ArrowDigester; + + const VERSION: [u8; 3] = [0x00, 0x00, 0x01]; + + // ── Helper ─────────────────────────────────────────────────────────── + + /// Prepend the 3-byte version prefix to a 32-byte SHA-256 digest, + /// returning the full 35-byte Starfix hash. + fn with_version(digest: Vec<u8>) -> Vec<u8> { + let mut out = VERSION.to_vec(); + out.extend(digest); + out + } + + // ══════════════════════════════════════════════════════════════════════ + // Example A: Simple Two-Column Table (record batch) + // Schema: {age: Int32 non-nullable, name: LargeUtf8 nullable} + // Row 0: age=25, name="Alice" + // Row 1: age=30, name=NULL + // ══════════════════════════════════════════════════════════════════════ + + #[test] + fn example_a_two_column_table() { + // ── Build the table ────────────────────────────────────────────── + let schema = Schema::new(vec![ + Field::new("age", DataType::Int32, false), + Field::new("name", DataType::LargeUtf8, true), + ]); + let batch = RecordBatch::try_new( + Arc::new(schema.clone()), + vec![ + Arc::new(Int32Array::from(vec![25_i32, 30])) as ArrayRef, + Arc::new(LargeStringArray::from(vec![Some("Alice"), None])) as ArrayRef, + ], + ) + .unwrap(); + + // ── Step 1: Schema digest ──────────────────────────────────────── + let schema_json = r#"{"age":{"data_type":"Int32","nullable":false},"name":{"data_type":"LargeUtf8","nullable":true}}"#; + let schema_digest = Sha256::digest(schema_json.as_bytes()); 
+ + // Verify the library agrees on schema hash + assert_eq!( + ArrowDigester::hash_schema(&schema), + with_version(schema_digest.to_vec()), + "Schema hash mismatch — canonical JSON may differ" + ); + + // ── Step 2: Field "age" (Int32, non-nullable) ──────────────────── + // Values: [25, 30] → little-endian bytes + let mut age_data = Sha256::new(); + age_data.update(25_i32.to_le_bytes()); // 19 00 00 00 + age_data.update(30_i32.to_le_bytes()); // 1e 00 00 00 + let age_data_finalized = age_data.finalize(); + + // ── Step 3: Field "name" (LargeUtf8, nullable) ─────────────────── + // Values: ["Alice", NULL] + // + // Validity BitVec (Lsb0, u8 storage): + // bit 0 = 1 (valid), bit 1 = 0 (null) + // → u8 word = 0b01 = 1 + // bit_count = 2 + let bit_count: u64 = 2; + let validity_word: u8 = 1; // bits: [1, 0] in Lsb0 + + // Data bytes (only valid elements): + // "Alice" → len=5 as u64 LE, then UTF-8 bytes + // NULL → skipped + let mut name_data = Sha256::new(); + name_data.update(5_u64.to_le_bytes()); // length prefix + name_data.update(b"Alice"); // raw UTF-8 bytes + // NULL element: nothing fed + let name_data_finalized = name_data.finalize(); + + // ── Step 4: Final combination ──────────────────────────────────── + // Fields in alphabetical order: "age", "name" + let mut final_digest = Sha256::new(); + + // Schema + final_digest.update(schema_digest); + + // Field "age" (non-nullable → just the data digest) + final_digest.update(age_data_finalized); + + // Field "name" (nullable → bit_count + validity words + data digest) + final_digest.update(bit_count.to_le_bytes()); // 02 00 00 00 00 00 00 00 + final_digest.update(validity_word.to_le_bytes()); // 01 + final_digest.update(name_data_finalized); + + let expected = with_version(final_digest.finalize().to_vec()); + + // ── Verify ─────────────────────────────────────────────────────── + assert_eq!( + ArrowDigester::hash_record_batch(&batch), + expected, + "Example A: two-column table hash mismatch" + ); + } + + // 
══════════════════════════════════════════════════════════════════════ + // Example B: Boolean Array with Nulls (hash_array API) + // BooleanArray [true, NULL, false, true] (nullable) + // ══════════════════════════════════════════════════════════════════════ + + #[test] + fn example_b_boolean_array_with_nulls() { + let array = BooleanArray::from(vec![Some(true), None, Some(false), Some(true)]); + + // ── Type metadata ──────────────────────────────────────────────── + // data_type_to_value(Boolean) → JSON value "Boolean" + // serde_json::to_string(json!("Boolean")) → "\"Boolean\"" + let type_json = b"\"Boolean\""; + + // ── Validity bits (Lsb0, u8 storage) ────────────────────────── + // [valid, null, valid, valid] → bits [1, 0, 1, 1] + // Lsb0 in u8: bit0=1, bit1=0, bit2=1, bit3=1 → 0b1101 = 13 + let bit_count: u64 = 4; + let validity_word: u8 = 0b1101; // = 13 + + // ── Data bits (Lsb0 packed, valid values only) ─────────────────── + // Valid values: [true, false, true] → 3 bits + // Lsb0: bit0=1(true), bit1=0(false), bit2=1(true) → 0b101 = 0x05 + let mut data_digest = Sha256::new(); + data_digest.update([0x05_u8]); + let data_finalized = data_digest.finalize(); + + // ── Final combination ──────────────────────────────────────────── + let mut final_digest = Sha256::new(); + final_digest.update(type_json); + // Nullable finalization + final_digest.update(bit_count.to_le_bytes()); + final_digest.update(validity_word.to_le_bytes()); + final_digest.update(data_finalized); + + let expected = with_version(final_digest.finalize().to_vec()); + + assert_eq!( + ArrowDigester::hash_array(&array), + expected, + "Example B: boolean array hash mismatch" + ); + } + + // ══════════════════════════════════════════════════════════════════════ + // Example C: Non-Nullable Int32 Array (hash_array API) + // Int32Array [1, 2, 3] (non-nullable) + // ══════════════════════════════════════════════════════════════════════ + + #[test] + fn example_c_non_nullable_int32_array() { + let 
array = Int32Array::from(vec![1_i32, 2, 3]); + + // ── Type metadata ──────────────────────────────────────────────── + let type_json = b"\"Int32\""; + + // ── Data (contiguous LE buffer) ────────────────────────────────── + // [1, 2, 3] as i32 LE: + // 01 00 00 00 02 00 00 00 03 00 00 00 + let mut data_digest = Sha256::new(); + data_digest.update(1_i32.to_le_bytes()); + data_digest.update(2_i32.to_le_bytes()); + data_digest.update(3_i32.to_le_bytes()); + let data_finalized = data_digest.finalize(); + + // ── Final (non-nullable) ───────────────────────────────────────── + let mut final_digest = Sha256::new(); + final_digest.update(type_json); + final_digest.update(data_finalized); + + let expected = with_version(final_digest.finalize().to_vec()); + + assert_eq!( + ArrowDigester::hash_array(&array), + expected, + "Example C: non-nullable int32 array hash mismatch" + ); + } + + // ══════════════════════════════════════════════════════════════════════ + // Example D: Non-Nullable Binary Array (hash_array API) + // BinaryArray [b"hi", b""] (non-nullable) + // Tests type canonicalization: Binary → LargeBinary + // ══════════════════════════════════════════════════════════════════════ + + #[test] + fn example_d_non_nullable_binary_array() { + let array = BinaryArray::from(vec![b"hi".as_ref(), b"".as_ref()]); + + // ── Type metadata (canonicalized) ──────────────────────────────── + // Binary → LargeBinary in canonical form + let type_json = b"\"LargeBinary\""; + + // ── Data ───────────────────────────────────────────────────────── + // b"hi": len=2 as u64 LE + raw bytes + // b"": len=0 as u64 LE + (no bytes) + let mut data_digest = Sha256::new(); + data_digest.update(2_u64.to_le_bytes()); // 02 00 00 00 00 00 00 00 + data_digest.update(b"hi"); // 68 69 + data_digest.update(0_u64.to_le_bytes()); // 00 00 00 00 00 00 00 00 + let data_finalized = data_digest.finalize(); + + // ── Final (non-nullable) ───────────────────────────────────────── + let mut final_digest = 
Sha256::new(); + final_digest.update(type_json); + final_digest.update(data_finalized); + + let expected = with_version(final_digest.finalize().to_vec()); + + assert_eq!( + ArrowDigester::hash_array(&array), + expected, + "Example D: non-nullable binary array hash mismatch" + ); + } + + // ══════════════════════════════════════════════════════════════════════ + // Example E: Column-Order Independence + // Batch 1: columns [x: Int32, y: Boolean nullable] → x=10, y=true + // Batch 2: columns [y: Boolean nullable, x: Int32] → y=true, x=10 + // Both must produce the same hash. + // ══════════════════════════════════════════════════════════════════════ + + #[test] + fn example_e_column_order_independence() { + let ints = Arc::new(Int32Array::from(vec![10_i32])) as ArrayRef; + let bools = Arc::new(BooleanArray::from(vec![Some(true)])) as ArrayRef; + + let batch_xy = RecordBatch::try_new( + Arc::new(Schema::new(vec![ + Field::new("x", DataType::Int32, false), + Field::new("y", DataType::Boolean, true), + ])), + vec![Arc::clone(&ints), Arc::clone(&bools)], + ) + .unwrap(); + + let batch_yx = RecordBatch::try_new( + Arc::new(Schema::new(vec![ + Field::new("y", DataType::Boolean, true), + Field::new("x", DataType::Int32, false), + ])), + vec![Arc::clone(&bools), Arc::clone(&ints)], + ) + .unwrap(); + + // ── Manual computation ─────────────────────────────────────────── + let schema_json = r#"{"x":{"data_type":"Int32","nullable":false},"y":{"data_type":"Boolean","nullable":true}}"#; + let schema_digest = Sha256::digest(schema_json.as_bytes()); + + // Field "x" (Int32, non-nullable): value 10 + let mut x_data = Sha256::new(); + x_data.update(10_i32.to_le_bytes()); // 0a 00 00 00 + let x_finalized = x_data.finalize(); + + // Field "y" (Boolean, nullable): value true (valid) + // Validity: [1] → bit_count=1, word=1 (Lsb0) + // Data: [true] Lsb0 → bit0=1 → 0x01 + let bit_count: u64 = 1; + let validity_word: u8 = 1; + + let mut y_data = Sha256::new(); + y_data.update([0x01_u8]); 
// true in Lsb0 = 0000_0001 + let y_finalized = y_data.finalize(); + + // Final combination: schema, then fields alphabetically (x, y) + let mut final_digest = Sha256::new(); + final_digest.update(schema_digest); + // x (non-nullable) + final_digest.update(x_finalized); + // y (nullable) + final_digest.update(bit_count.to_le_bytes()); + final_digest.update(validity_word.to_le_bytes()); + final_digest.update(y_finalized); + + let expected = with_version(final_digest.finalize().to_vec()); + + // ── Verify both column orderings produce the same hash ─────────── + let hash_xy = ArrowDigester::hash_record_batch(&batch_xy); + let hash_yx = ArrowDigester::hash_record_batch(&batch_yx); + + assert_eq!(hash_xy, hash_yx, "Column order should not affect hash"); + assert_eq!( + hash_xy, expected, + "Example E: column-order independence hash mismatch" + ); + } + + // ══════════════════════════════════════════════════════════════════════ + // Example F: Type Equivalence (Utf8 vs LargeUtf8, hash_array API) + // StringArray ["ab"] (Utf8, non-nullable) + // LargeStringArray ["ab"] (LargeUtf8, non-nullable) + // Both must produce the same hash. 
+ // ══════════════════════════════════════════════════════════════════════ + + #[test] + fn example_f_utf8_large_utf8_equivalence() { + let small = StringArray::from(vec!["ab"]); + let large = LargeStringArray::from(vec!["ab"]); + + // ── Manual computation ─────────────────────────────────────────── + // Type metadata: both canonicalize to "LargeUtf8" + let type_json = b"\"LargeUtf8\""; + + // Data: "ab" → len=2 as u64 LE + UTF-8 bytes + let mut data_digest = Sha256::new(); + data_digest.update(2_u64.to_le_bytes()); + data_digest.update(b"ab"); + let data_finalized = data_digest.finalize(); + + let mut final_digest = Sha256::new(); + final_digest.update(type_json); + final_digest.update(data_finalized); + + let expected = with_version(final_digest.finalize().to_vec()); + + assert_eq!( + ArrowDigester::hash_array(&small), + expected, + "Example F: Utf8 hash mismatch" + ); + assert_eq!( + ArrowDigester::hash_array(&large), + expected, + "Example F: LargeUtf8 hash mismatch" + ); + } + + // ══════════════════════════════════════════════════════════════════════ + // Example G: Nullable Int32 Array with Nulls (hash_array API) + // Int32Array [Some(42), None, Some(-7), Some(0)] + // Tests nullable fixed-size path with actual nulls. 
+ // ══════════════════════════════════════════════════════════════════════ + + #[test] + fn example_g_nullable_int32_with_nulls() { + let array = Int32Array::from(vec![Some(42), None, Some(-7), Some(0)]); + + // ── Type metadata ──────────────────────────────────────────────── + let type_json = b"\"Int32\""; + + // ── Validity bits (Lsb0, u8) ────────────────────────────────── + // [valid, null, valid, valid] → bits [1, 0, 1, 1] → 0b1101 = 13 + let bit_count: u64 = 4; + let validity_word: u8 = 0b1101; // 13 + + // ── Data (only valid elements, in order) ───────────────────────── + // 42 as i32 LE: 2a 00 00 00 + // -7 as i32 LE: f9 ff ff ff + // 0 as i32 LE: 00 00 00 00 + let mut data_digest = Sha256::new(); + data_digest.update(42_i32.to_le_bytes()); + data_digest.update((-7_i32).to_le_bytes()); + data_digest.update(0_i32.to_le_bytes()); + let data_finalized = data_digest.finalize(); + + // ── Final (nullable) ───────────────────────────────────────────── + let mut final_digest = Sha256::new(); + final_digest.update(type_json); + final_digest.update(bit_count.to_le_bytes()); + final_digest.update(validity_word.to_le_bytes()); + final_digest.update(data_finalized); + + let expected = with_version(final_digest.finalize().to_vec()); + + assert_eq!( + ArrowDigester::hash_array(&array), + expected, + "Example G: nullable int32 array hash mismatch" + ); + } + + // ══════════════════════════════════════════════════════════════════════ + // Example H: Nullable String Array with Nulls (hash_array API) + // StringArray [Some("hello"), None, Some("world"), Some("")] + // Tests nullable variable-length path with type canonicalization. 
+ // ══════════════════════════════════════════════════════════════════════ + + #[test] + fn example_h_nullable_string_array_with_nulls() { + let array = StringArray::from(vec![Some("hello"), None, Some("world"), Some("")]); + + // ── Type metadata (canonicalized) ──────────────────────────────── + // Utf8 → LargeUtf8 + let type_json = b"\"LargeUtf8\""; + + // ── Validity bits (Lsb0, u8) ────────────────────────────────── + // [valid, null, valid, valid] → bits [1, 0, 1, 1] → 0b1101 = 13 + let bit_count: u64 = 4; + let validity_word: u8 = 0b1101; + + // ── Data (only valid elements) ─────────────────────────────────── + // "hello" → len=5 u64 LE + "hello" + // "world" → len=5 u64 LE + "world" + // "" → len=0 u64 LE + let mut data_digest = Sha256::new(); + data_digest.update(5_u64.to_le_bytes()); + data_digest.update(b"hello"); + // NULL: skipped + data_digest.update(5_u64.to_le_bytes()); + data_digest.update(b"world"); + data_digest.update(0_u64.to_le_bytes()); + let data_finalized = data_digest.finalize(); + + // ── Final (nullable) ───────────────────────────────────────────── + let mut final_digest = Sha256::new(); + final_digest.update(type_json); + final_digest.update(bit_count.to_le_bytes()); + final_digest.update(validity_word.to_le_bytes()); + final_digest.update(data_finalized); + + let expected = with_version(final_digest.finalize().to_vec()); + + assert_eq!( + ArrowDigester::hash_array(&array), + expected, + "Example H: nullable string array hash mismatch" + ); + } + + // ══════════════════════════════════════════════════════════════════════ + // Example I: Empty Table (schema only, no data) + // Tests that finalize() on a fresh digester with no update() calls + // produces schema_digest + empty field digests. 
+ // ══════════════════════════════════════════════════════════════════════ + + #[test] + fn example_i_empty_table() { + let schema = Schema::new(vec![ + Field::new("a", DataType::Int32, false), + Field::new("b", DataType::Boolean, true), + ]); + + // ── Schema digest ──────────────────────────────────────────────── + let schema_json = r#"{"a":{"data_type":"Int32","nullable":false},"b":{"data_type":"Boolean","nullable":true}}"#; + let schema_digest = Sha256::digest(schema_json.as_bytes()); + + // ── Field "a" (Int32, non-nullable): no data fed ───────────────── + // data_digest = SHA-256() with no updates → SHA-256 of empty input + let a_data_finalized = Sha256::digest(b""); + + // ── Field "b" (Boolean, nullable): no data fed ─────────────────── + // bit_count = 0 (no elements) + // as_raw_slice() = [] (no words) + // data_digest = SHA-256 of empty input + let bit_count: u64 = 0; + let b_data_finalized = Sha256::digest(b""); + + // ── Final ──────────────────────────────────────────────────────── + let mut final_digest = Sha256::new(); + final_digest.update(schema_digest); + // Field "a" (non-nullable) + final_digest.update(a_data_finalized); + // Field "b" (nullable) — bit_count=0, no words, empty data digest + final_digest.update(bit_count.to_le_bytes()); + // no validity words (raw_slice is empty for 0-length BitVec) + final_digest.update(b_data_finalized); + + let expected = with_version(final_digest.finalize().to_vec()); + + let digester = ArrowDigester::new(&schema); + assert_eq!( + digester.finalize(), + expected, + "Example I: empty table hash mismatch" + ); + } + + // ══════════════════════════════════════════════════════════════════════ + // Example J: Multi-Batch Streaming + // Feeding two small batches must produce the same hash as feeding + // one combined batch (batch-split independence). 
+ // Schema: {v: Int32 non-nullable} + // Batch 1: [1, 2] + // Batch 2: [3] + // Combined: [1, 2, 3] + // ══════════════════════════════════════════════════════════════════════ + + #[test] + fn example_j_multi_batch_streaming() { + let schema = Schema::new(vec![Field::new("v", DataType::Int32, false)]); + + // ── Two-batch path ─────────────────────────────────────────────── + let batch1 = RecordBatch::try_new( + Arc::new(schema.clone()), + vec![Arc::new(Int32Array::from(vec![1_i32, 2])) as ArrayRef], + ) + .unwrap(); + let batch2 = RecordBatch::try_new( + Arc::new(schema.clone()), + vec![Arc::new(Int32Array::from(vec![3_i32])) as ArrayRef], + ) + .unwrap(); + + let mut digester_stream = ArrowDigester::new(&schema); + digester_stream.update(&batch1); + digester_stream.update(&batch2); + let hash_stream = digester_stream.finalize(); + + // ── Single-batch path ──────────────────────────────────────────── + let combined = RecordBatch::try_new( + Arc::new(schema), + vec![Arc::new(Int32Array::from(vec![1_i32, 2, 3])) as ArrayRef], + ) + .unwrap(); + let hash_combined = ArrowDigester::hash_record_batch(&combined); + + assert_eq!( + hash_stream, hash_combined, + "Streaming two batches should equal single combined batch" + ); + + // ── Manual computation ─────────────────────────────────────────── + let schema_json = r#"{"v":{"data_type":"Int32","nullable":false}}"#; + let schema_digest = Sha256::digest(schema_json.as_bytes()); + + // Field "v": data is [1, 2, 3] as i32 LE — accumulated across batches + // The digester is streaming, so it updates the same SHA-256 state: + // update(01 00 00 00 02 00 00 00) from batch 1 + // update(03 00 00 00) from batch 2 + // SHA-256 is incremental, so this is identical to hashing all 12 bytes at once. 
+ let mut v_data = Sha256::new(); + v_data.update(1_i32.to_le_bytes()); + v_data.update(2_i32.to_le_bytes()); + v_data.update(3_i32.to_le_bytes()); + let v_finalized = v_data.finalize(); + + let mut final_digest = Sha256::new(); + final_digest.update(schema_digest); + final_digest.update(v_finalized); + + let expected = with_version(final_digest.finalize().to_vec()); + + assert_eq!( + hash_stream, expected, + "Example J: multi-batch streaming hash mismatch" + ); + } + + // ══════════════════════════════════════════════════════════════════════ + // Example K: Struct Column in a Record Batch + // Schema: {person: Struct<age: Int32, name: LargeUtf8> non-nullable} + // Row 0: {age: 25, name: "Alice"} + // Row 1: {age: 30, name: "Bob"} + // + // In the record-batch path, struct fields are decomposed into leaf + // fields: "person/age" and "person/name", each hashed independently. + // ══════════════════════════════════════════════════════════════════════ + + #[test] + fn example_k_struct_column_in_record_batch() { + // ── Build the table ────────────────────────────────────────────── + let age = Arc::new(Int32Array::from(vec![25_i32, 30])) as ArrayRef; + let name = Arc::new(LargeStringArray::from(vec!["Alice", "Bob"])) as ArrayRef; + let struct_array = StructArray::from(vec![ + ( + Arc::new(Field::new("age", DataType::Int32, false)), + Arc::clone(&age), + ), + ( + Arc::new(Field::new("name", DataType::LargeUtf8, false)), + Arc::clone(&name), + ), + ]); + + let schema = Schema::new(vec![Field::new( + "person", + DataType::Struct( + vec![ + Field::new("age", DataType::Int32, false), + Field::new("name", DataType::LargeUtf8, false), + ] + .into(), + ), + false, + )]); + let batch = RecordBatch::try_new( + Arc::new(schema.clone()), + vec![Arc::new(struct_array) as ArrayRef], + ) + .unwrap(); + + // ── Step 1: Schema digest ──────────────────────────────────────── + // Canonical JSON: struct fields sorted by name, keys sorted recursively + // "person" has data_type: {"Struct": [{"data_type": "Int32", 
"name": "age", "nullable": false}, + // {"data_type": "LargeUtf8", "name": "name", "nullable": false}]} + let schema_json = r#"{"person":{"data_type":{"Struct":[{"data_type":"Int32","name":"age","nullable":false},{"data_type":"LargeUtf8","name":"name","nullable":false}]},"nullable":false}}"#; + let schema_digest = Sha256::digest(schema_json.as_bytes()); + + assert_eq!( + ArrowDigester::hash_schema(&schema), + with_version(schema_digest.to_vec()), + "Example K: schema hash mismatch" + ); + + // ── Step 2: Leaf field "person/age" (Int32, non-nullable) ──────── + // Values: [25, 30] as i32 LE + let mut age_data = Sha256::new(); + age_data.update(25_i32.to_le_bytes()); + age_data.update(30_i32.to_le_bytes()); + let age_data_finalized = age_data.finalize(); + + // ── Step 3: Leaf field "person/name" (LargeUtf8, non-nullable) ─── + // Values: ["Alice", "Bob"] + let mut name_data = Sha256::new(); + name_data.update(5_u64.to_le_bytes()); // "Alice" length + name_data.update(b"Alice"); + name_data.update(3_u64.to_le_bytes()); // "Bob" length + name_data.update(b"Bob"); + let name_data_finalized = name_data.finalize(); + + // ── Step 4: Final combination ──────────────────────────────────── + // Fields alphabetically: "person/age", "person/name" + let mut final_digest = Sha256::new(); + final_digest.update(schema_digest); + // "person/age" (non-nullable): just data digest + final_digest.update(age_data_finalized); + // "person/name" (non-nullable): just data digest + final_digest.update(name_data_finalized); + + let expected = with_version(final_digest.finalize().to_vec()); + + assert_eq!( + ArrowDigester::hash_record_batch(&batch), + expected, + "Example K: struct column record batch hash mismatch" + ); + } + + // ══════════════════════════════════════════════════════════════════════ + // Example L: Struct Array via hash_array (non-nullable struct) + // StructArray [{a: 1, b: true}, {a: 2, b: false}] + // Children: a: Int32 non-null, b: Boolean non-null + // + // In 
hash_array, the struct is hashed compositely: + // type_json + data where data = finalized(child_a) || finalized(child_b) + // ══════════════════════════════════════════════════════════════════════ + + #[test] + fn example_l_struct_array_hash_array() { + let a = Arc::new(Int32Array::from(vec![1_i32, 2])) as ArrayRef; + let b = Arc::new(BooleanArray::from(vec![true, false])) as ArrayRef; + let struct_array = StructArray::from(vec![ + ( + Arc::new(Field::new("a", DataType::Int32, false)), + Arc::clone(&a), + ), + ( + Arc::new(Field::new("b", DataType::Boolean, false)), + Arc::clone(&b), + ), + ]); + + // ── Type metadata ──────────────────────────────────────────────── + // Canonical: {"Struct":[{"data_type":"Int32","name":"a","nullable":false}, + // {"data_type":"Boolean","name":"b","nullable":false}]} + let type_json = r#"{"Struct":[{"data_type":"Int32","name":"a","nullable":false},{"data_type":"Boolean","name":"b","nullable":false}]}"#; + + // ── Decomposition ──────────────────────────────────────────────── + // Struct is transparent: no BTreeMap entry for the struct itself. + // Children become separate entries, finalized directly into the + // final digest (no parent_data wrapper). 
+ // + // BTreeMap entries (sorted by key): "a", "b" + + // ── Entry "a" (Int32, non-nullable) ────────────────────────────── + // data = SHA256(1_i32_le, 2_i32_le) + let mut data_a = Sha256::new(); + data_a.update(1_i32.to_le_bytes()); + data_a.update(2_i32.to_le_bytes()); + + // ── Entry "b" (Boolean, non-nullable) ──────────────────────────── + // Values: [true, false] → Lsb0: bit0=1(true), bit1=0(false) → 0x01 + let mut data_b = Sha256::new(); + data_b.update([0x01_u8]); + + // ── Final combination ──────────────────────────────────────────── + // type_json → finalize_digest("a") → finalize_digest("b") + // Each entry: non-nullable → no null_bits, no structural, just data.finalize() + let mut final_digest = Sha256::new(); + final_digest.update(type_json.as_bytes()); + final_digest.update(data_a.finalize()); + final_digest.update(data_b.finalize()); + + let expected = with_version(final_digest.finalize().to_vec()); + + assert_eq!( + ArrowDigester::hash_array(&struct_array), + expected, + "Example L: struct array hash_array mismatch" + ); + } + + // ══════════════════════════════════════════════════════════════════════ + // Example M: Nullable Struct Array via hash_array (struct-level nulls) + // StructArray [Some({a: 10, b: "x"}), None, Some({a: 30, b: "z"})] + // Struct is nullable. Children: a: Int32 non-null, b: LargeUtf8 non-null + // + // Struct-level nulls propagate to children: at row 1 (null struct), + // children's data is undefined and must be skipped. 
+ // ══════════════════════════════════════════════════════════════════════ + + #[test] + fn example_m_nullable_struct_array_hash_array() { + // Build a nullable struct array with a null at row 1 + let a = Int32Array::from(vec![10_i32, 0, 30]); // row 1 value is undefined (0 placeholder) + let b = LargeStringArray::from(vec!["x", "", "z"]); // row 1 value is undefined + let struct_array = StructArray::from(( + vec![ + ( + Arc::new(Field::new("a", DataType::Int32, false)), + Arc::new(a) as ArrayRef, + ), + ( + Arc::new(Field::new("b", DataType::LargeUtf8, false)), + Arc::new(b) as ArrayRef, + ), + ], + // Struct-level validity: [valid, null, valid] + NullBuffer::from(vec![true, false, true]) + .into_inner() + .into_inner(), + )); + + // ── Type metadata ──────────────────────────────────────────────── + let type_json = r#"{"Struct":[{"data_type":"Int32","name":"a","nullable":false},{"data_type":"LargeUtf8","name":"b","nullable":false}]}"#; + + // ── Decomposition ──────────────────────────────────────────────── + // Struct is transparent: no BTreeMap entry. Struct-level nulls + // [1, 0, 1] are AND-propagated to children for data hashing. + // Children "a" and "b" are non-nullable per their Field definitions, + // so their entries have no null_bits — but null rows are skipped + // in the data stream. 
+ // + // BTreeMap entries (sorted by key): "a", "b" + + // ── Entry "a" (Int32, non-nullable) ────────────────────────────── + // Struct nulls propagated: rows 0,2 valid → data = [10, 30] + let mut data_a = Sha256::new(); + data_a.update(10_i32.to_le_bytes()); + // row 1: skipped (struct null) + data_a.update(30_i32.to_le_bytes()); + + // ── Entry "b" (LargeUtf8, non-nullable) ───────────────────────── + // Struct nulls propagated: rows 0,2 valid → data = ["x", "z"] + let mut data_b = Sha256::new(); + data_b.update(1_u64.to_le_bytes()); // "x" len + data_b.update(b"x"); + // row 1: skipped (struct null) + data_b.update(1_u64.to_le_bytes()); // "z" len + data_b.update(b"z"); + + // ── Final combination ──────────────────────────────────────────── + // type_json → finalize_digest("a") → finalize_digest("b") + // Each entry: non-nullable → no null_bits, no structural, just data.finalize() + let mut final_digest = Sha256::new(); + final_digest.update(type_json.as_bytes()); + final_digest.update(data_a.finalize()); + final_digest.update(data_b.finalize()); + + let expected = with_version(final_digest.finalize().to_vec()); + + assert_eq!( + ArrowDigester::hash_array(&struct_array), + expected, + "Example M: nullable struct array hash_array mismatch" + ); + } + + // ══════════════════════════════════════════════════════════════════════ + // Example N: List-of-Struct in a Record Batch + // Schema: {items: LargeList<Struct<id: Int32, label: LargeUtf8>> nullable} + // Row 0: [{id: 1, label: "a"}, {id: 2, label: "b"}] (2 elements) + // Row 1: [{id: 3, label: "c"}] (1 element) + // + // Recursively decomposed into separate BTreeMap entries: + // "items" → validity-only (null_bits: [V, V]) + // "items/" → structural-only (list lengths: [2, 1]) + // "items//id" → data-only ([1, 2, 3] as i32 LE) + // "items//label" → data-only (["a", "b", "c"] as LargeUtf8) + // ══════════════════════════════════════════════════════════════════════ + + #[test] + fn example_n_list_of_struct_record_batch() { + // ── Build the table 
────────────────────────────────────────────── + let struct_fields = vec![ + Field::new("id", DataType::Int32, false), + Field::new("label", DataType::LargeUtf8, false), + ]; + let inner_struct_field = Field::new( + "item", + DataType::Struct(struct_fields.clone().into()), + false, + ); + let list_field = Field::new( + "items", + DataType::LargeList(Arc::new(inner_struct_field.clone())), + true, + ); + let schema = Schema::new(vec![list_field.clone()]); + + // Build struct sub-arrays + // Row 0: [{id:1, label:"a"}, {id:2, label:"b"}], Row 1: [{id:3, label:"c"}] + // Total struct rows: 3 (ids: [1,2,3], labels: ["a","b","c"]) + let ids = Int32Array::from(vec![1_i32, 2, 3]); + let labels = LargeStringArray::from(vec!["a", "b", "c"]); + let struct_array = StructArray::from(vec![ + ( + Arc::new(Field::new("id", DataType::Int32, false)), + Arc::new(ids) as ArrayRef, + ), + ( + Arc::new(Field::new("label", DataType::LargeUtf8, false)), + Arc::new(labels) as ArrayRef, + ), + ]); + + // Build large list array with offsets [0, 2, 3] + let list_array = LargeListArray::new( + Arc::new(inner_struct_field), + arrow::buffer::OffsetBuffer::new(vec![0_i64, 2, 3].into()), + Arc::new(struct_array) as ArrayRef, + None, // all list elements valid + ); + + let batch = RecordBatch::try_new( + Arc::new(schema.clone()), + vec![Arc::new(list_array) as ArrayRef], + ) + .unwrap(); + + // ── Step 1: Schema digest ──────────────────────────────────────── + // Canonical: element type has no name (element_type_to_value drops "item") + // The inner struct's data_type is {"Struct": [sorted children]} + let schema_json = r#"{"items":{"data_type":{"LargeList":{"data_type":{"Struct":[{"data_type":"Int32","name":"id","nullable":false},{"data_type":"LargeUtf8","name":"label","nullable":false}]},"nullable":false}},"nullable":true}}"#; + let schema_digest = Sha256::digest(schema_json.as_bytes()); + + assert_eq!( + ArrowDigester::hash_schema(&schema), + with_version(schema_digest.to_vec()), + "Example N: 
schema hash mismatch" + ); + + // ── Step 2: Recursive decomposition ────────────────────────────── + // + // With recursive list/struct decomposition, entries are (sorted): + // "items" → validity-only: null_bits [V, V] (2 bits, both valid) + // "items/" → structural-only: list lengths [2, 1] + // "items//id" → data-only: [1, 2, 3] as i32 LE + // "items//label" → data-only: ["a", "b", "c"] as LargeUtf8 + + // ── Step 3: Final combination ──────────────────────────────────── + let mut final_digest = Sha256::new(); + final_digest.update(schema_digest); + + // Entry "items": null_bits V,V → bit_count=2, validity=0b11=3 + final_digest.update(2_u64.to_le_bytes()); + final_digest.update(3_u8.to_le_bytes()); + + // Entry "items/": structural [2, 1] + let mut items_structural = Sha256::new(); + items_structural.update(2_u64.to_le_bytes()); + items_structural.update(1_u64.to_le_bytes()); + final_digest.update(items_structural.finalize()); + + // Entry "items//id": data [1, 2, 3] as i32 LE + let mut id_data = Sha256::new(); + id_data.update(1_i32.to_le_bytes()); + id_data.update(2_i32.to_le_bytes()); + id_data.update(3_i32.to_le_bytes()); + final_digest.update(id_data.finalize()); + + // Entry "items//label": data ["a", "b", "c"] as LargeUtf8 + let mut label_data = Sha256::new(); + label_data.update(1_u64.to_le_bytes()); + label_data.update(b"a"); + label_data.update(1_u64.to_le_bytes()); + label_data.update(b"b"); + label_data.update(1_u64.to_le_bytes()); + label_data.update(b"c"); + final_digest.update(label_data.finalize()); + + let expected = with_version(final_digest.finalize().to_vec()); + + assert_eq!( + ArrowDigester::hash_record_batch(&batch), + expected, + "Example N: list-of-struct record batch hash mismatch" + ); + } +} diff --git a/tests/golden_files/schema_serialization_pretty.json b/tests/golden_files/schema_serialization_pretty.json index 70cb27d..f2ec2db 100644 --- a/tests/golden_files/schema_serialization_pretty.json +++ 
b/tests/golden_files/schema_serialization_pretty.json @@ -1,6 +1,6 @@ { "binary_name": { - "data_type": "Binary", + "data_type": "LargeBinary", "nullable": true }, "bool_name": { @@ -45,19 +45,9 @@ "doubly_nested_struct_name": { "data_type": { "Struct": [ - { - "data_type": "Int32", - "name": "outer_field", - "nullable": false - }, { "data_type": { "Struct": [ - { - "data_type": "Utf8", - "name": "middle_field", - "nullable": true - }, { "data_type": { "Struct": [ @@ -75,11 +65,21 @@ }, "name": "inner", "nullable": false + }, + { + "data_type": "LargeUtf8", + "name": "middle_field", + "nullable": true } ] }, "name": "middle", "nullable": false + }, + { + "data_type": "Int32", + "name": "outer_field", + "nullable": false } ] }, @@ -117,7 +117,6 @@ "data_type": { "LargeList": { "data_type": "Int32", - "name": "item", "nullable": true } }, @@ -129,9 +128,8 @@ }, "list_name": { "data_type": { - "List": { + "LargeList": { "data_type": "Int32", - "name": "item", "nullable": true } }, @@ -146,7 +144,7 @@ "nullable": false }, { - "data_type": "Utf8", + "data_type": "LargeUtf8", "name": "struct_field2", "nullable": true } @@ -195,7 +193,7 @@ "nullable": false }, "utf8_name": { - "data_type": "Utf8", + "data_type": "LargeUtf8", "nullable": true } }