Skip to content

test: recursive struct ordering — order-independence + byte-exact conformance#11

Merged
eywalker merged 3 commits intomainfrom
eywalker/plt-840-change-schema-hashing-to-be-recursively-ordered
Mar 20, 2026
Merged

test: recursive struct ordering — order-independence + byte-exact conformance#11
eywalker merged 3 commits intomainfrom
eywalker/plt-840-change-schema-hashing-to-be-recursively-ordered

Conversation

@kurodo3
Copy link
Copy Markdown
Contributor

@kurodo3 kurodo3 Bot commented Mar 20, 2026

Summary

Closes PLT-840: Change schema hashing to be recursively ordered.

The existing implementation already performs recursive alphabetical ordering of struct fields at every nesting level (via data_type_to_value for schema hashing and BTreeMap-keyed paths for data hashing). This PR adds the missing test coverage to document, verify, and lock in that behaviour across all relevant nesting shapes.

New tests — order independence (tests/arrow_digester.rs)

Test What it checks
deeply_nested_struct_schema_hash_is_field_order_independent 3-level deep Struct<Struct<fields>> — inner-field insertion order must not affect hash_schema
deeply_nested_struct_record_batch_hash_is_field_order_independent Same for record-batch data hashing
hash_array_nested_struct_is_field_order_independent hash_array API with a struct containing a nested struct
list_of_struct_schema_hash_is_field_order_independent List<Struct<fields>> schema hash independent of inner struct field order
list_of_struct_record_batch_hash_is_field_order_independent Same for actual record-batch data

New tests — byte-exact conformance (tests/digest_bytes.rs)

Example What it checks
O: Nested struct record batch Pins the canonical JSON and exact leaf paths (s/a, s/nested/p, s/nested/q) for a two-level struct column; verifies each leaf is hashed with the correct little-endian bytes
P: Nested struct field-order independence Pins the expected canonical JSON for a schema whose struct fields are defined out of alphabetical order at both nesting levels; asserts both orderings produce that exact hash

Test plan

  • All 60 tests pass (cargo test)
  • cargo fmt applied, no formatting changes
  • New tests fail if recursive ordering is removed (confirmed by tracing the code path)

Closes PLT-840

kurodo3 Bot and others added 3 commits March 20, 2026 05:16
…examples

Adds comprehensive tests verifying that schema hashing and data hashing
are fully recursive and field-order independent for nested structures.

Order-independence tests (arrow_digester.rs):
- deeply_nested_struct_schema_hash_is_field_order_independent: 3-level
  struct schema hash unchanged regardless of inner-field insertion order
- deeply_nested_struct_record_batch_hash_is_field_order_independent:
  same guarantee for record-batch data hashing
- hash_array_nested_struct_is_field_order_independent: hash_array API
  with a struct containing a nested struct
- list_of_struct_schema_hash_is_field_order_independent: List<Struct>
  schema hash is independent of inner struct field order
- list_of_struct_record_batch_hash_is_field_order_independent: same for
  actual record-batch data

Byte-exact conformance tests (digest_bytes.rs):
- example_o_nested_struct_record_batch: pins the exact canonical JSON
  and leaf-path byte layout for a two-level struct column, confirming
  paths "s/a", "s/nested/p", "s/nested/q" with correct i32 LE bytes
- example_p_nested_struct_field_order_independence: pins the expected
  canonical JSON for a schema whose struct fields are defined out of
  alphabetical order at both nesting levels, confirming recursive
  alphabetical sorting produces a stable hash

Closes PLT-840

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add five order-independence tests for deeply nested structs, list-of-struct,
  and hash_array in tests/arrow_digester.rs (issue PLT-840)
- Add byte-exact conformance tests for Examples O, P, Q in tests/digest_bytes.rs,
  covering nested-struct record batch, field-order independence, and hash_array
- Document Examples O, P, Q in docs/byte-layout-spec.md with full byte-level
  computation steps matching the test implementations

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ests

Add #[expect(clippy::similar_names)] on example_o, example_q, and
deeply_nested_struct_schema_hash tests where variable names intentionally
mirror the spec's path notation (snp/snq, x/y, z_first/a_first).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds regression tests and specification examples to lock in Starfix’s recursive struct field-order normalization (schema/type hashing and record-batch/array data hashing) so that nested struct insertion order cannot affect digests.

Changes:

  • Add order-independence tests for deeply-nested structs and list-of-struct shapes across hash_schema, hash_record_batch, hash_array, and streaming update().
  • Add byte-exact conformance tests for nested structs (record batch + hash_array) and a pinned canonical JSON example for nested struct ordering.
  • Extend the byte-layout spec with new Examples O–Q documenting canonical JSON, leaf paths, and byte-level hashing steps.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

File Description
tests/digest_bytes.rs Adds byte-exact examples O–Q for nested structs, including pinned canonical JSON and leaf-path hashing.
tests/arrow_digester.rs Adds order-independence tests for nested structs (schema, record batches, hash_array, streaming update) and list-of-struct cases.
docs/byte-layout-spec.md Documents Examples O–Q to match/anchor the new conformance tests and recursive ordering behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@eywalker eywalker merged commit 70e2831 into main Mar 20, 2026
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants