Refactor PartitionedFile: add ordering field and new_from_meta constructor#19596
Merged
alamb merged 5 commits intoapache:mainfrom Jan 6, 2026
Merged
Refactor PartitionedFile: add ordering field and new_from_meta constructor#19596alamb merged 5 commits intoapache:mainfrom
alamb merged 5 commits intoapache:mainfrom
Conversation
96de14b to
aeda804
Compare
…uctor Add infrastructure to track ordering information per file in preparation for ordering inference from Parquet metadata. Changes: - Add `ordering: Option<LexOrdering>` field to `PartitionedFile` struct - Add `new_from_meta(ObjectMeta)` constructor for creating files from metadata - Add `with_ordering()` and `with_partition_values()` builder methods - Update all PartitionedFile constructors to initialize `ordering: None` - Update callsites in test_util, proto, and substrait to use new_from_meta 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Update all PartitionedFile struct literals to use new_from_meta() builder pattern with appropriate builder methods like with_partition_values(), with_extensions(), with_range(), and with_statistics(). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
aeda804 to
bf51e52
Compare
Contributor
There was a problem hiding this comment.
Pull request overview
This PR adds infrastructure to support per-file ordering information in preparation for ordering inference from Parquet metadata. The changes introduce a new ordering field to PartitionedFile along with a cleaner constructor pattern using new_from_meta() and builder methods.
Key Changes:
- Added
ordering: Option<LexOrdering>field toPartitionedFilestruct with correspondingwith_ordering()builder method - Introduced
new_from_meta(ObjectMeta)constructor andwith_partition_values()builder method for cleaner object construction - Refactored all existing
PartitionedFileinstantiations across the codebase to use the new constructor pattern
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| datafusion/datasource/src/mod.rs | Core changes: added ordering field, new_from_meta() constructor, with_ordering() and with_partition_values() builder methods, updated all constructors to initialize ordering |
| datafusion/substrait/src/physical_plan/consumer.rs | Refactored to use new_from_meta() instead of manual struct construction |
| datafusion/proto/src/physical_plan/from_proto.rs | Refactored protobuf conversion to use new_from_meta() with builder pattern, updated test helper |
| datafusion/datasource/src/file_scan_config.rs | Refactored test helper to use new_from_meta() with builder methods |
| datafusion/datasource/src/display.rs | Simplified test utility to use new_from_meta() |
| datafusion/datasource-parquet/src/row_group_filter.rs | Refactored test setup to use new_from_meta() |
| datafusion/core/tests/parquet/page_pruning.rs | Simplified test setup to use new_from_meta() |
| datafusion/core/tests/parquet/custom_reader.rs | Refactored to use new_from_meta() with with_extensions() builder |
| datafusion/core/src/test_util/parquet.rs | Simplified FileScanConfigBuilder setup to use new_from_meta() |
| datafusion/core/src/datasource/physical_plan/parquet.rs | Refactored multiple test helpers to use new_from_meta() with builder pattern |
| datafusion/core/src/datasource/mod.rs | Simplified test utility to use new_from_meta() |
| datafusion/core/src/datasource/file_format/mod.rs | Simplified test utility to use new_from_meta() |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
zhuqi-lucas
approved these changes
Jan 4, 2026
Contributor
zhuqi-lucas
left a comment
There was a problem hiding this comment.
LGTM, thanks @adriangb !
44 tasks
Contributor
|
🚀 Thanks @adriangb and @zhuqi-lucas |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Which issue does this PR close?
Part of #19433
Rationale for this change
In preparation for ordering inference from Parquet metadata, we need to be able to store per-file ordering information on
PartitionedFile. This PR adds the necessary infrastructure.What changes are included in this PR?
ordering: Option<LexOrdering>field toPartitionedFilestructnew_from_meta(ObjectMeta)constructor for creating files from metadata (cleaner than manually constructing)with_ordering()builder method to set ordering informationwith_partition_values()builder method for consistencyPartitionedFileconstructors to initializeordering: Nonenew_from_metaAre these changes tested?
Yes, existing tests pass. The new field is currently always
Noneso no new tests are needed yet. Tests for ordering inference will come in a follow-up PR.Are there any user-facing changes?
No user-facing API changes. The
orderingfield is public but users typically constructPartitionedFilevia the provided constructors which handle this automatically.🤖 Generated with Claude Code