-
Notifications
You must be signed in to change notification settings - Fork 1.9k
refactor: Rename FileSource::try_reverse_output to FileSource::try_pushdown_sort
#20043
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
refactor: Rename FileSource::try_reverse_output to FileSource::try_pushdown_sort
#20043
Conversation
FileSource::try_pushdown_sort
|
cc @adriangb |
adriangb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small nits, thanks!
datafusion/datasource/src/file.rs
Outdated
| /// * `Unsupported` - Cannot optimize for this ordering | ||
| /// | ||
| /// Default implementation returns `Unsupported`. | ||
| /// Default implementation delegates to [`Self::try_reverse_output`]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe state the version / timeline at which we will remove try_reverse_output and what the default impl of try_pushdown_sort will be at that point.
Also we should make an explicit recommendation for users: should they immediately implement try_pushdown_sort?
| // Reverse file groups (FileScanConfig's responsibility) if doing so helps satisfy the | ||
| // requested ordering. | ||
| let reverse_file_groups = | ||
| LexOrdering::new(order.iter().cloned()).is_some_and(|requested| { | ||
| self.output_ordering | ||
| .iter() | ||
| .any(|ordering| ordering.is_reverse(&requested)) | ||
| }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain this bit? I can't tell where this code is being moved from. Is it new / handling a bug?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I expanded this check because order is expressed in the scan’s output/projection schema, while self.output_ordering is stored pre-projection. LexOrdering::is_reverse compares PhysicalSortExprs (incl. Column indices), so the old code could miss valid reversals when projections reorder/drop columns. By projecting output_ordering into the scan output schema first (project_orderings), we compare like-for-like and only reverse file groups when it truly matches a reverse request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also added new checks to handle when the scan has a projection / reindexed columns, instead of silently failing
to reverse due to mismatched indices.
adriangb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @kumarUjjawal !
|
I'm going to explicitly not ask for an entry into the upgrade guide because (1) this is a new method that I doubt many are implementing and (2) there is a deprecation warning and clear instructions for migration. Thus I think it would do more harm (by polluting the document) than good. |
|
@zhuqi-lucas could you give a final look at this please since you wrote the method / API in the first place? |
Which issue does this PR close?
FileSource::try_reverse_outputtoFileSource::try_pushdown_sort#19723.Rationale for this change
What changes are included in this PR?
output_ordering (adds a unit test).
Are these changes tested?
Yes
Are there any user-facing changes?