Skip to content

Conversation

@kumarUjjawal
Copy link
Contributor

Which issue does this PR close?

Rationale for this change

  • The API is not “reverse-only”: it takes an arbitrary requested ordering and can accept/deny/partially match it, so try_pushdown_sort is a more accurate name.
  • FileScanConfig::rebuild_with_source shouldn’t assume reversal; reversal is only valid when it helps satisfy the caller’s requested order.

What changes are included in this PR?

  • Renames FileSource::try_reverse_output to FileSource::try_pushdown_sort (keeps try_reverse_output as a deprecated shim for compatibility).
  • Updates Parquet’s implementation and internal call sites to use the new name.
  • Updates FileScanConfig::rebuild_with_source to take the caller’s requested ordering and only reverse file_groups when the request is actually a reverse of the current
    output_ordering (adds a unit test).

Are these changes tested?

Yes

Are there any user-facing changes?

  • Yes: FileSource implementers should prefer try_pushdown_sort; try_reverse_output remains but is deprecated.
  • Slight behavioral change: file group reversal during sort pushdown is now conditional instead of always reversing

@github-actions github-actions bot added the datasource Changes to the datasource crate label Jan 28, 2026
@kumarUjjawal
Copy link
Contributor Author

cc @adriangb

Copy link
Contributor

@adriangb adriangb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small nits, thanks!

/// * `Unsupported` - Cannot optimize for this ordering
///
/// Default implementation returns `Unsupported`.
/// Default implementation delegates to [`Self::try_reverse_output`].
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe state the version / timeline at which we will remove try_reverse_output and what the default impl of try_pushdown_sort will be at that point.

Also we should make an explicit recommendation for users: should they immediately implement try_pushdown_sort?

Comment on lines 1164 to 1171
// Reverse file groups (FileScanConfig's responsibility) if doing so helps satisfy the
// requested ordering.
let reverse_file_groups =
LexOrdering::new(order.iter().cloned()).is_some_and(|requested| {
self.output_ordering
.iter()
.any(|ordering| ordering.is_reverse(&requested))
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain this bit? I can't tell where this code is being moved from. Is it new / handling a bug?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expanded this check because order is expressed in the scan’s output/projection schema, while self.output_ordering is stored pre-projection. LexOrdering::is_reverse compares PhysicalSortExprs (incl. Column indices), so the old code could miss valid reversals when projections reorder/drop columns. By projecting output_ordering into the scan output schema first (project_orderings), we compare like-for-like and only reverse file groups when it truly matches a reverse request.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also added new checks to handle when the scan has a projection / reindexed columns, instead of silently failing
to reverse due to mismatched indices.

Copy link
Contributor

@adriangb adriangb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @kumarUjjawal !

@adriangb
Copy link
Contributor

I'm going to explicitly not ask for an entry into the upgrade guide because (1) this is a new method that I doubt many are implementing and (2) there is a deprecation warning and clear instructions for migration. Thus I think it would do more harm (by polluting the document) than good.

@adriangb
Copy link
Contributor

@zhuqi-lucas could you give a final look at this please since you wrote the method / API in the first place?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

datasource Changes to the datasource crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Rename FileSource::try_reverse_output to FileSource::try_pushdown_sort

2 participants