Implement preimage for floor function to enable predicate pushdown #20059
base: main
This adds a `preimage` implementation for the `floor()` function that transforms `floor(x) = N` into `x >= N AND x < N+1`. This enables statistics-based predicate pushdown for queries that use `floor()`.

For example, a query like:

`SELECT * FROM t WHERE floor(price) = 100`

is rewritten to:

`SELECT * FROM t WHERE price >= 100 AND price < 101`

This allows the query engine to leverage min/max statistics from Parquet row groups, significantly reducing the amount of data scanned.

Benchmarks on the ClickBench `hits` dataset show:
- 80% file pruning (89 out of 111 files skipped)
- 70x fewer rows scanned (1.4M vs 100M)
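The rewrite and the pruning it enables can be sketched in plain Rust as follows. This is a minimal, self-contained illustration, not the PR's code: `FloorPreimage`, `floor_preimage`, and `row_group_may_match` are hypothetical names, only `f64` is handled, and DataFusion's own `PreimageResult` and `SimplifyContext` types are not used. It also assumes two edge cases the description does not spell out: a non-integral `N` (for example `floor(x) = 100.5`) can never match, and for `|N| >= 2^53` the value `N + 1` may not be exactly representable as an `f64`, so the sketch declines to rewrite.

```rust
/// Hypothetical result type for the preimage of `floor(x) = n` over `f64` values.
#[derive(Debug, PartialEq)]
enum FloorPreimage {
    /// `floor(x) = n` holds exactly when `lo <= x < hi`.
    Range { lo: f64, hi: f64 },
    /// No `x` satisfies `floor(x) = n` (for example, `n` is not a whole number).
    Empty,
    /// The rewrite is not attempted; the original predicate is kept.
    Unsupported,
}

/// Hypothetical helper: maps `floor(x) = n` back to the half-open interval `[n, n + 1)`.
fn floor_preimage(n: f64) -> FloorPreimage {
    // 2^53: at or beyond this magnitude, `n + 1.0` may round back to `n`.
    const EXACT_LIMIT: f64 = 9_007_199_254_740_992.0;
    if !n.is_finite() || n.abs() >= EXACT_LIMIT {
        return FloorPreimage::Unsupported;
    }
    if n.fract() != 0.0 {
        // floor() only returns whole numbers, so e.g. `floor(x) = 100.5` is never true.
        return FloorPreimage::Empty;
    }
    FloorPreimage::Range { lo: n, hi: n + 1.0 }
}

/// Hypothetical pruning check: can a row group whose column statistics are
/// `[min, max]` contain any `x` with `lo <= x < hi`?
fn row_group_may_match(min: f64, max: f64, lo: f64, hi: f64) -> bool {
    // The closed range [min, max] overlaps the half-open range [lo, hi)
    // iff it starts before `hi` and ends at or after `lo`.
    min < hi && max >= lo
}

fn main() {
    // floor(price) = 100  ==>  price >= 100 AND price < 101
    let (lo, hi) = match floor_preimage(100.0) {
        FloorPreimage::Range { lo, hi } => (lo, hi),
        other => panic!("unexpected preimage: {:?}", other),
    };
    assert_eq!((lo, hi), (100.0, 101.0));

    // Row group with prices in [0, 50]: can be skipped.
    assert!(!row_group_may_match(0.0, 50.0, lo, hi));
    // Row group with prices in [99.5, 100.2]: must be read.
    assert!(row_group_may_match(99.5, 100.2, lo, hi));
    // Row group with prices in [101, 200]: can be skipped, since the upper bound is exclusive.
    assert!(!row_group_may_match(101.0, 200.0, lo, hi));

    assert_eq!(floor_preimage(100.5), FloorPreimage::Empty);
    assert_eq!(floor_preimage(f64::NAN), FloorPreimage::Unsupported);
    println!("ok");
}
```

The interval is half-open because `floor(101.0)` is `101`, so `N + 1` itself must be excluded; that exclusive upper bound is also what allows a row group whose minimum is exactly 101 to be skipped.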
Branch updated from bd9b68f to 5c4c771.
        (ScalarValue::Int64(Some(lo)), ScalarValue::Int64(Some(hi)))
    }),

    // Unsupported types
`floor` also supports decimal types. Should we add those here?
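For reference, a decimal case could follow the same shape as the float one. The sketch below is hypothetical (`decimal_floor_preimage` is not part of the PR) and assumes that the literal `N` is a whole number, that the column stores values as unscaled `i128` integers with `scale` fractional digits (as Arrow's `Decimal128` does), and that checking the bounds against the column's declared precision is left out. Under those assumptions, `floor(x) = N` becomes `N * 10^scale <= unscaled < (N + 1) * 10^scale`, and any overflow simply means "don't rewrite".

```rust
/// Hypothetical sketch: preimage of `floor(x) = n` for a decimal column with `scale`
/// fractional digits, returned as a half-open range `[lo, hi)` over the unscaled
/// `i128` values. Returns `None` (keep the original predicate) on overflow.
fn decimal_floor_preimage(n: i128, scale: u32) -> Option<(i128, i128)> {
    let factor = 10i128.checked_pow(scale)?;
    let lo = n.checked_mul(factor)?;
    let hi = n.checked_add(1)?.checked_mul(factor)?;
    Some((lo, hi))
}

fn main() {
    // Scale 2: floor(x) = 100  ==>  100.00 <= x < 101.00, i.e. 10000 <= unscaled < 10100
    let (lo, hi) = decimal_floor_preimage(100, 2).unwrap();
    assert_eq!((lo, hi), (10_000, 10_100));
    assert!(lo <= 10_099 && 10_099 < hi); // 100.99 floors to 100
    assert!(10_100 >= hi); // 101.00 floors to 101, so it is excluded

    // Negative values: floor(x) = -3  ==>  -3.00 <= x < -2.00
    assert_eq!(decimal_floor_preimage(-3, 2), Some((-300, -200)));

    // Overflow: no rewrite.
    assert_eq!(decimal_floor_preimage(i128::MAX, 2), None);
    println!("ok");
}
```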
        _info: &SimplifyContext,
    ) -> Result<PreimageResult> {
        // floor takes exactly one argument
        if args.len() != 1 {
Perhaps it would be good to use `debug_assert!` here?
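One way to act on this suggestion is sketched below, assuming the function signature already guarantees a single argument so a different count would indicate a bug upstream. `single_arg` is a hypothetical stand-in for the arity check at the top of `preimage`, not code from the PR. It combines `debug_assert_eq!`, which fails loudly in debug builds and tests, with a pattern match that still falls back to "no preimage" in release builds, where debug assertions are not checked.

```rust
/// Hypothetical stand-in for the arity check at the top of the PR's `preimage` method.
fn single_arg<T>(args: &[T]) -> Option<&T> {
    // In debug builds (and `cargo test`), a wrong argument count panics immediately,
    // surfacing the bug that produced it.
    debug_assert_eq!(args.len(), 1, "floor takes exactly one argument");
    // In release builds the assertion is not checked, so this still degrades to
    // "no preimage" instead of indexing out of bounds.
    match args {
        [arg] => Some(arg),
        _ => None,
    }
}

fn main() {
    assert_eq!(single_arg(&["price"]), Some(&"price"));
    println!("ok");
}
```

Whether to keep the release-mode fallback or rely on the assertion alone is a judgment call; the fallback costs one match arm and never turns a planner bug into a wrong rewrite.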
comphead left a comment:
The PR looks good to me. Thanks @devanshu0987 and @masonh22 for the review. Decimal type support makes sense; WDYT, can decimals be addressed in this PR or in a follow-up?
Can you advise on what I should do?
Which issue does this PR close?
Rationale for this change
#19946
This epic introduced the preimage API. This PR uses the preimage API to implement `preimage` for the `floor` function where it is applicable.
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?
No