Conversation

@devanshu0987 (Contributor)
This adds a preimage implementation for the floor() function that transforms floor(x) = N into x >= N AND x < N+1. This enables statistics-based predicate pushdown for queries using floor().

For example, a query like:
SELECT * FROM t WHERE floor(price) = 100

is rewritten to:
SELECT * FROM t WHERE price >= 100 AND price < 101

This allows the query engine to leverage min/max statistics from Parquet row groups, significantly reducing the amount of data scanned.
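
The rewrite boils down to building the half-open interval [N, N + 1). A minimal sketch of that bound construction, covering only the Int64 case (the function name and shape are illustrative, not the PR's actual API):

    use datafusion_common::ScalarValue;

    /// Given `floor(x) = n` for an Int64 literal `n`, return the half-open
    /// interval [n, n + 1) that `x` must lie in. Returns None on overflow
    /// or for literal types this sketch does not cover.
    fn floor_preimage_bounds(n: &ScalarValue) -> Option<(ScalarValue, ScalarValue)> {
        match n {
            ScalarValue::Int64(Some(lo)) => {
                let hi = lo.checked_add(1)?; // guard against i64::MAX wrapping
                Some((ScalarValue::Int64(Some(*lo)), ScalarValue::Int64(Some(hi))))
            }
            _ => None, // floats, decimals, etc. omitted in this sketch
        }
    }

The half-open upper bound matters: floor(x) = 100 must not match x = 101, so the rewrite uses x < 101 rather than x <= 101.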

Benchmarks on the ClickBench hits dataset show:

  • 80% file pruning (89 out of 111 files skipped)
  • 70x fewer rows scanned (1.4M vs 100M)

Benchmark reproduction:

CREATE EXTERNAL TABLE hits STORED AS PARQUET LOCATION 'benchmarks/data/hits_partitioned/';

-- Test the floor preimage optimization
EXPLAIN ANALYZE SELECT COUNT(*) FROM hits WHERE floor(CAST("CounterID" AS DOUBLE)) = 62;

Metric             | Before (no preimage)   | After (with preimage)
-------------------+------------------------+------------------------------------------------------
Files pruned       | 111 → 111 (0 pruned)   | 111 → 22 (89 pruned)
Row groups pruned  | 325 → 325 (0 pruned)   | 51 → 4 (47 pruned)
Rows scanned       | 99,997,497             | 1,410,000
Output rows        | 738,172                | 738,172
Pruning predicate  | None                   | CAST(CounterID_max) >= 62 AND CAST(CounterID_min) < 63

Which issue does this PR close?

  • Closes #.

Rationale for this change

#19946

That epic introduced the preimage API. This PR uses it to provide a preimage for the floor() function where applicable.

What changes are included in this PR?

Are these changes tested?

  • Unit tests added
  • Existing SLT tests pass.

Are there any user-facing changes?

No

@github-actions bot added the functions label (Changes to functions implementation) on Jan 29, 2026
Review comment (Contributor) on this diff hunk:

    (ScalarValue::Int64(Some(lo)), ScalarValue::Int64(Some(hi)))
    }),

    // Unsupported types
floor also supports decimal types. Should we add those here?

Review comment (Contributor) on this diff hunk:

        _info: &SimplifyContext,
    ) -> Result<PreimageResult> {
        // floor takes exactly one argument
        if args.len() != 1 {
perhaps it's good to debug_assert! here?
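
For reference, a minimal sketch of that suggestion (the assertion message is illustrative):

    // Catch a wrong arity loudly in debug builds, in addition to the
    // runtime check that follows.
    debug_assert!(args.len() == 1, "floor preimage expects exactly one argument");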

@comphead (Contributor) left a comment:
The PR looks good to me, thanks @devanshu0987 and @masonh22 for the review. Decimal type support makes sense. WDYT, can decimals be addressed in this PR or in a follow-up?

@devanshu0987 (Contributor, Author) commented Jan 30, 2026

Hi @comphead and @masonh22

Can you advise on what I should do?

  • The preimage result is an interval built with Interval::try_new(lower, upper).
  • Inside Interval::try_new, there is a data type assert:
          assert_eq_or_internal_err!(
              lower.data_type(),
              upper.data_type(),
              "Endpoints of an Interval should have the same type"
          );

  • For Decimal types, data type equality also covers precision and scale:
              ScalarValue::Decimal32(_, precision, scale) => {
                  DataType::Decimal32(*precision, *scale)
              }

  • When we add two Decimals, the precision grows: result_precision = max(s1, s2) + max(p1 - s1, p2 - s2) + 1.
    • Which makes sense, since it accommodates the carry case: 99 + 1 = 100.
  • Hence, when we add 1 to compute the upper bound, its precision is one greater than the lower bound's, and Interval::try_new fails. For DECIMAL(5, 2) + DECIMAL(5, 2), the result is Decimal128(6, 2), since max(2, 2) + max(3, 3) + 1 = 6:
      SELECT arrow_typeof(CAST(100.00 AS DECIMAL(5,2)) + CAST(1.00 AS DECIMAL(5,2)));
      +-----------------------------------------+
      | arrow_typeof(Float64(100) + Float64(1)) |
      +-----------------------------------------+
      | Decimal128(6, 2)                        |
      +-----------------------------------------+

  • I am hesitant to write my own manual precision calculation to keep the type unchanged after the addition, given the edge cases (scale < 0, scale > 0, etc.); one possible direction is sketched below.
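
For illustration only, one way to sidestep the precision widening is to add 10^scale to the unscaled i128 value directly, instead of going through decimal addition. A hypothetical sketch (the function name and overflow policy are assumptions, it covers only Decimal128 with a non-negative scale, and it assumes the literal is an exact integer at that scale), not the PR's code:

    use datafusion_common::ScalarValue;

    /// Sketch: build [n, n + 1) for a Decimal128 literal without changing
    /// precision or scale, by adding 10^scale to the unscaled i128 value.
    /// Returns None on a negative scale, on overflow, or if the incremented
    /// value no longer fits the declared precision.
    fn decimal_floor_bounds(n: &ScalarValue) -> Option<(ScalarValue, ScalarValue)> {
        if let ScalarValue::Decimal128(Some(lo), p, s) = n {
            if *s < 0 {
                return None; // negative scale left out of this sketch
            }
            // `1` at scale s is 10^s in the unscaled representation.
            let one = 10_i128.checked_pow(*s as u32)?;
            let hi = lo.checked_add(one)?;
            // The largest unscaled value at precision p is 10^p - 1.
            if hi > 10_i128.checked_pow(*p as u32)? - 1 {
                return None;
            }
            Some((
                ScalarValue::Decimal128(Some(*lo), *p, *s),
                ScalarValue::Decimal128(Some(hi), *p, *s),
            ))
        } else {
            None
        }
    }

This keeps both endpoints at exactly (precision, scale), so Interval::try_new's same-type assertion would hold; whether it handles scale < 0 and the other edge cases acceptably is exactly the open question above.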
