[pull] main from apache:main by pull[bot] · Pull Request #193 · buraksenn/datafusion

pull · 2026-05-15T12:33:30Z

See Commits and Changes for more details.

Created by pull[bot] (v2.0.0-alpha.4)

Can you help keep this open source service alive? 💖 Please sponsor : )

…already satisfied (#22150) ## Which issue does this PR close? - Closes #. ## Rationale for this change `GlobalLimitExec` (and `LocalLimitExec`) are sometimes redundant: if the input can be proven via exact statistics to produce no more rows than the fetch value, the limit node does nothing and should be removed entirely. Previously, the `LimitPushdown` rule had no mechanism to eliminate such trivially-satisfied limits. A query like `SELECT * FROM (VALUES ...) LIMIT 10` — where the input is a single-row `PlaceholderRowExec` — still carried an unnecessary `GlobalLimitExec` in the physical plan. Similarly, a `LIMIT N` over an `EmptyExec` or any zero-row plan was retained. ## What changes are included in this PR? - Adds `limit_satisfied_by_input()` in `limit_pushdown.rs`: checks whether a plan's child provably produces at most `fetch` rows (requires `skip == 0` and a single output partition). - Adds `limit_eliminable_exact_num_rows()`: iteratively unwraps `ProjectionExec` wrappers and recognises `EmptyExec` (0 rows), `PlaceholderRowExec` (1 row), and any plan reporting `Precision::Exact(0)` rows as eliminable producers. - When a limit is statically satisfied, marks `global_state.satisfied = true` and returns early — **without** resetting `fetch`/`skip` — so nested limit nodes still receive the correct outer constraints to merge against. - Updates the `merges_local_limit_with_local_limit` snapshot: the result is now bare `EmptyExec` (limit eliminated). - Updates `union.slt`: `ProjectionExec` over `PlaceholderRowExec` (1 row) with `fetch=3` no longer carries a redundant `GlobalLimitExec`. - Adds `explain_tree.slt` test: `SELECT count(*) … LIMIT 10` over a two-row VALUES clause is correctly reduced to `ProjectionExec → PlaceholderRowExec` with no limit node. - Updates copy.slt: `fetch=10` is now correctly pushed all the way down to `DataSourceExec`. ## Are these changes tested? Yes. - `cargo fmt --all` - `cargo clippy --all-targets --all-features -- -D warnings` - `cargo test -p datafusion-core --test physical_optimizer limit` - `cargo test --features backtrace,parquet_encryption --profile ci --package datafusion-sqllogictest --test sqllogictests -- copy.slt union.slt explain_tree.slt` ## Are there any user-facing changes? No API changes. Physical plans for queries with `LIMIT` over statically small inputs (`EmptyExec`, `PlaceholderRowExec`, or zero-row tables) will now have the redundant `GlobalLimitExec`/`LocalLimitExec` nodes eliminated, resulting in simpler and slightly more efficient plans.

## Which issue does this PR close? * Closes #22061. ## Rationale for this change The existing `min_max_scalar_impl!` macro combined dispatch, validation, recursion, and control flow into a large macro expansion, making the logic difficult to debug, review, and test in isolation. This change moves scalar min/max dispatch to a function-based implementation while preserving existing behavior, error messages, and dictionary handling semantics. ## What changes are included in this PR? * Replaced the `min_max_scalar_impl!` macro with function-based dispatch logic. * Added helper functions to separate responsibilities: * `min_max_option` * `min_max_float_option` * `ensure_decimal_compatibility` * `min_max_generic_scalar` * `min_max_interval_scalar` * `min_max_dictionary_scalar` * `min_max_scalar_same_variant` * Kept `min_max_scalar` as the main entry point and delegated behavior through helper functions. * Preserved: * existing min/max behavior across scalar types * dictionary comparison and rewrapping semantics * decimal precision/scale validation behavior * float `NaN` handling via `total_cmp` * existing error classifications and messages * compacting behavior for generic scalar comparisons * Simplified macro usage by retaining only lightweight ordering-selection macros. ## Are these changes tested? Yes. Added unit tests covering: * core scalar min/max parity behavior * float `NaN` handling using `total_cmp` * decimal mismatch error preservation * fixed-size binary mismatch error preservation * mixed interval comparison error preservation Existing dictionary comparison tests are also preserved. ## Are there any user-facing changes? No. This PR is intended to be a refactor only and preserves existing externally visible behavior. ## LLM-generated code disclosure This PR includes LLM-generated code and comments. All LLM-generated content has been manually reviewed and tested.

xiedeyantu and others added 2 commits May 15, 2026 09:42

pull Bot locked and limited conversation to collaborators May 15, 2026

pull Bot added the ⤵️ pull label May 15, 2026

pull Bot merged commit d096558 into buraksenn:main May 15, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[pull] main from apache:main#193

[pull] main from apache:main#193
pull[bot] merged 2 commits into
buraksenn:mainfrom
apache:main

pull Bot commented May 15, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

pull Bot commented May 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

pull Bot commented May 15, 2026 •

edited

Loading