[pull] main from apache:main#193
Merged
Merged
Conversation
…already satisfied (#22150) ## Which issue does this PR close? - Closes #. ## Rationale for this change `GlobalLimitExec` (and `LocalLimitExec`) are sometimes redundant: if the input can be proven via exact statistics to produce no more rows than the fetch value, the limit node does nothing and should be removed entirely. Previously, the `LimitPushdown` rule had no mechanism to eliminate such trivially-satisfied limits. A query like `SELECT * FROM (VALUES ...) LIMIT 10` — where the input is a single-row `PlaceholderRowExec` — still carried an unnecessary `GlobalLimitExec` in the physical plan. Similarly, a `LIMIT N` over an `EmptyExec` or any zero-row plan was retained. ## What changes are included in this PR? - Adds `limit_satisfied_by_input()` in `limit_pushdown.rs`: checks whether a plan's child provably produces at most `fetch` rows (requires `skip == 0` and a single output partition). - Adds `limit_eliminable_exact_num_rows()`: iteratively unwraps `ProjectionExec` wrappers and recognises `EmptyExec` (0 rows), `PlaceholderRowExec` (1 row), and any plan reporting `Precision::Exact(0)` rows as eliminable producers. - When a limit is statically satisfied, marks `global_state.satisfied = true` and returns early — **without** resetting `fetch`/`skip` — so nested limit nodes still receive the correct outer constraints to merge against. - Updates the `merges_local_limit_with_local_limit` snapshot: the result is now bare `EmptyExec` (limit eliminated). - Updates `union.slt`: `ProjectionExec` over `PlaceholderRowExec` (1 row) with `fetch=3` no longer carries a redundant `GlobalLimitExec`. - Adds `explain_tree.slt` test: `SELECT count(*) … LIMIT 10` over a two-row VALUES clause is correctly reduced to `ProjectionExec → PlaceholderRowExec` with no limit node. - Updates copy.slt: `fetch=10` is now correctly pushed all the way down to `DataSourceExec`. ## Are these changes tested? Yes. - `cargo fmt --all` - `cargo clippy --all-targets --all-features -- -D warnings` - `cargo test -p datafusion-core --test physical_optimizer limit` - `cargo test --features backtrace,parquet_encryption --profile ci --package datafusion-sqllogictest --test sqllogictests -- copy.slt union.slt explain_tree.slt` ## Are there any user-facing changes? No API changes. Physical plans for queries with `LIMIT` over statically small inputs (`EmptyExec`, `PlaceholderRowExec`, or zero-row tables) will now have the redundant `GlobalLimitExec`/`LocalLimitExec` nodes eliminated, resulting in simpler and slightly more efficient plans.
## Which issue does this PR close? * Closes #22061. ## Rationale for this change The existing `min_max_scalar_impl!` macro combined dispatch, validation, recursion, and control flow into a large macro expansion, making the logic difficult to debug, review, and test in isolation. This change moves scalar min/max dispatch to a function-based implementation while preserving existing behavior, error messages, and dictionary handling semantics. ## What changes are included in this PR? * Replaced the `min_max_scalar_impl!` macro with function-based dispatch logic. * Added helper functions to separate responsibilities: * `min_max_option` * `min_max_float_option` * `ensure_decimal_compatibility` * `min_max_generic_scalar` * `min_max_interval_scalar` * `min_max_dictionary_scalar` * `min_max_scalar_same_variant` * Kept `min_max_scalar` as the main entry point and delegated behavior through helper functions. * Preserved: * existing min/max behavior across scalar types * dictionary comparison and rewrapping semantics * decimal precision/scale validation behavior * float `NaN` handling via `total_cmp` * existing error classifications and messages * compacting behavior for generic scalar comparisons * Simplified macro usage by retaining only lightweight ordering-selection macros. ## Are these changes tested? Yes. Added unit tests covering: * core scalar min/max parity behavior * float `NaN` handling using `total_cmp` * decimal mismatch error preservation * fixed-size binary mismatch error preservation * mixed interval comparison error preservation Existing dictionary comparison tests are also preserved. ## Are there any user-facing changes? No. This PR is intended to be a refactor only and preserves existing externally visible behavior. ## LLM-generated code disclosure This PR includes LLM-generated code and comments. All LLM-generated content has been manually reviewed and tested.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
See Commits and Changes for more details.
Created by
pull[bot] (v2.0.0-alpha.4)
Can you help keep this open source service alive? 💖 Please sponsor : )