Skip to content

[pull] main from apache:main#193

Merged
pull[bot] merged 2 commits into
buraksenn:mainfrom
apache:main
May 15, 2026
Merged

[pull] main from apache:main#193
pull[bot] merged 2 commits into
buraksenn:mainfrom
apache:main

Conversation

@pull
Copy link
Copy Markdown

@pull pull Bot commented May 15, 2026

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)

Can you help keep this open source service alive? 💖 Please sponsor : )

xiedeyantu and others added 2 commits May 15, 2026 09:42
…already satisfied (#22150)

## Which issue does this PR close?

- Closes #.

## Rationale for this change

`GlobalLimitExec` (and `LocalLimitExec`) are sometimes redundant: if the
input can be proven via exact statistics to produce no more rows than
the fetch value, the limit node does nothing and should be removed
entirely.

Previously, the `LimitPushdown` rule had no mechanism to eliminate such
trivially-satisfied limits. A query like `SELECT * FROM (VALUES ...)
LIMIT 10` — where the input is a single-row `PlaceholderRowExec` — still
carried an unnecessary `GlobalLimitExec` in the physical plan.
Similarly, a `LIMIT N` over an `EmptyExec` or any zero-row plan was
retained.

## What changes are included in this PR?

- Adds `limit_satisfied_by_input()` in `limit_pushdown.rs`: checks
whether a plan's child provably produces at most `fetch` rows (requires
`skip == 0` and a single output partition).
- Adds `limit_eliminable_exact_num_rows()`: iteratively unwraps
`ProjectionExec` wrappers and recognises `EmptyExec` (0 rows),
`PlaceholderRowExec` (1 row), and any plan reporting
`Precision::Exact(0)` rows as eliminable producers.
- When a limit is statically satisfied, marks `global_state.satisfied =
true` and returns early — **without** resetting `fetch`/`skip` — so
nested limit nodes still receive the correct outer constraints to merge
against.
- Updates the `merges_local_limit_with_local_limit` snapshot: the result
is now bare `EmptyExec` (limit eliminated).
- Updates `union.slt`: `ProjectionExec` over `PlaceholderRowExec` (1
row) with `fetch=3` no longer carries a redundant `GlobalLimitExec`.
- Adds `explain_tree.slt` test: `SELECT count(*) … LIMIT 10` over a
two-row VALUES clause is correctly reduced to `ProjectionExec →
PlaceholderRowExec` with no limit node.
- Updates copy.slt: `fetch=10` is now correctly pushed all the way down
to `DataSourceExec`.

## Are these changes tested?

Yes.

- `cargo fmt --all`
- `cargo clippy --all-targets --all-features -- -D warnings`
- `cargo test -p datafusion-core --test physical_optimizer limit`
- `cargo test --features backtrace,parquet_encryption --profile ci
--package datafusion-sqllogictest --test sqllogictests -- copy.slt
union.slt explain_tree.slt`

## Are there any user-facing changes?

No API changes. Physical plans for queries with `LIMIT` over statically
small inputs (`EmptyExec`, `PlaceholderRowExec`, or zero-row tables)
will now have the redundant `GlobalLimitExec`/`LocalLimitExec` nodes
eliminated, resulting in simpler and slightly more efficient plans.
## Which issue does this PR close?

* Closes #22061.

## Rationale for this change

The existing `min_max_scalar_impl!` macro combined dispatch, validation,
recursion, and control flow into a large macro expansion, making the
logic difficult to debug, review, and test in isolation.

This change moves scalar min/max dispatch to a function-based
implementation while preserving existing behavior, error messages, and
dictionary handling semantics.

## What changes are included in this PR?

* Replaced the `min_max_scalar_impl!` macro with function-based dispatch
logic.
* Added helper functions to separate responsibilities:

  * `min_max_option`
  * `min_max_float_option`
  * `ensure_decimal_compatibility`
  * `min_max_generic_scalar`
  * `min_max_interval_scalar`
  * `min_max_dictionary_scalar`
  * `min_max_scalar_same_variant`
* Kept `min_max_scalar` as the main entry point and delegated behavior
through helper functions.
* Preserved:

  * existing min/max behavior across scalar types
  * dictionary comparison and rewrapping semantics
  * decimal precision/scale validation behavior
  * float `NaN` handling via `total_cmp`
  * existing error classifications and messages
  * compacting behavior for generic scalar comparisons
* Simplified macro usage by retaining only lightweight
ordering-selection macros.

## Are these changes tested?

Yes.

Added unit tests covering:

* core scalar min/max parity behavior
* float `NaN` handling using `total_cmp`
* decimal mismatch error preservation
* fixed-size binary mismatch error preservation
* mixed interval comparison error preservation

Existing dictionary comparison tests are also preserved.

## Are there any user-facing changes?

No. This PR is intended to be a refactor only and preserves existing
externally visible behavior.

## LLM-generated code disclosure

This PR includes LLM-generated code and comments. All LLM-generated
content has been manually reviewed and tested.
@pull pull Bot locked and limited conversation to collaborators May 15, 2026
@pull pull Bot added the ⤵️ pull label May 15, 2026
@pull pull Bot merged commit d096558 into buraksenn:main May 15, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants