Skip to content

[pull] main from apache:main#188

Merged
pull[bot] merged 3 commits into
buraksenn:mainfrom
apache:main
May 14, 2026
Merged

[pull] main from apache:main#188
pull[bot] merged 3 commits into
buraksenn:mainfrom
apache:main

Conversation

@pull
Copy link
Copy Markdown

@pull pull Bot commented May 14, 2026

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)

Can you help keep this open source service alive? 💖 Please sponsor : )

paleolimbot and others added 3 commits May 13, 2026 21:21
…UE window functions (#22112)

## Which issue does this PR close?

- Closes #22108

## Rationale for this change

lead and lag both preserve metadata from the input field but nth_value,
first_value, and last_value do not.

## What changes are included in this PR?

The mechanism to calcluate the return field for nth_value was changed to
match lead/lag (which had already been fixed).

## Are these changes tested?

Yes, I added tests to metadata.slt

## Are there any user-facing changes?

No
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Closes #22137 .

 ## Rationale for this change

`RANGE` window frames with a value offset (e.g. `RANGE BETWEEN 4
PRECEDING AND 4 FOLLOWING`) panicked with `attempt to add/subtract with
overflow` whenever the boundary target (`value ± delta`) wrapped
past the type's representable range. Affected inputs include values
close to `i64::MAX`/`i64::MIN`, `u64::MAX`, and any analogous boundary
for other integer/decimal/timestamp types.

Two `// TODO: Handle ... overflows.` markers in
`WindowFrameStateRange::calculate_index_of_row` had been left for this
case; the unchecked `ScalarValue::add` / `sub` silently wrapped the
target, after
which `search_in_slice` was handed a nonsensical (wrapped) value and
downstream code tripped a debug-assert subtraction in
`functions-window/src/nth_value.rs`.

Semantically, an overflowed boundary is *unbounded* with respect to the
data in the partition — every real value lies strictly inside the
wrapped sentinel — so the correct behavior is to collapse the
search to the appropriate partition edge rather than to search with a
wrapped target.

  ## What changes are included in this PR?

  `datafusion/expr/src/window_state.rs`

- Replace `ScalarValue::add` / `sub` with their `*_checked` counterparts
in the boundary computation.
- On overflow, short-circuit to the correct partition edge:
`search_start` for `PRECEDING`-direction searches, `length` for
`FOLLOWING`-direction searches. The collapse direction depends only on
the
const-generic `SEARCH_SIDE` (the add branch and sub branch both reduce
to `!SEARCH_SIDE` once you expand the `SEARCH_SIDE == is_descending`
invariant that selects each arithmetic branch).
- The pre-existing `value.is_unsigned() && value < delta` clamp-to-zero
path for unsigned subtraction is preserved — it produces a valid
polymorphic zero, not an overflow sentinel.
  - No behavior change on the non-overflow path.

  `datafusion/sqllogictest/test_files/window.slt`

  Regression coverage for positive and negative overflow, across:
- `ASC` + `FOLLOWING` / `ASC` + `PRECEDING` / `DESC` + `PRECEDING` /
`DESC` + `FOLLOWING` (each overflow direction occurs on both sort orders
depending on which arithmetic branch is taken)
- Symmetric `N PRECEDING AND N FOLLOWING` frames where only one side
overflows
  - Signed (`i64`) and unsigned (`u64`) ordering columns
- `first_value` and `last_value` both exercised to verify both frame
edges
- `ROWS` frame regression guard to document that the pre-existing
`saturating_sub` / `min(length)` saturation behavior is unchanged.
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Closes #22138 .

 ## Rationale for this change

`AVG` used as a window aggregate can return `NaN` (and, for `Decimal` /
`Duration`, panic on integer division by zero) when every value in the
window frame is NULL.

  ```sql
  SELECT i,
AVG(v) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND UNBOUNDED
FOLLOWING)
  FROM (VALUES (1,1), (2,2), (3,NULL), (4,NULL)) t(i,v);
  ```

  | i | current output | expected (DuckDB/PgSQL) |
  |---|----------------|-------------------|
  | 1 | 1.5            | 1.5               |
  | 2 | 2.0            | 2.0               |
  | 3 | **NaN**        | **NULL**          |
  | 4 | **NaN**        | **NULL**          |

Root cause: sliding-window execution calls `Accumulator::retract_batch`
as rows leave the frame. Once every contributing value has been
retracted, `self.count` drops back to `0` but `self.sum` stays
`Some(0.0)` (or a tiny floating-point residual). `evaluate()` then
computes `sum / 0`, which yields `NaN` on `Float64`, and would panic
with integer division by zero on `DecimalAvgAccumulator` and
  `DurationAvgAccumulator`.
  
The non-sliding aggregation path is unaffected because there `sum`
becomes `Some(_)` only after at least one non-NULL value has been added,
so `count == 0` implies `sum == None`.

  ## What changes are included in this PR?

`datafusion/functions-aggregate/src/average.rs` — guard all three
affected `evaluate()` implementations with an explicit `count == 0 →
None` short-circuit:

  - `AvgAccumulator::evaluate` (Float64)
  - `DecimalAvgAccumulator::evaluate` (Decimal32/64/128/256)
  - `DurationAvgAccumulator::evaluate` (Duration*)

This matches the idiom already used by sibling retractable accumulators
(`variance.rs` uses an explicit `match self.count` before division;
`sum.rs` uses a `(self.count != 0).then_some(..)` guard).
@pull pull Bot locked and limited conversation to collaborators May 14, 2026
@pull pull Bot added the ⤵️ pull label May 14, 2026
@pull pull Bot merged commit 7f2f78d into buraksenn:main May 14, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants