[pull] main from apache:main#188
Merged
Merged
Conversation
…UE window functions (#22112) ## Which issue does this PR close? - Closes #22108 ## Rationale for this change lead and lag both preserve metadata from the input field but nth_value, first_value, and last_value do not. ## What changes are included in this PR? The mechanism to calcluate the return field for nth_value was changed to match lead/lag (which had already been fixed). ## Are these changes tested? Yes, I added tests to metadata.slt ## Are there any user-facing changes? No
## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123. --> - Closes #22137 . ## Rationale for this change `RANGE` window frames with a value offset (e.g. `RANGE BETWEEN 4 PRECEDING AND 4 FOLLOWING`) panicked with `attempt to add/subtract with overflow` whenever the boundary target (`value ± delta`) wrapped past the type's representable range. Affected inputs include values close to `i64::MAX`/`i64::MIN`, `u64::MAX`, and any analogous boundary for other integer/decimal/timestamp types. Two `// TODO: Handle ... overflows.` markers in `WindowFrameStateRange::calculate_index_of_row` had been left for this case; the unchecked `ScalarValue::add` / `sub` silently wrapped the target, after which `search_in_slice` was handed a nonsensical (wrapped) value and downstream code tripped a debug-assert subtraction in `functions-window/src/nth_value.rs`. Semantically, an overflowed boundary is *unbounded* with respect to the data in the partition — every real value lies strictly inside the wrapped sentinel — so the correct behavior is to collapse the search to the appropriate partition edge rather than to search with a wrapped target. ## What changes are included in this PR? `datafusion/expr/src/window_state.rs` - Replace `ScalarValue::add` / `sub` with their `*_checked` counterparts in the boundary computation. - On overflow, short-circuit to the correct partition edge: `search_start` for `PRECEDING`-direction searches, `length` for `FOLLOWING`-direction searches. The collapse direction depends only on the const-generic `SEARCH_SIDE` (the add branch and sub branch both reduce to `!SEARCH_SIDE` once you expand the `SEARCH_SIDE == is_descending` invariant that selects each arithmetic branch). - The pre-existing `value.is_unsigned() && value < delta` clamp-to-zero path for unsigned subtraction is preserved — it produces a valid polymorphic zero, not an overflow sentinel. - No behavior change on the non-overflow path. `datafusion/sqllogictest/test_files/window.slt` Regression coverage for positive and negative overflow, across: - `ASC` + `FOLLOWING` / `ASC` + `PRECEDING` / `DESC` + `PRECEDING` / `DESC` + `FOLLOWING` (each overflow direction occurs on both sort orders depending on which arithmetic branch is taken) - Symmetric `N PRECEDING AND N FOLLOWING` frames where only one side overflows - Signed (`i64`) and unsigned (`u64`) ordering columns - `first_value` and `last_value` both exercised to verify both frame edges - `ROWS` frame regression guard to document that the pre-existing `saturating_sub` / `min(length)` saturation behavior is unchanged.
## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123. --> - Closes #22138 . ## Rationale for this change `AVG` used as a window aggregate can return `NaN` (and, for `Decimal` / `Duration`, panic on integer division by zero) when every value in the window frame is NULL. ```sql SELECT i, AVG(v) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) FROM (VALUES (1,1), (2,2), (3,NULL), (4,NULL)) t(i,v); ``` | i | current output | expected (DuckDB/PgSQL) | |---|----------------|-------------------| | 1 | 1.5 | 1.5 | | 2 | 2.0 | 2.0 | | 3 | **NaN** | **NULL** | | 4 | **NaN** | **NULL** | Root cause: sliding-window execution calls `Accumulator::retract_batch` as rows leave the frame. Once every contributing value has been retracted, `self.count` drops back to `0` but `self.sum` stays `Some(0.0)` (or a tiny floating-point residual). `evaluate()` then computes `sum / 0`, which yields `NaN` on `Float64`, and would panic with integer division by zero on `DecimalAvgAccumulator` and `DurationAvgAccumulator`. The non-sliding aggregation path is unaffected because there `sum` becomes `Some(_)` only after at least one non-NULL value has been added, so `count == 0` implies `sum == None`. ## What changes are included in this PR? `datafusion/functions-aggregate/src/average.rs` — guard all three affected `evaluate()` implementations with an explicit `count == 0 → None` short-circuit: - `AvgAccumulator::evaluate` (Float64) - `DecimalAvgAccumulator::evaluate` (Decimal32/64/128/256) - `DurationAvgAccumulator::evaluate` (Duration*) This matches the idiom already used by sibling retractable accumulators (`variance.rs` uses an explicit `match self.count` before division; `sum.rs` uses a `(self.count != 0).then_some(..)` guard).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
See Commits and Changes for more details.
Created by
pull[bot] (v2.0.0-alpha.4)
Can you help keep this open source service alive? 💖 Please sponsor : )