Optimize away unused UNNEST under duplicate-insensitive aggregates#22161
Open
kosiew wants to merge 3 commits into
Open
Optimize away unused UNNEST under duplicate-insensitive aggregates#22161kosiew wants to merge 3 commits into
UNNEST under duplicate-insensitive aggregates#22161kosiew wants to merge 3 commits into
Conversation
- Pruned unused Unnest under aggregate/group-by when safe. - Handled direct Unnest and Projection -> Unnest scenarios. - Ensured Unnest is retained when empty/null lists may drop rows. - Added unit tests to cover these changes. - Updated safe GROUP BY explain in sqllogictest: no Unnest/UnnestExec. - Added regression tests for empty/null grouped scenarios to ensure Unnest retention.
- Added a unit test for the optimizer to verify that GROUP BY elements keep UNNEST in the case of `keep_referenced_unnest_under_group_by`. - Implemented SLT DISTINCT regression test for `SELECT DISTINCT id ... UNNEST(make_array(...))`, ensuring the correctness of results and that UNNEST is pruned via the distinct-to-aggregate path.
- Removed Vec allocation for unnested input indices to enhance performance. - Replaced try_fold boolean flow with a more explicit for loop for better readability. - Introduced is_unnested_input_index function to clarify input index processing. - Simplified logic for handling empty lists in .all() calls. - Added has_valid_first_value helper for validation purposes. - Introduced new test helpers: id_schema, list_literal_expr, and id_elem_unnest_plan to facilitate testing. - Streamlined repeated optimizer test setup for efficiency. - Improved construction of empty-list tests for clarity.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Which issue does this PR close?
Rationale for this change
This change implements a conservative optimization for removing unused
UNNESToperators in cases where the parent operator is duplicate-insensitive and the unnested output is not referenced.Previously, queries such as
GROUP BYorDISTINCTover non-unnested columns would still retainUNNESTeven when it only introduced duplicate rows and had no effect on the final grouped result. However, removingUNNESTis only safe in narrowly scoped cases because empty or NULL lists can change row cardinality by removing rows entirely.This PR adds a targeted optimization that only removes
UNNESTwhen it is provably semantics-preserving.What changes are included in this PR?
Extend projection optimization for
Aggregateplans to detect removableUNNESTinputs.Add logic to eliminate
LogicalPlan::Unnestwhen:GROUP BYwith no aggregate expressions, includingDISTINCT),UNNESTinput is provably guaranteed to preserve at least one row per input row.Support pruning through an intermediate
Projection.Add safety checks to avoid removing
UNNESTwhen:Add helper logic for detecting non-empty literal list inputs across supported list scalar types.
Are these changes tested?
Yes.
Added targeted optimizer unit tests covering:
UNNESTunderGROUP BY,Added sqllogictest coverage for:
GROUP BYpruning,DISTINCTpruning,UNNESTwould change cardinality.Are there any user-facing changes?
This change improves logical and physical plan optimization for certain queries involving unused
UNNESTexpressions underGROUP BYorDISTINCT, but does not introduce user-facing API changes.LLM-generated code disclosure
This PR includes LLM-generated code and comments. All LLM-generated content has been manually reviewed and tested.