Add sqllogictest coverage for unused UNNEST pruning edge cases #22074

Open
kosiew wants to merge 2 commits into apache:main from kosiew:unnested-pruned-01-20118
kosiew (Contributor) commented May 8, 2026

Which issue does this PR close?

Rationale for this change

This PR adds test-only coverage documenting the current optimization gap around unused UNNEST outputs.

The new tests capture a case where the unnested column becomes duplicate-insensitive under a GROUP BY, while also documenting counterexamples where removing UNNEST would incorrectly change row cardinality or null/empty-array semantics. These tests are intended to guide future optimizer work without changing current behavior.
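To illustrate the distinction, here is a minimal Python sketch that models UNNEST as plain row expansion. It is not DataFusion code: `unnest_rows`, `group_by_id`, `select_id`, and the sample table are hypothetical names and data chosen for illustration, and the sketch assumes every array is non-empty and non-NULL.

```python
# Illustrative model only; none of these helpers exist in DataFusion.

def unnest_rows(rows):
    """Expand each (id, array) row into one (id, element) row per element."""
    out = []
    for row_id, arr in rows:
        for elem in arr:
            out.append((row_id, elem))
    return out


def group_by_id(rows):
    """Model `SELECT id ... GROUP BY id`: distinct ids in first-seen order."""
    return list(dict.fromkeys(row_id for row_id, _ in rows))


def select_id(rows):
    """Model a plain `SELECT id`: one output row per input row."""
    return [row_id for row_id, _ in rows]


# Hypothetical sample data: every array is non-empty and non-NULL.
table = [(1, [10, 20, 30]), (2, [40])]
expanded = unnest_rows(table)

# Under GROUP BY id the duplicates introduced by UNNEST collapse again.
grouped_with_unnest = group_by_id(expanded)
grouped_without_unnest = group_by_id(table)

# Under a plain SELECT id the duplicates remain visible.
plain_with_unnest = select_id(expanded)
plain_without_unnest = select_id(table)
```

In this model the two grouped results are identical, while the plain projection returns four rows with UNNEST and two without it. That difference is why the plain `SELECT id` case is kept as a counterexample.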

What changes are included in this PR?

  • Added a regression dataset in datafusion/sqllogictest/test_files/unnest.slt.
  • Added a reproducer showing an unused UNNEST output under GROUP BY.
  • Added EXPLAIN assertions documenting that the current logical and physical plans still contain Unnest / UnnestExec.
  • Added counterexamples demonstrating cases where removing UNNEST would change result cardinality.
  • Added coverage for empty and NULL array semantics to document current select-list UNNEST behavior.
  • Added cleanup for the temporary test table.
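The empty and NULL array cases can be sketched in the same spirit. The Python model below assumes that select-list UNNEST produces no output rows for an empty or NULL array, which matches the "rows are dropped" wording in the commit notes. The helper name and the sample data are hypothetical, and nothing here calls DataFusion.

```python
# Illustrative model only; the helper and the data are hypothetical.

def unnest_rows(rows):
    """Expand (id, array) rows; empty and NULL (None) arrays yield no rows."""
    out = []
    for row_id, arr in rows:
        if arr is None:
            continue  # assumption: a NULL array drops the row
        for elem in arr:  # an empty array yields nothing
            out.append((row_id, elem))
    return out


# Hypothetical sample data: one populated, one empty, one NULL array.
table = [(1, [10, 20]), (2, []), (3, None)]

# Distinct ids, as a GROUP BY id would see them.
ids_with_unnest = sorted({row_id for row_id, _ in unnest_rows(table)})
ids_without_unnest = sorted({row_id for row_id, _ in table})
```

Under that assumption only id 1 survives the UNNEST, while all three ids appear without it. So even a duplicate-insensitive GROUP BY would change its result if the UNNEST were pruned whenever an array can be empty or NULL.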

Are these changes tested?

Yes.

This PR adds SQL logic tests in datafusion/sqllogictest/test_files/unnest.slt, including:

  • A reproducer for unused UNNEST output below GROUP BY
  • EXPLAIN plan assertions for Unnest and UnnestExec
  • Cardinality-sensitive counterexamples
  • Empty/NULL array semantic coverage

Are there any user-facing changes?

No. This PR only adds tests and documentation of current behavior; it does not change optimizer behavior or query semantics.

LLM-generated code disclosure

This PR includes LLM-generated code and comments. All LLM-generated content has been manually reviewed and tested.

kosiew added 2 commits May 8, 2026 13:47
…d cardinality issues

- Added tests to check for unused UNNEST under GROUP BY; confirmed plan includes Unnest / UnnestExec.
- Introduced a counterexample for plain SELECT id showcasing cardinality changes.
- Added a test for empty/null array cases where rows are dropped but count remains 2.
github-actions bot added the sqllogictest (SQL Logic Tests, .slt) label May 8, 2026
kosiew marked this pull request as ready for review May 8, 2026 06:04