DRAFT - NOT READY: Fix array_except nullability mismatch #4237
yuboxx wants to merge 1 commit into apache:main
Which issue does this PR close?
Closes #3646.
Draft status
Draft PR, not ready for review. This PR publishes the proposed direction and tests for visibility, but it still has a known blocker before review should be requested: the runtime Arrow list type may diverge from the `ScalarFunctionExpr` field that Comet planned before the wrapper was installed.
Rationale for this change
Spark arrays carry an element nullability flag through `ArrayType(..., containsNull = ...)`. Equivalent Spark expressions can produce the same element type with different `containsNull` values. For example, a literal array such as `array(1, 2, 3)` can be represented as non-null elements, while an array built from a nullable column can be represented as nullable elements.
DataFusion's nested `array_except` implementation validates inputs with strict Arrow datatype equality. That equality includes list child field nullability, so Comet can reject otherwise compatible Spark arrays with an error like `array_except received incompatible types: List(Int32), List(non-null Int32)`. Issue #3646 reports this failure for the SQL array expression tests.
The proposed fix treats list child nullability as a Spark/Arrow representation detail for `array_except` compatibility. Before delegating to DataFusion's implementation, Comet normalizes list child fields to nullable so inputs that differ only by `containsNull` can pass DataFusion's datatype check.
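
The mismatch and the normalization described above can be sketched in isolation. This is a minimal illustration, not the code in this PR: `DataType`, `Field`, `list_of`, and `normalize_list_nullability` are hypothetical stand-ins defined here, not the arrow-rs or DataFusion APIs. It models datatypes only, and it assumes nested lists are normalized recursively, which the description above does not state.

```rust
// Simplified, hypothetical model of an Arrow-style datatype.
// These are NOT the arrow-rs types; they only mimic the relevant shape:
// a list type owns a child field, and that field carries a nullability flag.
#[allow(dead_code)]
#[derive(Clone, Debug, PartialEq)]
enum DataType {
    Int32,
    Utf8,
    List(Box<Field>),
}

#[derive(Clone, Debug, PartialEq)]
struct Field {
    data_type: DataType,
    nullable: bool,
}

/// Build a list type whose element field has the given nullability
/// (the analogue of Spark's `containsNull`).
#[allow(dead_code)]
fn list_of(element: DataType, nullable: bool) -> DataType {
    DataType::List(Box::new(Field {
        data_type: element,
        nullable,
    }))
}

/// Return a copy of `dt` in which every list child field is nullable.
/// Recursing into nested lists is an assumption of this sketch.
#[allow(dead_code)]
fn normalize_list_nullability(dt: &DataType) -> DataType {
    match dt {
        DataType::List(child) => DataType::List(Box::new(Field {
            data_type: normalize_list_nullability(&child.data_type),
            nullable: true,
        })),
        // Non-list types carry no child field, so they pass through unchanged.
        other => other.clone(),
    }
}
```

With this model, `list_of(DataType::Int32, true)` and `list_of(DataType::Int32, false)` compare unequal under the derived `PartialEq`, which mirrors the strict equality check that rejects the inputs. After both pass through `normalize_list_nullability` they compare equal, while lists with different element types (for example `Int32` and `Utf8`) still differ. A real implementation would also have to convert the arrays themselves, since an Arrow array carries its field in its datatype.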