DRAFT - NOT READY: Fix array_except nullability mismatch #4237

Draft
yuboxx wants to merge 1 commit into apache:main from yuboxx:fix-array-except-nullability-3646


yuboxx commented May 6, 2026

Which issue does this PR close?

Closes #3646.

Draft status

Draft PR, not ready for review. This PR publishes the proposed direction and tests for visibility, but it still has a known blocker before review should be requested: the runtime Arrow list type may diverge from the ScalarFunctionExpr field that Comet planned before the wrapper was installed.

Rationale for this change

Spark arrays carry an element nullability flag through ArrayType(..., containsNull = ...). Equivalent Spark expressions can produce the same element type with different containsNull values. For example, a literal array such as array(1, 2, 3) can be typed with containsNull = false, while an array built from a nullable column can be typed with containsNull = true.

DataFusion's nested array_except implementation validates inputs with strict Arrow datatype equality. That equality includes list child field nullability, so Comet can reject otherwise compatible Spark arrays with an error like array_except received incompatible types: List(Int32), List(non-null Int32). Issue #3646 reports this failure for the SQL array expression tests.
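The failure mode can be reproduced in miniature. The sketch below uses toy stand-ins for a datatype and a list child field (they are not the arrow crate's types, and `list_of` and `check_strict` are hypothetical names). It only illustrates how a derived, structural equality treats two list types as different when their child fields differ in nullability alone.

```rust
// Toy model only: these are NOT the arrow crate's DataType and Field.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    List(Box<Field>),
}

// A list child field. The `nullable` flag plays the role of
// Spark's containsNull in this model.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    data_type: DataType,
    nullable: bool,
}

// Hypothetical helper: build a list type with the given element nullability.
fn list_of(element: DataType, contains_null: bool) -> DataType {
    DataType::List(Box::new(Field {
        data_type: element,
        nullable: contains_null,
    }))
}

// Hypothetical strict check: the derived PartialEq compares the child
// field too, so a difference in `nullable` alone is enough to fail.
fn check_strict(left: &DataType, right: &DataType) -> Result<(), String> {
    if left == right {
        Ok(())
    } else {
        Err(format!("incompatible types: {:?}, {:?}", left, right))
    }
}
```

Under this model, `check_strict(&list_of(DataType::Int32, true), &list_of(DataType::Int32, false))` returns an `Err`, even though both sides are lists of 32-bit integers.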

The proposed fix treats list child nullability as a Spark/Arrow representation detail for array_except compatibility. Before delegating to DataFusion's implementation, Comet normalizes list child fields to nullable so inputs that differ only by containsNull can pass DataFusion's datatype check.
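The normalization idea can be sketched as follows, again with toy types rather than the arrow crate's. The names `normalize_list_nullability` and `compatible_after_normalization` are hypothetical, and the recursion into nested lists is an assumption; the PR text does not say whether nested children are normalized.

```rust
// Toy model only: these are NOT the arrow crate's DataType and Field.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    List(Box<Field>),
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    data_type: DataType,
    nullable: bool,
}

// Hypothetical helper: build a list type with the given element nullability.
fn list_of(element: DataType, contains_null: bool) -> DataType {
    DataType::List(Box::new(Field {
        data_type: element,
        nullable: contains_null,
    }))
}

// Rewrite every list child field as nullable. Recursing into nested
// lists is an assumption of this sketch, not something the PR states.
fn normalize_list_nullability(dt: &DataType) -> DataType {
    match dt {
        DataType::List(field) => DataType::List(Box::new(Field {
            data_type: normalize_list_nullability(&field.data_type),
            nullable: true,
        })),
        other => other.clone(),
    }
}

// Two types are treated as compatible when they are equal after
// normalization, i.e. they differ at most in list child nullability.
fn compatible_after_normalization(left: &DataType, right: &DataType) -> bool {
    normalize_list_nullability(left) == normalize_list_nullability(right)
}
```

Widening to nullable is the safe direction: a non-null element array is still valid under a nullable field, whereas narrowing a nullable field to non-null could misdescribe data that contains nulls.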

Development

Successfully merging this pull request may close these issues.

arrays_except type mismatch

1 participant