Skip to content

Relax OneOf schema comparison to ignore nullability differences#37

Merged
zhuqi-lucas merged 1 commit into
branch-52from
fix/oneof-nullability-mismatch
Apr 1, 2026
Merged

Relax OneOf schema comparison to ignore nullability differences#37
zhuqi-lucas merged 1 commit into
branch-52from
fix/oneof-nullability-mismatch

Conversation

@zhuqi-lucas
Copy link
Copy Markdown
Collaborator

Summary

OneOf candidate schema comparison now ignores nullability differences, only checking field names and data types.

Problem

After DF 52's removal of SchemaAdapter, FileScanTable must force all file columns nullable to pass RecordBatch::try_new_with_options validation. This causes OneOf candidates to have mismatched nullability:

  • day_aggs_v3: date is a partition column → non-nullable
  • day_aggs_alltime_v3: date is a file column → forced nullable

OneOf's strict schema != schema comparison rejected these as incompatible, causing:

Error during planning: candidate logical plans should have the same schema

Fix

Replace strict PartialEq schema comparison with schemas_equal_ignoring_nullability() that only checks field names and data types. Nullability differences between candidates are acceptable because they don't affect query correctness — OneOf just picks the best candidate to execute.

Tests

4 unit tests for schemas_equal_ignoring_nullability:

  • Schemas equal when only nullability differs
  • Schemas not equal when types differ
  • Schemas not equal when names differ
  • Schemas not equal when field count differs

Copilot AI review requested due to automatic review settings April 1, 2026 09:18
Different table types (FileScanTable, LiveTable, MV rollups) may expose
different nullability for the same column. For example:
- A partition column in one table is non-nullable
- The same column in a rollup MV is a file column, forced nullable for
  DF 52 RecordBatch validation compatibility

Previously, OneOf rejected candidates with any schema difference
including nullability. Now it only checks field names and data types,
allowing nullable/non-nullable variants of the same column.
@zhuqi-lucas zhuqi-lucas force-pushed the fix/oneof-nullability-mismatch branch from b64bbd7 to 7202ae6 Compare April 1, 2026 09:18
@zhuqi-lucas zhuqi-lucas merged commit a16868e into branch-52 Apr 1, 2026
8 checks passed
@zhuqi-lucas zhuqi-lucas deleted the fix/oneof-nullability-mismatch branch April 1, 2026 09:22
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the OneOf candidate compatibility checks in the view exploitation planner to treat schemas as equal even when they differ only by field nullability, addressing mismatches introduced by forced-nullable file columns.

Changes:

  • Replaced strict logical/physical schema equality checks with a comparison that ignores field nullability.
  • Added schemas_equal_ignoring_nullability() helper for Arrow schemas.
  • Added unit tests covering equality/inequality cases for the new helper.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +257 to +261
.any(|schema| {
!schemas_equal_ignoring_nullability(
schema.as_arrow(),
logical_inputs[0].schema().as_arrow(),
)
Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By relaxing the logical schema compatibility check to ignore nullability, OneOf can now contain branches whose DFSchema differs from branches[0].schema() (which is what OneOf::schema() reports). This makes the OneOf node’s output schema potentially stricter (non-nullable) than some candidate branches, which can violate DataFusion’s expectation that a logical node’s schema matches all its inputs and can enable incorrect optimizations based on non-nullability. Consider normalizing the OneOf schema to a merged/most-permissive schema (nullable if any branch is nullable) or otherwise ensuring the OneOf-reported schema is compatible with every branch when nullability differs.

Copilot uses AI. Check for mistakes.
Comment on lines 254 to +258
if logical_inputs
.iter()
.map(|plan| plan.schema())
.any(|schema| schema != logical_inputs[0].schema())
.any(|schema| {
!schemas_equal_ignoring_nullability(
Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: logical_inputs[0].schema().as_arrow() is recomputed inside the .any(...) closure for every input. Consider binding the baseline schema once before the iterator (and similarly for the physical schema check) to reduce repeated work and make the comparison a bit easier to read.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants