@mohanakatari119-bit

Root Cause

For LeftMark/RightMark joins:

  • The output schema is [left columns, mark column] (for LeftMark) or [right columns, mark column] (for RightMark)
  • The mark column is synthetic, added by the join itself
  • The previous implementation treated LeftMark/RightMark like regular joins
  • When split_join_requirements split indices at left_len, it incorrectly tried to route the mark column requirement to the right child
  • This caused a schema mismatch because the mark column doesn't exist in either child
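
To make the misrouting concrete, here is a minimal sketch of a plain split at left_len, using bare index vectors instead of DataFusion's own types. The helper name naive_split is hypothetical and only illustrates the behavior described above; it is not the code from the optimizer.

```rust
/// Naive split of required output-column indices at `left_len`, the way a
/// regular join divides requirements between its children.
/// Hypothetical helper for illustration; not DataFusion's actual API.
fn naive_split(indices: &[usize], left_len: usize) -> (Vec<usize>, Vec<usize>) {
    // Indices below `left_len` refer to left-child columns.
    let left: Vec<usize> = indices.iter().copied().filter(|&i| i < left_len).collect();
    // Everything else is assumed to be a right-child column and is shifted
    // down so that it is relative to the right child's schema.
    let right: Vec<usize> = indices
        .iter()
        .copied()
        .filter(|&i| i >= left_len)
        .map(|i| i - left_len)
        .collect();
    (left, right)
}
```

For a LeftMark join whose left child has 2 columns, the output schema is [l0, l1, mark]. Requiring indices [0, 2] yields ([0], [0]): the mark column at index 2 is turned into a request for column 0 of the right child, which is not the column the parent asked for.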

Solution

Modified split_join_requirements to handle LeftMark and RightMark joins specially:

  • For LeftMark: Route indices before left_len to left child, discard mark column requirement
  • For RightMark: Route indices before left_len to right child, discard mark column requirement
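
The routing above can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual patch: MarkJoin and split_mark_requirements are hypothetical names, indices are plain vectors, and side_len stands for the number of columns that precede the mark column in the join's output. The sketch also assumes nothing is requested from the other child here; columns needed by the join condition would have to be added separately.

```rust
/// Which side's columns appear in the mark join's output (hypothetical type).
#[derive(Clone, Copy)]
enum MarkJoin {
    Left,
    Right,
}

/// Returns `(left_child_indices, right_child_indices)` for a mark join.
/// `side_len` is the number of child columns that precede the mark column.
fn split_mark_requirements(
    indices: &[usize],
    side_len: usize,
    join: MarkJoin,
) -> (Vec<usize>, Vec<usize>) {
    // Keep only indices that point at real child columns; the mark column
    // (index == side_len) is synthetic, so its requirement is discarded.
    let kept: Vec<usize> = indices.iter().copied().filter(|&i| i < side_len).collect();
    match join {
        MarkJoin::Left => (kept, Vec::new()),
        MarkJoin::Right => (Vec::new(), kept),
    }
}
```

With a 2-column preserved side and required indices [0, 2], the LeftMark case returns ([0], []) and the RightMark case returns ([], [0]); in both, the mark column at index 2 never reaches a child.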

Test Plan

Added test exists_or_exists_with_join that:

  1. Creates a query with EXISTS(subquery_a) OR EXISTS(subquery_b) filter
  2. Follows it with a LEFT JOIN
  3. Runs through full optimizer pipeline
  4. Verifies no schema mismatch error occurs

Fixes #20083

Fixes schema mismatch error when EXISTS subqueries in OR conditions
are followed by join operations. LeftMark and RightMark joins have
a synthetic mark column that shouldn't be routed to child nodes.

Fixes apache#20083
@github-actions github-actions bot added the optimizer Optimizer rules label Jan 30, 2026
@mohanakatari119-bit
Author

@alamb @findepi @viirya @korowa, could you please review this?


Development

Successfully merging this pull request may close these issues.

optimize_projections faild after mark-join involved
