Unifying Operator Handling with the Scalar Function Framework #20018

@Acfboy

Is your feature request related to a problem or challenge?

While working on #11250, I noticed that operator handling is currently scattered across several phases and quite complex. This is the root cause of operator diagnostic messages being less user-friendly than those of scalar functions, and it may also create maintenance hurdles.

For example:

  • For SELECT 2.0 << 3.5;, constant folding occurs during the optimization phase, and the type mismatch is caught in evaluate_with_resolved_args.
  • But for SELECT 1 + 'a';, the error is caught during projection_schema in the LogicalPlan phase.
  • And for SELECT a << b FROM ...;, the error is not surfaced until the physical execution phase, in evaluate_expressions_to_arrays_with_metrics.

Describe the solution you'd like

I propose that we gradually refactor operator handling by bringing it under the same framework as scalar functions.

My initial roadmap looks like this:

  1. Start by defining signatures for simple binary operators. This allows us to reuse the existing function-based type coercion logic during LogicalPlan generation.
  2. Implement rewrite rules to gradually transform these operators into function-based calls.
  3. Extend this to more unary and binary operations.
  4. Finally, address operators like LIKE, which have unique syntax or optimization paths compared to standard binary operators.
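
Steps 1 and 2 could be sketched roughly as follows. This is only an illustration under simplifying assumptions, not DataFusion code: the types DataType, Operator, Expr, and OperatorSignature, the functions signature_for and rewrite, and the function names "add" and "shift_left" are all hypothetical stand-ins, and type coercion is reduced to "both operands must already share an accepted type".

```rust
// Hypothetical, simplified stand-ins; none of these are DataFusion's real types.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Int64,
    Float64,
    Utf8,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Operator {
    Plus,
    ShiftLeft,
}

#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String, DataType),
    Literal(DataType), // the literal's value is omitted; only its type matters here
    BinaryExpr { left: Box<Expr>, op: Operator, right: Box<Expr> },
    ScalarFunction { name: &'static str, args: Vec<Expr>, return_type: DataType },
}

// Step 1: each operator gets a signature, like a scalar function would.
struct OperatorSignature {
    func_name: &'static str,
    accepted: &'static [DataType],
}

fn signature_for(op: Operator) -> OperatorSignature {
    match op {
        Operator::Plus => OperatorSignature {
            func_name: "add",
            accepted: &[DataType::Int64, DataType::Float64],
        },
        Operator::ShiftLeft => OperatorSignature {
            func_name: "shift_left",
            accepted: &[DataType::Int64],
        },
    }
}

// Step 2: rewrite operator nodes into function calls, checking operand types
// against the signature in one place. Returns the rewritten expression and its type.
fn rewrite(expr: Expr) -> Result<(Expr, DataType), String> {
    match expr {
        Expr::Column(name, t) => Ok((Expr::Column(name, t), t)),
        Expr::Literal(t) => Ok((Expr::Literal(t), t)),
        Expr::ScalarFunction { name, args, return_type } => {
            Ok((Expr::ScalarFunction { name, args, return_type }, return_type))
        }
        Expr::BinaryExpr { left, op, right } => {
            let (left, lt) = rewrite(*left)?;
            let (right, rt) = rewrite(*right)?;
            let sig = signature_for(op);
            // Simplification: no coercion, operands must match an accepted type exactly.
            if lt != rt || !sig.accepted.contains(&lt) {
                return Err(format!(
                    "no signature of `{}` accepts ({:?}, {:?})",
                    sig.func_name, lt, rt
                ));
            }
            let call = Expr::ScalarFunction {
                name: sig.func_name,
                args: vec![left, right],
                return_type: lt,
            };
            Ok((call, lt))
        }
    }
}
```

In this sketch, the three examples above (2.0 << 3.5, 1 + 'a', and a << b with mismatched column types) would all fail in the same place, inside rewrite, with the same style of message, instead of surfacing in three different phases.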

Describe alternatives you've considered

No response

Additional context

What do the maintainers think about this direction? I would love to hear your thoughts on whether this unification aligns with the long-term vision of DataFusion, as well as any suggestions on task decomposition and how best to phase this refactor.

Labels: enhancement
