Tighten CometUDF API with input/return type validation at registration #4173

@andygrove

Describe the problem

The CometUDF trait (common/src/main/scala/org/apache/comet/udf/CometUDF.scala, introduced on the JVM-scalar-UDF prototype branch prototype-jvm-scalar-udf) is too thin to support fail-fast validation:

trait CometUDF {
  def evaluate(inputs: Array[ValueVector]): ValueVector
}

Today the proto carries class_name, args, return_type, and return_nullable. There is no compile-time or registration-time check that the UDF actually consumes the declared input types or produces the declared return type. A mismatch (e.g. UDF returns IntVector but proto declares Boolean) only manifests as a hard JVM crash inside Arrow's from_ffi on the native side.

Describe the potential solution

Extend the trait with self-describing methods:

trait CometUDF {
  def inputTypes: Seq[DataType]
  def returnType: DataType
  def nullable: Boolean
  def evaluate(inputs: Array[ValueVector]): ValueVector
}

The bridge can then check, before dispatch, that inputs.length == inputTypes.length and that each input vector's Arrow type matches inputTypes(i), and, before export, that the returned vector matches returnType. A mismatch raises a Java exception that surfaces as a clean DataFusion error rather than crashing the executor.
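
A minimal sketch of those bridge checks, for illustration only. To keep it runnable without Arrow or Spark on the classpath, DataType and Column below are toy stand-ins (the real bridge would use Spark's DataType and Arrow's ValueVector), and UdfBridge, validatedEvaluate, and IsPositive are hypothetical names that do not exist in the prototype.

```scala
// Stand-in types (assumptions): the real code would use Spark's DataType
// and Arrow's ValueVector instead of these toy definitions.
sealed trait DataType
case object IntegerType extends DataType
case object BooleanType extends DataType

// Toy column: a type tag plus its values. Not Arrow's ValueVector.
final case class Column(dataType: DataType, values: Vector[Any])

// The extended trait from this issue, expressed over the stand-in types.
trait CometUDF {
  def inputTypes: Seq[DataType]
  def returnType: DataType
  def nullable: Boolean
  def evaluate(inputs: Array[Column]): Column
}

// Hypothetical bridge helper: validate, dispatch, validate again.
object UdfBridge {
  def validatedEvaluate(udf: CometUDF, inputs: Array[Column]): Column = {
    // 1. Arity check before dispatch.
    if (inputs.length != udf.inputTypes.length)
      throw new IllegalArgumentException(
        s"expected ${udf.inputTypes.length} inputs, got ${inputs.length}")

    // 2. Per-argument type check before dispatch.
    inputs.zip(udf.inputTypes).zipWithIndex.foreach {
      case ((col, expected), i) =>
        if (col.dataType != expected)
          throw new IllegalArgumentException(
            s"input $i: expected $expected, got ${col.dataType}")
    }

    // 3. Return type check before the result would be exported.
    val result = udf.evaluate(inputs)
    if (result.dataType != udf.returnType)
      throw new IllegalStateException(
        s"UDF returned ${result.dataType}, declared ${udf.returnType}")
    result
  }
}

// Example UDF (hypothetical): Int => Boolean.
object IsPositive extends CometUDF {
  val inputTypes: Seq[DataType] = Seq(IntegerType)
  val returnType: DataType = BooleanType
  val nullable: Boolean = false
  def evaluate(inputs: Array[Column]): Column =
    Column(BooleanType, inputs(0).values.map { case i: Int => i > 0 })
}
```

With this shape, calling UdfBridge.validatedEvaluate(IsPositive, ...) with a Boolean column, or with no columns at all, throws an ordinary JVM exception before evaluate runs, which is the failure mode the issue wants in place of a crash in from_ffi.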

A registry could also be added so users register CometUDF impls by name, decoupling the convert path from Class.forName lookups and making the API's surface explicit.
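
One possible shape for such a registry, as a rough sketch rather than a proposed API. CometUDFRegistry, register, and lookup are hypothetical names, and the trait is reduced to a stand-in with string type names so the example compiles without Spark or Arrow.

```scala
import scala.collection.concurrent.TrieMap

// Reduced stand-in for the extended trait (assumption): type names are
// plain strings here instead of Spark DataType values.
trait CometUDF {
  def inputTypes: Seq[String]
  def returnType: String
}

// Hypothetical name-based registry, replacing Class.forName lookups.
object CometUDFRegistry {
  // TrieMap is a thread-safe map from the Scala standard library.
  private val udfs = TrieMap.empty[String, CometUDF]

  // Fail fast on duplicate names instead of silently overwriting.
  def register(name: String, udf: CometUDF): Unit = {
    // putIfAbsent returns Some(existing) when the name is already taken.
    if (udfs.putIfAbsent(name, udf).isDefined)
      throw new IllegalArgumentException(s"UDF '$name' is already registered")
  }

  // The convert path would resolve by name and get None for unknown UDFs.
  def lookup(name: String): Option[CometUDF] = udfs.get(name)
}
```

Because every registered UDF declares inputTypes and returnType, the convert path could compare them against the proto's args and return_type at lookup time, before any data reaches the native side.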

Additional context

Identified during code review of the JVM-scalar-UDF prototype. Filed for follow-up so the prototype PR can ship without a public-API design decision.

Labels: area:expressions (Expression evaluation), priority:medium (Functional bugs, performance regressions, broken features)
