Describe the problem
The CometUDF trait (common/src/main/scala/org/apache/comet/udf/CometUDF.scala, introduced on the JVM-scalar-UDF prototype branch prototype-jvm-scalar-udf) is too thin to support fail-fast validation:
trait CometUDF {
  def evaluate(inputs: Array[ValueVector]): ValueVector
}
Today the proto carries class_name, args, return_type, and return_nullable. There is no compile-time or registration-time check that the UDF actually consumes the declared input types or produces the declared return type. A mismatch (e.g. the UDF returns an IntVector but the proto declares Boolean) only manifests as a hard JVM crash inside Arrow's from_ffi on the native side.
Describe the potential solution
Extend the trait with self-describing methods:
trait CometUDF {
  def inputTypes: Seq[DataType]
  def returnType: DataType
  def nullable: Boolean
  def evaluate(inputs: Array[ValueVector]): ValueVector
}
The bridge can then validate that inputs.length == inputTypes.length and that each input vector's Arrow type matches inputTypes(i) before dispatch, and that the returned vector matches returnType before export. A mismatch would then produce a Java exception that surfaces as a clean DataFusion error rather than crashing the executor.
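A minimal sketch of the validation described above, to make the proposed checks concrete. It uses stand-in types (`DataType`, `Vec`) so it compiles without Arrow or Spark on the classpath; the real bridge would compare the Arrow type of each `ValueVector` instead. `UdfBridge` and `invoke` are hypothetical names, not existing Comet APIs.

```scala
// Stand-ins for Spark's DataType and Arrow's ValueVector (assumptions for
// illustration only; the real trait would use the actual library types).
sealed trait DataType
case object IntegerType extends DataType
case object BooleanType extends DataType

// Hypothetical minimal vector that only carries its logical type and values.
final case class Vec(dataType: DataType, values: Seq[Any])

trait CometUDF {
  def inputTypes: Seq[DataType]
  def returnType: DataType
  def nullable: Boolean
  def evaluate(inputs: Array[Vec]): Vec
}

// Hypothetical bridge helper: checks arity and types around evaluate.
object UdfBridge {
  def invoke(udf: CometUDF, inputs: Array[Vec]): Vec = {
    // Arity check before dispatch.
    require(
      inputs.length == udf.inputTypes.length,
      s"expected ${udf.inputTypes.length} inputs, got ${inputs.length}")

    // Per-argument type check before dispatch.
    inputs.indices.foreach { i =>
      require(
        inputs(i).dataType == udf.inputTypes(i),
        s"input $i: expected ${udf.inputTypes(i)}, got ${inputs(i).dataType}")
    }

    // Return-type check before the result would be exported.
    val out = udf.evaluate(inputs)
    if (out.dataType != udf.returnType) {
      throw new IllegalStateException(
        s"UDF returned ${out.dataType} but declares ${udf.returnType}")
    }
    out
  }
}
```

With this shape, a UDF that declares `BooleanType` but returns an integer vector fails with an ordinary exception inside the JVM, before anything crosses the FFI boundary.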
A registry could also be added so users register CometUDF implementations by name, decoupling the convert path from Class.forName lookups and making the API surface explicit.
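One possible shape for such a registry, as a sketch only. `CometUDFRegistry`, `register`, and `lookup` are hypothetical names, and `CometUDF` is reduced to an empty stand-in trait here so the example is self-contained.

```scala
import scala.collection.concurrent.TrieMap

// Stand-in for the real trait (assumption for illustration only).
trait CometUDF

// Hypothetical name-based registry; TrieMap keeps registration thread-safe.
object CometUDFRegistry {
  private val udfs = TrieMap.empty[String, CometUDF]

  // Rejects duplicate names so a later registration cannot silently
  // replace an existing UDF.
  def register(name: String, udf: CometUDF): Unit = {
    if (udfs.putIfAbsent(name, udf).isDefined) {
      throw new IllegalArgumentException(s"UDF '$name' is already registered")
    }
  }

  // Returns None for unknown names so the caller can decide how to fail.
  def lookup(name: String): Option[CometUDF] = udfs.get(name)
}
```

Whether duplicate registration should be an error or an overwrite is a design choice left open here; the sketch picks the stricter option.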
Additional context
Identified during code review of the JVM-scalar-UDF prototype. Filed for follow-up so the prototype PR can ship without a public-API design decision.