Describe the problem
The JVM-scalar-UDF dispatch path (on the prototype branch prototype-jvm-scalar-udf) is invisible at runtime:
- No SQL metric distinguishes "rlike ran on the JVM path" from "rlike ran natively".
- No log line on first dispatch per executor identifies which UDF is being routed and which classloader resolved it.
- EXPLAIN output for JvmScalarUdfExpr is not surfaced in Spark's plan summary.
As a result, questions such as "why is rlike slow in prod?" or "why is the JVM path being chosen?" cannot be answered without source-level debugging.
Describe the potential solution
Add SQL metrics on the parent CometNativeExec (or a dedicated metric collector keyed by UDF class name):
- numJvmUdfBatches: counter, number of batches dispatched to the JVM path
- jvmUdfElapsedNanos: accumulator, nanoseconds spent in the JNI call plus the UDF body
- jvmUdfBytesIn / jvmUdfBytesOut: optional, for transfer-cost visibility
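To make the "dedicated metric collector keyed by UDF class name" option concrete, here is a minimal plain-Java sketch. The class `JvmUdfMetrics`, its `recordBatch` method, and the UDF class name used in the usage note are hypothetical and only illustrate the shape; they are not existing Comet code, and a real implementation would presumably back these counters with Spark SQL metrics on the parent CometNativeExec rather than with `LongAdder`.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical collector: one set of counters per UDF class name. */
final class JvmUdfMetrics {

    /** Counters named after the metrics proposed in this issue. */
    static final class Counters {
        final LongAdder numJvmUdfBatches = new LongAdder();
        final LongAdder jvmUdfElapsedNanos = new LongAdder();
        final LongAdder jvmUdfBytesIn = new LongAdder();
        final LongAdder jvmUdfBytesOut = new LongAdder();
    }

    private final Map<String, Counters> byUdfClass = new ConcurrentHashMap<>();

    /** Record one dispatched batch; safe to call from multiple task threads. */
    void recordBatch(String udfClassName, long elapsedNanos, long bytesIn, long bytesOut) {
        Counters c = byUdfClass.computeIfAbsent(udfClassName, k -> new Counters());
        c.numJvmUdfBatches.increment();
        c.jvmUdfElapsedNanos.add(elapsedNanos);
        c.jvmUdfBytesIn.add(bytesIn);
        c.jvmUdfBytesOut.add(bytesOut);
    }

    /** Returns the counters for a UDF class, or null if it was never dispatched. */
    Counters get(String udfClassName) {
        return byUdfClass.get(udfClassName);
    }
}
```

The dispatch site would wrap the JNI call with `System.nanoTime()` and call something like `recordBatch("com.example.RLikeUdf", elapsed, bytesIn, bytesOut)` once per batch. Keying by class name keeps per-UDF cost visible even when several JVM UDFs share one native operator.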
Also add an info-level log line, emitted once per (executor, UDF class name) on first dispatch, naming the resolved class and the classloader that resolved it.
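The once-per-(executor, UDF class name) log could be sketched as follows, assuming one logger instance lives per executor JVM so that a process-local set is enough to deduplicate. `FirstDispatchLogger` and the message format are hypothetical, and `java.util.logging` is used only to keep the sketch self-contained; the real change would use whatever logging facility the surrounding Comet code already uses.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.logging.Logger;

/** Hypothetical helper: logs the first dispatch of each UDF class in this JVM. */
final class FirstDispatchLogger {
    private static final Logger LOG =
        Logger.getLogger(FirstDispatchLogger.class.getName());

    // Concurrent set of UDF class names already logged in this executor JVM.
    private final Set<String> seen = ConcurrentHashMap.newKeySet();

    /** Logs and returns true on the first call for a class; returns false afterwards. */
    boolean logFirstDispatch(Class<?> udfClass) {
        // Set.add returns false if the name was already present, so only one thread logs.
        if (!seen.add(udfClass.getName())) {
            return false;
        }
        // getClassLoader() returns null for classes loaded by the bootstrap loader.
        ClassLoader loader = udfClass.getClassLoader();
        LOG.info(String.format(
            "JVM scalar UDF first dispatch: class=%s classloader=%s",
            udfClass.getName(),
            loader == null ? "bootstrap" : loader.toString()));
        return true;
    }
}
```

Because the check is a single `Set.add`, the steady-state cost per batch is one hash lookup, which should be negligible next to the JNI round trip.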
Optionally, surface JvmScalarUdf(<class_name>) as the operator name in EXPLAIN by hooking the appropriate Comet plan-formatter site.
Additional context
Identified during code review of the JVM-scalar-UDF prototype. Filed as a follow-up so the prototype PR can ship without a metrics-design decision.