### Describe the problem
In `RegExpLikeUDF.evaluate` (common/src/main/scala/org/apache/comet/udf/RegExpLikeUDF.scala, on the JVM-scalar-UDF prototype branch) the per-row loop calls `pattern.matcher(s).find()` synchronously and uninterruptibly. A pathological backtracking pattern can run for seconds (or longer) on a single row, during which Spark task cancellation (executor task kill, query cancel) has no effect: the batch must finish before any cancel signal is observed.
The same issue applies to any future stateful `CometUDF` that does heavy per-row work. On the native side, `JvmScalarUdfExpr::evaluate` also has no cancellation check between batches.
### Describe the potential solution
- In `JvmScalarUdfExpr::evaluate`, check DataFusion's cancellation token (via `Context`) before crossing JNI for each batch.
- In long-running UDFs, periodically check `TaskContext.get().isInterrupted()` inside the row loop (e.g. every N rows) and throw `InterruptedException` so the batch fails fast and Spark's cancel propagates.
- Document the contract on the `CometUDF` trait: implementations of long-running UDFs should poll `TaskContext.isInterrupted()` periodically.

For `RegExpLikeUDF` specifically, even checking once per batch (before the row loop) would catch most cancel windows.
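The row-loop polling described above might look like the following sketch. This is illustrative, not the actual `CometUDF` API: `CancellableRegExpLike`, `evalBatch`, and `checkInterval` are hypothetical names, and the `isInterrupted` parameter stands in for `() => TaskContext.get().isInterrupted()` so the pattern is shown without a Spark dependency.

```scala
import java.util.regex.Pattern

object CancellableRegExpLike {
  // Hypothetical helper: evaluate a regex over a batch of rows, polling an
  // interruption flag every `checkInterval` rows so a task kill is observed
  // mid-batch instead of only after the whole batch completes.
  def evalBatch(
      rows: Seq[String],
      pattern: Pattern,
      isInterrupted: () => Boolean, // in Spark: () => TaskContext.get().isInterrupted()
      checkInterval: Int = 1024): Seq[Boolean] = {
    rows.zipWithIndex.map { case (s, i) =>
      if (i % checkInterval == 0 && isInterrupted()) {
        // Fail fast so Spark's cancellation propagates through the task.
        throw new InterruptedException(s"UDF batch cancelled at row $i")
      }
      pattern.matcher(s).find()
    }
  }
}
```

Checking every N rows keeps the polling overhead negligible while still bounding how long a kill signal can go unobserved to N regex evaluations (though a single catastrophically backtracking `find()` call would still run to completion).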
### Additional context
Identified during code review of the JVM-scalar-UDF prototype. Filed as a follow-up so the prototype PR can ship without a cancellation-protocol design.