Make JVM-scalar-UDF dispatch responsive to task cancellation #4175

@andygrove

Describe the problem

In RegExpLikeUDF.evaluate (common/src/main/scala/org/apache/comet/udf/RegExpLikeUDF.scala, on the JVM-scalar-UDF prototype branch) the per-row loop calls pattern.matcher(s).find() synchronously and uninterruptibly. A pathological backtracking pattern can run for seconds (or longer) on a single row, during which Spark task cancellation (executor task kill, query cancel) has no effect: the batch must finish before any cancel signal is observed.

The same issue applies to any future stateful CometUDF that does heavy per-row work. On the native side, JvmScalarUdfExpr::evaluate also has no cancellation check between batches.
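To make the single-row cost concrete, here is a minimal sketch of how one find() call can grow expensive on near-miss input. The pattern `(a+)+$`, the object name `BacktrackingSketch`, and the helper `timeOneRow` are illustrative assumptions, not code from the prototype branch, and actual timings depend on the JVM version and the input.

```scala
import java.util.regex.Pattern

// Hypothetical illustration only; not part of Comet.
object BacktrackingSketch {
  // Nested quantifiers: a classic shape that can backtrack heavily
  // when the input almost matches but fails at the very end.
  val pattern: Pattern = Pattern.compile("(a+)+$")

  /** Runs a single find() on n 'a' characters followed by one 'b'.
    * Returns (matched, elapsedMillis). Nothing inside find() checks
    * for task cancellation, so the caller waits for it to return.
    */
  def timeOneRow(n: Int): (Boolean, Long) = {
    val row = ("a" * n) + "b"
    val start = System.nanoTime()
    val matched = pattern.matcher(row).find()
    (matched, (System.nanoTime() - start) / 1000000L)
  }

  def main(args: Array[String]): Unit = {
    // Sizes are kept small so the sketch finishes quickly.
    for (n <- Seq(10, 14, 18)) {
      val (matched, ms) = timeOneRow(n)
      println(s"n=$n matched=$matched elapsedMs=$ms")
    }
  }
}
```

Increasing `n` in small steps is one way to see how the time for a single row scales on a given JVM.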

Describe the potential solution

  1. In JvmScalarUdfExpr::evaluate, check DataFusion's cancellation token (via Context) before crossing JNI for each batch.
  2. In long-running UDFs, periodically check TaskContext.get().isInterrupted() inside the row loop (e.g. every N rows) and throw InterruptedException so the batch fails fast and Spark's cancel propagates.
  3. Document the contract on the CometUDF trait: implementations of long-running UDFs should poll TaskContext.get().isInterrupted() periodically.

For RegExpLikeUDF specifically, even checking once per batch (before the row loop) would catch most cancel windows.
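The JVM-side polling described in step 2 could look roughly like the sketch below. It is self-contained and does not use Comet's or Spark's real types: the object `CancellableRegexSketch`, the `evaluate` signature, the `isCancelled` callback, and the interval of 1024 rows are all assumptions made for illustration. In a real Spark task the callback might wrap `TaskContext.get()` (which is null outside a task) and its `isInterrupted()` method.

```scala
import java.util.regex.Pattern

// Hypothetical stand-in for a UDF row loop; not Comet's actual API.
object CancellableRegexSketch {
  // How often (in rows) to poll for cancellation. Arbitrary illustrative value.
  val CheckInterval = 1024

  /** Evaluates `pattern` against every row, polling `isCancelled` at row 0
    * and then every CheckInterval rows. Null rows are mapped to false here
    * purely to keep the sketch short; a real UDF would track nulls separately.
    */
  def evaluate(
      rows: Array[String],
      pattern: Pattern,
      isCancelled: () => Boolean): Array[Boolean] = {
    val out = new Array[Boolean](rows.length)
    var i = 0
    while (i < rows.length) {
      // i == 0 doubles as the once-per-batch check before any row is processed.
      if (i % CheckInterval == 0 && isCancelled()) {
        throw new InterruptedException(s"task cancelled at row $i")
      }
      val s = rows(i)
      out(i) = s != null && pattern.matcher(s).find()
      i += 1
    }
    out
  }
}
```

Polling between rows only bounds the cancellation delay by the cost of the slowest rows between two checks; it does not interrupt a single find() call that is already running.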

Additional context

Identified during code review of the JVM-scalar-UDF prototype. Filed as a follow-up so the prototype PR can ship without a cancellation-protocol design.

Metadata

Labels: area:expressions (Expression evaluation), priority:medium (Functional bugs, performance regressions, broken features)
