Describe the problem
CometUdfBridge.evaluate (common/src/main/java/org/apache/comet/udf/CometUdfBridge.java, on the JVM-scalar-UDF prototype branch) allocates output Arrow vectors via the project-wide CometArrowAllocator. That allocator is a RootAllocator that is not registered with Spark's TaskMemoryManager, so off-heap memory consumed by the UDF dispatch path is invisible to Spark's task memory accounting and back-pressure machinery.
Under workloads with many concurrent JVM-UDF tasks per executor, this can drive native off-heap usage past the operator-level limits Spark would otherwise enforce.
Describe the potential solution
Either:
1. Register CometArrowAllocator as a MemoryConsumer in Spark's TaskMemoryManager so allocations and frees update the task's accounting.
2. Allocate UDF output vectors from a child allocator that is itself registered as a per-task consumer, so leakage and accounting stay scoped to the task.

Option (2) is closer to the existing Spark-Arrow integration pattern.
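For illustration, a minimal sketch of what option (2) could look like. This is not a proposed implementation; the class name `TaskScopedArrowConsumer`, the child-allocator naming, and the `maxPerTaskBytes` limit are all hypothetical, and the only APIs assumed are Arrow's `BufferAllocator.newChildAllocator`/`AllocationListener` and Spark's `MemoryConsumer`/`TaskMemoryManager`:

```java
import org.apache.arrow.memory.AllocationListener;
import org.apache.spark.memory.MemoryConsumer;
import org.apache.spark.memory.MemoryMode;
import org.apache.spark.memory.TaskMemoryManager;

/**
 * Hypothetical sketch: a per-task MemoryConsumer whose AllocationListener
 * forwards Arrow allocation/release events into Spark's task accounting.
 */
final class TaskScopedArrowConsumer extends MemoryConsumer {

  TaskScopedArrowConsumer(TaskMemoryManager tmm) {
    super(tmm, tmm.pageSizeBytes(), MemoryMode.OFF_HEAP);
  }

  // Attach this listener to a child allocator so every Arrow allocation
  // and free updates the task's bookkeeping (and can trigger Spark's
  // back-pressure/spill machinery on acquire).
  final AllocationListener listener = new AllocationListener() {
    @Override
    public void onAllocation(long size) {
      acquireMemory(size); // may grant less than requested under pressure
    }

    @Override
    public void onRelease(long size) {
      freeMemory(size);
    }
  };

  @Override
  public long spill(long size, MemoryConsumer trigger) {
    // Arrow buffers backing an in-flight UDF batch are not spillable here.
    return 0L;
  }
}

// Usage sketch (names hypothetical): carve a task-scoped child allocator
// off the shared root so leakage stays attributable to the task.
//
//   TaskScopedArrowConsumer consumer =
//       new TaskScopedArrowConsumer(taskContext.taskMemoryManager());
//   BufferAllocator child = CometArrowAllocator.newChildAllocator(
//       "comet-udf-task-" + taskId, consumer.listener, 0, maxPerTaskBytes);
```

Closing the child allocator at task end would also surface any unreleased UDF output buffers as a leak scoped to that task rather than to the process-wide root.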
Additional context
Identified during code review of the JVM-scalar-UDF prototype. Filed as a follow-up so the prototype PR can ship without a Spark-integration redesign.