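Describe the bug
On Spark 4.0, try_url_decode(x) should return NULL when x is not a valid percent-encoded string. Comet errors instead.
Root cause
Spark 4.0 rewrites try_url_decode(x) through Catalyst's RuntimeReplaceable chain, which ends in a StaticInvoke(UrlCodec, "decode", Seq(child, Literal(false)), ...) call.
CometUrlDecodeStaticInvoke (in spark/src/main/scala/org/apache/comet/serde/statics.scala) only reads expr.children.head and emits scalarFunctionExprToProto("url_decode", child). The Literal(false) carrying the failOnError flag is silently dropped. The native SparkUrlDecode (datafusion-spark) always errors on malformed percent-encoding, so the Comet path errors where Spark would have returned NULL.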
This affects only Spark 4.0+ (try_url_decode does not exist in 3.4 / 3.5).
Reproduction (Spark 4.0)
```sql
-- Spark returns NULL
SELECT try_url_decode('http%3A%2F%2spark.apache.org');
-- With Comet enabled, this throws "Invalid percent-encoding: ..." instead of returning NULL.
```
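The reproduction comes down to two decoding semantics: one raises an error on a bad `%` escape, and the other yields NULL. Below is a minimal, std-only Rust model of both. It is a simplified sketch and not the datafusion-spark `SparkUrlDecode` / `TryUrlDecode` code. The names `url_decode_strict`, `try_url_decode` and `DecodeError` are hypothetical. Decoding `+` as a space follows `java.net.URLDecoder`. Replacing invalid UTF-8 lossily is a simplifying assumption.

```rust
/// Hypothetical error type for a malformed `%` escape.
#[derive(Debug, PartialEq)]
pub struct DecodeError(pub String);

/// Models url_decode(x) (failOnError = true): a malformed escape is an error.
pub fn url_decode_strict(input: &str) -> Result<String, DecodeError> {
    let bytes = input.as_bytes();
    let mut out: Vec<u8> = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        match bytes[i] {
            b'%' => {
                // A valid escape needs exactly two hex digits after '%'.
                let hex = bytes
                    .get(i + 1..i + 3)
                    .ok_or_else(|| DecodeError(format!("incomplete escape at byte {i}")))?;
                let hi = (hex[0] as char).to_digit(16);
                let lo = (hex[1] as char).to_digit(16);
                match (hi, lo) {
                    (Some(h), Some(l)) => out.push((h * 16 + l) as u8),
                    _ => return Err(DecodeError(format!("invalid hex digits at byte {i}"))),
                }
                i += 3;
            }
            // Assumption: '+' decodes to a space, as in java.net.URLDecoder.
            b'+' => {
                out.push(b' ');
                i += 1;
            }
            b => {
                out.push(b);
                i += 1;
            }
        }
    }
    // Simplification: invalid UTF-8 in the decoded bytes is replaced lossily.
    Ok(String::from_utf8_lossy(&out).into_owned())
}

/// Models try_url_decode(x) (failOnError = false): malformed input becomes None (SQL NULL).
pub fn try_url_decode(input: &str) -> Option<String> {
    url_decode_strict(input).ok()
}

fn main() {
    let malformed = "http%3A%2F%2spark.apache.org"; // "%2s" is not a valid escape
    println!("strict errors: {}", url_decode_strict(malformed).is_err());
    println!("try result: {:?}", try_url_decode(malformed));
    println!("valid input: {:?}", try_url_decode("http%3A%2F%2Fspark.apache.org"));
}
```

In this model, the Comet bug amounts to calling `url_decode_strict` where Spark's plan asked for `try_url_decode`.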
Suggested fix
Either of:
1. Detect the Spark 4.0 StaticInvoke(UrlCodec, "decode", Seq(child, Literal(false)), ...) shape in CometUrlDecodeStaticInvoke and emit a try_url_decode scalar function. datafusion_spark::function::url::try_url_decode::TryUrlDecode already exists in the datafusion-spark crate but is not registered in native/core/src/execution/jni_api.rs. Registering it (similar to how SparkTryParseUrl is already registered) would make the dispatch trivial.
2. Mark this StaticInvoke shape as Unsupported / Incompatible so that Comet falls back to Spark whenever failOnError=false.
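Both options come down to inspecting the second child before deciding what to emit. The Rust sketch below models that branching only. The real serde code is Scala and uses `scalarFunctionExprToProto`. All types and functions here (`Expr`, `Serde`, `serialize_current`, `serialize_dispatch`, `serialize_fallback`) are hypothetical stand-ins, not Comet APIs.

```rust
/// Hypothetical, simplified model of the StaticInvoke children.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    BoolLiteral(bool),
}

/// Hypothetical serde outcome: a native scalar function call, or a fallback to Spark.
#[derive(Debug, PartialEq)]
enum Serde {
    ScalarFunction { name: &'static str, arg: Expr },
    Unsupported(&'static str),
}

/// Current behavior: only children[0] is read, so the failOnError flag is lost.
fn serialize_current(children: &[Expr]) -> Serde {
    Serde::ScalarFunction { name: "url_decode", arg: children[0].clone() }
}

/// Option 1: an explicit Literal(false) selects try_url_decode; any other shape keeps url_decode.
fn serialize_dispatch(children: &[Expr]) -> Serde {
    let child = children[0].clone();
    match children.get(1) {
        Some(Expr::BoolLiteral(false)) => Serde::ScalarFunction { name: "try_url_decode", arg: child },
        _ => Serde::ScalarFunction { name: "url_decode", arg: child },
    }
}

/// Option 2: an explicit Literal(false) falls back to Spark; any other shape keeps url_decode.
fn serialize_fallback(children: &[Expr]) -> Serde {
    match children.get(1) {
        Some(Expr::BoolLiteral(false)) => {
            Serde::Unsupported("url_decode with failOnError=false is not supported natively")
        }
        _ => Serde::ScalarFunction { name: "url_decode", arg: children[0].clone() },
    }
}

fn main() {
    let try_shape = vec![Expr::Column("x".into()), Expr::BoolLiteral(false)];
    let strict_shape = vec![Expr::Column("x".into()), Expr::BoolLiteral(true)];
    println!("{:?}", serialize_current(&try_shape)); // url_decode: flag dropped
    println!("{:?}", serialize_dispatch(&try_shape)); // try_url_decode
    println!("{:?}", serialize_fallback(&try_shape)); // Unsupported(...)
    println!("{:?}", serialize_dispatch(&strict_shape)); // url_decode
}
```

Both variants route every shape other than an explicit `Literal(false)` down the existing `url_decode` path, so `url_decode(x)` keeps its current behavior.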
Notes
For url_decode(x) (i.e. failOnError=true, the default), Comet's behavior matches Spark: both error on malformed input. Only the error differs: Spark raises CANNOT_DECODE_URL, while DataFusion reports Invalid percent-encoding.
This was discovered while auditing the url_decode expression added in "feat: add support for parse_url, url_encode, url_decode" (#4152). That PR added a test for url_decode with malformed input. A try_url_decode test should be added once this issue is fixed.