fix: fix string to timestamp cast for UTC timestamps #3656
parthchandra merged 5 commits into apache:main
I'm planning a follow-up PR with timezone handling and the additional formats.
      DataTypes.TimestampType,
      "Not all valid formats are supported")
  test("cast StringType to TimestampType - UTC") {
    withSQLConf(SQLConf.SESSION_LOCAL_TIMEZONE.key -> "UTC") {
Cast from string to timestamp is marked as incompatible. Does it need to be enabled here by setting the relevant config?
I intend to do that in a separate PR, after support for timezones and the other formats is also merged.
    let patterns = &[
        (
            Regex::new(r"^\d{4,5}$").unwrap(),
            Regex::new(r"^\d{4,7}$").unwrap(),
I already commented on this on the other PR, but are these regexes compiled on each invocation or are these static? I wasn't sure.
You're right, all these regexes will be compiled on each invocation. This is existing behaviour, but let me try to address it.
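For illustration, one common way to avoid per-invocation compilation is to build the pattern once in a function-local static and reuse it. The sketch below is not the code from this PR: it uses only the standard library, and `YearPattern`, `year_pattern`, `is_year_only` and `BUILD_COUNT` are hypothetical names. `YearPattern` stands in for a compiled `^\d{4,7}$` regex; with the `regex` crate the same `OnceLock` slot would hold a `regex::Regex`.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Hypothetical stand-in for a compiled pattern such as `^\d{4,7}$`.
/// With the `regex` crate this slot would hold a `regex::Regex` instead.
struct YearPattern {
    min_digits: usize,
    max_digits: usize,
}

impl YearPattern {
    /// True when `s` consists only of ASCII digits and its length is in range.
    fn is_match(&self, s: &str) -> bool {
        let len = s.len();
        len >= self.min_digits
            && len <= self.max_digits
            && s.bytes().all(|b| b.is_ascii_digit())
    }
}

/// Counts how often the pattern is built, so the caching is observable.
static BUILD_COUNT: AtomicUsize = AtomicUsize::new(0);

/// Builds the pattern on first use and returns the same instance afterwards.
fn year_pattern() -> &'static YearPattern {
    static PATTERN: OnceLock<YearPattern> = OnceLock::new();
    PATTERN.get_or_init(|| {
        BUILD_COUNT.fetch_add(1, Ordering::SeqCst);
        YearPattern { min_digits: 4, max_digits: 7 }
    })
}

/// Hypothetical caller: every invocation reuses the cached pattern.
fn is_year_only(s: &str) -> bool {
    year_pattern().is_match(s)
}
```

However many times `is_year_only` is called, the initialiser closure runs once, so `BUILD_COUNT` stays at 1. Whether this is worth doing here depends on how the patterns are used in the real cast code, which this thread only shows in part.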
Thanks @parthchandra. Could you run CometCastStringToTemporalBenchmark before and after these changes so we can see the performance impact?
…nput
comet-test-apache-spark defaults spark.sql.ansi.enabled to true, causing CAST on intentionally invalid benchmark data to throw instead of returning NULL.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Numbers are basically unchanged.
Merged. Thank you @andygrove
Which issue does this PR close?
Part of #376
Rationale for this change
Part of the support for Spark 4.0.
What changes are included in this PR?
Adds the missing ANSI support for the string to timestamp cast. Also adds a new error that explicitly reports invalid input for a cast to timestamp (previously an invalid numeric format error was reported).
Also enables the tests for UTC timestamps.
The cast is still marked as incompatible because some timestamp formats are not yet supported and timezone handling is not complete.
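As a rough illustration of the ANSI behaviour described above, the sketch below returns NULL (modelled as `Ok(None)`) for invalid input when ANSI mode is off and a dedicated invalid-input error when it is on. It is a minimal sketch under stated assumptions, not the PR's implementation: `CastError`, `parse_year_only_utc` and `cast_string_to_timestamp` are hypothetical names, the error fields are invented for the example, and only the year-only format (4 to 7 digits, UTC) is handled.

```rust
/// Hypothetical error type; the PR's actual error name and message are not shown in this thread.
#[derive(Debug, PartialEq)]
enum CastError {
    InvalidInput { value: String, to_type: &'static str },
}

const MICROS_PER_DAY: i64 = 86_400 * 1_000_000;

/// Leap years in the proleptic Gregorian calendar from year 1 through `year`.
fn leap_years_through(year: i64) -> i64 {
    year.div_euclid(4) - year.div_euclid(100) + year.div_euclid(400)
}

/// Parses a year-only string (4 to 7 ASCII digits) into microseconds since the
/// Unix epoch for midnight UTC on 1 January of that year.
/// Returns None for malformed input or when the result does not fit in an i64.
fn parse_year_only_utc(s: &str) -> Option<i64> {
    if !(4..=7).contains(&s.len()) || !s.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    let year: i64 = s.parse().ok()?;
    let days = 365 * (year - 1970) + leap_years_through(year - 1) - leap_years_through(1969);
    days.checked_mul(MICROS_PER_DAY)
}

/// Non-ANSI mode maps invalid input to NULL; ANSI mode raises an error instead.
fn cast_string_to_timestamp(s: &str, ansi_enabled: bool) -> Result<Option<i64>, CastError> {
    match parse_year_only_utc(s) {
        Some(micros) => Ok(Some(micros)),
        None if ansi_enabled => Err(CastError::InvalidInput {
            value: s.to_string(),
            to_type: "TIMESTAMP",
        }),
        None => Ok(None),
    }
}
```

With these definitions, `cast_string_to_timestamp("not a date", false)` yields `Ok(None)`, while the same call with `ansi_enabled` set to `true` yields the invalid-input error. That is the same difference that made the benchmark throw once `spark.sql.ansi.enabled` defaulted to true.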
How are these changes tested?
Updated the unit test.