Skip to content

fix: fix string to timestamp cast for UTC timestamps#3656

Merged
parthchandra merged 5 commits intoapache:mainfrom
parthchandra:cast-timestamp
Mar 23, 2026
Merged

fix: fix string to timestamp cast for UTC timestamps#3656
parthchandra merged 5 commits intoapache:mainfrom
parthchandra:cast-timestamp

Conversation

@parthchandra
Copy link
Copy Markdown
Contributor

@parthchandra parthchandra commented Mar 10, 2026

Which issue does this PR close?

Part of #376

Rationale for this change

Part of support for spark 4.0

What changes are included in this PR?

Adds missing Ansi support for cast string to timestamp. Also adds a new error explicitly reporting invalid input for cast to timestamp. (previously we were reporting invalid numeric format).
Also enables the tests for UTC timestamps.
The cast is still marked invalid because some timestamp formats are still not supported. Also, timezone handling is not complete.

How are these changes tested?

Updated unit test

@parthchandra parthchandra marked this pull request as draft March 10, 2026 21:03
@parthchandra parthchandra changed the title fix: fix error reporting for string to timestamp cast fix: fix string to timestamp cast for UTC timestamps Mar 12, 2026
@parthchandra parthchandra force-pushed the cast-timestamp branch 2 times, most recently from 5dc841d to a712414 Compare March 17, 2026 21:14
@parthchandra parthchandra requested a review from andygrove March 17, 2026 21:45
@parthchandra parthchandra marked this pull request as ready for review March 17, 2026 21:45
@parthchandra
Copy link
Copy Markdown
Contributor Author

The cast is still marked invalid because some timestamp formats are still not supported. Also, timezone handling is not complete.

I'm planning a follow up PR with timezone handling and the additional formats.

DataTypes.TimestampType,
"Not all valid formats are supported")
test("cast StringType to TimestampType - UTC") {
withSQLConf(SQLConf.SESSION_LOCAL_TIMEZONE.key -> "UTC") {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cast from string to timestamp is marked as incompatible. Does it need to be enabled here by enabling the relevant config?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I intend to do that in a separate PR after the timezone and other formats support is also merged.

let patterns = &[
(
Regex::new(r"^\d{4,5}$").unwrap(),
Regex::new(r"^\d{4,7}$").unwrap(),
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I already commented on this on the other PR, but are these regexes compiled on each invocation or are these static? I wasn't sure.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right all these regexes will be compiled on each invocation. This is existing behaviour. However, let me try to address that.

@andygrove
Copy link
Copy Markdown
Member

Thanks @parthchandra. Could you run CometCastStringToTemporalBenchmark before and after these changes so we can see performance impact?

…nput

comet-test-apache-spark defaults spark.sql.ansi.enabled to true, causing
CAST on intentionally invalid benchmark data to throw instead of returning NULL.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@parthchandra
Copy link
Copy Markdown
Contributor Author

Thanks @parthchandra. Could you run CometCastStringToTemporalBenchmark before and after these changes so we can see performance impact?

Benchmark Baseline (ms) PR3656 (ms) Change
Cast String to Date 71 70 1% faster
Try_Cast String to Date 70 68 3% faster
Cast String to Timestamp 161 164 2% slower
Try_Cast String to Timestamp 162 163 1% slower

Numbers are basically unchanged.

Copy link
Copy Markdown
Member

@andygrove andygrove left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @parthchandra

@parthchandra parthchandra merged commit 4d2c398 into apache:main Mar 23, 2026
61 checks passed
@parthchandra
Copy link
Copy Markdown
Contributor Author

merged. thank you @andygrove

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants