Fix backslash escaping in Hive string literals for Trino conversion #583
Open
In Hive SQL, backslash is an escape character in string literals: '\\d' represents the string \d (a single backslash followed by d). Previously, Coral only unescaped \' and \" but not \\, so double backslashes were preserved in the internal representation. When that representation was output as Trino SQL (where backslash has no special meaning), the extra backslash produced incorrect regex patterns in REGEXP_LIKE calls.

Renamed removeBackslashBeforeQuotes to unescapeHiveStringLiteral and extended it to also handle the \\ -> \ escape sequence.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- testRlikeBackslashEscapingWithColumn: column reference with regex
- testRegexpBackslashEscaping: REGEXP synonym with \w pattern
- testStringLiteralWithEscapedBackslash: general string literal escaping
Summary
- Previously, Coral only unescaped `\'` and `\"` but not `\\`. In Hive SQL, `\\` is an escape sequence for a single backslash. Trino SQL does not use backslash escaping, so the preserved double backslash caused incorrect regex patterns in `REGEXP_LIKE` calls.
- Hive `'\\d{4}'` (string value `\d{4}`) was output as Trino `'\\d{4}'` (string value `\\d{4}`, which is wrong). Now correctly outputs `'\d{4}'`.
- Renamed `removeBackslashBeforeQuotes` to `unescapeHiveStringLiteral` and extended the regex to handle `\\` → `\`.

Testing Done
- `HiveToTrinoConverterTest.testRlikeBackslashEscaping`, verifying date regex pattern conversion
- `./gradlew clean build` passes (547 tasks, all tests green, including existing backslash-related tests)
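For reviewers who want to see the intended behavior in isolation, here is a minimal sketch of the unescaping step. It is not the code from this PR: the class `HiveLiteralUnescapeSketch`, the regex, and the `toTrinoLiteral` helper are assumptions made for illustration, and only the method name `unescapeHiveStringLiteral` is taken from the description. The sketch handles just the three sequences discussed here (`\\`, `\'`, `\"`) and leaves every other backslash untouched.

```java
// Illustrative sketch only; the real Coral implementation may differ.
final class HiveLiteralUnescapeSketch {

    // Regex \\([\\'"]): a backslash followed by a backslash, single quote,
    // or double quote. The escaped character is captured in group 1.
    private static final String HIVE_ESCAPE = "\\\\([\\\\'\"])";

    // Takes the body of a Hive string literal (without the surrounding quotes)
    // and returns the string value it denotes, for the three escapes above.
    static String unescapeHiveStringLiteral(String literalBody) {
        // Matches are consumed left to right, so "\\" is reduced to a single
        // backslash before any following quote is examined.
        return literalBody.replaceAll(HIVE_ESCAPE, "$1");
    }

    // Hypothetical helper: renders a string value as a standard SQL literal,
    // where the only escape is doubling a single quote.
    static String toTrinoLiteral(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    public static void main(String[] args) {
        String hiveBody = "\\\\d{4}";                        // the characters \\d{4}
        String value = unescapeHiveStringLiteral(hiveBody);  // the characters \d{4}
        System.out.println(toTrinoLiteral(value));           // prints '\d{4}'
    }
}
```

A single regex pass is used rather than chained `replace` calls so that a sequence such as `\\'` becomes `\'` instead of being unescaped twice.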