
Fix backslash escaping in Hive string literals for Trino conversion #583

Open
zheliu2 wants to merge 2 commits into master from zheliu/fix-hive-backslash-escaping

@zheliu2 zheliu2 commented Mar 8, 2026

Summary

  • Fixes #305: escape characters are not handled correctly in regular expressions when converting Hive SQL to Trino SQL
  • Root cause: The Hive parser's string literal handler only unescaped \' and \" but not \\. In Hive SQL, \\ is an escape sequence for a single backslash. Trino SQL does not use backslash escaping, so the preserved double backslash caused incorrect regex patterns in REGEXP_LIKE calls.
  • Example: Hive '\\d{4}' (string value \d{4}) was output as Trino '\\d{4}' (string value \\d{4} — wrong). Now correctly outputs '\d{4}'.
  • Renamed removeBackslashBeforeQuotes to unescapeHiveStringLiteral and extended the regex to also handle \\ (unescaped to a single \)
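
To make the intended behavior concrete, here is a minimal standalone sketch of the unescaping rule described above. It is not the code in this PR: the class name `HiveUnescapeSketch`, the regex, and the `toTrinoLiteral` helper are assumptions made for illustration, and the real `unescapeHiveStringLiteral` in Coral may be structured differently. The sketch only covers the three sequences named in the summary (`\'`, `\"`, `\\`).

```java
import java.util.regex.Pattern;

// Hypothetical illustration only; not the Coral implementation.
final class HiveUnescapeSketch {

  // Regex: a backslash followed by a single quote, a double quote, or a backslash.
  private static final Pattern HIVE_ESCAPE = Pattern.compile("\\\\(['\"\\\\])");

  // Takes the body of a Hive string literal (without the surrounding quotes)
  // and drops the escaping backslash, keeping the escaped character.
  static String unescapeHiveStringLiteral(String literalBody) {
    return HIVE_ESCAPE.matcher(literalBody).replaceAll("$1");
  }

  // Assumed helper: renders a string value as a standard SQL literal, where
  // backslash is an ordinary character and a single quote is doubled.
  static String toTrinoLiteral(String value) {
    return "'" + value.replace("'", "''") + "'";
  }

  public static void main(String[] args) {
    // The Java literal "\\\\d{4}" is the six-character Hive body \\d{4}.
    String value = unescapeHiveStringLiteral("\\\\d{4}");
    System.out.println(toTrinoLiteral(value)); // prints '\d{4}'
  }
}
```

Because the matcher scans left to right and consumes two characters per match, a sequence such as `\\\'` is read as an escaped backslash followed by an escaped quote, not as a backslash escaping the second backslash twice.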

Testing Done

  • Unit test added in HiveToTrinoConverterTest.testRlikeBackslashEscaping verifying date regex pattern conversion
  • Full ./gradlew clean build passes (547 tasks, all tests green including existing backslash-related tests)

In Hive SQL, backslash is an escape character in string literals:
'\\d' represents the string \d (single backslash + d). Previously,
Coral only unescaped \' and \" but not \\, causing double backslashes
to be preserved in the internal representation. When outputting to
Trino SQL (where backslash has no special meaning), the extra
backslash produced incorrect regex patterns in REGEXP_LIKE calls.

Renamed removeBackslashBeforeQuotes to unescapeHiveStringLiteral and
extended it to also handle \\ -> \ escape sequences.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@zheliu2 zheliu2 marked this pull request as ready for review March 8, 2026 22:51
- testRlikeBackslashEscapingWithColumn: column reference with regex
- testRegexpBackslashEscaping: REGEXP synonym with \w pattern
- testStringLiteralWithEscapedBackslash: general string literal escaping