feat: add SQL dialect support with Spark SQL dialect #150

Merged
ChunxuTang merged 6 commits into lance-format:main from yuchen-pipi:feat/spark-sql-dialect on Mar 25, 2026

@yuchen-pipi
Contributor

Summary

  • Add SqlDialect enum (Default, Spark, PostgreSql, MySql, Sqlite) and SparkDialect implementation using DataFusion's unparser Dialect trait
  • Refactor to_sql() to accept an optional dialect parameter instead of a separate method per dialect
  • Add Python API support: query.to_sql(datasets, dialect="spark")

Spark SQL dialect differences

  • Backtick identifier quoting
  • STRING type instead of VARCHAR
  • EXTRACT(field FROM expr) for date parts
  • LENGTH() instead of CHARACTER_LENGTH()
  • TIMESTAMP without timezone info
  • Subqueries in FROM require aliases
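
The differences listed above can be pictured as a small table of per-dialect settings. The Python sketch below is illustrative only: `DialectSettings`, `quote_identifier`, `cast_to_string`, `char_length`, and `derived_table` are hypothetical names, not part of lance-graph or DataFusion, and the default-dialect values (double-quote quoting, VARCHAR, CHARACTER_LENGTH, no alias requirement) are assumptions inferred from the "instead of" wording in the list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DialectSettings:
    """Hypothetical per-dialect rendering knobs (illustration only)."""
    identifier_quote: str          # character wrapped around identifiers
    string_type: str               # type name used when casting to text
    length_function: str           # function used for string length
    requires_subquery_alias: bool  # must FROM-subqueries carry an alias?


# Assumed values: DEFAULT is inferred from the list above, not from the PR's code.
DEFAULT = DialectSettings('"', "VARCHAR", "CHARACTER_LENGTH", False)
SPARK = DialectSettings("`", "STRING", "LENGTH", True)


def quote_identifier(name: str, d: DialectSettings) -> str:
    # Escape an embedded quote character by doubling it.
    q = d.identifier_quote
    return q + name.replace(q, q + q) + q


def cast_to_string(expr: str, d: DialectSettings) -> str:
    return f"CAST({expr} AS {d.string_type})"


def char_length(expr: str, d: DialectSettings) -> str:
    return f"{d.length_function}({expr})"


def derived_table(subquery: str, alias: Optional[str], d: DialectSettings) -> str:
    # Reject an alias-less subquery when the dialect requires one.
    if alias is None:
        if d.requires_subquery_alias:
            raise ValueError("this dialect requires an alias for subqueries in FROM")
        return f"({subquery})"
    return f"({subquery}) AS {quote_identifier(alias, d)}"
```

Keeping the differences as data rather than as branching logic is one way to make a new dialect a matter of adding a settings value; whether the PR's implementation is organized this way is not shown here.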

Usage

Rust:

use lance_graph::{CypherQuery, SqlDialect};

let sql = query.to_sql(datasets, Some(SqlDialect::Spark)).await?;

Python:

sql = query.to_sql(datasets, dialect="spark")

Test plan

  • 7 new Spark SQL integration tests (backtick quoting, filters, relationships, complex queries, dialect comparison, PostgreSQL dialect)
  • 5 unit tests for SparkDialect trait implementation
  • 12 existing to_sql tests updated and passing

🤖 Generated with Claude Code

Add SqlDialect enum and SparkDialect implementation using DataFusion's
unparser Dialect trait. The to_sql() method now accepts an optional
dialect parameter to generate dialect-specific SQL from Cypher queries.

Supported dialects: Default, Spark, PostgreSQL, MySQL, SQLite.

Spark SQL differences: backtick quoting, STRING type, EXTRACT for date
parts, LENGTH instead of CHARACTER_LENGTH, required subquery aliases.

Python API updated: to_sql(datasets, dialect="spark")

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Collaborator

@ChunxuTang ChunxuTang left a comment

The motivation for adding SQL dialects makes sense to me.
Left some comments on the design of the new feature.

@codecov-commenter

codecov-commenter commented Mar 8, 2026

Codecov Report

❌ Patch coverage is 91.66667% with 7 lines in your changes missing coverage. Please review.

Files with missing lines | Patch % | Lines
crates/lance-graph/src/query.rs | 74.07% | 7 Missing ⚠️

- Use CustomDialectBuilder for Spark dialect instead of manual Dialect impl
- Move SqlDialect enum from spark_dialect.rs to query.rs as general-purpose type
- Expose SqlDialect as a Python enum instead of error-prone string parameter
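
As a rough illustration of why an enum is less error-prone than a free-form string, consider the Python sketch below. It is not the binding this PR ships: `SqlDialect` here is a plain `enum.Enum` stand-in, and `parse_dialect` is a hypothetical helper. The lowercase names are taken from the dialects named elsewhere in this PR.

```python
from enum import Enum


class SqlDialect(Enum):
    """Stand-in enum; the real Python binding may look different."""
    DEFAULT = "default"
    SPARK = "spark"
    POSTGRESQL = "postgresql"
    MYSQL = "mysql"
    SQLITE = "sqlite"


def parse_dialect(value=None):
    """Hypothetical helper: normalize None, an enum member, or a string."""
    if value is None:
        # No dialect given: fall back to the default one.
        return SqlDialect.DEFAULT
    if isinstance(value, SqlDialect):
        return value
    try:
        return SqlDialect(value.strip().lower())
    except ValueError:
        valid = ", ".join(d.value for d in SqlDialect)
        raise ValueError(
            f"unknown dialect {value!r}; expected one of: {valid}"
        ) from None
```

With a string parameter, a typo such as `"sprak"` only surfaces at call time, and only if the callee validates it. With an enum, `SqlDialect.SPRAK` fails immediately as an attribute error, and editors can autocomplete the valid members.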

Co-authored-by: Isaac
Collaborator

@ChunxuTang ChunxuTang left a comment

@yuchen-pipi
LGTM! Just left a very minor comment.
Meanwhile, could you fix the style lint error?

Move DialectUnparser and SqlDialect::unparser() from spark_dialect.rs
to query.rs to co-locate with the SqlDialect enum. Box the Spark
CustomDialect variant to fix clippy::large_enum_variant lint.

Co-authored-by: Isaac
Yu Chen added 3 commits March 23, 2026 22:57
…translation

Adds a CLI tool that translates Cypher queries into SQL for various
dialects (default, spark, postgresql, mysql, sqlite) using a JSON
config file that describes the graph schema.

Co-authored-by: Isaac
Co-authored-by: Isaac
@jja725
Contributor

jja725 commented Mar 24, 2026

@ChunxuTang The workflow passed. Would you mind merging it? We don't have the merge button on our end.

@ChunxuTang ChunxuTang merged commit b4b3be5 into lance-format:main Mar 25, 2026
8 checks passed