Extraction correctness fixes: QUALIFY scope, MERGE RETURNING, tuple SET, INSERT arity, GROUP BY ordinals#35
Merged
Merged
Conversation
QUALIFY filters on window / projection outputs, so it sees the SELECT's output aliases. It was bound against the FROM scope (in select_clause_reads), so an alias reference (`QUALIFY rn = 1` for `… AS rn`) resolved to a phantom base column `t.rn`. Bind it in bind_select against the output-aware clause_scope like HAVING — an alias binds Derived (dropped from reads, the dependency already counted at the window expr) while a real column / inline window key / subquery source still reads. QUALIFY is filter-position, so it adds reads but never lineage. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The Merge node carried no `returning`, and bind_merge ignored `merge.output`, so `MERGE … RETURNING s.b, t.c` / MSSQL `OUTPUT inserted.a, s.b` dropped its projected columns entirely (no reads, no lineage, no diagnostic). Add a `returning` field to the Merge plan node (like Insert/Update/Delete), bind the OUTPUT clause's select items over the target + source scope, and walk them for reads + `QueryOutput` lineage. MSSQL `OUTPUT … INTO <table>` (a secondary write) and the `inserted` / `deleted` pseudo-tables are not modelled yet. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A `SELECT * REPLACE (price * 1.1 AS price)` discarded its REPLACE clause entirely — the replacement expression's reads and lineage were lost. Each REPLACE element is a real value-producing output (the wildcard's same-named column with its value swapped), so bind it as a named projection output, like a standalone `expr AS col`: its reads / lineage are identical. The wildcard's own columns stay unexpanded (WildcardSuppressed), so the replacement's output position is best-effort. bind_select_item now returns Vec<NamedExpr> (a wildcard can yield several REPLACE outputs); the three call sites flat_map over it. Documented the behavior + position caveat in the crate Limitations. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
bind_delete ignored the MySQL `ORDER BY` / `LIMIT` tail, so `DELETE … ORDER BY priority LIMIT 5` dropped `priority` from reads. Bind both as filter-position reads (ORDER BY keys reference the target's columns → reads; a constant LIMIT adds none). bind_update gets the same treatment for its `LIMIT` (parity; no observable read since LIMIT is a row count, but no longer silently dropped). Audit of the other DML clauses: INSERT ON DUPLICATE / ON CONFLICT, the SET form, and MERGE RETURNING are already bound; INSERT PARTITION stays intentionally unextracted. UPDATE has no ORDER BY in this sqlparser version. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
In value position, `a IN (list)` flowed `a` to the output (a boolean transformation), but `a IN (subquery)` flowed nothing — its origins were empty. Treat the InSubquery LHS as a value operand (a Transformation), like InList; the subquery's columns stay a membership test (reads, no origin). Fixed in both origins_of_expr and conflict_value_origins. Filter-position IN is unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A tuple assignment target dropped its whole assignment (no writes, the RHS reads/lineage lost). Expand a tuple `SET (a, b) = …` to one Assignment per target column, paired positionally with the RHS: a row value `(e0, e1)` element-wise, a `(SELECT x, y)` subquery by output column. Generalize Expr::Subquery with an `output` index (0 for a scalar; the i-th for each tuple target), reusing trace_nth_output — so `(a,b)=(SELECT x,y)` flows a←x, b←y (and over a UNION, each branch's i-th output), with the shared subquery read once. Applies to UPDATE, MERGE WHEN UPDATE, and ON CONFLICT / ON DUPLICATE SET. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ural ones The normalizer's "every literal → ?" pass also rewrites JSON paths, CAST FORMAT strings, and TABLESAMPLE counts — consistent with how the same literals normalize as function arguments (JSON_EXTRACT path, TO_CHAR format) and with LIMIT/OFFSET. Document this neutrally so consumers know queries differing only in such a literal collapse together. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`generate_series(1, 10) AS g` parses as TableFactor::Table with args, which was bound as a real-table Scan — so the function name appeared in table reads and a ref through its alias (`g.value`) surfaced as a phantom base read `generate_series.value`. Bind it as an opaque table-producing factor (like UNNEST / TableFactor::Function): its args are reads, a ref through its alias is a synthetic source, and the name is not a real table. This matches the documented "table functions are opaque" contract and the UNNEST path. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ources The InsertColumnsArityMismatch diagnostic only fired for an explicit column list against a SELECT source. Extend it: derive the source value count from a VALUES row width too, and check it for plain INSERT, MERGE WHEN-INSERT, and the column-less catalog-filled case. An explicit list is flagged on any mismatch (both directions are a hard error in real engines); a column-less target is flagged only when the source overflows it (a narrower source may fill the first N columns + defaults, e.g. PostgreSQL). New diagnose_insert_arity helper. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A positional ordinal key (`GROUP BY 1`, `ORDER BY 2`) bound to an empty literal, so it added no read — `GROUP BY 1` undercounted vs `GROUP BY a` (occurrence-based reads). Resolve a plain positive-integer key to the 1-based n-th query output and bind it by name, reusing clause-alias resolution: an identity output reads its column again, an introduced alias binds Derived (suppressed, the dependency is at the projection). Out-of-range / non-integer / unnamed positions bind as written. New ordinal_output_name helper. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #35 +/- ##
==========================================
+ Coverage 94.30% 94.87% +0.57%
==========================================
Files 27 27
Lines 5054 5214 +160
Branches 5054 5214 +160
==========================================
+ Hits 4766 4947 +181
+ Misses 212 192 -20
+ Partials 76 75 -1 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
Add coverage for constructs the suite didn't exercise: a function called with a subquery argument, a named-window `OVER w` reference with frame-bound expressions, Hive `CLUSTER BY` / `DISTRIBUTE BY`, and an `ON CONFLICT DO UPDATE` value that is a subquery. Add unit tests for `TableReference`'s public `TryFrom` conversions (`&ObjectName` / `&TableFactor` / `&Insert`, plus the derived-factor error path) and three-part-name `Display`. Lifts the weakest module (reference.rs 83% -> 95% region, functions -> 100%); total region 94.2% -> 94.7%, line 95.6% -> 96.1%. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Merged
takaebato
added a commit
that referenced
this pull request
Jun 28, 2026
## 🤖 New release
* `sql-insight`: 0.2.0 -> 0.3.0 (⚠ API breaking changes)
* `sql-insight-cli`: 0.2.0 -> 0.2.1
### ⚠ `sql-insight` breaking changes
```text
--- failure constructible_struct_adds_field: externally-constructible struct adds field ---
Description:
A pub struct constructible with a struct literal has a new pub field. Existing struct literals must be updated to include the new field.
ref: https://doc.rust-lang.org/reference/expressions/struct-expr.html
impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.48.0/src/lints/constructible_struct_adds_field.ron
Failed in:
field CrudTables.diagnostics in /tmp/.tmpY44O3z/sql-insight/sql-insight/src/extractor/crud_table_extractor.rs:83
field NormalizerOptions.alphabetize_insert_columns in /tmp/.tmpY44O3z/sql-insight/sql-insight/src/normalizer.rs:94
--- failure function_missing: pub fn removed or renamed ---
Description:
A publicly-visible function cannot be imported by its prior path. A `pub use` may have been removed, or the function itself may have been renamed or removed entirely.
ref: https://doc.rust-lang.org/cargo/reference/semver.html#item-remove
impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.48.0/src/lints/function_missing.ron
Failed in:
function sql_insight::extractor::crud_table_extractor::extract_crud_tables, previously in file /tmp/.tmpsfoEJJ/sql-insight/src/extractor/crud_table_extractor.rs:28
function sql_insight::crud_table_extractor::extract_crud_tables, previously in file /tmp/.tmpsfoEJJ/sql-insight/src/extractor/crud_table_extractor.rs:28
function sql_insight::extract_crud_tables, previously in file /tmp/.tmpsfoEJJ/sql-insight/src/extractor/crud_table_extractor.rs:28
function sql_insight::extractor::table_extractor::extract_tables, previously in file /tmp/.tmpsfoEJJ/sql-insight/src/extractor/table_extractor.rs:27
function sql_insight::table_extractor::extract_tables, previously in file /tmp/.tmpsfoEJJ/sql-insight/src/extractor/table_extractor.rs:27
function sql_insight::extractor::extract_tables, previously in file /tmp/.tmpsfoEJJ/sql-insight/src/extractor/table_extractor.rs:27
function sql_insight::extract_tables, previously in file /tmp/.tmpsfoEJJ/sql-insight/src/extractor/table_extractor.rs:27
function sql_insight::format, previously in file /tmp/.tmpsfoEJJ/sql-insight/src/formatter.rs:21
function sql_insight::normalize_with_options, previously in file /tmp/.tmpsfoEJJ/sql-insight/src/normalizer.rs:43
function sql_insight::normalize, previously in file /tmp/.tmpsfoEJJ/sql-insight/src/normalizer.rs:26
--- failure inherent_method_missing: pub method removed or renamed ---
Description:
A publicly-visible method or associated fn is no longer available under its prior name. It may have been renamed or removed entirely.
ref: https://doc.rust-lang.org/cargo/reference/semver.html#item-remove
impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.48.0/src/lints/inherent_method_missing.ron
Failed in:
TableReference::has_alias, previously in file /tmp/.tmpsfoEJJ/sql-insight/src/extractor/table_extractor.rs:46
TableReference::has_qualifiers, previously in file /tmp/.tmpsfoEJJ/sql-insight/src/extractor/table_extractor.rs:49
--- failure method_parameter_count_changed: pub method parameter count changed ---
Description:
A publicly-visible method now takes a different number of parameters, not counting the receiver (self) parameter.
ref: https://doc.rust-lang.org/cargo/reference/semver.html#fn-change-arity
impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.48.0/src/lints/method_parameter_count_changed.ron
Failed in:
sql_insight::formatter::Formatter::format takes 2 parameters in /tmp/.tmpsfoEJJ/sql-insight/src/formatter.rs:31, but now takes 3 parameters in /tmp/.tmpY44O3z/sql-insight/sql-insight/src/formatter.rs:93
--- failure module_missing: pub module removed or renamed ---
Description:
A publicly-visible module cannot be imported by its prior path. A `pub use` may have been removed, or the module may have been renamed, removed, or made non-public.
ref: https://doc.rust-lang.org/cargo/reference/semver.html#item-remove
impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.48.0/src/lints/module_missing.ron
Failed in:
mod sql_insight::extractor::crud_table_extractor, previously in file /tmp/.tmpsfoEJJ/sql-insight/src/extractor/crud_table_extractor.rs:1
mod sql_insight::crud_table_extractor, previously in file /tmp/.tmpsfoEJJ/sql-insight/src/extractor/crud_table_extractor.rs:1
mod sql_insight::extractor::table_extractor, previously in file /tmp/.tmpsfoEJJ/sql-insight/src/extractor/table_extractor.rs:1
mod sql_insight::table_extractor, previously in file /tmp/.tmpsfoEJJ/sql-insight/src/extractor/table_extractor.rs:1
mod sql_insight::extractor::helper, previously in file /tmp/.tmpsfoEJJ/sql-insight/src/extractor/helper.rs:1
mod sql_insight::helper, previously in file /tmp/.tmpsfoEJJ/sql-insight/src/extractor/helper.rs:1
--- failure struct_missing: pub struct removed or renamed ---
Description:
A publicly-visible struct cannot be imported by its prior path. A `pub use` may have been removed, or the struct itself may have been renamed or removed entirely.
ref: https://doc.rust-lang.org/cargo/reference/semver.html#item-remove
impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.48.0/src/lints/struct_missing.ron
Failed in:
struct sql_insight::extractor::table_extractor::TableExtractor, previously in file /tmp/.tmpsfoEJJ/sql-insight/src/extractor/table_extractor.rs:156
struct sql_insight::table_extractor::TableExtractor, previously in file /tmp/.tmpsfoEJJ/sql-insight/src/extractor/table_extractor.rs:156
struct sql_insight::extractor::TableExtractor, previously in file /tmp/.tmpsfoEJJ/sql-insight/src/extractor/table_extractor.rs:156
struct sql_insight::TableExtractor, previously in file /tmp/.tmpsfoEJJ/sql-insight/src/extractor/table_extractor.rs:156
struct sql_insight::extractor::table_extractor::TableReference, previously in file /tmp/.tmpsfoEJJ/sql-insight/src/extractor/table_extractor.rs:38
struct sql_insight::table_extractor::TableReference, previously in file /tmp/.tmpsfoEJJ/sql-insight/src/extractor/table_extractor.rs:38
struct sql_insight::extractor::TableReference, previously in file /tmp/.tmpsfoEJJ/sql-insight/src/extractor/table_extractor.rs:38
struct sql_insight::extractor::crud_table_extractor::CrudTables, previously in file /tmp/.tmpsfoEJJ/sql-insight/src/extractor/crud_table_extractor.rs:37
struct sql_insight::crud_table_extractor::CrudTables, previously in file /tmp/.tmpsfoEJJ/sql-insight/src/extractor/crud_table_extractor.rs:37
struct sql_insight::CrudTables, previously in file /tmp/.tmpsfoEJJ/sql-insight/src/extractor/crud_table_extractor.rs:37
struct sql_insight::extractor::table_extractor::Tables, previously in file /tmp/.tmpsfoEJJ/sql-insight/src/extractor/table_extractor.rs:140
struct sql_insight::table_extractor::Tables, previously in file /tmp/.tmpsfoEJJ/sql-insight/src/extractor/table_extractor.rs:140
struct sql_insight::extractor::Tables, previously in file /tmp/.tmpsfoEJJ/sql-insight/src/extractor/table_extractor.rs:140
struct sql_insight::Tables, previously in file /tmp/.tmpsfoEJJ/sql-insight/src/extractor/table_extractor.rs:140
struct sql_insight::Normalizer, previously in file /tmp/.tmpsfoEJJ/sql-insight/src/normalizer.rs:80
struct sql_insight::Formatter, previously in file /tmp/.tmpsfoEJJ/sql-insight/src/formatter.rs:27
struct sql_insight::NormalizerOptions, previously in file /tmp/.tmpsfoEJJ/sql-insight/src/normalizer.rs:53
struct sql_insight::extractor::crud_table_extractor::CrudTableExtractor, previously in file /tmp/.tmpsfoEJJ/sql-insight/src/extractor/crud_table_extractor.rs:70
struct sql_insight::crud_table_extractor::CrudTableExtractor, previously in file /tmp/.tmpsfoEJJ/sql-insight/src/extractor/crud_table_extractor.rs:70
struct sql_insight::CrudTableExtractor, previously in file /tmp/.tmpsfoEJJ/sql-insight/src/extractor/crud_table_extractor.rs:70
--- failure struct_pub_field_missing: pub struct's pub field removed or renamed ---
Description:
A publicly-visible struct has at least one public field that is no longer available under its prior name. It may have been renamed or removed entirely.
ref: https://doc.rust-lang.org/cargo/reference/semver.html#item-remove
impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.48.0/src/lints/struct_pub_field_missing.ron
Failed in:
field alias of struct TableReference, previously in file /tmp/.tmpsfoEJJ/sql-insight/src/extractor/table_extractor.rs:42
```
<details><summary><i><b>Changelog</b></i></summary><p>
## `sql-insight`
<blockquote>
##
[0.3.0](v0.2.0...v0.3.0)
- 2026-06-28
### Added
- bucket REPLACE / INSERT OVERWRITE as Create + Delete in CRUD by
@takaebato in #36
- add Catalog::from_ddl_with_casing for casing-override alignment by
@takaebato in #36
- fan in NATURAL JOIN merge columns (catalog-aware) by @takaebato in #36
- count GROUP BY / ORDER BY positional ordinals as reads by @takaebato
in #35
- flag INSERT/MERGE arity mismatches for VALUES and column-less sources
by @takaebato in #35
- carry catalog resolution on written columns and lineage targets by
@takaebato in #34
- read the LIKE / CLONE shape source; CLONE feeds lineage by @takaebato
in #34
- carry catalog resolution on write targets (TableWrite) by @takaebato
in #34
### Fixed
- correct extraction edge cases in CTEs, DML targets, lineage, and
catalog ([#37](#37)) by
@takaebato in #37
- don't duplicate a recursive CTE's anchor diagnostics by @takaebato in
#36
- flag a non-table UPDATE target instead of dropping it silently by
@takaebato in #36
- normalize a TOP constant quantity by @takaebato in #36
- flow the WITHIN GROUP value of an ordered-set aggregate by @takaebato
in #36
- treat a parameterized FROM function as an opaque table factor by
@takaebato in #35
- extract tuple SET (a, b) = … assignments by @takaebato in #35
- flow the LHS of IN (subquery) in value position by @takaebato in #35
- bind DELETE/UPDATE ORDER BY and LIMIT clauses by @takaebato in #35
- extract wildcard REPLACE (expr AS col) outputs by @takaebato in #35
- extract MERGE RETURNING / OUTPUT columns by @takaebato in #35
- bind QUALIFY against the post-projection scope by @takaebato in #35
- model DML reads as a data-flow source/sink split by @takaebato in #34
- attribute a multi-table UPDATE's SET targets to their own tables by
@takaebato in #34
### Documentation
- note that normalization replaces all literals, including structural
ones by @takaebato in #35
### Other Changes
- add PR-title lint and release-plz release automation
([#40](#40)) by @takaebato
in #40
- cover untested SQL constructs and public reference conversions by
@takaebato in #35
- remove the redundant flat table extractor by @takaebato in #34
- migrate extractor/mod.rs to extractor.rs
([#25](#25)) by @takaebato
in #25
</blockquote>
## `sql-insight-cli`
<blockquote>
##
[0.2.1](sql-insight-cli-v0.2.0...sql-insight-cli-v0.2.1)
- 2026-06-28
### Added
- *(cli)* add --pretty option to format command
([#44](#44)) by @takaebato
in #44
- carry catalog resolution on written columns and lineage targets by
@takaebato in #34
- *(cli)* show catalog resolution on write targets by @takaebato in #34
- carry catalog resolution on write targets (TableWrite) by @takaebato
in #34
### Fixed
- correct extraction edge cases in CTEs, DML targets, lineage, and
catalog ([#37](#37)) by
@takaebato in #37
### Other Changes
- add PR-title lint and release-plz release automation
([#40](#40)) by @takaebato
in #40
- remove the redundant flat table extractor by @takaebato in #34
</blockquote>
</p></details>
---
This PR was generated with
[release-plz](https://github.com/release-plz/release-plz/).
---------
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Takahiro Ebato <takahiro.ebato@gmail.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
A batch of extraction-correctness fixes, each its own commit with regression tests. The full suite stays green (669) and the CI quartet (fmt / clippy
-D warnings/ test / doc-Dwarnings) is clean throughout.Correctness fixes (wrong / missing output)
QUALIFYwas bound against the FROM scope, so an output-alias reference (… AS rn … QUALIFY rn = 1) surfaced a phantom base readt.rn. Now bound against the post-projection scope likeHAVING: an alias bindsDerived(dropped), while a real column / inline-window key / subquery source still reads.MERGE … RETURNING s.b, t.c(and MSSQLOUTPUT) discarded its projected columns. TheMergenode gained areturning; its select items are walked for reads +QueryOutputlineage like the other DML roots. (OUTPUT … INTO <table>and theinserted/deletedpseudo-tables remain out of scope.)REPLACE (expr AS col)dropped — the replacement expression's reads / lineage were lost. Each replacement now binds as a named projection output (identical to a standaloneexpr AS col);bind_select_itemreturnsVec<NamedExpr>. The*stays unexpanded, so the output position is best-effort (documented).DELETE … ORDER BY priority LIMIT 5tail droppedpriority. ORDER BY keys + LIMIT now bind as filter-position reads (a constant LIMIT adds none; bound for parity).IN (subquery)LHS lineage asymmetry — in value positiona IN (list)flowedabuta IN (subquery)flowed nothing. The InSubquery LHS now flows as a value operand (a transformation), symmetric with InList; the subquery stays a filter.SET (a, b) = …dropped — a tuple assignment target dropped the whole assignment. It now expands to one assignment per target column: a row value pairs element-wise, a(SELECT x, y)subquery by output column (Expr::Subquerygained anoutputindex, reusing the positional output trace — UNION-safe), with the shared subquery read once.generate_series(1, 10) AS gwas scanned as a real table, surfacing a phantomgenerate_series.valuebase read. Now bound as an opaque table-producing factor (likeUNNEST), matching the documented "table functions are opaque" contract.Diagnostics / occurrence consistency
InsertColumnsArityMismatchonly fired for an explicit column list against a SELECT source. Extended to VALUES rows, MERGE WHEN-INSERT, and the column-less catalog-overflow case (explicit lists flag either direction; column-less only on overflow, since a narrower source may fill the first N columns + defaults). Newdiagnose_insert_arityhelper.GROUP BY 1/ORDER BY 1ordinals — a positional ordinal bound to an empty literal, undercounting reads vs the named form. Now resolved to the n-th output and bound by name (identity → read, introduced alias →Derived), keeping occurrence-based reads consistent.Investigated, no change (consistent behavior)
?", and it already rewrites the same literals as function args (JSON_EXTRACTpath,TO_CHARformat) and asLIMIT/OFFSET. Rewriting a JSON path / CAST FORMAT / TABLESAMPLE count is consistent, not a bug. Documented the behavior neutrally; no code change.EXCLUDEDunder aSensitivecasing override —EXCLUDEDis parsed as a plain identifier (not a keyword) and is PostgreSQL/SQLite-specific. Under every built-in dialect the standard alias fold already resolves it; it only degraded under an explicitwith_casing(Sensitive)override, whereEXCLUDED— an internal pseudo-table name matched like any identifier — behaving case-sensitively is consistent with that override. No change.🤖 Generated with Claude Code