Skip to content

Allow multiple DML statements in a single WITH clause#36

Open
krleonid wants to merge 13 commits intomainfrom
feature/multi-dml-cte-support
Open

Allow multiple DML statements in a single WITH clause#36
krleonid wants to merge 13 commits intomainfrom
feature/multi-dml-cte-support

Conversation

@krleonid
Copy link
Copy Markdown
Owner

@krleonid krleonid commented Apr 9, 2026

Lift the single-DML-per-WITH restriction for catalogs that opt in, initially enabling it for DuckCatalog (native storage).

Key changes:

  • Add Catalog::SupportsMultipleDMLCTEs() virtual method (default false), override to true in DuckCatalog. External catalogs can opt in after individual testing.
  • Replace the hard dml_cte_count > 1 binder error with post-binding validation that checks each modified catalog supports multi-DML CTEs.
  • Fix LogicalInsert/LogicalDelete/LogicalUpdate::GetColumnBindings() to use the operator's actual table_index instead of hardcoded TableIndex(0) when return_chunk is false. The shared TableIndex(0) caused the projection_pullup optimizer's ColumnBindingReplacer to corrupt bindings across CTE boundaries.
  • Add comprehensive tests covering cross-table DML, RETURNING cross-references, same-table snapshot semantics, unreferenced CTEs, transaction rollback, prepared statements, and MATERIALIZED hint.

Made-with: Cursor

Lift the single-DML-per-WITH restriction for catalogs that opt in,
initially enabling it for DuckCatalog (native storage).

Key changes:
- Add Catalog::SupportsMultipleDMLCTEs() virtual method (default false),
  override to true in DuckCatalog. External catalogs can opt in after
  individual testing.
- Replace the hard dml_cte_count > 1 binder error with post-binding
  validation that checks each modified catalog supports multi-DML CTEs.
- Fix LogicalInsert/LogicalDelete/LogicalUpdate::GetColumnBindings() to
  use the operator's actual table_index instead of hardcoded TableIndex(0)
  when return_chunk is false. The shared TableIndex(0) caused the
  projection_pullup optimizer's ColumnBindingReplacer to corrupt bindings
  across CTE boundaries.
- Add comprehensive tests covering cross-table DML, RETURNING
  cross-references, same-table snapshot semantics, unreferenced CTEs,
  transaction rollback, prepared statements, and MATERIALIZED hint.

Made-with: Cursor
@krleonid krleonid force-pushed the feature/multi-dml-cte-support branch from c0788aa to 2e91159 Compare April 9, 2026 18:29
krleonid and others added 12 commits April 9, 2026 23:27
Cross-CTE visibility on the same table differs between in-memory and
persistent storage modes. Adjust tests to avoid asserting specific
snapshot behavior for same-table operations, and rewrite the UPDATE
feeding test to use different tables for the two DML CTEs.

Made-with: Cursor
Fix CommonSubplanOptimizer incorrectly deduplicating subplans across a
DML CTE boundary (e.g. two COUNT(i) scans on the same table merged into
one materialized CTE, causing the outer SELECT to reuse the pre-INSERT
result).  Bail out of the optimizer entirely when a DML statement is
present as a non-root node.

Fix PhysicalCTE::BuildPipelines missing pipeline dependencies when the
CTE body contains DML: child MetaPipelines spawned by the query side
(e.g. the scan pipeline under an aggregate) were not ordered after the
DML pipeline and could race with it.  The outer SELECT now correctly
sees rows inserted/updated/deleted by the DML CTE.

Update and expand tests per reviewer requests:
- Rewrite snapshot-semantics section with accurate comments reflecting
  actual DuckDB CTE-influence semantics (not accidental isolation).
- Add same-table multi-DML cases: two deletes on same rows, disjoint
  update+delete on different rows (id%2), and insert+update on the
  same table.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The StatisticsPropagator uses table statistics captured at plan time.
When a DML CTE (INSERT/UPDATE/DELETE) appears as a non-root node, those
statistics reflect the pre-DML table state and are stale by the time the
query side executes. This caused filters and scans to be incorrectly
replaced with EMPTY_RESULT — e.g. SELECT * FROM t WHERE i=1 returned no
rows even after an INSERT CTE populated the table.

Fix: introduce PlanContainsNonRootDML() in optimizer.cpp and use it to
skip both STATISTICS_PROPAGATION and COMMON_SUBPLAN when a DML CTE is
present. This consolidates the guard that was previously duplicated in
common_subplan_optimizer.cpp.

Add a regression test for the statistics-elimination case.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…assert, inlining

Three related fixes for DML CTEs that operate on the same table:

1. physical_cte.cpp: Use recursive=true in GetMetaPipelines() so that grandchild
   scanner pipelines (e.g. the SEQ_SCAN feeding a sibling DELETE's sink) are
   captured in the before-set and forced to depend on the upstream DML pipeline.
   Without this, the DELETE scanner could race with INSERT and see empty local
   storage, silently skipping locally-inserted rows.

2. physical_delete.cpp: Replace D_ASSERT(deleted == delete_count) with
   conditional RETURNING collection. When two sibling DML CTEs delete overlapping
   rows, the second DELETE passes global row-id dedup but table.Delete() returns 0
   because the rows were already gone. The assertion fired as SIGABRT; now we
   handle partial/zero deletes gracefully.

3. cte_inlining.cpp: Prevent inlining of any DML CTE (HasSideEffects check).
   Inlining removes the LOGICAL_MATERIALIZED_CTE node that guarantees the DML
   executes exactly once and before the query side reads the modified table.

Test updates: correct expectations to reflect sequential execution (INSERT rows
are now visible to subsequent UPDATE/DELETE in the same CTE), and add three new
3-CTE chain tests (INS→UPD→DEL, INS→UPD→UPD, INS→INS→DEL) on the same rows.

Co-Authored-By: Claude Sonnet 3.7 <noreply@anthropic.com>
Revert the partial-delete workaround in PhysicalDelete::Sink — with
pipeline dependencies enforcing definition order, later CTEs always
see the effects of earlier ones, so duplicate deletes on the same rows
no longer occur. Restore the original D_ASSERT(deleted == delete_count).

Update DML CTE tests to verify sequential execution semantics:
- Multiple UPDATEs on the same rows chain correctly (RETURNING reflects
  each CTE's state at execution time)
- Multiple DELETEs on the same rows: first CTE deletes, second finds
  rows already gone and returns nothing
- Add explanatory comment for CommonSubplanOptimizer dedup guard

Made-with: Cursor
INSERT ON CONFLICT rewrites to LOGICAL_MERGE_INTO / MERGE_INTO. Include
these in DML and side-effect detection so upsert CTEs are not inlined or
dropped. Add regression tests for referenced and unreferenced upsert CTEs.

Made-with: Cursor
Centralize recursive INSERT/UPDATE/DELETE/MERGE INTO detection on the logical
plan. Reuse it in CTE inlining, optimizer non-root DML checks, and physical
CTE planning (cte_body_is_dml) instead of duplicating walks over logical and
physical operators.

Made-with: Cursor
Avoid dereferencing a null child pointer when scanning for DML side effects.

Made-with: Cursor
Wire query-side MetaPipelines after the DML CTE body using the same pattern as
joins, instead of diffing meta pipeline snapshots. Extend AddRecursiveDependencies
with an optional force flag so dependencies are always added for DML CTEs
(correctness), while joins keep the thread-count heuristic by default.

Made-with: Cursor
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant