Allow multiple DML statements in a single WITH clause#36
Open
Allow multiple DML statements in a single WITH clause#36
Conversation
Lift the single-DML-per-WITH restriction for catalogs that opt in, initially enabling it for DuckCatalog (native storage). Key changes: - Add Catalog::SupportsMultipleDMLCTEs() virtual method (default false), override to true in DuckCatalog. External catalogs can opt in after individual testing. - Replace the hard dml_cte_count > 1 binder error with post-binding validation that checks each modified catalog supports multi-DML CTEs. - Fix LogicalInsert/LogicalDelete/LogicalUpdate::GetColumnBindings() to use the operator's actual table_index instead of hardcoded TableIndex(0) when return_chunk is false. The shared TableIndex(0) caused the projection_pullup optimizer's ColumnBindingReplacer to corrupt bindings across CTE boundaries. - Add comprehensive tests covering cross-table DML, RETURNING cross-references, same-table snapshot semantics, unreferenced CTEs, transaction rollback, prepared statements, and MATERIALIZED hint. Made-with: Cursor
c0788aa to
2e91159
Compare
Cross-CTE visibility on the same table differs between in-memory and persistent storage modes. Adjust tests to avoid asserting specific snapshot behavior for same-table operations, and rewrite the UPDATE feeding test to use different tables for the two DML CTEs. Made-with: Cursor
Fix CommonSubplanOptimizer incorrectly deduplicating subplans across a DML CTE boundary (e.g. two COUNT(i) scans on the same table merged into one materialized CTE, causing the outer SELECT to reuse the pre-INSERT result). Bail out of the optimizer entirely when a DML statement is present as a non-root node. Fix PhysicalCTE::BuildPipelines missing pipeline dependencies when the CTE body contains DML: child MetaPipelines spawned by the query side (e.g. the scan pipeline under an aggregate) were not ordered after the DML pipeline and could race with it. The outer SELECT now correctly sees rows inserted/updated/deleted by the DML CTE. Update and expand tests per reviewer requests: - Rewrite snapshot-semantics section with accurate comments reflecting actual DuckDB CTE-influence semantics (not accidental isolation). - Add same-table multi-DML cases: two deletes on same rows, disjoint update+delete on different rows (id%2), and insert+update on the same table. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The StatisticsPropagator uses table statistics captured at plan time. When a DML CTE (INSERT/UPDATE/DELETE) appears as a non-root node, those statistics reflect the pre-DML table state and are stale by the time the query side executes. This caused filters and scans to be incorrectly replaced with EMPTY_RESULT — e.g. SELECT * FROM t WHERE i=1 returned no rows even after an INSERT CTE populated the table. Fix: introduce PlanContainsNonRootDML() in optimizer.cpp and use it to skip both STATISTICS_PROPAGATION and COMMON_SUBPLAN when a DML CTE is present. This consolidates the guard that was previously duplicated in common_subplan_optimizer.cpp. Add a regression test for the statistics-elimination case. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Made-with: Cursor
…assert, inlining Three related fixes for DML CTEs that operate on the same table: 1. physical_cte.cpp: Use recursive=true in GetMetaPipelines() so that grandchild scanner pipelines (e.g. the SEQ_SCAN feeding a sibling DELETE's sink) are captured in the before-set and forced to depend on the upstream DML pipeline. Without this, the DELETE scanner could race with INSERT and see empty local storage, silently skipping locally-inserted rows. 2. physical_delete.cpp: Replace D_ASSERT(deleted == delete_count) with conditional RETURNING collection. When two sibling DML CTEs delete overlapping rows, the second DELETE passes global row-id dedup but table.Delete() returns 0 because the rows were already gone. The assertion fired as SIGABRT; now we handle partial/zero deletes gracefully. 3. cte_inlining.cpp: Prevent inlining of any DML CTE (HasSideEffects check). Inlining removes the LOGICAL_MATERIALIZED_CTE node that guarantees the DML executes exactly once and before the query side reads the modified table. Test updates: correct expectations to reflect sequential execution (INSERT rows are now visible to subsequent UPDATE/DELETE in the same CTE), and add three new 3-CTE chain tests (INS→UPD→DEL, INS→UPD→UPD, INS→INS→DEL) on the same rows. Co-Authored-By: Claude Sonnet 3.7 <noreply@anthropic.com>
Revert the partial-delete workaround in PhysicalDelete::Sink — with pipeline dependencies enforcing definition order, later CTEs always see the effects of earlier ones, so duplicate deletes on the same rows no longer occur. Restore the original D_ASSERT(deleted == delete_count). Update DML CTE tests to verify sequential execution semantics: - Multiple UPDATEs on the same rows chain correctly (RETURNING reflects each CTE's state at execution time) - Multiple DELETEs on the same rows: first CTE deletes, second finds rows already gone and returns nothing - Add explanatory comment for CommonSubplanOptimizer dedup guard Made-with: Cursor
INSERT ON CONFLICT rewrites to LOGICAL_MERGE_INTO / MERGE_INTO. Include these in DML and side-effect detection so upsert CTEs are not inlined or dropped. Add regression tests for referenced and unreferenced upsert CTEs. Made-with: Cursor
Centralize recursive INSERT/UPDATE/DELETE/MERGE INTO detection on the logical plan. Reuse it in CTE inlining, optimizer non-root DML checks, and physical CTE planning (cte_body_is_dml) instead of duplicating walks over logical and physical operators. Made-with: Cursor
Made-with: Cursor
Avoid dereferencing a null child pointer when scanning for DML side effects. Made-with: Cursor
Wire query-side MetaPipelines after the DML CTE body using the same pattern as joins, instead of diffing meta pipeline snapshots. Extend AddRecursiveDependencies with an optional force flag so dependencies are always added for DML CTEs (correctness), while joins keep the thread-count heuristic by default. Made-with: Cursor
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Lift the single-DML-per-WITH restriction for catalogs that opt in, initially enabling it for DuckCatalog (native storage).
Key changes:
Made-with: Cursor