feat(query): local block_id repartition before RowFetch in MERGE INTO #19689
dantengsky wants to merge 5 commits into databendlabs:main from
Conversation
…INTO

During MERGE INTO with lazy columns, the RowFetch stage may repeatedly read the same physical block when rows from that block are scattered across different processors or batches. This adds a local pipeline exchange that partitions data by block_id (extracted from _row_id) before RowFetch, ensuring each processor handles a disjoint set of blocks and eliminating duplicate block reads.

- Add BlockIdPartitionExchange implementing the Exchange trait
- Insert exchange before MutationSplit for MixedMatched strategy
- Insert exchange before RowFetch for MatchedOnly strategy
- Add is_mutation flag to RowFetch to avoid affecting SELECT+LIMIT path
- Add enable_merge_into_block_id_repartition setting (default on)
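As a rough illustration of the block_id extraction this relies on (the exact bit split between block prefix and row offset is an assumption of this sketch, not the fuse engine's actual layout):

```rust
// Hypothetical _row_id layout, for illustration only: the high bits carry
// a block id prefix and the low ROW_OFFSET_BITS bits carry the row offset
// within that block. The real layout lives in Databend's fuse engine.
const ROW_OFFSET_BITS: u32 = 31;

fn block_id_of(row_id: u64) -> u64 {
    row_id >> ROW_OFFSET_BITS
}

fn main() {
    // Two rows of block 5 share a prefix; a row of block 9 differs.
    let a = (5u64 << ROW_OFFSET_BITS) | 3;
    let b = (5u64 << ROW_OFFSET_BITS) | 7;
    let c = (9u64 << ROW_OFFSET_BITS) | 1;
    assert_eq!(block_id_of(a), block_id_of(b)); // same block, same partition key
    assert_ne!(block_id_of(a), block_id_of(c));
    println!("block ids: {} {} {}", block_id_of(a), block_id_of(b), block_id_of(c));
}
```

Rows carrying the same prefix always receive the same partition key, which is the property the exchange needs.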
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c7deef3342
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
src/query/storages/fuse/src/operations/merge_into/processors/block_id_partition_exchange.rs
…xchange

Avoids truncation when max_threads exceeds 255.
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e311156f1d
src/query/service/src/physical_plans/physical_mutation_into_split.rs
…partition indices

- Add has_row_fetch flag to MutationSplit so the exchange is only inserted when RowFetch follows (lazy columns exist). Without this, non-lazy workloads pay shuffle cost for no I/O benefit.
- Use u16 instead of u8 for partition indices to avoid truncation when max_threads exceeds 255.
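The u8 → u16 change can be seen directly: casting a partition index above 255 to u8 wraps modulo 256, silently routing rows to the wrong partition. A minimal demonstration:

```rust
fn main() {
    // With more than 255 threads, a partition index no longer fits in u8:
    let partition: usize = 300;
    assert_eq!(partition as u8, 44);   // wraps to 300 % 256 = 44: wrong bucket
    assert_eq!(partition as u16, 300); // u16 holds indices up to 65535
    println!("ok");
}
```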
@codex review
Codex Review: Didn't find any major issues. Keep it up!
eb1ed7b to fddfa19

@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: fddfa19d9c
6b79bc3 to 0057c23
…test

- Rename setting: enable_merge_into_block_id_repartition → enable_mutation_block_id_repartition
- Rename RowFetch flag: is_merge_into → enable_block_id_repartition to accurately reflect scope (MERGE INTO + UPDATE...WHERE subquery)
- Remove unnecessary serde(default) annotations
- Improve comments: explain why SELECT+LIMIT skips repartition; use "reducing" instead of "eliminating" for duplicate reads
- Rewrite test with proper structure, comments, and CREATE OR REPLACE
- Add Test 6: UPDATE...WHERE (subquery) with lazy columns
- Avoid unwrap in lazy_columns handling
0057c23 to d40dcb0
@codex review
Codex Review: Didn't find any major issues. Can't wait for the next one!
I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/
Summary
Problem
During join-based mutations (MERGE INTO, UPDATE...WHERE with subqueries) with lazy columns, the TransformRowsFetcher reads target table blocks by _row_id. Since Hash Join output order follows the probe side, rows from the same target block scatter across different RowFetch processors or batches. Each batch flushes its block data, causing the same physical block to be read from storage multiple times: a significant I/O overhead when the target table has many columns and the matched set is large.

In distributed mode, a cross-node shuffle by block_id already exists (build_block_id_shuffle_exchange), but within a single node no such grouping is performed.

Solution
Add a local pipeline.exchange() by block_id (extracted from _row_id) before RowFetch, reusing the existing PartitionProcessor / MergePartitionProcessor infrastructure.

After repartition, each RowFetch processor handles a disjoint set of blocks, which eliminates cross-processor duplicate block reads. Within a single processor, same-block rows tend to arrive consecutively, greatly reducing cross-batch duplicates as well, though a residual duplicate read can still occur if a block's rows are split across a BlockThreshold flush boundary.

The existing per-batch deduplication inside ParquetRowsFetcher::fetch() (which groups row_ids by block_id within a single flush) remains effective. This PR complements it by ensuring the rows reaching each processor are block-local in the first place.

Pipeline changes
- MixedMatched (WHEN MATCHED + WHEN NOT MATCHED): the exchange replaces try_resize(N). Same output port count, but data is partitioned by block_id.
- MatchedOnly (MERGE INTO with only WHEN MATCHED, or UPDATE...WHERE with subquery): the exchange is inserted directly before RowFetch.
- NotMatchedOnly: no RowFetch, no change.
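A hedged sketch of the scatter rule behind the exchange (function name and the 31-bit offset split are assumptions of this sketch; the PR's actual implementation reuses PartitionProcessor): each row routes to output port block_id % N, so every block's rows land on exactly one processor.

```rust
/// Illustrative scatter: group row ids into n partitions keyed by the
/// block id prefix of each _row_id. All rows of a given block map to the
/// same partition, so downstream RowFetch processors see disjoint block
/// sets. The 31-bit offset split is assumed for this sketch.
fn scatter_by_block_id(row_ids: &[u64], n: usize) -> Vec<Vec<u64>> {
    const ROW_OFFSET_BITS: u32 = 31;
    let mut parts = vec![Vec::new(); n];
    for &id in row_ids {
        let block_id = id >> ROW_OFFSET_BITS;
        parts[(block_id % n as u64) as usize].push(id);
    }
    parts
}

fn main() {
    let ids = [
        (1u64 << 31),     // block 1
        (2u64 << 31) | 5, // block 2
        (1u64 << 31) | 9, // block 1 again
        (4u64 << 31) | 2, // block 4
    ];
    let parts = scatter_by_block_id(&ids, 2);
    // Blocks 2 and 4 land on partition 0; both rows of block 1 land on partition 1.
    assert_eq!(parts[1], vec![(1u64 << 31), (1u64 << 31) | 9]);
    assert_eq!(parts[0].len(), 2);
    println!("partitions: {:?}", parts);
}
```

Because routing is a pure function of block_id, a block's rows can never straddle two processors, which is exactly the property that removes cross-processor duplicate reads.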
New setting: enable_mutation_block_id_repartition, default 1 (on).

When to disable: if matched rows are heavily concentrated in a few large blocks, the repartition may cause load skew (one processor handles most of the work). In this case, SET enable_mutation_block_id_repartition = 0 falls back to the original try_resize round-robin distribution, which provides better load balancing at the cost of more duplicate block reads.

Affected paths: MERGE INTO and UPDATE...WHERE with subqueries, when lazy columns are present.

Not affected:
- Direct mutations (MutationStrategy::Direct, returns early before RowFetch)
- Workloads without lazy columns (no _row_id is needed)
- SELECT + LIMIT row fetch (the enable_block_id_repartition flag is false)
- Single-threaded execution (max_threads = 1, guarded)

Key changes
- BlockIdPartitionExchange: a new Exchange implementation that partitions rows by the block_id prefix of _row_id. For nullable row_ids (unmatched rows in MixedMatched), an incrementing counter spreads them evenly across partitions.
- MutationSplit::build_pipeline2: the exchange is inserted before the matched/unmatched split. Guarded by the has_row_fetch flag so non-lazy workloads are not affected.
- RowFetch::build_pipeline2: the exchange is inserted before RowFetch. An enable_block_id_repartition flag on RowFetch controls this.

Performance impact
Reduces storage I/O in the RowFetch stage by avoiding redundant block reads across processors. The trade-off is that load balancing now depends on how matched rows are distributed across blocks (previously try_resize distributed rows evenly via round-robin). The overhead is one local scatter + merge per pipeline, operating on thin data (only join output columns, before lazy columns are materialized). Guarded by max_threads > 1 so single-threaded execution has zero overhead.

Tests
New sqllogictest 09_0051_merge_into_block_id_repartition.test covers the scenarios above, including UPDATE...WHERE with a subquery and lazy columns. All existing merge_into tests pass without regression.
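The nullable-row_id handling described under Key changes can be pictured as follows (a sketch; the struct and field names are hypothetical, not the PR's code): matched rows hash by block id, while NULL row_ids round-robin via a counter so no single partition absorbs all unmatched rows.

```rust
/// Illustrative routing decision for one row. Some(row_id) rows group by
/// block id; None rows (unmatched side in MixedMatched, nothing to fetch)
/// are spread evenly by an incrementing counter. The 31-bit offset split
/// is assumed for this sketch.
struct BlockIdRouter {
    n: usize,            // number of output partitions
    null_counter: usize, // round-robin state for NULL row_ids
}

impl BlockIdRouter {
    fn route(&mut self, row_id: Option<u64>) -> usize {
        const ROW_OFFSET_BITS: u32 = 31;
        match row_id {
            Some(id) => ((id >> ROW_OFFSET_BITS) % self.n as u64) as usize,
            None => {
                let p = self.null_counter % self.n;
                self.null_counter += 1;
                p
            }
        }
    }
}

fn main() {
    let mut r = BlockIdRouter { n: 4, null_counter: 0 };
    assert_eq!(r.route(Some(10u64 << 31)), 2); // block 10 % 4 == 2
    assert_eq!(r.route(None), 0);              // counter spreads NULLs
    assert_eq!(r.route(None), 1);
    println!("ok");
}
```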