
fix(query): support broadcast join with merge-limit build#19652

Open
SkyFan2002 wants to merge 8 commits into databendlabs:main from SkyFan2002:03-31


@SkyFan2002 (Member) commented Apr 1, 2026

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

  • Fix cluster execution for broadcast-join plans whose build side becomes Broadcast(Limit(Merge(Limit(Scan))))
  • Maintain source fragment relationships so merge-input intermediate fragments are scheduled only on the coordinator
  • Add source-only ExchangeSource fallback and on-demand inbound channel-set creation for local broadcast/hash receiver paths
  • Add a cluster sqllogictest that checks both the EXPLAIN shape and the successful query result
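As we read the summary, the trouble starts when the build side contains a Merge: a Merge gathers every node's partial output onto a single node, so whatever fragment consumes it exists once, on the coordinator, and the broadcast has to originate there. The sketch below only illustrates that reasoning. The `Plan` enum, `shape`, `needs_coordinator_only_fragment` and `merge_limit_build_side` are hypothetical simplifications, not Databend's physical plan types or scheduler.

```rust
/// Hypothetical, simplified physical plan nodes (not Databend's real types).
#[allow(dead_code)]
#[derive(Debug)]
enum Plan {
    Scan { table: String },
    Limit { n: u64, input: Box<Plan> },
    /// Gathers every node's partial output onto the coordinator.
    Merge { input: Box<Plan> },
    /// Sends the full input to every node in the cluster.
    Broadcast { input: Box<Plan> },
}

/// Renders the operator nesting, e.g. "Broadcast(Limit(...))".
fn shape(plan: &Plan) -> String {
    match plan {
        Plan::Scan { .. } => "Scan".to_string(),
        Plan::Limit { input, .. } => format!("Limit({})", shape(input)),
        Plan::Merge { input } => format!("Merge({})", shape(input)),
        Plan::Broadcast { input } => format!("Broadcast({})", shape(input)),
    }
}

/// True if some operator in `plan` reads the output of a Merge, meaning the
/// fragment holding it can only run on the coordinator (where the merge lands).
fn needs_coordinator_only_fragment(plan: &Plan) -> bool {
    match plan {
        Plan::Scan { .. } => false,
        Plan::Merge { .. } => true,
        Plan::Limit { input, .. } | Plan::Broadcast { input } => {
            needs_coordinator_only_fragment(input)
        }
    }
}

/// Builds the Broadcast(Limit(Merge(Limit(Scan)))) build side from the summary.
fn merge_limit_build_side(table: &str, n: u64) -> Plan {
    Plan::Broadcast {
        input: Box::new(Plan::Limit {
            n,
            input: Box::new(Plan::Merge {
                input: Box::new(Plan::Limit {
                    n,
                    input: Box::new(Plan::Scan { table: table.to_string() }),
                }),
            }),
        }),
    }
}
```

A plain `Broadcast(Scan)` build side returns false here, which matches the case that presumably already worked before this PR.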

Tests

  • Unit Test
  • Logic Test
  • Benchmark Test
  • No Test - Explain why

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Breaking Change (fix or feature that could cause existing functionality not to work as expected)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):


@github-actions github-actions bot added the pr-bugfix this PR patches a bug in codebase label Apr 1, 2026
@SkyFan2002 (Member, Author)

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Chef's kiss.


@databendlabs databendlabs deleted a comment from github-actions bot Apr 1, 2026
@databendlabs databendlabs deleted a comment from github-actions bot Apr 3, 2026
Add execution-layer fallback for ExchangeSource when the source
fragment is not present in the local QueryCoordinator.

- carry source DataExchange metadata through Fragmenter
- return Option from fragment subscription lookups
- build merge/broadcast/shuffle/global-shuffle sources directly from
  remote receivers when local fragment expansion is unavailable

Validation:
- cargo check -p databend-query --lib
- cargo build -p databend-query --lib

Register and lazily create inbound channel sets for do_exchange
receivers so remote-only exchange sources can build pipelines before
network senders connect.

- pre-register incoming exchange fragments during query env init
- add get-or-create helpers on DataExchangeManager and QueryCoordinator
- switch broadcast/global-shuffle receive paths to use get-or-create
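A sketch of the get-or-create idea, assuming an inbound channel set is keyed by (query id, fragment id) and that pre-registration and the receive path call the same helper, so both end up sharing one set. `ExchangeManager`, `InboundChannelSet`, `get_or_create_inbound` and `take_receiver` are hypothetical names, and in-process `std::sync::mpsc` channels stand in for the network channels. This sketch does not check that callers agree on parallelism.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::{Arc, Mutex};

/// Hypothetical inbound channel set: one channel per parallel receiver.
struct InboundChannelSet {
    senders: Vec<Sender<Vec<u8>>>,
    receivers: Mutex<Vec<Option<Receiver<Vec<u8>>>>>,
}

impl InboundChannelSet {
    fn new(parallelism: usize) -> Self {
        let (senders, receivers): (Vec<_>, Vec<_>) = (0..parallelism)
            .map(|_| {
                let (tx, rx) = channel();
                (tx, Some(rx))
            })
            .unzip();
        InboundChannelSet { senders, receivers: Mutex::new(receivers) }
    }

    /// Hands out the receiver for `index` once; later calls return None.
    fn take_receiver(&self, index: usize) -> Option<Receiver<Vec<u8>>> {
        self.receivers.lock().unwrap().get_mut(index)?.take()
    }
}

#[derive(Default)]
struct ExchangeManager {
    /// Keyed by (query_id, fragment_id); the key shape is an assumption.
    inbound: Mutex<HashMap<(String, usize), Arc<InboundChannelSet>>>,
}

impl ExchangeManager {
    /// Returns the existing set or creates it, so a receiver-side pipeline
    /// can be built before any network sender has connected.
    fn get_or_create_inbound(
        &self,
        query_id: &str,
        fragment_id: usize,
        parallelism: usize,
    ) -> Arc<InboundChannelSet> {
        let mut map = self.inbound.lock().unwrap();
        map.entry((query_id.to_string(), fragment_id))
            .or_insert_with(|| Arc::new(InboundChannelSet::new(parallelism)))
            .clone()
    }
}
```

Because creation is idempotent, neither the receive path nor the sender connection needs to know which of the two arrived first.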

Validation:
- cargo check -p databend-query --lib
- db-slt --cluster --run-dir cluster --run-file subquery.test

Route inbound sender creation through the same channel-set guard used by
pre-registration so mismatched exchange parallelism fails explicitly.

Also add QueryCoordinator unit tests covering inbound channel set
registration and mismatch detection.
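A minimal sketch of a shared guard, assuming the first registration fixes an exchange's parallelism and every later caller, whether pre-registration or sender creation, must request the same value or get an explicit error. `InboundRegistry`, `ensure_channel_set`, `pre_register` and `create_inbound_senders` are hypothetical; the commit only says the guard lives alongside `QueryCoordinator`.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical registry recording each inbound exchange's parallelism.
#[derive(Default)]
struct InboundRegistry {
    parallelism: Mutex<HashMap<usize, usize>>,
}

impl InboundRegistry {
    /// The single guard: the first caller fixes the parallelism for a
    /// fragment, and later callers must match it.
    fn ensure_channel_set(&self, fragment_id: usize, parallelism: usize) -> Result<(), String> {
        let mut map = self.parallelism.lock().unwrap();
        let existing = *map.entry(fragment_id).or_insert(parallelism);
        if existing == parallelism {
            Ok(())
        } else {
            Err(format!(
                "exchange parallelism mismatch for fragment {fragment_id}: \
                 registered {existing}, requested {parallelism}"
            ))
        }
    }

    /// Called during query env init.
    fn pre_register(&self, fragment_id: usize, parallelism: usize) -> Result<(), String> {
        self.ensure_channel_set(fragment_id, parallelism)
    }

    /// Called when network senders connect; returns sender slot indices.
    fn create_inbound_senders(&self, fragment_id: usize, parallelism: usize) -> Result<Vec<usize>, String> {
        self.ensure_channel_set(fragment_id, parallelism)?;
        Ok((0..parallelism).collect())
    }
}
```

Routing both paths through one function means a mismatch surfaces as an error at connection time rather than as a receiver that silently waits on a channel nobody will ever fill.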
github-actions bot (Contributor) commented Apr 12, 2026

🤖 CI Job Analysis

Workflow: 24312296815

📊 Summary

  • Total Jobs: 87
  • Failed Jobs: 28
  • Retryable: 0
  • Code Issues: 28

NO RETRY NEEDED

All failures appear to be code/test issues requiring manual fixes.

🔍 Job Details

  • linux / test_unit: Not retryable (Code/Test)
  • linux / test_stateless_cluster: Not retryable (Code/Test)
  • linux / test_compat_client_cluster: Not retryable (Code/Test)
  • linux / test_private_tasks: Not retryable (Code/Test)
  • linux / test_logs: Not retryable (Code/Test)
  • linux / test_stateful_cluster: Not retryable (Code/Test)
  • linux / sqllogic / cluster_with_minio_and_nginx (http_handler, ttc-go): Not retryable (Code/Test)
  • linux / sqllogic / standalone_iceberg_tpch: Not retryable (Code/Test)
  • linux / sqllogic / cluster_with_minio_and_nginx (http_handler, ttc-rust): Not retryable (Code/Test)
  • linux / sqllogic / standalone (ydb, 2c, http): Not retryable (Code/Test)
  • linux / sqllogic / standalone (query, 4c, hybrid): Not retryable (Code/Test)
  • linux / sqllogic / standalone (query, 4c, http): Not retryable (Code/Test)
  • linux / sqllogic / cluster (query, 4c, http): Not retryable (Code/Test)
  • linux / sqllogic / cluster (query, 4c, hybrid): Not retryable (Code/Test)
  • linux / sqllogic / cluster (duckdb, 4c, hybrid): Not retryable (Code/Test)
  • linux / sqllogic / cluster (duckdb, 4c, http): Not retryable (Code/Test)
  • linux / sqllogic / cluster (crdb, 2c, 2, http): Not retryable (Code/Test)
  • linux / sqllogic / cluster (base, 2c, 2, hybrid): Not retryable (Code/Test)
  • linux / sqllogic / cluster (ydb, 2c, http): Not retryable (Code/Test)
  • linux / sqllogic / cluster (base, 2c, 2, http): Not retryable (Code/Test)
  • linux / sqllogic / cluster (cluster, 2c, http): Not retryable (Code/Test)
  • linux / sqllogic / cluster (cluster, 2c, hybrid): Not retryable (Code/Test)
  • linux / sqllogic / cluster (ydb, 2c, hybrid): Not retryable (Code/Test)
  • linux / sqllogic / cluster (crdb, 2c, 2, hybrid): Not retryable (Code/Test)
  • linux / sqllogic / cluster (tpch, 2c, http): Not retryable (Code/Test)
  • linux / sqllogic / cluster (tpcds, 4c, hybrid): Not retryable (Code/Test)
  • linux / sqllogic / cluster (tpcds, 4c, http): Not retryable (Code/Test)
  • linux / sqllogic / cluster (tpch, 2c, hybrid): Not retryable (Code/Test)

🤖 About

Automated analysis using job annotations to distinguish infrastructure issues (auto-retried) from code/test issues (manual fixes needed).

- add a send-only broadcast sink for remote leftover fragments
- allow shuffle exchanges to consume only existing remote receivers
- validate with cluster cte, window_ntile, and generate_series tests
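One way to picture a send-only broadcast sink, assuming it forwards every block to each remote receiver and keeps no output for a local consumer (a leftover fragment on a remote node has no local reader). `SendOnlyBroadcastSink` and `consume` are hypothetical, and `std::sync::mpsc` senders stand in for network senders.

```rust
use std::sync::mpsc::{channel, Sender};

/// Hypothetical send-only broadcast sink: every block goes to every remote
/// receiver, and nothing is retained for a local downstream operator.
struct SendOnlyBroadcastSink<T: Clone> {
    remotes: Vec<Sender<T>>,
}

impl<T: Clone> SendOnlyBroadcastSink<T> {
    /// Sends a copy of `block` to each remote; fails on the first
    /// disconnected remote.
    fn consume(&self, block: T) -> Result<(), String> {
        for (i, tx) in self.remotes.iter().enumerate() {
            tx.send(block.clone())
                .map_err(|_| format!("remote {i} disconnected"))?;
        }
        Ok(())
    }
}
```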
Schedule empty and singleton source fragments on the coordinator
once real distributed work is absent so sink-side broadcast
exchanges are not materialized for remote-only receive flows.
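The scheduling rule above can be sketched as a small decision function, assuming a fragment's source is classified as empty, singleton, or genuinely distributed. `SourceDistribution` and `schedule_source_fragment` are hypothetical names used only for illustration.

```rust
/// Hypothetical summary of how a fragment's source data is spread.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum SourceDistribution {
    /// Nothing to read at all.
    Empty,
    /// A single partition or a single-node source.
    Singleton,
    /// Partitions spread across nodes: real distributed work.
    Distributed { partitions: usize },
}

/// Picks the executors for a source fragment. Empty and singleton sources
/// stay on the coordinator, so no sink-side broadcast exchange is built
/// for receivers that would otherwise only exist remotely.
fn schedule_source_fragment<'a>(
    dist: SourceDistribution,
    coordinator: &'a str,
    nodes: &'a [&'a str],
) -> Vec<&'a str> {
    match dist {
        SourceDistribution::Empty | SourceDistribution::Singleton => vec![coordinator],
        SourceDistribution::Distributed { .. } => nodes.to_vec(),
    }
}
```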

Let pulling executors surface EOF after the execution graph has
finished, and bound statistics receiver shutdown so zero-row
cluster queries can fully tear down.
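A minimal sketch of the two teardown fixes, assuming a pulling executor that reads blocks from a channel plus a "graph finished" flag, and a statistics receiver that acknowledges shutdown on a channel. `PullingExecutor`, `pull_data` and `shutdown_statistics` are hypothetical names. The key points: a zero-row query returns EOF instead of blocking on a channel that will never receive data, and the shutdown wait is bounded by a timeout.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{channel, Receiver, RecvTimeoutError};
use std::time::Duration;

/// Hypothetical pulling executor: the client pulls blocks from `rx`, and
/// `finished` is set (with Release ordering) after the graph's last send.
struct PullingExecutor<T> {
    rx: Receiver<T>,
    finished: AtomicBool,
}

impl<T> PullingExecutor<T> {
    /// Returns Some(block), or None (EOF) once the graph has finished and
    /// nothing is buffered, even if some sender handle is still alive.
    fn pull_data(&self) -> Option<T> {
        loop {
            match self.rx.recv_timeout(Duration::from_millis(10)) {
                Ok(block) => return Some(block),
                Err(RecvTimeoutError::Disconnected) => return None,
                Err(RecvTimeoutError::Timeout) => {
                    if self.finished.load(Ordering::Acquire) {
                        // Drain a block sent just before the graph finished.
                        return self.rx.try_recv().ok();
                    }
                }
            }
        }
    }
}

/// Waits at most `timeout` for the statistics receiver to acknowledge
/// shutdown; returns false instead of blocking forever.
fn shutdown_statistics(ack: &Receiver<()>, timeout: Duration) -> bool {
    ack.recv_timeout(timeout).is_ok()
}
```

The final `try_recv` matters: without it, a block sent between a timeout and the `finished` check could be dropped.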

Add cluster regressions for coordinator-only distributed paths.
