Skip to content

feat: globally reorder files and row groups by statistics for TopK queries#21956

Open
zhuqi-lucas wants to merge 6 commits into
apache:mainfrom
zhuqi-lucas:feat/rg-reorder-by-statistics
Open

feat: globally reorder files and row groups by statistics for TopK queries#21956
zhuqi-lucas wants to merge 6 commits into
apache:mainfrom
zhuqi-lucas:feat/rg-reorder-by-statistics

Conversation

@zhuqi-lucas
Copy link
Copy Markdown
Contributor

@zhuqi-lucas zhuqi-lucas commented Apr 30, 2026

Which issue does this PR close?

Rationale for this change

TopK queries (ORDER BY col LIMIT K) on parquet files with multiple out-of-order row groups are suboptimal — the dynamic filter threshold converges slowly because row groups are read in arbitrary order. By reordering row groups so the "best" ones (containing optimal values) are read first, the threshold tightens quickly and subsequent row groups are pruned at runtime.

What changes are included in this PR?

Row group reorder by statistics:

  • PreparedAccessPlan::reorder_by_statistics(): sorts row groups by min values (ASC) using parquet column statistics. Direction (DESC) is handled by existing reverse() applied after reorder. The two compose correctly for both sorted and unsorted data.
  • AccessPlanOptimizer trait: extensible interface for row group access plan optimizations applied after pruning.

DynamicFilter sort metadata:

  • DynamicFilterPhysicalExpr gains sort_options and fetch fields, set by SortExec::create_filter(). This lets the parquet reader determine reorder direction for any TopK query (not just sort-pushdown path).
  • Fix: SortExec::with_fetch now sets fetch before calling create_filter() so the DynamicFilter gets the correct K value.

File-level reorder in shared work queue:

  • FileSource::reorder_files() trait method + parquet implementation: reorders files in the shared work queue by statistics so multi-file TopK reads the most promising files first across all partitions.

Are these changes tested?

  • SLT tests: Tests H (mixed RGs), J (scrambled non-overlapping), K (overlapping), L (multi-key ORDER BY), P (multi-file reorder)
  • All existing sort_pushdown SLTs pass
  • 98 parquet lib unit tests pass
  • Clippy clean, rustdoc clean

Are there any user-facing changes?

No API changes. TopK queries on parquet with multiple row groups will automatically benefit from better row group ordering. This is a performance optimization only — query results are unchanged.

Copilot AI review requested due to automatic review settings April 30, 2026 09:23
@github-actions github-actions Bot added physical-expr Changes to the physical-expr crates sqllogictest SQL Logic Tests (.slt) datasource Changes to the datasource crate physical-plan Changes to the physical-plan crate labels Apr 30, 2026
@zhuqi-lucas zhuqi-lucas force-pushed the feat/rg-reorder-by-statistics branch from 864a3d3 to 6e56cae Compare April 30, 2026 09:28
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves TopK (ORDER BY ... LIMIT K) performance for Parquet scans by using statistics to reorder work (files and row groups) so the dynamic filter threshold converges faster and more data can be pruned during execution.

Changes:

  • Propagate TopK sort metadata (sort_options, fetch) via DynamicFilterPhysicalExpr and fix SortExec::with_fetch ordering so the dynamic filter sees the correct K.
  • Add file-level reordering hook (FileSource::reorder_files) and implement statistics-based file reordering for Parquet.
  • Add row-group access plan optimization plumbing and statistics-based row-group reordering (composable with reverse scanning), plus SLT coverage.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
datafusion/sqllogictest/test_files/sort_pushdown.slt Adds SLT coverage for row-group/file reordering behavior across multiple TopK scenarios
datafusion/physical-plan/src/sorts/sort.rs Passes sort options + fetch into dynamic filter and adjusts fetch/filter initialization order
datafusion/physical-expr/src/expressions/dynamic_filters.rs Extends DynamicFilterPhysicalExpr with optional sort metadata and fetch limit
datafusion/datasource/src/file_stream/work_source.rs Calls FileSource::reorder_files before queueing shared work
datafusion/datasource/src/file.rs Introduces FileSource::reorder_files default extension point
datafusion/datasource-parquet/src/source.rs Implements Parquet file reordering using column statistics; wires fallback sort info
datafusion/datasource-parquet/src/opener.rs Applies row-group access plan optimizers (reorder + reverse) based on TopK metadata
datafusion/datasource-parquet/src/mod.rs Registers new access plan optimizer module
datafusion/datasource-parquet/src/access_plan_optimizer.rs Adds AccessPlanOptimizer trait plus ReverseRowGroups / ReorderByStatistics implementations
datafusion/datasource-parquet/src/access_plan.rs Adds PreparedAccessPlan::reorder_by_statistics (min-stat based RG ordering)

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread datafusion/physical-plan/src/sorts/sort.rs Outdated
Comment thread datafusion/datasource/src/file_stream/work_source.rs
Comment thread datafusion/datasource-parquet/src/opener.rs Outdated
Comment thread datafusion/datasource-parquet/src/opener.rs Outdated
Comment thread datafusion/sqllogictest/test_files/sort_pushdown.slt Outdated
@zhuqi-lucas zhuqi-lucas force-pushed the feat/rg-reorder-by-statistics branch from 6e56cae to f0f4058 Compare April 30, 2026 09:34
@zhuqi-lucas zhuqi-lucas marked this pull request as draft April 30, 2026 09:36
@zhuqi-lucas zhuqi-lucas force-pushed the feat/rg-reorder-by-statistics branch 3 times, most recently from 587297d to 235c4e1 Compare April 30, 2026 09:46
@zhuqi-lucas zhuqi-lucas changed the title feat: reorder row groups by statistics for TopK queries feat: reorder files and row groups by statistics for TopK queries Apr 30, 2026
@zhuqi-lucas zhuqi-lucas changed the title feat: reorder files and row groups by statistics for TopK queries feat: globally reorder files and row groups by statistics for TopK queries Apr 30, 2026
@zhuqi-lucas zhuqi-lucas marked this pull request as ready for review April 30, 2026 09:47
@zhuqi-lucas
Copy link
Copy Markdown
Contributor Author

run benchmark sort_pushdown_inexact

@adriangbot
Copy link
Copy Markdown

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4351456416-1947-jscm6 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing feat/rg-reorder-by-statistics (235c4e1) to 0144570 (merge-base) diff using: sort_pushdown_inexact
Results will be posted here when complete


File an issue against this benchmark runner

@zhuqi-lucas zhuqi-lucas force-pushed the feat/rg-reorder-by-statistics branch from 235c4e1 to 0c9c3d8 Compare April 30, 2026 10:02
@github-actions github-actions Bot added the core Core DataFusion crate label Apr 30, 2026
@adriangbot
Copy link
Copy Markdown

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected
Details

Comparing HEAD and feat_rg-reorder-by-statistics
--------------------
Benchmark sort_pushdown_inexact.json
--------------------
┏━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query ┃                           HEAD ┃  feat_rg-reorder-by-statistics ┃        Change ┃
┡━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ Q1    │    7.22 / 8.08 ±0.90 / 9.82 ms │   6.71 / 7.47 ±1.27 / 10.00 ms │ +1.08x faster │
│ Q2    │    6.79 / 6.88 ±0.08 / 7.01 ms │    6.87 / 7.08 ±0.23 / 7.48 ms │     no change │
│ Q3    │ 21.76 / 22.52 ±0.64 / 23.37 ms │ 21.99 / 22.88 ±0.57 / 23.66 ms │     no change │
│ Q4    │ 20.82 / 21.86 ±0.84 / 22.82 ms │ 20.73 / 21.74 ±0.77 / 22.71 ms │     no change │
└───────┴────────────────────────────────┴────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┓
┃ Benchmark Summary                            ┃         ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━┩
│ Total Time (HEAD)                            │ 59.34ms │
│ Total Time (feat_rg-reorder-by-statistics)   │ 59.17ms │
│ Average Time (HEAD)                          │ 14.83ms │
│ Average Time (feat_rg-reorder-by-statistics) │ 14.79ms │
│ Queries Faster                               │       1 │
│ Queries Slower                               │       0 │
│ Queries with No Change                       │       3 │
│ Queries with Failure                         │       0 │
└──────────────────────────────────────────────┴─────────┘

Resource Usage

sort_pushdown_inexact — base (merge-base)

Metric Value
Wall time 5.0s
Peak memory 4.6 GiB
Avg memory 4.6 GiB
CPU user 2.5s
CPU sys 0.4s
Peak spill 0 B

sort_pushdown_inexact — branch

Metric Value
Wall time 5.0s
Peak memory 4.6 GiB
Avg memory 4.6 GiB
CPU user 2.6s
CPU sys 0.3s
Peak spill 0 B

File an issue against this benchmark runner

@zhuqi-lucas zhuqi-lucas force-pushed the feat/rg-reorder-by-statistics branch from 0c9c3d8 to f7d9156 Compare April 30, 2026 10:13
@zhuqi-lucas
Copy link
Copy Markdown
Contributor Author

The benchmark results are expected — RG reorder alone doesn't skip any row groups, it only changes the read order so that TopK's dynamic filter threshold converges faster.

The significant speedup (2-3x on sort_pushdown_inexact) comes from stats init + cumulative RG prune which will be in the follow-up PR. Those optimizations depend on RG reorder as a foundation:

  1. RG reorder: put best RGs first (this PR)
  2. Stats init: initialize TopK threshold from RG statistics before reading → prune RGs upfront (next PR)
  3. Cumulative prune: after reorder, truncate remaining RGs once enough rows are collected (next PR)

Without reorder, cumulative prune might truncate the wrong RGs. Reorder ensures the best RGs come first, making truncation safe and effective.

@alamb
Copy link
Copy Markdown
Contributor

alamb commented May 4, 2026

Sorry @zhuqi-lucas - I am really struggling these days tor review all these large PRs. Last year it was very rare to see a more than 1000 line PR and now we get multiple such PRs a day.

I will try and find the time to review this more carefuly

@alamb
Copy link
Copy Markdown
Contributor

alamb commented May 4, 2026

run benchmark clickbench_partitioned

@adriangbot
Copy link
Copy Markdown

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4373271959-2014-w8h9b 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing feat/rg-reorder-by-statistics (f7d9156) to 0144570 (merge-base) diff using: clickbench_partitioned
Results will be posted here when complete


File an issue against this benchmark runner

@adriangbot
Copy link
Copy Markdown

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected
Details

Comparing HEAD and feat_rg-reorder-by-statistics
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                  HEAD ┃         feat_rg-reorder-by-statistics ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │          1.23 / 4.82 ±7.04 / 18.90 ms │          1.21 / 4.82 ±7.05 / 18.92 ms │     no change │
│ QQuery 1  │        12.47 / 13.00 ±0.29 / 13.32 ms │        12.72 / 12.98 ±0.18 / 13.19 ms │     no change │
│ QQuery 2  │        38.63 / 39.11 ±0.35 / 39.47 ms │        37.97 / 38.31 ±0.34 / 38.80 ms │     no change │
│ QQuery 3  │        33.07 / 33.52 ±0.51 / 34.49 ms │        33.41 / 33.64 ±0.22 / 33.98 ms │     no change │
│ QQuery 4  │     242.23 / 249.99 ±9.73 / 267.45 ms │     271.06 / 274.49 ±3.09 / 278.63 ms │  1.10x slower │
│ QQuery 5  │     284.47 / 296.56 ±9.28 / 306.71 ms │     299.65 / 308.25 ±4.93 / 314.88 ms │     no change │
│ QQuery 6  │           6.96 / 7.23 ±0.18 / 7.46 ms │           6.65 / 7.27 ±0.56 / 8.05 ms │     no change │
│ QQuery 7  │        14.21 / 14.72 ±0.31 / 15.02 ms │        14.41 / 16.07 ±3.18 / 22.43 ms │  1.09x slower │
│ QQuery 8  │     356.10 / 358.94 ±2.33 / 363.04 ms │     328.55 / 340.34 ±8.92 / 354.19 ms │ +1.05x faster │
│ QQuery 9  │     487.77 / 497.42 ±8.53 / 511.30 ms │     439.88 / 453.05 ±9.21 / 467.33 ms │ +1.10x faster │
│ QQuery 10 │        76.69 / 80.03 ±5.15 / 90.25 ms │        74.55 / 77.13 ±3.18 / 83.38 ms │     no change │
│ QQuery 11 │        84.54 / 86.98 ±1.52 / 88.97 ms │        86.04 / 87.25 ±1.24 / 89.04 ms │     no change │
│ QQuery 12 │     282.89 / 284.46 ±1.23 / 285.94 ms │    275.05 / 289.89 ±11.97 / 305.74 ms │     no change │
│ QQuery 13 │     396.27 / 404.06 ±5.67 / 410.90 ms │     403.21 / 412.12 ±9.24 / 429.06 ms │     no change │
│ QQuery 14 │     282.35 / 286.07 ±3.02 / 291.18 ms │     287.05 / 296.09 ±7.05 / 306.18 ms │     no change │
│ QQuery 15 │     279.47 / 287.60 ±5.65 / 294.51 ms │    294.22 / 320.57 ±14.31 / 336.64 ms │  1.11x slower │
│ QQuery 16 │    638.58 / 658.97 ±22.69 / 690.33 ms │     667.64 / 674.35 ±6.76 / 683.20 ms │     no change │
│ QQuery 17 │    622.84 / 637.11 ±12.84 / 654.17 ms │    621.26 / 638.40 ±15.17 / 660.66 ms │     no change │
│ QQuery 18 │ 1241.87 / 1316.27 ±41.71 / 1358.85 ms │ 1258.81 / 1307.11 ±34.18 / 1348.22 ms │     no change │
│ QQuery 19 │        28.46 / 31.56 ±4.31 / 40.10 ms │        28.59 / 28.80 ±0.19 / 29.06 ms │ +1.10x faster │
│ QQuery 20 │     529.29 / 537.43 ±5.02 / 544.19 ms │     523.97 / 532.45 ±8.68 / 549.23 ms │     no change │
│ QQuery 21 │     607.66 / 614.19 ±5.53 / 621.01 ms │     605.27 / 612.57 ±8.27 / 625.13 ms │     no change │
│ QQuery 22 │ 1070.59 / 1085.27 ±13.00 / 1107.09 ms │ 1072.45 / 1090.41 ±10.78 / 1104.44 ms │     no change │
│ QQuery 23 │ 3314.00 / 3437.61 ±78.34 / 3523.07 ms │ 3387.58 / 3448.97 ±39.32 / 3497.47 ms │     no change │
│ QQuery 24 │        42.20 / 44.81 ±3.57 / 51.79 ms │        51.45 / 55.62 ±5.18 / 65.81 ms │  1.24x slower │
│ QQuery 25 │     113.70 / 114.49 ±0.66 / 115.32 ms │     128.16 / 128.61 ±0.37 / 129.21 ms │  1.12x slower │
│ QQuery 26 │        42.35 / 45.65 ±6.31 / 58.25 ms │        52.63 / 55.33 ±3.39 / 61.79 ms │  1.21x slower │
│ QQuery 27 │     674.46 / 685.83 ±7.29 / 695.20 ms │     688.07 / 695.41 ±4.37 / 701.69 ms │     no change │
│ QQuery 28 │ 3020.28 / 3048.63 ±23.33 / 3084.84 ms │ 3010.28 / 3074.39 ±33.31 / 3105.94 ms │     no change │
│ QQuery 29 │        43.07 / 43.94 ±1.04 / 45.91 ms │       43.87 / 51.79 ±10.11 / 69.72 ms │  1.18x slower │
│ QQuery 30 │     307.14 / 315.88 ±6.92 / 323.79 ms │     334.93 / 343.24 ±8.01 / 356.83 ms │  1.09x slower │
│ QQuery 31 │     308.72 / 314.42 ±4.56 / 321.89 ms │    314.12 / 323.87 ±12.27 / 348.06 ms │     no change │
│ QQuery 32 │ 1018.11 / 1070.76 ±36.10 / 1118.10 ms │  1005.36 / 1016.45 ±7.84 / 1027.16 ms │ +1.05x faster │
│ QQuery 33 │ 1496.48 / 1556.75 ±49.74 / 1616.42 ms │ 1436.75 / 1491.87 ±40.63 / 1545.17 ms │     no change │
│ QQuery 34 │ 1453.29 / 1513.04 ±45.83 / 1572.31 ms │ 1451.91 / 1498.81 ±46.92 / 1588.77 ms │     no change │
│ QQuery 35 │    295.44 / 321.06 ±19.17 / 353.20 ms │    328.30 / 358.07 ±40.25 / 436.97 ms │  1.12x slower │
│ QQuery 36 │        64.29 / 70.64 ±7.87 / 86.00 ms │        66.27 / 67.78 ±1.06 / 69.29 ms │     no change │
│ QQuery 37 │        35.20 / 37.71 ±2.70 / 41.59 ms │        34.99 / 37.66 ±2.90 / 41.75 ms │     no change │
│ QQuery 38 │        39.83 / 44.27 ±4.33 / 50.79 ms │        39.84 / 44.31 ±5.22 / 54.12 ms │     no change │
│ QQuery 39 │    124.07 / 143.17 ±12.43 / 157.24 ms │     136.46 / 143.86 ±6.15 / 153.13 ms │     no change │
│ QQuery 40 │        15.94 / 17.45 ±2.47 / 22.36 ms │        14.40 / 17.46 ±4.12 / 25.40 ms │     no change │
│ QQuery 41 │        14.73 / 15.13 ±0.27 / 15.55 ms │        13.61 / 15.07 ±2.48 / 20.01 ms │     no change │
│ QQuery 42 │        14.25 / 14.48 ±0.20 / 14.83 ms │        13.15 / 15.82 ±4.95 / 25.71 ms │  1.09x slower │
└───────────┴───────────────────────────────────────┴───────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                            ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                            │ 20680.99ms │
│ Total Time (feat_rg-reorder-by-statistics)   │ 20740.75ms │
│ Average Time (HEAD)                          │   480.95ms │
│ Average Time (feat_rg-reorder-by-statistics) │   482.34ms │
│ Queries Faster                               │          4 │
│ Queries Slower                               │         10 │
│ Queries with No Change                       │         29 │
│ Queries with Failure                         │          0 │
└──────────────────────────────────────────────┴────────────┘

Resource Usage

clickbench_partitioned — base (merge-base)

Metric Value
Wall time 105.0s
Peak memory 30.8 GiB
Avg memory 23.5 GiB
CPU user 1091.4s
CPU sys 68.7s
Peak spill 0 B

clickbench_partitioned — branch

Metric Value
Wall time 105.0s
Peak memory 29.3 GiB
Avg memory 22.9 GiB
CPU user 1090.4s
CPU sys 70.5s
Peak spill 0 B

File an issue against this benchmark runner

@zhuqi-lucas
Copy link
Copy Markdown
Contributor Author

Sorry @zhuqi-lucas - I am really struggling these days tor review all these large PRs. Last year it was very rare to see a more than 1000 line PR and now we get multiple such PRs a day.

I will try and find the time to review this more carefuly

Thanks @alamb, no rush at all, I really appreciate you taking the time.

@alamb
Copy link
Copy Markdown
Contributor

alamb commented May 6, 2026

the benchmark results look mixed -- I'll see if that reproduces on a second run

@alamb
Copy link
Copy Markdown
Contributor

alamb commented May 6, 2026

run benchmark clickbench_partitioned

@adriangbot
Copy link
Copy Markdown

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4391783062-2042-6k9px 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing feat/rg-reorder-by-statistics (f7d9156) to 0144570 (merge-base) diff using: clickbench_partitioned
Results will be posted here when complete


File an issue against this benchmark runner

@adriangbot
Copy link
Copy Markdown

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected
Details

Comparing HEAD and feat_rg-reorder-by-statistics
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                  HEAD ┃         feat_rg-reorder-by-statistics ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │          1.20 / 4.70 ±6.87 / 18.44 ms │          1.22 / 4.79 ±6.97 / 18.74 ms │     no change │
│ QQuery 1  │        12.75 / 13.17 ±0.22 / 13.34 ms │        12.40 / 12.90 ±0.28 / 13.23 ms │     no change │
│ QQuery 2  │        36.81 / 37.18 ±0.33 / 37.61 ms │        37.22 / 37.42 ±0.25 / 37.88 ms │     no change │
│ QQuery 3  │        31.71 / 32.19 ±0.54 / 33.22 ms │        31.94 / 32.44 ±0.39 / 32.96 ms │     no change │
│ QQuery 4  │     247.78 / 250.01 ±2.06 / 253.50 ms │     253.69 / 257.03 ±1.73 / 258.56 ms │     no change │
│ QQuery 5  │     287.78 / 289.52 ±1.03 / 290.85 ms │     289.42 / 292.64 ±2.07 / 295.08 ms │     no change │
│ QQuery 6  │           7.29 / 7.81 ±0.67 / 9.10 ms │           6.90 / 7.16 ±0.18 / 7.46 ms │ +1.09x faster │
│ QQuery 7  │        14.28 / 14.36 ±0.07 / 14.49 ms │        13.95 / 14.08 ±0.09 / 14.16 ms │     no change │
│ QQuery 8  │     336.38 / 340.04 ±2.13 / 342.05 ms │     341.09 / 343.49 ±1.90 / 345.33 ms │     no change │
│ QQuery 9  │     461.94 / 467.43 ±5.31 / 477.00 ms │     467.44 / 472.74 ±3.67 / 477.75 ms │     no change │
│ QQuery 10 │        74.48 / 76.10 ±1.88 / 79.71 ms │        75.52 / 77.16 ±2.19 / 81.23 ms │     no change │
│ QQuery 11 │        85.73 / 86.07 ±0.26 / 86.42 ms │        85.99 / 87.54 ±1.12 / 89.28 ms │     no change │
│ QQuery 12 │     280.98 / 285.44 ±3.52 / 290.79 ms │     282.81 / 287.61 ±5.21 / 296.21 ms │     no change │
│ QQuery 13 │     404.18 / 411.13 ±9.26 / 429.29 ms │     405.10 / 414.80 ±8.21 / 425.37 ms │     no change │
│ QQuery 14 │     292.37 / 297.84 ±3.26 / 301.90 ms │     293.62 / 295.19 ±1.64 / 297.82 ms │     no change │
│ QQuery 15 │     291.47 / 294.53 ±2.30 / 297.44 ms │     291.66 / 300.90 ±7.54 / 312.14 ms │     no change │
│ QQuery 16 │     627.68 / 641.18 ±9.98 / 655.69 ms │     639.85 / 641.51 ±1.67 / 644.29 ms │     no change │
│ QQuery 17 │     638.42 / 647.21 ±7.23 / 655.68 ms │     640.67 / 645.68 ±3.67 / 651.56 ms │     no change │
│ QQuery 18 │ 1290.83 / 1314.60 ±14.71 / 1329.99 ms │ 1297.43 / 1309.34 ±10.13 / 1325.00 ms │     no change │
│ QQuery 19 │        29.23 / 32.41 ±5.77 / 43.94 ms │        29.21 / 34.33 ±5.83 / 44.03 ms │  1.06x slower │
│ QQuery 20 │     522.82 / 528.86 ±6.61 / 541.51 ms │     523.02 / 528.33 ±3.26 / 532.18 ms │     no change │
│ QQuery 21 │     603.90 / 608.73 ±2.96 / 612.83 ms │     603.88 / 609.67 ±6.41 / 621.84 ms │     no change │
│ QQuery 22 │  1081.15 / 1087.43 ±6.25 / 1096.41 ms │  1068.50 / 1080.17 ±6.12 / 1086.37 ms │     no change │
│ QQuery 23 │ 3384.13 / 3408.97 ±14.66 / 3428.08 ms │ 3395.45 / 3408.41 ±11.08 / 3423.79 ms │     no change │
│ QQuery 24 │        42.03 / 42.47 ±0.33 / 42.83 ms │       51.50 / 61.01 ±13.32 / 86.16 ms │  1.44x slower │
│ QQuery 25 │     114.26 / 119.46 ±5.17 / 128.73 ms │     127.97 / 132.42 ±4.01 / 139.39 ms │  1.11x slower │
│ QQuery 26 │        42.83 / 44.42 ±1.62 / 47.38 ms │        52.32 / 53.26 ±1.06 / 55.01 ms │  1.20x slower │
│ QQuery 27 │     679.50 / 681.37 ±1.81 / 684.19 ms │     669.76 / 682.80 ±8.33 / 693.59 ms │     no change │
│ QQuery 28 │ 3012.02 / 3049.65 ±22.48 / 3077.02 ms │ 3027.95 / 3046.49 ±13.96 / 3063.12 ms │     no change │
│ QQuery 29 │       42.80 / 49.16 ±11.04 / 71.17 ms │        42.94 / 45.42 ±3.96 / 53.26 ms │ +1.08x faster │
│ QQuery 30 │     313.88 / 318.51 ±3.26 / 323.73 ms │     316.46 / 319.91 ±4.37 / 328.18 ms │     no change │
│ QQuery 31 │     307.95 / 317.28 ±6.52 / 325.61 ms │     307.66 / 313.96 ±3.83 / 318.14 ms │     no change │
│ QQuery 32 │ 1030.36 / 1047.67 ±22.70 / 1092.49 ms │ 1040.24 / 1052.69 ±10.45 / 1069.35 ms │     no change │
│ QQuery 33 │ 1451.84 / 1477.87 ±15.35 / 1498.49 ms │ 1469.74 / 1503.70 ±41.90 / 1584.18 ms │     no change │
│ QQuery 34 │ 1482.73 / 1500.06 ±11.41 / 1518.36 ms │ 1476.76 / 1492.89 ±15.81 / 1516.11 ms │     no change │
│ QQuery 35 │    310.09 / 330.24 ±12.15 / 344.45 ms │     301.82 / 307.84 ±4.94 / 315.16 ms │ +1.07x faster │
│ QQuery 36 │        64.00 / 65.09 ±0.85 / 66.22 ms │        64.17 / 66.87 ±1.97 / 68.78 ms │     no change │
│ QQuery 37 │        35.91 / 41.29 ±5.84 / 52.66 ms │        36.03 / 42.85 ±5.55 / 50.91 ms │     no change │
│ QQuery 38 │        43.19 / 45.62 ±2.60 / 50.61 ms │        41.06 / 44.60 ±3.39 / 50.81 ms │     no change │
│ QQuery 39 │     131.17 / 140.16 ±6.52 / 150.63 ms │     126.02 / 138.27 ±9.57 / 151.72 ms │     no change │
│ QQuery 40 │        15.14 / 18.55 ±3.41 / 23.30 ms │        14.85 / 15.28 ±0.53 / 16.29 ms │ +1.21x faster │
│ QQuery 41 │        14.55 / 16.04 ±2.52 / 21.05 ms │        14.25 / 16.53 ±2.79 / 21.83 ms │     no change │
│ QQuery 42 │        13.94 / 14.63 ±1.03 / 16.68 ms │        13.86 / 15.94 ±3.84 / 23.61 ms │  1.09x slower │
└───────────┴───────────────────────────────────────┴───────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                            ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                            │ 20496.45ms │
│ Total Time (feat_rg-reorder-by-statistics)   │ 20548.05ms │
│ Average Time (HEAD)                          │   476.66ms │
│ Average Time (feat_rg-reorder-by-statistics) │   477.86ms │
│ Queries Faster                               │          4 │
│ Queries Slower                               │          5 │
│ Queries with No Change                       │         34 │
│ Queries with Failure                         │          0 │
└──────────────────────────────────────────────┴────────────┘

Resource Usage

clickbench_partitioned — base (merge-base)

Metric Value
Wall time 105.0s
Peak memory 29.0 GiB
Avg memory 23.1 GiB
CPU user 1079.7s
CPU sys 67.2s
Peak spill 0 B

clickbench_partitioned — branch

Metric Value
Wall time 105.0s
Peak memory 30.0 GiB
Avg memory 23.2 GiB
CPU user 1082.3s
CPU sys 68.1s
Peak spill 0 B

File an issue against this benchmark runner

@zhuqi-lucas
Copy link
Copy Markdown
Contributor Author

the benchmark results look mixed -- I'll see if that reproduces on a second run

Thanks @alamb , i update the PR to skip reorder when overlap is heavy, and let me trigger again to see if it's the reason.

@zhuqi-lucas
Copy link
Copy Markdown
Contributor Author

run benchmark clickbench_partitioned

@adriangbot
Copy link
Copy Markdown

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4395019893-2044-wkdsl 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing feat/rg-reorder-by-statistics (c8fd321) to 0c38ebb (merge-base) diff using: clickbench_partitioned
Results will be posted here when complete


File an issue against this benchmark runner

@github-actions github-actions Bot added the auto detected api change Auto detected API change label May 7, 2026
@adriangbot
Copy link
Copy Markdown

Benchmark for this request failed.

Last 20 lines of output:

Click to expand
   Compiling async-compression v0.4.42
   Compiling mimalloc v0.1.50
   Compiling parquet v58.2.0
   Compiling datafusion-common v53.1.0 (/workspace/datafusion-branch/datafusion/common)
   Compiling datafusion-expr-common v53.1.0 (/workspace/datafusion-branch/datafusion/expr-common)
   Compiling datafusion-physical-expr-common v53.1.0 (/workspace/datafusion-branch/datafusion/physical-expr-common)
   Compiling datafusion-functions-aggregate-common v53.1.0 (/workspace/datafusion-branch/datafusion/functions-aggregate-common)
   Compiling datafusion-functions-window-common v53.1.0 (/workspace/datafusion-branch/datafusion/functions-window-common)
   Compiling datafusion-expr v53.1.0 (/workspace/datafusion-branch/datafusion/expr)
   Compiling datafusion-physical-expr v53.1.0 (/workspace/datafusion-branch/datafusion/physical-expr)
   Compiling datafusion-execution v53.1.0 (/workspace/datafusion-branch/datafusion/execution)
error[E0063]: missing fields `fetch` and `sort_options` in initializer of `DynamicFilterPhysicalExpr`
   --> datafusion/physical-expr/src/expressions/dynamic_filters.rs:457:9
    |
457 |         Self {
    |         ^^^^ missing `fetch` and `sort_options`

For more information about this error, try `rustc --explain E0063`.
error: could not compile `datafusion-physical-expr` (lib) due to 1 previous error
warning: build failed, waiting for other jobs to finish...

File an issue against this benchmark runner

@zhuqi-lucas zhuqi-lucas force-pushed the feat/rg-reorder-by-statistics branch from c8fd321 to ab09242 Compare May 7, 2026 07:55
@zhuqi-lucas
Copy link
Copy Markdown
Contributor Author

run benchmark clickbench_partitioned

@adriangbot
Copy link
Copy Markdown

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4395242557-2045-x5dxd 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing feat/rg-reorder-by-statistics (4fac37b) to 3b634aa (merge-base) diff using: clickbench_partitioned
Results will be posted here when complete


File an issue against this benchmark runner

@github-actions github-actions Bot removed the auto detected api change Auto detected API change label May 7, 2026
@adriangbot
Copy link
Copy Markdown

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected
Details

Comparing HEAD and feat_rg-reorder-by-statistics
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                  HEAD ┃         feat_rg-reorder-by-statistics ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │          1.17 / 4.64 ±6.78 / 18.19 ms │          1.18 / 4.64 ±6.77 / 18.18 ms │     no change │
│ QQuery 1  │        12.74 / 12.91 ±0.12 / 13.07 ms │        13.03 / 13.28 ±0.14 / 13.46 ms │     no change │
│ QQuery 2  │        35.86 / 36.47 ±0.48 / 37.25 ms │        36.47 / 36.68 ±0.19 / 37.02 ms │     no change │
│ QQuery 3  │        30.84 / 31.38 ±0.64 / 32.65 ms │        30.91 / 31.05 ±0.08 / 31.14 ms │     no change │
│ QQuery 4  │     232.70 / 236.25 ±3.06 / 241.83 ms │     233.22 / 234.32 ±0.79 / 235.30 ms │     no change │
│ QQuery 5  │     279.72 / 281.78 ±1.81 / 284.54 ms │     276.91 / 280.80 ±3.00 / 285.58 ms │     no change │
│ QQuery 6  │           6.77 / 7.03 ±0.19 / 7.28 ms │           6.67 / 7.32 ±0.40 / 7.93 ms │     no change │
│ QQuery 7  │        13.90 / 13.95 ±0.04 / 14.01 ms │        14.09 / 14.27 ±0.12 / 14.45 ms │     no change │
│ QQuery 8  │     318.01 / 323.88 ±3.52 / 327.40 ms │     315.04 / 318.04 ±2.01 / 320.95 ms │     no change │
│ QQuery 9  │     455.54 / 459.86 ±3.17 / 463.50 ms │     446.87 / 455.95 ±4.81 / 461.11 ms │     no change │
│ QQuery 10 │        70.11 / 71.04 ±0.81 / 72.49 ms │        69.85 / 70.59 ±0.75 / 71.72 ms │     no change │
│ QQuery 11 │        79.31 / 81.02 ±1.24 / 82.69 ms │        81.09 / 81.49 ±0.30 / 81.82 ms │     no change │
│ QQuery 12 │     271.17 / 277.33 ±5.06 / 286.31 ms │     271.78 / 276.31 ±3.98 / 282.18 ms │     no change │
│ QQuery 13 │     385.11 / 389.27 ±4.84 / 398.56 ms │    379.76 / 391.53 ±11.83 / 413.39 ms │     no change │
│ QQuery 14 │     277.49 / 281.96 ±5.86 / 293.26 ms │     279.02 / 281.91 ±3.46 / 288.60 ms │     no change │
│ QQuery 15 │     277.09 / 283.44 ±7.77 / 298.74 ms │    279.24 / 292.29 ±16.55 / 324.16 ms │     no change │
│ QQuery 16 │     602.36 / 608.37 ±5.41 / 618.30 ms │     598.97 / 609.14 ±6.33 / 616.06 ms │     no change │
│ QQuery 17 │     597.12 / 606.79 ±5.90 / 613.61 ms │     605.86 / 612.02 ±4.38 / 617.78 ms │     no change │
│ QQuery 18 │ 1176.93 / 1203.07 ±17.94 / 1228.67 ms │ 1192.41 / 1212.50 ±16.52 / 1241.91 ms │     no change │
│ QQuery 19 │        27.97 / 29.61 ±2.84 / 35.30 ms │        28.61 / 33.03 ±5.34 / 40.93 ms │  1.12x slower │
│ QQuery 20 │    517.04 / 534.38 ±20.06 / 572.51 ms │     517.60 / 521.11 ±2.95 / 525.47 ms │     no change │
│ QQuery 21 │     593.63 / 600.06 ±6.83 / 613.09 ms │     592.76 / 596.20 ±2.79 / 600.88 ms │     no change │
│ QQuery 22 │  1054.85 / 1065.40 ±6.04 / 1071.06 ms │  1057.93 / 1061.61 ±3.43 / 1067.11 ms │     no change │
│ QQuery 23 │ 3179.68 / 3199.56 ±19.83 / 3234.14 ms │ 3179.00 / 3198.97 ±17.74 / 3231.12 ms │     no change │
│ QQuery 24 │        41.46 / 45.28 ±6.36 / 57.95 ms │       50.98 / 59.23 ±12.31 / 82.98 ms │  1.31x slower │
│ QQuery 25 │     111.80 / 116.28 ±6.84 / 129.76 ms │     125.66 / 127.45 ±1.84 / 130.99 ms │  1.10x slower │
│ QQuery 26 │        42.68 / 44.11 ±1.24 / 46.17 ms │        51.71 / 54.22 ±2.30 / 57.85 ms │  1.23x slower │
│ QQuery 27 │     667.87 / 673.95 ±3.71 / 678.18 ms │     667.56 / 677.21 ±5.68 / 683.81 ms │     no change │
│ QQuery 28 │ 3016.59 / 3042.43 ±17.92 / 3062.27 ms │ 3012.13 / 3034.15 ±20.18 / 3058.70 ms │     no change │
│ QQuery 29 │        41.77 / 47.71 ±6.74 / 56.79 ms │        42.23 / 42.63 ±0.38 / 43.35 ms │ +1.12x faster │
│ QQuery 30 │     298.30 / 305.97 ±4.71 / 313.11 ms │     305.36 / 307.69 ±1.86 / 310.87 ms │     no change │
│ QQuery 31 │     292.07 / 297.00 ±3.46 / 302.75 ms │     292.94 / 303.92 ±5.58 / 308.42 ms │     no change │
│ QQuery 32 │    924.04 / 939.96 ±14.42 / 962.45 ms │     924.15 / 933.31 ±4.61 / 936.34 ms │     no change │
│ QQuery 33 │ 1427.23 / 1447.61 ±14.59 / 1461.84 ms │ 1442.44 / 1456.91 ±13.11 / 1472.40 ms │     no change │
│ QQuery 34 │ 1441.71 / 1465.05 ±20.53 / 1499.54 ms │  1443.15 / 1457.73 ±9.33 / 1470.00 ms │     no change │
│ QQuery 35 │     289.98 / 293.98 ±4.21 / 300.35 ms │     285.80 / 298.74 ±9.90 / 313.48 ms │     no change │
│ QQuery 36 │        62.03 / 69.22 ±5.26 / 76.69 ms │        62.26 / 65.20 ±2.70 / 69.95 ms │ +1.06x faster │
│ QQuery 37 │        34.98 / 38.29 ±4.73 / 47.66 ms │        36.23 / 41.25 ±6.23 / 50.43 ms │  1.08x slower │
│ QQuery 38 │        41.57 / 43.01 ±1.07 / 44.34 ms │        41.45 / 46.92 ±4.19 / 54.05 ms │  1.09x slower │
│ QQuery 39 │     123.13 / 134.85 ±6.70 / 140.97 ms │     133.31 / 136.35 ±3.01 / 141.58 ms │     no change │
│ QQuery 40 │        14.40 / 15.87 ±1.48 / 18.32 ms │        14.49 / 16.58 ±2.14 / 19.67 ms │     no change │
│ QQuery 41 │        13.96 / 16.76 ±4.46 / 25.53 ms │        13.98 / 14.12 ±0.13 / 14.29 ms │ +1.19x faster │
│ QQuery 42 │        13.46 / 13.91 ±0.31 / 14.29 ms │        13.46 / 14.29 ±1.30 / 16.88 ms │     no change │
└───────────┴───────────────────────────────────────┴───────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                            ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                            │ 19690.71ms │
│ Total Time (feat_rg-reorder-by-statistics)   │ 19723.00ms │
│ Average Time (HEAD)                          │   457.92ms │
│ Average Time (feat_rg-reorder-by-statistics) │   458.67ms │
│ Queries Faster                               │          3 │
│ Queries Slower                               │          6 │
│ Queries with No Change                       │         34 │
│ Queries with Failure                         │          0 │
└──────────────────────────────────────────────┴────────────┘

Resource Usage

clickbench_partitioned — base (merge-base)

Metric Value
Wall time 100.0s
Peak memory 30.3 GiB
Avg memory 23.3 GiB
CPU user 1034.8s
CPU sys 64.5s
Peak spill 0 B

clickbench_partitioned — branch

Metric Value
Wall time 100.0s
Peak memory 30.8 GiB
Avg memory 23.2 GiB
CPU user 1037.7s
CPU sys 64.0s
Peak spill 0 B

File an issue against this benchmark runner

@zhuqi-lucas
Copy link
Copy Markdown
Contributor Author

run benchmark clickbench_partitioned

@adriangbot
Copy link
Copy Markdown

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4396120969-2047-dtfjq 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing feat/rg-reorder-by-statistics (6987aa2) to 3b634aa (merge-base) diff using: clickbench_partitioned
Results will be posted here when complete


File an issue against this benchmark runner

@adriangbot
Copy link
Copy Markdown

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected
Details

Comparing HEAD and feat_rg-reorder-by-statistics
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query     ┃                                  HEAD ┃         feat_rg-reorder-by-statistics ┃       Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ QQuery 0  │          1.20 / 4.54 ±6.64 / 17.82 ms │          1.18 / 4.53 ±6.64 / 17.80 ms │    no change │
│ QQuery 1  │        12.65 / 12.95 ±0.21 / 13.29 ms │        12.69 / 12.80 ±0.08 / 12.94 ms │    no change │
│ QQuery 2  │        35.28 / 35.76 ±0.40 / 36.38 ms │        36.42 / 36.85 ±0.24 / 37.13 ms │    no change │
│ QQuery 3  │        30.06 / 31.22 ±0.67 / 32.11 ms │        30.29 / 30.98 ±0.64 / 32.10 ms │    no change │
│ QQuery 4  │     226.08 / 231.19 ±4.52 / 238.42 ms │     224.28 / 230.47 ±3.90 / 235.85 ms │    no change │
│ QQuery 5  │     274.93 / 277.88 ±2.84 / 282.48 ms │     271.91 / 274.81 ±2.46 / 278.80 ms │    no change │
│ QQuery 6  │           6.70 / 7.19 ±0.38 / 7.88 ms │           6.96 / 7.19 ±0.20 / 7.45 ms │    no change │
│ QQuery 7  │        14.65 / 14.73 ±0.05 / 14.80 ms │        13.89 / 14.00 ±0.07 / 14.07 ms │    no change │
│ QQuery 8  │     310.14 / 311.16 ±1.04 / 312.56 ms │     309.27 / 311.98 ±2.40 / 316.25 ms │    no change │
│ QQuery 9  │     439.85 / 445.48 ±3.05 / 448.27 ms │     441.40 / 447.12 ±3.32 / 450.62 ms │    no change │
│ QQuery 10 │        69.32 / 70.75 ±1.75 / 74.14 ms │        67.98 / 68.91 ±0.74 / 70.01 ms │    no change │
│ QQuery 11 │        79.46 / 81.95 ±2.52 / 86.49 ms │        79.14 / 81.80 ±3.87 / 89.44 ms │    no change │
│ QQuery 12 │     266.87 / 272.45 ±4.43 / 278.74 ms │     266.41 / 271.18 ±3.75 / 277.01 ms │    no change │
│ QQuery 13 │     379.90 / 383.16 ±3.69 / 389.79 ms │     375.59 / 387.25 ±8.17 / 395.91 ms │    no change │
│ QQuery 14 │     277.36 / 279.44 ±2.48 / 284.23 ms │     275.12 / 277.48 ±3.51 / 284.43 ms │    no change │
│ QQuery 15 │     274.41 / 282.55 ±7.97 / 296.75 ms │     270.13 / 275.62 ±2.99 / 278.56 ms │    no change │
│ QQuery 16 │     596.55 / 602.55 ±4.30 / 606.63 ms │     594.32 / 600.81 ±6.79 / 610.42 ms │    no change │
│ QQuery 17 │     592.31 / 601.73 ±6.66 / 610.27 ms │     596.44 / 601.46 ±4.97 / 610.21 ms │    no change │
│ QQuery 18 │ 1178.91 / 1199.76 ±12.20 / 1212.30 ms │ 1169.33 / 1197.90 ±22.74 / 1236.30 ms │    no change │
│ QQuery 19 │        27.95 / 30.09 ±3.53 / 37.11 ms │        27.82 / 30.38 ±3.23 / 36.60 ms │    no change │
│ QQuery 20 │     512.92 / 519.73 ±8.68 / 536.84 ms │     512.18 / 521.16 ±4.74 / 525.35 ms │    no change │
│ QQuery 21 │     588.58 / 592.61 ±2.29 / 595.25 ms │     589.30 / 592.77 ±3.88 / 599.85 ms │    no change │
│ QQuery 22 │  1040.85 / 1056.30 ±8.94 / 1066.12 ms │ 1041.61 / 1063.66 ±20.53 / 1102.01 ms │    no change │
│ QQuery 23 │ 3118.55 / 3157.66 ±24.96 / 3194.14 ms │ 3144.45 / 3156.48 ±10.48 / 3169.82 ms │    no change │
│ QQuery 24 │        42.08 / 42.40 ±0.52 / 43.43 ms │        41.35 / 41.85 ±0.45 / 42.62 ms │    no change │
│ QQuery 25 │     110.58 / 111.78 ±0.76 / 112.74 ms │     109.72 / 113.20 ±3.10 / 117.57 ms │    no change │
│ QQuery 26 │        42.96 / 43.92 ±0.83 / 45.38 ms │        42.11 / 42.86 ±0.73 / 44.27 ms │    no change │
│ QQuery 27 │     667.69 / 670.60 ±2.81 / 675.36 ms │     670.00 / 680.00 ±6.94 / 688.73 ms │    no change │
│ QQuery 28 │ 2998.13 / 3026.91 ±19.25 / 3056.91 ms │ 2991.41 / 3011.03 ±10.55 / 3023.18 ms │    no change │
│ QQuery 29 │        41.38 / 45.32 ±4.68 / 51.80 ms │       41.51 / 51.16 ±18.67 / 88.48 ms │ 1.13x slower │
│ QQuery 30 │     297.64 / 303.09 ±5.28 / 310.03 ms │     298.33 / 301.00 ±3.74 / 308.39 ms │    no change │
│ QQuery 31 │     285.72 / 291.25 ±5.58 / 301.58 ms │     284.19 / 294.46 ±6.25 / 300.59 ms │    no change │
│ QQuery 32 │    900.75 / 920.82 ±20.37 / 959.83 ms │    902.56 / 918.05 ±12.10 / 937.23 ms │    no change │
│ QQuery 33 │  1419.13 / 1423.97 ±4.76 / 1433.02 ms │ 1424.74 / 1463.19 ±42.15 / 1540.75 ms │    no change │
│ QQuery 34 │ 1418.53 / 1441.87 ±12.92 / 1456.64 ms │ 1422.49 / 1435.21 ±12.63 / 1454.89 ms │    no change │
│ QQuery 35 │     285.31 / 289.58 ±4.05 / 296.74 ms │    278.13 / 303.63 ±41.76 / 386.76 ms │    no change │
│ QQuery 36 │        61.88 / 65.00 ±3.48 / 71.56 ms │        61.72 / 64.21 ±2.61 / 69.08 ms │    no change │
│ QQuery 37 │        34.74 / 36.41 ±2.20 / 40.71 ms │        34.61 / 34.87 ±0.17 / 35.12 ms │    no change │
│ QQuery 38 │        40.21 / 43.86 ±2.86 / 47.05 ms │        43.50 / 49.88 ±6.79 / 58.78 ms │ 1.14x slower │
│ QQuery 39 │     124.07 / 130.60 ±4.69 / 137.02 ms │     122.69 / 130.08 ±5.30 / 138.70 ms │    no change │
│ QQuery 40 │        14.07 / 14.29 ±0.27 / 14.80 ms │        13.63 / 16.02 ±3.44 / 22.85 ms │ 1.12x slower │
│ QQuery 41 │        13.53 / 13.65 ±0.12 / 13.82 ms │        13.30 / 13.74 ±0.32 / 14.26 ms │    no change │
│ QQuery 42 │       13.04 / 18.95 ±11.22 / 41.38 ms │        13.24 / 18.26 ±9.74 / 37.75 ms │    no change │
└───────────┴───────────────────────────────────────┴───────────────────────────────────────┴──────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                            ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                            │ 19437.10ms │
│ Total Time (feat_rg-reorder-by-statistics)   │ 19480.29ms │
│ Average Time (HEAD)                          │   452.03ms │
│ Average Time (feat_rg-reorder-by-statistics) │   453.03ms │
│ Queries Faster                               │          0 │
│ Queries Slower                               │          3 │
│ Queries with No Change                       │         40 │
│ Queries with Failure                         │          0 │
└──────────────────────────────────────────────┴────────────┘

Resource Usage

clickbench_partitioned — base (merge-base)

Metric Value
Wall time 100.0s
Peak memory 30.7 GiB
Avg memory 23.2 GiB
CPU user 1025.5s
CPU sys 60.0s
Peak spill 0 B

clickbench_partitioned — branch

Metric Value
Wall time 100.0s
Peak memory 31.7 GiB
Avg memory 23.3 GiB
CPU user 1025.6s
CPU sys 60.4s
Peak spill 0 B

File an issue against this benchmark runner

@zhuqi-lucas
Copy link
Copy Markdown
Contributor Author

The benchmark is ok now.

Copy link
Copy Markdown
Contributor

@adriangb adriangb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Main concerns:

  1. I don't think we should be (IMO) abusing dynamic filters to pass sort information.
  2. I don't think the traits introduced are helpful.

Can you explain why we need to use dynamic filters like this?

match arrow::compute::sort_to_indices(&stat_mins, Some(sort_options), None) {
Ok(indices) => indices,
Err(e) => {
debug!("Skipping RG reorder: sort failed: {e}");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this expected to happen normally? If not maybe this should be an info or a warn. Maybe we even add a debug assertion so it fails in our CI?

let stat_mins = match converter.row_group_mins(rg_metadata.iter().copied()) {
Ok(vals) => vals,
Err(e) => {
debug!("Skipping RG reorder: cannot get min values: {e}");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same note as https://github.com/apache/datafusion/pull/21956/changes#r3244789303.

If this is expected a docstring explaining why would be great. If unexpected... then we should at least error in CI.

file_metadata: &ParquetMetaData,
_arrow_schema: &Schema,
) -> Result<PreparedAccessPlan> {
plan.reverse(file_metadata)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea but is it really worth having a trait just to hide 1 line behind it?

Comment on lines +1134 to +1135
// 1. reorder_by_statistics: sort RGs by min values (ASC) to align
// with the file's declared output ordering. This fixes out-of-order
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment doesn't make sense to me. A file either declares an output ordering or not. If it does the row groups should already be sorted by it. Should this say the query's desired output ordering?

Comment on lines +1154 to +1160
} else if let Some(predicate) = &prepared.predicate
&& let Some(df) = find_dynamic_filter(predicate)
&& let Some(sort_options) = df.sort_options()
&& !sort_options.is_empty()
{
// Build a sort order from DynamicFilter for non-sort-pushdown TopK.
// Quick bail: check if the sort column exists in file schema.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this needed? It seems kinda hacky / a side channel when we already have sort pushdown APIs. What is "non-sort-pushdown TopK"?

Comment on lines +1208 to +1214
let reverse_optimizer: Option<
Box<dyn crate::access_plan_optimizer::AccessPlanOptimizer>,
> = if is_descending {
Some(Box::new(crate::access_plan_optimizer::ReverseRowGroups))
} else {
None
};
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced by this abstraction. The implementations are trivial, it's more code overall and it also obscures the code execution.

&self,
plan: PreparedAccessPlan,
file_metadata: &ParquetMetaData,
_arrow_schema: &Schema,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's also smelly that we have 2 implementations and the arguments / signatures already don't match.

}

/// Find a `DynamicFilterPhysicalExpr` in the expression tree.
fn find_dynamic_filter(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it safe to just blindly recurse like this? What if the expression is something like NOT (dynamic_filter) or whatever?

///
/// Sort options indicate the sort direction for each child expression,
/// enabling downstream consumers (e.g., parquet readers) to reorder
/// row groups by statistics for TopK queries.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again not a fan of this way of passing information around.

@adriangb
Copy link
Copy Markdown
Contributor

I wonder if this belongs on try_pushdown_sort rather than on DynamicFilterPhysicalExpr. The PushdownSort rule already visits SortExec → DataSourceExec (TopK or not) and forwards the required ordering to the source via try_pushdown_sort(order, eq_properties). Today parquet's implementation only accepts when the request matches the file's declared natural ordering (or its reverse). The case this PR is targeting — request doesn't match declared ordering (or the table declares none), but the source can still reorder row groups/files by min/max stats — fits the existing Inexact contract: keep the SortExec, but feed it data in a better order. The rule already preserves sort_exec.fetch() on the rebuilt SortExec, so TopK keeps working.

Concretely: in ParquetSource::try_pushdown_sort, after the existing Exact/reverse-Inexact checks fall through, instead of returning Unsupported, check whether the requested sort column exists in the file schema (the same check the opener fallback is already doing) and return Inexact with sort_order_for_reorder set. The opener already prefers sort_order_for_reorder over the dynamic filter — that whole branch in opener.rs (the find_dynamic_filter / find_column_in_expr fallback) can go away, along with the sort_options/fetch fields on DynamicFilterPhysicalExpr and the reordering of with_fetch to call create_filter after setting fetch.

A few reasons I think this factoring is cleaner:

  • DynamicFilterPhysicalExpr's job is "live threshold for runtime pruning." Adding sort-direction and K fields to it conflates that with "how should the source schedule its reads." try_pushdown_sort is the channel that already means the latter.
  • The reorder is useful even without a LIMIT — less sorting work and better locality help any ORDER BY. Routing through the dynamic filter ties the optimization to TopK; routing through sort pushdown makes it apply to any SortExec → DataSource shape.
  • The opener never actually reads df.fetch() — only sort_options and the child column — so the only piece of metadata being moved is one the sort-pushdown call already passes as its order argument.
  • Fewer places to maintain: no proto-roundtrip carve-out, no find_dynamic_filter tree walk under AND/wrappers (which is the kind of thing that quietly breaks the next time someone wraps the predicate), and no ordering coupling between with_fetch and create_filter.

Happy to be wrong here — is there a case I'm missing where the sort metadata can reach the parquet source through the dynamic filter but not through try_pushdown_sort?

@zhuqi-lucas
Copy link
Copy Markdown
Contributor Author

Thanks @adriangb, you're right. Routing through try_pushdown_sort matches what that channel is for, decouples the reorder from TopK (so it helps any ORDER BY on a sorted source, not just LIMIT-K), and lets the sort_options/fetch fields on DynamicFilterPhysicalExpr go away — along with the find_dynamic_filter walk under AND/wrappers and the with_fetch/create_filter ordering coupling, which were the parts I was least happy about anyway.

Can't think of a case where the sort metadata reaches parquet through the dynamic filter but not through try_pushdown_sort — same SortExec → DataSource shape, dynamic filter just happens to be what SortExec attaches to its predicate.

Let me refactor along the lines you sketched.

When a parquet file has multiple row groups with out-of-order or
overlapping statistics, TopK queries benefit from reading "best" row
groups first so the dynamic filter threshold tightens quickly.

This PR adds:

1. `reorder_by_statistics`: sorts row groups by min values (ASC) based
   on parquet column statistics. Direction (DESC) is handled by the
   existing `reverse()` applied after reorder. The two steps compose:
   - Sorted data: reorder is a no-op, reverse gives perfect DESC order
   - Unsorted data: reorder fixes the order, reverse flips for DESC

2. `AccessPlanOptimizer` trait: extensible interface for row group
   access plan optimizations (reorder, reverse) applied after pruning.

3. `DynamicFilterPhysicalExpr.sort_options/fetch`: SortExec now passes
   sort direction and fetch limit to the dynamic filter, enabling the
   parquet reader to determine reorder direction for any TopK query.

4. `FileSource::reorder_files`: file-level reordering in the shared
   work queue so multi-file TopK reads the most promising files first.

5. Fix `SortExec::with_fetch` ordering: fetch must be set before
   `create_filter()` so the DynamicFilter gets the correct K value.
Adds an overlap guard to `PreparedAccessPlan::reorder_by_statistics`:
when sorted-by-min adjacent row-group `[min, max]` ranges overlap
above 50%, reordering cannot enable RG-level pruning (every "later"
RG still has values that could appear in TopK results) so the reorder
cost — CPU sort, lost IO sequential locality, and parallel scheduling
pessimization across workers all pulling "best" RGs first — dominates,
producing a net regression.

This addresses the ClickBench `hits_partitioned` regressions (Q24,
Q25, Q26: 1.11x-1.44x slower) where `EventTime` and `SearchPhrase`
are uniformly distributed across row groups via the user-id hash
partitioning, so RG min/max ranges fully overlap and reorder cannot
help.

Sorted / non-overlapping data continues to take the reorder path —
verified by `sort_pushdown.slt` Test H still passing.

The overlap helper treats null mins/maxes (RGs without statistics)
conservatively as overlaps, so missing stats discourage rather than
silently disable the guard.

Adds 7 unit tests covering: fully disjoint, disjoint after reorder,
fully overlapping, partial overlap, null max in previous, null min
in next, and the n<2 edge case.
Upstream `proto: serialize and dedupe dynamic filters v2 (apache#21807)` added
a new `DynamicFilterPhysicalExpr::from_parts` constructor that builds the
struct via a `Self { ... }` literal. After this PR's rebase brings that
commit in, the new initializer is missing the `sort_options` and `fetch`
fields this PR adds, breaking the build (E0063).

Default both to `None` in `from_parts` — the proto wire format does not
yet carry these fields, so reconstruction cannot recover them. Callers
that need sort metadata still go through `new_with_sort_options`. Proto
round-trip tests (162 in `datafusion-proto`) continue to pass.
Mirrors the row-group-level overlap guard at the file level. When
adjacent file `[min, max]` ranges (in sorted-by-min order) overlap
above 50%, file-level reorder cannot enable file pruning and the
reorder cost (CPU sort + lost IO sequential locality + parallel
work-stealing pessimization) dominates.

This addresses the remaining ClickBench `hits_partitioned` regressions
(Q24, Q25, Q26 still 1.10x-1.31x slower after the RG-level guard
landed). `hits_partitioned` is partitioned by user-id hash, so each
file covers the full range of `EventTime` and `SearchPhrase` — file
ranges fully overlap and reorder cannot help.

Files lacking statistics are conservatively counted as overlaps so
missing stats discourage rather than silently disable the guard.

Adds 6 unit tests (disjoint sorted, disjoint after reorder, fully
overlapping, partial below threshold, missing stats, n<2).
Per @adriangb's review on PR apache#21956, the sort-direction + fetch
metadata that drives parquet row-group reorder belongs on the
sort-pushdown channel that already exists, not bolted onto
`DynamicFilterPhysicalExpr` whose job is runtime threshold pruning.

Concretely:

- `ParquetSource::try_pushdown_sort` grows a third branch. After the
  existing Exact and reverse-Inexact checks fall through, if the
  requested sort column is a plain `Column` present in the file
  schema, the source now returns `Inexact` with
  `sort_order_for_reorder` set (and `reverse_row_groups` for DESC),
  instead of `Unsupported`. The reorder benefit now applies to any
  `ORDER BY` on a sorted source, not just TopK.

- `parquet/opener.rs` drops the `find_dynamic_filter` /
  `find_column_in_expr` walk under AND/wrappers and the
  `df.sort_options()` extraction. Both reorder direction and
  reverse-pass activation come from the source config alone.
  `is_descending` collapses to just `prepared.reverse_row_groups`.

- `ParquetSource::extract_topk_sort_info` (used by
  `reorder_files` in the shared work queue) reads
  `sort_order_for_reorder` directly, no dynamic-filter fallback.

- `DynamicFilterPhysicalExpr` loses its `sort_options` field,
  `fetch` field, `new_with_sort_options` constructor, and both
  getters. The proto round-trip carve-out (`from_parts` defaulting
  these to None) goes away.

- `SortExec::create_filter` reverts to `DynamicFilterPhysicalExpr::new`.
  `SortExec::with_fetch` reverts to building the filter from
  `self.create_filter()` before assigning `new_sort.fetch = fetch` —
  the ordering coupling introduced earlier is no longer needed because
  `create_filter` no longer reads `fetch`.

Tests:

- Two existing pushdown_sort tests (`test_no_pushdown_for_non_reverse_sort`,
  `test_no_prefix_match_longer_than_source`) renamed and re-snapshotted
  to reflect the new `Inexact` outcome — pushdown now succeeds even
  when neither natural nor reversed ordering matches, as long as the
  sort column is in the file schema. The surrounding `SortExec` still
  stays in place; only the source picks up the reorder hint.

- Four new unit tests in `source.rs` for the new branch:
  - column in schema + ASC -> Inexact, sort_order_for_reorder set,
    reverse_row_groups not set
  - column in schema + DESC -> Inexact, both set
  - non-Column sort expression (e.g. `a + 1`) -> Unsupported
  - column not in file schema -> Unsupported

Test results: datasource-parquet (116 + 4 passed), physical-plan sorts
(60 passed), core_integration physical_optimizer (454 passed).
clippy --all-targets -D warnings clean. cargo fmt clean.
@zhuqi-lucas zhuqi-lucas force-pushed the feat/rg-reorder-by-statistics branch from 6987aa2 to 45d775a Compare May 15, 2026 04:17
@github-actions github-actions Bot removed physical-expr Changes to the physical-expr crates physical-plan Changes to the physical-plan crate labels May 15, 2026
…ORDER BY ... DESC LIMIT` shapes

After routing the stats-based RG reorder through `try_pushdown_sort`
(commit before this), the new `Inexact` branch trips whenever the
requested sort column is in the file schema — not only when the
dynamic filter path could derive sort direction. As a result, more
`ORDER BY col DESC LIMIT N` queries on parquet sources without a
declared natural ordering now emit `reverse_row_groups=true` on the
`DataSourceExec` line.

This is the behavioural change Adrian's review intended: the
optimisation applies to any `ORDER BY` shape that has a column-in-schema
basis for the reorder, not just TopK queries with a dynamic filter.
The plan results are unchanged; only the `reverse_row_groups` decorator
appears in more EXPLAIN outputs.

Regenerated with `cargo test --test sqllogictests -- ... --complete`,
then verified the suites pass without --complete.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Core DataFusion crate datasource Changes to the datasource crate sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: global file reorder in shared work queue for TopK optimization Sort pushdown: reorder row groups by statistics within each file

5 participants