Add group join physical optimizer #21995
Conversation
Force-pushed 73f4713 to 5fe7219
🤖 Benchmark running (GKE): comparing groupjoin-eliminate-extra-hash-build (5fe7219) to 2f2fe8f (merge-base) using tpch, tpch10, and tpcds.

🤖 Benchmark completed (GKE): results for tpch, tpch10, and tpcds (base merge-base vs. branch) posted with resource usage in the run details.
Thank you for opening this pull request! Reviewer note: cargo-semver-checks reported that the current version number is not SemVer-compatible with the changes in this pull request (compared against the base branch).
@Dandandan Thanks for running the benchmarks. There are additional equivalences/optimizations of hash join + group-by from the paper that can be turned into a groupjoin. I wanted this PR to be just the initial optimization plus creating the groupjoin rule. If it's okay, I'll ping you on another draft PR that will contain all the optimizations from the paper so you can review and benchmark. I listed the optimizations I'm talking about at the bottom of this PR's description. Ignore the cost-based one; I don't think it is currently applicable.
Sounds great!
This looks awesome! I have a question: why a physical optimizer rule? It looks simpler to implement as a logical optimizer rule instead.
@2010YOUY01 So introduce the groupjoin logically by detecting the opportunity there, then edit the hash join exec so that it does the group-by within?
On second thought, I figured it's better to keep this in the physical optimizer; please ignore that. If we perform this transformation in the logical optimizer (by detecting Aggregate + equi-join), we remove the flexibility for the physical optimizer to choose among multiple applicable rules (which are only possible in the physical optimizer) and search for a globally optimal plan.
Sounds good, thanks. Yep, a physical rule is preferred because some of the additional groupjoin optimizations involve complexity that I'm not sure can be handled purely in the logical layer. When a groupjoin opportunity is found, we always do the memoizing groupjoin: simply build the hash table on the left side with accumulators embedded, then update them in place during the probe. One big optimization from the papers is "Eager Right Aggregation", which pre-aggregates the probe side before the join, reducing its cardinality from |S| to |distinct(S.join_key)|. It is ideal when most right-side groups have a corresponding value (one way to verify this is a foreign key constraint, which can be combined with eager aggregation).
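To make the idea concrete, here is a minimal, hypothetical sketch of eager right aggregation using plain Rust collections (the function name, row layout, and the sum/count partial aggregate are illustrative, not DataFusion APIs):

```rust
use std::collections::HashMap;

/// Hypothetical sketch of "Eager Right Aggregation": pre-aggregate the
/// probe (right) side by its join key before the join, shrinking it from
/// |S| rows to |distinct(S.join_key)| partial aggregates.
fn eager_right_aggregate(probe_rows: &[(u64, i64)]) -> HashMap<u64, (i64, u64)> {
    // join key -> (sum, count) partial aggregate
    let mut partials: HashMap<u64, (i64, u64)> = HashMap::new();
    for &(key, value) in probe_rows {
        let entry = partials.entry(key).or_insert((0, 0));
        entry.0 += value; // accumulate sum
        entry.1 += 1;     // accumulate count
    }
    partials
}

fn main() {
    // 5 probe rows collapse to 2 pre-aggregated groups before the join.
    let probe = [(1, 10), (1, 20), (2, 5), (2, 5), (1, 30)];
    let partials = eager_right_aggregate(&probe);
    assert_eq!(partials.len(), 2);
    assert_eq!(partials[&1], (60, 3));
    assert_eq!(partials[&2], (10, 2));
}
```

The join then probes with these partials instead of raw rows, which is why the win depends on most right-side groups actually finding a match.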
I did some more testing, and groupjoin has more opportunities for optimization, BUT they require things like more accurate filter cardinality estimates and cost-based join ordering data to apply safely without regressions. So I don't think the additional optimizations (#22058) can be introduced with any guarantees right now. BUT I still really like the safe groupjoin optimization opportunity, because the common pattern (a simple fact + dimension equi-join followed by a group-by) gets nice performance benefits by avoiding the extra work. Also, I listed out when it is safe to introduce into a plan in DF with guarantees that performance can't regress. Required to introduce a basic GroupJoin into the plan, which will just memoize (NO ADDITIONAL optimization like eager right aggregation):
Required to avoid regression (basic memoizing GroupJoin):
Item 4 is the only real performance guard for basic memoizing. Since we can't easily figure out how selective the join is, we can only allow memoizing for Left joins (guaranteed 100% utilization) or Inner joins where NDV suggests a likely high match rate.
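A rough sketch of what that "Item 4" guard could look like, assuming it had NDV statistics for both sides (the types, stats fields, and the 0.9 threshold below are all hypothetical, not part of this PR):

```rust
/// Hypothetical safety guard for memoizing groupjoin: Left joins are
/// always allowed (every probe row contributes to a group, so work is
/// never wasted); Inner joins are allowed only when NDV statistics
/// suggest most probe keys will find a build-side match.
enum JoinType { Inner, Left }

struct KeyStats {
    build_ndv: u64, // distinct join keys on the build side
    probe_ndv: u64, // distinct join keys on the probe side
}

fn memoizing_groupjoin_is_safe(join: JoinType, stats: &KeyStats) -> bool {
    match join {
        JoinType::Left => true, // 100% utilization guaranteed
        JoinType::Inner => {
            if stats.probe_ndv == 0 {
                return false;
            }
            // Crude estimate of the fraction of probe keys with a match.
            let match_rate = stats.build_ndv.min(stats.probe_ndv) as f64
                / stats.probe_ndv as f64;
            match_rate >= 0.9 // illustrative threshold
        }
    }
}

fn main() {
    let wide = KeyStats { build_ndv: 950, probe_ndv: 1000 };
    let narrow = KeyStats { build_ndv: 10, probe_ndv: 1000 };
    assert!(memoizing_groupjoin_is_safe(JoinType::Left, &narrow));
    assert!(memoizing_groupjoin_is_safe(JoinType::Inner, &wide));
    assert!(!memoizing_groupjoin_is_safe(JoinType::Inner, &narrow));
}
```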
Rationale for this change
Some queries combining a join and a group-by on the same key can be executed as a single groupjoin operator. This optimization targets a common analytical pattern — dimension-fact joins where a smaller dimension table is joined with a larger fact table and aggregated by the join key.
This is based on research: Moerkotte & Neumann PVLDB 2011. The paper introduces the groupjoin algebraic equivalence and proves its correctness for both inner and outer joins, provided the join key is a key of the build side.
This PR implements the groupjoin operator and optimizer rule, using the memoizing groupjoin strategy from the paper: a single hash table serves as both the join lookup and the aggregation group table, with probe-side rows updating accumulators in-place. This eliminates the redundant hash table construction and intermediate result materialization that occur when the join and aggregate run as separate operators. This addresses #13243.
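As a self-contained sketch of the memoizing strategy (using a plain HashMap and a SUM accumulator rather than DataFusion's actual GroupValues/GroupsAccumulator machinery, so names and types here are illustrative):

```rust
use std::collections::HashMap;

/// Illustrative memoizing groupjoin: one hash table keyed by the
/// join/group key serves as both the join lookup and the aggregation
/// group table. Probe rows fold straight into their group's accumulator,
/// so no second hash table or intermediate join result is materialized.
fn memoizing_group_join(
    build_keys: &[u64],        // build (left) side join keys, assumed unique
    probe_rows: &[(u64, i64)], // probe (right) side: (join key, value)
) -> HashMap<u64, i64> {
    // Build phase: insert each build key with a zero-initialized SUM accumulator.
    let mut table: HashMap<u64, i64> = build_keys.iter().map(|&k| (k, 0)).collect();
    // Probe phase: matching rows update the accumulator in place.
    for &(key, value) in probe_rows {
        if let Some(acc) = table.get_mut(&key) {
            *acc += value;
        }
    }
    table // final groups: key -> SUM(value)
}

fn main() {
    let build = [1, 2, 3];
    let probe = [(1, 10), (1, 5), (3, 7), (9, 100)]; // key 9 has no build match
    let groups = memoizing_group_join(&build, &probe);
    assert_eq!(groups[&1], 15);
    assert_eq!(groups[&2], 0); // unmatched build key keeps its initial state
    assert_eq!(groups[&3], 7);
    assert!(!groups.contains_key(&9));
}
```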
Locally I saw TPC-H Q13 (SF10) go from 299ms → 254ms (~15% faster), with zero regressions (this is with just the groupjoin avoiding materialization, not including the additional optimizations below).
What changes are included in this PR?
New physical operator — `GroupJoinExec` (`physical-plan/src/joins/group_join.rs`):
- Builds a `GroupValues` hash table from the left (build) side
- Updates `GroupsAccumulator`s in-place for matching rows

New physical optimizer rule — `GroupJoinOptimizer` (`physical-optimizer/src/group_join.rs`):
- Detects an `AggregateExec` above a `HashJoinExec` (looking through an intermediate `ProjectionExec`)
- Requires the aggregates to support `GroupsAccumulator`
- Runs before `CombinePartialFinalAggregate` in the optimizer pipeline

How can this be extended?
The paper describes three additional strategies and optimizations we did not implement:
Eager Right Aggregation (Strategy 1) — Pre-aggregate the probe side before the join, reducing its cardinality. For Q13, this would reduce the 15M order rows to ~1.5M pre-aggregated groups before joining with 1.5M customers.
Superset GROUP BY (Theorem 3 in the paper) — Handle cases where the GROUP BY keys are a superset of the join keys (extra keys from the build side). This would enable queries like Q3 (`GROUP BY l_orderkey, o_orderdate, o_shippriority` with join on `o_orderkey = l_orderkey`). Requires the probe side to look up by the join-key subset while the hash table is keyed by the full GROUP BY.

Cost-model strategy selection (Section 4 of Fent et al.) — Choose among the four strategies at optimization time based on input cardinalities and selectivities, rather than always using Strategy 2.
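To illustrate the lookup-by-subset requirement of the Superset GROUP BY extension, here is a hypothetical sketch in plain Rust: the group table is keyed by the full GROUP BY tuple, and a secondary index maps each join key to the full group keys containing it (all names and types here are illustrative, not DataFusion's):

```rust
use std::collections::HashMap;

/// Hypothetical superset-GROUP-BY groupjoin: groups are keyed by
/// (join key, extra build-side column), but probe rows carry only the
/// join key, so a secondary index maps join key -> full group keys.
fn superset_group_join(
    build_rows: &[(u64, &str)], // (join key, extra GROUP BY column)
    probe_rows: &[(u64, i64)],  // (join key, value to aggregate)
) -> HashMap<(u64, String), i64> {
    let mut groups: HashMap<(u64, String), i64> = HashMap::new();
    let mut by_join_key: HashMap<u64, Vec<(u64, String)>> = HashMap::new();
    for &(key, extra) in build_rows {
        let full = (key, extra.to_string());
        groups.entry(full.clone()).or_insert(0);
        by_join_key.entry(key).or_default().push(full);
    }
    // Probe by the join-key subset; each matching build row defines a
    // distinct group, and the probe row contributes to all of them.
    for &(key, value) in probe_rows {
        if let Some(fulls) = by_join_key.get(&key) {
            for full in fulls {
                *groups.get_mut(full).unwrap() += value;
            }
        }
    }
    groups
}

fn main() {
    let build = [(1, "a"), (1, "b"), (2, "c")];
    let probe = [(1, 10), (2, 3), (1, 1)];
    let groups = superset_group_join(&build, &probe);
    assert_eq!(groups[&(1, "a".to_string())], 11);
    assert_eq!(groups[&(1, "b".to_string())], 11);
    assert_eq!(groups[&(2, "c".to_string())], 3);
}
```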