[PoC] perf: optimize group-only group-by case for primitive cases (clickbench q4)#22145
[PoC] perf: optimize group-only group-by case for primitive cases (clickbench q4)#22145waynexia wants to merge 6 commits into
Conversation
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
|
run benchmarks |
|
Benchmark for this request failed. Last 20 lines of output: Click to expandFile an issue against this benchmark runner |
2 similar comments
|
Benchmark for this request failed. Last 20 lines of output: Click to expandFile an issue against this benchmark runner |
|
Benchmark for this request failed. Last 20 lines of output: Click to expandFile an issue against this benchmark runner |
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
|
run benchmarks |
|
Benchmark for this request failed. Last 20 lines of output: Click to expandFile an issue against this benchmark runner |
2 similar comments
|
Benchmark for this request failed. Last 20 lines of output: Click to expandFile an issue against this benchmark runner |
|
Benchmark for this request failed. Last 20 lines of output: Click to expandFile an issue against this benchmark runner |
|
run benchmarks |
|
Hi @waynexia, thanks for the request (#22145 (comment)). Only whitelisted users can trigger benchmarks. Allowed users: Dandandan, Fokko, Jefffrey, Omega359, adriangb, alamb, asubiotto, brunal, buraksenn, cetra3, codephage2020, coderfender, comphead, erenavsarogullari, etseidl, friendlymatthew, gabotechs, geoffreyclaude, grtlr, haohuaijin, jonathanc-n, kevinjqliu, klion26, kosiew, kumarUjjawal, kunalsinghdadhwal, liamzwbao, mbutrovich, mkleen, mzabaluev, neilconway, rluvaton, sdf-jkl, timsaucer, xudong963, zhuqi-lucas. File an issue against this benchmark runner |
|
run benchmarks |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing q4 (65980f0) to 7f2f78d (merge-base) diff using: tpch File an issue against this benchmark runner |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing q4 (65980f0) to 7f2f78d (merge-base) diff using: clickbench_partitioned File an issue against this benchmark runner |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing q4 (65980f0) to 7f2f78d (merge-base) diff using: tpcds File an issue against this benchmark runner |
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagetpch — base (merge-base)
tpch — branch
File an issue against this benchmark runner |
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagetpcds — base (merge-base)
tpcds — branch
File an issue against this benchmark runner |
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usageclickbench_partitioned — base (merge-base)
clickbench_partitioned — branch
File an issue against this benchmark runner |
|
Hmmm, the result does not seem good 🤔 Let me dig deeper |
Seems like the query is faster though (and memory is ~halved) so seems like a great optimization - I expect it for higher cardinality |
|
run benchmark clickbench_partitioned clickbench_extended |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing q4 (65980f0) to 7f2f78d (merge-base) diff using: clickbench_partitioned File an issue against this benchmark runner |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing q4 (65980f0) to 7f2f78d (merge-base) diff using: clickbench_extended File an issue against this benchmark runner |
| /// is obvious in high cardinality group by situation. | ||
| /// More details can see: | ||
| /// <https://github.com/apache/datafusion/issues/15961> | ||
| map: HashTable<(usize, u64)>, |
There was a problem hiding this comment.
I found HashTable<(usize, T::Native)> also to be a bit faster than this current approach (and also saving memory). We can build the Vec<T::Native> when emitting data.
Could be part of some future PR.
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usageclickbench_partitioned — base (merge-base)
clickbench_partitioned — branch
File an issue against this benchmark runner |
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usageclickbench_extended — base (merge-base)
clickbench_extended — branch
File an issue against this benchmark runner |
Some queries regress a lot... |
Which issue does this PR close?
Rationale for this change
This is an (agent made) PoC branch for clickbench q4
SELECT COUNT(DISTINCT UserID) FROM hits;. In my test environment it's about 25% faster thanmain.The basic idea is to skip maintaining group-id when there is no requirement of it, e.g., for group-by cases without any accumulator.
Other findings on this experiment:
SELECT COUNT(DISTINCT SearchPhrase) FROM hits;where the key is string type, the group calculation is not the critical pathWhat changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?