fix: The limit_pushdown physical optimization rule removes limits in some cases leading to incorrect results #20048
Conversation
I'm looking into the CI failures. I guess I forgot to run the tests before making the PR 🙃 From what I can tell so far, it looks like my change improves/fixes things in the tests it breaks.
avantgardnerio
left a comment
From what I can tell, the tests are wrong and this was behaving incorrectly before.
I'd like to see copy & pasted results with and without the optimizer rule:
- if they don't match prior to this PR, we have an issue we may need to hotfix in prior versions
- if they do match after this PR is applied, I approve of this PR
Force-pushed from a5f6ad4 to 5ade46f
Found one more bug. I updated the PR description and added a test + fix for that bug.
3 99 82
3 99 79
3 98 79
3 97 96
The previous result was wrong. Look at the query directly above this one: it is identical except that this query adds OFFSET 3 LIMIT 2. None of the rows in the previous expected output appear anywhere in that query's results, and the new expected rows I added are exactly rows 4-5 of that result.
What was happening here is the inner OFFSET/LIMITs were being removed by the physical optimizer rule.
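For illustration, here is a minimal sketch of the kind of query this affects; the table name, CSV path, and column are made-up stand-ins, not the PR's actual test case. The point is only that a subquery carries its own LIMIT/OFFSET, and dropping that inner limit changes which rows the outer query sees.

```rust
// Hypothetical reproduction sketch; the table, CSV path, and query are
// illustrative stand-ins, not the actual test case from this PR.
use datafusion::error::Result;
use datafusion::prelude::*;

#[tokio::main]
async fn main() -> Result<()> {
    let ctx = SessionContext::new();
    ctx.register_csv("t", "data.csv", CsvReadOptions::new()).await?;

    // The inner LIMIT 2 OFFSET 3 selects rows 4-5 of the ordered subquery.
    // If the physical optimizer drops that inner limit while pushing the
    // outer one down, the query silently returns rows from the wrong window.
    let df = ctx
        .sql(
            "SELECT c1 \
             FROM (SELECT c1 FROM t ORDER BY c1 DESC LIMIT 2 OFFSET 3) AS sub \
             LIMIT 10",
        )
        .await?;
    df.show().await?;
    Ok(())
}
```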
Also, when I disable the limit_pushdown rule, I get the new results I added here
17)------BoundedWindowAggExec: wdw=[lead(b.c1,Int64(1)) ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING: Field { "lead(b.c1,Int64(1)) ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING": nullable Int64 }, frame: ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING], mode=[Sorted]
18)--------ProjectionExec: expr=[1 as c1]
19)----------PlaceholderRowExec
04)------GlobalLimitExec: skip=0, fetch=3
I think this is a better plan than before: GlobalLimitExecs are pushed into the union, and before they were not pushed below CoalescePartitionsExec.
avantgardnerio
left a comment
Nice find, I think you fixed a long standing bug. @alamb do we need to backport this to any previous releases, given that they were producing incorrect results?
What I think we should do is file a ticket with some example queries where this bug produces incorrect results. I think such queries will help us understand the impact, and depending on that we can figure out whether we want to backport this change.
alamb
left a comment
Thank you @masonh22 and @avantgardnerio -- this makes sense to me.
I also made a small follow-on PR to update the tests to use insta.
let after_optimize =
    LimitPushdown::new().optimize(outer_limit, &ConfigOptions::new())?;
let expected = [
    "GlobalLimitExec: skip=1, fetch=3",
Without the code change in this PR, the actual plan looks like:
"SortExec: TopK(fetch=4), expr=[c1@0 ASC], preserve_partitioning=[false]"
" EmptyExec"
Note that the offset (aka the skip) was dropped.
Which issue does this PR close?
None
Rationale for this change
Bug 1: When pushing down limits, we recurse down the physical plan accumulating limits until we reach a node where we can't push the limit down further. At this point, we insert another limit executor (or push it into the current node, if that node supports it). After this, we continue recursing to try to find more limits to push down. If we do find another, we remove it, but we don't set the `GlobalRequirements::satisfied` field back to false, meaning we don't always re-insert this limit.

Bug 2: When we're pushing down a limit with a skip/offset and no fetch/limit and we run into a node that supports fetch, we set `GlobalRequirements::satisfied` to true. This is wrong: the limit is not satisfied, because fetch doesn't support skip/offset. Instead, we should set `GlobalRequirements::satisfied` to true only if skip/offset is 0.

What changes are included in this PR?
This includes small changes to the limit pushdown logic that fix both issues described above.
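As a rough illustration of both fixes, here is a minimal, self-contained sketch of the control flow around the satisfied flag. The type and function names are hypothetical stand-ins, not DataFusion's actual `GlobalRequirements` implementation or the exact diff in this PR:

```rust
// Minimal sketch of the two fixes; illustrative only, not DataFusion's code.
struct LimitRequirements {
    skip: usize,          // accumulated OFFSET
    fetch: Option<usize>, // accumulated LIMIT
    satisfied: bool,      // has the requirement been enforced below this point?
}

// Called when the recursion removes another limit node deeper in the plan.
fn absorb_removed_limit(req: &mut LimitRequirements, skip: usize, fetch: Option<usize>) {
    // The real rule also merges `skip`/`fetch` into the accumulated values;
    // that arithmetic is elided here.
    let _ = (skip, fetch);
    // Bug 1 fix: the newly absorbed limit has not been re-inserted anywhere yet,
    // so the requirement must be marked unsatisfied again.
    req.satisfied = false;
}

// Called when a fetch is pushed into an operator that supports it
// (e.g. a sort that can run as a TopK).
fn push_fetch_into_operator(req: &mut LimitRequirements) {
    // Bug 2 fix: an operator-level fetch cannot express a skip/offset, so the
    // requirement is only satisfied when there is no remaining skip.
    req.satisfied = req.skip == 0;
}

fn main() {
    let mut req = LimitRequirements { skip: 3, fetch: Some(2), satisfied: false };
    push_fetch_into_operator(&mut req);
    assert!(!req.satisfied); // skip=3: a bare fetch is not enough, keep the limit

    absorb_removed_limit(&mut req, 0, Some(10));
    assert!(!req.satisfied); // a removed limit must be re-inserted later
    println!("skip={}, fetch={:?}", req.skip, req.fetch);
}
```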
Are these changes tested?
I added tests that replicate these issues and fail without this change.
Are there any user-facing changes?
No