
Conversation

@masonh22 (Contributor) commented Jan 28, 2026

Which issue does this PR close?

None

Rationale for this change

Bug 1: When pushing down limits, we recurse down the physical plan accumulating limits until we reach a node where we can't push the limit down further. At this point, we insert another limit executor (or push it into the current node, if that node supports it). After this, we continue recursing to try to find more limits to push down. If we do find another, we remove it, but we don't set the GlobalRequirements::satisfied field back to false, meaning we don't always re-insert this limit.
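The consequence of Bug 1 can be sketched with a toy example (plain Rust, not DataFusion code; the row values and limits are made up for illustration): once one limit has been handled and `satisfied` is left as `true`, a second, deeper limit that the rule removes is silently dropped from the plan instead of being re-inserted.

```rust
// Toy illustration (not DataFusion code): rows flowing through
// `SELECT * FROM (SELECT * FROM t LIMIT 5) LIMIT 10`.
fn main() {
    let rows: Vec<i64> = (1..=100).collect();

    // Correct plan: apply the inner LIMIT 5, then the outer LIMIT 10.
    let inner: Vec<i64> = rows.iter().copied().take(5).collect();
    let correct: Vec<i64> = inner.into_iter().take(10).collect();
    assert_eq!(correct.len(), 5);

    // Buggy plan: the rule removes the inner limit but, with
    // `satisfied` still true, never re-inserts it, so only the
    // outer LIMIT 10 runs.
    let buggy: Vec<i64> = rows.iter().copied().take(10).collect();
    assert_eq!(buggy.len(), 10); // 10 rows instead of 5
}
```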

Bug 2: When we're pushing down a limit with a skip/offset and no fetch/limit and we run into a node that supports fetch, we set GlobalRequirements::satisfied to true. This is wrong: the limit is not satisfied, because fetch doesn't support skip/offset. Instead, we should set GlobalRequirements::satisfied to true only when skip/offset is 0.
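A toy sketch of Bug 2 (plain Rust, not the actual rule; the data is made up): a node that only supports `fetch` can truncate the stream but cannot skip rows, so treating it as satisfying a requirement with a non-zero skip effectively drops the offset.

```rust
// Toy illustration (not DataFusion code): `OFFSET 3 LIMIT 2`
// over rows 1..=10.
fn apply(rows: &[i64], skip: usize, fetch: Option<usize>) -> Vec<i64> {
    let it = rows.iter().copied().skip(skip);
    match fetch {
        Some(n) => it.take(n).collect(),
        None => it.collect(),
    }
}

fn main() {
    let rows: Vec<i64> = (1..=10).collect();

    // Correct: OFFSET 3 LIMIT 2 returns rows 4 and 5.
    assert_eq!(apply(&rows, 3, Some(2)), vec![4, 5]);

    // Bug 2: a fetch-only node can express `fetch = 2` but not the
    // skip, so marking the requirement satisfied yields rows 1 and 2
    // instead.
    assert_eq!(apply(&rows, 0, Some(2)), vec![1, 2]);
}
```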

What changes are included in this PR?

This includes a one-line change to the push down limit logic that fixes the issue.

Are these changes tested?

I added a test that replicates the issue and fails without this change.

Are there any user-facing changes?

No

@github-actions github-actions bot added optimizer Optimizer rules core Core DataFusion crate labels Jan 28, 2026
@masonh22 (Contributor Author) commented:

I'm looking into the CI failures. I guess I forgot to run the tests before making the PR 🙃

From what I can tell so far, it looks like my change improves/fixes the results in the tests it breaks.

@avantgardnerio (Contributor) left a comment:


From what I can tell, the tests are wrong and this was behaving incorrectly before.

I'd like to see copy & pasted results with and without the optimizer rule:

  1. if they don't match prior to this PR, we have an issue we may need to hotfix in prior versions
  2. if they do match after this PR is applied, I approve of this PR

@masonh22 masonh22 force-pushed the fix-limit-pushdown-fetch branch from a5f6ad4 to 5ade46f on January 28, 2026 17:05
@masonh22 (Contributor Author) commented:

Found one more bug. I updated the PR description and added a test + fix for that bug.

@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Jan 28, 2026
Comment on lines -709 to +710
3 99 82
3 99 79
3 98 79
3 97 96
@masonh22 (Contributor Author) commented:


The previous result was wrong. Look at the query directly above this one: it is identical except that this query adds OFFSET 3 LIMIT 2. None of the rows in the previous expected output appear in the rows returned by that query, and the new expected rows I added are just rows 4-5 of that query.

What was happening here is the inner OFFSET/LIMITs were being removed by the physical optimizer rule.
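A minimal sketch of that failure mode (plain Rust with made-up row values, not the actual test data): if the optimizer strips the inner OFFSET/LIMIT, the outer query runs over the full input and returns entirely different rows.

```rust
// Hypothetical paging helper: apply OFFSET `skip` then LIMIT `fetch`.
fn page(rows: &[i64], skip: usize, fetch: usize) -> Vec<i64> {
    rows.iter().copied().skip(skip).take(fetch).collect()
}

fn main() {
    let rows: Vec<i64> = (1..=20).collect();

    // Inner subquery with OFFSET 3 LIMIT 2 exposes only rows 4 and 5
    // to the outer query.
    let with_inner = page(&rows, 3, 2);
    assert_eq!(with_inner, vec![4, 5]);

    // With the inner OFFSET/LIMIT removed by the rule, the outer
    // query sees all 20 rows and produces a different answer.
    let without_inner = page(&rows, 0, 2);
    assert_ne!(with_inner, without_inner);
}
```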

@masonh22 (Contributor Author) commented:


Also, when I disable the limit_pushdown rule, I get the new results I added here.

17)------BoundedWindowAggExec: wdw=[lead(b.c1,Int64(1)) ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING: Field { "lead(b.c1,Int64(1)) ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING": nullable Int64 }, frame: ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING], mode=[Sorted]
18)--------ProjectionExec: expr=[1 as c1]
19)----------PlaceholderRowExec
04)------GlobalLimitExec: skip=0, fetch=3
@masonh22 (Contributor Author) commented:


I think this is a better plan than before: GlobalLimitExecs are pushed into the union, and before they were not pushed below CoalescePartitionsExec.

@masonh22 masonh22 changed the title fix: The limit_pushdown physical optimization rule removes nested limits in some cases fix: The limit_pushdown physical optimization rule removes limits in some cases leading to incorrect results Jan 28, 2026
@avantgardnerio avantgardnerio self-requested a review January 28, 2026 18:10
@avantgardnerio (Contributor) left a comment:


Nice find, I think you fixed a long standing bug. @alamb do we need to backport this to any previous releases, given that they were producing incorrect results?

@alamb (Contributor) commented Jan 29, 2026

Nice find, I think you fixed a long standing bug. @alamb do we need to backport this to any previous releases, given that they were producing incorrect results?

What I think we should do is file a ticket with some example queries where this bug produces incorrect results. Those queries will help us understand the impact, and depending on that we can decide whether to backport this change.

@alamb (Contributor) left a comment:


Thank you @masonh22 and @avantgardnerio -- this makes sense to me.

I also made a small follow-on PR to update the tests to use insta.

let after_optimize =
    LimitPushdown::new().optimize(outer_limit, &ConfigOptions::new())?;
let expected = [
    "GlobalLimitExec: skip=1, fetch=3",
A reviewer (Contributor) commented:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without the code change in this PR, the actual output looks like:

      "SortExec: TopK(fetch=4), expr=[c1@0 ASC], preserve_partitioning=[false]"
      "  EmptyExec"

Note that the offset (aka the skip) was dropped.

