Skip to content

feat(cubesql): Push Limit-Sort down Union inputs#11125

Open
MazterQyou wants to merge 1 commit into
masterfrom
cubesql/limit-sort-union-push-down
Open

feat(cubesql): Push Limit-Sort down Union inputs#11125
MazterQyou wants to merge 1 commit into
masterfrom
cubesql/limit-sort-union-push-down

Conversation

@MazterQyou

Copy link
Copy Markdown
Member

Check List

  • Tests have been run in packages where changes have been made if available
  • Linter has been run for changed code
  • Tests for the changes have been added if not covered yet
  • Docs have been added / updated if required

Description of Changes Made

This PR adds a new logical plan optimizer, duplicating Limit-Sort expressions above Union down the input plans which helps Limit and Sort reach CubeScans. Related test is included.

Signed-off-by: Alex Qyoun-ae <4062971+MazterQyou@users.noreply.github.com>
@MazterQyou MazterQyou requested a review from a team as a code owner June 19, 2026 11:38
@github-actions github-actions Bot added the rust Pull requests that update Rust code label Jun 19, 2026
@claude

claude Bot commented Jun 19, 2026

Copy link
Copy Markdown
Contributor

Claude finished @MazterQyou's task in 1m 25s —— View job


Code Review

Reviewed the new UnionSortLimitPushDown rule.

  • Read changed files and surrounding context
  • Analyze optimizer behavior, edge cases, and tests
  • Post review

Verdict: LGTM with minor observations — 0 high, 0 medium, 3 low.

Detailed findings

Strengths

  • Clear, well-commented rationale at the top of union_sort_limit_push_down.rs (lines 17–28) explaining ordering vs. the existing SortPushDown/LimitPushDown rules, and the rule is correctly slotted in query_engine.rs:152-157 so the duplicated clauses can be re-pushed by a second pass of those rules.
  • Correctness of the propagation semantics is good: Sort preserves the saved skip/fetch (sort runs before limit) at union_sort_limit_push_down.rs:64-78, while Limit clears any saved Sort because that sort would have run after the limit at lines 79-94. The fall-through other => arm conservatively drops the saved context at lines 128-139 — safe under nodes that change row counts (Filter/Aggregate/Join).
  • drop_const_sort_keys (lines 175-193) is a nice touch — without it the duplicated Sort could not be pushed into a CubeScan when a sort key resolves to a constant in one branch (e.g. a UNION-tagged discriminator column). The test test_sort_limit_above_union_drops_const_keys covers this and also verifies the original outer Sort keeps the full key list.
  • Tests cover the four core shapes (sort alone, limit alone, both, constant-key elision, and no-union pass-through).

Low-severity observations

  1. Missed opportunity for transparent unary nodes. The other => arm at lines 128-139 unconditionally drops the saved Sort/Limit. For row-preserving, ordering-preserving nodes (e.g. simple Projection with no expressions that reorder rows, or SubqueryAlias), the context could be forwarded so Sort -> Projection -> Union benefits too. Today such a plan won't get the push-down even though it's semantically equivalent. Not a correctness issue, just unrealized benefit. (Fix this →)

  2. const_output_columns only inspects the topmost Projection of the recursed input. If the input is Sort -> Projection -> ... (which is exactly what this rule may emit when recursing into Union inputs that already had a sort), the const detection sees the Sort, not the Projection, and returns an empty set. In practice this isn't wrong (we just keep extra sort keys), but it limits the optimization to inputs whose top node is already a Projection. Worth a one-liner comment, or peek through Sort/Limit wrappers.

  3. No test for nested Union. A Sort -> Union(Union(...), ...) case would document the intended behavior (the rule duplicates into the outer union's inputs in a single pass; the inner one would only benefit on a subsequent optimizer iteration if any). Cheap to add and prevents regression if the recursion changes later.

Nits

  • union_sort_limit_push_down takes 6 args including three Options — packaging sort_expr/skip/fetch into a small Context struct would read a bit cleaner, but readable as is.
  • In tests, assert_optimized_plan_eq uses string equality on {:?} output — consistent with sibling optimizer tests, but brittle to upstream DataFusion debug-format changes. No action needed.

Security / Performance

  • No security concerns: pure logical plan rewrite, no I/O or user input.
  • Performance: rule walks the plan once; recursion is bounded by plan depth. Safe.
· Branch: `cubesql/limit-sort-union-push-down`

@codecov

codecov Bot commented Jun 19, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 96.15385% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 79.19%. Comparing base (a67fa3c) to head (7cc8099).
⚠️ Report is 2 commits behind head on master.

Files with missing lines Patch % Lines
...engine/df/optimizers/union_sort_limit_push_down.rs 96.10% 10 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #11125      +/-   ##
==========================================
+ Coverage   79.14%   79.19%   +0.04%     
==========================================
  Files         473      474       +1     
  Lines       96411    96671     +260     
  Branches     3524     3524              
==========================================
+ Hits        76309    76559     +250     
- Misses      19588    19598      +10     
  Partials      514      514              
Flag Coverage Δ
cube-backend 58.75% <ø> (ø)
cubesql 83.67% <96.15%> (+0.04%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

rust Pull requests that update Rust code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant