fix: preserve Inexact precision in Statistics#22146
Open
timsaucer wants to merge 2 commits into
Open
Conversation
…skip When `Statistics::with_fetch` encountered an input `Inexact(nr)` with `nr <= skip`, it unconditionally returned `Precision::Exact(0)`, promoting an estimated upper bound into an exact zero. This silently caused `AggregateStatistics` to fold an outer `count(*)` over a sub-tree whose row estimate happened to reach zero into a literal `0` backed by `PlaceholderRowExec`. With apache#21240 leaving uncorrelated scalar subqueries in the filter, `FilterExec` falls back to the default 20% selectivity (the interval analyzer cannot reason about `ScalarSubqueryExpr`) and `LeftAnti` cardinality estimation can produce `Inexact(0)`, which `SortExec`/`with_fetch` then turned into `Exact(0)` -- and `Count::value_from_stats` trusted it. Preserve the exactness of the input estimate so estimated zeros stay inexact and downstream consumers can no longer skip real execution. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Member
Author
|
@neilconway would you mind reviewing? It wasn't caused by #21240 but it was exposed by it. |
This was referenced May 13, 2026
neilconway
approved these changes
May 13, 2026
Contributor
neilconway
left a comment
There was a problem hiding this comment.
Makes sense to me! Nice work on the writeup and the test cases.
asolimando
approved these changes
May 14, 2026
Member
asolimando
left a comment
There was a problem hiding this comment.
LGTM, the promotion to Exact was indeed incorrect in this case, thanks for catching that and proposing a fix!
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Which issue does this PR close?
No issue reported
Rationale for this change
Statistics::with_fetchunconditionally returnedPrecision::Exact(0)when the input hadnr <= skip, even when the input wasInexact(nr)— promoting an estimated upper bound into an exact zero. The exactness flag then misleads downstream consumers (notablyAggregateStatisticsviaCount::value_from_stats) into trusting a derived "0" and folding the count subtree to a literal.Concrete user-visible symptom reported on TPC-H Q22:
EXPLAINfor the count plan shows the outer count aggregate collapsed toProjectionExec([lit(0)]) -> PlaceholderRowExec.After PR #21240 left uncorrelated scalar subqueries in the filter rather than rewriting them to joins,
FilterExeccan't use interval analysis onScalarSubqueryExpr, falls back to the 20% default selectivity, and produces a smallInexactrow estimate. ALeftAntijoin whose estimated semi-overlap covers the outer estimate then yieldsInexact(0). That zero propagates through grouped aggregates whoseestimate_num_rowsreturns the child stats unchanged whenvalue == 0. The pre-existingwith_fetchbug on a downstreamSortExecfinally promotes it toExact(0), whichAggregateStatisticstrusts.The root cause is the precision promotion in
with_fetch. The PR fixes that; the surrounding plan-shape changes after #21240 just made it reachable.What changes are included in this PR?
Statistics::with_fetch: whennr <= skip, preserve the exactness of the input viacheck_num_rows(Some(0), self.num_rows.is_exact().unwrap())instead of always returningExact(0).datafusion-common: new unit testtest_with_fetch_skip_all_rows_inexactpinning the new behaviour.datafusion-physical-plan: update the existingtest_row_number_statistics_for_global_limitexpectation that encoded the old (incorrect) promotion to expectInexact(0)now.datafusion/sqllogictest/test_files/subquery.slt: SLT regression test reproducing the user-visiblecount(*)symptom over a query that contains a scalar subquery,not exists, and a group-by on a derived column, backed by parquet sources so the data sources report Exact statistics.Are these changes tested?
Yes:
datafusion-commonfor the precision-preserving behaviour.datafusion-physical-plan/limit.rs.subquery.sltthat fails without the fix (count(*)returns0instead of2) and passes with it.Are there any user-facing changes?
Only the bug fix. No public API changes.