What is the problem the feature request solves?
When #2894 was first addressed by #2994, COUNT was included in the set of aggregates safe to run with mixed Spark partial / Comet final execution alongside MIN, MAX, and the bitwise aggregates. The follow-up branch for that work removed COUNT from the safe set after two regressions surfaced. As a result, the TPC-DS coverage gains in #2994 (which were almost entirely driven by COUNT) are not realized in the carve-out PR.
This issue tracks investigating and re-enabling COUNT for mixed Spark partial / Comet final execution.
Known blockers
The two specific regressions that caused COUNT to be excluded:
1. AQE PropagateEmptyRelationAfterAQE. The rule matches BaseAggregateExec only, not CometHashAggregateExec. When the partial runs in Spark and the final runs in Comet, the rule no longer fires for the final stage, which changes results in some queries.
2. Spark 4.0 count-bug decorrelation. The decorrelation rewrite for correlated IN subqueries drops a row in the OR pattern in in-count-bug.sql when the partial/final aggregate stages are split between Spark and Comet.
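The first blocker comes down to a type-based pattern match that skips the Comet node. A toy sketch of that mechanism follows; every type and function name in it is a hypothetical stand-in, not the real Spark or Comet class, and the real rule's logic is considerably more involved.

```scala
// Toy stand-ins only; none of these are the real Spark or Comet classes.
sealed trait Plan

// Stands in for Spark's BaseAggregateExec.
trait BaseAggregateLike extends Plan

// Stands in for a Spark aggregate operator.
case class SparkHashAgg(mode: String) extends BaseAggregateLike

// Stands in for CometHashAggregateExec; assumed here not to extend the base trait.
case class CometHashAgg(mode: String) extends Plan

object EmptyRelationRuleSketch {
  // Mimics a rule that only recognizes the Spark aggregate hierarchy.
  def ruleFires(plan: Plan): Boolean = plan match {
    case _: BaseAggregateLike => true
    case _                    => false
  }
}
```

In this model a final stage represented by CometHashAgg falls through to the default case, which mirrors the rule not firing once the final aggregate runs in Comet.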
Suggested approach
- Reproduce both regressions with COUNT re-added to supportsMixedPartialFinal in aggregates.scala.
- For (1), evaluate whether the right fix is upstream (extend PropagateEmptyRelationAfterAQE to recognize Comet aggregate exec) or in Comet (e.g., guard mixed-COUNT when AQE is enabled, or insert a wrapper Spark aggregate node).
- For (2), determine whether the row-drop is specific to Spark 4.0 decorrelation interacting with mixed aggregate stages, and whether a targeted guard is preferable to broad fallback.
- Once both are addressed, re-add override def supportsMixedPartialFinal: Boolean = true to CometCount and regenerate TPC-DS golden files.
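One of the Comet-side options above, guarding mixed-COUNT when AQE is enabled, could look roughly like the sketch below. It is a minimal illustration under stated assumptions: the trait, the objects, and the allowMixed helper are hypothetical and do not reflect the actual layout of aggregates.scala. Only the supportsMixedPartialFinal override is taken from this issue.

```scala
// Hypothetical model of the opt-in flag; not the real Comet trait.
trait AggregateSerdeSketch {
  // Aggregates are assumed to opt in explicitly; the default is "not safe".
  def supportsMixedPartialFinal: Boolean = false
}

// Stand-in for CometCount with the override proposed in this issue.
object CometCountSketch extends AggregateSerdeSketch {
  override def supportsMixedPartialFinal: Boolean = true
}

// Stand-in for an aggregate that has not opted in.
object OtherAggSketch extends AggregateSerdeSketch

object MixedModeGuard {
  // Hypothetical guard: allow Spark partial / Comet final only when the
  // aggregate opts in, and keep COUNT out of mixed mode while AQE is on.
  def allowMixed(agg: AggregateSerdeSketch, isCount: Boolean, aqeEnabled: Boolean): Boolean =
    agg.supportsMixedPartialFinal && !(isCount && aqeEnabled)
}
```

A guard of this shape gives up the COUNT coverage whenever AQE is enabled, so it is a narrower fallback than removing COUNT from the safe set but not a substitute for fixing the rule itself.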
Related