Re-enable COUNT for mixed Spark partial / Comet final aggregate execution #4242

@andygrove

What is the problem the feature request solves?

When #2894 was first addressed by #2994, COUNT was included in the set of aggregates safe to run with mixed Spark partial / Comet final execution alongside MIN, MAX, and the bitwise aggregates. The follow-up branch for that work removed COUNT from the safe set after two regressions surfaced. As a result, the TPC-DS coverage gains in #2994 (which were almost entirely driven by COUNT) are not realized in the carve-out PR.

This issue tracks investigating and re-enabling COUNT for mixed Spark partial / Comet final execution.

Known blockers

The two specific regressions that caused COUNT to be excluded:

  1. AQE PropagateEmptyRelationAfterAQE. The rule matches BaseAggregateExec only, not CometHashAggregateExec. When the partial runs in Spark and the final runs in Comet, the rule no longer fires for the final stage, which changes results in some queries.

  2. Spark 4.0 count-bug decorrelation. The decorrelation rewrite for correlated IN subqueries drops a row in the OR pattern in in-count-bug.sql when the partial/final aggregate stages are split between Spark and Comet.
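
To make blocker (1) concrete, here is a minimal toy model of a rule that pattern-matches only one aggregate node type. Every type and function name below is a hypothetical stand-in invented for illustration; none of this is Spark's or Comet's real code, and the real rule's conditions are more involved. It only shows how a type-based match silently skips a node that does not share the expected base type.

```scala
// Toy model only: hypothetical stand-ins, not Spark or Comet classes.
sealed trait PlanNode

// Stands in for a Spark aggregate operator (the BaseAggregateExec family).
final case class SparkAggNode(childRowCount: Long) extends PlanNode

// Stands in for a Comet aggregate operator that does not share the Spark base type.
final case class CometAggNode(childRowCount: Long) extends PlanNode

// Stands in for the empty relation the rule rewrites to.
case object EmptyRelation extends PlanNode

object EmptyPropagationSketch {
  // The rule matches only the Spark aggregate type, so a Comet aggregate
  // over an empty child falls through to the default case unchanged.
  def propagateEmpty(plan: PlanNode): PlanNode = plan match {
    case SparkAggNode(0L) => EmptyRelation
    case other            => other
  }
}
```

In this sketch, `EmptyPropagationSketch.propagateEmpty(SparkAggNode(0L))` is rewritten to `EmptyRelation`, while `propagateEmpty(CometAggNode(0L))` is returned as-is, which mirrors the rule not firing for a Comet final stage.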

Suggested approach

  • Reproduce both regressions with COUNT re-enabled through supportsMixedPartialFinal in aggregates.scala.
  • For (1), evaluate whether the right fix is upstream (extend PropagateEmptyRelationAfterAQE to recognize Comet aggregate operators) or in Comet (e.g., guard mixed-COUNT when AQE is enabled, or insert a wrapper Spark aggregate node).
  • For (2), determine whether the row-drop is specific to Spark 4.0 decorrelation interacting with mixed aggregate stages, and whether a targeted guard is preferable to broad fallback.
  • Once both are addressed, re-add override def supportsMixedPartialFinal: Boolean = true to CometCount and regenerate TPC-DS golden files.
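
One of the Comet-side options above, guarding mixed COUNT when AQE is enabled, could look roughly like the sketch below. Only the name supportsMixedPartialFinal comes from this issue; the trait, the objects, and the `allowMixed` helper are hypothetical and do not reflect the actual structure of aggregates.scala.

```scala
// Hypothetical sketch: only supportsMixedPartialFinal is a name from the issue.
trait AggSupportSketch {
  def name: String
  // Whether a Spark partial stage may feed a Comet final stage for this aggregate.
  def supportsMixedPartialFinal: Boolean = false
}

object MinSketch extends AggSupportSketch {
  val name = "min"
  override def supportsMixedPartialFinal: Boolean = true
}

object CountSketch extends AggSupportSketch {
  val name = "count"
  // Re-enabled here, but still subject to the AQE guard below.
  override def supportsMixedPartialFinal: Boolean = true
}

// Placeholder for any aggregate that keeps the default (not safe to mix).
object UnsupportedSketch extends AggSupportSketch {
  val name = "other"
}

object MixedAggGuardSketch {
  // Allow mixed execution only if every aggregate opts in, and keep COUNT
  // out of mixed plans while AQE is enabled (assumed targeted guard).
  def allowMixed(aggs: Seq[AggSupportSketch], aqeEnabled: Boolean): Boolean =
    aggs.forall { agg =>
      agg.supportsMixedPartialFinal && !(aqeEnabled && agg == CountSketch)
    }
}
```

With these definitions, `MixedAggGuardSketch.allowMixed(Seq(MinSketch, CountSketch), aqeEnabled = false)` is true, while the same call with `aqeEnabled = true` is false. A guard like this trades some coverage for correctness, so it is narrower than a broad fallback but still leaves AQE-enabled COUNT queries on the Spark path.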
