
ci: Enable Comet PR test matrix and TPCDS plan-stability for Spark 4.2#4126

Merged
andygrove merged 3 commits into apache:main from andygrove:spark-4.2-tests
Apr 30, 2026

Conversation

@andygrove
Member

@andygrove andygrove commented Apr 28, 2026

Which issue does this PR close?

Part of #4113.

Rationale for this change

#4119 added a build-only spark-4.2 Maven profile targeting Spark 4.2.0-preview4. To start exercising Comet against 4.2 in CI (rather than discovering everything at once when 4.2 GA lands), this PR turns on the existing PR test matrices for Spark 4.2 and adds dedicated TPC-DS plan-stability goldens.

This mirrors the approach previously used to bring Spark 4.1 online before reverting (see commits 622e851e1 and 75e3b3116 on the spark-4.1.1 branch).

What changes are included in this PR?

  • .github/workflows/pr_build_linux.yml: add Spark 4.2, JDK 17 to the linux-test matrix and a comment explaining why 4.1/4.2 are skipped from the lint-java matrix (semanticdb-scalac is not yet published for Scala 2.13.17/2.13.18).
  • .github/workflows/pr_build_macos.yml: add Spark 4.2, JDK 17, Scala 2.13 to the macos-aarch64-test matrix.
  • spark/pom.xml: wire iceberg/jetty test dependencies into the spark-4.2 profile (Iceberg falls back to the 4.0 runtime since 4.2 is not yet published; Jetty pinned at 11.0.26).
  • spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala: add isSpark42Plus helper.
  • spark/src/test/scala/org/apache/spark/sql/comet/CometPlanStabilitySuite.scala: route isSpark42Plus to the new approved-plans-{v1_4,v2_7}-spark4_2 directories.
  • dev/regenerate-golden-files.sh: accept --spark-version 4.2 and include 4.2 in the default version list.
  • spark/src/test/resources/tpcds-plan-stability/approved-plans-{v1_4,v2_7}-spark4_2/: regenerated golden files. 22 of the generated files differ from the spark4_0 directory (q2, q5, q33, q49, q54, q56, q60, q66 in v1_4 and q5a, q14a, q49 in v2_7, both native_datafusion and native_iceberg_compat per query); the rest are byte-identical.
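The `isSpark42Plus` helper added to CometSparkSessionExtensions is a version gate. A minimal sketch of how such a gate might work is below; the real helper reads the running Spark version internally, and the object name, parameter, and parsing here are assumptions for illustration only:

```scala
// Hypothetical sketch of a Spark version gate, assuming version strings
// like "4.2.0" or "4.2.0-preview4". The real isSpark42Plus lives in
// CometSparkSessionExtensions and does not take the version as a parameter.
object SparkVersionGates {
  // Compare only the major.minor components of the version string.
  private def atLeast(version: String, major: Int, minor: Int): Boolean = {
    val parts = version.split("\\.").take(2).map(_.toInt)
    parts(0) > major || (parts(0) == major && parts(1) >= minor)
  }

  def isSpark42Plus(version: String): Boolean = atLeast(version, 4, 2)
}
```

Gating on major.minor (rather than exact equality) lets the same guard cover future 4.x and 5.x releases without further edits.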

This PR does not attempt to fix any 4.2-specific runtime/test failures the new matrix entries surface; those will be tracked and addressed in follow-up PRs as we did for Spark 4.1.

How are these changes tested?

  • Local: built -Pspark-4.2 end-to-end with JDK 17.
  • Local: ran CometTPCDSV1_4_PlanStabilitySuite (194 tests) and CometTPCDSV2_7_PlanStabilitySuite (64 tests) against -Pspark-4.2 with SPARK_GENERATE_GOLDEN_FILES unset; both pass with 0 failures.
  • CI: this PR will exercise the new Linux and macOS matrix entries.

@andygrove andygrove changed the title from "[WIP] ci: enable PR test matrix and TPCDS plan-stability for Spark 4.2" to "ci: Enable Comet PR test matrix and TPCDS plan-stability for Spark 4.2 [WIP]" Apr 28, 2026
@andygrove andygrove changed the title from "ci: Enable Comet PR test matrix and TPCDS plan-stability for Spark 4.2 [WIP]" to "ci: Enable Comet PR test matrix and TPCDS plan-stability for Spark 4.2" Apr 29, 2026
@andygrove andygrove requested a review from coderfender April 29, 2026 20:09
func_might_contain,
new ExpressionInfo(classOf[BloomFilterMightContain].getName, "might_contain"),
(children: Seq[Expression]) => BloomFilterMightContain(children.head, children(1)))
if (!isSpark42Plus) {
andygrove (Member Author) commented on the diff:
This whole suite failed with Spark 4.2; the individual tests link to the tracking issues.

@andygrove andygrove marked this pull request as ready for review April 29, 2026 21:28
@coderfender (Contributor) left a comment:

LGTM, with a nit: why do we need both the isSpark41Plus and isSpark42Plus tags in the same test?

Comment thread on spark/src/test/scala/org/apache/comet/exec/CometExec3_4PlusSuite.scala (outdated)
Adds a Spark 4.2, JDK 17 entry to the linux-test matrix in pr_build_linux.yml
and a Spark 4.2, JDK 17, Scala 2.13 entry to the macos-aarch64-test matrix in
pr_build_macos.yml. Also wires up the iceberg/jetty test dependencies for the
spark-4.2 profile (matching the spark-4.1 setup, since iceberg-spark-runtime
4.2 is not yet published) and adds an isSpark42Plus helper.

Spark 4.2 stays out of lint-java because semanticdb-scalac_2.13.18 is not yet
published.

Add isSpark42Plus branch to CometPlanStabilitySuite to route Spark 4.2 to
dedicated approved-plans-{v1_4,v2_7}-spark4_2 directories. Spark 4.0 logic is
unchanged. Also extends dev/regenerate-golden-files.sh to accept --spark-version
4.2.

Generated via SPARK_GENERATE_GOLDEN_FILES=1 against -Pspark-4.2. 22 of the
generated files differ from the spark4_0 directory (q2, q5, q33, q49, q54,
q56, q60, q66 in v1_4 and q5a, q14a, q49 in v2_7, both native_datafusion and
native_iceberg_compat); the rest are byte-identical. The CometTPCDSV1_4
suite (194 tests) and CometTPCDSV2_7 suite (64 tests) both pass against the
new goldens with 0 failures.

Skip tests that fail on Spark 4.2 due to:
- FunctionIdentifier requiring 3-part qualification (might_contain)
- ARITHMETIC_OVERFLOW message change ("overflow" vs "integer overflow")
- Jetty VerifyError in REST catalog test (classpath version mismatch)
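The golden-file directory routing described in the commits above could be sketched as follows. This is a hypothetical illustration, not the actual suite code: the real routing lives in CometPlanStabilitySuite, and the object, method, and suffix logic here are assumptions based on the directory names in this PR:

```scala
// Hypothetical sketch of golden-file directory selection. Spark 4.2 reads
// the new dedicated approved-plans-*-spark4_2 goldens; earlier versions
// keep the existing spark4_0 directories unchanged, per this PR.
object PlanStabilityDirs {
  def approvedPlansDir(base: String, isSpark42Plus: Boolean): String =
    if (isSpark42Plus) s"$base-spark4_2" else s"$base-spark4_0"
}
```

Routing by suffix keeps the Spark 4.0 goldens untouched, so only the 22 queries whose plans actually changed under 4.2 produce new files to review.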
@andygrove andygrove merged commit 6cd6cf3 into apache:main Apr 30, 2026
186 of 187 checks passed
@andygrove andygrove deleted the spark-4.2-tests branch April 30, 2026 14:17
@andygrove
Member Author

Merged. Thanks @coderfender!
