test: skip flaky StateStoreSuite under Comet and disambiguate JDK matrix names #4226

Merged
mbutrovich merged 1 commit into apache:main from andygrove:issue-4221
May 5, 2026

@andygrove
Member

Which issue does this PR close?

Closes #4221.

Rationale for this change

The maintenance test in Spark's StateStoreSuite is flaky in CI on the Spark 4.0.2 build, forcing repeated re-runs. All 27 tests in the suite target StateStore.get/put/commit directly via SparkContext, with no DataFrame queries and no SQL execution, so the suite does not exercise any Comet code paths and ignoring it under Comet does not lose meaningful coverage.

Separately, the spark_sql_test.yml matrix has two rows for Spark 4.0.2 (JDK 17 and JDK 21), but the job display name and fallback-log artifact name do not include the JDK, making the two runs visually indistinguishable in the GitHub Actions UI.

What changes are included in this PR?

  • dev/diffs/4.0.2.diff and dev/diffs/4.1.1.diff: override test() in StateStoreSuite to reroute every test to ignore(...) when ENABLE_COMET=true. The override lives only on StateStoreSuite, so RocksDBStateStoreSuite (which shares the same base class) is unaffected. The suite extends SparkFunSuite rather than SQLTestUtils, so the existing IgnoreCometSuite trait could not be mixed in directly; the same logic is inlined using classic.SparkSession.isCometEnabled.
  • .github/workflows/spark_sql_test.yml: append -jdk${{ matrix.config.java }} to both the matrix job display name and the fallback-log upload-artifact name so the two Spark 4.0.2 rows render distinctly. The companion spark_sql_test_native_iceberg_compat.yml workflow already includes the JDK in its job name.
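
The test() override described in the first bullet can be sketched in isolation. This is a minimal illustration of the pattern, not the code in the diffs: MiniSuite is a hypothetical stand-in for the test framework's registration API, StateStoreLikeSuite and RocksDbLikeSuite are made-up names, and the cometEnabled constructor flag is an assumption standing in for the real ENABLE_COMET check.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical stand-in for a test framework's registration API.
// Test bodies are recorded by name only and are never executed here.
abstract class MiniSuite {
  val registered: ListBuffer[String] = ListBuffer.empty
  val ignored: ListBuffer[String] = ListBuffer.empty

  def test(name: String)(body: => Unit): Unit = { registered += name; () }
  def ignore(name: String)(body: => Unit): Unit = { ignored += name; () }
}

// Hypothetical suite: the override reroutes every test to ignore when the
// flag is set. The flag is an assumption replacing the real Comet check.
class StateStoreLikeSuite(cometEnabled: Boolean) extends MiniSuite {
  override def test(name: String)(body: => Unit): Unit =
    if (cometEnabled) ignore(name)(body) else super.test(name)(body)

  test("maintenance") { () }
  test("get, put, commit") { () }
}

// Hypothetical sibling suite: it does not carry the override, so its
// tests are registered normally regardless of the flag.
class RocksDbLikeSuite extends MiniSuite {
  test("maintenance") { () }
}

object Demo {
  def main(args: Array[String]): Unit = {
    val underComet = new StateStoreLikeSuite(cometEnabled = true)
    assert(underComet.registered.isEmpty && underComet.ignored.size == 2)

    val plainSpark = new StateStoreLikeSuite(cometEnabled = false)
    assert(plainSpark.registered.size == 2 && plainSpark.ignored.isEmpty)

    val sibling = new RocksDbLikeSuite
    assert(sibling.registered.size == 1 && sibling.ignored.isEmpty)
  }
}
```

Because the override sits on one concrete suite rather than on a shared parent, sibling suites keep their normal registration, which matches the behavior described for RocksDBStateStoreSuite.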

How are these changes tested?

The diff regenerations were verified by resetting each Spark working tree to its base tag, applying the regenerated diff, and confirming git apply succeeds with no conflicts. CI on this PR will exercise the changes against the full Spark SQL test suite for both JDK 17 and JDK 21 on 4.0.2.

…rix names

The Spark 4.0.2 `maintenance` test in `StateStoreSuite` is flaky in CI
(see issue apache#4221). The whole suite drives `StateStore.get/put/commit`
directly with no DataFrame queries, so Comet does not exercise any of
its code paths. Override `test()` in the suite to reroute every test to
`ignore` when `ENABLE_COMET=true`. RocksDBStateStoreSuite is unaffected.

Also append `-jdk<version>` to the spark-sql-test job display name and
fallback-log artifact name so the two 4.0.2 matrix rows (JDK 17 and
JDK 21) are distinguishable.
Contributor

@mbutrovich left a comment

Thanks @andygrove!

Contributor

@comphead left a comment

Thanks @andygrove

@comphead
Contributor

comphead commented May 5, 2026

CI fails because of

Error: The operation was canceled

which is super annoying and the nature of it is not clear

@mbutrovich mbutrovich merged commit d5c2bed into apache:main May 5, 2026
194 of 198 checks passed

Development

Successfully merging this pull request may close these issues.

Spark SQL maintenance test fails intermittently

3 participants