docs: start Spark 4.1 known-limitations section, seeded with #4199 #4202

Merged
mbutrovich merged 2 commits into apache:main from andygrove:docs-spark-4.1-known-issues
May 4, 2026

Conversation

@andygrove
Member

Which issue does this PR close?

Starts a Spark 4.1 Known Limitations section so operator-facing compatibility issues on the experimental 4.1 profile have a home. First entry seeds it with #4199.

Rationale for this change

Spark 4.1 support is experimental and we're discovering a handful of limitations as we work through the 4.1 CI failures. Some of them are real user-visible gaps (not just test-side plumbing) and deserve a visible entry in the compatibility guide so users who hit them can self-diagnose.

What changes are included in this PR?

  • `docs/source/user-guide/latest/compatibility/spark-versions.md`: add a "Known Limitations" section under Spark 4.1, mirroring the existing section under Spark 4.0, and seed it with the `NullType` Parquet issue (Spark 4.1 NullType parquet: parquet-rs rejects BOOLEAN + Unknown logical type #4199). The entry describes the on-disk encoding mismatch (`BOOLEAN + LogicalType::Unknown` vs parquet-rs requiring `INT32 + Unknown`), the exact error users will see, and concrete workarounds.

Additional 4.1 limitations will be appended to the section as they are characterised.
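The on-disk mismatch described above can be sketched as a minimal pairing check. This is a hypothetical model (the names `SPARK_41_NULLTYPE_ENCODING` and `parquet_rs_accepts` are illustrative, not real APIs): Spark 4.1 writes `NullType` columns as physical `BOOLEAN` annotated with the `Unknown` logical type, while parquet-rs only accepts `Unknown` on an `INT32` physical column, so Comet's native scan rejects the file at decode time.

```python
# Hypothetical sketch of the encoding mismatch behind #4199: which
# (physical type, logical type) pairings parquet-rs will accept for
# the Unknown logical type, per the PR description.

SPARK_41_NULLTYPE_ENCODING = ("BOOLEAN", "Unknown")  # what Spark 4.1 writes


def parquet_rs_accepts(physical: str, logical: str) -> bool:
    """Pairing rule as described in the PR: the Unknown logical type
    is only valid on an INT32 physical column."""
    if logical == "Unknown":
        return physical == "INT32"
    return True


physical, logical = SPARK_41_NULLTYPE_ENCODING
print(parquet_rs_accepts(physical, logical))   # Spark 4.1's encoding: rejected
print(parquet_rs_accepts("INT32", "Unknown"))  # the pairing parquet-rs requires
```

Under this model, the workaround in the docs entry amounts to ensuring `NullType` columns never reach the writer in the rejected encoding (for example, by casting them to a concrete type before writing).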

How are these changes tested?

Docs-only change; rendered locally to confirm markdown formatting.

andygrove added 2 commits May 4, 2026 07:00
…ntry

Start a Spark-4.1 "Known Limitations" section on the compatibility guide's
spark-versions page, mirroring the existing Spark-4.0 section. First entry
documents apache#4199 (parquet-rs rejects Spark's `BOOLEAN + Unknown` encoding
for `NullType` columns) with the failure mode and a user-facing workaround
so operators hitting the decode-time error have somewhere to land.

Comet's native execution is built on Arrow, whose string type is
strictly UTF-8, so non-UTF-8 bytes in a STRING column are rejected at
scan time rather than silently accepted as they are by Spark. Document
under Shared Limitations in the scans compatibility guide.
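The strict-UTF-8 behavior this commit documents can be illustrated without Spark at all: Arrow's string type requires valid UTF-8, so a byte sequence that Spark would pass through opaquely fails validation. A minimal sketch using plain Python decoding as a stand-in for Arrow's check:

```python
# Stand-in for Arrow's UTF-8 validation of STRING columns: bytes that
# are not valid UTF-8 fail at decode time instead of passing through.
raw = b"caf\xe9"  # Latin-1 encoded 'café'; not valid UTF-8

try:
    raw.decode("utf-8")
    ok = True
except UnicodeDecodeError:
    ok = False

print(ok)  # False: this is the class of input the native scan rejects
```

The same bytes round-trip fine through Spark's JVM path, which is why the gap is user-visible and worth a Shared Limitations entry.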
Contributor

@mbutrovich mbutrovich left a comment


Thanks @andygrove!

@mbutrovich mbutrovich merged commit d12c5c9 into apache:main May 4, 2026
5 checks passed
