docs: start Spark 4.1 known-limitations section, seeded with #4199 #4202

Merged
mbutrovich merged 2 commits into apache:main from andygrove:docs-spark-4.1-known-issues
May 4, 2026

Conversation

@andygrove
Member

Which issue does this PR close?

Starts a Spark 4.1 Known Limitations section so operator-facing compatibility issues on the experimental 4.1 profile have a home. First entry seeds it with #4199.

Rationale for this change

Spark 4.1 support is experimental and we're discovering a handful of limitations as we work through the 4.1 CI failures. Some of them are real user-visible gaps (not just test-side plumbing) and deserve a visible entry in the compatibility guide so users who hit them can self-diagnose.

What changes are included in this PR?

  • `docs/source/user-guide/latest/compatibility/spark-versions.md`: add a "Known Limitations" section under Spark 4.1, mirroring the existing section under Spark 4.0, and seed it with the `NullType` Parquet issue (Spark 4.1 NullType parquet: parquet-rs rejects BOOLEAN + Unknown logical type #4199). The entry describes the on-disk encoding mismatch (`BOOLEAN + LogicalType::Unknown` vs parquet-rs requiring `INT32 + Unknown`), the exact error users will see, and concrete workarounds.

Additional 4.1 limitations will be appended to the section as they are characterised.
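The on-disk mismatch described above can be sketched as a minimal pairing check. This is a hypothetical model (the names `SPARK_41_NULLTYPE_ENCODING` and `parquet_rs_accepts` are illustrative, not real APIs): Spark 4.1 writes `NullType` columns as physical `BOOLEAN` annotated with the `Unknown` logical type, while parquet-rs only accepts `Unknown` on an `INT32` physical column, so Comet's native scan rejects the file at decode time.

```python
# Hypothetical sketch of the encoding mismatch behind #4199: which
# (physical type, logical type) pairings parquet-rs will accept for
# the Unknown logical type, per the PR description.

SPARK_41_NULLTYPE_ENCODING = ("BOOLEAN", "Unknown")  # what Spark 4.1 writes


def parquet_rs_accepts(physical: str, logical: str) -> bool:
    """Pairing rule as described in the PR: the Unknown logical type
    is only valid on an INT32 physical column."""
    if logical == "Unknown":
        return physical == "INT32"
    return True


physical, logical = SPARK_41_NULLTYPE_ENCODING
print(parquet_rs_accepts(physical, logical))   # Spark 4.1's encoding: rejected
print(parquet_rs_accepts("INT32", "Unknown"))  # the pairing parquet-rs requires
```

Under this model, the workaround in the docs entry amounts to ensuring `NullType` columns never reach the writer in the rejected encoding (for example, by casting them to a concrete type before writing).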

How are these changes tested?

Docs-only change; rendered locally to confirm markdown formatting.

andygrove added 2 commits May 4, 2026 07:00
…ntry

Start a Spark-4.1 "Known Limitations" section on the compatibility guide's
spark-versions page, mirroring the existing Spark-4.0 section. First entry
documents apache#4199 (parquet-rs rejects Spark's `BOOLEAN + Unknown` encoding
for `NullType` columns) with the failure mode and a user-facing workaround
so operators hitting the decode-time error have somewhere to land.

Comet's native execution is built on Arrow, whose string type is
strictly UTF-8, so non-UTF-8 bytes in a STRING column are rejected at
scan time rather than silently accepted as they are by Spark. Document
under Shared Limitations in the scans compatibility guide.
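The strict-UTF-8 behavior this commit documents can be illustrated without Spark at all: Arrow's string type requires valid UTF-8, so a byte sequence that Spark would pass through opaquely fails validation. A minimal sketch using plain Python decoding as a stand-in for Arrow's check:

```python
# Stand-in for Arrow's UTF-8 validation of STRING columns: bytes that
# are not valid UTF-8 fail at decode time instead of passing through.
raw = b"caf\xe9"  # Latin-1 encoded 'café'; not valid UTF-8

try:
    raw.decode("utf-8")
    ok = True
except UnicodeDecodeError:
    ok = False

print(ok)  # False: this is the class of input the native scan rejects
```

The same bytes round-trip fine through Spark's JVM path, which is why the gap is user-visible and worth a Shared Limitations entry.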
Contributor

@mbutrovich mbutrovich left a comment


Thanks @andygrove!

@mbutrovich mbutrovich merged commit d12c5c9 into apache:main May 4, 2026
5 checks passed
