apache · mbutrovich · May 4, 2026 · May 4, 2026 · May 4, 2026
diff --git a/docs/source/user-guide/latest/compatibility/scans.md b/docs/source/user-guide/latest/compatibility/scans.md
@@ -57,6 +57,15 @@ The following shared limitation may produce incorrect results without falling ba
   written using the Proleptic Gregorian calendar. This may produce incorrect results for dates before
   October 15, 1582.
 
+The following shared limitation raises an error at scan time rather than falling back to Spark:
+
+- Invalid UTF-8 bytes in `STRING` columns. Spark permits arbitrary byte sequences in a `STRING`
+  column (for example from `CAST(X'C1' AS STRING)`), but Comet's native execution path is built on
+  Arrow, whose string type is strictly UTF-8. Reading a Parquet file whose `STRING` column contains
+  non-UTF-8 bytes fails with `Parquet error: encountered non UTF-8 data`. To preserve such bytes,
+  disable Comet for the query or cast the column to `BINARY` before persisting.
+  See [#4121](https://github.com/apache/datafusion-comet/issues/4121).
+
 ## `native_datafusion` Limitations
 
 The `native_datafusion` scan has some additional limitations, mostly related to Parquet metadata. All of these

diff --git a/docs/source/user-guide/latest/compatibility/spark-versions.md b/docs/source/user-guide/latest/compatibility/spark-versions.md
@@ -51,6 +51,17 @@ Spark 4.1 support is experimental and intended for development and testing only.
 in production.
 ```
 
+### Known Limitations
+
+- **`NullType` columns in Parquet files**
+  ([#4199](https://github.com/apache/datafusion-comet/issues/4199)): Spark encodes a `NullType`
+  column as a Parquet `BOOLEAN` physical type annotated with `LogicalType::Unknown`. The Rust
+  `parquet` crate that Comet depends on accepts `Unknown` only when paired with `INT32` and rejects
+  any other physical type with `Parquet error: Cannot annotate Unknown from BOOLEAN for field '<name>'`.
+  Any attempt to read a Parquet file that contains a `NullType` column fails at decode time before
+  Comet's scan runs. Workarounds: project the column away, cast it to a concrete type before
+  persisting, or read the file with Comet disabled for that query.
+
 ## Spark 4.2 (Experimental)
 
 Spark 4.2.0-preview4 is provided as experimental support with Java 17 and Scala 2.13.