From f82bb70fe520dae89046d98f827e83d61bc57196 Mon Sep 17 00:00:00 2001 From: Andy Grove Date: Wed, 6 May 2026 11:09:04 -0600 Subject: [PATCH 01/15] docs: refresh Spark version support, OS coverage, and version-pinned examples Update user-facing docs ahead of the 0.16 release: - Promote Spark 4.1 from experimental to fully supported across the installation page, compatibility guide, and Gluten comparison; keep 4.2 listed as experimental. Spark 4.1.1 now runs in CI under both JDK 17 and 21. - Restructure the Supported Operating Systems section in the installation guide to make clear that published Maven jars cover Linux only and that macOS users must build from source. Drop the Intel macOS claim since Apple Silicon is the only macOS variant exercised in CI. - Flip seven recently-added math expressions to supported in the contributor-guide tracking page: acosh, asinh, atanh, cbrt, degrees, pi, radians. - Move spark.comet.scan.enabled to the testing category and rewrite its description to reflect that it is intended for Comet's own test suites only. Remove the corresponding mention from the data sources page. - Replace hard-coded Comet versions in the Iceberg and Kubernetes guides with the $COMET_VERSION placeholder used elsewhere, and drop the redundant spark.sql.extensions=...CometSparkSessionExtensions conf from the Iceberg examples (CometPlugin registers it automatically). 
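For reviewers, a minimal sketch of how the placeholder expands into the artifact names used in the examples below (the 0.16.0 value is illustrative only, not the released version):

```shell
# Hypothetical value; in the published docs the real release number is
# substituted into $COMET_VERSION at build time.
COMET_VERSION=0.16.0

# Expand the placeholder into the jar name the examples reference.
JAR="comet-spark-spark3.5_2.12-${COMET_VERSION}.jar"
echo "$JAR"  # prints comet-spark-spark3.5_2.12-0.16.0.jar
```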
--- .../main/scala/org/apache/comet/CometConf.scala | 8 +++----- docs/source/about/gluten_comparison.md | 4 ++-- .../contributor-guide/spark_expressions_support.md | 14 +++++++------- .../latest/compatibility/spark-versions.md | 9 ++------- docs/source/user-guide/latest/datasources.md | 7 +++---- docs/source/user-guide/latest/iceberg.md | 8 +++----- docs/source/user-guide/latest/installation.md | 14 ++++++++++---- docs/source/user-guide/latest/kubernetes.md | 6 +++--- 8 files changed, 33 insertions(+), 37 deletions(-) diff --git a/common/src/main/scala/org/apache/comet/CometConf.scala b/common/src/main/scala/org/apache/comet/CometConf.scala index d3f51dfbe2..e7fd17933b 100644 --- a/common/src/main/scala/org/apache/comet/CometConf.scala +++ b/common/src/main/scala/org/apache/comet/CometConf.scala @@ -94,12 +94,10 @@ object CometConf extends ShimCometConf { .createWithEnvVarOrDefault("ENABLE_COMET", true) val COMET_NATIVE_SCAN_ENABLED: ConfigEntry[Boolean] = conf("spark.comet.scan.enabled") - .category(CATEGORY_SCAN) + .category(CATEGORY_TESTING) .doc( - "Whether to enable native scans. When this is turned on, Spark will use Comet to " + - "read supported data sources (currently only Parquet is supported natively). Note " + - "that to enable native vectorized execution, both this config and " + - "`spark.comet.exec.enabled` need to be enabled.") + "Whether to enable native scans. Intended for use in Comet's own test suites to " + + "selectively disable native scans; not intended for production use.") .booleanConf .createWithDefault(true) diff --git a/docs/source/about/gluten_comparison.md b/docs/source/about/gluten_comparison.md index 86dcad56a0..c7807eabc7 100644 --- a/docs/source/about/gluten_comparison.md +++ b/docs/source/about/gluten_comparison.md @@ -62,8 +62,8 @@ code, then we suggest benchmarking with both solutions and choosing the fastest Both projects target a similar set of Spark releases. 
-Comet supports Spark 3.4, 3.5, and 4.0 in production builds, with experimental builds also published for -Spark 4.1 and the Spark 4.2 preview. See the [Spark version compatibility guide] for the exact patch versions and +Comet supports Spark 3.4, 3.5, 4.0, and 4.1 in production builds, with an experimental build also published for +the Spark 4.2 preview. See the [Spark version compatibility guide] for the exact patch versions and JDK/Scala combinations. [Spark version compatibility guide]: /user-guide/latest/compatibility/spark-versions.md diff --git a/docs/source/contributor-guide/spark_expressions_support.md b/docs/source/contributor-guide/spark_expressions_support.md index 8d2918d473..0a39cd4ed0 100644 --- a/docs/source/contributor-guide/spark_expressions_support.md +++ b/docs/source/contributor-guide/spark_expressions_support.md @@ -356,15 +356,15 @@ - [x] `/` - [x] abs - [x] acos -- [ ] acosh +- [x] acosh - [x] asin -- [ ] asinh +- [x] asinh - [x] atan - [x] atan2 -- [ ] atanh +- [x] atanh - [x] bin - [ ] bround -- [ ] cbrt +- [x] cbrt - [x] ceil - [x] ceiling - [ ] conv @@ -372,7 +372,7 @@ - [x] cosh - [x] cot - [ ] csc -- [ ] degrees +- [x] degrees - [ ] div - [ ] e - [x] exp @@ -390,12 +390,12 @@ - [x] log2 - [x] mod - [x] negative -- [ ] pi +- [x] pi - [ ] pmod - [x] positive - [x] pow - [x] power -- [ ] radians +- [x] radians - [x] rand - [x] randn - [ ] random diff --git a/docs/source/user-guide/latest/compatibility/spark-versions.md b/docs/source/user-guide/latest/compatibility/spark-versions.md index 115b1595be..5c8225ae5c 100644 --- a/docs/source/user-guide/latest/compatibility/spark-versions.md +++ b/docs/source/user-guide/latest/compatibility/spark-versions.md @@ -42,14 +42,9 @@ Spark 4.0.2 is supported with Java 17 and Scala 2.13. [#4051](https://github.com/apache/datafusion-comet/issues/4051)): Spark 4.0 introduced collation support. Non-default collated strings are not yet supported by Comet and will fall back to Spark. 
-## Spark 4.1 (Experimental) +## Spark 4.1 -Spark 4.1.1 is provided as experimental support with Java 17 and Scala 2.13. - -```{warning} -Spark 4.1 support is experimental and intended for development and testing only. It should not be used -in production. -``` +Spark 4.1.1 is supported with Java 17/21 and Scala 2.13. ### Known Limitations diff --git a/docs/source/user-guide/latest/datasources.md b/docs/source/user-guide/latest/datasources.md index 572beb1e02..32fa09e807 100644 --- a/docs/source/user-guide/latest/datasources.md +++ b/docs/source/user-guide/latest/datasources.md @@ -23,10 +23,9 @@ ### Parquet -When `spark.comet.scan.enabled` is enabled, Parquet scans will be performed natively by Comet if all data types -in the schema are supported. When this option is not enabled, the scan will fall back to Spark. In this case, -enabling `spark.comet.convert.parquet.enabled` will immediately convert the data into Arrow format, allowing native -execution to happen after that, but the process may not be efficient. +Parquet scans are performed natively by Comet if all data types in the schema are supported. When the scan +falls back to Spark, enabling `spark.comet.convert.parquet.enabled` will immediately convert the data into +Arrow format, allowing native execution to happen after that, but the process may not be efficient. ### Apache Iceberg diff --git a/docs/source/user-guide/latest/iceberg.md b/docs/source/user-guide/latest/iceberg.md index 24a4bda057..0339d4040e 100644 --- a/docs/source/user-guide/latest/iceberg.md +++ b/docs/source/user-guide/latest/iceberg.md @@ -25,13 +25,13 @@ Comet's native Iceberg reader relies on reflection to extract `FileScanTask`s fr then serialized to Comet's native execution engine (see [PR #2528](https://github.com/apache/datafusion-comet/pull/2528)). 
-The example below uses Spark's package downloader to retrieve Comet 0.14.0 and Iceberg +The example below uses Spark's package downloader to retrieve Comet $COMET_VERSION and Iceberg 1.8.1, but Comet has been tested with Iceberg 1.5, 1.7, 1.8, 1.9, and 1.10. The native Iceberg reader is enabled by default. To disable it, set `spark.comet.scan.icebergNative.enabled=false`. ```shell $SPARK_HOME/bin/spark-shell \ - --packages org.apache.datafusion:comet-spark-spark3.5_2.12:0.14.0,org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.8.1,org.apache.iceberg:iceberg-core:1.8.1 \ + --packages org.apache.datafusion:comet-spark-spark3.5_2.12:$COMET_VERSION,org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.8.1,org.apache.iceberg:iceberg-core:1.8.1 \ --repositories https://repo1.maven.org/maven2/ \ --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \ --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkCatalog \ @@ -39,7 +39,6 @@ $SPARK_HOME/bin/spark-shell \ --conf spark.sql.catalog.spark_catalog.warehouse=/tmp/warehouse \ --conf spark.plugins=org.apache.spark.CometPlugin \ --conf spark.shuffle.manager=org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager \ - --conf spark.sql.extensions=org.apache.comet.CometSparkSessionExtensions \ --conf spark.comet.explainFallback.enabled=true \ --conf spark.memory.offHeap.enabled=true \ --conf spark.memory.offHeap.size=2g @@ -106,7 +105,7 @@ configure Spark to use a REST catalog with Comet's native Iceberg scan: ```shell $SPARK_HOME/bin/spark-shell \ - --packages org.apache.datafusion:comet-spark-spark3.5_2.12:0.14.0,org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.8.1,org.apache.iceberg:iceberg-core:1.8.1 \ + --packages org.apache.datafusion:comet-spark-spark3.5_2.12:$COMET_VERSION,org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.8.1,org.apache.iceberg:iceberg-core:1.8.1 \ --repositories https://repo1.maven.org/maven2/ \ --conf 
spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \ --conf spark.sql.catalog.rest_cat=org.apache.iceberg.spark.SparkCatalog \ @@ -115,7 +114,6 @@ $SPARK_HOME/bin/spark-shell \ --conf spark.sql.catalog.rest_cat.warehouse=/tmp/warehouse \ --conf spark.plugins=org.apache.spark.CometPlugin \ --conf spark.shuffle.manager=org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager \ - --conf spark.sql.extensions=org.apache.comet.CometSparkSessionExtensions \ --conf spark.comet.explainFallback.enabled=true \ --conf spark.memory.offHeap.enabled=true \ --conf spark.memory.offHeap.size=2g diff --git a/docs/source/user-guide/latest/installation.md b/docs/source/user-guide/latest/installation.md index 71f5a3d0ce..2b5483136e 100644 --- a/docs/source/user-guide/latest/installation.md +++ b/docs/source/user-guide/latest/installation.md @@ -25,8 +25,14 @@ Make sure the following requirements are met and software installed on your mach ### Supported Operating Systems -- Linux -- Apple macOS (Intel and Apple Silicon) +The published Comet jar files in Maven Central bundle native libraries for Linux only (amd64 and arm64). macOS +users must [build from source](source.md). + +| Operating System | Published Maven Jars | Build from Source | +| ----------------------------- | -------------------- | ----------------- | +| Linux (amd64) | Yes | Yes | +| Linux (arm64) | Yes | Yes | +| Apple macOS (Apple Silicon) | No | Yes | ### Supported Spark Versions @@ -44,6 +50,7 @@ Other versions may work well enough for development and evaluation purposes. | 3.4.3 | 11/17 | 2.12/2.13 | Yes | Yes | | 3.5.8 | 11/17 | 2.12/2.13 | Yes | Yes | | 4.0.2 | 17/21 | 2.13 | Yes | Yes | +| 4.1.1 | 17/21 | 2.13 | Yes | Yes | Note that we do not test the full matrix of supported Java and Scala versions in CI for every Spark version. @@ -52,7 +59,6 @@ use only and should not be used in production yet. 
| Spark Version | Java Version | Scala Version | Comet Tests in CI | Spark SQL Tests in CI | | -------------- | ------------ | ------------- | ----------------- | --------------------- | -| 4.1.1 | 17 | 2.13 | Yes | No | | 4.2.0-preview4 | 17 | 2.13 | No | No | Note that Comet may not fully work with proprietary forks of Apache Spark such as the Spark versions offered by @@ -85,7 +91,7 @@ Here are the direct links for downloading the Comet $COMET_VERSION jar file. - [Comet plugin for Spark 3.5 / Scala 2.12](https://repo1.maven.org/maven2/org/apache/datafusion/comet-spark-spark3.5_2.12/$COMET_VERSION/comet-spark-spark3.5_2.12-$COMET_VERSION.jar) - [Comet plugin for Spark 3.5 / Scala 2.13](https://repo1.maven.org/maven2/org/apache/datafusion/comet-spark-spark3.5_2.13/$COMET_VERSION/comet-spark-spark3.5_2.13-$COMET_VERSION.jar) - [Comet plugin for Spark 4.0 / Scala 2.13](https://repo1.maven.org/maven2/org/apache/datafusion/comet-spark-spark4.0_2.13/$COMET_VERSION/comet-spark-spark4.0_2.13-$COMET_VERSION.jar) -- [Comet plugin for Spark 4.1 / Scala 2.13 (Experimental)](https://repo1.maven.org/maven2/org/apache/datafusion/comet-spark-spark4.1_2.13/$COMET_VERSION/comet-spark-spark4.1_2.13-$COMET_VERSION.jar) +- [Comet plugin for Spark 4.1 / Scala 2.13](https://repo1.maven.org/maven2/org/apache/datafusion/comet-spark-spark4.1_2.13/$COMET_VERSION/comet-spark-spark4.1_2.13-$COMET_VERSION.jar) - [Comet plugin for Spark 4.2 / Scala 2.13 (Experimental)](https://repo1.maven.org/maven2/org/apache/datafusion/comet-spark-spark4.2_2.13/$COMET_VERSION/comet-spark-spark4.2_2.13-$COMET_VERSION.jar) diff --git a/docs/source/user-guide/latest/kubernetes.md b/docs/source/user-guide/latest/kubernetes.md index 2fb037d630..95c5f008a8 100644 --- a/docs/source/user-guide/latest/kubernetes.md +++ b/docs/source/user-guide/latest/kubernetes.md @@ -69,13 +69,13 @@ metadata: spec: type: Scala mode: cluster - image: apache/datafusion-comet:0.7.0-spark3.5.5-scala2.12-java11 + image: 
apache/datafusion-comet:$COMET_VERSION-spark3.5.5-scala2.12-java11 imagePullPolicy: IfNotPresent mainClass: org.apache.spark.examples.SparkPi mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.5.5.jar sparkConf: - "spark.executor.extraClassPath": "/opt/spark/jars/comet-spark-spark3.5_2.12-0.7.0.jar" - "spark.driver.extraClassPath": "/opt/spark/jars/comet-spark-spark3.5_2.12-0.7.0.jar" + "spark.executor.extraClassPath": "/opt/spark/jars/comet-spark-spark3.5_2.12-$COMET_VERSION.jar" + "spark.driver.extraClassPath": "/opt/spark/jars/comet-spark-spark3.5_2.12-$COMET_VERSION.jar" "spark.plugins": "org.apache.spark.CometPlugin" "spark.comet.enabled": "true" "spark.comet.exec.enabled": "true" From af2d731d6f9d14e90d84c378186cfeb3cb41d40e Mon Sep 17 00:00:00 2001 From: Andy Grove Date: Wed, 6 May 2026 11:19:55 -0600 Subject: [PATCH 02/15] build: add spark-4.1 jar to release build script --- dev/release/build-release-comet.sh | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/dev/release/build-release-comet.sh b/dev/release/build-release-comet.sh index 91d02885af..bb38f51f34 100755 --- a/dev/release/build-release-comet.sh +++ b/dev/release/build-release-comet.sh @@ -202,7 +202,10 @@ LOCAL_REPO=$(mktemp -d /tmp/comet-staging-repo-XXXXX) ./mvnw "-Dmaven.repo.local=${LOCAL_REPO}" -P spark-3.4 -P scala-2.13 -DskipTests install ./mvnw "-Dmaven.repo.local=${LOCAL_REPO}" -P spark-3.5 -P scala-2.12 -DskipTests install ./mvnw "-Dmaven.repo.local=${LOCAL_REPO}" -P spark-3.5 -P scala-2.13 -DskipTests install -./mvnw "-Dmaven.repo.local=${LOCAL_REPO}" -P spark-4.0 -P scala-2.13 -DskipTests install +# The spark-4.x profiles pin their own Scala 2.13.x patch versions to match the +# corresponding Spark release, so the scala-2.13 profile is not used here. 
+./mvnw "-Dmaven.repo.local=${LOCAL_REPO}" -P spark-4.0 -DskipTests install +./mvnw "-Dmaven.repo.local=${LOCAL_REPO}" -P spark-4.1 -DskipTests install echo "Installed to local repo: ${LOCAL_REPO}" From 571baf3d1bb67b02cee367ed2ee40125a89d731b Mon Sep 17 00:00:00 2001 From: Andy Grove Date: Wed, 6 May 2026 11:21:53 -0600 Subject: [PATCH 03/15] docs: reflect AQE Dynamic Partition Pruning support for native Parquet scans PR #4011 added non-AQE DPP and PR #4112 added AQE DPP with broadcast reuse for native Parquet scans, but the docs still claimed AQE DPP was unsupported. Update the scans compatibility page, the contributor-guide roadmap, and the Iceberg guide accordingly. AQE DPP for Iceberg native scans remains future work, tracked at #3510. --- docs/source/contributor-guide/roadmap.md | 10 ++++++---- docs/source/user-guide/latest/compatibility/scans.md | 1 - docs/source/user-guide/latest/iceberg.md | 2 ++ 3 files changed, 8 insertions(+), 5 deletions(-) diff --git a/docs/source/contributor-guide/roadmap.md b/docs/source/contributor-guide/roadmap.md index 8d293c0686..bae4ddba58 100644 --- a/docs/source/contributor-guide/roadmap.md +++ b/docs/source/contributor-guide/roadmap.md @@ -43,14 +43,16 @@ significant family of Spark expressions in one effort. ## Dynamic Partition Pruning -Both Iceberg table scans and Parquet V1 native scans (`CometNativeScanExec`) support non-AQE Dynamic Partition Pruning -(DPP) filters generated by Spark's `PlanDynamicPruningFilters` optimizer rule ([#3349], [#3511]). However, Spark's -`PlanAdaptiveDynamicPruningFilters` optimizer rule runs after Comet's rules, so DPP with Adaptive Query Execution -requires a redesign of Comet's plan translation. This effort can be tracked at [#3510]. +Native Parquet scans (`CometNativeScanExec`) support Dynamic Partition Pruning (DPP) both with and without +Adaptive Query Execution. Non-AQE DPP landed in [#4011] and AQE DPP with broadcast reuse landed in [#4112]. 
+Iceberg native scans currently support non-AQE DPP only ([#3349], [#3511]); extending broadcast reuse to AQE +DPP for Iceberg is tracked at [#3510]. [#3349]: https://github.com/apache/datafusion-comet/pull/3349 [#3510]: https://github.com/apache/datafusion-comet/issues/3510 [#3511]: https://github.com/apache/datafusion-comet/pull/3511 +[#4011]: https://github.com/apache/datafusion-comet/pull/4011 +[#4112]: https://github.com/apache/datafusion-comet/pull/4112 ## TPC-H and TPC-DS Performance diff --git a/docs/source/user-guide/latest/compatibility/scans.md b/docs/source/user-guide/latest/compatibility/scans.md index d68c59d562..aeb11cbd5a 100644 --- a/docs/source/user-guide/latest/compatibility/scans.md +++ b/docs/source/user-guide/latest/compatibility/scans.md @@ -48,7 +48,6 @@ The following features are not supported by either scan implementation, and Come - Spark's Datasource V2 API. When `spark.sql.sources.useV1SourceList` does not include `parquet`, Spark uses the V2 API for Parquet scans. The DataFusion-based implementations only support the V1 API. - Spark metadata columns (e.g., `_metadata.file_path`) -- No support for AQE Dynamic Partition Pruning (DPP). Non-AQE DPP is supported. 
The following shared limitation may produce incorrect results without falling back to Spark: diff --git a/docs/source/user-guide/latest/iceberg.md b/docs/source/user-guide/latest/iceberg.md index 0339d4040e..e5c0c6473e 100644 --- a/docs/source/user-guide/latest/iceberg.md +++ b/docs/source/user-guide/latest/iceberg.md @@ -139,6 +139,8 @@ The following scenarios will fall back to Spark's native Iceberg reader: - Scans with residual filters using `truncate`, `bucket`, `year`, `month`, `day`, or `hour` transform functions (partition pruning still works, but row-level filtering of these transforms falls back) +- Dynamic Partition Pruning under Adaptive Query Execution (non-AQE DPP is supported); + see [#3510](https://github.com/apache/datafusion-comet/issues/3510) ### Task input metrics From 9d378bcfc2f5350bea14640dc5a775b630d2873e Mon Sep 17 00:00:00 2001 From: Andy Grove Date: Wed, 6 May 2026 11:29:52 -0600 Subject: [PATCH 04/15] docs: correct ShimSparkErrorConverter directory list in error-propagation guide --- docs/source/contributor-guide/sql_error_propagation.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/docs/source/contributor-guide/sql_error_propagation.md b/docs/source/contributor-guide/sql_error_propagation.md index 7becfabe75..a27408510d 100644 --- a/docs/source/contributor-guide/sql_error_propagation.md +++ b/docs/source/contributor-guide/sql_error_propagation.md @@ -398,8 +398,9 @@ def convertToSparkException(e: CometQueryExecutionException): Throwable = { ### `ShimSparkErrorConverter` calls the real Spark API -Because Spark's `QueryExecutionErrors` API changes between Spark versions (3.4, 3.5, 4.0), -there is a separate implementation per version (in `spark-3.4/`, `spark-3.5/`, `spark-4.0/`). 
+Because Spark's `QueryExecutionErrors` API changes between Spark versions (3.4, 3.5, and the 4.x line), +there is a separate implementation per branch (in `spark-3.4/`, `spark-3.5/`, and `spark-4.x/`, which is +shared by Spark 4.0, 4.1, and 4.2). ![Shim pattern for per-version Spark API bridging](./shim_pattern.svg) From d7b1b6d3c62cc7f2038674d9763be1531891fd30 Mon Sep 17 00:00:00 2001 From: Andy Grove Date: Wed, 6 May 2026 19:25:22 -0600 Subject: [PATCH 05/15] fix: apply spotless formatting to CometConf.scala --- common/src/main/scala/org/apache/comet/CometConf.scala | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/common/src/main/scala/org/apache/comet/CometConf.scala b/common/src/main/scala/org/apache/comet/CometConf.scala index e7fd17933b..9b376837f7 100644 --- a/common/src/main/scala/org/apache/comet/CometConf.scala +++ b/common/src/main/scala/org/apache/comet/CometConf.scala @@ -95,9 +95,8 @@ object CometConf extends ShimCometConf { val COMET_NATIVE_SCAN_ENABLED: ConfigEntry[Boolean] = conf("spark.comet.scan.enabled") .category(CATEGORY_TESTING) - .doc( - "Whether to enable native scans. Intended for use in Comet's own test suites to " + - "selectively disable native scans; not intended for production use.") + .doc("Whether to enable native scans. 
Intended for use in Comet's own test suites to " + + "selectively disable native scans; not intended for production use.") .booleanConf .createWithDefault(true) From 9d8e135df8ed33323ffdde3176a165e3abd7e62e Mon Sep 17 00:00:00 2001 From: Andy Grove Date: Wed, 6 May 2026 19:29:04 -0600 Subject: [PATCH 06/15] fix: apply prettier formatting to docs --- docs/source/_static/theme_overrides.css | 27 +- docs/source/_templates/docs-sidebar.html | 4 +- docs/source/_templates/layout.html | 14 +- .../benchmark-results/comet-0.15.0-tpcds.json | 414 +++++------------- .../comet-0.15.0-tpch-hashjoin.json | 90 +--- .../benchmark-results/comet-0.15.0-tpch.json | 90 +--- .../benchmark-results/spark-3.5.8-tpcds.json | 414 +++++------------- .../benchmark-results/spark-3.5.8-tpch.json | 90 +--- docs/source/contributor-guide/debugging.md | 1 + .../contributor-guide/native_shuffle.md | 6 + docs/source/user-guide/latest/installation.md | 10 +- 11 files changed, 313 insertions(+), 847 deletions(-) diff --git a/docs/source/_static/theme_overrides.css b/docs/source/_static/theme_overrides.css index dd5b374446..b7ae6d8d92 100644 --- a/docs/source/_static/theme_overrides.css +++ b/docs/source/_static/theme_overrides.css @@ -17,7 +17,6 @@ * under the License. 
*/ - /* Customizing with theme CSS variables */ :root { @@ -42,7 +41,9 @@ } /* --- remove the right (secondary) sidebar entirely --- */ -.bd-sidebar-secondary { display: none !important; } +.bd-sidebar-secondary { + display: none !important; +} /* Some versions still reserve the grid column for it — collapse it */ .bd-main { @@ -62,11 +63,11 @@ } /* --- let the center content use all remaining width --- */ -.bd-content, .bd-article-container { - max-width: none !important; /* remove internal cap */ +.bd-content, +.bd-article-container { + max-width: none !important; /* remove internal cap */ } - code { color: rgb(215, 70, 51); } @@ -92,30 +93,28 @@ add ":class: table-striped" */ background-color: rgba(0, 0, 0, 0.05); } - /* Limit the max height of the sidebar navigation section. Because in our customized template, there is more content above the navigation, i.e. larger logo: if we don't decrease the max-height, it will overlap with the footer. Details: 8rem for search box etc*/ -@media (min-width:720px) { - @supports (position:-webkit-sticky) or (position:sticky) { +@media (min-width: 720px) { + @supports (position: -webkit-sticky) or (position: sticky) { .bd-links { - max-height: calc(100vh - 8rem) + max-height: calc(100vh - 8rem); } } } - /* Fix table text wrapping in RTD theme, * see https://rackerlabs.github.io/docs-rackspace/tools/rtd-tables.html */ @media screen { - table.docutils td { - /* !important prevents the common CSS stylesheets from overriding + table.docutils td { + /* !important prevents the common CSS stylesheets from overriding this as on RTD they are loaded after this stylesheet */ - white-space: normal !important; - } + white-space: normal !important; + } } diff --git a/docs/source/_templates/docs-sidebar.html b/docs/source/_templates/docs-sidebar.html index 26e859eadc..e632c8f4ce 100644 --- a/docs/source/_templates/docs-sidebar.html +++ b/docs/source/_templates/docs-sidebar.html @@ -19,7 +19,7 @@ - diff --git 
a/docs/source/_templates/layout.html b/docs/source/_templates/layout.html index f90a9e0b58..cce68aa98c 100644 --- a/docs/source/_templates/layout.html +++ b/docs/source/_templates/layout.html @@ -27,13 +27,17 @@
{% for footer_item in theme_footer_items %} - + {% endfor %}
diff --git a/docs/source/contributor-guide/benchmark-results/comet-0.15.0-tpcds.json b/docs/source/contributor-guide/benchmark-results/comet-0.15.0-tpcds.json index f84bbfce41..65a6e97f60 100644 --- a/docs/source/contributor-guide/benchmark-results/comet-0.15.0-tpcds.json +++ b/docs/source/contributor-guide/benchmark-results/comet-0.15.0-tpcds.json @@ -13,313 +13,107 @@ "scale_factor": "1000", "spark_profile": "3.5_2.12" }, - "1": [ - 7.220000000000001 - ], - "2": [ - 5.5635 - ], - "3": [ - 3.2104999999999997 - ], - "4": [ - 48.129999999999995 - ], - "5": [ - 6.2745 - ], - "6": [ - 1.9304999999999999 - ], - "7": [ - 7.42 - ], - "8": [ - 2.644 - ], - "9": [ - 9.9525 - ], - "10": [ - 4.0455000000000005 - ], - "11": [ - 20.5745 - ], - "12": [ - 1.5710000000000002 - ], - "13": [ - 8.305499999999999 - ], - "15": [ - 2.293 - ], - "16": [ - 6.834 - ], - "17": [ - 5.15 - ], - "18": [ - 4.7525 - ], - "19": [ - 1.9245 - ], - "20": [ - 1.529 - ], - "21": [ - 1.4005 - ], - "22": [ - 8.411000000000001 - ], - "25": [ - 4.8605 - ], - "26": [ - 3.6575 - ], - "27": [ - 6.09 - ], - "28": [ - 11.378499999999999 - ], - "29": [ - 12.758 - ], - "30": [ - 3.346 - ], - "31": [ - 7.4355 - ], - "32": [ - 1.2095 - ], - "33": [ - 3.035 - ], - "34": [ - 4.402 - ], - "35": [ - 6.5415 - ], - "36": [ - 5.6705000000000005 - ], - "37": [ - 3.2424999999999997 - ], - "38": [ - 11.7975 - ], - "40": [ - 3.5564999999999998 - ], - "41": [ - 0.4645 - ], - "42": [ - 0.9555 - ], - "43": [ - 4.8149999999999995 - ], - "44": [ - 7.016 - ], - "45": [ - 2.567 - ], - "46": [ - 6.1690000000000005 - ], - "47": [ - 7.805999999999999 - ], - "48": [ - 6.811999999999999 - ], - "49": [ - 4.8355 - ], - "50": [ - 21.1985 - ], - "51": [ - 10.940000000000001 - ], - "52": [ - 1.0964999999999998 - ], - "53": [ - 5.8635 - ], - "54": [ - 3.2199999999999998 - ], - "55": [ - 1.0135 - ], - "56": [ - 2.948 - ], - "57": [ - 5.361000000000001 - ], - "58": [ - 2.4905 - ], - "59": [ - 7.554 - ], - "60": [ - 3.2135 - ], - "61": [ - 
2.383 - ], - "62": [ - 3.6390000000000002 - ], - "63": [ - 4.607 - ], - "64": [ - 23.2 - ], - "65": [ - 13.316500000000001 - ], - "66": [ - 5.2635000000000005 - ], - "67": [ - 62.164500000000004 - ], - "68": [ - 6.563000000000001 - ], - "69": [ - 3.784 - ], - "70": [ - 9.7715 - ], - "71": [ - 1.9004999999999999 - ], - "72": [ - 10.899999999999999 - ], - "73": [ - 1.709 - ], - "74": [ - 17.8455 - ], - "75": [ - 18.0295 - ], - "76": [ - 7.417 - ], - "77": [ - 1.7635 - ], - "78": [ - 32.436 - ], - "79": [ - 3.9115 - ], - "80": [ - 6.602 - ], - "81": [ - 5.242 - ], - "82": [ - 5.0135000000000005 - ], - "83": [ - 1.5425 - ], - "84": [ - 1.734 - ], - "85": [ - 5.2865 - ], - "86": [ - 2.8755 - ], - "87": [ - 13.177 - ], - "88": [ - 21.369 - ], - "89": [ - 4.142 - ], - "90": [ - 3.5090000000000003 - ], - "91": [ - 1.302 - ], - "92": [ - 1.104 - ], - "93": [ - 25.0165 - ], - "94": [ - 5.6465 - ], - "95": [ - 22.176499999999997 - ], - "96": [ - 4.4205000000000005 - ], - "97": [ - 12.806000000000001 - ], - "98": [ - 2.0945 - ], - "99": [ - 3.9445 - ], - "14a": [ - 51.093999999999994 - ], - "14b": [ - 44.180499999999995 - ], - "23a": [ - 64.36500000000001 - ], - "23b": [ - 76.929 - ], - "24a": [ - 26.692 - ], - "24b": [ - 26.1125 - ], - "39a": [ - 8.219 - ], - "39b": [ - 7.592499999999999 - ] -} \ No newline at end of file + "1": [7.220000000000001], + "2": [5.5635], + "3": [3.2104999999999997], + "4": [48.129999999999995], + "5": [6.2745], + "6": [1.9304999999999999], + "7": [7.42], + "8": [2.644], + "9": [9.9525], + "10": [4.0455000000000005], + "11": [20.5745], + "12": [1.5710000000000002], + "13": [8.305499999999999], + "15": [2.293], + "16": [6.834], + "17": [5.15], + "18": [4.7525], + "19": [1.9245], + "20": [1.529], + "21": [1.4005], + "22": [8.411000000000001], + "25": [4.8605], + "26": [3.6575], + "27": [6.09], + "28": [11.378499999999999], + "29": [12.758], + "30": [3.346], + "31": [7.4355], + "32": [1.2095], + "33": [3.035], + "34": [4.402], + "35": [6.5415], + 
"36": [5.6705000000000005], + "37": [3.2424999999999997], + "38": [11.7975], + "40": [3.5564999999999998], + "41": [0.4645], + "42": [0.9555], + "43": [4.8149999999999995], + "44": [7.016], + "45": [2.567], + "46": [6.1690000000000005], + "47": [7.805999999999999], + "48": [6.811999999999999], + "49": [4.8355], + "50": [21.1985], + "51": [10.940000000000001], + "52": [1.0964999999999998], + "53": [5.8635], + "54": [3.2199999999999998], + "55": [1.0135], + "56": [2.948], + "57": [5.361000000000001], + "58": [2.4905], + "59": [7.554], + "60": [3.2135], + "61": [2.383], + "62": [3.6390000000000002], + "63": [4.607], + "64": [23.2], + "65": [13.316500000000001], + "66": [5.2635000000000005], + "67": [62.164500000000004], + "68": [6.563000000000001], + "69": [3.784], + "70": [9.7715], + "71": [1.9004999999999999], + "72": [10.899999999999999], + "73": [1.709], + "74": [17.8455], + "75": [18.0295], + "76": [7.417], + "77": [1.7635], + "78": [32.436], + "79": [3.9115], + "80": [6.602], + "81": [5.242], + "82": [5.0135000000000005], + "83": [1.5425], + "84": [1.734], + "85": [5.2865], + "86": [2.8755], + "87": [13.177], + "88": [21.369], + "89": [4.142], + "90": [3.5090000000000003], + "91": [1.302], + "92": [1.104], + "93": [25.0165], + "94": [5.6465], + "95": [22.176499999999997], + "96": [4.4205000000000005], + "97": [12.806000000000001], + "98": [2.0945], + "99": [3.9445], + "14a": [51.093999999999994], + "14b": [44.180499999999995], + "23a": [64.36500000000001], + "23b": [76.929], + "24a": [26.692], + "24b": [26.1125], + "39a": [8.219], + "39b": [7.592499999999999] +} diff --git a/docs/source/contributor-guide/benchmark-results/comet-0.15.0-tpch-hashjoin.json b/docs/source/contributor-guide/benchmark-results/comet-0.15.0-tpch-hashjoin.json index 003636ad14..5292757014 100644 --- a/docs/source/contributor-guide/benchmark-results/comet-0.15.0-tpch-hashjoin.json +++ b/docs/source/contributor-guide/benchmark-results/comet-0.15.0-tpch-hashjoin.json @@ -14,70 +14,26 @@ 
"scale_factor": "1000", "spark_profile": "3.5_2.12" }, - "1": [ - 12.007 - ], - "2": [ - 24.2505 - ], - "3": [ - 16.8625 - ], - "4": [ - 14.686499999999999 - ], - "5": [ - 32.248999999999995 - ], - "6": [ - 0.7415 - ], - "7": [ - 15.712499999999999 - ], - "8": [ - 39.1135 - ], - "9": [ - 47.789 - ], - "10": [ - 28.569 - ], - "11": [ - 14.600999999999999 - ], - "12": [ - 7.8575 - ], - "13": [ - 10.9605 - ], - "14": [ - 2.607 - ], - "15": [ - 11.962 - ], - "16": [ - 15.587499999999999 - ], - "17": [ - 36.6875 - ], - "18": [ - 76.5035 - ], - "19": [ - 9.379 - ], - "20": [ - 8.841000000000001 - ], - "21": [ - 102.9005 - ], - "22": [ - 10.042 - ] -} \ No newline at end of file + "1": [12.007], + "2": [24.2505], + "3": [16.8625], + "4": [14.686499999999999], + "5": [32.248999999999995], + "6": [0.7415], + "7": [15.712499999999999], + "8": [39.1135], + "9": [47.789], + "10": [28.569], + "11": [14.600999999999999], + "12": [7.8575], + "13": [10.9605], + "14": [2.607], + "15": [11.962], + "16": [15.587499999999999], + "17": [36.6875], + "18": [76.5035], + "19": [9.379], + "20": [8.841000000000001], + "21": [102.9005], + "22": [10.042] +} diff --git a/docs/source/contributor-guide/benchmark-results/comet-0.15.0-tpch.json b/docs/source/contributor-guide/benchmark-results/comet-0.15.0-tpch.json index 2eb1ade06b..3d0b487fd2 100644 --- a/docs/source/contributor-guide/benchmark-results/comet-0.15.0-tpch.json +++ b/docs/source/contributor-guide/benchmark-results/comet-0.15.0-tpch.json @@ -13,70 +13,26 @@ "scale_factor": "1000", "spark_profile": "3.5_2.12" }, - "1": [ - 11.9865 - ], - "2": [ - 22.6175 - ], - "3": [ - 26.852 - ], - "4": [ - 14.3615 - ], - "5": [ - 65.00649999999999 - ], - "6": [ - 0.6839999999999999 - ], - "7": [ - 23.7965 - ], - "8": [ - 69.518 - ], - "9": [ - 84.6605 - ], - "10": [ - 29.8585 - ], - "11": [ - 15.7475 - ], - "12": [ - 10.299 - ], - "13": [ - 12.3625 - ], - "14": [ - 2.982 - ], - "15": [ - 12.059999999999999 - ], - "16": [ - 16.506 - ], - "17": [ - 
39.364000000000004 - ], - "18": [ - 69.58349999999999 - ], - "19": [ - 11.554 - ], - "20": [ - 9.604500000000002 - ], - "21": [ - 85.452 - ], - "22": [ - 10.0525 - ] -} \ No newline at end of file + "1": [11.9865], + "2": [22.6175], + "3": [26.852], + "4": [14.3615], + "5": [65.00649999999999], + "6": [0.6839999999999999], + "7": [23.7965], + "8": [69.518], + "9": [84.6605], + "10": [29.8585], + "11": [15.7475], + "12": [10.299], + "13": [12.3625], + "14": [2.982], + "15": [12.059999999999999], + "16": [16.506], + "17": [39.364000000000004], + "18": [69.58349999999999], + "19": [11.554], + "20": [9.604500000000002], + "21": [85.452], + "22": [10.0525] +} diff --git a/docs/source/contributor-guide/benchmark-results/spark-3.5.8-tpcds.json b/docs/source/contributor-guide/benchmark-results/spark-3.5.8-tpcds.json index 556d26dd9e..429a2688e2 100644 --- a/docs/source/contributor-guide/benchmark-results/spark-3.5.8-tpcds.json +++ b/docs/source/contributor-guide/benchmark-results/spark-3.5.8-tpcds.json @@ -13,313 +13,107 @@ "scale_factor": "1000", "spark_profile": "3.5_2.12" }, - "1": [ - 6.6325 - ], - "2": [ - 11.296 - ], - "3": [ - 3.5389999999999997 - ], - "4": [ - 42.13 - ], - "5": [ - 16.0765 - ], - "6": [ - 2.5905 - ], - "7": [ - 7.6775 - ], - "8": [ - 2.7664999999999997 - ], - "9": [ - 41.19 - ], - "10": [ - 4.798500000000001 - ], - "11": [ - 16.6935 - ], - "12": [ - 1.9075000000000002 - ], - "13": [ - 8.3805 - ], - "15": [ - 2.657 - ], - "16": [ - 15.8 - ], - "17": [ - 5.198 - ], - "18": [ - 4.775499999999999 - ], - "19": [ - 2.0535 - ], - "20": [ - 1.951 - ], - "21": [ - 1.6165 - ], - "22": [ - 10.1985 - ], - "25": [ - 4.18 - ], - "26": [ - 3.6559999999999997 - ], - "27": [ - 5.875500000000001 - ], - "28": [ - 41.400000000000006 - ], - "29": [ - 12.814499999999999 - ], - "30": [ - 3.683 - ], - "31": [ - 6.7405 - ], - "32": [ - 1.5375 - ], - "33": [ - 3.2175000000000002 - ], - "34": [ - 4.367 - ], - "35": [ - 6.285500000000001 - ], - "36": [ - 6.1295 - ], - "37": [ 
- 8.221 - ], - "38": [ - 13.931999999999999 - ], - "40": [ - 5.8815 - ], - "41": [ - 1.1164999999999998 - ], - "42": [ - 2.5195 - ], - "43": [ - 5.170999999999999 - ], - "44": [ - 25.598 - ], - "45": [ - 2.5675 - ], - "46": [ - 7.5 - ], - "47": [ - 8.979500000000002 - ], - "48": [ - 8.464500000000001 - ], - "49": [ - 8.951 - ], - "50": [ - 31.622500000000002 - ], - "51": [ - 10.7525 - ], - "52": [ - 1.5 - ], - "53": [ - 6.2330000000000005 - ], - "54": [ - 3.2664999999999997 - ], - "55": [ - 1.295 - ], - "56": [ - 3.4320000000000004 - ], - "57": [ - 5.213 - ], - "58": [ - 2.982 - ], - "59": [ - 17.828 - ], - "60": [ - 3.588 - ], - "61": [ - 4.023 - ], - "62": [ - 10.6945 - ], - "63": [ - 5.3575 - ], - "64": [ - 39.058 - ], - "65": [ - 13.0225 - ], - "66": [ - 5.6965 - ], - "67": [ - 58.116 - ], - "68": [ - 2.9850000000000003 - ], - "69": [ - 4.6395 - ], - "70": [ - 11.1165 - ], - "71": [ - 2.0905 - ], - "72": [ - 14.392 - ], - "73": [ - 1.939 - ], - "74": [ - 14.888 - ], - "75": [ - 20.826500000000003 - ], - "76": [ - 19.8905 - ], - "77": [ - 1.883 - ], - "78": [ - 33.607 - ], - "79": [ - 4.1065 - ], - "80": [ - 9.498999999999999 - ], - "81": [ - 5.681 - ], - "82": [ - 16.321 - ], - "83": [ - 2.0175 - ], - "84": [ - 3.9284999999999997 - ], - "85": [ - 6.912 - ], - "86": [ - 4.362 - ], - "87": [ - 13.2085 - ], - "88": [ - 57.921499999999995 - ], - "89": [ - 4.8945 - ], - "90": [ - 11.091000000000001 - ], - "91": [ - 1.6255000000000002 - ], - "92": [ - 1.565 - ], - "93": [ - 51.4765 - ], - "94": [ - 12.693 - ], - "95": [ - 30.4705 - ], - "96": [ - 12.237 - ], - "97": [ - 12.588000000000001 - ], - "98": [ - 2.2845 - ], - "99": [ - 11.4715 - ], - "14a": [ - 49.5975 - ], - "14b": [ - 47.707499999999996 - ], - "23a": [ - 85.717 - ], - "23b": [ - 119.332 - ], - "24a": [ - 47.7055 - ], - "24b": [ - 44.661 - ], - "39a": [ - 5.699 - ], - "39b": [ - 5.2835 - ] -} \ No newline at end of file + "1": [6.6325], + "2": [11.296], + "3": [3.5389999999999997], + "4": [42.13], + "5": 
[16.0765], + "6": [2.5905], + "7": [7.6775], + "8": [2.7664999999999997], + "9": [41.19], + "10": [4.798500000000001], + "11": [16.6935], + "12": [1.9075000000000002], + "13": [8.3805], + "15": [2.657], + "16": [15.8], + "17": [5.198], + "18": [4.775499999999999], + "19": [2.0535], + "20": [1.951], + "21": [1.6165], + "22": [10.1985], + "25": [4.18], + "26": [3.6559999999999997], + "27": [5.875500000000001], + "28": [41.400000000000006], + "29": [12.814499999999999], + "30": [3.683], + "31": [6.7405], + "32": [1.5375], + "33": [3.2175000000000002], + "34": [4.367], + "35": [6.285500000000001], + "36": [6.1295], + "37": [8.221], + "38": [13.931999999999999], + "40": [5.8815], + "41": [1.1164999999999998], + "42": [2.5195], + "43": [5.170999999999999], + "44": [25.598], + "45": [2.5675], + "46": [7.5], + "47": [8.979500000000002], + "48": [8.464500000000001], + "49": [8.951], + "50": [31.622500000000002], + "51": [10.7525], + "52": [1.5], + "53": [6.2330000000000005], + "54": [3.2664999999999997], + "55": [1.295], + "56": [3.4320000000000004], + "57": [5.213], + "58": [2.982], + "59": [17.828], + "60": [3.588], + "61": [4.023], + "62": [10.6945], + "63": [5.3575], + "64": [39.058], + "65": [13.0225], + "66": [5.6965], + "67": [58.116], + "68": [2.9850000000000003], + "69": [4.6395], + "70": [11.1165], + "71": [2.0905], + "72": [14.392], + "73": [1.939], + "74": [14.888], + "75": [20.826500000000003], + "76": [19.8905], + "77": [1.883], + "78": [33.607], + "79": [4.1065], + "80": [9.498999999999999], + "81": [5.681], + "82": [16.321], + "83": [2.0175], + "84": [3.9284999999999997], + "85": [6.912], + "86": [4.362], + "87": [13.2085], + "88": [57.921499999999995], + "89": [4.8945], + "90": [11.091000000000001], + "91": [1.6255000000000002], + "92": [1.565], + "93": [51.4765], + "94": [12.693], + "95": [30.4705], + "96": [12.237], + "97": [12.588000000000001], + "98": [2.2845], + "99": [11.4715], + "14a": [49.5975], + "14b": [47.707499999999996], + "23a": [85.717], + 
"23b": [119.332], + "24a": [47.7055], + "24b": [44.661], + "39a": [5.699], + "39b": [5.2835] +} diff --git a/docs/source/contributor-guide/benchmark-results/spark-3.5.8-tpch.json b/docs/source/contributor-guide/benchmark-results/spark-3.5.8-tpch.json index 8fc3772d7d..2e169cc8d7 100644 --- a/docs/source/contributor-guide/benchmark-results/spark-3.5.8-tpch.json +++ b/docs/source/contributor-guide/benchmark-results/spark-3.5.8-tpch.json @@ -13,70 +13,26 @@ "scale_factor": "1000", "spark_profile": "3.5_2.12" }, - "1": [ - 101.212 - ], - "2": [ - 38.5525 - ], - "3": [ - 36.48 - ], - "4": [ - 31.700499999999998 - ], - "5": [ - 68.3175 - ], - "6": [ - 1.6925 - ], - "7": [ - 28.233 - ], - "8": [ - 63.251000000000005 - ], - "9": [ - 86.0805 - ], - "10": [ - 40.848 - ], - "11": [ - 33.468999999999994 - ], - "12": [ - 16.343 - ], - "13": [ - 27.7675 - ], - "14": [ - 8.062000000000001 - ], - "15": [ - 26.322499999999998 - ], - "16": [ - 27.36 - ], - "17": [ - 97.70949999999999 - ], - "18": [ - 154.4655 - ], - "19": [ - 14.817 - ], - "20": [ - 18.521 - ], - "21": [ - 134.6705 - ], - "22": [ - 17.3075 - ] -} \ No newline at end of file + "1": [101.212], + "2": [38.5525], + "3": [36.48], + "4": [31.700499999999998], + "5": [68.3175], + "6": [1.6925], + "7": [28.233], + "8": [63.251000000000005], + "9": [86.0805], + "10": [40.848], + "11": [33.468999999999994], + "12": [16.343], + "13": [27.7675], + "14": [8.062000000000001], + "15": [26.322499999999998], + "16": [27.36], + "17": [97.70949999999999], + "18": [154.4655], + "19": [14.817], + "20": [18.521], + "21": [134.6705], + "22": [17.3075] +} diff --git a/docs/source/contributor-guide/debugging.md b/docs/source/contributor-guide/debugging.md index e5372d922d..6893e0f16e 100644 --- a/docs/source/contributor-guide/debugging.md +++ b/docs/source/contributor-guide/debugging.md @@ -73,6 +73,7 @@ process handle -n true -p true -s false SIGBUS SIGSEGV SIGILL ### In CLion 1. 
After the breakpoint is hit in IntelliJ, in Clion (or LLDB from terminal or editor) - + 1. Attach to the jvm process (make sure the PID matches). In CLion, this is `Run -> Attach to process` 1. Put your breakpoint in the native code diff --git a/docs/source/contributor-guide/native_shuffle.md b/docs/source/contributor-guide/native_shuffle.md index 18e80a90c8..9cf30dca82 100644 --- a/docs/source/contributor-guide/native_shuffle.md +++ b/docs/source/contributor-guide/native_shuffle.md @@ -49,6 +49,7 @@ Native shuffle (`CometExchange`) is selected when all of the following condition columnar output. Row-based Spark operators require JVM shuffle. 3. **Supported partitioning type**: Native shuffle supports: + - `HashPartitioning` - `RangePartitioning` - `SinglePartition` @@ -124,12 +125,14 @@ Native shuffle (`CometExchange`) is selected when all of the following condition ### Write Path 1. **Plan construction**: `CometNativeShuffleWriter` builds a protobuf operator plan containing: + - A scan operator reading from the input iterator - A `ShuffleWriter` operator with partitioning config and compression codec 2. **Native execution**: `CometExec.getCometIterator()` executes the plan in Rust. 3. **Partitioning**: `ShuffleWriterExec` receives batches and routes to the appropriate partitioner: + - `MultiPartitionShuffleRepartitioner`: For hash/range/round-robin partitioning - `SinglePartitionShufflePartitioner`: For single partition (simpler path) @@ -137,11 +140,13 @@ Native shuffle (`CometExchange`) is selected when all of the following condition exceeds the threshold, partitions spill to temporary files. 5. **Encoding**: `ShuffleBlockWriter` encodes each partition's data as compressed Arrow IPC: + - Writes compression type header - Writes field count header - Writes compressed IPC stream 6. 
**Output files**: Two files are produced: + - **Data file**: Concatenated partition data - **Index file**: Array of 8-byte little-endian offsets marking partition boundaries @@ -153,6 +158,7 @@ Native shuffle (`CometExchange`) is selected when all of the following condition 1. `CometBlockStoreShuffleReader` fetches shuffle blocks via `ShuffleBlockFetcherIterator`. 2. For each block, `NativeBatchDecoderIterator`: + - Reads the 8-byte compressed length header - Reads the 8-byte field count header - Reads the compressed IPC data diff --git a/docs/source/user-guide/latest/installation.md b/docs/source/user-guide/latest/installation.md index fce19e329e..020acef30c 100644 --- a/docs/source/user-guide/latest/installation.md +++ b/docs/source/user-guide/latest/installation.md @@ -28,11 +28,11 @@ Make sure the following requirements are met and software installed on your mach The published Comet jar files in Maven Central bundle native libraries for Linux only (amd64 and arm64). macOS users must [build from source](source.md). -| Operating System | Published Maven Jars | Build from Source | -| ----------------------------- | -------------------- | ----------------- | -| Linux (amd64) | Yes | Yes | -| Linux (arm64) | Yes | Yes | -| Apple macOS (Apple Silicon) | No | Yes | +| Operating System | Published Maven Jars | Build from Source | +| --------------------------- | -------------------- | ----------------- | +| Linux (amd64) | Yes | Yes | +| Linux (arm64) | Yes | Yes | +| Apple macOS (Apple Silicon) | No | Yes | ### Supported Spark Versions From ba77e245f25d656e1257b6ef3ca5f98b228b20c9 Mon Sep 17 00:00:00 2001 From: Andy Grove Date: Wed, 6 May 2026 19:31:50 -0600 Subject: [PATCH 07/15] Revert "fix: apply prettier formatting to docs" This reverts commit 9d8e135df8ed33323ffdde3176a165e3abd7e62e. 
--- docs/source/_static/theme_overrides.css | 27 +- docs/source/_templates/docs-sidebar.html | 4 +- docs/source/_templates/layout.html | 14 +- .../benchmark-results/comet-0.15.0-tpcds.json | 414 +++++++++++++----- .../comet-0.15.0-tpch-hashjoin.json | 90 +++- .../benchmark-results/comet-0.15.0-tpch.json | 90 +++- .../benchmark-results/spark-3.5.8-tpcds.json | 414 +++++++++++++----- .../benchmark-results/spark-3.5.8-tpch.json | 90 +++- docs/source/contributor-guide/debugging.md | 1 - .../contributor-guide/native_shuffle.md | 6 - docs/source/user-guide/latest/installation.md | 10 +- 11 files changed, 847 insertions(+), 313 deletions(-) diff --git a/docs/source/_static/theme_overrides.css b/docs/source/_static/theme_overrides.css index b7ae6d8d92..dd5b374446 100644 --- a/docs/source/_static/theme_overrides.css +++ b/docs/source/_static/theme_overrides.css @@ -17,6 +17,7 @@ * under the License. */ + /* Customizing with theme CSS variables */ :root { @@ -41,9 +42,7 @@ } /* --- remove the right (secondary) sidebar entirely --- */ -.bd-sidebar-secondary { - display: none !important; -} +.bd-sidebar-secondary { display: none !important; } /* Some versions still reserve the grid column for it — collapse it */ .bd-main { @@ -63,11 +62,11 @@ } /* --- let the center content use all remaining width --- */ -.bd-content, -.bd-article-container { - max-width: none !important; /* remove internal cap */ +.bd-content, .bd-article-container { + max-width: none !important; /* remove internal cap */ } + code { color: rgb(215, 70, 51); } @@ -93,28 +92,30 @@ add ":class: table-striped" */ background-color: rgba(0, 0, 0, 0.05); } + /* Limit the max height of the sidebar navigation section. Because in our customized template, there is more content above the navigation, i.e. larger logo: if we don't decrease the max-height, it will overlap with the footer. 
Details: 8rem for search box etc*/ -@media (min-width: 720px) { - @supports (position: -webkit-sticky) or (position: sticky) { +@media (min-width:720px) { + @supports (position:-webkit-sticky) or (position:sticky) { .bd-links { - max-height: calc(100vh - 8rem); + max-height: calc(100vh - 8rem) } } } + /* Fix table text wrapping in RTD theme, * see https://rackerlabs.github.io/docs-rackspace/tools/rtd-tables.html */ @media screen { - table.docutils td { - /* !important prevents the common CSS stylesheets from overriding + table.docutils td { + /* !important prevents the common CSS stylesheets from overriding this as on RTD they are loaded after this stylesheet */ - white-space: normal !important; - } + white-space: normal !important; + } } diff --git a/docs/source/_templates/docs-sidebar.html b/docs/source/_templates/docs-sidebar.html index e632c8f4ce..26e859eadc 100644 --- a/docs/source/_templates/docs-sidebar.html +++ b/docs/source/_templates/docs-sidebar.html @@ -19,7 +19,7 @@ + diff --git a/docs/source/_templates/layout.html b/docs/source/_templates/layout.html index cce68aa98c..f90a9e0b58 100644 --- a/docs/source/_templates/layout.html +++ b/docs/source/_templates/layout.html @@ -27,17 +27,13 @@
{% for footer_item in theme_footer_items %} - + {% endfor %}
diff --git a/docs/source/contributor-guide/benchmark-results/comet-0.15.0-tpcds.json b/docs/source/contributor-guide/benchmark-results/comet-0.15.0-tpcds.json index 65a6e97f60..f84bbfce41 100644 --- a/docs/source/contributor-guide/benchmark-results/comet-0.15.0-tpcds.json +++ b/docs/source/contributor-guide/benchmark-results/comet-0.15.0-tpcds.json @@ -13,107 +13,313 @@ "scale_factor": "1000", "spark_profile": "3.5_2.12" }, - "1": [7.220000000000001], - "2": [5.5635], - "3": [3.2104999999999997], - "4": [48.129999999999995], - "5": [6.2745], - "6": [1.9304999999999999], - "7": [7.42], - "8": [2.644], - "9": [9.9525], - "10": [4.0455000000000005], - "11": [20.5745], - "12": [1.5710000000000002], - "13": [8.305499999999999], - "15": [2.293], - "16": [6.834], - "17": [5.15], - "18": [4.7525], - "19": [1.9245], - "20": [1.529], - "21": [1.4005], - "22": [8.411000000000001], - "25": [4.8605], - "26": [3.6575], - "27": [6.09], - "28": [11.378499999999999], - "29": [12.758], - "30": [3.346], - "31": [7.4355], - "32": [1.2095], - "33": [3.035], - "34": [4.402], - "35": [6.5415], - "36": [5.6705000000000005], - "37": [3.2424999999999997], - "38": [11.7975], - "40": [3.5564999999999998], - "41": [0.4645], - "42": [0.9555], - "43": [4.8149999999999995], - "44": [7.016], - "45": [2.567], - "46": [6.1690000000000005], - "47": [7.805999999999999], - "48": [6.811999999999999], - "49": [4.8355], - "50": [21.1985], - "51": [10.940000000000001], - "52": [1.0964999999999998], - "53": [5.8635], - "54": [3.2199999999999998], - "55": [1.0135], - "56": [2.948], - "57": [5.361000000000001], - "58": [2.4905], - "59": [7.554], - "60": [3.2135], - "61": [2.383], - "62": [3.6390000000000002], - "63": [4.607], - "64": [23.2], - "65": [13.316500000000001], - "66": [5.2635000000000005], - "67": [62.164500000000004], - "68": [6.563000000000001], - "69": [3.784], - "70": [9.7715], - "71": [1.9004999999999999], - "72": [10.899999999999999], - "73": [1.709], - "74": [17.8455], - "75": [18.0295], - 
"76": [7.417], - "77": [1.7635], - "78": [32.436], - "79": [3.9115], - "80": [6.602], - "81": [5.242], - "82": [5.0135000000000005], - "83": [1.5425], - "84": [1.734], - "85": [5.2865], - "86": [2.8755], - "87": [13.177], - "88": [21.369], - "89": [4.142], - "90": [3.5090000000000003], - "91": [1.302], - "92": [1.104], - "93": [25.0165], - "94": [5.6465], - "95": [22.176499999999997], - "96": [4.4205000000000005], - "97": [12.806000000000001], - "98": [2.0945], - "99": [3.9445], - "14a": [51.093999999999994], - "14b": [44.180499999999995], - "23a": [64.36500000000001], - "23b": [76.929], - "24a": [26.692], - "24b": [26.1125], - "39a": [8.219], - "39b": [7.592499999999999] -} + "1": [ + 7.220000000000001 + ], + "2": [ + 5.5635 + ], + "3": [ + 3.2104999999999997 + ], + "4": [ + 48.129999999999995 + ], + "5": [ + 6.2745 + ], + "6": [ + 1.9304999999999999 + ], + "7": [ + 7.42 + ], + "8": [ + 2.644 + ], + "9": [ + 9.9525 + ], + "10": [ + 4.0455000000000005 + ], + "11": [ + 20.5745 + ], + "12": [ + 1.5710000000000002 + ], + "13": [ + 8.305499999999999 + ], + "15": [ + 2.293 + ], + "16": [ + 6.834 + ], + "17": [ + 5.15 + ], + "18": [ + 4.7525 + ], + "19": [ + 1.9245 + ], + "20": [ + 1.529 + ], + "21": [ + 1.4005 + ], + "22": [ + 8.411000000000001 + ], + "25": [ + 4.8605 + ], + "26": [ + 3.6575 + ], + "27": [ + 6.09 + ], + "28": [ + 11.378499999999999 + ], + "29": [ + 12.758 + ], + "30": [ + 3.346 + ], + "31": [ + 7.4355 + ], + "32": [ + 1.2095 + ], + "33": [ + 3.035 + ], + "34": [ + 4.402 + ], + "35": [ + 6.5415 + ], + "36": [ + 5.6705000000000005 + ], + "37": [ + 3.2424999999999997 + ], + "38": [ + 11.7975 + ], + "40": [ + 3.5564999999999998 + ], + "41": [ + 0.4645 + ], + "42": [ + 0.9555 + ], + "43": [ + 4.8149999999999995 + ], + "44": [ + 7.016 + ], + "45": [ + 2.567 + ], + "46": [ + 6.1690000000000005 + ], + "47": [ + 7.805999999999999 + ], + "48": [ + 6.811999999999999 + ], + "49": [ + 4.8355 + ], + "50": [ + 21.1985 + ], + "51": [ + 10.940000000000001 + ], + "52": [ 
+ 1.0964999999999998 + ], + "53": [ + 5.8635 + ], + "54": [ + 3.2199999999999998 + ], + "55": [ + 1.0135 + ], + "56": [ + 2.948 + ], + "57": [ + 5.361000000000001 + ], + "58": [ + 2.4905 + ], + "59": [ + 7.554 + ], + "60": [ + 3.2135 + ], + "61": [ + 2.383 + ], + "62": [ + 3.6390000000000002 + ], + "63": [ + 4.607 + ], + "64": [ + 23.2 + ], + "65": [ + 13.316500000000001 + ], + "66": [ + 5.2635000000000005 + ], + "67": [ + 62.164500000000004 + ], + "68": [ + 6.563000000000001 + ], + "69": [ + 3.784 + ], + "70": [ + 9.7715 + ], + "71": [ + 1.9004999999999999 + ], + "72": [ + 10.899999999999999 + ], + "73": [ + 1.709 + ], + "74": [ + 17.8455 + ], + "75": [ + 18.0295 + ], + "76": [ + 7.417 + ], + "77": [ + 1.7635 + ], + "78": [ + 32.436 + ], + "79": [ + 3.9115 + ], + "80": [ + 6.602 + ], + "81": [ + 5.242 + ], + "82": [ + 5.0135000000000005 + ], + "83": [ + 1.5425 + ], + "84": [ + 1.734 + ], + "85": [ + 5.2865 + ], + "86": [ + 2.8755 + ], + "87": [ + 13.177 + ], + "88": [ + 21.369 + ], + "89": [ + 4.142 + ], + "90": [ + 3.5090000000000003 + ], + "91": [ + 1.302 + ], + "92": [ + 1.104 + ], + "93": [ + 25.0165 + ], + "94": [ + 5.6465 + ], + "95": [ + 22.176499999999997 + ], + "96": [ + 4.4205000000000005 + ], + "97": [ + 12.806000000000001 + ], + "98": [ + 2.0945 + ], + "99": [ + 3.9445 + ], + "14a": [ + 51.093999999999994 + ], + "14b": [ + 44.180499999999995 + ], + "23a": [ + 64.36500000000001 + ], + "23b": [ + 76.929 + ], + "24a": [ + 26.692 + ], + "24b": [ + 26.1125 + ], + "39a": [ + 8.219 + ], + "39b": [ + 7.592499999999999 + ] +} \ No newline at end of file diff --git a/docs/source/contributor-guide/benchmark-results/comet-0.15.0-tpch-hashjoin.json b/docs/source/contributor-guide/benchmark-results/comet-0.15.0-tpch-hashjoin.json index 5292757014..003636ad14 100644 --- a/docs/source/contributor-guide/benchmark-results/comet-0.15.0-tpch-hashjoin.json +++ b/docs/source/contributor-guide/benchmark-results/comet-0.15.0-tpch-hashjoin.json @@ -14,26 +14,70 @@ 
"scale_factor": "1000", "spark_profile": "3.5_2.12" }, - "1": [12.007], - "2": [24.2505], - "3": [16.8625], - "4": [14.686499999999999], - "5": [32.248999999999995], - "6": [0.7415], - "7": [15.712499999999999], - "8": [39.1135], - "9": [47.789], - "10": [28.569], - "11": [14.600999999999999], - "12": [7.8575], - "13": [10.9605], - "14": [2.607], - "15": [11.962], - "16": [15.587499999999999], - "17": [36.6875], - "18": [76.5035], - "19": [9.379], - "20": [8.841000000000001], - "21": [102.9005], - "22": [10.042] -} + "1": [ + 12.007 + ], + "2": [ + 24.2505 + ], + "3": [ + 16.8625 + ], + "4": [ + 14.686499999999999 + ], + "5": [ + 32.248999999999995 + ], + "6": [ + 0.7415 + ], + "7": [ + 15.712499999999999 + ], + "8": [ + 39.1135 + ], + "9": [ + 47.789 + ], + "10": [ + 28.569 + ], + "11": [ + 14.600999999999999 + ], + "12": [ + 7.8575 + ], + "13": [ + 10.9605 + ], + "14": [ + 2.607 + ], + "15": [ + 11.962 + ], + "16": [ + 15.587499999999999 + ], + "17": [ + 36.6875 + ], + "18": [ + 76.5035 + ], + "19": [ + 9.379 + ], + "20": [ + 8.841000000000001 + ], + "21": [ + 102.9005 + ], + "22": [ + 10.042 + ] +} \ No newline at end of file diff --git a/docs/source/contributor-guide/benchmark-results/comet-0.15.0-tpch.json b/docs/source/contributor-guide/benchmark-results/comet-0.15.0-tpch.json index 3d0b487fd2..2eb1ade06b 100644 --- a/docs/source/contributor-guide/benchmark-results/comet-0.15.0-tpch.json +++ b/docs/source/contributor-guide/benchmark-results/comet-0.15.0-tpch.json @@ -13,26 +13,70 @@ "scale_factor": "1000", "spark_profile": "3.5_2.12" }, - "1": [11.9865], - "2": [22.6175], - "3": [26.852], - "4": [14.3615], - "5": [65.00649999999999], - "6": [0.6839999999999999], - "7": [23.7965], - "8": [69.518], - "9": [84.6605], - "10": [29.8585], - "11": [15.7475], - "12": [10.299], - "13": [12.3625], - "14": [2.982], - "15": [12.059999999999999], - "16": [16.506], - "17": [39.364000000000004], - "18": [69.58349999999999], - "19": [11.554], - "20": [9.604500000000002], - 
"21": [85.452], - "22": [10.0525] -} + "1": [ + 11.9865 + ], + "2": [ + 22.6175 + ], + "3": [ + 26.852 + ], + "4": [ + 14.3615 + ], + "5": [ + 65.00649999999999 + ], + "6": [ + 0.6839999999999999 + ], + "7": [ + 23.7965 + ], + "8": [ + 69.518 + ], + "9": [ + 84.6605 + ], + "10": [ + 29.8585 + ], + "11": [ + 15.7475 + ], + "12": [ + 10.299 + ], + "13": [ + 12.3625 + ], + "14": [ + 2.982 + ], + "15": [ + 12.059999999999999 + ], + "16": [ + 16.506 + ], + "17": [ + 39.364000000000004 + ], + "18": [ + 69.58349999999999 + ], + "19": [ + 11.554 + ], + "20": [ + 9.604500000000002 + ], + "21": [ + 85.452 + ], + "22": [ + 10.0525 + ] +} \ No newline at end of file diff --git a/docs/source/contributor-guide/benchmark-results/spark-3.5.8-tpcds.json b/docs/source/contributor-guide/benchmark-results/spark-3.5.8-tpcds.json index 429a2688e2..556d26dd9e 100644 --- a/docs/source/contributor-guide/benchmark-results/spark-3.5.8-tpcds.json +++ b/docs/source/contributor-guide/benchmark-results/spark-3.5.8-tpcds.json @@ -13,107 +13,313 @@ "scale_factor": "1000", "spark_profile": "3.5_2.12" }, - "1": [6.6325], - "2": [11.296], - "3": [3.5389999999999997], - "4": [42.13], - "5": [16.0765], - "6": [2.5905], - "7": [7.6775], - "8": [2.7664999999999997], - "9": [41.19], - "10": [4.798500000000001], - "11": [16.6935], - "12": [1.9075000000000002], - "13": [8.3805], - "15": [2.657], - "16": [15.8], - "17": [5.198], - "18": [4.775499999999999], - "19": [2.0535], - "20": [1.951], - "21": [1.6165], - "22": [10.1985], - "25": [4.18], - "26": [3.6559999999999997], - "27": [5.875500000000001], - "28": [41.400000000000006], - "29": [12.814499999999999], - "30": [3.683], - "31": [6.7405], - "32": [1.5375], - "33": [3.2175000000000002], - "34": [4.367], - "35": [6.285500000000001], - "36": [6.1295], - "37": [8.221], - "38": [13.931999999999999], - "40": [5.8815], - "41": [1.1164999999999998], - "42": [2.5195], - "43": [5.170999999999999], - "44": [25.598], - "45": [2.5675], - "46": [7.5], - "47": 
[8.979500000000002], - "48": [8.464500000000001], - "49": [8.951], - "50": [31.622500000000002], - "51": [10.7525], - "52": [1.5], - "53": [6.2330000000000005], - "54": [3.2664999999999997], - "55": [1.295], - "56": [3.4320000000000004], - "57": [5.213], - "58": [2.982], - "59": [17.828], - "60": [3.588], - "61": [4.023], - "62": [10.6945], - "63": [5.3575], - "64": [39.058], - "65": [13.0225], - "66": [5.6965], - "67": [58.116], - "68": [2.9850000000000003], - "69": [4.6395], - "70": [11.1165], - "71": [2.0905], - "72": [14.392], - "73": [1.939], - "74": [14.888], - "75": [20.826500000000003], - "76": [19.8905], - "77": [1.883], - "78": [33.607], - "79": [4.1065], - "80": [9.498999999999999], - "81": [5.681], - "82": [16.321], - "83": [2.0175], - "84": [3.9284999999999997], - "85": [6.912], - "86": [4.362], - "87": [13.2085], - "88": [57.921499999999995], - "89": [4.8945], - "90": [11.091000000000001], - "91": [1.6255000000000002], - "92": [1.565], - "93": [51.4765], - "94": [12.693], - "95": [30.4705], - "96": [12.237], - "97": [12.588000000000001], - "98": [2.2845], - "99": [11.4715], - "14a": [49.5975], - "14b": [47.707499999999996], - "23a": [85.717], - "23b": [119.332], - "24a": [47.7055], - "24b": [44.661], - "39a": [5.699], - "39b": [5.2835] -} + "1": [ + 6.6325 + ], + "2": [ + 11.296 + ], + "3": [ + 3.5389999999999997 + ], + "4": [ + 42.13 + ], + "5": [ + 16.0765 + ], + "6": [ + 2.5905 + ], + "7": [ + 7.6775 + ], + "8": [ + 2.7664999999999997 + ], + "9": [ + 41.19 + ], + "10": [ + 4.798500000000001 + ], + "11": [ + 16.6935 + ], + "12": [ + 1.9075000000000002 + ], + "13": [ + 8.3805 + ], + "15": [ + 2.657 + ], + "16": [ + 15.8 + ], + "17": [ + 5.198 + ], + "18": [ + 4.775499999999999 + ], + "19": [ + 2.0535 + ], + "20": [ + 1.951 + ], + "21": [ + 1.6165 + ], + "22": [ + 10.1985 + ], + "25": [ + 4.18 + ], + "26": [ + 3.6559999999999997 + ], + "27": [ + 5.875500000000001 + ], + "28": [ + 41.400000000000006 + ], + "29": [ + 12.814499999999999 + ], + "30": [ + 
3.683 + ], + "31": [ + 6.7405 + ], + "32": [ + 1.5375 + ], + "33": [ + 3.2175000000000002 + ], + "34": [ + 4.367 + ], + "35": [ + 6.285500000000001 + ], + "36": [ + 6.1295 + ], + "37": [ + 8.221 + ], + "38": [ + 13.931999999999999 + ], + "40": [ + 5.8815 + ], + "41": [ + 1.1164999999999998 + ], + "42": [ + 2.5195 + ], + "43": [ + 5.170999999999999 + ], + "44": [ + 25.598 + ], + "45": [ + 2.5675 + ], + "46": [ + 7.5 + ], + "47": [ + 8.979500000000002 + ], + "48": [ + 8.464500000000001 + ], + "49": [ + 8.951 + ], + "50": [ + 31.622500000000002 + ], + "51": [ + 10.7525 + ], + "52": [ + 1.5 + ], + "53": [ + 6.2330000000000005 + ], + "54": [ + 3.2664999999999997 + ], + "55": [ + 1.295 + ], + "56": [ + 3.4320000000000004 + ], + "57": [ + 5.213 + ], + "58": [ + 2.982 + ], + "59": [ + 17.828 + ], + "60": [ + 3.588 + ], + "61": [ + 4.023 + ], + "62": [ + 10.6945 + ], + "63": [ + 5.3575 + ], + "64": [ + 39.058 + ], + "65": [ + 13.0225 + ], + "66": [ + 5.6965 + ], + "67": [ + 58.116 + ], + "68": [ + 2.9850000000000003 + ], + "69": [ + 4.6395 + ], + "70": [ + 11.1165 + ], + "71": [ + 2.0905 + ], + "72": [ + 14.392 + ], + "73": [ + 1.939 + ], + "74": [ + 14.888 + ], + "75": [ + 20.826500000000003 + ], + "76": [ + 19.8905 + ], + "77": [ + 1.883 + ], + "78": [ + 33.607 + ], + "79": [ + 4.1065 + ], + "80": [ + 9.498999999999999 + ], + "81": [ + 5.681 + ], + "82": [ + 16.321 + ], + "83": [ + 2.0175 + ], + "84": [ + 3.9284999999999997 + ], + "85": [ + 6.912 + ], + "86": [ + 4.362 + ], + "87": [ + 13.2085 + ], + "88": [ + 57.921499999999995 + ], + "89": [ + 4.8945 + ], + "90": [ + 11.091000000000001 + ], + "91": [ + 1.6255000000000002 + ], + "92": [ + 1.565 + ], + "93": [ + 51.4765 + ], + "94": [ + 12.693 + ], + "95": [ + 30.4705 + ], + "96": [ + 12.237 + ], + "97": [ + 12.588000000000001 + ], + "98": [ + 2.2845 + ], + "99": [ + 11.4715 + ], + "14a": [ + 49.5975 + ], + "14b": [ + 47.707499999999996 + ], + "23a": [ + 85.717 + ], + "23b": [ + 119.332 + ], + "24a": [ + 47.7055 + ], + 
"24b": [ + 44.661 + ], + "39a": [ + 5.699 + ], + "39b": [ + 5.2835 + ] +} \ No newline at end of file diff --git a/docs/source/contributor-guide/benchmark-results/spark-3.5.8-tpch.json b/docs/source/contributor-guide/benchmark-results/spark-3.5.8-tpch.json index 2e169cc8d7..8fc3772d7d 100644 --- a/docs/source/contributor-guide/benchmark-results/spark-3.5.8-tpch.json +++ b/docs/source/contributor-guide/benchmark-results/spark-3.5.8-tpch.json @@ -13,26 +13,70 @@ "scale_factor": "1000", "spark_profile": "3.5_2.12" }, - "1": [101.212], - "2": [38.5525], - "3": [36.48], - "4": [31.700499999999998], - "5": [68.3175], - "6": [1.6925], - "7": [28.233], - "8": [63.251000000000005], - "9": [86.0805], - "10": [40.848], - "11": [33.468999999999994], - "12": [16.343], - "13": [27.7675], - "14": [8.062000000000001], - "15": [26.322499999999998], - "16": [27.36], - "17": [97.70949999999999], - "18": [154.4655], - "19": [14.817], - "20": [18.521], - "21": [134.6705], - "22": [17.3075] -} + "1": [ + 101.212 + ], + "2": [ + 38.5525 + ], + "3": [ + 36.48 + ], + "4": [ + 31.700499999999998 + ], + "5": [ + 68.3175 + ], + "6": [ + 1.6925 + ], + "7": [ + 28.233 + ], + "8": [ + 63.251000000000005 + ], + "9": [ + 86.0805 + ], + "10": [ + 40.848 + ], + "11": [ + 33.468999999999994 + ], + "12": [ + 16.343 + ], + "13": [ + 27.7675 + ], + "14": [ + 8.062000000000001 + ], + "15": [ + 26.322499999999998 + ], + "16": [ + 27.36 + ], + "17": [ + 97.70949999999999 + ], + "18": [ + 154.4655 + ], + "19": [ + 14.817 + ], + "20": [ + 18.521 + ], + "21": [ + 134.6705 + ], + "22": [ + 17.3075 + ] +} \ No newline at end of file diff --git a/docs/source/contributor-guide/debugging.md b/docs/source/contributor-guide/debugging.md index 6893e0f16e..e5372d922d 100644 --- a/docs/source/contributor-guide/debugging.md +++ b/docs/source/contributor-guide/debugging.md @@ -73,7 +73,6 @@ process handle -n true -p true -s false SIGBUS SIGSEGV SIGILL ### In CLion 1. 
After the breakpoint is hit in IntelliJ, in Clion (or LLDB from terminal or editor) - - 1. Attach to the jvm process (make sure the PID matches). In CLion, this is `Run -> Attach to process` 1. Put your breakpoint in the native code diff --git a/docs/source/contributor-guide/native_shuffle.md b/docs/source/contributor-guide/native_shuffle.md index 9cf30dca82..18e80a90c8 100644 --- a/docs/source/contributor-guide/native_shuffle.md +++ b/docs/source/contributor-guide/native_shuffle.md @@ -49,7 +49,6 @@ Native shuffle (`CometExchange`) is selected when all of the following condition columnar output. Row-based Spark operators require JVM shuffle. 3. **Supported partitioning type**: Native shuffle supports: - - `HashPartitioning` - `RangePartitioning` - `SinglePartition` @@ -125,14 +124,12 @@ Native shuffle (`CometExchange`) is selected when all of the following condition ### Write Path 1. **Plan construction**: `CometNativeShuffleWriter` builds a protobuf operator plan containing: - - A scan operator reading from the input iterator - A `ShuffleWriter` operator with partitioning config and compression codec 2. **Native execution**: `CometExec.getCometIterator()` executes the plan in Rust. 3. **Partitioning**: `ShuffleWriterExec` receives batches and routes to the appropriate partitioner: - - `MultiPartitionShuffleRepartitioner`: For hash/range/round-robin partitioning - `SinglePartitionShufflePartitioner`: For single partition (simpler path) @@ -140,13 +137,11 @@ Native shuffle (`CometExchange`) is selected when all of the following condition exceeds the threshold, partitions spill to temporary files. 5. **Encoding**: `ShuffleBlockWriter` encodes each partition's data as compressed Arrow IPC: - - Writes compression type header - Writes field count header - Writes compressed IPC stream 6. 
**Output files**: Two files are produced: - - **Data file**: Concatenated partition data - **Index file**: Array of 8-byte little-endian offsets marking partition boundaries @@ -158,7 +153,6 @@ Native shuffle (`CometExchange`) is selected when all of the following condition 1. `CometBlockStoreShuffleReader` fetches shuffle blocks via `ShuffleBlockFetcherIterator`. 2. For each block, `NativeBatchDecoderIterator`: - - Reads the 8-byte compressed length header - Reads the 8-byte field count header - Reads the compressed IPC data diff --git a/docs/source/user-guide/latest/installation.md b/docs/source/user-guide/latest/installation.md index 020acef30c..fce19e329e 100644 --- a/docs/source/user-guide/latest/installation.md +++ b/docs/source/user-guide/latest/installation.md @@ -28,11 +28,11 @@ Make sure the following requirements are met and software installed on your mach The published Comet jar files in Maven Central bundle native libraries for Linux only (amd64 and arm64). macOS users must [build from source](source.md). 
-| Operating System | Published Maven Jars | Build from Source | -| --------------------------- | -------------------- | ----------------- | -| Linux (amd64) | Yes | Yes | -| Linux (arm64) | Yes | Yes | -| Apple macOS (Apple Silicon) | No | Yes | +| Operating System | Published Maven Jars | Build from Source | +| ----------------------------- | -------------------- | ----------------- | +| Linux (amd64) | Yes | Yes | +| Linux (arm64) | Yes | Yes | +| Apple macOS (Apple Silicon) | No | Yes | ### Supported Spark Versions From ba01596cce5201637329dacbf54c24b4d309a332 Mon Sep 17 00:00:00 2001 From: Andy Grove Date: Wed, 6 May 2026 19:32:02 -0600 Subject: [PATCH 08/15] fix: apply prettier formatting to markdown docs --- docs/source/contributor-guide/debugging.md | 1 + docs/source/contributor-guide/native_shuffle.md | 6 ++++++ docs/source/user-guide/latest/installation.md | 10 +++++----- 3 files changed, 12 insertions(+), 5 deletions(-) diff --git a/docs/source/contributor-guide/debugging.md b/docs/source/contributor-guide/debugging.md index e5372d922d..6893e0f16e 100644 --- a/docs/source/contributor-guide/debugging.md +++ b/docs/source/contributor-guide/debugging.md @@ -73,6 +73,7 @@ process handle -n true -p true -s false SIGBUS SIGSEGV SIGILL ### In CLion 1. After the breakpoint is hit in IntelliJ, in Clion (or LLDB from terminal or editor) - + 1. Attach to the jvm process (make sure the PID matches). In CLion, this is `Run -> Attach to process` 1. Put your breakpoint in the native code diff --git a/docs/source/contributor-guide/native_shuffle.md b/docs/source/contributor-guide/native_shuffle.md index 18e80a90c8..9cf30dca82 100644 --- a/docs/source/contributor-guide/native_shuffle.md +++ b/docs/source/contributor-guide/native_shuffle.md @@ -49,6 +49,7 @@ Native shuffle (`CometExchange`) is selected when all of the following condition columnar output. Row-based Spark operators require JVM shuffle. 3. 
**Supported partitioning type**: Native shuffle supports: + - `HashPartitioning` - `RangePartitioning` - `SinglePartition` @@ -124,12 +125,14 @@ Native shuffle (`CometExchange`) is selected when all of the following condition ### Write Path 1. **Plan construction**: `CometNativeShuffleWriter` builds a protobuf operator plan containing: + - A scan operator reading from the input iterator - A `ShuffleWriter` operator with partitioning config and compression codec 2. **Native execution**: `CometExec.getCometIterator()` executes the plan in Rust. 3. **Partitioning**: `ShuffleWriterExec` receives batches and routes to the appropriate partitioner: + - `MultiPartitionShuffleRepartitioner`: For hash/range/round-robin partitioning - `SinglePartitionShufflePartitioner`: For single partition (simpler path) @@ -137,11 +140,13 @@ Native shuffle (`CometExchange`) is selected when all of the following condition exceeds the threshold, partitions spill to temporary files. 5. **Encoding**: `ShuffleBlockWriter` encodes each partition's data as compressed Arrow IPC: + - Writes compression type header - Writes field count header - Writes compressed IPC stream 6. **Output files**: Two files are produced: + - **Data file**: Concatenated partition data - **Index file**: Array of 8-byte little-endian offsets marking partition boundaries @@ -153,6 +158,7 @@ Native shuffle (`CometExchange`) is selected when all of the following condition 1. `CometBlockStoreShuffleReader` fetches shuffle blocks via `ShuffleBlockFetcherIterator`. 2. 
For each block, `NativeBatchDecoderIterator`: + - Reads the 8-byte compressed length header - Reads the 8-byte field count header - Reads the compressed IPC data diff --git a/docs/source/user-guide/latest/installation.md b/docs/source/user-guide/latest/installation.md index fce19e329e..020acef30c 100644 --- a/docs/source/user-guide/latest/installation.md +++ b/docs/source/user-guide/latest/installation.md @@ -28,11 +28,11 @@ Make sure the following requirements are met and software installed on your mach The published Comet jar files in Maven Central bundle native libraries for Linux only (amd64 and arm64). macOS users must [build from source](source.md). -| Operating System | Published Maven Jars | Build from Source | -| ----------------------------- | -------------------- | ----------------- | -| Linux (amd64) | Yes | Yes | -| Linux (arm64) | Yes | Yes | -| Apple macOS (Apple Silicon) | No | Yes | +| Operating System | Published Maven Jars | Build from Source | +| --------------------------- | -------------------- | ----------------- | +| Linux (amd64) | Yes | Yes | +| Linux (arm64) | Yes | Yes | +| Apple macOS (Apple Silicon) | No | Yes | ### Supported Spark Versions From 6aac6f919388ff3b09886cd011964abe40e392b9 Mon Sep 17 00:00:00 2001 From: Andy Grove Date: Thu, 7 May 2026 06:17:56 -0600 Subject: [PATCH 09/15] docs: note alternative Comet artifact builds in Iceberg example --- docs/source/user-guide/latest/iceberg.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/docs/source/user-guide/latest/iceberg.md b/docs/source/user-guide/latest/iceberg.md index e5c0c6473e..5c63ae9ad6 100644 --- a/docs/source/user-guide/latest/iceberg.md +++ b/docs/source/user-guide/latest/iceberg.md @@ -29,6 +29,10 @@ The example below uses Spark's package downloader to retrieve Comet $COMET_VERSI 1.8.1, but Comet has been tested with Iceberg 1.5, 1.7, 1.8, 1.9, and 1.10. The native Iceberg reader is enabled by default. 
To disable it, set `spark.comet.scan.icebergNative.enabled=false`. +The example uses the Spark 3.5 / Scala 2.12 build of Comet; substitute the Comet artifact +matching your Spark and Scala versions (Comet also ships Spark 3.5 / Scala 2.13 and Spark +4.0/4.1 / Scala 2.13 jars; see the [installation guide](installation.md) for the full list). + ```shell $SPARK_HOME/bin/spark-shell \ --packages org.apache.datafusion:comet-spark-spark3.5_2.12:$COMET_VERSION,org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.8.1,org.apache.iceberg:iceberg-core:1.8.1 \ From 5c9d830e0e18f0153f03337f4f465b91c7aeee92 Mon Sep 17 00:00:00 2001 From: Andy Grove Date: Thu, 7 May 2026 06:22:07 -0600 Subject: [PATCH 10/15] compat --- .../user-guide/latest/compatibility/scans.md | 16 ++++++++++++++++ .../latest/compatibility/spark-versions.md | 12 ++++++++++++ 2 files changed, 28 insertions(+) diff --git a/docs/source/user-guide/latest/compatibility/scans.md b/docs/source/user-guide/latest/compatibility/scans.md index aeb11cbd5a..d713dd4e8f 100644 --- a/docs/source/user-guide/latest/compatibility/scans.md +++ b/docs/source/user-guide/latest/compatibility/scans.md @@ -80,6 +80,22 @@ requires `spark.comet.exec.enabled=true` because the scan node must be wrapped b are detected at read time and raise a `SparkRuntimeException` with error class `_LEGACY_ERROR_TEMP_2093`, matching Spark's behavior. +The following `native_datafusion` limitations may produce incorrect results on Spark versions prior to 4.0 +without falling back to Spark: + +- Reading `TimestampLTZ` as `TimestampNTZ`. On Spark 3.x, Spark raises an error per + [SPARK-36182](https://issues.apache.org/jira/browse/SPARK-36182) because LTZ encodes UTC-adjusted instants + that cannot be safely reinterpreted as timezone-free values. Comet does not raise this error and instead + returns the raw UTC instant as a `TimestampNTZ` value. This applies to all LTZ physical encodings (INT96, + TIMESTAMP_MICROS, TIMESTAMP_MILLIS). 
On Spark 4.0+, this read is permitted + ([SPARK-47447](https://issues.apache.org/jira/browse/SPARK-47447)) and Comet matches Spark's behavior. + See [#4219](https://github.com/apache/datafusion-comet/issues/4219). + +- Unsupported Parquet type conversions. Spark raises schema incompatibility errors for certain conversions + (e.g., reading INT32 as BIGINT, reading BINARY as TIMESTAMP, unsupported decimal precision changes). + The `native_datafusion` scan may not detect these mismatches and could return unexpected values instead + of raising an error. See [#3720](https://github.com/apache/datafusion-comet/issues/3720). + ## `native_iceberg_compat` Limitations The `native_iceberg_compat` scan has the following additional limitation that may produce incorrect results diff --git a/docs/source/user-guide/latest/compatibility/spark-versions.md b/docs/source/user-guide/latest/compatibility/spark-versions.md index 5c8225ae5c..663be37a2a 100644 --- a/docs/source/user-guide/latest/compatibility/spark-versions.md +++ b/docs/source/user-guide/latest/compatibility/spark-versions.md @@ -28,10 +28,22 @@ compatibility guide. Spark 3.4.3 is supported with Java 11/17 and Scala 2.12/2.13. +### Known Limitations + +- **Reading `TimestampLTZ` as `TimestampNTZ`**: Spark 3.4 raises an error for this operation + (SPARK-36182), but Comet's `native_datafusion` scan silently returns the raw UTC value instead. + See [Parquet Compatibility](scans.md#native_datafusion-limitations) for details. + ## Spark 3.5 Spark 3.5.8 is supported with Java 11/17 and Scala 2.12/2.13. +### Known Limitations + +- **Reading `TimestampLTZ` as `TimestampNTZ`**: Spark 3.5 raises an error for this operation + (SPARK-36182), but Comet's `native_datafusion` scan silently returns the raw UTC value instead. + See [Parquet Compatibility](scans.md#native_datafusion-limitations) for details. + ## Spark 4.0 Spark 4.0.2 is supported with Java 17 and Scala 2.13. 
From 89b3baf29ba1e91defab52bbf00d959106a8bd61 Mon Sep 17 00:00:00 2001 From: Andy Grove Date: Thu, 7 May 2026 06:39:42 -0600 Subject: [PATCH 11/15] update installation guide --- docs/source/user-guide/latest/installation.md | 37 ++++++++++--------- 1 file changed, 19 insertions(+), 18 deletions(-) diff --git a/docs/source/user-guide/latest/installation.md b/docs/source/user-guide/latest/installation.md index 020acef30c..416545ba14 100644 --- a/docs/source/user-guide/latest/installation.md +++ b/docs/source/user-guide/latest/installation.md @@ -92,7 +92,6 @@ Here are the direct links for downloading the Comet $COMET_VERSION jar file. - [Comet plugin for Spark 3.5 / Scala 2.13](https://repo1.maven.org/maven2/org/apache/datafusion/comet-spark-spark3.5_2.13/$COMET_VERSION/comet-spark-spark3.5_2.13-$COMET_VERSION.jar) - [Comet plugin for Spark 4.0 / Scala 2.13](https://repo1.maven.org/maven2/org/apache/datafusion/comet-spark-spark4.0_2.13/$COMET_VERSION/comet-spark-spark4.0_2.13-$COMET_VERSION.jar) - [Comet plugin for Spark 4.1 / Scala 2.13](https://repo1.maven.org/maven2/org/apache/datafusion/comet-spark-spark4.1_2.13/$COMET_VERSION/comet-spark-spark4.1_2.13-$COMET_VERSION.jar) -- [Comet plugin for Spark 4.2 / Scala 2.13 (Experimental)](https://repo1.maven.org/maven2/org/apache/datafusion/comet-spark-spark4.2_2.13/$COMET_VERSION/comet-spark-spark4.2_2.13-$COMET_VERSION.jar) ## Building from source @@ -121,7 +120,7 @@ $SPARK_HOME/bin/spark-shell \ --conf spark.shuffle.manager=org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager \ --conf spark.comet.explainFallback.enabled=true \ --conf spark.memory.offHeap.enabled=true \ - --conf spark.memory.offHeap.size=16g + --conf spark.memory.offHeap.size=4g ``` ### Verify Comet enabled for Spark SQL query @@ -132,6 +131,16 @@ Create a test Parquet source scala> (0 until 10).toDF("a").write.mode("overwrite").parquet("/tmp/test") ``` +Comet will log output similar to: + +```shell +INFO core/src/lib.rs: Comet native 
library version $COMET_VERSION initialized +WARN CometExecRule: Comet cannot execute some parts of this plan natively (set spark.comet.explainFallback.enabled=false to disable this logging): + Execute InsertIntoHadoopFsRelationCommand [COMET: Native support for operator DataWritingCommandExec is disabled. Set spark.comet.parquet.write.enabled=true to enable it.] ++- WriteFiles + +- LocalTableScan [COMET: Native support for operator LocalTableScanExec is disabled. Set spark.comet.exec.localTableScan.enabled=true to enable it.] +``` + Query the data from the test source and check: - INFO message shows the native Comet library has been initialized. @@ -140,24 +149,16 @@ Query the data from the test source and check: ```scala scala> spark.read.parquet("/tmp/test").createOrReplaceTempView("t1") scala> spark.sql("select * from t1 where a > 5").explain -INFO src/lib.rs: Comet native library initialized -== Physical Plan == - *(1) ColumnarToRow - +- CometFilter [a#14], (isnotnull(a#14) AND (a#14 > 5)) - +- CometScan parquet [a#14] Batched: true, DataFilters: [isnotnull(a#14), (a#14 > 5)], - Format: CometParquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/test], PartitionFilters: [], - PushedFilters: [IsNotNull(a), GreaterThan(a,5)], ReadSchema: struct ``` -With the configuration `spark.comet.explainFallback.enabled=true`, Comet will log any reasons that prevent a plan from -being executed natively. 
+Comet will log output similar to: -```scala -scala> Seq(1,2,3,4).toDF("a").write.parquet("/tmp/test.parquet") -WARN CometSparkSessionExtensions$CometExecRule: Comet cannot execute some parts of this plan natively because: - - LocalTableScan is not supported - - WriteFiles is not supported - - Execute InsertIntoHadoopFsRelationCommand is not supported +```shell +INFO src/lib.rs: Comet native library initialized + == Physical Plan == + CometNativeColumnarToRow + +- CometFilter [a#6], (isnotnull(a#6) AND (a#6 > 5)) ++- CometNativeScan parquet [a#6] Batched: true, DataFilters: [isnotnull(a#6), (a#6 > 5)], Format: CometParquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/test], PartitionFilters: [], PushedFilters: [IsNotNull(a), GreaterThan(a,5)], ReadSchema: struct ``` ## Additional Configuration @@ -166,7 +167,7 @@ Depending on your deployment mode you may also need to set the driver & executor explicitly contain Comet otherwise Spark may use a different class-loader for the Comet components than its internal components which will then fail at runtime. 
For example: -``` +```shell --driver-class-path spark/target/comet-spark-spark4.1_2.13-$COMET_VERSION.jar ``` From c6c5e6493d7ed0685e89edcde7bba22bf958c854 Mon Sep 17 00:00:00 2001 From: Andy Grove Date: Thu, 7 May 2026 06:45:45 -0600 Subject: [PATCH 12/15] building from source pages - different content for snapshot vs release --- docs/source/user-guide/latest/installation.md | 9 ++++----- docs/source/user-guide/latest/source.md | 11 +++++++++++ 2 files changed, 15 insertions(+), 5 deletions(-) diff --git a/docs/source/user-guide/latest/installation.md b/docs/source/user-guide/latest/installation.md index 416545ba14..69f780c6c9 100644 --- a/docs/source/user-guide/latest/installation.md +++ b/docs/source/user-guide/latest/installation.md @@ -154,11 +154,10 @@ scala> spark.sql("select * from t1 where a > 5").explain Comet will log output similar to: ```shell -INFO src/lib.rs: Comet native library initialized - == Physical Plan == - CometNativeColumnarToRow - +- CometFilter [a#6], (isnotnull(a#6) AND (a#6 > 5)) -+- CometNativeScan parquet [a#6] Batched: true, DataFilters: [isnotnull(a#6), (a#6 > 5)], Format: CometParquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/test], PartitionFilters: [], PushedFilters: [IsNotNull(a), GreaterThan(a,5)], ReadSchema: struct +== Physical Plan == +CometNativeColumnarToRow ++- CometFilter [a#6], (isnotnull(a#6) AND (a#6 > 5)) + +- CometNativeScan parquet [a#6] Batched: true, DataFilters: [isnotnull(a#6), (a#6 > 5)], Format: CometParquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/test], PartitionFilters: [], PushedFilters: [IsNotNull(a), GreaterThan(a,5)], ReadSchema: struct ``` ## Additional Configuration diff --git a/docs/source/user-guide/latest/source.md b/docs/source/user-guide/latest/source.md index 6ae43be56a..841c4e2084 100644 --- a/docs/source/user-guide/latest/source.md +++ b/docs/source/user-guide/latest/source.md @@ -23,6 +23,15 @@ It is sometimes preferable to build from source for a specific platform. 
## Using a Published Source Release + + +This documentation is for the current development version of Comet. Published source releases are only available for released versions. +To use this version of Comet, see the following section on building from the GitHub repository. + + + + + Official source releases can be downloaded from https://dist.apache.org/repos/dist/release/datafusion/ ```console @@ -41,6 +50,8 @@ Build make release-nogit PROFILES="-Pspark-4.1" ``` + + ## Building from the GitHub repository Clone the repository: From a60340dac758ca1032d4e90c7becd0992bbb7183 Mon Sep 17 00:00:00 2001 From: Andy Grove Date: Thu, 7 May 2026 07:05:28 -0600 Subject: [PATCH 13/15] improve docs layout --- docs/source/asf/index.md | 5 +++++ docs/source/conf.py | 4 ++-- docs/source/contributor-guide/benchmarking.md | 10 ++++++++++ docs/source/contributor-guide/index.md | 13 +++++++++++++ docs/source/index.md | 15 ++++----------- docs/source/user-guide/index.md | 8 ++++++++ .../latest/compatibility/expressions/index.md | 1 + .../user-guide/latest/compatibility/index.md | 1 + docs/source/user-guide/latest/datasources.md | 10 +++++----- docs/source/user-guide/latest/index.rst | 10 ++++++++++ docs/source/user-guide/latest/installation.md | 2 +- .../scala/org/apache/comet/GenerateDocs.scala | 2 +- 12 files changed, 61 insertions(+), 20 deletions(-) diff --git a/docs/source/asf/index.md b/docs/source/asf/index.md index 3d7c9810db..e461f68d47 100644 --- a/docs/source/asf/index.md +++ b/docs/source/asf/index.md @@ -19,9 +19,14 @@ under the License. # ASF Links +Apache DataFusion Comet is part of the Apache Software Foundation. The links below point to ASF +resources covering licensing, donations, security reporting, and the Foundation's code of conduct. +Select a link from the navigation menu.
+ ```{toctree} :maxdepth: 1 :caption: ASF Links +:hidden: Apache Software Foundation License diff --git a/docs/source/conf.py b/docs/source/conf.py index faf6709735..746c8a5659 100644 --- a/docs/source/conf.py +++ b/docs/source/conf.py @@ -134,8 +134,8 @@ "**": ["docs-sidebar.html"], } -# tell myst_parser to auto-generate anchor links for headers h1, h2, h3 -myst_heading_anchors = 3 +# tell myst_parser to auto-generate anchor links for headers h1, h2, h3, h4 +myst_heading_anchors = 4 # enable nice rendering of checkboxes for the task lists myst_enable_extensions = ["colon_fence", "deflist", "tasklist"] diff --git a/docs/source/contributor-guide/benchmarking.md b/docs/source/contributor-guide/benchmarking.md index ecc4fcf277..4bfd5ae571 100644 --- a/docs/source/contributor-guide/benchmarking.md +++ b/docs/source/contributor-guide/benchmarking.md @@ -39,3 +39,13 @@ Available benchmarking guides: - [TPC-DS Benchmarking with spark-sql-perf](benchmarking_spark_sql_perf.md) We also have many micro benchmarks that can be run from an IDE located [here](https://github.com/apache/datafusion-comet/tree/main/spark/src/test/scala/org/apache/spark/sql/benchmark). + +```{toctree} +:hidden: + +benchmark-results/tpc-h +benchmark-results/tpc-ds +benchmarking_macos +benchmarking_aws_ec2 +benchmarking_spark_sql_perf +``` diff --git a/docs/source/contributor-guide/index.md b/docs/source/contributor-guide/index.md index 20e73c7428..77c73d68da 100644 --- a/docs/source/contributor-guide/index.md +++ b/docs/source/contributor-guide/index.md @@ -19,9 +19,21 @@ under the License. # Comet Contributor Guide +The Comet contributor guide is for developers working on Comet itself. It covers the project +architecture, the JVM and native code layout, the Arrow FFI bridge, JVM and native shuffle, and +how data and plans flow between Spark and the DataFusion execution engine. 
+ +It also documents day-to-day workflows including building and testing locally, debugging, +benchmarking, profiling, tracing, running the SQL test suites, adding new operators and +expressions, triaging bugs, and the Comet release process. + +New contributors should start with the Getting Started page. Select a topic from the navigation +menu to read more. + ```{toctree} :maxdepth: 2 :caption: Contributor Guide +:hidden: Getting Started Comet Plugin Overview @@ -30,6 +42,7 @@ JVM Shuffle Native Shuffle Development Guide Debugging Guide +ANSI Error Propagation Benchmarking Guide Adding a New Operator Adding a New Expression diff --git a/docs/source/index.md b/docs/source/index.md index 75c5db07bd..26a2d333db 100644 --- a/docs/source/index.md +++ b/docs/source/index.md @@ -40,21 +40,14 @@ Comet also accelerates Apache Iceberg, when performing Parquet scans from Spark. Comet delivers a performance speedup for many queries, enabling faster data processing and shorter time-to-insights. -The following chart shows the time it takes to run the 22 TPC-H queries against 100 GB of data in Parquet format -using a single executor with 8 cores. See the [Comet Benchmarking Guide](https://datafusion.apache.org/comet/contributor-guide/benchmarking.html) -for details of the environment used for these benchmarks. +The following charts demonstrate Comet accelerating TPC-H @ 1 TB. See the [Comet Benchmarking Guide](https://datafusion.apache.org/comet/contributor-guide/benchmarking.html) +for details. -When using Comet, the overall run time is reduced from 687 seconds to 302 seconds, a 2.2x speedup. - -![](_static/images/benchmark-results/0.11.0/tpch_allqueries.png) +![](_static/images/benchmark-results/0.15.0/tpch_allqueries.png) Here is a breakdown showing relative performance of Spark and Comet for each TPC-H query. 
-![](_static/images/benchmark-results/0.11.0/tpch_queries_compare.png) - -These benchmarks can be reproduced in any environment using the documentation in the -[Comet Benchmarking Guide](/contributor-guide/benchmarking.md). We encourage -you to run your own benchmarks. +![](_static/images/benchmark-results/0.15.0/tpch_queries_compare.png) ## Use Commodity Hardware diff --git a/docs/source/user-guide/index.md b/docs/source/user-guide/index.md index 8fe9f93286..65736c1d46 100644 --- a/docs/source/user-guide/index.md +++ b/docs/source/user-guide/index.md @@ -19,9 +19,17 @@ under the License. # Comet User Guide +The Comet user guide covers installation, configuration, supported data sources, supported operators +and expressions, and tuning advice for running Apache Spark with Comet acceleration. + +User guides are published for each release. The development snapshot tracks the upcoming release and +may include features and fixes that are not yet generally available. Select a version from the +navigation menu to view its guide. + ```{toctree} :maxdepth: 2 :caption: User Guides +:hidden: 0.16.0-SNAPSHOT 0.15.x <0.15/index> diff --git a/docs/source/user-guide/latest/compatibility/expressions/index.md b/docs/source/user-guide/latest/compatibility/expressions/index.md index 9fbec44f0b..ba0c4b5d50 100644 --- a/docs/source/user-guide/latest/compatibility/expressions/index.md +++ b/docs/source/user-guide/latest/compatibility/expressions/index.md @@ -27,6 +27,7 @@ Compatibility notes are grouped by expression category: ```{toctree} :maxdepth: 1 +:hidden: aggregate array diff --git a/docs/source/user-guide/latest/compatibility/index.md b/docs/source/user-guide/latest/compatibility/index.md index 7bda0570d1..1ba9d9e181 100644 --- a/docs/source/user-guide/latest/compatibility/index.md +++ b/docs/source/user-guide/latest/compatibility/index.md @@ -32,6 +32,7 @@ This guide documents areas where Comet's behavior is known to differ from Spark. 
```{toctree} :maxdepth: 1 +:hidden: scans floating-point diff --git a/docs/source/user-guide/latest/datasources.md b/docs/source/user-guide/latest/datasources.md index 46030e4051..065b719ba5 100644 --- a/docs/source/user-guide/latest/datasources.md +++ b/docs/source/user-guide/latest/datasources.md @@ -181,7 +181,7 @@ the `object_store` crate's format. This implementation maintains compatibility with existing Hadoop S3A configurations, so existing code will continue to work as long as the configurations are supported and can be translated without loss of functionality. -#### Root CA Certificates +### Root CA Certificates One major difference between Spark and Comet is the mechanism for discovering Root CA Certificates. Spark uses the JVM to read CA Certificates from the Java Trust Store, but native Comet @@ -189,7 +189,7 @@ scans use system Root CA Certificates (typically stored in `/etc/ssl/certs` on Linux). These scans will not be able to interact with S3 if the Root CA Certificates are not installed. -#### Supported Credential Providers +### Supported Credential Providers AWS credential providers can be configured using the `fs.s3a.aws.credentials.provider` configuration. The following table shows the supported credential providers and their configuration options: @@ -208,7 +208,7 @@ AWS credential providers can be configured using the `fs.s3a.aws.credentials.pro Multiple credential providers can be specified in a comma-separated list using the `fs.s3a.aws.credentials.provider` configuration, just as Hadoop AWS supports. If `fs.s3a.aws.credentials.provider` is not configured, Hadoop S3A's default credential provider chain will be used. All configuration options also support bucket-specific overrides using the pattern `fs.s3a.bucket.{bucket-name}.{option}`. 
-#### Additional S3 Configuration Options +### Additional S3 Configuration Options Beyond credential providers, the `native_datafusion` and `native_iceberg_compat` implementations support additional S3 configuration options: @@ -222,7 +222,7 @@ S3 configuration options: All configuration options support bucket-specific overrides using the pattern `fs.s3a.bucket.{bucket-name}.{option}`. -#### Examples +### Examples The following examples demonstrate how to configure S3 access with the `native_datafusion` and `native_iceberg_compat` Parquet scan implementations using different authentication methods. @@ -255,7 +255,7 @@ $SPARK_HOME/bin/spark-shell \ ... ``` -#### Limitations +### Limitations The S3 support of `native_datafusion` and `native_iceberg_compat` has the following limitations: diff --git a/docs/source/user-guide/latest/index.rst b/docs/source/user-guide/latest/index.rst index 480ec4f702..314a0a51bd 100644 --- a/docs/source/user-guide/latest/index.rst +++ b/docs/source/user-guide/latest/index.rst @@ -22,10 +22,20 @@ Comet $COMET_VERSION User Guide ================================ +This guide covers Comet $COMET_VERSION: how to install it, build it from source, configure it for +your Spark deployment, and get the best results from it. It also documents the data sources, data +types, operators, and expressions that Comet supports, along with a compatibility guide describing +known differences from Apache Spark. + +Operational topics include reading and understanding Comet query plans, tuning, available metrics, +and integration guides for Apache Iceberg and Kubernetes. Select a topic from the navigation menu +to read more. + .. _toc.user-guide-links-$COMET_VERSION: .. 
toctree:: :maxdepth: 1 :caption: Comet $COMET_VERSION User Guide + :hidden: Installing Comet Building From Source diff --git a/docs/source/user-guide/latest/installation.md b/docs/source/user-guide/latest/installation.md index 69f780c6c9..4b7717b688 100644 --- a/docs/source/user-guide/latest/installation.md +++ b/docs/source/user-guide/latest/installation.md @@ -40,7 +40,7 @@ Comet $COMET_VERSION supports the following versions of Apache Spark. Refer to t in the [Compatibility Guide] for more information, such as known limitations per Spark version. [Spark Version Compatibility]: compatibility/spark-versions.md -[Compatibility Guide]: compatibility +[Compatibility Guide]: compatibility/index.md We recommend only using Comet with Spark versions where we currently have both Comet and Spark tests enabled in CI. Other versions may work well enough for development and evaluation purposes. diff --git a/spark/src/main/scala/org/apache/comet/GenerateDocs.scala b/spark/src/main/scala/org/apache/comet/GenerateDocs.scala index e35f90e5c7..870fb5e47d 100644 --- a/spark/src/main/scala/org/apache/comet/GenerateDocs.scala +++ b/spark/src/main/scala/org/apache/comet/GenerateDocs.scala @@ -237,7 +237,7 @@ object GenerateDocs { compat.nonEmpty || incompat.nonEmpty || unsupported.nonEmpty } for ((name, compat, incompat, unsupported) <- sorted) { - w.write(s"\n### $name\n".getBytes) + w.write(s"\n## $name\n".getBytes) if (compat.nonEmpty) { w.write( ("\nThe following differences from Spark are always present and do not require" + From 6d0eee2d8b025924454dc8d8a9805283de071b34 Mon Sep 17 00:00:00 2001 From: Andy Grove Date: Thu, 7 May 2026 07:10:35 -0600 Subject: [PATCH 14/15] prettier --- docs/source/contributor-guide/debugging.md | 1 - docs/source/contributor-guide/native_shuffle.md | 6 ------ 2 files changed, 7 deletions(-) diff --git a/docs/source/contributor-guide/debugging.md b/docs/source/contributor-guide/debugging.md index 6893e0f16e..e5372d922d 100644 --- 
a/docs/source/contributor-guide/debugging.md +++ b/docs/source/contributor-guide/debugging.md @@ -73,7 +73,6 @@ process handle -n true -p true -s false SIGBUS SIGSEGV SIGILL ### In CLion 1. After the breakpoint is hit in IntelliJ, in Clion (or LLDB from terminal or editor) - - 1. Attach to the jvm process (make sure the PID matches). In CLion, this is `Run -> Attach to process` 1. Put your breakpoint in the native code diff --git a/docs/source/contributor-guide/native_shuffle.md b/docs/source/contributor-guide/native_shuffle.md index 9cf30dca82..18e80a90c8 100644 --- a/docs/source/contributor-guide/native_shuffle.md +++ b/docs/source/contributor-guide/native_shuffle.md @@ -49,7 +49,6 @@ Native shuffle (`CometExchange`) is selected when all of the following condition columnar output. Row-based Spark operators require JVM shuffle. 3. **Supported partitioning type**: Native shuffle supports: - - `HashPartitioning` - `RangePartitioning` - `SinglePartition` @@ -125,14 +124,12 @@ Native shuffle (`CometExchange`) is selected when all of the following condition ### Write Path 1. **Plan construction**: `CometNativeShuffleWriter` builds a protobuf operator plan containing: - - A scan operator reading from the input iterator - A `ShuffleWriter` operator with partitioning config and compression codec 2. **Native execution**: `CometExec.getCometIterator()` executes the plan in Rust. 3. **Partitioning**: `ShuffleWriterExec` receives batches and routes to the appropriate partitioner: - - `MultiPartitionShuffleRepartitioner`: For hash/range/round-robin partitioning - `SinglePartitionShufflePartitioner`: For single partition (simpler path) @@ -140,13 +137,11 @@ Native shuffle (`CometExchange`) is selected when all of the following condition exceeds the threshold, partitions spill to temporary files. 5. 
**Encoding**: `ShuffleBlockWriter` encodes each partition's data as compressed Arrow IPC: - - Writes compression type header - Writes field count header - Writes compressed IPC stream 6. **Output files**: Two files are produced: - - **Data file**: Concatenated partition data - **Index file**: Array of 8-byte little-endian offsets marking partition boundaries @@ -158,7 +153,6 @@ Native shuffle (`CometExchange`) is selected when all of the following condition 1. `CometBlockStoreShuffleReader` fetches shuffle blocks via `ShuffleBlockFetcherIterator`. 2. For each block, `NativeBatchDecoderIterator`: - - Reads the 8-byte compressed length header - Reads the 8-byte field count header - Reads the compressed IPC data From 9a457f032bb7b865d325a572590dfd303f04cbd1 Mon Sep 17 00:00:00 2001 From: Andy Grove Date: Thu, 7 May 2026 07:17:27 -0600 Subject: [PATCH 15/15] combine index.html and about/index.html --- docs/source/about/index.md | 73 -------------------------------------- docs/source/conf.py | 5 +-- docs/source/index.md | 18 ++++++++-- 3 files changed, 18 insertions(+), 78 deletions(-) delete mode 100644 docs/source/about/index.md diff --git a/docs/source/about/index.md b/docs/source/about/index.md deleted file mode 100644 index 43d25f8dbd..0000000000 --- a/docs/source/about/index.md +++ /dev/null @@ -1,73 +0,0 @@ - - -# Comet Overview - -Apache DataFusion Comet is a high-performance accelerator for Apache Spark, built on top of the powerful -[Apache DataFusion] query engine. Comet is designed to significantly enhance the -performance of Apache Spark workloads while leveraging commodity hardware and seamlessly integrating with the -Spark ecosystem without requiring any code changes. - -[Apache DataFusion]: https://datafusion.apache.org - -The following diagram provides an overview of Comet's architecture. - -![Comet Overview](/_static/images/comet-overview.png) - -## Architecture - -The following diagram shows how Comet integrates with Apache Spark. 
- -![Comet System Diagram](/_static/images/comet-system-diagram.png) - -## Feature Parity with Apache Spark - -The project strives to keep feature parity with Apache Spark, that is, -users should expect the same behavior (w.r.t features, configurations, -query results, etc) with Comet turned on or turned off in their Spark -jobs. In addition, Comet extension should automatically detect unsupported -features and fallback to Spark engine. - -## Comparison with other open-source Spark accelerators - -There are two other major open-source Spark accelerators: - -- [Apache Gluten (incubating)](https://github.com/apache/incubator-gluten) -- [NVIDIA Spark RAPIDS](https://github.com/NVIDIA/spark-rapids) - -We have a detailed guide [comparing Apache DataFusion Comet with Apache Gluten]. - -Spark RAPIDS is a solution that provides hardware acceleration on NVIDIA GPUs. Comet does not require specialized -hardware. - -[comparing Apache DataFusion Comet with Apache Gluten]: gluten_comparison.md - -## Getting Started - -Refer to the [Comet Installation Guide] to get started. 
- -[Comet Installation Guide]: /user-guide/latest/installation.md - -```{toctree} -:maxdepth: 1 -:caption: About -:hidden: - -Comparison with Gluten -``` diff --git a/docs/source/conf.py b/docs/source/conf.py index 746c8a5659..311e6fd754 100644 --- a/docs/source/conf.py +++ b/docs/source/conf.py @@ -141,9 +141,10 @@ myst_enable_extensions = ["colon_fence", "deflist", "tasklist"] redirects = { - "overview.html": "about/index.html", + "overview.html": "index.html", + "about/index.html": "../index.html", "gluten_comparison.html": "about/gluten_comparison.html", - "user-guide/overview.html": "../about/overview.html", + "user-guide/overview.html": "../index.html", "user-guide/gluten_comparison.html": "../about/gluten_comparison.html", "user-guide/compatibility.html": "latest/compatibility.html", "user-guide/configs.html": "latest/configs.html", diff --git a/docs/source/index.md b/docs/source/index.md index 26a2d333db..ba421a6036 100644 --- a/docs/source/index.md +++ b/docs/source/index.md @@ -61,12 +61,26 @@ Comet aims for 100% compatibility with all supported versions of Apache Spark, a your existing Spark deployments and workflows seamlessly. With no code changes required, you can immediately harness the benefits of Comet's acceleration capabilities without disrupting your Spark applications. +The project strives to keep feature parity with Apache Spark, that is, users should expect the same behavior (w.r.t +features, configurations, query results, etc) with Comet turned on or turned off in their Spark jobs. In addition, +the Comet extension automatically detects unsupported features and falls back to the Spark engine. + ## Tight Integration with Apache DataFusion Comet tightly integrates with the core Apache DataFusion project, leveraging its powerful execution engine. With seamless interoperability between Comet and DataFusion, you can achieve optimal performance and efficiency in your Spark workloads. 
+## Architecture + +The following diagram provides an overview of Comet's architecture. + +![Comet Overview](_static/images/comet-overview.png) + +The following diagram shows how Comet integrates with Apache Spark. + +![Comet System Diagram](_static/images/comet-system-diagram.png) + ## Active Community Comet boasts a vibrant and active community of developers, contributors, and users dedicated to advancing the @@ -79,8 +93,6 @@ To get started with Apache DataFusion Comet, follow the [DataFusion Slack and Discord channels](https://datafusion.apache.org/contributor-guide/communication.html) to connect with other users, ask questions, and share your experiences with Comet. -Follow [Apache DataFusion Comet Overview](https://datafusion.apache.org/comet/about/index.html) to get more detailed information - ## Contributing We welcome contributions from the community to help improve and enhance Apache DataFusion Comet. Whether it's fixing @@ -93,8 +105,8 @@ shaping the future of Comet. Check out our :caption: Index :hidden: -Comet Overview User Guide Contributor Guide +Comparison with Gluten ASF Links ```
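
As an aside on the native shuffle documentation touched in patch 01 above: the index-file format it describes (an array of 8-byte little-endian offsets marking partition boundaries in the data file) can be sketched as a small decoder. This is an illustrative sketch, not Comet's actual Rust implementation; in particular, whether the offset array begins with a leading zero entry is an assumption made here for the example.

```python
# Hypothetical sketch of decoding the shuffle index file described in the
# native shuffle docs: an array of 8-byte little-endian offsets marking
# partition boundaries in the data file. Assumes (not confirmed by the docs)
# that the array starts with a leading 0 offset, so N+1 offsets describe N
# partitions.
import struct

def partition_ranges(index_bytes: bytes) -> list[tuple[int, int]]:
    """Return (start, end) byte ranges, one per partition, from index bytes."""
    n = len(index_bytes) // 8
    # "<q" = little-endian signed 64-bit integer, matching the 8-byte
    # little-endian offsets the docs describe.
    offsets = [struct.unpack_from("<q", index_bytes, i * 8)[0] for i in range(n)]
    # Consecutive offsets delimit each partition's slice of the data file;
    # an empty partition appears as a zero-length range.
    return list(zip(offsets, offsets[1:]))

# Example: three partitions of sizes 100, 0, and 50 bytes.
index = struct.pack("<4q", 0, 100, 100, 150)
print(partition_ranges(index))  # [(0, 100), (100, 100), (100, 150)]
```

A reader of the data file would then seek to each `start` offset and read `end - start` bytes to obtain one partition's compressed Arrow IPC payload.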