Draft
80 commits
10dd91a
feat: add native Delta Lake scan via delta-kernel-rs
schenksj Apr 13, 2026
6980fa1
test: Delta native scan test suites
schenksj Apr 13, 2026
0a21380
bench: Delta benchmarks and TPC runner infrastructure
schenksj Apr 13, 2026
2cef606
ci/docs: Delta CI workflow and documentation
schenksj Apr 13, 2026
c0ada8d
feat: expand kernel predicate pushdown with IN and Cast support
schenksj Apr 13, 2026
2b8dedd
docs: add IN/NOT IN and Cast to supported predicates list
schenksj Apr 13, 2026
64e1f3e
fix: address CI linting and security feedback
schenksj Apr 13, 2026
c97b60e
fix: use Hadoop Path when parsing input file URIs in native Delta scan
schenksj Apr 14, 2026
3e4b6a0
test: add Delta Lake regression suite mirroring the Iceberg pattern
schenksj Apr 14, 2026
29361d6
fix: distinguish CometDeltaNativeScan instances across snapshot versions
schenksj Apr 14, 2026
ee5a375
fix: use correct Delta artifact ID for Spark 3.4 test dependency
schenksj Apr 14, 2026
bf38729
test(delta-regression): robustness fixes + DELTA_JAVA_HOME support
schenksj Apr 14, 2026
db4b6eb
test(delta-regression): make Delta 2.4.0 diff install Comet extensions
schenksj Apr 14, 2026
d8aa2bb
Merge remote-tracking branch 'upstream/main' into delta-kernel-phase-1
schenksj Apr 14, 2026
75e404d
fix: populate InputFileBlockHolder for Delta native scans so Delta ME…
schenksj Apr 14, 2026
0dbb807
test(delta-regression): make spark/test actually run on modern JDKs
schenksj Apr 14, 2026
297d857
feat(delta): close major native-scan coverage gaps + row tracking
schenksj Apr 15, 2026
340a594
chore(delta-regression): drop row-id-lookup targeted test
schenksj Apr 15, 2026
24427b5
fix(delta-regression): force UTC JVM timezone; use rootPaths for sche…
schenksj Apr 16, 2026
ff352e4
docs(delta): refresh support matrix; clarify cloud-fetch guard
schenksj Apr 16, 2026
9e88ac0
fix(delta): translate column-mapping names on the pre-materialised-in…
schenksj Apr 16, 2026
6b276b1
fix(delta): code review hardening + CI workflow registration
schenksj Apr 16, 2026
7844db1
fix(delta): nested column mapping, checkpoint errors, TZ partition va…
schenksj Apr 17, 2026
607e89b
fix(delta): MERGE + column mapping name mode unblocks field metadata
schenksj Apr 17, 2026
a4dfc3a
fix(delta): session-TZ timestamp partition values + z-order shuffle b…
schenksj Apr 17, 2026
a01949a
fix(delta): preserve DV filter wrapper when scan would fall back
schenksj Apr 17, 2026
08f1ed8
fix(delta): alias output_rows to numOutputRows for streaming progress
schenksj Apr 17, 2026
92f1da0
refactor(delta): code review hardening round 2
schenksj Apr 17, 2026
2803322
test(delta): add CometDeltaRoundTripSuite for round-2 fix coverage
schenksj Apr 17, 2026
bbc57dc
test(delta): add dev/run-delta-all-failures.sh
schenksj Apr 17, 2026
98aa8b2
fix(delta): broaden TIMESTAMP partition-value parsing + skip nested-a…
schenksj Apr 17, 2026
e65f6ed
fix(delta): preserve DV wrapper for PreparedDeltaFileIndex too
schenksj Apr 18, 2026
71313d2
fix(delta): extend DV fallback to any scan with is_row_deleted + Prep…
schenksj Apr 18, 2026
6804916
test(comet): plain-Parquet repro for Utf8<=Int32 bug establishes the …
schenksj Apr 18, 2026
f917f67
fix(delta): column-mapping scan hardening (fixes drop-column-with-con…
schenksj Apr 18, 2026
609b904
fix(delta): DV-aware UPDATE bookkeeping + name-mode partition keys + …
schenksj Apr 18, 2026
2508588
fix(delta-regression): unwrap CometScanExec in OptimizeGeneratedColum…
schenksj Apr 19, 2026
da19481
fix(delta-regression): stop overriding dvFileNamePrefix in Delta test…
schenksj May 1, 2026
bb35abf
feat(delta): file-splitting via byte-range tasks in CometDeltaNativeScan
schenksj May 1, 2026
b1fce4f
fix(delta): decline scan when Delta synthetic columns are in output
schenksj May 1, 2026
7d8c3b0
fix(aqe): preserve a logical link on Comet exchange wrappers
schenksj May 1, 2026
a0a5caa
fix(delta-regression): replace `spark%dir%prefix` temp-dir prefix in …
schenksj May 2, 2026
24fb5e4
fix(delta-regression): drop both path-prefix test shims
schenksj May 2, 2026
9f9335e
fix(delta): decline native scan when ignoreMissingFiles is enabled
schenksj May 2, 2026
8b6fdaf
feat(parquet): lazy ignoreMissingFiles in the native Delta scan
schenksj May 2, 2026
ff5c92f
chore(dev): add FAST=1 mode to run-delta-regression.sh
schenksj May 2, 2026
7d52de7
fix(aqe): preserve a logical link on every Comet exec, not just excha…
schenksj May 2, 2026
2b09698
fix(delta): single-URI-encoded table roots and PreparedDeltaFileIndex…
schenksj May 2, 2026
f8771bc
fix(delta): always send fully-encoded URI to native scan
schenksj May 2, 2026
2b0d747
chore: gitignore .claude/ scheduled-tasks lock
schenksj May 2, 2026
bf59576
fix(delta): gate physicalName synthesis on table column-mapping mode
schenksj May 2, 2026
ca26110
fix(delta): don't override user-set useMetadataRowIndex
schenksj May 2, 2026
d203466
fix(delta): decline native Delta scan when input_file_name is referenced
schenksj May 2, 2026
845c1e2
fix(delta): bin-pack scan tasks into Spark partitions
schenksj May 3, 2026
0d67db2
feat(delta): drop typeWidening fallback gate
schenksj May 3, 2026
1c12945
feat(delta): drop rowTracking fallback gate
schenksj May 3, 2026
9c1e64e
feat(delta): wire parquet encryption through native Delta scan
schenksj May 3, 2026
3f8f227
chore(delta): remove dead PreparedDeltaFileIndex DV check
schenksj May 3, 2026
197ee4a
fix(delta): use preparedScan.files for PreparedDeltaFileIndex
schenksj May 3, 2026
5c00233
fix(delta): parse TIMESTAMP_NTZ partition values without session TZ s…
schenksj May 3, 2026
1dbfc8b
fix(delta): expose numFiles metric alias + patch DeltaSuite scan-collect
schenksj May 3, 2026
0741f9d
fix(delta): patch merge-metrics shim to accept Comet's 2-file output
schenksj May 3, 2026
533893c
fix(delta): nuke spark-warehouse on regression rerun + skip auto-flip…
schenksj May 3, 2026
7b6e780
fix(delta): add synthesizedFilePartitions + patch DeltaSinkSuite/Iden…
schenksj May 3, 2026
db5fe82
docs(delta): add end-to-end design + internals reference under docs/d…
schenksj May 3, 2026
3b46792
docs(delta): correct row-tracking config keys + document table-root U…
schenksj May 3, 2026
810011f
fix(delta): decline native scan for CM-id mode + stale CM-name reads
schenksj May 3, 2026
be227f9
fix(delta): accept HTTP 3xx redirects in DeltaErrorsSuite URL validation
schenksj May 3, 2026
f993853
fix(delta): accept Comet's parquet error in SC-8810 corrupted-file test
schenksj May 3, 2026
b0ec773
test(delta): skip 2B-row DV delete test in Comet regression diff
schenksj May 3, 2026
19fc6af
perf(delta): cache no-arg Method handles in DeltaReflection lookups
schenksj May 13, 2026
5496bfb
test(delta): bump test heap to 4g and add JDK 17 --add-opens flags
schenksj May 13, 2026
60c24ce
docs(delta): capture #79 investigation notes in CometScanRule gate co…
schenksj May 13, 2026
779eeef
feat(delta): materialise DVs on pre-materialised FileIndex paths
schenksj May 13, 2026
8588afe
docs(delta): inline TODOs for #75 input_file_name native support desi…
schenksj May 13, 2026
a83ecb3
feat(delta): keep input_file_name() native via one-task-per-partition
schenksj May 13, 2026
f84c6ba
fix(delta): correct nested-CM read for column-mapped tables (#79)
schenksj May 14, 2026
461fa4f
feat(delta): rich Hadoop credential chain for kernel log replay
schenksj May 14, 2026
209894f
docs(contrib): finalize contrib-delta migration plan
schenksj May 14, 2026
efb36df
Merge upstream/main into delta-kernel-phase-1
schenksj May 14, 2026
39 changes: 39 additions & 0 deletions .github/actions/setup-delta-builder/action.yaml
@@ -0,0 +1,39 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

name: Setup Delta Builder
description: 'Setup Delta Lake to run Spark SQL regression tests with Comet'
inputs:
  delta-version:
    description: 'The Delta Lake version (e.g., 3.3.2) to build'
    required: true
runs:
  using: "composite"
  steps:
    - name: Clone Delta Lake repo
      uses: actions/checkout@v6
      with:
        repository: delta-io/delta
        path: delta-lake
        ref: v${{inputs.delta-version}}
        fetch-depth: 1

    - name: Setup Delta Lake for Comet
      shell: bash
      run: |
        cd delta-lake
        git apply ../dev/diffs/delta/${{inputs.delta-version}}.diff
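Reviewer note: the composite action above boils down to two steps, a shallow checkout of `delta-io/delta` at tag `v<delta-version>` and a `git apply` of the matching patch under `dev/diffs/delta/`. A minimal sketch of replaying that locally follows. It assumes it is run from the Comet repository root; the function names (`delta_diff_path`, `delta_ref`, `setup_delta`) are hypothetical helpers for illustration and are not part of this PR.

```shell
#!/usr/bin/env bash
# Sketch only: replay the two steps of the setup-delta-builder action locally.
# Assumption: the current directory is the Comet repository root.

# Path of the Comet patch for a Delta version (mirrors the action's `git apply` argument).
delta_diff_path() {
  printf 'dev/diffs/delta/%s.diff\n' "$1"
}

# Git tag the action checks out (`ref: v<delta-version>`).
delta_ref() {
  printf 'v%s\n' "$1"
}

# Hypothetical helper: shallow-clone Delta at the tag, then apply the patch.
setup_delta() {
  local version="$1"
  local diff
  diff="$(delta_diff_path "$version")"
  if [ ! -f "$diff" ]; then
    echo "missing patch: $diff" >&2
    return 1
  fi
  git clone --depth 1 --branch "$(delta_ref "$version")" \
    https://github.com/delta-io/delta.git delta-lake || return 1
  (cd delta-lake && git apply "../$diff")
}
```

Calling `setup_delta 3.3.2` would leave a patched checkout in `delta-lake/`, the directory the regression workflow later runs `build/sbt` from. The early existence check is a design choice of this sketch: it fails before any network traffic when no patch exists for the requested version.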
152 changes: 152 additions & 0 deletions .github/workflows/delta_regression_test.yml
@@ -0,0 +1,152 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

name: Delta Lake Regression Tests

concurrency:
  group: ${{ github.repository }}-${{ github.head_ref || github.sha }}-${{ github.workflow }}
  cancel-in-progress: true

on:
  push:
    branches:
      - main
    paths-ignore:
      - "benchmarks/**"
      - "doc/**"
      - "docs/**"
      - "**.md"
      - "native/core/benches/**"
      - "native/spark-expr/benches/**"
      - "spark/src/test/scala/org/apache/spark/sql/benchmark/**"
  pull_request:
    paths-ignore:
      - "benchmarks/**"
      - "doc/**"
      - "docs/**"
      - "**.md"
      - "native/core/benches/**"
      - "native/spark-expr/benches/**"
      - "spark/src/test/scala/org/apache/spark/sql/benchmark/**"
  # manual trigger
  workflow_dispatch:

permissions:
  contents: read

env:
  RUST_VERSION: stable
  RUST_BACKTRACE: 1

jobs:
  # Build native library once and share with all test jobs
  build-native:
    name: Build Native Library
    runs-on: ubuntu-24.04
    container:
      image: amd64/rust
    steps:
      - uses: actions/checkout@v6

      - name: Setup Rust & Java toolchain
        uses: ./.github/actions/setup-builder
        with:
          rust-version: ${{ env.RUST_VERSION }}
          jdk-version: 17

      - name: Restore Cargo cache
        uses: actions/cache/restore@v5
        with:
          path: |
            ~/.cargo/registry
            ~/.cargo/git
            native/target
          key: ${{ runner.os }}-cargo-ci-${{ hashFiles('native/**/Cargo.lock', 'native/**/Cargo.toml') }}-${{ hashFiles('native/**/*.rs') }}
          restore-keys: |
            ${{ runner.os }}-cargo-ci-${{ hashFiles('native/**/Cargo.lock', 'native/**/Cargo.toml') }}-

      - name: Build native library
        # Use CI profile for faster builds (no LTO) and to share cache with pr_build_linux.yml.
        run: |
          cd native && cargo build --profile ci
        env:
          RUSTFLAGS: "-Ctarget-cpu=x86-64-v3"

      - name: Save Cargo cache
        uses: actions/cache/save@v5
        if: github.ref == 'refs/heads/main'
        with:
          path: |
            ~/.cargo/registry
            ~/.cargo/git
            native/target
          key: ${{ runner.os }}-cargo-ci-${{ hashFiles('native/**/Cargo.lock', 'native/**/Cargo.toml') }}-${{ hashFiles('native/**/*.rs') }}

      - name: Upload native library
        uses: actions/upload-artifact@v7
        with:
          name: native-lib-delta-regression
          path: native/target/ci/libcomet.so
          retention-days: 1

  delta-spark:
    needs: build-native
    strategy:
      matrix:
        os: [ubuntu-24.04]
        java-version: [17]
        delta-version:
          - {full: '3.3.2', spark-short: '3.5', scala: '2.13', module: 'spark'}
          - {full: '4.0.0', spark-short: '4.0', scala: '2.13', module: 'spark'}
          - {full: '2.4.0', spark-short: '3.4', scala: '2.12', module: 'core'}
      fail-fast: false
    name: delta-regression/${{ matrix.os }}/delta-${{ matrix.delta-version.full }}/java-${{ matrix.java-version }}
    runs-on: ${{ matrix.os }}
    container:
      image: amd64/rust
    env:
      SPARK_LOCAL_IP: localhost
    steps:
      - uses: actions/checkout@v6
      - name: Setup Rust & Java toolchain
        uses: ./.github/actions/setup-builder
        with:
          rust-version: ${{ env.RUST_VERSION }}
          jdk-version: ${{ matrix.java-version }}
      - name: Download native library
        uses: actions/download-artifact@v8
        with:
          name: native-lib-delta-regression
          path: native/target/release/
      - name: Build Comet
        run: |
          ./mvnw install -Prelease -DskipTests -Pspark-${{ matrix.delta-version.spark-short }}
      - name: Setup Delta Lake
        uses: ./.github/actions/setup-delta-builder
        with:
          delta-version: ${{ matrix.delta-version.full }}
      - name: Run Comet smoke test (fail fast)
        # Verify Comet is actually wired into Delta's test SparkSession before
        # running the full suite. Catches silent config drift where the plugin
        # is on the classpath but not applied to query plans.
        run: |
          cd delta-lake
          build/sbt "${{ matrix.delta-version.module }}/testOnly org.apache.spark.sql.delta.CometSmokeTest"
      - name: Run Delta Lake Spark tests
        run: |
          cd delta-lake
          build/sbt "${{ matrix.delta-version.module }}/test"
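Reviewer note: each `delta-version` matrix entry ties a Delta release to a Spark profile, a Scala version and the sbt module to test, and the job runs the smoke test before the full suite. The sketch below prints (it does not execute) the command sequence for one entry so the job can be reproduced by hand. The mapping is copied from the matrix above; the function names are hypothetical, and the sketch assumes `libcomet` is already in `native/target/release/` and that `delta-lake/` has been patched.

```shell
#!/usr/bin/env bash
# Sketch only: dry-run of the delta-spark job for one matrix entry.

# Echo "<spark-short> <scala> <sbt-module>" for a Delta version in the matrix.
delta_matrix_entry() {
  case "$1" in
    3.3.2) echo "3.5 2.13 spark" ;;
    4.0.0) echo "4.0 2.13 spark" ;;
    2.4.0) echo "3.4 2.12 core" ;;
    *) echo "unsupported Delta version: $1" >&2; return 1 ;;
  esac
}

# Print the build, smoke-test and full-test commands in the order the job runs them.
delta_regression_plan() {
  local version="$1" entry
  entry="$(delta_matrix_entry "$version")" || return 1
  # Word-split the entry into its three fields.
  set -- $entry
  local spark_short="$1" scala="$2" module="$3"
  echo "# Delta ${version}: Spark ${spark_short}, Scala ${scala}, sbt module ${module}"
  echo "./mvnw install -Prelease -DskipTests -Pspark-${spark_short}"
  echo "(cd delta-lake && build/sbt \"${module}/testOnly org.apache.spark.sql.delta.CometSmokeTest\")"
  echo "(cd delta-lake && build/sbt \"${module}/test\")"
}
```

Running `delta_regression_plan 2.4.0` shows that the Delta 2.4.0 entry targets the `core` module on the Spark 3.4 profile, while the two newer entries use `spark`. Keeping the smoke test as a separate first command mirrors the workflow's intent: a misconfigured session fails in one short test rather than partway through the full suite.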
136 changes: 136 additions & 0 deletions .github/workflows/delta_spark_test.yml
@@ -0,0 +1,136 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

name: Delta Lake Native Scan Tests

concurrency:
  group: ${{ github.repository }}-${{ github.head_ref || github.sha }}-${{ github.workflow }}
  cancel-in-progress: true

permissions:
  contents: read

on:
  push:
    branches:
      - main
    paths-ignore:
      - "benchmarks/**"
      - "doc/**"
      - "docs/**"
      - "**.md"
      - "native/core/benches/**"
      - "native/spark-expr/benches/**"
      - "spark/src/test/scala/org/apache/spark/sql/benchmark/**"
  pull_request:
    paths-ignore:
      - "benchmarks/**"
      - "doc/**"
      - "docs/**"
      - "**.md"
      - "native/core/benches/**"
      - "native/spark-expr/benches/**"
      - "spark/src/test/scala/org/apache/spark/sql/benchmark/**"
  workflow_dispatch:

env:
  RUST_VERSION: stable
  RUST_BACKTRACE: 1

jobs:
  build-native:
    name: Build Native Library
    runs-on: ubuntu-24.04
    container:
      image: amd64/rust
    steps:
      - uses: actions/checkout@v6

      - name: Setup Rust & Java toolchain
        uses: ./.github/actions/setup-builder
        with:
          rust-version: ${{ env.RUST_VERSION }}
          jdk-version: 17

      - name: Restore Cargo cache
        uses: actions/cache/restore@v5
        with:
          path: |
            ~/.cargo/registry
            ~/.cargo/git
            native/target
          key: ${{ runner.os }}-cargo-ci-${{ hashFiles('native/**/Cargo.lock', 'native/**/Cargo.toml') }}-${{ hashFiles('native/**/*.rs') }}
          restore-keys: |
            ${{ runner.os }}-cargo-ci-${{ hashFiles('native/**/Cargo.lock', 'native/**/Cargo.toml') }}-

      - name: Build native library
        run: |
          cd native && cargo build --profile ci
        env:
          RUSTFLAGS: "-Ctarget-cpu=x86-64-v3"

      - name: Save Cargo cache
        uses: actions/cache/save@v5
        if: github.ref == 'refs/heads/main'
        with:
          path: |
            ~/.cargo/registry
            ~/.cargo/git
            native/target
          key: ${{ runner.os }}-cargo-ci-${{ hashFiles('native/**/Cargo.lock', 'native/**/Cargo.toml') }}-${{ hashFiles('native/**/*.rs') }}

      - name: Upload native library
        uses: actions/upload-artifact@v7
        with:
          name: native-lib-delta
          path: native/target/ci/libcomet.so
          retention-days: 1

  delta-native-suite:
    needs: build-native
    strategy:
      matrix:
        os: [ubuntu-24.04]
        java-version: [17]
        spark-version:
          - {short: '3.4', full: '3.4.3'}
          - {short: '3.5', full: '3.5.8'}
          - {short: '4.0', full: '4.0.1'}
      fail-fast: false
    name: delta-native/${{ matrix.os }}/spark-${{ matrix.spark-version.full }}/java-${{ matrix.java-version }}
    runs-on: ${{ matrix.os }}
    container:
      image: amd64/rust
    env:
      SPARK_LOCAL_IP: localhost
    steps:
      - uses: actions/checkout@v6
      - name: Setup Rust & Java toolchain
        uses: ./.github/actions/setup-builder
        with:
          rust-version: ${{ env.RUST_VERSION }}
          jdk-version: ${{ matrix.java-version }}
      - name: Download native library
        uses: actions/download-artifact@v8
        with:
          name: native-lib-delta
          path: native/target/debug/
      - name: Run CometDeltaNativeSuite
        run: |
          ./mvnw -Pspark-${{ matrix.spark-version.short }} -pl spark -am test \
            -Dsuites=org.apache.comet.CometDeltaNativeSuite \
            -Dmaven.gitcommitid.skip
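Reviewer note: both new workflows build their concurrency group from `github.head_ref || github.sha`. `github.head_ref` is only populated for pull-request events, so a PR branch keeps one group per workflow (and `cancel-in-progress: true` cancels the older run on each new push), while pushes to `main` fall back to the commit SHA and each get their own group. A minimal sketch of that resolution follows; `concurrency_group` is a hypothetical function, and `owner/repo` in the usage note is a placeholder rather than a value taken from this PR.

```shell
#!/usr/bin/env bash
# Sketch only: emulate how the concurrency group string resolves.
# `${head_ref:-$sha}` stands in for `${{ github.head_ref || github.sha }}`:
# use head_ref when it is non-empty, otherwise fall back to the commit SHA.
concurrency_group() {
  local repository="$1" head_ref="$2" sha="$3" workflow="$4"
  printf '%s-%s-%s\n' "$repository" "${head_ref:-$sha}" "$workflow"
}
```

For example, `concurrency_group owner/repo delta-kernel-phase-1 abc123 'Delta Lake Native Scan Tests'` yields a branch-keyed group, whereas passing an empty second argument yields a SHA-keyed one.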
6 changes: 6 additions & 0 deletions .github/workflows/pr_build_linux.yml
@@ -336,6 +336,12 @@ jobs:
org.apache.comet.CometIcebergNativeSuite
org.apache.comet.CometIcebergRewriteActionSuite
org.apache.comet.iceberg.IcebergReflectionSuite
org.apache.comet.CometDeltaNativeSuite
org.apache.comet.CometDeltaColumnMappingSuite
org.apache.comet.CometDeltaAdvancedSuite
org.apache.comet.CometDeltaRowTrackingSuite
org.apache.comet.CometDeltaRoundTripSuite
org.apache.comet.CometFuzzDeltaSuite
- name: "csv"
value: |
org.apache.comet.csv.CometCsvNativeReadSuite
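Reviewer note: this hunk adds six Delta suites to an existing newline-separated suite list. A rough sketch of running just those six locally is below. It reuses the `./mvnw` flags from the native-scan workflow in this PR and assumes the scalatest Maven plugin accepts a comma-separated `suites` value; that assumption, and the function names, are mine and not confirmed by the diff.

```shell
#!/usr/bin/env bash
# Sketch only: build one -Dsuites argument from the six Delta suites added in this hunk.

# The suite names, one per line, as listed in the hunk.
delta_suites() {
  cat <<'EOF'
org.apache.comet.CometDeltaNativeSuite
org.apache.comet.CometDeltaColumnMappingSuite
org.apache.comet.CometDeltaAdvancedSuite
org.apache.comet.CometDeltaRowTrackingSuite
org.apache.comet.CometDeltaRoundTripSuite
org.apache.comet.CometFuzzDeltaSuite
EOF
}

# Print the Maven command for a Spark profile such as 3.5.
# Assumption: a comma-separated list is a valid value for -Dsuites.
delta_suites_command() {
  local spark_short="$1"
  printf './mvnw -Pspark-%s -pl spark -am test -Dsuites=%s -Dmaven.gitcommitid.skip\n' \
    "$spark_short" "$(delta_suites | paste -sd, -)"
}
```

`delta_suites_command 3.5` prints a single command line that can be pasted into a shell at the repository root once the native library has been built.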
6 changes: 6 additions & 0 deletions .github/workflows/pr_build_macos.yml
@@ -184,6 +184,12 @@ jobs:
org.apache.comet.CometIcebergNativeSuite
org.apache.comet.CometIcebergRewriteActionSuite
org.apache.comet.iceberg.IcebergReflectionSuite
org.apache.comet.CometDeltaNativeSuite
org.apache.comet.CometDeltaColumnMappingSuite
org.apache.comet.CometDeltaAdvancedSuite
org.apache.comet.CometDeltaRowTrackingSuite
org.apache.comet.CometDeltaRoundTripSuite
org.apache.comet.CometFuzzDeltaSuite
- name: "csv"
value: |
org.apache.comet.csv.CometCsvNativeReadSuite
1 change: 1 addition & 0 deletions .gitignore
@@ -24,6 +24,7 @@ spark/benchmarks
comet-event-trace.json
__pycache__
output
.claude/
docs/comet-*/
docs/build/
docs/temp/