Merged
100 commits
ca2ffaf
add support clickhouse destination wip
bnjjj Feb 23, 2026
01da517
add support for clickhouse destination in ETL and API
bnjjj Feb 24, 2026
feb8d65
Add example binary
bnjjj Feb 26, 2026
d6bdb14
fixes
bnjjj Mar 2, 2026
df9f0a9
Add local ClickHouse test helper
jmqd Apr 5, 2026
03b6975
Fix ClickHouse test row id type
jmqd Apr 5, 2026
f4543bd
Avoid async cleanup panic in ClickHouse tests
jmqd Apr 5, 2026
f2430af
Fix ClickHouse UUID encoding
jmqd Apr 5, 2026
41c84fc
Add ClickHouse update streaming integration test
jmqd Apr 7, 2026
bc9a5f3
Format ClickHouse destination files
jmqd Apr 7, 2026
0be4e5f
Adapt ClickHouse destination to async flushing API
jmqd Apr 7, 2026
99dd054
Remove unused log_config helpers from replicator
jmqd Apr 7, 2026
2831b72
Sort Cargo.toml files
jmqd Apr 7, 2026
04ead7c
Remove skip_if_missing_clickhouse_env_vars from tests
jmqd Apr 8, 2026
0a92f3d
Add boundary-values integration test for ClickHouse
jmqd Apr 9, 2026
6ba07bb
Add ClickHouse env vars to CI
jmqd Apr 9, 2026
363aa29
Fix missing ClickHouse secret deletion on pipeline teardown
jmqd Apr 10, 2026
bc376b0
Error on NULL value in non-nullable ClickHouse column
jmqd Apr 10, 2026
935112a
Use ApplyWorkerPanic error kind for ClickHouse JoinSet failures
jmqd Apr 10, 2026
0201934
Remove unused PartialEq/Eq impls from AllTypesRow
jmqd Apr 13, 2026
6e8f6d2
Improve wait_for_update_flow_rows timeout diagnostic
jmqd Apr 13, 2026
b16f6f0
Add DELETE streaming integration test for ClickHouse
jmqd Apr 14, 2026
40742da
Add GIVEN/WHEN/THEN structure to ClickHouse integration tests
jmqd Apr 14, 2026
19d074b
Add pipeline restart/recovery integration test for ClickHouse
jmqd Apr 14, 2026
adb28cd
Add truncate integration test for ClickHouse
jmqd Apr 14, 2026
a3b0043
Add intermediate INSERT flush integration test for ClickHouse
jmqd Apr 14, 2026
7ebab69
Add unit test for NULL rejection in non-nullable ClickHouse column
jmqd Apr 15, 2026
2d42b86
Add multi-table integration test for ClickHouse
jmqd Apr 15, 2026
e5fa4b2
Add sequential transaction ordering test for ClickHouse
jmqd Apr 15, 2026
ef46b25
Add default replica identity DELETE test for ClickHouse
jmqd Apr 15, 2026
6ceefab
Use two pre-existing rows in default replica identity DELETE test
jmqd Apr 15, 2026
589de1b
Add large batch (1024 rows) table copy test for ClickHouse
jmqd Apr 16, 2026
b26235b
Add ping connectivity tests for ClickHouseClient
jmqd Apr 16, 2026
a6d09a9
Refine default replica identity DELETE test style and LSN check
jmqd Apr 17, 2026
d57f3c8
Use Url for ClickHouse
jmqd Apr 20, 2026
c1ef49d
Keep ClickHouse password secret
jmqd Apr 20, 2026
af260db
Merge branch 'main' into jm/clickhouse
jmqd Apr 20, 2026
7ba200a
Update ColumnSchema in ClickHouse schema tests for new field layout
jmqd Apr 20, 2026
f0ef83d
Adapt ClickHouse destination to new Destination trait and event APIs
jmqd Apr 20, 2026
b3e910f
Add schema change (ALTER TABLE) support for ClickHouse destination
jmqd Apr 20, 2026
b4981d7
Merge remote-tracking branch 'origin/main' into jm/clickhouse
jmqd Apr 21, 2026
d2df3fb
Merge remote-tracking branch 'origin/main' into jm/clickhouse
jmqd Apr 21, 2026
f7f834d
Add schema change integration tests and fix ALTER TABLE bugs
jmqd Apr 21, 2026
4ab4ab4
Remove unused MemorySnapshot::total() method
jmqd Apr 21, 2026
2ffd34a
Allow match_same_arms in type mapping and add retry to test setup
jmqd Apr 22, 2026
b42f16a
Merge remote-tracking branch 'origin/main' into jm/clickhouse
jmqd Apr 22, 2026
5706d1a
Remove match_same_arms allow attributes
jmqd Apr 22, 2026
ff33dc2
Add schema change crash recovery and fix test flakiness
jmqd Apr 22, 2026
a2939c3
Merge branch 'main' into jm/clickhouse
jmqd Apr 22, 2026
a5da67c
Merge remote-tracking branch 'origin/main' into jm/clickhouse
jmqd Apr 23, 2026
892e480
Expand key-only DELETE rows into full tombstones for ClickHouse
jmqd Apr 24, 2026
90b8016
Use DateTime::UNIX_EPOCH.naive_utc() for Timestamp default cell
jmqd Apr 24, 2026
a86e453
Use RENAME COLUMN IF EXISTS for idempotent column rename
jmqd Apr 24, 2026
cd5d5fa
Assert date/time zero values in default-identity DELETE tombstone test
jmqd Apr 24, 2026
94dd5d6
Merge branch 'main' into jm/clickhouse
jmqd Apr 24, 2026
cf051f4
Merge branch 'main' into jm/clickhouse
jmqd Apr 27, 2026
7eb9bbd
Fix passwordless ClickHouse K8s deployment
jmqd Apr 27, 2026
690005d
Quote ClickHouse SQL identifiers
jmqd Apr 27, 2026
c76b1c6
Validate ClickHouse RowBinary row width
jmqd Apr 27, 2026
2ca3aaf
Store ClickHouse CDC LSN as UInt64
jmqd Apr 27, 2026
a60dd1c
Derive ClickHouse nullable flags from destination schema
jmqd Apr 27, 2026
38316da
Fix ClickHouse test script target
jmqd Apr 27, 2026
a6c738d
Merge branch 'main' into jm/clickhouse
jmqd Apr 27, 2026
6f1556b
Drop ClickHouse section banners and tighten doccomments
jmqd Apr 28, 2026
b0cd129
Rename ClickHouseClient::ping to validate_connectivity
jmqd Apr 28, 2026
02696d3
Have ClickHouse insert_rows call insert.end() once per chunk
jmqd Apr 28, 2026
28d0bc3
Simplify ClickHouse table-exists, event-flush, and array encoding paths
jmqd Apr 28, 2026
3fca742
Polish ClickHouse doccomments and tidy encoding helpers
jmqd Apr 28, 2026
07ee031
Merge branch 'main' into jm/clickhouse
jmqd Apr 29, 2026
fce8975
Polish ClickHouse destination naming and metadata branch
jmqd Apr 29, 2026
f4b20cf
Use shared try_stringify_table_name in ClickHouse destination
jmqd Apr 29, 2026
4d1de92
Record DDL duration histogram for all ClickHouse DDL paths
jmqd Apr 29, 2026
4c9561b
Add DDL kind label to etl_ch_ddl_duration_seconds histogram
jmqd Apr 29, 2026
1413ba6
Merge branch 'main' into jm/clickhouse
jmqd Apr 29, 2026
72e32a2
Spell out 'clickhouse' in identifiers, env vars, and config keys
jmqd Apr 30, 2026
cff36a7
Cap ClickHouse INSERT size at a fixed 64 MiB
jmqd Apr 30, 2026
f50d064
Revert MemorySnapshot to private after dropping external sysinfo probes
jmqd Apr 30, 2026
6154433
Tighten visibility of ClickHouse metrics and schema helpers
jmqd Apr 30, 2026
cd0cb3e
Error on partial update rows in ClickHouse instead of skipping
jmqd Apr 30, 2026
5ff2ba6
Reject unsupported replica identities for ClickHouse
jmqd Apr 30, 2026
4fa09f9
Merge remote-tracking branch 'origin/main' into jm/clickhouse
jmqd Apr 30, 2026
c66924c
Update etl-api/src/configs/destination.rs
jmqd May 1, 2026
e9168b9
Use map().transpose() for ClickHouse password decryption
jmqd May 1, 2026
d2a6821
Drop redundant set_log_level() from ClickHouse example
jmqd May 1, 2026
51cd46e
Drop Arc around ClickHouseInserterConfig
jmqd May 1, 2026
d21f04b
Use FIRST placement when ClickHouse ADD COLUMN has no anchor
jmqd May 1, 2026
4f79d29
Merge branch 'main' into jm/clickhouse
jmqd May 1, 2026
81fbd10
fix(replicator): handle ClickHouse in destination_name
jmqd May 1, 2026
b9fe4af
Construct ClickHouse password Secret directly without json round-trip
jmqd May 1, 2026
55ca2b5
Lower 'delete event has no row data' log to debug
jmqd May 1, 2026
79cd26e
Pass column iterator to build_create_table_sql to drop intermediate Vec
jmqd May 1, 2026
cd486b8
test(ci): include clickhouse_pipeline in shared-pg test group
jmqd May 4, 2026
db30d03
fix(clickhouse): map Postgres date to Date32 and error on out-of-range
jmqd May 4, 2026
6e54249
test(clickhouse): cover pre-1970 and far-future dates in pipeline copy
jmqd May 4, 2026
1fe9f2b
test(clickhouse): rename Date32 boundary constants and inline -1
jmqd May 4, 2026
4323ff6
Drop high-cardinality table label from ClickHouse DDL/INSERT histograms
jmqd May 4, 2026
4e3ebe4
Update ClickHouse metric descriptions to match actual labels
jmqd May 4, 2026
999247e
chore(replicator): drop sample configuration yaml files
jmqd May 4, 2026
a314b47
test(clickhouse): switch DefaultIdentityDeleteRow.date_col to i32
jmqd May 4, 2026
b1c5f5c
Merge branch 'main' into jm/clickhouse
jmqd May 4, 2026
2 changes: 1 addition & 1 deletion .config/nextest.toml
@@ -10,5 +10,5 @@ test-threads = "num-cpus"
max-threads = 1

[[profile.default.overrides]]
-filter = "test(exclusive_) | binary_id(etl::main) | (binary_id(etl-destinations::main) & test(/^(bigquery_pipeline|ducklake_destination|ducklake_pipeline|iceberg_destination)::/)) | (binary_id(etl-destinations) & test(/ducklake::core::tests::postgres_backed::/))"
+filter = "test(exclusive_) | binary_id(etl::main) | (binary_id(etl-destinations::main) & test(/^(bigquery_pipeline|clickhouse_pipeline|ducklake_destination|ducklake_pipeline|iceberg_destination)::/)) | (binary_id(etl-destinations) & test(/ducklake::core::tests::postgres_backed::/))"
test-group = "shared-pg"
3 changes: 3 additions & 0 deletions .github/workflows/ci.yml
@@ -177,6 +177,9 @@ jobs:
NUM_LOCAL_DATABASES: 4
TESTS_DATABASE_USERNAME: postgres
TESTS_DATABASE_PASSWORD: postgres
+TESTS_CLICKHOUSE_URL: http://localhost:8123
+TESTS_CLICKHOUSE_USER: etl
+TESTS_CLICKHOUSE_PASSWORD: etl
ETL_DUCKDB_EXTENSION_ROOT: ${{ github.workspace }}/vendor/duckdb/extensions
steps:
- name: Checkout
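Reviewer note: the CI job above exports three ClickHouse variables, and the DEVELOPMENT.md changes in this PR mark `TESTS_CLICKHOUSE_URL` and `TESTS_CLICKHOUSE_USER` as required and `TESTS_CLICKHOUSE_PASSWORD` as optional. The PR's own test helper is not shown in this part of the diff, so the following is only a minimal sketch of how a test could consume those variables under that contract. The names `ClickHouseTestEnv`, `MissingVar`, `from_lookup`, and `from_env` are hypothetical and are not taken from the repository.

```rust
use std::fmt;

/// Hypothetical holder for the three variables the CI job exports.
/// This type is an illustration, not part of the PR.
#[derive(Debug, PartialEq, Eq)]
pub struct ClickHouseTestEnv {
    pub url: String,
    pub user: String,
    /// Optional: DEVELOPMENT.md lists the password as not required.
    pub password: Option<String>,
}

/// Error naming the required variable that was not set.
#[derive(Debug, PartialEq, Eq)]
pub struct MissingVar(pub &'static str);

impl fmt::Display for MissingVar {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "required environment variable {} is not set", self.0)
    }
}

impl std::error::Error for MissingVar {}

impl ClickHouseTestEnv {
    /// Builds the config from any key/value lookup, so the logic can be
    /// exercised without mutating the process environment.
    pub fn from_lookup<F>(lookup: F) -> Result<Self, MissingVar>
    where
        F: Fn(&str) -> Option<String>,
    {
        let required = |name: &'static str| lookup(name).ok_or(MissingVar(name));
        Ok(Self {
            url: required("TESTS_CLICKHOUSE_URL")?,
            user: required("TESTS_CLICKHOUSE_USER")?,
            password: lookup("TESTS_CLICKHOUSE_PASSWORD"),
        })
    }

    /// Reads the same variables from the real process environment.
    pub fn from_env() -> Result<Self, MissingVar> {
        Self::from_lookup(|name| std::env::var(name).ok())
    }
}
```

A test would typically call `ClickHouseTestEnv::from_env()` once during setup and fail with the returned `MissingVar` message when a required variable is absent. Taking a lookup closure instead of calling `std::env::var` directly keeps the required/optional logic testable in isolation.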
129 changes: 128 additions & 1 deletion Cargo.lock

Some generated files are not rendered by default.

3 changes: 2 additions & 1 deletion Cargo.toml
@@ -74,6 +74,7 @@ byteorder = { version = "1.5.0", default-features = false }
bytes = { version = "1.10.1" }
chrono = { version = "0.4.41", default-features = false }
clap = { version = "4.5.42", default-features = false }
+clickhouse = { version = "0.14", default-features = false }
config = { version = "0.14", default-features = false }
configcat = { version = "0.1.3", default-features = false }
const-oid = { version = "0.9.6", default-features = false }
@@ -132,7 +133,7 @@ tracing-actix-web = { version = "0.7.19", default-features = false }
tracing-appender = { version = "0.2.3", default-features = false }
tracing-log = { version = "0.2.0", default-features = false }
tracing-subscriber = { version = "0.3.19", default-features = false }
-url = { version = "2.5.8" }
+url = { version = "2.5.8", features = ["serde"] }
utoipa = { version = "5.4.0", default-features = false }
utoipa-swagger-ui = { version = "9.0.2", default-features = false, features = [
"vendored",
42 changes: 37 additions & 5 deletions DEVELOPMENT.md
@@ -66,10 +66,10 @@ The fastest way to get started is using the setup script:
```

This script will:
-1. Start PostgreSQL via Docker Compose
-2. Run etl-api migrations
-3. Seed the default replicator image
-4. Configure the Kubernetes environment (OrbStack)
+1. Start PostgreSQL, ClickHouse, and the local Iceberg dependencies via Docker Compose.
+2. Run etl-api migrations.
+3. Seed the default replicator image.
+4. Configure the Kubernetes environment (OrbStack).

## Database Setup

@@ -100,12 +100,19 @@ POSTGRES_DATA_VOLUME=/path/to/data ./scripts/init.sh
| `POSTGRES_DB` | `postgres` | Database name |
| `POSTGRES_PORT` | `5430` | Database port |
| `POSTGRES_HOST` | `localhost` | Database host |
+| `CLICKHOUSE_HTTP_PORT` | `8123` | ClickHouse HTTP port |
+| `CLICKHOUSE_NATIVE_PORT` | `9000` | ClickHouse native TCP port |
+| `CLICKHOUSE_USER` | `etl` | ClickHouse user for the local Docker Compose setup |
+| `CLICKHOUSE_PASSWORD` | `etl` | ClickHouse password for the local Docker Compose setup |
| `SKIP_DOCKER` | (empty) | Skip Docker Compose if set |
-| `POSTGRES_DATA_VOLUME` | (empty) | Path for persistent storage |
+| `POSTGRES_DATA_VOLUME` | (empty) | Path for PostgreSQL persistent storage |
+| `CLICKHOUSE_DATA_VOLUME` | (empty) | Path for ClickHouse persistent storage |
| `REPLICATOR_IMAGE` | `ramsup/etl-replicator:latest` | Default replicator image |

PostgreSQL 18+ containers store data under `/var/lib/postgresql/<major>/data`, so the Docker Compose setup mounts the parent `/var/lib/postgresql` directory to keep upgrades compatible.

+The same Docker Compose stack also starts ClickHouse on `http://localhost:8123` by default, which is enough for local destination development and ClickHouse integration tests.

### Manual Setup

If you prefer manual setup or have an existing PostgreSQL instance:
@@ -369,6 +376,18 @@ Iceberg destination tests use local MinIO and Lakekeeper instances. The followin

**Note:** Iceberg tests are only run when the `iceberg` and `test-utils` features are enabled. These use hardcoded local URLs and do not require environment variables.

+#### ClickHouse Test Variables
+
+ClickHouse destination tests require a reachable ClickHouse HTTP endpoint:
+
+| Variable | Required | Description |
+|----------|----------|-------------|
+| `TESTS_CLICKHOUSE_URL` | **Yes** | ClickHouse HTTP URL (for example, `http://localhost:8123`) |
+| `TESTS_CLICKHOUSE_USER` | **Yes** | ClickHouse user name (for the local Docker Compose setup, use `etl`) |
+| `TESTS_CLICKHOUSE_PASSWORD` | No | ClickHouse password; for the local Docker Compose setup, use `etl` |
+
+**Note:** ClickHouse tests are only run when the `clickhouse` and `test-utils` features are enabled. Each test creates a unique database in ClickHouse and drops it automatically when the test finishes. The Docker Compose setup started by `./scripts/init.sh` is sufficient for these tests.
+
#### Test Output and Logging

| Variable | Description |
@@ -407,6 +426,11 @@ export TESTS_DATABASE_PASSWORD=postgres
export TESTS_BIGQUERY_PROJECT_ID=your-gcp-project-id
export TESTS_BIGQUERY_SA_KEY_PATH=/path/to/service-account-key.json

+# ClickHouse test configuration (optional - only needed for ClickHouse tests)
+export TESTS_CLICKHOUSE_URL=http://localhost:8123
+export TESTS_CLICKHOUSE_USER=etl
+export TESTS_CLICKHOUSE_PASSWORD=etl
+
# Enable test output (optional)
export ENABLE_TRACING=1
export RUST_LOG=info
@@ -432,6 +456,11 @@ TESTS_DATABASE_PASSWORD=postgres
TESTS_BIGQUERY_PROJECT_ID=your-gcp-project-id
TESTS_BIGQUERY_SA_KEY_PATH=/path/to/service-account-key.json

+# ClickHouse (optional - only for ClickHouse tests)
+TESTS_CLICKHOUSE_URL=http://localhost:8123
+TESTS_CLICKHOUSE_USER=etl
+TESTS_CLICKHOUSE_PASSWORD=etl
+
# Test output (optional)
ENABLE_TRACING=1
RUST_LOG=info
@@ -462,6 +491,9 @@ TESTS_DATABASE_HOST=localhost TESTS_DATABASE_PORT=5430 TESTS_DATABASE_USERNAME=p

# Run tests with tracing output for debugging
TESTS_DATABASE_HOST=localhost TESTS_DATABASE_PORT=5430 TESTS_DATABASE_USERNAME=postgres TESTS_DATABASE_PASSWORD=postgres ENABLE_TRACING=1 RUST_LOG=info cargo test -p etl-api --test tenants tenant_can_be_created -- --nocapture
+
+# Run the ClickHouse destination integration test against the local Docker Compose service
+TESTS_DATABASE_HOST=localhost TESTS_DATABASE_PORT=5430 TESTS_DATABASE_USERNAME=postgres TESTS_DATABASE_PASSWORD=postgres TESTS_CLICKHOUSE_URL=http://localhost:8123 TESTS_CLICKHOUSE_USER=etl TESTS_CLICKHOUSE_PASSWORD=etl cargo test -p etl-destinations --features clickhouse,test-utils clickhouse_pipeline -- --nocapture
```

**Packages requiring `--features test-utils`:**
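
Reviewer note: the new DEVELOPMENT.md text says each ClickHouse test creates a unique database and drops it when the test finishes, and the commit list includes "Quote ClickHouse SQL identifiers". The helper that does this is not visible in this part of the diff, so the sketch below only illustrates one way such setup and teardown statements could be generated. All function names are hypothetical, and the backslash escaping inside backtick-quoted identifiers is an assumption about the ClickHouse SQL dialect that should be checked against the ClickHouse documentation before reuse.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::{SystemTime, UNIX_EPOCH};

/// Process-wide counter so two calls in the same nanosecond still differ.
static COUNTER: AtomicU64 = AtomicU64::new(0);

/// Hypothetical helper: builds a database name from the prefix, the process
/// id, the current time in nanoseconds, and a per-process sequence number.
pub fn unique_database_name(prefix: &str) -> String {
    let nanos = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_nanos())
        .unwrap_or(0);
    let seq = COUNTER.fetch_add(1, Ordering::Relaxed);
    format!("{prefix}_{}_{nanos}_{seq}", std::process::id())
}

/// Wraps an identifier in backticks. Escaping backticks and backslashes
/// with a backslash is an assumption about ClickHouse's quoting rules.
pub fn quote_identifier(name: &str) -> String {
    let mut quoted = String::with_capacity(name.len() + 2);
    quoted.push('`');
    for ch in name.chars() {
        if ch == '`' || ch == '\\' {
            quoted.push('\\');
        }
        quoted.push(ch);
    }
    quoted.push('`');
    quoted
}

/// Statement a test could run during setup.
pub fn create_database_sql(db: &str) -> String {
    format!("CREATE DATABASE IF NOT EXISTS {}", quote_identifier(db))
}

/// Statement a test could run during teardown.
pub fn drop_database_sql(db: &str) -> String {
    format!("DROP DATABASE IF EXISTS {}", quote_identifier(db))
}
```

The `IF NOT EXISTS` and `IF EXISTS` clauses keep setup and teardown idempotent, which helps when a previous run was interrupted before cleanup.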
5 changes: 3 additions & 2 deletions etl-api/Cargo.toml
@@ -31,7 +31,7 @@ configcat = { workspace = true }
constant_time_eq = { workspace = true }
etl = { workspace = true }
etl-config = { workspace = true, features = ["utoipa", "supabase"] }
-etl-destinations = { workspace = true, features = ["bigquery", "iceberg", "ducklake"] }
+etl-destinations = { workspace = true, features = ["bigquery", "clickhouse", "ducklake", "iceberg"] }
etl-postgres = { workspace = true, features = ["replication"] }
etl-telemetry = { workspace = true }
k8s-openapi = { workspace = true, features = ["latest"] }
@@ -49,11 +49,12 @@ thiserror = { workspace = true }
tokio = { workspace = true, features = ["rt-multi-thread", "macros"] }
tracing = { workspace = true, default-features = false }
tracing-actix-web = { workspace = true, features = ["emit_event_on_error"] }
+url = { workspace = true }
utoipa = { workspace = true, features = ["actix_extras"] }
utoipa-swagger-ui = { workspace = true, features = ["actix-web"] }

[dev-dependencies]
-etl-destinations = { workspace = true, features = ["test-utils", "iceberg", "bigquery", "ducklake"] }
+etl-destinations = { workspace = true, features = ["test-utils", "bigquery", "clickhouse", "ducklake", "iceberg"] }
etl-postgres = { workspace = true, features = ["test-utils", "sqlx"] }

insta = { workspace = true, features = ["json", "redactions"] }