From 85b092101513b27150679085fede89e17a4bd4b7 Mon Sep 17 00:00:00 2001 From: Nayan Joshi Date: Mon, 1 Jun 2026 13:20:36 +0530 Subject: [PATCH 1/4] updated release notes with v0.7.4 and mysql benchmark --- docs/benchmarks/ingestion.mdx | 20 ++++++++++---------- docs/release/ingestion/v0.7.0.mdx | 17 +++++++++++++---- 2 files changed, 23 insertions(+), 14 deletions(-) diff --git a/docs/benchmarks/ingestion.mdx b/docs/benchmarks/ingestion.mdx index 60e5be88..520bdefa 100644 --- a/docs/benchmarks/ingestion.mdx +++ b/docs/benchmarks/ingestion.mdx @@ -423,7 +423,7 @@ We used AWS Glue as Iceberg catalog and AWS S3 as the storage layer on the desti - Since the original repo supports only PostgreSQL, we first ingested the NYC Taxi Data in the cloud PostgreSQL database (Azure Flexible DB), and then transferred the tables from there to our MySQL database. - Total rows **4,001,991,536 rows** including both tables. - The average row size is **144 bytes** for `trips` and **121 bytes** for `fhv_trips`. -- OLake & Debezium were run on **AWS EC2 c6i.16xlarge (64 vCPUs, 128 GiB memory)** +- OLake & Debezium were run on **Azure Standard D64ls v5 VM (64 vCPUs, 128 GiB memory)** - Database instance: **Azure Standard D32as v6 (32 vCPUs, 128 GiB Memory)**
@@ -434,17 +434,17 @@ _(OLake vs. Popular Data-Movement Tool)_ | Tool | Rows Synced | Throughput (rows / sec) | Relative to OLake | |----------------------------------------------------|-------------|-------------------------|--------------------| -| **OLake**
(as of 14th Nov 2025) | **4.0 B** | **3,38,005 RPS** | – | -| Fivetran
(as of 14th Nov 2025) | 4.0 B | 119,106 RPS | **2.83 × slower** | +| **OLake**
(as of 14th Nov 2025) | **4.0 B** | **1,39,773 RPS** | – | +| Fivetran
(as of 14th Nov 2025) | 4.0 B | 73,087 RPS | **1.91 × slower** | -**Memory usage (OLake)** - `c6i.16xlarge (64 vCPUs, 128 GiB memory)` +**Memory usage (OLake)** - `Standard D64ls v5 (64 vcpus, 128 GiB Memory)` | Memory Stats | Usage (GB) | |--------|----| | Min | 3.24 | -| Max | 75.1 | -| Mean | 48.95 | +| Max | 83.6 | +| Mean | 53.25 | > OLake maintains high throughput while keeping memory usage efficient. @@ -452,17 +452,17 @@ _(OLake vs. Popular Data-Movement Tool)_ | Tool | CDC Window | Throughput (rows / sec) | Relative to OLake | | ------------------ | -----------: | ----------------------: | ----------------- | -| **OLake** | **16.06 min** | **51,867 RPS** | – | -| Fivetran | 29.86 min | 27,901 RPS | **1.85 × slower** | +| **OLake** | **13.9 min** | **59,951 RPS** | – | +| Fivetran | 21.15 min | 39,374 RPS | **1.52 × slower** | -**Key takeaway:** For incremental workloads OLake leads the pack, moving 50 million MySQL changes into Iceberg **85.9 % faster than Fivetran** +**Key takeaway:** For incremental workloads OLake leads the pack, moving 50 million MySQL changes into Iceberg **52.3 % faster than Fivetran** #### 3. 
Cost Comparison (Vendor List Prices) | Tool | Scenario | Spend (USD) | Rows Synced | | ------------- | --------------- | ------------------------------------------------------------------------------------: | -----------: | -| **OLake** | Full Load / CDC | Cost of a `c6i.16xlarge (64 vCPUs, 128 GiB memory)` running for 3.3 hours **< $ 11** | 4.0 B / 50 M | +| **OLake** | Full Load / CDC | Cost of a `Standard D64ls v5 (64 vcpus, 128 GiB Memory)` running for 7.95 hours **< $ 22** | 4.0 B / 50 M | | Fivetran | Full Load | $ 0 (free full sync) | 4.0 B | | Fivetran | CDC | $ 2, 375.80 | 50 M | diff --git a/docs/release/ingestion/v0.7.0.mdx b/docs/release/ingestion/v0.7.0.mdx index b0f968e0..a77277b1 100644 --- a/docs/release/ingestion/v0.7.0.mdx +++ b/docs/release/ingestion/v0.7.0.mdx @@ -1,9 +1,9 @@ --- -title: "OLake Go (v0.7.0 - v0.7.3)" +title: "OLake Go (v0.7.0 - v0.7.4)" --- -# OLake Go (v0.7.0 - v0.7.3) -April 21, 2026 – May 15, 2026 +# OLake Go (v0.7.0 - v0.7.4) +April 21, 2026 – May 30, 2026 ## 🎯 What's New @@ -17,10 +17,14 @@ April 21, 2026 – May 15, 2026 4. **MSSQL read replica support -**
Added optional `jdbc_url_params` to the MSSQL source so you can target Always On read replicas (for example with read-intent), and updated CDC to use replica-safe paths that avoid primary-only agent/msdb and capture-instance management on secondaries. +5. **MongoDB delete pre-image capture -**
Added support for capturing the full document on delete events via `fullDocumentBeforeChange: "whenAvailable"` on MongoDB 6.0+ clusters with pre-images enabled. When no pre-image is available, OLake falls back to the `_id`-only `documentKey`, preserving existing behavior. + ### Destinations 1. **Skip equality deletes for CDC inserts post-backfill -**
Equality deletes are now skipped for CDC inserts once the backfill→CDC overlap window is complete, reducing unnecessary write overhead. A new `dedup_inserts` flag on the Iceberg `olake_2pc` table property tracks this — Java sets it to `true` on backfill commit, and Go clears it to `false` after the first successful CDC commit. This applies to both the Arrow and legacy gRPC writers. +2. **2PC integration tests -**
Added integration tests for two-phase commit (2PC) that validate end-to-end behavior and catch regressions in 2PC flows. + ## 🔧 Bug Fixes & Stability 1. **Upgrade pgx/v5 to v5.9.2 for security fixes -**
Upgraded `github.com/jackc/pgx/v5` from `v5.7.3` to `v5.9.2` to remediate two security vulnerabilities: a critical memory-safety flaw (`CVE-2026-33816`) that could allow memory corruption and a low-severity SQL injection advisory (`GHSA-j88v-2chj-qfwx`). No existing functionality is affected by this upgrade. @@ -35,4 +39,9 @@ April 21, 2026 – May 15, 2026 6. **MongoDB primary key pinning for deterministic deduplication -**
Previously, all indexed fields were treated as primary keys, so updates to non-unique indexed fields changed the `_olake_id` and broke Iceberg equality deletes, creating duplicate rows. The primary key is now pinned strictly to MongoDB’s guaranteed-unique `_id`, ensuring stable hashes and correct deduplicated upserts. -7. **DB2 driver download fix in integration tests -**
DB2 integration tests now reuse the already-installed `clidriver` by copying it into the workspace, so Docker containers find it locally instead of repeatedly hitting the flaky IBM CDN download path. \ No newline at end of file +7. **DB2 driver download fix in integration tests -**
DB2 integration tests now reuse the already-installed `clidriver` by copying it into the workspace, so Docker containers find it locally instead of repeatedly hitting the flaky IBM CDN download path. + +8. **Upgrade `golang.org/x/crypto` to v0.52.0 for SSH security fixes -**
Upgraded `golang.org/x/crypto` from `v0.50.0` to `v0.52.0` across all Go modules to patch five SSH-related vulnerabilities reported by `govulncheck` (GO-2026-5013, GO-2026-5017, GO-2026-5018, GO-2026-5019, GO-2026-5020), so every module in the workspace now uses the patched SSH code. + +9. **Graceful shutdown via SIGINT/SIGTERM-aware root context -**
Wired SIGINT/SIGTERM into the Cobra root context using `signal.NotifyContext`, so CDC, backfill, and destination writers now respect `ctx.Done()` and shut down cleanly on pod eviction, `docker stop`, or Ctrl-C instead of being killed mid-read. + From 9f4d26baf871933b094607543d905e19348bee65 Mon Sep 17 00:00:00 2001 From: Nayan Joshi Date: Mon, 1 Jun 2026 13:24:37 +0530 Subject: [PATCH 2/4] Updated the date of benchmark in doc --- docs/benchmarks/ingestion.mdx | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/benchmarks/ingestion.mdx b/docs/benchmarks/ingestion.mdx index 520bdefa..1d155483 100644 --- a/docs/benchmarks/ingestion.mdx +++ b/docs/benchmarks/ingestion.mdx @@ -434,8 +434,8 @@ _(OLake vs. Popular Data-Movement Tool)_ | Tool | Rows Synced | Throughput (rows / sec) | Relative to OLake | |----------------------------------------------------|-------------|-------------------------|--------------------| -| **OLake**
(as of 14th Nov 2025) | **4.0 B** | **1,39,773 RPS** | – | -| Fivetran
(as of 14th Nov 2025) | 4.0 B | 73,087 RPS | **1.91 × slower** | +| **OLake**
(as of 30th May 2026) | **4.0 B** | **1,39,773 RPS** | – | +| Fivetran
(as of 30th May 2026) | 4.0 B | 73,087 RPS | **1.91 × slower** | **Memory usage (OLake)** - `Standard D64ls v5 (64 vcpus, 128 GiB Memory)` From db2abd4878b0f9fb951b060d24a2980917fb6e89 Mon Sep 17 00:00:00 2001 From: Nayan Joshi Date: Mon, 1 Jun 2026 15:56:28 +0530 Subject: [PATCH 3/4] benchmark values updated on landing page --- src/data/benchmarkData.ts | 26 +++++++++++++------------- 1 file changed, 13 insertions(+), 13 deletions(-) diff --git a/src/data/benchmarkData.ts b/src/data/benchmarkData.ts index 3a5f03a2..7624e916 100644 --- a/src/data/benchmarkData.ts +++ b/src/data/benchmarkData.ts @@ -269,28 +269,28 @@ export const CONNECTOR_BENCHMARKS: Record = { estuary: '-' }, elapsedTime: { - olake: '3.3 hours', + olake: '7.95 hours', airbyte: '-', - fivetran: '9.34 hours', + fivetran: '15.20 hours', debezium: '-', estuary: '-' }, speed: { - olake: '3,38,005 RPS', + olake: '1,39,773 RPS', airbyte: '-', - fivetran: '1,19,106 RPS', + fivetran: '73,087 RPS', debezium: '-', estuary: '-' }, comparison: { olake: '-', airbyte: '-', - fivetran: '2.83x slower', + fivetran: '1.91x slower', debezium: '-', estuary: '-' }, cost: { - olake: '< $ 11', + olake: '< $ 22', airbyte: '-', fivetran: '$ 0 (free full sync)', debezium: '-', @@ -593,30 +593,30 @@ export const CONNECTOR_CDC_BENCHMARKS: Record = estuary: '-' }, elapsedTime: { - olake: '16.06 min', + olake: '13.90 min', airbyte: '-', - fivetran: '29.86 min', + fivetran: '21.15 min', debezium: '-', estuary: '-' }, speed: { - olake: '51,867 RPS', + olake: '59,951 RPS', airbyte: '-', - fivetran: '27,901 RPS', + fivetran: '39,374 RPS', debezium: '-', estuary: '-' }, comparison: { olake: '-', airbyte: '-', - fivetran: '1.85x slower', + fivetran: '1.52x slower', debezium: '-', estuary: '-' }, cost: { - olake: '-', + olake: '$ 1', airbyte: '-', - fivetran: '-', + fivetran: '$ 2,375.80', debezium: '-', estuary: '-' } From 042815e1f270613e5e0f2557d635a7927e143d7b Mon Sep 17 00:00:00 2001 From: Nayan Joshi Date: Mon, 1 Jun 2026 15:59:40 +0530 Subject: [PATCH 4/4] removed debezium instance creation statement --- docs/benchmarks/ingestion.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/benchmarks/ingestion.mdx b/docs/benchmarks/ingestion.mdx index 1d155483..f67f9f11 100644 --- a/docs/benchmarks/ingestion.mdx +++ b/docs/benchmarks/ingestion.mdx @@ -423,7 +423,7 @@ We used AWS Glue as Iceberg catalog and AWS S3 as the storage layer on the desti - Since the original repo supports only PostgreSQL, we first ingested the NYC Taxi Data in the cloud PostgreSQL database (Azure Flexible DB), and then transferred the tables from there to our MySQL database. - Total rows **4,001,991,536 rows** including both tables. - The average row size is **144 bytes** for `trips` and **121 bytes** for `fhv_trips`. -- OLake & Debezium were run on **Azure Standard D64ls v5 VM (64 vCPUs, 128 GiB memory)** +- OLake was run on **Azure Standard D64ls v5 VM (64 vCPUs, 128 GiB memory)** - Database instance: **Azure Standard D32as v6 (32 vCPUs, 128 GiB Memory)**
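The "× slower", "% faster", and OLake elapsed-time figures in the updated MySQL benchmark tables follow directly from the reported throughputs. The minimal Go sketch below uses only numbers quoted in the patch (4,001,991,536 full-load rows, 50 M CDC changes, and the RPS values) to re-derive them; the helper names are illustrative and not part of OLake.

```go
package main

import (
	"fmt"
	"math"
)

// Row counts quoted in the benchmark section.
const (
	fullLoadRows = 4_001_991_536 // trips + fhv_trips
	cdcRows      = 50_000_000
)

// roundTo rounds x to the given number of decimal places.
func roundTo(x float64, places int) float64 {
	p := math.Pow(10, float64(places))
	return math.Round(x*p) / p
}

// slowdown is how many times slower the other tool is: OLake RPS / other RPS.
func slowdown(olakeRPS, otherRPS float64) float64 {
	return olakeRPS / otherRPS
}

// elapsedHours converts a row count at a steady throughput into hours.
func elapsedHours(rows, rps float64) float64 {
	return rows / rps / 3600
}

func main() {
	full := slowdown(139_773, 73_087)
	cdc := slowdown(59_951, 39_374)
	fmt.Printf("full load: Fivetran %.2fx slower; OLake %.2f h\n",
		full, elapsedHours(fullLoadRows, 139_773))
	fmt.Printf("CDC: Fivetran %.2fx slower (OLake %.1f%% faster); OLake %.2f min\n",
		cdc, (cdc-1)*100, elapsedHours(cdcRows, 59_951)*60)
}
```

Rounded to the precision used in the tables, this gives 1.91× and 7.95 h for the full load, and 1.52×, 52.3 %, and 13.90 min for CDC.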
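The MongoDB delete pre-image item boils down to a per-event choice between the full pre-image and the `_id`-only key. The Go sketch below is hypothetical: `deleteRecord` and the map-based event are not OLake's code. It assumes only the standard change-event fields `fullDocumentBeforeChange` and `documentKey`. With `"whenAvailable"`, the server includes the pre-image when one was recorded (on MongoDB 6.0+ this requires `changeStreamPreAndPostImages` to be enabled on the collection). Otherwise the field is null or absent, and the sketch handles both cases.

```go
package main

import "fmt"

// deleteRecord picks the payload to emit for a delete change event.
// Hypothetical helper: events are modelled as plain maps for illustration.
// It returns the record and whether it is a full pre-image.
func deleteRecord(event map[string]any) (map[string]any, bool) {
	// Pre-image present: emit the full deleted document.
	if pre, ok := event["fullDocumentBeforeChange"].(map[string]any); ok && pre != nil {
		return pre, true
	}
	// Pre-image null or absent: fall back to the _id-only documentKey.
	key, _ := event["documentKey"].(map[string]any)
	return key, false
}

func main() {
	events := []map[string]any{
		{
			"operationType":            "delete",
			"documentKey":              map[string]any{"_id": 7},
			"fullDocumentBeforeChange": map[string]any{"_id": 7, "status": "closed"},
		},
		{
			"operationType":            "delete",
			"documentKey":              map[string]any{"_id": 8},
			"fullDocumentBeforeChange": nil,
		},
	}
	for _, ev := range events {
		rec, full := deleteRecord(ev)
		fmt.Printf("full pre-image=%v record=%v\n", full, rec)
	}
}
```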
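The `dedup_inserts` lifecycle under Destinations ("Skip equality deletes for CDC inserts post-backfill") can be sketched as a small state machine. Everything here is hypothetical. `twoPCState` stands in for the flag kept in the Iceberg `olake_2pc` table property. The rule that updates and deletes still emit equality deletes is an assumption, because the note only says that inserts skip them once the overlap window is complete.

```go
package main

import "fmt"

// twoPCState is a hypothetical in-memory mirror of the dedup_inserts flag
// stored in the Iceberg olake_2pc table property.
type twoPCState struct {
	DedupInserts bool
}

// onBackfillCommit models the Java side setting the flag when backfill commits.
func (s *twoPCState) onBackfillCommit() { s.DedupInserts = true }

// onCDCCommitSuccess models Go clearing the flag after the first successful CDC commit.
func (s *twoPCState) onCDCCommitSuccess() { s.DedupInserts = false }

// needsEqualityDelete reports whether a CDC row must carry an equality delete.
// Inserts only need one while backfill and CDC may still overlap; other
// operations are assumed to keep their existing behavior.
func needsEqualityDelete(op string, s twoPCState) bool {
	if op == "insert" {
		return s.DedupInserts
	}
	return true
}

func main() {
	var s twoPCState
	s.onBackfillCommit()
	fmt.Println("insert during overlap:", needsEqualityDelete("insert", s))
	s.onCDCCommitSuccess()
	fmt.Println("insert after first CDC commit:", needsEqualityDelete("insert", s))
	fmt.Println("update after first CDC commit:", needsEqualityDelete("update", s))
}
```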
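The "MongoDB primary key pinning" fix explains why an ID derived from every indexed field breaks deduplication. The Go sketch below contrasts the two approaches. The SHA-256 scheme and the helper names `legacyID` and `pinnedID` are hypothetical and are not OLake's actual `_olake_id` derivation. The point is only that hashing `_id` alone keeps the ID stable when a non-unique indexed field is updated.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// legacyID mimics the old behavior: hash every indexed field, so updating a
// non-unique indexed field changes the ID. Hypothetical hashing scheme.
func legacyID(doc map[string]any, indexed []string) string {
	h := sha256.New()
	for _, f := range indexed {
		fmt.Fprintf(h, "%s=%v;", f, doc[f])
	}
	return hex.EncodeToString(h.Sum(nil))
}

// pinnedID hashes only _id, which MongoDB guarantees to be unique per collection.
func pinnedID(doc map[string]any) string {
	sum := sha256.Sum256([]byte(fmt.Sprint(doc["_id"])))
	return hex.EncodeToString(sum[:])
}

func main() {
	before := map[string]any{"_id": "a1", "status": "open"}
	after := map[string]any{"_id": "a1", "status": "closed"} // update to an indexed, non-unique field
	indexed := []string{"_id", "status"}
	fmt.Println("legacy ID stable:", legacyID(before, indexed) == legacyID(after, indexed))
	fmt.Println("pinned ID stable:", pinnedID(before) == pinnedID(after))
}
```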