20 changes: 10 additions & 10 deletions docs/benchmarks/ingestion.mdx
@@ -423,7 +423,7 @@ We used AWS Glue as Iceberg catalog and AWS S3 as the storage layer on the desti
- Since the original repo supports only PostgreSQL, we first ingested the NYC Taxi Data into a cloud PostgreSQL database (Azure Flexible DB) and then transferred the tables from there to our MySQL database.
- Total: **4,001,991,536 rows** across both tables.
- The average row size is **144 bytes** for `trips` and **121 bytes** for `fhv_trips`.
- OLake & Debezium were run on **AWS EC2 c6i.16xlarge (64 vCPUs, 128 GiB memory)**
- OLake was run on **Azure Standard D64ls v5 VM (64 vCPUs, 128 GiB memory)**
Collaborator

Where is the machine used for Debezium mentioned?

Collaborator Author

Debezium was meant to be removed, since we never ran the benchmarks for Debezium.

- Database instance: **Azure Standard D32as v6 (32 vCPUs, 128 GiB Memory)**

<br/>
@@ -434,35 +434,35 @@ _(OLake vs. Popular Data-Movement Tool)_

| Tool | Rows Synced | Throughput (rows / sec) | Relative to OLake |
|----------------------------------------------------|-------------|-------------------------|--------------------|
| **OLake** <br/><span className='text-xs text-slate-500'>(as of 14th Nov 2025)</span> | **4.0 B** | **3,38,005 RPS** | – |
| Fivetran <br/><span className='text-xs text-slate-500'>(as of 14th Nov 2025)</span> | 4.0 B | 119,106 RPS | **2.83 × slower** |
| **OLake** <br/><span className='text-xs text-slate-500'>(as of 30th May 2026)</span> | **4.0 B** | **139,773 RPS** | – |
| Fivetran <br/><span className='text-xs text-slate-500'>(as of 30th May 2026)</span> | 4.0 B | 73,087 RPS | **1.91 × slower** |


**Memory usage (OLake)** - `c6i.16xlarge (64 vCPUs, 128 GiB memory)`
**Memory usage (OLake)** - `Standard D64ls v5 (64 vCPUs, 128 GiB memory)`

| Memory Stats | Usage (GB) |
|--------|----|
| Min | 3.24 |
| Max | 75.1 |
| Mean | 48.95 |
| Max | 83.6 |
| Mean | 53.25 |

> OLake maintains high throughput while keeping memory usage efficient.

#### 2. Speed Comparison – **Change-Data-Capture (CDC)**

| Tool | CDC Window | Throughput (rows / sec) | Relative to OLake |
| ------------------ | -----------: | ----------------------: | ----------------- |
| **OLake** | **16.06 min** | **51,867 RPS** | – |
| Fivetran | 29.86 min | 27,901 RPS | **1.85 × slower** |
| **OLake** | **13.9 min** | **59,951 RPS** | – |
| Fivetran | 21.15 min | 39,374 RPS | **1.52 × slower** |


**Key takeaway:** For incremental workloads OLake leads the pack, moving 50 million MySQL changes into Iceberg **85.9 % faster than Fivetran**
**Key takeaway:** For incremental workloads, OLake moves 50 million MySQL changes into Iceberg **52.3 % faster than Fivetran** (59,951 vs. 39,374 RPS).

#### 3. Cost Comparison (Vendor List Prices)

| Tool | Scenario | Spend (USD) | Rows Synced |
| ------------- | --------------- | ------------------------------------------------------------------------------------: | -----------: |
| **OLake** | Full Load / CDC | Cost of a `c6i.16xlarge (64 vCPUs, 128 GiB memory)` running for 3.3 hours **< $ 11** | 4.0 B / 50 M |
| **OLake** | Full Load / CDC | Cost of a `Standard D64ls v5 (64 vCPUs, 128 GiB memory)` running for 7.95 hours **< $ 22** | 4.0 B / 50 M |
| Fivetran | Full Load | $ 0 (free full sync) | 4.0 B |
| Fivetran | CDC | $ 2,375.80 | 50 M |

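The throughput and "% faster" figures in the tables above follow from simple arithmetic on row counts and elapsed times. A minimal TypeScript sketch, assuming the 4,001,991,536-row full load, the 50 million CDC changes, and the elapsed times reported in this PR (7.95 h / 15.20 h full load, 13.9 min / 21.15 min CDC). Because those times are rounded, the recomputed RPS differs from the table values by well under 0.1 %. The function names are illustrative and not part of the OLake codebase.

```typescript
// Sketch only: recomputes the benchmark table figures from raw measurements.
// Elapsed times are the rounded values from this PR, so results are approximate.

/** Rows per second for a run that moved `rows` rows in `seconds` seconds. */
function throughputRps(rows: number, seconds: number): number {
  if (seconds <= 0) {
    throw new RangeError("elapsed time must be positive");
  }
  return rows / seconds;
}

/** Percentage by which `fasterRps` exceeds `slowerRps` (52.3 means "52.3 % faster"). */
function percentFaster(fasterRps: number, slowerRps: number): number {
  return (fasterRps / slowerRps - 1) * 100;
}

const FULL_LOAD_ROWS = 4_001_991_536; // trips + fhv_trips
const CDC_CHANGES = 50_000_000;

// Full load (as of 30th May 2026): OLake 7.95 h, Fivetran 15.20 h.
const olakeFullRps = throughputRps(FULL_LOAD_ROWS, 7.95 * 3600);
const fivetranFullRps = throughputRps(FULL_LOAD_ROWS, 15.2 * 3600);

// CDC window: OLake 13.9 min, Fivetran 21.15 min.
const olakeCdcRps = throughputRps(CDC_CHANGES, 13.9 * 60);
const fivetranCdcRps = throughputRps(CDC_CHANGES, 21.15 * 60);

console.log(`full load: ${Math.round(olakeFullRps)} vs ${Math.round(fivetranFullRps)} RPS`);
console.log(`CDC: ${Math.round(olakeCdcRps)} vs ${Math.round(fivetranCdcRps)} RPS`);
// Using the reported CDC RPS values reproduces the key takeaway.
console.log(`${percentFaster(59_951, 39_374).toFixed(1)} % faster`); // "52.3 % faster"
```

Feeding the full-load RPS values (139,773 vs. 73,087) into `percentFaster` gives about 91.2 %, the same gap the table expresses as "1.91 × slower".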
17 changes: 13 additions & 4 deletions docs/release/ingestion/v0.7.0.mdx
@@ -1,9 +1,9 @@
---
title: "OLake Go (v0.7.0 - v0.7.3)"
title: "OLake Go (v0.7.0 - v0.7.4)"
---

# OLake Go (v0.7.0 - v0.7.3)
April 21, 2026 – May 15, 2026
# OLake Go (v0.7.0 - v0.7.4)
April 21, 2026 – May 30, 2026

## 🎯 What's New

@@ -17,10 +17,14 @@ April 21, 2026 – May 15, 2026

4. **MSSQL read replica support -** <br/> Added optional `jdbc_url_params` to the MSSQL source so you can target Always On read replicas (for example with read-intent), and updated CDC to use replica-safe paths that avoid primary-only agent/msdb and capture-instance management on secondaries.

5. **MongoDB delete pre-image capture -** <br/> Added support for capturing the full document on delete events using `fullDocumentBeforeChange: "whenAvailable"` on MongoDB 6.0+ clusters with pre-images enabled. When a pre-image is unavailable, the source falls back to the `_id`-only `documentKey`, preserving the existing behavior.

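The fallback in item 5 amounts to a small decision on each delete event. A minimal TypeScript sketch, assuming a simplified event shape: `operationType`, `documentKey`, and `fullDocumentBeforeChange` are field names from MongoDB change-stream events, while the `DeleteEvent` type and the `deletedRecord` function are hypothetical and do not mirror OLake's Go implementation.

```typescript
// Sketch of the delete pre-image fallback; DeleteEvent and deletedRecord are hypothetical.
// Only the change-event fields used here are modeled (real events also carry ns, clusterTime, ...).
type Doc = Record<string, unknown>;

interface DeleteEvent {
  operationType: "delete";
  documentKey: { _id: unknown };
  // Assumed semantics: populated when the stream uses fullDocumentBeforeChange: "whenAvailable"
  // and a pre-image exists, null when no pre-image is available, absent when the option is off.
  fullDocumentBeforeChange?: Doc | null;
}

function deletedRecord(event: DeleteEvent): Doc {
  const preImage = event.fullDocumentBeforeChange;
  if (preImage !== undefined && preImage !== null) {
    return preImage; // full document as it was before the delete
  }
  return { _id: event.documentKey._id }; // previous behavior: _id-only documentKey
}

// Usage: one delete with a pre-image, one without.
const withPreImage: DeleteEvent = {
  operationType: "delete",
  documentKey: { _id: 42 },
  fullDocumentBeforeChange: { _id: 42, status: "closed" },
};
const withoutPreImage: DeleteEvent = {
  operationType: "delete",
  documentKey: { _id: 43 },
  fullDocumentBeforeChange: null,
};

console.log(deletedRecord(withPreImage), deletedRecord(withoutPreImage));
```

Treating `null` and a missing field the same way keeps the sketch safe both for clusters without pre-images and for streams opened without the option.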
### Destinations

1. **Skip equality deletes for CDC inserts post-backfill -** <br/> Equality deletes are now skipped for CDC inserts once the backfill→CDC overlap window is complete, reducing unnecessary write overhead. A new `dedup_inserts` flag on the Iceberg `olake_2pc` table property tracks this — Java sets it to `true` on backfill commit, and Go clears it to `false` after the first successful CDC commit. This applies to both the Arrow and legacy gRPC writers.

2. **2PC integration tests -** <br/> Added integration tests for two-phase commit (2PC) to validate end-to-end behavior and improve reliability of 2PC flows.

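The `dedup_inserts` lifecycle from Destinations item 1 behaves like a two-state flag. A minimal TypeScript sketch, assuming updates and deletes keep emitting equality deletes as before (the note only changes insert handling). Apart from `dedup_inserts`, every name here (`TwoPcState`, `onBackfillCommit`, `onCdcCommitSucceeded`, `needsEqualityDelete`) is hypothetical; per the note, the real flag lives in the `olake_2pc` table property and is set by the Java side and cleared by the Go side.

```typescript
// Hypothetical model of the dedup_inserts flag; not OLake's Java or Go code.
type CdcOp = "insert" | "update" | "delete";

interface TwoPcState {
  // Mirrors the `dedup_inserts` field described in the release note.
  dedup_inserts: boolean;
}

// Backfill commit: CDC inserts in the backfill→CDC overlap may duplicate backfilled rows.
function onBackfillCommit(state: TwoPcState): TwoPcState {
  return { ...state, dedup_inserts: true };
}

// First successful CDC commit: the overlap window is over.
function onCdcCommitSucceeded(state: TwoPcState): TwoPcState {
  return { ...state, dedup_inserts: false };
}

// Whether the writer should emit an equality delete alongside this change.
function needsEqualityDelete(op: CdcOp, state: TwoPcState): boolean {
  if (op !== "insert") {
    return true; // assumption: updates and deletes are unaffected by the flag
  }
  return state.dedup_inserts;
}

let state: TwoPcState = { dedup_inserts: false };
state = onBackfillCommit(state);
const duringOverlap = needsEqualityDelete("insert", state); // true
state = onCdcCommitSucceeded(state);
const afterOverlap = needsEqualityDelete("insert", state); // false
console.log({ duringOverlap, afterOverlap });
```

Under these assumptions, the equality delete only earns its cost during the overlap, where a row copied by the backfill can reappear as a CDC insert for the same key; after the first CDC commit, inserts are treated as new keys and the extra delete file is skipped.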
## 🔧 Bug Fixes & Stability

1. **Upgrade pgx/v5 to v5.9.2 for security fixes -** <br/> Upgraded `github.com/jackc/pgx/v5` from `v5.7.3` to `v5.9.2` to remediate two security vulnerabilities: a critical memory-safety flaw (`CVE-2026-33816`) that could allow memory corruption and a low-severity SQL injection advisory (`GHSA-j88v-2chj-qfwx`). No existing functionality is affected by this upgrade.
@@ -35,4 +39,9 @@ April 21, 2026 – May 15, 2026

6. **MongoDB primary key pinning for deterministic deduplication -** <br/> Previously, all indexed fields were treated as primary keys, so updates to non-unique indexed fields changed the `_olake_id` and broke Iceberg equality deletes, creating duplicate rows. The primary key is now pinned strictly to MongoDB’s guaranteed-unique `_id`, ensuring stable hashes and correct deduplicated upserts.

7. **DB2 driver download fix in integration tests -** <br/> DB2 integration tests now reuse the already-installed `clidriver` by copying it into the workspace, so Docker containers find it locally instead of repeatedly hitting the flaky IBM CDN download path.

8. **Upgrade `golang.org/x/crypto` to v0.52.0 for SSH security fixes -** <br/> Upgraded `golang.org/x/crypto` from `v0.50.0` to `v0.52.0` across all Go modules to patch five SSH-related vulnerabilities reported by `govulncheck` (GO-2026-5013, GO-2026-5017, GO-2026-5018, GO-2026-5019, GO-2026-5020), bringing the workspace onto the latest secure SSH implementation.

9. **Graceful shutdown via SIGINT/SIGTERM-aware root context -** <br/> Wired SIGINT/SIGTERM into the Cobra root context using `signal.NotifyContext`, so CDC, backfill, and destination writers now respect `ctx.Done()` and shut down cleanly on pod eviction, `docker stop`, or Ctrl-C instead of being killed mid-read.

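Why pinning the key to `_id` fixes the duplicates in item 6: if the row identifier is hashed from every indexed field, changing a non-unique indexed field yields a new identifier, so the equality delete for the update no longer matches the stored row. A minimal TypeScript sketch, assuming `_olake_id` is a hash of the primary-key values. The 32-bit FNV-1a hash is used purely for illustration and is not necessarily how OLake computes `_olake_id`; `rowId` is a hypothetical helper.

```typescript
// Illustrative only: 32-bit FNV-1a over UTF-16 code units stands in for OLake's real hash.
function fnv1a32(input: string): string {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by the FNV prime, keep 32 bits unsigned
  }
  return hash.toString(16).padStart(8, "0");
}

type Doc = Record<string, unknown>;

// Hypothetical helper: derive a row identifier from the chosen primary-key fields.
function rowId(doc: Doc, pkFields: string[]): string {
  return fnv1a32(JSON.stringify(pkFields.map((field) => doc[field])));
}

// The same MongoDB document before and after an update to a non-unique indexed field.
const before: Doc = { _id: "665f1c", status: "open" };
const after: Doc = { _id: "665f1c", status: "closed" };

// Old behavior: every indexed field is part of the key, so the identifier changes.
const oldKeyFields = ["_id", "status"];
// New behavior: the key is pinned to the guaranteed-unique _id, so the identifier is stable.
const pinnedKeyFields = ["_id"];

console.log(rowId(before, oldKeyFields), rowId(after, oldKeyFields));
console.log(rowId(before, pinnedKeyFields), rowId(after, pinnedKeyFields));
```

With the pinned key, both versions of the document hash to the same identifier, so an upsert's equality delete targets the existing row instead of leaving it behind as a duplicate.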
26 changes: 13 additions & 13 deletions src/data/benchmarkData.ts
@@ -269,28 +269,28 @@ export const CONNECTOR_BENCHMARKS: Record<ConnectorId, ConnectorBenchmark> = {
estuary: '-'
},
elapsedTime: {
olake: '3.3 hours',
olake: '7.95 hours',
airbyte: '-',
fivetran: '9.34 hours',
fivetran: '15.20 hours',
debezium: '-',
estuary: '-'
},
speed: {
olake: '3,38,005 RPS',
olake: '139,773 RPS',
airbyte: '-',
fivetran: '1,19,106 RPS',
fivetran: '73,087 RPS',
debezium: '-',
estuary: '-'
},
comparison: {
olake: '-',
airbyte: '-',
fivetran: '2.83x slower',
fivetran: '1.91x slower',
debezium: '-',
estuary: '-'
},
cost: {
olake: '< $ 11',
olake: '< $ 22',
airbyte: '-',
fivetran: '$ 0 (free full sync)',
debezium: '-',
@@ -593,30 +593,30 @@ export const CONNECTOR_CDC_BENCHMARKS: Record<ConnectorId, ConnectorBenchmark> =
estuary: '-'
},
elapsedTime: {
olake: '16.06 min',
olake: '13.90 min',
airbyte: '-',
fivetran: '29.86 min',
fivetran: '21.15 min',
debezium: '-',
estuary: '-'
},
speed: {
olake: '51,867 RPS',
olake: '59,951 RPS',
airbyte: '-',
fivetran: '27,901 RPS',
fivetran: '39,374 RPS',
debezium: '-',
estuary: '-'
},
comparison: {
olake: '-',
airbyte: '-',
fivetran: '1.85x slower',
fivetran: '1.52x slower',
debezium: '-',
estuary: '-'
},
cost: {
olake: '-',
olake: '$ 1',
airbyte: '-',
fivetran: '-',
fivetran: '$ 2,375.80',
debezium: '-',
estuary: '-'
}
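Every value in this file is a hand-written string, so each PR that updates `speed` must also update `comparison` by hand, as this one does. A minimal TypeScript sketch of deriving the label instead, assuming the `'<number> RPS'` and `'<ratio>x slower'` formats used above; `parseRps` and `slowdownLabel` are hypothetical helpers that do not exist in `benchmarkData.ts`. Stripping commas lets the parser accept both `'1,39,773 RPS'` and `'139,773 RPS'`.

```typescript
// Hypothetical helpers; not part of src/data/benchmarkData.ts.

/** Parses values such as '1,39,773 RPS' or '73,087 RPS' into a number. */
function parseRps(value: string): number {
  const match = /^([\d,]+(?:\.\d+)?)\s*RPS$/.exec(value.trim());
  if (match === null) {
    throw new Error(`not an RPS value: ${value}`);
  }
  // Commas are grouping only, so Indian (1,39,773) and Western (139,773) styles both parse.
  return Number(match[1].replace(/,/g, ""));
}

/** Builds a `comparison` entry in the file's 'N.NNx slower' style. */
function slowdownLabel(olakeSpeed: string, otherSpeed: string): string {
  const ratio = parseRps(olakeSpeed) / parseRps(otherSpeed);
  return `${ratio.toFixed(2)}x slower`;
}

// Full-load and CDC values from this PR.
console.log(slowdownLabel("1,39,773 RPS", "73,087 RPS")); // "1.91x slower"
console.log(slowdownLabel("59,951 RPS", "39,374 RPS")); // "1.52x slower"
```

Keeping literal strings makes the file easy to grep, while deriving `comparison` from `speed` removes one place where the two can drift apart; the placeholder `'-'` entries would need to be skipped before calling `slowdownLabel`, since `parseRps` rejects them.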