dbmazz

Real-time PostgreSQL CDC in ~11 MB of RAM. One binary. No Kafka.

A single Rust daemon that streams PostgreSQL changes to StarRocks, Snowflake, or another PostgreSQL with sub-second replication lag — 23–360× lighter than Debezium standard deployments and 727× lighter than Airbyte's minimum recommendation. (How we measured ↓)

Built and maintained by EZ-CDC.

License: ELv2 GHCR Discussions Built with Rust

Quickstart · What is dbmazz? · At a glance · How it works · Performance · Production


What is dbmazz?

dbmazz is what CDC looks like without the JVM tax: a single Rust binary, built and maintained by EZ-CDC, that reads the PostgreSQL WAL via logical replication and streams every INSERT, UPDATE, and DELETE into your sink in real time, with sub-second replication lag in steady state. No Kafka, no Flink, no ZooKeeper, no Connect cluster, no schema registry — and no batch windows.

The whole thing is ~30 MB on disk and ~11 MB resident in memory — about the size of an idle shell session, not a database tool. Run it from the official Docker image or build it from source. Each instance handles one replication job.

dbmazz TUI dashboard demo
The ez-cdc CLI in quickstart mode — live throughput, lag, source vs target row counts.

At a glance

How dbmazz compares to other open-source CDC tools on resource footprint and deployment complexity — the two dimensions where dbmazz is purpose-built to win.

| | dbmazz | Debezium (standard) | Airbyte |
| --- | --- | --- | --- |
| Language / runtime | Rust (static binary) | Java / JVM | Java / Python |
| Replication mode | Real-time CDC (sub-second WAL streaming) | Real-time CDC (sub-second WAL streaming) | Batch ETL (scheduled syncs, minutes–hours) |
| Components for one PG → sink pipeline | 1 (single binary) | 2–7 (JVM + Kafka + …) | 7–10+ (microservices) |
| External dependencies | none | Kafka, ZooKeeper | Postgres, Temporal, scheduler |
| Published min memory | ~11 MB RSS | 256 MB – 4 GB | 8 GB+ minimum recommended |
| Deployment | Single CLI | Compose / K8s with multiple services | Docker Compose / K8s with microservices |

Each tool optimizes for different things. dbmazz optimizes for resource efficiency and operational simplicity. The numbers above come from each project's own documentation.


🤔 Why dbmazz?

In a 3-day production load test we ran 40 dbmazz daemons in parallel on a single 2 vCPU / 4 GB worker, each replicating its own PostgreSQL database to a shared StarRocks instance. Total worker memory: 522 MB out of 4 GB. Total worker CPU: 15 % average, 32 % peak. Every daemon held its own replication slot, parsed pgoutput, and pushed batched writes to the sink — converging on roughly 1 % of one CPU core and ~11 MB of RSS, regardless of whether the source was producing 1 000 inserts/sec or 1 insert/minute. dbmazz overhead is fixed-cost per daemon, not load-dependent. (Full CDC footprint benchmark)

For backfill at scale, we've also benchmarked TPC-DS 1 TB at ~110 K rows/sec sustained on an 8 vCPU worker, with the bottleneck at worker compute — not PostgreSQL or the sink. Full numbers, methodology, and hardware specs in the snapshot benchmark.

Where dbmazz wins

  • 🪶 The smallest CDC footprint that still does real work. ~30 MB on disk, ~11 MB resident in memory at steady state — about the size of an idle shell session. Runs comfortably on a t3.micro, a Raspberry Pi, or as a sidecar to your application container. Compare to 256 MB – 4 GB for Debezium or 8 GB+ for Airbyte.
  • ⚡ Real-time streaming, not scheduled batch syncs. dbmazz delivers WAL events as they happen, with sub-second replication lag in steady state. No sync windows, no waiting for the next batch. If freshness matters, that's the difference between an operational replica and a day-old data lake.
  • 🪐 Multi-tenant CDC at minimal cost. The benchmark shows 40 daemons on a 2 vCPU / 4 GB box using just 522 MB, leaving roughly 87 % of memory free. A t3.medium at $0.04/hour can carry your whole CDC fleet.
  • 🏔️ Postgres → analytical warehouse without a streaming platform — direct sink writes (Stream Load for StarRocks, binary COPY for Postgres, Parquet staging for Snowflake). No Kafka cluster. No ZooKeeper. No schema registry. No Connect cluster. No Temporal stack. No microservices.
  • 🚀 Time to first replication: under 2 minutes. Install the CLI, point it at two databases, watch the live dashboard. No deployment manifests, no Kubernetes, no SaaS signup.
  • 🦀 Built on Rust. Memory-safe, no GC pauses, no JVM warm-up time. Predictable performance, predictable footprint.

🚀 Try it in 2 minutes

dbmazz is operated through the ez-cdc CLI. Install it with one command:

curl -sSL https://raw.githubusercontent.com/ez-cdc/ez-cdc-cli-releases/main/install.sh | sh

Then spin up a self-contained demo (PostgreSQL source + sink already seeded) and watch a pipeline run:

ez-cdc quickstart --demo

That boots the demo stack, starts dbmazz, and drops you into the live dashboard — no config files, no databases of your own required. Press t to generate live traffic, q to quit.

The CLI runs on Linux and macOS (amd64 and arm64).

Or run the daemon directly with Docker. Skip the CLI and pull the official image — one command, your daemon is up:

docker run --rm \
  -e SOURCE_URL='postgresql://user:pass@source-host:5432/mydb' \
  -e SINK_URL='http://sink-host:8030' \
  -e SINK_TYPE=starrocks \
  -e SINK_DATABASE=analytics \
  ghcr.io/ez-cdc/dbmazz:latest

Full Docker reference (env vars, healthcheck, persisting state, build-from-source): docs.ez-cdc.com/self-hosted/docker.

Want to point dbmazz at your own databases, run the verification suite, or dig into the full CLI reference? See docs.ez-cdc.com/self-hosted/cli.


🔌 Supported sources & sinks

| Source | Sink | Status | Notes |
| --- | --- | --- | --- |
| PostgreSQL 12+ | StarRocks 3.2+ | ✅ Stable | JSON Stream Load; partial-update for TOAST columns; audit columns auto-managed; schema evolution requires per-table fast_schema_evolution=true |
| PostgreSQL 12+ | PostgreSQL 15+ | ✅ Stable | Binary COPY → raw table → MERGE normalizer; supports hard delete |
| PostgreSQL 12+ | Snowflake | ✅ Stable | Parquet → PUT (stage) → COPY INTO → background MERGE; JWT key-pair auth supported; requires ALTER TABLE privilege on target schema |
| PostgreSQL 12+ | S3 / Iceberg | 🚧 Roadmap | Tracked in issues |
| MySQL 5.7+ / 8.0+ | All sinks | 🧪 Beta | Binlog-based with GTID-aware checkpointing. See docs/mysql-source.md. |

Adding a new sink is intentionally simple: implement the 6-method Sink trait and both CDC and snapshot work automatically. See docs/contributing-connectors.md.


How it works

PostgreSQL (source)               dbmazz                          Sink (target)
┌──────────────┐               ┌────────────────────┐          ┌──────────────┐
│  WAL         │   logical     │ WAL Handler        │          │ StarRocks    │
│  (INSERT,    │   replication │   │                │          │ PostgreSQL   │
│   UPDATE,    │ ────────────▶ │   ▼                │          │ Snowflake    │
│   DELETE)    │   (pgoutput)  │ source/converter   │          │              │
│              │               │   │                │          │              │
│              │               │   ▼                │          │              │
│              │               │ Pipeline           │  write   │              │
│              │               │   │ batch + flush  │──batch──▶│              │
│              │               │   ▼                │          │              │
│              │               │ Checkpoint (LSN)   │          │              │
│              │ ◀─────────────│  confirm to PG     │          │              │
└──────────────┘               └────────────────────┘          └──────────────┘

dbmazz reads PostgreSQL's logical replication stream (pgoutput), batches events, and writes them to a sink. Each sink owns its loading strategy — Stream Load, binary COPY, or Parquet staging — the engine doesn't care.
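
In code terms, the hot path is a classic bounded-channel batcher. Here is a minimal sketch of that loop; the Event type, function names, and batching parameters are illustrative stand-ins, not dbmazz's actual internals (those live in docs/architecture.md):

use std::time::Duration;
use tokio::{sync::mpsc, time};

// Illustrative stand-in for one decoded pgoutput change plus its WAL position.
struct Event {
    lsn: u64,
}

async fn run_pipeline(mut rx: mpsc::Receiver<Event>, max_batch: usize, max_wait: Duration) {
    let mut batch: Vec<Event> = Vec::with_capacity(max_batch);
    let mut tick = time::interval(max_wait);
    loop {
        tokio::select! {
            msg = rx.recv() => match msg {
                Some(ev) => {
                    batch.push(ev);
                    if batch.len() >= max_batch {
                        flush(&mut batch).await; // size-based flush keeps sink writes efficient under load
                    }
                }
                None => {
                    flush(&mut batch).await; // source closed: drain and stop
                    break;
                }
            },
            _ = tick.tick() => flush(&mut batch).await, // time-based flush bounds lag when traffic is light
        }
    }
}

async fn flush(batch: &mut Vec<Event>) {
    let Some(last) = batch.last() else { return };
    let confirm_lsn = last.lsn;
    // 1. Hand the batch to the sink (Stream Load / COPY / Parquet, the sink's choice).
    // 2. Only after the sink ACKs, confirm `confirm_lsn` back to PostgreSQL so the
    //    replication slot advances and the server can recycle WAL. Stubbed here.
    let _ = confirm_lsn;
    batch.clear();
}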

Full architecture, guarantees, and design decisions: docs/architecture.md.


📊 Performance

We publish reproducible benchmarks with full hardware specs and methodology. Numbers below come from real runs, not synthetic projections.

Read this first — Snapshot vs CDC

dbmazz operates in two modes with fundamentally different resource profiles. Don't conflate them.

  • Snapshot (one-time backfill) is a heavy parallel workload. It runs N concurrent SELECT chunks over multiple PostgreSQL connections, serializes millions of rows to the sink format, and pushes them in batches. The CPU and memory you see in the snapshot benchmark below reflect this workload — N workers each holding a chunk in memory waiting for the sink to ACK — not steady-state daemon overhead. Snapshot runs once.

  • CDC streaming (steady state) is an entirely different beast. dbmazz reads logical replication messages over a single PostgreSQL connection, batches them in a small in-memory channel, and flushes to the sink. There is no read-side parallelism because the WAL stream is sequential. Memory is bounded by the channel buffer (a few thousand events, configurable). This is what you run 99% of the time, and it costs almost nothing.

TL;DR: if you see "1.7 GB RSS" in the snapshot benchmark and assume that's what dbmazz uses on your production CDC pipeline — that's the wrong takeaway. CDC steady state runs in a fraction of the resources.
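
For contrast, here is the assumed shape of the snapshot side (a sketch, not the actual implementation), showing why snapshot memory scales with workers × chunk size while steady-state CDC memory does not. The fetch_chunk and load_chunk helpers are hypothetical stubs:

use std::sync::Arc;
use tokio::sync::Semaphore;

// Illustrative only: snapshot fans N concurrent workers over key-range chunks,
// so peak memory ≈ workers × rows-per-chunk held in flight awaiting sink ACK.
async fn snapshot(chunks: Vec<(i64, i64)>, workers: usize) {
    let permits = Arc::new(Semaphore::new(workers));
    let mut tasks = Vec::with_capacity(chunks.len());
    for (lo, hi) in chunks {
        let permits = Arc::clone(&permits);
        tasks.push(tokio::spawn(async move {
            // One permit = one of the N concurrent SELECT chunks.
            let _permit = permits.acquire_owned().await.expect("semaphore closed");
            let rows = fetch_chunk(lo, hi).await; // SELECT ... WHERE pk BETWEEN lo AND hi
            load_chunk(rows).await;               // serialize + push, wait for sink ACK
            // _permit drops here; the next queued chunk starts.
        }));
    }
    for t in tasks {
        t.await.expect("worker panicked");
    }
}

// Hypothetical helpers, standing in for real source/sink I/O.
async fn fetch_chunk(_lo: i64, _hi: i64) -> Vec<u8> { Vec::new() }
async fn load_chunk(_rows: Vec<u8>) {}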

Snapshot — TPC-DS 1 TB (partial run)

We published a partial TPC-DS 1 TB snapshot run alongside the source code so you can see exactly what we measured, on what hardware, and where the bottleneck is.

  • Source: PostgreSQL 16 on RDS (us-west-2), gp3 storage
  • Sink: StarRocks on EC2 (same region)
  • Worker: c5.2xlarge — 8 vCPU, 16 GB RAM
  • Workers / chunk size: 25 concurrent / 50,000 rows
  • Dataset: 25 tables, ~6.35 B rows total
  • Result: ~110 K rows/sec sustained, ~1.7 GB RSS, 82–91 % CPU sustained
  • Estimated total time (full 1 TB): ~16 hours

Status: this is a partial run (2,176 of 113,129 chunks completed at the time of recording). A complete end-to-end run is in progress. Full report: benchmarks/2026-03-07-tpcds-1tb-snapshot.md.

Where's the bottleneck? All 8 vCPUs of the worker are saturated on JSON serialization and gzip compression — the bottleneck is worker compute, not PostgreSQL or StarRocks. Larger instances scale further until you hit the source DB's read IOPS ceiling (typically ~12 K IOPS on RDS gp3).

CDC streaming — per-daemon footprint

In a 3-day production load test, 40 dbmazz daemons ran concurrently on a single t3.medium-class worker (2 vCPU / 4 GB RAM), split across three workload tiers that differed by 4 orders of magnitude in configured insert rate. Per-daemon footprint was nearly constant across all three tiers:

| Tier | Configured rate (per source) | Jobs | CPU avg | RSS avg |
| --- | --- | --- | --- | --- |
| High-rate | 500–1 000 inserts/sec | 10 | 11.3 millicores | 11.0 MB |
| Moderate-rate | 50–100 inserts/sec | 20 | 10.7 millicores | 10.7 MB |
| Low-rate | 1–5 inserts/min | 10 | 12.3 millicores | 10.7 MB |

The headline finding: dbmazz overhead is fixed-cost per daemon, not load-dependent. Whether the source is producing 1 000 inserts/sec or 1 insert/minute, a daemon converges on roughly 1 % of one CPU core and ~11 MB of RSS. The cost is dominated by holding the replication slot, parsing pgoutput, and maintaining the sink connection — the marginal cost per event is invisible at this scale.

The same worker reported 15 % total CPU and 522 MB total RAM used (12.7 % of capacity) with all 40 daemons running. dbmazz scales horizontally on a single host: each daemon is an independent Unix process with its own replication slot and sink connection.

Full setup, per-tier breakdown, methodology, queries, and honest caveats: benchmarks/2026-04-13-cdc-footprint-multitenant.md.

What this does NOT measure: maximum sustained throughput per daemon (no daemon was anywhere close to saturated), end-to-end lag percentiles (the workload was light enough that lag stayed below the metric's reporting threshold), or behaviour under a sink slowdown. A reproducible single-daemon throughput benchmark with pgbench-driven sustained load is tracked in issue #71.


🐳 Production deployment

For managed BYOC deployment with auto-healing workers, centralized monitoring, RBAC, audit logs, and a web portal — running dbmazz in your own AWS or GCP account — see EZ-CDC Cloud.


👩‍💻 For developers

Build and test

cargo build --release
cargo test
cargo fmt -- --check
cargo clippy -- -D warnings

Requires Rust 1.91.1+. System deps: musl-tools, pkg-config, perl, make.

Contributing a new sink

The engine is sink-agnostic. Adding a new sink means implementing one trait — six methods, one with a default — modelled after Kafka Connect:

#[async_trait]
pub trait Sink: Send + Sync {
    fn name(&self) -> &'static str;
    fn capabilities(&self) -> SinkCapabilities;
    async fn validate_connection(&self) -> Result<()>;
    async fn setup(&mut self, source_schemas: &[SourceTableSchema]) -> Result<()> { Ok(()) }
    async fn write_batch(&mut self, records: Vec<CdcRecord>) -> Result<SinkResult>;
    async fn close(&mut self) -> Result<()>;
}

Snapshot and CDC both go through write_batch() — there is no separate snapshot path. The sink owns its loading strategy (Stream Load, COPY, S3 staging, MERGE, etc.) and the engine doesn't care.
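
To make the surface area concrete, here is a hypothetical do-nothing sink written against that trait. Everything project-specific is assumed: SinkCapabilities::default(), SinkResult::default(), and the Debug impl on CdcRecord are placeholders, so check the real type definitions in the repo before copying:

use async_trait::async_trait;

// A toy sink that prints each record. Not a real connector; assumes the Sink
// trait and its types (CdcRecord, SinkResult, SinkCapabilities) are in scope.
pub struct StdoutSink;

#[async_trait]
impl Sink for StdoutSink {
    fn name(&self) -> &'static str {
        "stdout"
    }

    fn capabilities(&self) -> SinkCapabilities {
        SinkCapabilities::default() // assumption: the real type may not impl Default
    }

    async fn validate_connection(&self) -> Result<()> {
        Ok(()) // nothing to dial
    }

    // setup() keeps its default no-op body from the trait.

    async fn write_batch(&mut self, records: Vec<CdcRecord>) -> Result<SinkResult> {
        // Snapshot rows and CDC events both arrive here: one path.
        for record in &records {
            println!("{record:?}"); // assumption: CdcRecord implements Debug
        }
        Ok(SinkResult::default()) // assumption: placeholder result constructor
    }

    async fn close(&mut self) -> Result<()> {
        Ok(())
    }
}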

Full step-by-step guide: docs/contributing-connectors.md.


💬 Community

We're a small project. PRs and issues are read by humans, not bots.


🤝 Contributing

dbmazz welcomes contributions.

  1. Read CONTRIBUTING.md for setup, conventions, and the PR checklist.
  2. To add a new source or sink connector, see docs/contributing-connectors.md.

By contributing you agree your contributions are licensed under the Elastic License v2.0, the same license as the project.


📄 License

Elastic License v2.0.

In plain English: dbmazz is free for commercial and non-commercial use, including running it in production, embedding it in your own product, or modifying it for internal use. The only restriction is that you cannot offer dbmazz to third parties as a managed service. Self-hosting is unrestricted.


☁️ About EZ-CDC

dbmazz is the open-source CDC engine maintained by EZ-CDC. We're a small team building modern data replication tools for teams that want streaming Postgres CDC without operating a streaming platform — and dbmazz is the daemon at the heart of everything we ship.

The same team also runs EZ-CDC Cloud: a managed BYOC platform that deploys dbmazz into your own AWS or GCP account.
