feat: MySQL CDC GA — type fidelity, bootstrap, cursor chunker, non-integer PKs #99
Open
dariomazzitellireplik-coder wants to merge 1 commit into
…teger PKs

Graduates the MySQL source from BETA to STABLE in v2.5.0. Closes the correctness and UX gaps identified at v2.3.0 (PR #97).

## Type fidelity

- Add `Value::UInt64(u64)` and `DataType::UInt64` for `BIGINT UNSIGNED`. Previously the MySQL source silently wrapped values >= 2^63 to negative `i64`. Sinks map to `NUMERIC(20,0)` (PG/SF) or `LARGEINT` (StarRocks); `value_to_json` stringifies the value to avoid downstream precision loss.
- Replace the temporary ISO-string `TIMESTAMP` shim from PR #97 with `Value::Timestamp(micros)`. `DATETIME(p)` microseconds are preserved.
- Schema introspection reads `NUMERIC_PRECISION` / `NUMERIC_SCALE` / `COLUMN_TYPE`; `DECIMAL` columns land as `DataType::Decimal{p,s}` instead of the hardcoded `(38,9)`.
- The MySQL converter routes `MYSQL_TYPE_NEWDECIMAL` / `MYSQL_TYPE_DECIMAL` bytes through `Value::Decimal` instead of `Value::String`.

## First-run bootstrap (H5)

On a fresh start with no checkpoint, `SHOW MASTER STATUS` is captured BEFORE the snapshot worker spawns and persisted as a `PROVISIONAL` checkpoint. The post-snapshot CDC stream resumes from that point, so it no longer replays days of binlogs and no longer fails hard if old binlogs were purged. The first commit promotes the row to `ACTIVE`. `dbmazz_checkpoints` gains a nullable `status` column (idempotent migration via a `SHOW COLUMNS` probe).

## Snapshot performance (M3)

Replaces `MIN(pk)`/`MAX(pk)` + linear partitioning with cursor-based keyset paging: `SELECT pk WHERE pk > ? ORDER BY pk LIMIT chunk_size+1`. Each chunk has a bounded row count regardless of PK density, so sparse distributions (gaps from `DELETE`s / auto-increment skips) no longer produce empty or oversized chunks.

## Non-integer PK support (M4)

`find_mysql_integer_pk` is replaced with `find_mysql_pk`, which dispatches on `DataType`:

- `Int*` / `UInt64` -> `PkKind::Int` | `PkKind::UInt`
- `String` / `Text` / `Uuid` -> `PkKind::Str` (with `COLLATE utf8mb4_bin` for deterministic ordering)
- `Bytes` -> `PkKind::Bytes`

Composite-PK and no-PK tables are skipped with WARN logs.
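The keyset chunker and PK dispatch described above can be sketched as follows. This is a minimal, std-only illustration under stated assumptions, not the dbmazz implementation: `DataType`, `PkKind`, `find_pk_kind`, `chunk_sql`, and `keyset_chunks` are hypothetical, simplified stand-ins, and treating any other single-column type as "skip" is an assumption. `keyset_chunks` pages an in-memory sorted key list the same way repeated `WHERE pk > last ORDER BY pk LIMIT chunk_size + 1` queries would page a table.

```rust
// Hypothetical, simplified types: the real dbmazz `DataType` has more variants.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType { Int32, Int64, UInt64, String, Text, Uuid, Bytes, Float64 }

#[derive(Debug, Clone, Copy, PartialEq)]
enum PkKind { Int, UInt, Str, Bytes }

/// Map the table's primary-key column types to a PkKind.
/// `None` means "skip this table" (the PR logs a WARN in that case).
fn find_pk_kind(pk_cols: &[DataType]) -> Option<PkKind> {
    match pk_cols {
        [single] => match single {
            DataType::Int32 | DataType::Int64 => Some(PkKind::Int),
            DataType::UInt64 => Some(PkKind::UInt),
            DataType::String | DataType::Text | DataType::Uuid => Some(PkKind::Str),
            DataType::Bytes => Some(PkKind::Bytes),
            // Assumption: any other single-column type is not chunkable.
            DataType::Float64 => None,
        },
        // Composite PK (len > 1) or no PK (len == 0).
        _ => None,
    }
}

/// Build the boundary query for one chunk. `after_cursor` is false only for the
/// first chunk; afterwards the last key of the previous chunk is bound to `?`.
fn chunk_sql(table: &str, pk: &str, kind: PkKind, after_cursor: bool, chunk_size: usize) -> String {
    let order_col = match kind {
        // Binary collation keeps string ordering deterministic across chunks.
        PkKind::Str => format!("`{pk}` COLLATE utf8mb4_bin"),
        _ => format!("`{pk}`"),
    };
    let filter = if after_cursor {
        format!(" WHERE {order_col} > ?")
    } else {
        String::new()
    };
    // One extra row: its presence tells the caller another chunk follows.
    format!(
        "SELECT `{pk}` FROM `{table}`{filter} ORDER BY {order_col} LIMIT {}",
        chunk_size + 1
    )
}

/// Split sorted keys into (first, last) ranges of at most `chunk_size` keys,
/// emulating one keyset query per chunk.
fn keyset_chunks<T: Ord + Clone>(sorted: &[T], chunk_size: usize) -> Vec<(T, T)> {
    let mut chunks = Vec::new();
    let mut cursor: Option<T> = None;
    loop {
        // Rows strictly after the cursor, limited to chunk_size + 1.
        let page: Vec<T> = sorted
            .iter()
            .filter(|k| cursor.as_ref().map_or(true, |c| *k > c))
            .take(chunk_size + 1)
            .cloned()
            .collect();
        if page.is_empty() {
            break;
        }
        let end_idx = page.len().min(chunk_size) - 1;
        chunks.push((page[0].clone(), page[end_idx].clone()));
        // No extra row came back, so this was the last chunk.
        if page.len() <= chunk_size {
            break;
        }
        cursor = Some(page[end_idx].clone());
    }
    chunks
}
```

With sparse keys such as `[1, 2, 5, 100, 101, 9000, 9001]` and `chunk_size = 3`, every chunk holds at most three rows, whereas splitting the `1..=9001` range linearly would produce mostly empty partitions.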
`dbmazz_snapshot_state` gains typed columns (`start_pk_text`, `end_pk_text`, `pk_kind`) with a backfill from the legacy `i64` columns. Idempotent.

## Cleanup

- Drop dead trait methods: `Source::start_replication`, `checkpoint_position`, `cleanup` (the engine uses `create_loop` only).
- Drop `SinkResult::last_position` (populated but never consumed; the LSN flows through `PipelineEvent::lsn`).

## Verify matrix (dbmazz-mysql:dev, v2.5.0)

- PG -> PG: 18/0/0
- PG -> StarRocks: 17/0/1 (A4 skip expected; SR has no metadata table)
- MySQL -> PG: 18/0/0
- MySQL -> SR: 17/0/1

Zero regressions, full Tier 1 green, including the C11 type round-trip against MySQL sources.
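The first-run bootstrap (H5) and its `status` column migration, as described in the commit message above, can be sketched as a small state machine. This is a minimal sketch under assumptions: the checkpoint row is modeled in memory, `BinlogPos`, `Checkpoint`, `startup_position`, `on_commit`, and `status_migration` are hypothetical names, reusing an existing row of any status is an assumption, and the `ALTER TABLE` text is illustrative rather than the statement dbmazz runs.

```rust
#[derive(Debug, Clone, PartialEq)]
struct BinlogPos {
    file: String,
    pos: u64,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Status {
    Provisional,
    Active,
}

#[derive(Debug, Clone, PartialEq)]
struct Checkpoint {
    pos: BinlogPos,
    status: Option<Status>, // None = legacy row with a NULL status column
}

/// Decide where CDC resumes. `master_status` stands in for running
/// SHOW MASTER STATUS; it is only invoked when no checkpoint exists, and the
/// caller runs this BEFORE spawning the snapshot worker.
fn startup_position<F>(existing: Option<Checkpoint>, master_status: F) -> Checkpoint
where
    F: FnOnce() -> BinlogPos,
{
    match existing {
        // Assumption: an existing row (legacy, provisional, or active) is reused as-is.
        Some(cp) => cp,
        None => Checkpoint {
            pos: master_status(),
            status: Some(Status::Provisional),
        },
    }
}

/// A committed batch advances the position and promotes a provisional row.
fn on_commit(cp: &mut Checkpoint, new_pos: BinlogPos) {
    cp.pos = new_pos;
    cp.status = Some(Status::Active);
}

/// Idempotent migration: emit the ALTER only when the column list returned by
/// a SHOW COLUMNS probe lacks `status`.
fn status_migration(existing_columns: &[&str]) -> Option<&'static str> {
    if existing_columns.iter().any(|c| c.eq_ignore_ascii_case("status")) {
        None
    } else {
        Some("ALTER TABLE dbmazz_checkpoints ADD COLUMN status VARCHAR(16) NULL")
    }
}
```

Passing the position lookup as a closure makes the ordering explicit: the binlog coordinates are read once, before any snapshot rows are copied, so changes made during the snapshot are replayed afterwards rather than missed.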
## Summary

Graduates the MySQL source from BETA → STABLE in v2.5.0. Closes the correctness and UX gaps identified at v2.3.0 (PR #97). After this lands, the README/docs drop the `🧪 Beta` badge, and the C11 type-roundtrip verify check passes against MySQL sources without `--skip C11`.

OpenSpec change: `mysql-cdc-beta-to-ga`.

## Highlights
### Type fidelity

- `Value::UInt64(u64)` + `DataType::UInt64` for `BIGINT UNSIGNED`. Previously dbmazz silently wrapped values ≥ 2^63 to negative `i64`. Sinks map to `NUMERIC(20,0)` (PG/SF) or `LARGEINT` (StarRocks).
- `Value::Timestamp(micros)` replaces the temporary ISO-string shim from PR #97 (feat: add MySQL CDC source (BETA)) for `TIMESTAMP`/`DATETIME` columns. `DATETIME(p)` microseconds are preserved end-to-end (previously the `_us` parameter was dropped).
- `DECIMAL` precision and scale are read from `information_schema.columns`; previously they were hardcoded to `Decimal{38,9}` regardless of source.

### First-run binlog bootstrap (H5)
On a fresh start, `SHOW MASTER STATUS` is captured before the snapshot worker spawns and persisted as `PROVISIONAL`. The post-snapshot CDC stream resumes from that point: no more replaying days of binlogs, and no hard error if old binlogs were purged. The first commit promotes the row to `ACTIVE`.

### Snapshot performance (M3)
Cursor-based keyset paging replaces `MIN(pk)`/`MAX(pk)` + linear partitioning. Each chunk has a bounded row count regardless of PK density, so sparse distributions no longer produce empty or oversized chunks.

### Non-integer PK support (M4)
`VARCHAR`/`CHAR`/`UUID`/`BINARY`/`VARBINARY` primary keys can now be snapshotted. `PkKind::Str` uses `COLLATE utf8mb4_bin` for deterministic byte-wise ordering. Composite-PK and no-PK tables are skipped with WARN logs.

### Cleanup
- Dropped `Source::start_replication`, `checkpoint_position`, `cleanup`: the engine uses `create_loop` exclusively.
- Dropped `SinkResult::last_position`: populated by all three sinks but never consumed.

## Migrations (auto, idempotent)
- `dbmazz_snapshot_state` adds `start_pk_text` / `end_pk_text` / `pk_kind`; relaxes `NOT NULL` on legacy `start_pk` / `end_pk`. Backfills Int-kinded rows.
- `dbmazz_checkpoints` adds a nullable `status VARCHAR(16)` for `PROVISIONAL` bootstrap rows.

## Verify matrix
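Run against `dbmazz-mysql:dev` built from this branch:

- PG → PG: 18/0/0
- PG → StarRocks: 17/0/1 (A4 skip expected)
- MySQL → PG: 18/0/0
- MySQL → StarRocks: 17/0/1

C11 type-roundtrip now passes 7/7 against MySQL, confirming `BIGINT UNSIGNED` + `DATETIME` micros + `DECIMAL` precision + `Value::Timestamp` end-to-end.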
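The C11 round-trip exercises the new value variants. A minimal, std-only sketch of the conversions it relies on, assuming a simplified `Value` enum and hypothetical helpers (`legacy_wrap`, `unsigned_value`, `days_from_civil`, `datetime_to_micros`, `value_to_json`): it shows the old `BIGINT UNSIGNED` wrap, the unwrapped `UInt64` variant, folding `DATETIME(p)` fields (including the microsecond part) into `Value::Timestamp(micros)`, and quoting `UInt64` in JSON. Time-zone handling and the JSON form of timestamps are assumptions of the sketch.

```rust
/// Simplified value model (hypothetical; dbmazz's `Value` has more variants).
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int64(i64),
    UInt64(u64),
    /// Microseconds since 1970-01-01 00:00:00 (time zone handling omitted).
    Timestamp(i64),
}

/// The old bug: reinterpreting BIGINT UNSIGNED as i64 wraps values >= 2^63.
fn legacy_wrap(raw: u64) -> Value {
    Value::Int64(raw as i64)
}

/// The fix: keep the full unsigned range in its own variant.
fn unsigned_value(raw: u64) -> Value {
    Value::UInt64(raw)
}

/// Days since 1970-01-01 for a proleptic Gregorian date
/// (Howard Hinnant's `days_from_civil` algorithm).
fn days_from_civil(year: i64, month: u32, day: u32) -> i64 {
    let y = if month <= 2 { year - 1 } else { year };
    let era = y.div_euclid(400);
    let yoe = y.rem_euclid(400); // year of era, 0..=399
    let mp = (i64::from(month) + 9) % 12; // March = 0, ..., February = 11
    let doy = (153 * mp + 2) / 5 + i64::from(day) - 1; // day of year, 0..=365
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; // day of era
    era * 146_097 + doe - 719_468
}

/// Fold DATETIME(p) fields into epoch microseconds, keeping the fractional
/// part (`micros`, 0..=999_999) instead of dropping it.
fn datetime_to_micros(y: i64, mo: u32, d: u32, h: u32, mi: u32, s: u32, micros: u32) -> Value {
    let secs = days_from_civil(y, mo, d) * 86_400 + i64::from(h * 3_600 + mi * 60 + s);
    Value::Timestamp(secs * 1_000_000 + i64::from(micros))
}

/// JSON text for a value. UInt64 is quoted because many JSON consumers parse
/// numbers as f64, which cannot represent every integer above 2^53.
fn value_to_json(v: &Value) -> String {
    match v {
        Value::Int64(i) => i.to_string(),
        Value::UInt64(u) => format!("\"{u}\""),
        // Assumption: timestamps are serialized as raw microseconds in this sketch.
        Value::Timestamp(us) => us.to_string(),
    }
}
```

For example, `legacy_wrap(1 << 63)` yields `Int64(i64::MIN)`, the negative value the old path produced, while `unsigned_value(u64::MAX)` serializes as the string `"18446744073709551615"`.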
### Static checks

- `cargo fmt --all -- --check`: clean
- `cargo clippy --features mysql-source -- -D warnings`: clean
- `cargo clippy -- -D warnings`: clean
- `cargo test --features mysql-source --lib` → 270 passed / 0 failed
- `cargo test --lib` → 158 passed / 0 failed

## Test plan
- `cargo fmt --all -- --check`
- `cargo clippy --features mysql-source -- -D warnings`
- `cargo clippy -- -D warnings`
- `cargo test --features mysql-source`
- `cargo test`
- `ez-cdc verify` PG → PG: 18/0/0
- `ez-cdc verify` PG → SR: 17/0/1
- `ez-cdc verify` MySQL → PG: 18/0/0 (C11 7/7)
- `ez-cdc verify` MySQL → SR: 17/0/1 (C11 7/7)
- `ez-cdc verify` MySQL → Snowflake (deferred: no SF creds in dev-stack)
- … `#[ignore]`'d test in `tasks.md`)

## Cross-repo
After merge: open a sister PR on the `ez-cdc-cli` README to drop the historical "C11 expected to FAIL with MySQL" disclaimer. The v0.5.3 CHANGELOG already notes that this is resolved with dbmazz ≥ 2.4.0; v2.5.0 makes that note fully accurate without the `--skip C11` recommendation.

## Breaking changes (in-tree only)
- `Value::UInt64` and `DataType::UInt64` are new variants; sinks updated in lockstep.
- `Source` trait method removals; PG/MySQL impls updated in lockstep.
- `SinkResult::last_position` removal; sink impls updated in lockstep.
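Because `DataType::UInt64` is a new enum variant, an exhaustive `match` in each sink is what forces the lockstep updates listed above. A minimal sketch of the introspection-to-DDL path, assuming a simplified `DataType`, a hypothetical `Sink` enum, and hypothetical helpers `map_mysql_column` (reading `DATA_TYPE`, `COLUMN_TYPE`, `NUMERIC_PRECISION`, `NUMERIC_SCALE` from `information_schema.columns`) and `sink_column_type`. The `NUMERIC(20,0)` / `LARGEINT` targets come from this PR; the other column type strings are illustrative and not necessarily what dbmazz emits.

```rust
/// Simplified type model (hypothetical subset of dbmazz's `DataType`).
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Int64,
    UInt64,
    Decimal { p: u32, s: u32 },
    Timestamp,
    String,
}

#[derive(Debug, Clone, Copy)]
enum Sink {
    Postgres,
    Snowflake,
    StarRocks,
}

/// Map one row of information_schema.columns to a DataType.
/// `data_type` is DATA_TYPE (e.g. "bigint"), `column_type` is COLUMN_TYPE
/// (e.g. "bigint(20) unsigned"), precision/scale are NUMERIC_PRECISION/NUMERIC_SCALE.
fn map_mysql_column(data_type: &str, column_type: &str, precision: Option<u32>, scale: Option<u32>) -> DataType {
    // DATA_TYPE omits signedness; only COLUMN_TYPE carries "unsigned".
    let unsigned = column_type.to_ascii_lowercase().contains("unsigned");
    match data_type.to_ascii_lowercase().as_str() {
        "bigint" if unsigned => DataType::UInt64,
        "bigint" => DataType::Int64,
        // Use the declared precision/scale; fall back to MySQL's DECIMAL default (10,0).
        "decimal" => DataType::Decimal { p: precision.unwrap_or(10), s: scale.unwrap_or(0) },
        "datetime" | "timestamp" => DataType::Timestamp,
        // Simplification for the sketch: everything else is treated as text.
        _ => DataType::String,
    }
}

/// Column type emitted by each sink. The match on `dt` is exhaustive, so adding
/// a DataType variant does not compile until every arm handles it.
fn sink_column_type(sink: Sink, dt: DataType) -> String {
    match dt {
        DataType::Int64 => "BIGINT".to_string(),
        // 2^64 - 1 has 20 decimal digits; StarRocks LARGEINT is a 128-bit signed integer.
        DataType::UInt64 => match sink {
            Sink::Postgres | Sink::Snowflake => "NUMERIC(20,0)".to_string(),
            Sink::StarRocks => "LARGEINT".to_string(),
        },
        DataType::Decimal { p, s } => format!("DECIMAL({p},{s})"),
        DataType::Timestamp => match sink {
            Sink::StarRocks => "DATETIME".to_string(),
            Sink::Postgres | Sink::Snowflake => "TIMESTAMP".to_string(),
        },
        DataType::String => match sink {
            Sink::StarRocks => "STRING".to_string(),
            Sink::Postgres | Sink::Snowflake => "TEXT".to_string(),
        },
    }
}
```

Nesting the per-sink `match` inside each `DataType` arm keeps both enums exhaustively checked: a new source type or a new sink is a compile error until every combination has a mapping.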