Skip to content

metrics: add DML rows affected metric#3909

Merged
ti-chi-bot[bot] merged 49 commits intopingcap:masterfrom
wk989898:metric-affected
Mar 2, 2026
Merged

metrics: add DML rows affected metric#3909
ti-chi-bot[bot] merged 49 commits intopingcap:masterfrom
wk989898:metric-affected

Conversation

@wk989898
Copy link
Collaborator

@wk989898 wk989898 commented Jan 4, 2026

What problem does this PR solve?

Issue Number: close #3889

What is changed and how it works?

  • New DML Rows Affected Metric: Introduced ExecDMLEventRowsAffectedCounter to track the number of rows affected by DML operations (insert, update, delete) in the sink, providing granular insights into data changes.
  • Grafana Dashboard Integration: Added a new panel named "Sink Event Row Affected Count / s" to the ticdc_new_arch.json, ticdc_new_arch_next_gen.json, and ticdc_new_arch_with_keyspace_name.json Grafana dashboards to visualize this new metric, enhancing monitoring capabilities.
  • Enhanced DML Tracking Logic: Modified the DML preparation and execution logic in mysql_writer_dml.go to record both actual and expected affected rows for each DML type, leveraging a new rowTypes field in preparedDMLs.
  • Statistics Module Update: The pkg/metrics/statistics.go file now includes dedicated functions (RecordTotalRowsAffected and RecordRowsAffected) to accurately record these new row affected metrics, ensuring comprehensive data collection.

update go-sql-driver/mysql, it influences the decoder if querying the db by the snapshot read. see https://github.com/go-sql-driver/mysql/releases/tag/v1.8.0

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  1. Create a changefeed with sink-uri "mysql://root:@127.0.0.1:3306/?batch-dml-enable=false&multi-stmt-enable=false"
  2. Insert some data into the upstream
  3. Delete some data from the downstream
截屏2026-02-25 12 10 04

Questions

Will it cause performance regression or break compatibility?
Do you need to update user documentation, design documentation or monitoring documentation?

Release note

add DML rows affected metric

Summary by CodeRabbit

  • New Features

    • New Grafana visualizations: "EventService Output Different DML Event Types / s" and "Row Affected Count / s" added to dashboards.
  • Improvements

    • New observability metrics: per-mode DML-type counts and per-row-type affected-row counts.
    • MySQL sink now preserves and reports row-type information through batching, improving accuracy of affected-row metrics and dashboard reporting.

@ti-chi-bot ti-chi-bot bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jan 4, 2026
@gemini-code-assist
Copy link

Summary of Changes

Hello @wk989898, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances TiCDC's monitoring capabilities by introducing a new metric that precisely tracks the number of rows affected by Data Manipulation Language (DML) operations as they are flushed to the downstream system. This provides critical insights into the volume and type of data changes being replicated, improving observability and enabling more accurate performance analysis of the data synchronization process. The new metric is also integrated into the existing Grafana dashboards for immediate visualization.

Highlights

  • New DML Rows Affected Metric: Introduced ExecDMLEventRowsAffectedCounter to track the number of rows affected by DML operations (insert, update, delete) in the sink, providing granular insights into data changes.
  • Grafana Dashboard Integration: Added a new panel named "Sink Event Row Affected Count / s" to the ticdc_new_arch.json, ticdc_new_arch_next_gen.json, and ticdc_new_arch_with_keyspace_name.json Grafana dashboards to visualize this new metric, enhancing monitoring capabilities.
  • Enhanced DML Tracking Logic: Modified the DML preparation and execution logic in mysql_writer_dml.go to record both actual and expected affected rows for each DML type, leveraging a new rowTypes field in preparedDMLs.
  • Statistics Module Update: The pkg/metrics/statistics.go file now includes dedicated functions (RecordTotalRowsAffected and RecordRowsAffected) to accurately record these new row affected metrics, ensuring comprehensive data collection.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new metric, ticdc_sink_dml_event_affected_row_count, to monitor the number of rows affected by DML operations in the sink. The changes are well-implemented, including the definition of the new Prometheus counter, the logic in the MySQL sink to track DML row types, and the recording of both expected and actual affected row counts. The Grafana dashboards have also been updated to visualize this new metric. I have a couple of suggestions for pkg/sink/mysql/mysql_writer_dml.go to improve code maintainability and cleanliness.

@wk989898 wk989898 added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 8, 2026
@wk989898 wk989898 removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 15, 2026
@lidezhu
Copy link
Collaborator

lidezhu commented Jan 15, 2026

Please paste the screenshots of your newly added metrics in the pr description.

if err != nil {
return err
}
shouldSplit, err = event.IsUKChanged(rawEvent, p.currentTxn.CurrentDMLEvent.TableInfo)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a type like 'PkChanged'?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. If the PK changed, there should be two raw events.

Signed-off-by: wk989898 <nhsmwk@gmail.com>
Signed-off-by: wk989898 <nhsmwk@gmail.com>
Signed-off-by: wk989898 <nhsmwk@gmail.com>
Signed-off-by: wk989898 <nhsmwk@gmail.com>
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 12, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds Grafana panels for DML event types and affected rows; introduces EventService and sink Prometheus counters; threads a mode parameter through event scanning and DML processing; and propagates per-row-type metadata through MySQL sink SQL generation and tests.

Changes

Cohort / File(s) Summary
Grafana dashboards
metrics/grafana/ticdc_new_arch.json, metrics/nextgengrafana/ticdc_new_arch_next_gen.json, metrics/nextgengrafana/ticdc_new_arch_with_keyspace_name.json
Added panels: "EventService Output Different DML Event Types / s" and "Row Affected Count / s" (PromQL targets). Panels inserted in multiple dashboard sections (duplication observed).
EventService flow & raw entry classification
pkg/common/kv_entry.go, pkg/eventservice/event_scanner.go, pkg/eventservice/metrics_collector.go
Added RawKVEntry.GetType(), threaded mode int64 into scanAndMergeEvents and dmlProcessor, updated constructors/signatures to accept mode, and added helper to emit mode-aware DML-type metrics.
Prometheus metrics & statistics API
pkg/metrics/event_service.go, pkg/metrics/sink.go, pkg/metrics/statistics.go
Added EventServiceSendDMLTypeCount (labels: mode,dml_type) and ExecDMLEventRowsAffectedCounter (labels: keyspace,changefeed,type,row_type); added RecordRowsAffected and RecordTotalRowsAffected and registered the new metrics.
MySQL sink: DML generation, batching, and SQL builders
pkg/sink/mysql/...
pkg/sink/mysql/mysql_writer_dml.go, pkg/sink/mysql/mysql_writer_dml_batch.go, pkg/sink/mysql/sql_builder.go, pkg/sink/mysql/mysql_writer_dml_test.go
Propagated per-event rowTypes through SQL generation and batching functions (many signatures now return []common.RowType); added preparedDMLs.rowTypes and RowsAffected(); updated callers and tests to handle the extra return and compute rows-affected.
Other small changes
pkg/metrics/..., pkg/eventservice/...
Metric registration and emission call sites adjusted to include mode and row-type mappings; control-flow updates to ensure mode-aware metrics are emitted consistently.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Source as EventSource
  participant ES as EventService
  participant Metrics as Metrics
  participant Sink as Sink
  participant MySQL as MySQLWriter

  Source->>ES: send raw KV events (include mode)
  ES->>ES: classify event via RawKVEntry.GetType()
  ES->>Metrics: update EventServiceSendDMLTypeCount(mode,dml_type)
  ES->>Sink: forward DML events (include mode, raw type)
  Sink->>MySQL: generate batched SQLs and collect rowTypes
  MySQL->>Metrics: update ExecDMLEventRowsAffectedCounter(keyspace,changefeed,type,row_type)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested labels

lgtm, approved

Suggested reviewers

  • lidezhu
  • hongyunyan
  • flowbehappy

Poem

🐇 I hopped through events, mode in tow,
counted inserts, updates, deletes as they go.
Batches baked with row-types known,
metrics sing and dashboards grown,
a carrot chart to watch metrics glow. 🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The PR title 'metrics: add DML rows affected metric' accurately summarizes the main change of introducing a new metric to track affected rows in DML operations.
Linked Issues check ✅ Passed The PR implements the requested feature #3889 by adding 'affected rows' metrics and new Grafana dashboard panels. However, 'PK/UK updated' metrics mentioned in the issue are partially addressed through GetType() method but not as a dedicated metric.
Out of Scope Changes check ✅ Passed Changes are focused on adding DML row-affected metrics and Grafana visualizations as requested in #3889. The addition of GetType() and mode propagation in event_scanner.go are in-scope for supporting metric categorization.
Description check ✅ Passed The PR description is comprehensive and well-structured, covering all required sections: problem statement with issue reference, technical changes explanation, test approach with manual test details and screenshots, and release notes.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Signed-off-by: wk989898 <nhsmwk@gmail.com>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
pkg/sink/mysql/sql_builder.go (1)

144-150: ⚠️ Potential issue | 🔴 Critical

Bug: rowTypes is not cleared in reset(), causing stale data when the object is reused from dmlsPool.

Since preparedDMLs instances are pooled via dmlsPool (line 134), omitting rowTypes from reset() means a recycled object will carry over the previous caller's row types, inflating RowsAffected() and corrupting metrics.

Proposed fix
 func (d *preparedDMLs) reset() {
 	d.sqls = d.sqls[:0]
 	d.values = d.values[:0]
 	d.tsPairs = d.tsPairs[:0]
+	d.rowTypes = d.rowTypes[:0]
 	d.rowCount = 0
 	d.approximateSize = 0
 }
pkg/sink/mysql/mysql_writer_dml.go (1)

112-123: ⚠️ Potential issue | 🔴 Critical

genActiveActiveSQL does not return rowTypes — causes index-out-of-bounds in sequenceExecute.

When EnableActiveActive is true (line 112), genActiveActiveSQL returns only ([]string, [][]interface{}) without row types. The rowTypesList remains empty, but dmls.sqls is populated. Later in sequenceExecute at line 777, dmls.rowTypes[i] will panic.

Either genActiveActiveSQL needs to be updated to return []common.RowType, or sequenceExecute needs a bounds check on dmls.rowTypes.

Proposed fix — add bounds check in sequenceExecute
 		rowsAffected, err := res.RowsAffected()
 		if err != nil {
 			log.Warn("get rows affected rows failed", zap.Error(err))
-		} else {
+		} else if i < len(dmls.rowTypes) {
 			w.statistics.RecordRowsAffected(rowsAffected, dmls.rowTypes[i])
+		} else {
+			w.statistics.RecordTotalRowsAffected(rowsAffected, 1)
 		}
🤖 Fix all issues with AI agents
In `@pkg/eventservice/event_scanner.go`:
- Around line 760-778: The change causes event.IsUKChanged to be called for all
updates, changing semantics and cost when p.outputRawChangeEvent is true; revert
to the original behavior by short-circuiting when outputRawChangeEvent is true:
call updateMetricEventServiceSendDMLTypeCount(p.mode, rawType, false) and return
p.currentTxn.AppendRow(rawEvent, p.mounter.DecodeToChunk, p.filter) without
invoking event.IsUKChanged; only call event.IsUKChanged and
updateMetricEventServiceSendDMLTypeCount with the real shouldSplit when
outputRawChangeEvent is false so UK-change decoding happens only when splitting
is actually allowed.

In `@pkg/metrics/event_service.go`:
- Around line 173-178: The Help string for the Prometheus metric created in
EventServiceSendDMLTypeCount (prometheus.NewCounterVec with CounterOpts) has a
grammar mistake; update the CounterOpts.Help value from "The number of different
dml events type sent by the event service" to "The number of different dml event
types sent by the event service" so the metric help text reads correctly.

In `@pkg/metrics/statistics.go`:
- Around line 111-124: The Close() method currently omits cleanup for
ExecDMLEventRowsAffectedCounter causing metric label leaks; update Close() to
call ExecDMLEventRowsAffectedCounter.Delete(...) for the same label sets you use
elsewhere by iterating/removing combinations of keyspace
(b.changefeedID.Keyspace()), changefeedID (b.changefeedID.Name()), the
"actual"/"expected" value dimension and the four row_type values ("total",
"insert", "update", "delete") so that any time-series created by
RecordTotalRowsAffected and RecordRowsAffected are deleted when Close() runs.

In `@pkg/sink/mysql/mysql_writer_dml.go`:
- Around line 571-589: There is a duplicate method generateNormalSQLs; remove
this redundant declaration from this file and rely on the single canonical
implementation in mysql_writer_dml_batch.go (or vice versa if you prefer to keep
this one) so the build no longer sees two definitions. If you remove this file's
generateNormalSQLs, ensure any callers still resolve (they can call the
remaining generateNormalSQLs) and, if needed, replace direct logic here with a
call to the kept function (generateNormalSQLs) or to the helper
generateNormalSQL to avoid duplicating behavior; also clean up any now-unused
imports/variables in this file.
- Around line 366-371: The current conflict-detection path in
mysql_writer_dml.go logs full SQL and values with log.Info immediately before
log.Panic (see variables/functions nextRowType, compareKeys, rowKey, rowLists,
and generateBatchSQLsPerEvent), which is noisy and may leak sensitive data;
change the log.Info call to log.Error or log.Warn and remove or redact sensitive
payloads (avoid logging full values array or replace with a safe summary like
value counts/types or a hashed/redacted representation) while keeping contextual
fields (e.g., writer id, i, j, Events metadata) so the panic still includes
useful non-PII diagnostics.
- Around line 870-891: genUpdateSQL may return multiple SQL strings but the TiDB
branch only appends a single common.RowTypeUpdate per call, causing len(sqls) >
len(rowTypes) and panics in sequenceExecute; fix by appending one RowTypeUpdate
per SQL returned: after s, v := w.genUpdateSQL(rows...) and appending s and v to
sqls/values, append len(s) copies of common.RowTypeUpdate to rowTypes (e.g.,
loop or use slice multiplication) so rowTypes, sqls, and values stay aligned.
- Around line 813-818: The current code calls res.RowsAffected() and passes that
to w.statistics.RecordTotalRowsAffected, but res.RowsAffected() on
multi-statement transactions returns only the final statement's count (often 0);
change the logic to use the aggregated count from dmls.RowsAffected() (or if you
can access the driver raw result, use mysql.Result.AllRowsAffected()) when
calling RecordTotalRowsAffected so total rows across all statements are
recorded; update the block around res.RowsAffected() (referencing
res.RowsAffected(), dmls.RowsAffected(), and
w.statistics.RecordTotalRowsAffected()) to prefer dmls.RowsAffected() and fall
back to per-statement totals only if available from the raw mysql result.
🧹 Nitpick comments (7)
pkg/sink/mysql/sql_builder.go (1)

134-142: Pool's New func doesn't pre-allocate rowTypes, unlike the other slices.

For consistency and to avoid an initial nil-slice allocation on first append, consider pre-allocating rowTypes the same way sqls, values, and tsPairs are.

Proposed fix
 var dmlsPool = sync.Pool{
 	New: func() interface{} {
 		return &preparedDMLs{
 			sqls:    make([]string, 0, 128),
 			values:  make([][]interface{}, 0, 128),
+			rowTypes: make([]common.RowType, 0, 128),
 			tsPairs: make([]tsPair, 0, 128),
 		}
 	},
 }
pkg/common/kv_entry.go (1)

78-93: Minor: redundant condition in else if.

On line 84, since we already know len(v.OldValue) != 0 (the if on line 82 was false), the len(v.OldValue) > 0 check is always true here. You can simplify to just len(v.Value) > 0.

Also note: an OpTypePut with OldValue present but Value empty will fall through to "unknown". This matches the existing IsInsert/IsUpdate helpers, so it seems intentional, but worth a comment if that state is truly impossible.

Simplification
 func (v *RawKVEntry) GetType() string {
 	switch v.OpType {
 	case OpTypePut:
 		if len(v.OldValue) == 0 {
 			return "insert"
-		} else if len(v.OldValue) > 0 && len(v.Value) > 0 {
+		} else if len(v.Value) > 0 {
 			return "update"
 		}
 	case OpTypeDelete:
pkg/metrics/statistics.go (1)

111-116: Exported methods missing doc comments.

RecordTotalRowsAffected and RecordRowsAffected are exported but lack Go doc comments. The rest of the exported methods in this file (RecordBatchExecution, RecordDDLExecution, Close) all have doc comments.

Add doc comments
+// RecordTotalRowsAffected records the total actual and expected rows affected by DML execution.
 func (b *Statistics) RecordTotalRowsAffected(actualRowsAffected, expectedRowsAffected int64) {
+// RecordRowsAffected records the actual rows affected for a specific row type and updates the total counters.
 func (b *Statistics) RecordRowsAffected(rowsAffected int64, rowType common.RowType) {
pkg/sink/mysql/mysql_writer_dml.go (1)

681-685: tx.Commit() error not wrapped with errors.Trace.

Per coding guidelines, errors from library calls should be wrapped immediately. The error from tx.Commit() is returned bare.

Proposed fix
 		if err = tx.Commit(); err != nil {
-			return 0, 0, err
+			return 0, 0, errors.Trace(err)
 		}

As per coding guidelines, "When an error comes from a third-party or library call in Go, wrap it immediately with errors.Trace(err) or errors.WrapError(...) to attach a stack trace".

pkg/sink/mysql/mysql_writer_dml_test.go (2)

149-156: rowTypes values not asserted beyond length checks.

Throughout the tests, rowTypes is captured but only its length is sometimes asserted (e.g., line 152). The actual content of rowTypes (e.g., whether it contains RowTypeInsert, RowTypeDelete, RowTypeUpdate) is never verified. Adding value assertions would strengthen the test coverage for the new metric propagation.

For example, at line 152 you could additionally check:

require.Equal(t, common.RowTypeInsert, rowTypes[0])
require.Equal(t, common.RowTypeInsert, rowTypes[1])

160-163: Missing len(rowTypes) assertion for safe-mode batch SQL test.

Line 160 captures rowTypes from generateBatchSQL in safe mode, but unlike the unsafe-mode test at line 152, there is no require.Equal(t, 1, len(rowTypes)) assertion here. This should be consistent with other test cases.

Proposed fix
 	sql, args, rowTypes = writer.generateBatchSQL([]*commonEvent.DMLEvent{dmlInsertEvent, dmlInsertEvent2, dmlInsertEvent3})
 	require.Equal(t, 1, len(sql))
 	require.Equal(t, 1, len(args))
+	require.Equal(t, 1, len(rowTypes))
 	require.Equal(t, "REPLACE INTO `test`.`t` (`id`,`name`) VALUES (?,?),(?,?),(?,?)", sql[0])
pkg/eventservice/event_scanner.go (1)

143-143: Redundant mode parameter — scanAndMergeEvents already has access via receiver.

scanAndMergeEvents is a method on *eventScanner, which already stores mode as a field (s.mode). Passing it as an explicit parameter is unnecessary.

♻️ Proposed simplification
-func (s *eventScanner) scanAndMergeEvents(
-	session *session,
-	merger *eventMerger,
-	iter eventstore.EventIterator,
-	mode int64,
-) (bool, error) {
+func (s *eventScanner) scanAndMergeEvents(
+	session *session,
+	merger *eventMerger,
+	iter eventstore.EventIterator,
+) (bool, error) {
 	tableID := session.dataRange.Span.TableID
 	dispatcher := session.dispatcherStat
-	processor := newDMLProcessor(s.mounter, s.schemaGetter, dispatcher.filter, dispatcher.info.IsOutputRawChangeEvent(), mode)
+	processor := newDMLProcessor(s.mounter, s.schemaGetter, dispatcher.filter, dispatcher.info.IsOutputRawChangeEvent(), s.mode)

And update the call site:

-	interrupted, err := s.scanAndMergeEvents(sess, merger, iter, s.mode)
+	interrupted, err := s.scanAndMergeEvents(sess, merger, iter)

Also applies to: 186-194

Comment on lines +173 to +178
EventServiceSendDMLTypeCount = prometheus.NewCounterVec(prometheus.CounterOpts{
Namespace: "ticdc",
Subsystem: "event_service",
Name: "send_dml_type_count",
Help: "The number of different dml events type sent by the event service",
}, []string{"mode", "dml_type"})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Minor grammar nit in Help text.

"The number of different dml events type sent by the event service" should be "The number of different dml event types sent by the event service".

Proposed fix
 	EventServiceSendDMLTypeCount = prometheus.NewCounterVec(prometheus.CounterOpts{
 		Namespace: "ticdc",
 		Subsystem: "event_service",
 		Name:      "send_dml_type_count",
-		Help:      "The number of different dml events type sent by the event service",
+		Help:      "The number of different dml event types sent by the event service",
 	}, []string{"mode", "dml_type"})
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
EventServiceSendDMLTypeCount = prometheus.NewCounterVec(prometheus.CounterOpts{
Namespace: "ticdc",
Subsystem: "event_service",
Name: "send_dml_type_count",
Help: "The number of different dml events type sent by the event service",
}, []string{"mode", "dml_type"})
EventServiceSendDMLTypeCount = prometheus.NewCounterVec(prometheus.CounterOpts{
Namespace: "ticdc",
Subsystem: "event_service",
Name: "send_dml_type_count",
Help: "The number of different dml event types sent by the event service",
}, []string{"mode", "dml_type"})
🤖 Prompt for AI Agents
In `@pkg/metrics/event_service.go` around lines 173 - 178, The Help string for the
Prometheus metric created in EventServiceSendDMLTypeCount
(prometheus.NewCounterVec with CounterOpts) has a grammar mistake; update the
CounterOpts.Help value from "The number of different dml events type sent by the
event service" to "The number of different dml event types sent by the event
service" so the metric help text reads correctly.

Comment on lines +366 to +371
if nextRowType == common.RowTypeInsert {
if compareKeys(rowKey, rowLists[j].RowKeys) {
sql, values, rowTypes := w.generateBatchSQLsPerEvent(events)
log.Info("normal sql should be", zap.Any("sql", sql), zap.Any("values", values), zap.Any("rowTypes", rowTypes), zap.Int("writerID", w.id))
log.Panic("Here are two invalid rows with the same row type and keys", zap.Any("Events", events), zap.Any("i", i), zap.Any("j", j), zap.Int("writerID", w.id))
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

log.Info with SQL values on conflict detection paths — noisy and potentially sensitive.

Lines 369 and 408 use log.Info to dump SQL queries and their values right before a log.Panic. Since log.Panic will already terminate the process, this log.Info line should use log.Error or log.Warn at minimum, and consider whether logging full SQL values is appropriate in production (may contain PII).

🤖 Prompt for AI Agents
In `@pkg/sink/mysql/mysql_writer_dml.go` around lines 366 - 371, The current
conflict-detection path in mysql_writer_dml.go logs full SQL and values with
log.Info immediately before log.Panic (see variables/functions nextRowType,
compareKeys, rowKey, rowLists, and generateBatchSQLsPerEvent), which is noisy
and may leak sensitive data; change the log.Info call to log.Error or log.Warn
and remove or redact sensitive payloads (avoid logging full values array or
replace with a safe summary like value counts/types or a hashed/redacted
representation) while keeping contextual fields (e.g., writer id, i, j, Events
metadata) so the panic still includes useful non-PII diagnostics.

Signed-off-by: wk989898 <nhsmwk@gmail.com>
Signed-off-by: wk989898 <nhsmwk@gmail.com>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/sink/mysql/mysql_writer_dml.go (1)

92-107: ⚠️ Potential issue | 🟠 Major

Active-active path omits rowTypes tracking — metrics will silently fail.

The active-active SQL generation path (genActiveActiveSQL and its callees) returns only ([]string, [][]interface{}) while all other paths return ([]string, [][]interface{}, []common.RowType). At line 96, genActiveActiveSQL is called but only two values are captured, leaving rowTypesList uninitialized. Line 106 then appends nil to dmls.rowTypes, creating a length mismatch: dmls.sqls will be populated while dmls.rowTypes remains empty. The RowsAffected() method (sql_builder.go:120) will iterate an empty slice and return 0, causing metrics collection to silently fail for active-active replication.

Update genActiveActiveSQL, generateActiveActiveNormalSQLs, and generateActiveActiveBatchSQL to return []common.RowType alongside the current return values, and capture all three values at line 96.

Signed-off-by: wk989898 <nhsmwk@gmail.com>
Signed-off-by: wk989898 <nhsmwk@gmail.com>
Signed-off-by: wk989898 <nhsmwk@gmail.com>
Signed-off-by: wk989898 <nhsmwk@gmail.com>
@wk989898 wk989898 added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 26, 2026
Signed-off-by: wk989898 <nhsmwk@gmail.com>
Signed-off-by: wk989898 <nhsmwk@gmail.com>
Signed-off-by: wk989898 <nhsmwk@gmail.com>
@wk989898 wk989898 removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 26, 2026
Signed-off-by: wk989898 <nhsmwk@gmail.com>
Signed-off-by: wk989898 <nhsmwk@gmail.com>
Signed-off-by: wk989898 <nhsmwk@gmail.com>
@wk989898 wk989898 added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Feb 27, 2026
Signed-off-by: wk989898 <nhsmwk@gmail.com>
@wk989898 wk989898 added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Feb 28, 2026
Signed-off-by: wk989898 <nhsmwk@gmail.com>
@wk989898
Copy link
Collaborator Author

/test all

Signed-off-by: wk989898 <nhsmwk@gmail.com>
Signed-off-by: wk989898 <nhsmwk@gmail.com>
Signed-off-by: wk989898 <nhsmwk@gmail.com>
@wk989898
Copy link
Collaborator Author

/test all

Signed-off-by: wk989898 <nhsmwk@gmail.com>
Signed-off-by: wk989898 <nhsmwk@gmail.com>
@wk989898 wk989898 removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 2, 2026
@ti-chi-bot ti-chi-bot bot merged commit bd20237 into pingcap:master Mar 2, 2026
26 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved lgtm release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add more metrics to improve observability

4 participants