Skip to content

importinto,external: run merge serially with 64-file groups#68135

Open
joechenrh wants to merge 11 commits intopingcap:masterfrom
joechenrh:codex/limit-open-readers-64
Open

importinto,external: run merge serially with 64-file groups#68135
joechenrh wants to merge 11 commits intopingcap:masterfrom
joechenrh:codex/limit-open-readers-64

Conversation

@joechenrh
Copy link
Copy Markdown
Contributor

@joechenrh joechenrh commented Apr 30, 2026

What problem does this PR solve?

Issue Number: ref #00000

Problem Summary:

For IMPORT INTO global-sort merge-sort workloads with many overlap files, concurrent merge workers can create a high burst of object-storage reads and cause unstable throughput.

What changed and how does it work?

  • Run merge-sort tasks serially in import-into merge step:
    • NewMergeOperator(..., concurrency=1, ...)
    • MergeOverlappingFiles(..., concurrency=1, ...)
  • Reduce per-merge-task file fan-in cap:
    • MaxMergingFilesPerThread from 250 to 64

Net effect:

  • One merge task runs at a time per subtask.
  • Each merge task handles at most 64 files.
  • Read pressure is smoother and more predictable.

Check List

Tests

  • Unit test
    • go test ./pkg/lightning/backend/external -run 'TestMergeKVIter|TestMergeKVIterPassWrongParam|TestSplitDataFiles|TestByteReader|TestByteReaderAuxBuf' -count=1
    • go test ./pkg/dxf/importinto -run 'TestGetOnDupForKVGroup' -count=1
  • Integration test
  • Manual test (add detailed scripts or steps below)
    • Verify in real environment with IMPORT INTO global-sort: compare merge-sort throughput/latency stability and object-storage retry/reset behavior before/after.
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

None

Summary by CodeRabbit

  • Bug Fixes

    • Merge operations now run with enforced serial merging and smaller per-part file counts for more stable, predictable imports.
  • New Features

    • Added merge-sort read and write timing metrics for cloud storage operations.
    • Added end-to-end import runtime and per-job timing metrics, plus extra timing logged for internal data transfer steps to aid performance analysis.

@ti-chi-bot
Copy link
Copy Markdown

ti-chi-bot Bot commented Apr 30, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@ti-chi-bot ti-chi-bot Bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note-none Denotes a PR that doesn't merit a release note. labels Apr 30, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 30, 2026

Review Change Stack
No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: a3e54ea4-4ddd-4dc5-9901-d0c77bb9c5a1

📥 Commits

Reviewing files that changed from the base of the PR and between 2a8d31b and 4c583ea.

📒 Files selected for processing (1)
  • pkg/lightning/backend/external/engine.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/lightning/backend/external/engine.go

📝 Walkthrough

Walkthrough

For merge-sort, concurrency is hardcoded to 1 and MaxMergingFilesPerThread is reduced (250→64); Prometheus histograms were added to observe merge-sort read and write durations; engine send-to-out timing was added. Separately, import runtime stats were introduced and threaded through job generation and workers with final summary logging.

Changes

Merge Concurrency Serialization & I/O Metrics

Layer / File(s) Summary
Merge Configuration Constant
pkg/lightning/backend/external/merge.go
MaxMergingFilesPerThread changed from 250 to 64, altering share sizing in splitDataFiles.
Merge Task Scheduling
pkg/dxf/importinto/task_executor.go
mergeSortStepExecutor.RunSubtask now passes 1 (serial) instead of int(m.GetResource().CPU.Capacity()) to external.NewMergeOperator and external.MergeOverlappingFiles.
Merge-read instrumentation
pkg/lightning/backend/external/iter.go, pkg/lightning/backend/external/byte_reader.go
NewMergeKVIter sets byteReader.mergeSortReadHist; byteReader imports time and records reload latency to the histogram when configured.
Merge-write instrumentation
pkg/lightning/backend/external/onefile_writer.go
OneFileWriter gains mergeSortWriteHist and a durationObservedWriter wrapper; writes and closes are timed and observed; histogram initialized in InitPartSizeAndLogger.
Engine send timing
pkg/lightning/backend/external/engine.go
loadRangeBatchData records duration spent sending DataAndRanges to outCh and logs sendToOutChDur alongside existing metrics.

Local Import Runtime Stats

Layer / File(s) Summary
Stats data structure
pkg/lightning/backend/local/import_runtime_stats.go
Adds importRuntimeStats with atomic counters and helper methods; zapFields() exposes metrics as zap fields.
Worker timing instrumentation
pkg/lightning/backend/local/job_worker.go
regionJobBaseWorker gains stats *importRuntimeStats; job run, write, and ingest durations are measured and reported when enabled; getRegionJobWorkerPool accepts optional stats.
Import pipeline wiring
pkg/lightning/backend/local/local.go
doImport creates runtimeStats, passes it into job generation and worker setup; generation and job-send durations are measured; import runtime summary is logged at completion.

Sequence Diagram(s)

sequenceDiagram
  participant doImport
  participant JobGenerator
  participant JobPool
  participant Worker
  participant RuntimeStats
  doImport->>RuntimeStats: create
  doImport->>JobGenerator: pass RuntimeStats
  JobGenerator->>JobPool: send jobs (timed)
  JobPool->>Worker: create with RuntimeStats
  Worker->>RuntimeStats: record durations
  doImport->>doImport: log runtime summary from RuntimeStats
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • pingcap/tidb#66958: Also modifies pkg/dxf/importinto/task_executor.go merge execution arguments.
  • pingcap/tidb#66564: Related changes to concurrency sizing for merge/encode operations.

Suggested labels

component/import, component/lightning, ok-to-test

Suggested reviewers

  • GMHDBJD
  • Benjamin2037

Poem

🐰 I time each read and watch each write,
I shrink the batches, merge by single light,
One thread hops onward, tidy and neat,
Counters count hops and durations complete,
When import ends, the logs hum soft and sweet.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title directly summarizes the main changes: running merge serially with 64-file groups in importinto and external modules.
Description check ✅ Passed The description includes the required sections with clear problem statement, detailed explanation of changes, and a comprehensive checklist with unit tests and manual verification.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@ti-chi-bot ti-chi-bot Bot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Apr 30, 2026
@tiprow
Copy link
Copy Markdown

tiprow Bot commented Apr 30, 2026

Hi @joechenrh. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ti-chi-bot ti-chi-bot Bot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 6, 2026
@joechenrh joechenrh force-pushed the codex/limit-open-readers-64 branch 2 times, most recently from 35034c8 to 6ec846d Compare May 7, 2026 06:40
@ti-chi-bot ti-chi-bot Bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels May 7, 2026
@joechenrh joechenrh force-pushed the codex/limit-open-readers-64 branch from 6ec846d to afd283e Compare May 7, 2026 06:50
@ti-chi-bot ti-chi-bot Bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 7, 2026
@ti-chi-bot
Copy link
Copy Markdown

ti-chi-bot Bot commented May 7, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign benjamin2037 for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot Bot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels May 7, 2026
@ti-chi-bot ti-chi-bot Bot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels May 7, 2026
@joechenrh joechenrh marked this pull request as ready for review May 7, 2026 09:40
@ti-chi-bot ti-chi-bot Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 7, 2026
@pantheon-ai
Copy link
Copy Markdown

pantheon-ai Bot commented May 7, 2026

@joechenrh I've received your pull request and will start the review. I'll conduct a thorough review covering code quality, potential issues, and implementation details.

⏳ This process typically takes 10-30 minutes depending on the complexity of the changes.

ℹ️ Learn more details on Pantheon AI.

@joechenrh joechenrh changed the title lightning: limit merge reader open concurrency to 64 importinto,external: run merge serially with 64-file groups May 7, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/dxf/importinto/task_executor.go`:
- Around line 525-533: The call to external.MergeOverlappingFiles currently
hardcodes the merge-task concurrency to 1 (the third arg), which serializes
merges; change this to use a configurable or CPU-based value (e.g.,
mergeTaskConcurrency from config or runtime.NumCPU()/max(1, cpu/…)) and keep a
separate per-task reader-open cap (the parameter that controls reader-open
bursts) set to 1 to limit startup open pressure; update the call site of
external.MergeOverlappingFiles (the invocation with wctx, sm.DataFiles, 1, ...)
to pass mergeTaskConcurrency for task-level parallelism and the reader-open
concurrency value for per-task limit so merge parallelism is preserved while
bounding reader-open bursts.

In `@pkg/lightning/backend/external/merge.go`:
- Around line 40-41: Update the in-file worked memory estimate comment to
reflect the reduced MaxMergingFilesPerThread value (MaxMergingFilesPerThread =
64) so the example matches current constants; locate the existing worked memory
estimate comment in the same file (merge.go) and change the numeric example from
250 to an appropriate value derived from 64 (or recalc the example using
MaxMergingFilesPerThread) and clarify the units so future readers/operators see
an accurate, maintainable performance trade-off tied to
MaxMergingFilesPerThread.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: dccadb49-3c70-4be0-b39a-c0978ed7cf67

📥 Commits

Reviewing files that changed from the base of the PR and between 84d8269 and 770d10e.

📒 Files selected for processing (2)
  • pkg/dxf/importinto/task_executor.go
  • pkg/lightning/backend/external/merge.go

Comment on lines +525 to +533
1,
false,
onDup,
)

if err = external.MergeOverlappingFiles(
wctx,
sm.DataFiles,
int(m.GetResource().CPU.Capacity()), // the concurrency used to split subtask
1, // run merge tasks serially
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

This change serializes merge execution, not just reader-open bursts.

Setting both concurrencies to 1 fully disables merge parallelism and can significantly increase merge wall time. If the goal is to cap startup open pressure to 64, keep merge-task concurrency configurable (or CPU-based) and only bound per-task reader-open concurrency.

As per coding guidelines, important performance trade-offs should be explicit and maintainable for future readers.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/dxf/importinto/task_executor.go` around lines 525 - 533, The call to
external.MergeOverlappingFiles currently hardcodes the merge-task concurrency to
1 (the third arg), which serializes merges; change this to use a configurable or
CPU-based value (e.g., mergeTaskConcurrency from config or
runtime.NumCPU()/max(1, cpu/…)) and keep a separate per-task reader-open cap
(the parameter that controls reader-open bursts) set to 1 to limit startup open
pressure; update the call site of external.MergeOverlappingFiles (the invocation
with wctx, sm.DataFiles, 1, ...) to pass mergeTaskConcurrency for task-level
parallelism and the reader-open concurrency value for per-task limit so merge
parallelism is preserved while bounding reader-open bursts.

Comment on lines +40 to +41
// single thread.
MaxMergingFilesPerThread = 64
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update the in-file memory example to match the new merge-file cap.

After reducing MaxMergingFilesPerThread to 64, the worked memory estimate comment (later in this file) still uses 250, which is now misleading for operators/debugging.

As per coding guidelines, comments should capture important performance trade-offs and stay maintainable for future readers.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/lightning/backend/external/merge.go` around lines 40 - 41, Update the
in-file worked memory estimate comment to reflect the reduced
MaxMergingFilesPerThread value (MaxMergingFilesPerThread = 64) so the example
matches current constants; locate the existing worked memory estimate comment in
the same file (merge.go) and change the numeric example from 250 to an
appropriate value derived from 64 (or recalc the example using
MaxMergingFilesPerThread) and clarify the units so future readers/operators see
an accurate, maintainable performance trade-off tied to
MaxMergingFilesPerThread.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 7, 2026

Codecov Report

❌ Patch coverage is 61.60000% with 48 lines in your changes missing coverage. Please review.
✅ Project coverage is 77.2375%. Comparing base (9d1ac81) to head (4c583ea).
⚠️ Report is 18 commits behind head on master.

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #68135        +/-   ##
================================================
- Coverage   77.7620%   77.2375%   -0.5246%     
================================================
  Files          1990       1985         -5     
  Lines        551774     558722      +6948     
================================================
+ Hits         429071     431543      +2472     
- Misses       121783     126386      +4603     
+ Partials        920        793       -127     
Flag Coverage Δ
integration 42.7462% <61.6000%> (+2.9443%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
dumpling 60.4888% <ø> (ø)
parser ∅ <ø> (∅)
br 50.0597% <ø> (-13.0338%) ⬇️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@ti-chi-bot ti-chi-bot Bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels May 7, 2026
@ti-chi-bot ti-chi-bot Bot added do-not-merge/needs-tests-checked size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels May 7, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
pkg/lightning/backend/local/import_runtime_stats.go (1)

41-47: 💤 Low value

sendCnt parameter is always identical to jobCnt at the call site — consider collapsing.

In local.go the call is always addGenerateStats(len(jobs), generateDur, time.Since(sendStart), len(jobs)), meaning generateJobCnt and sendJobToWorkerChCnt in zapFields() will always be equal, which adds noise to the log output without providing additional signal. Either drop the separate sendCnt tracking or document why the two counts can diverge.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/lightning/backend/local/import_runtime_stats.go` around lines 41 - 47,
The addGenerateStats function records both generateJobCnt and
sendJobToWorkerChCnt but sendCnt is always passed as len(jobs) at the call site,
so collapse the duplicate metric: remove the sendCnt parameter from
addGenerateStats and either (a) increment sendJobToWorkerChCnt from jobCnt
inside addGenerateStats (use the existing jobCnt value) or (b) drop
sendJobToWorkerChCnt entirely and remove its usage in zapFields (and any related
metric registration). Update all callers (e.g., the call in local.go that
currently does addGenerateStats(len(jobs), generateDur, time.Since(sendStart),
len(jobs))) to the new signature and adjust zapFields to avoid logging a
redundant count or add a short comment if you intentionally keep both because
they may diverge in future.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/lightning/backend/external/engine.go`:
- Around line 414-419: The logged "loadedKVs" value is taken from
len(e.memKVsAndBuffers.kvs) after that slice is cleared, so it will always be
zero; fix by capturing the count before the cleanup (e.g., assign loadedKVsCount
:= len(e.memKVsAndBuffers.kvs) before the slice is cleared) and then use that
captured loadedKVsCount in the log call that builds "load range batch done"
(replace the current len(...) usage with the saved variable).

In `@pkg/lightning/backend/local/import_runtime_stats.go`:
- Around line 1-88: Add the new source file import_runtime_stats.go to the
go_library srcs in the pkg/lightning/backend/local BUILD.bazel so Bazel can see
the definitions (e.g., importRuntimeStats, addGenerateStats, zapFields); open
the BUILD.bazel's go_library that currently lists "iterator.go" and
"job_worker.go" and insert "import_runtime_stats.go", into the srcs list in
alphabetical order between "iterator.go" and "job_worker.go" and then run bazel
build to verify the undefined symbol errors are resolved.

---

Nitpick comments:
In `@pkg/lightning/backend/local/import_runtime_stats.go`:
- Around line 41-47: The addGenerateStats function records both generateJobCnt
and sendJobToWorkerChCnt but sendCnt is always passed as len(jobs) at the call
site, so collapse the duplicate metric: remove the sendCnt parameter from
addGenerateStats and either (a) increment sendJobToWorkerChCnt from jobCnt
inside addGenerateStats (use the existing jobCnt value) or (b) drop
sendJobToWorkerChCnt entirely and remove its usage in zapFields (and any related
metric registration). Update all callers (e.g., the call in local.go that
currently does addGenerateStats(len(jobs), generateDur, time.Since(sendStart),
len(jobs))) to the new signature and adjust zapFields to avoid logging a
redundant count or add a short comment if you intentionally keep both because
they may diverge in future.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 1e759cda-62cf-410a-8a6f-adf2758441aa

📥 Commits

Reviewing files that changed from the base of the PR and between 2236406 and 2a8d31b.

📒 Files selected for processing (4)
  • pkg/lightning/backend/external/engine.go
  • pkg/lightning/backend/local/import_runtime_stats.go
  • pkg/lightning/backend/local/job_worker.go
  • pkg/lightning/backend/local/local.go

Comment thread pkg/lightning/backend/external/engine.go
Comment on lines +1 to +88
// Copyright 2026 PingCAP, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package local

import (
"time"

"go.uber.org/atomic"
"go.uber.org/zap"
)

// importRuntimeStats tracks ingest runtime breakdown to locate bottlenecks.
type importRuntimeStats struct {
generateBatchCnt atomic.Int64
generateJobCnt atomic.Int64
generateJobsDuration atomic.Int64
sendJobToWorkerChCnt atomic.Int64
sendJobToWorkerChDur atomic.Int64
workerJobRunCnt atomic.Int64
workerJobRunDuration atomic.Int64
workerWriteCnt atomic.Int64
workerWriteDuration atomic.Int64
workerIngestCnt atomic.Int64
workerIngestDuration atomic.Int64
retryableWriteErrorCnt atomic.Int64
retryableIngestErrorCnt atomic.Int64
}

func (s *importRuntimeStats) addGenerateStats(jobCnt int, genDur, sendDur time.Duration, sendCnt int) {
s.generateBatchCnt.Inc()
s.generateJobCnt.Add(int64(jobCnt))
s.generateJobsDuration.Add(genDur.Nanoseconds())
s.sendJobToWorkerChCnt.Add(int64(sendCnt))
s.sendJobToWorkerChDur.Add(sendDur.Nanoseconds())
}

func (s *importRuntimeStats) addWorkerJobRunDur(d time.Duration) {
s.workerJobRunCnt.Inc()
s.workerJobRunDuration.Add(d.Nanoseconds())
}

func (s *importRuntimeStats) addWriteDur(d time.Duration) {
s.workerWriteCnt.Inc()
s.workerWriteDuration.Add(d.Nanoseconds())
}

func (s *importRuntimeStats) addIngestDur(d time.Duration) {
s.workerIngestCnt.Inc()
s.workerIngestDuration.Add(d.Nanoseconds())
}

func (s *importRuntimeStats) incRetryableWriteErr() {
s.retryableWriteErrorCnt.Inc()
}

func (s *importRuntimeStats) incRetryableIngestErr() {
s.retryableIngestErrorCnt.Inc()
}

func (s *importRuntimeStats) zapFields() []zap.Field {
return []zap.Field{
zap.Int64("generate-batches", s.generateBatchCnt.Load()),
zap.Int64("generated-jobs", s.generateJobCnt.Load()),
zap.Duration("generate-jobs-total", time.Duration(s.generateJobsDuration.Load())),
zap.Int64("send-job-to-worker-ch-count", s.sendJobToWorkerChCnt.Load()),
zap.Duration("send-job-to-worker-ch-total", time.Duration(s.sendJobToWorkerChDur.Load())),
zap.Int64("worker-job-run-count", s.workerJobRunCnt.Load()),
zap.Duration("worker-job-run-total", time.Duration(s.workerJobRunDuration.Load())),
zap.Int64("worker-write-count", s.workerWriteCnt.Load()),
zap.Duration("worker-write-total", time.Duration(s.workerWriteDuration.Load())),
zap.Int64("worker-ingest-count", s.workerIngestCnt.Load()),
zap.Duration("worker-ingest-total", time.Duration(s.workerIngestDuration.Load())),
zap.Int64("retryable-write-errors", s.retryableWriteErrorCnt.Load()),
zap.Int64("retryable-ingest-errors", s.retryableIngestErrorCnt.Load()),
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find BUILD.bazel in the local package and check if import_runtime_stats.go is listed
fd -e bazel "BUILD" pkg/lightning/backend/local/ | xargs grep -l "go_library" 2>/dev/null | head -5

echo "--- current srcs in BUILD.bazel ---"
fd -e bazel "BUILD" pkg/lightning/backend/local/ | xargs grep -A 80 "go_library" | grep "\.go"

Repository: pingcap/tidb

Length of output: 590


🏁 Script executed:

cat pkg/lightning/backend/local/BUILD.bazel | grep -A 200 "go_library" | head -150

Repository: pingcap/tidb

Length of output: 5023


🏁 Script executed:

ls -la pkg/lightning/backend/local/import_runtime_stats.go 2>&1

Repository: pingcap/tidb

Length of output: 157


Add import_runtime_stats.go to the srcs list in pkg/lightning/backend/local/BUILD.bazel.

The file exists but is missing from the go_library targets' srcs declaration, causing Bazel builds to fail with undefined symbol errors. Insert "import_runtime_stats.go", in alphabetical order between "iterator.go" and "job_worker.go".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/lightning/backend/local/import_runtime_stats.go` around lines 1 - 88, Add
the new source file import_runtime_stats.go to the go_library srcs in the
pkg/lightning/backend/local BUILD.bazel so Bazel can see the definitions (e.g.,
importRuntimeStats, addGenerateStats, zapFields); open the BUILD.bazel's
go_library that currently lists "iterator.go" and "job_worker.go" and insert
"import_runtime_stats.go", into the srcs list in alphabetical order between
"iterator.go" and "job_worker.go" and then run bazel build to verify the
undefined symbol errors are resolved.

@ti-chi-bot
Copy link
Copy Markdown

ti-chi-bot Bot commented May 7, 2026

[FORMAT CHECKER NOTIFICATION]

Notice: To remove the do-not-merge/needs-linked-issue label, please provide the linked issue number on one line in the PR body, for example: Issue Number: close #123 or Issue Number: ref #456.

📖 For more info, you can check the "Contribute Code" section in the development guide.


Notice: To remove the do-not-merge/needs-tests-checked label, please finished the tests then check the finished items in description.

For example:

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

📖 For more info, you can check the "Contribute Code" section in the development guide.

@ti-chi-bot
Copy link
Copy Markdown

ti-chi-bot Bot commented May 7, 2026

@joechenrh: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-build-next-gen 4c583ea link true /test pull-build-next-gen
idc-jenkins-ci-tidb/build 4c583ea link true /test build
idc-jenkins-ci-tidb/unit-test 4c583ea link true /test unit-test
pull-unit-test-next-gen 4c583ea link true /test pull-unit-test-next-gen
pull-lightning-integration-test 4c583ea link true /test pull-lightning-integration-test
idc-jenkins-ci-tidb/check_dev_2 4c583ea link true /test check-dev2
pull-integration-realcluster-test-next-gen 4c583ea link true /test pull-integration-realcluster-test-next-gen

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do-not-merge/needs-linked-issue do-not-merge/needs-tests-checked release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant