Conversation
Summary of Changes

Hello @lidezhu, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request enhances the workload testing tool by adding a new workload type designed for simulating large-row upsert scenarios. The new `staging_forward_index` workload is intended for performance testing and benchmarking of systems that frequently update large binary data fields.
📝 Walkthrough

Introduces a new JSON Zstd workload type for compression-friendly payload testing. Adds transaction wrapping via a `BatchInTxn` configuration flag. Refactors insert/update/delete execution flows to use transactional wrappers (`runTransaction`) that capture and accumulate flushed row counts. Includes the implementation, tests, and documentation.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Worker as Insert/Update/Delete Worker
    participant App as WorkloadApp
    participant Txn as Transaction Manager
    participant DB as Database Connection

    Worker->>App: runTransaction(conn, operation)
    alt BatchInTxn enabled
        App->>Txn: beginTransaction(conn)
        Txn->>DB: BEGIN
        DB-->>Txn: Success
        Txn-->>App: Transaction started
        App->>App: doInsertOnce/doUpdateOnce/doDeleteOnce()
        App->>DB: Execute SQL
        DB-->>App: Result + row count
        App->>Txn: commitTransaction(conn)
        Txn->>DB: COMMIT
        DB-->>Txn: Success
        Txn-->>App: Committed
        App-->>Worker: (flushedRows, nil)
    else BatchInTxn disabled
        App->>App: doInsertOnce/doUpdateOnce/doDeleteOnce()
        App->>DB: Execute SQL
        DB-->>App: Result + row count
        App-->>Worker: (flushedRows, nil)
    end
    alt On Error
        App->>Txn: rollbackTransaction(conn)
        Txn->>DB: ROLLBACK
        DB-->>Txn: Success
        Txn-->>App: Rolled back
        App-->>Worker: (0, error)
    end
    Worker->>Worker: Update Stats.FlushedRowCount if flushedRows > 0
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~70 minutes
🚥 Pre-merge checks: ✅ 1 passed | ❌ 2 failed (2 warnings)
Actionable comments posted: 3
🧹 Nitpick comments (4)
tools/workload/schema/forwardindex/staging_forward_index.go (3)
190-194: Same `putRand` inconsistency in `BuildInsertSqlWithValues`.

Apply `defer w.putRand(r)` here too (line 194 currently calls it eagerly), matching the `defer` pattern used in `BuildDeleteSql`.

♻️ Proposed fix
```diff
 func (w *StagingForwardIndexWorkload) BuildInsertSqlWithValues(tableN int, batchSize int) (string, []interface{}) {
 	tableName := getTableName(tableN)
 	r := w.getRand()
-	debugInfo, debugInfoHistory, adDoc := w.newPayload(r)
-	w.putRand(r)
+	defer w.putRand(r)
+	debugInfo, debugInfoHistory, adDoc := w.newPayload(r)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tools/workload/schema/forwardindex/staging_forward_index.go` around lines 190 - 194, The call to w.putRand(r) in BuildInsertSqlWithValues is done eagerly; change it to defer to match the pattern used in BuildDeleteSql so the random source is returned even if the function panics or returns early—specifically, after obtaining r from w.getRand() in BuildInsertSqlWithValues, replace the immediate w.putRand(r) call with defer w.putRand(r) so the resource is always released; reference the BuildInsertSqlWithValues function and the putRand/getRand helpers when making this change.
222-249: Inconsistent `putRand` pattern — use `defer` here as well.

`BuildDeleteSql` (line 254) uses `defer w.putRand(r)`, but `BuildUpdateSqlWithValues` calls `w.putRand(r)` manually at line 244. If a panic occurs between `getRand` and `putRand`, the `*rand.Rand` leaks from the pool. For consistency and safety, prefer `defer`.

♻️ Proposed fix
```diff
 func (w *StagingForwardIndexWorkload) BuildUpdateSqlWithValues(opt schema.UpdateOption) (string, []interface{}) {
 	tableName := getTableName(opt.TableIndex)
 	r := w.getRand()
+	defer w.putRand(r)
 	debugInfo, debugInfoHistory, adDoc := w.newPayload(r)
 	var sb strings.Builder
@@ ...
 		values = append(values, debugInfo, debugInfoHistory, id, adDoc)
 	}
-	w.putRand(r)
 	sb.WriteString(strings.Join(placeholders, ","))
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tools/workload/schema/forwardindex/staging_forward_index.go` around lines 222 - 249, In BuildUpdateSqlWithValues, after acquiring r with getRand() defer return to the pool to avoid leaking the *rand.Rand on panic: replace the existing manual w.putRand(r) call with an immediate defer w.putRand(r) right after r := w.getRand(), ensuring the rest of the function (including calls to w.newPayload(r) and use of r) remains unchanged.
190-216: All rows in a batch share the same payload byte slices.

`newPayload` is called once (line 193) and the same `debugInfo`, `debugInfoHistory`, and `adDoc` slices are appended for every row in the batch (line 210). If the SQL driver reads values lazily, or if `util.RandomBytes` is later changed to mutate in place for reuse, all rows would silently share or corrupt data. For a workload generator this is likely acceptable (it reduces allocations), but worth a brief comment so future maintainers know it's intentional.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tools/workload/schema/forwardindex/staging_forward_index.go` around lines 190 - 216, BuildInsertSqlWithValues currently calls newPayload once and reuses the same debugInfo, debugInfoHistory, and adDoc byte slices for every row in the batch, which can cause silent corruption if drivers read values lazily or if util.RandomBytes mutates buffers; either add a clear comment next to the newPayload call in BuildInsertSqlWithValues explaining that reuse is intentional and safe for this workload (mentioning debugInfo, debugInfoHistory, adDoc and newPayload), or make the loop create per-row copies (e.g., id stays generated per row, and use append([]byte(nil), debugInfo...), append([]byte(nil), debugInfoHistory...), append([]byte(nil), adDoc...) before appending to values) to avoid shared-mutable-slice issues.

tools/workload/schema/forwardindex/staging_forward_index_test.go (1)
9-89: Tests use `t.Fatalf` instead of `testify/require`.

Per coding guidelines, test files should use `testify/require` for assertions. This also improves readability by reducing boilerplate.

♻️ Example refactor for the first test

```diff
+import (
+	"encoding/binary"
+	"strings"
+	"testing"
+
+	"github.com/stretchr/testify/require"
+)
+
 func TestStagingForwardIndexWorkloadRowSizing512K(t *testing.T) {
 	w := NewStagingForwardIndexWorkload(512*1024, 1000).(*StagingForwardIndexWorkload)
-
-	if w.debugInfoSize != maxBlobSize {
-		t.Fatalf("debugInfoSize mismatch: got %d, want %d", w.debugInfoSize, maxBlobSize)
-	}
-	if w.debugInfoHistorySize != maxBlobSize {
-		t.Fatalf("debugInfoHistorySize mismatch: got %d, want %d", w.debugInfoHistorySize, maxBlobSize)
-	}
-
-	wantAdDocSize := 512*1024 - 2*maxBlobSize
-	if w.adDocSize != wantAdDocSize {
-		t.Fatalf("adDocSize mismatch: got %d, want %d", w.adDocSize, wantAdDocSize)
-	}
+	require.Equal(t, maxBlobSize, w.debugInfoSize, "debugInfoSize")
+	require.Equal(t, maxBlobSize, w.debugInfoHistorySize, "debugInfoHistorySize")
+	require.Equal(t, 512*1024-2*maxBlobSize, w.adDocSize, "adDocSize")
 }
```

Apply the same pattern to the other two tests.
As per coding guidelines, "Use unit test files named `*_test.go` in Go; favor deterministic tests and use `testify/require`".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tools/workload/schema/forwardindex/staging_forward_index_test.go` around lines 9 - 89, Replace t.Fatalf assertions with testify/require calls in TestStagingForwardIndexWorkloadRowSizing512K, TestStagingForwardIndexWorkloadRowSizingClamped, and TestStagingForwardIndexWorkloadBuildInsertSqlWithValues: add import "github.com/stretchr/testify/require" and use require.Equal/require.Len/require.Contains/require.IsType/require.True (and require.Equal for the binary seq check) instead of t.Fatalf, converting each failure case to the appropriate require assertion for clarity and brevity while preserving the same checked values (e.g., compare w.debugInfoSize to maxBlobSize, w.adDocSize to expected sizes, len(values) to expected count, type and length checks for []byte fields, and the BigEndian sequence check).
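The per-row defensive copy suggested for `BuildInsertSqlWithValues` above can be sketched as follows. `cloneBytes` is a hypothetical helper name introduced for illustration, not part of the workload code; the `append([]byte(nil), b...)` idiom is the same one the prompt recommends.

```go
package main

import "fmt"

// cloneBytes makes a defensive copy so each row owns its payload bytes.
// Without it, every row in a batch aliases the same backing array, and a
// later in-place mutation would corrupt all rows at once.
func cloneBytes(b []byte) []byte {
	return append([]byte(nil), b...)
}

func main() {
	payload := []byte{1, 2, 3}
	shared := payload            // aliases the same backing array
	owned := cloneBytes(payload) // independent copy

	payload[0] = 99
	fmt.Println(shared[0], owned[0]) // the alias sees 99; the copy still holds 1
}
```

The trade-off flagged in the comment stands: one copy per row costs extra allocations, which is why the reviewer also accepts a clarifying comment instead of the copy.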
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tools/workload/app.go`:
- Around line 139-140: The code passes app.Config.TotalRowCount into
NewStagingForwardIndexWorkload as the updateKeySpace, creating a semantic
mismatch; add a dedicated config field (e.g., UpdateKeySpace uint64) to the
config struct and use that when constructing the workload (replace
app.Config.TotalRowCount with app.Config.UpdateKeySpace in the call to
NewStagingForwardIndexWorkload), or if you cannot add a field now, document this
intentional reuse clearly in the config declaration and the call site (comment
next to NewStagingForwardIndexWorkload) to avoid future accidental changes;
update any config initialization/parsing to populate UpdateKeySpace accordingly
and adjust tests/usage that assumed TotalRowCount semantics.
In `@tools/workload/schema/forwardindex/staging_forward_index_test.go`:
- Around line 1-7: Add the standard Apache 2.0 license header to the top of
staging_forward_index_test.go (above the "package forwardindex" declaration)
matching the header used in staging_forward_index.go (lines 1–12) so the CI
check-copyright step passes; keep the existing imports (encoding/binary,
strings, testing) and code unchanged.
In `@tools/workload/schema/forwardindex/staging_forward_index.go`:
- Around line 269-274: The repository defines a custom helper function named
min(a, b int) int that shadows Go's built-in min (Go 1.21+); remove this custom
min implementation and update its call sites (the two usages currently calling
min at the forward index batching logic) to use the built-in min instead, then
delete the min function declaration from staging_forward_index.go so the code
uses the standard library implementation.
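For reference, a minimal sketch of the Go 1.21+ builtins the prompt refers to — nothing here is project code, it only demonstrates why the custom `min(a, b int) int` helper can be deleted outright:

```go
package main

import "fmt"

func main() {
	// Since Go 1.21, min and max are generic builtins that work on any
	// ordered type and accept two or more arguments. A package-level
	// func min(a, b int) int shadows them for the whole package.
	fmt.Println(min(3, 7))     // smaller of two ints
	fmt.Println(max(2.5, 1.0)) // works on floats too
	fmt.Println(min(10, 4, 8)) // and on more than two arguments
}
```

Because the custom helpers share the builtins' names, deleting the declarations is usually the entire fix: existing call sites resolve to the builtins automatically.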
```go
package forwardindex

import (
	"encoding/binary"
	"strings"
	"testing"
)
```
Missing copyright header — pipeline failure.
The CI check-copyright step is failing because this file lacks the standard Apache 2.0 license header. All other Go files in this PR include it (e.g., staging_forward_index.go lines 1–12).
🔧 Proposed fix — add copyright header
```diff
+// Copyright 2026 PingCAP, Inc.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//	http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
 package forwardindex
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```go
// Copyright 2026 PingCAP, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//	http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package forwardindex

import (
	"encoding/binary"
	"strings"
	"testing"
)
```
🧰 Tools
🪛 GitHub Actions: PR Build and Unit Test
[error] 1-1: check-copyright: The copyright information of following files is incorrect.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tools/workload/schema/forwardindex/staging_forward_index_test.go` around
lines 1 - 7, Add the standard Apache 2.0 license header to the top of
staging_forward_index_test.go (above the "package forwardindex" declaration)
matching the header used in staging_forward_index.go (lines 1–12) so the CI
check-copyright step passes; keep the existing imports (encoding/binary,
strings, testing) and code unchanged.
Code Review
This pull request introduces a new staging_forward_index workload, which is a valuable addition for performance testing of large-row upserts. The implementation is generally well-structured and integrates cleanly with the existing workload framework. I've identified a couple of areas for improvement: a bug in the generation of DELETE statements that could lead to incorrect behavior, and a misleading implementation of an interface method that should be corrected for clarity and future maintainability. Addressing these points will enhance the robustness of the new workload.
```go
func (w *StagingForwardIndexWorkload) BuildDeleteSql(opts schema.DeleteOption) string {
	tableName := getTableName(opts.TableIndex)
	r := w.getRand()
	defer w.putRand(r)

	var sb strings.Builder
	for i := 0; i < opts.Batch; i++ {
		if i > 0 {
			sb.WriteString(";")
		}
		seq := (uint64(r.Int63()) % w.updateKeySpace) + 1
		id := make([]byte, pinPromotionIDSize)
		w.fillPromotionID(id, seq)
		sb.WriteString(fmt.Sprintf("DELETE FROM %s WHERE g_pin_promotion_id = 0x%x", tableName, id))
	}
	return sb.String()
}
```
The current implementation of BuildDeleteSql generates multiple DELETE statements separated by semicolons. This will likely not work as expected, because database drivers typically only execute the first statement in a multi-statement string unless a special mode is enabled. This could lead to only one row being deleted per batch instead of the intended number.
To fix this and improve efficiency, you should generate a single DELETE statement with an IN clause.
```go
func (w *StagingForwardIndexWorkload) BuildDeleteSql(opts schema.DeleteOption) string {
	tableName := getTableName(opts.TableIndex)
	r := w.getRand()
	defer w.putRand(r)
	if opts.Batch == 0 {
		return ""
	}
	var sb strings.Builder
	sb.WriteString(fmt.Sprintf("DELETE FROM %s WHERE g_pin_promotion_id IN (", tableName))
	for i := 0; i < opts.Batch; i++ {
		if i > 0 {
			sb.WriteString(",")
		}
		seq := (uint64(r.Int63()) % w.updateKeySpace) + 1
		id := make([]byte, pinPromotionIDSize)
		w.fillPromotionID(id, seq)
		sb.WriteString(fmt.Sprintf("0x%x", id))
	}
	sb.WriteString(")")
	return sb.String()
}
```

```go
func (w *StagingForwardIndexWorkload) BuildUpdateSql(opt schema.UpdateOption) string {
	return w.BuildInsertSql(opt.TableIndex, opt.Batch)
}
```
The BuildUpdateSql method is implemented to call BuildInsertSql, which generates new random UUIDs for the primary key. This means it will always perform inserts rather than updates, which is misleading for a method named BuildUpdateSql. While this code path may not be used for the staging_forward_index workload (which uses BuildUpdateSqlWithValues), it's part of the Workload interface and should be implemented correctly or explicitly marked as unsupported. To avoid confusion and potential bugs if this method is called in the future, I suggest making it panic.
```diff
 func (w *StagingForwardIndexWorkload) BuildUpdateSql(opt schema.UpdateOption) string {
-	return w.BuildInsertSql(opt.TableIndex, opt.Batch)
+	panic("BuildUpdateSql is not supported for staging_forward_index. Use BuildUpdateSqlWithValues.")
 }
```
Actionable comments posted: 2
🧹 Nitpick comments (4)
tools/workload/schema/bis/bis_metadata.go (3)
499-508: `newConstBytes` can be replaced with `bytes.Repeat`.

Minor simplification — `bytes.Repeat([]byte{value}, size)` achieves the same result in one line using the standard library.

♻️ Proposed simplification
```diff
+import "bytes"
+
-func newConstBytes(size int, value byte) []byte {
-	if size <= 0 {
-		return nil
-	}
-	buf := make([]byte, size)
-	for i := range buf {
-		buf[i] = value
-	}
-	return buf
-}
+func newConstBytes(size int, value byte) []byte {
+	if size <= 0 {
+		return nil
+	}
+	return bytes.Repeat([]byte{value}, size)
+}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tools/workload/schema/bis/bis_metadata.go` around lines 499 - 508, The helper newConstBytes duplicates functionality in the standard library; replace its implementation and all callers to use bytes.Repeat (from the bytes package) to create a slice filled with a single byte, remove the custom newConstBytes function, and add the bytes import where needed; ensure any callers expecting nil for non-positive sizes handle that case by using a conditional or by passing zero to bytes.Repeat which returns an empty slice.
459-487: `BuildDeleteSql` uses string interpolation instead of parameterized queries — inconsistent with the insert/update builders.

All other SQL-with-values builders (`BuildInsertSqlWithValues`, `BuildUpdateSqlWithValues`) use `?` placeholders and return value slices. `BuildDeleteSql` inlines values via `fmt.Sprintf` with `'%s'` and `%d`. While the values are internally generated (UUIDs, formatted strings, ints), this is still a fragile pattern and inconsistent with the rest of the file. Consider adding a `BuildDeleteSqlWithValues` variant (if the `Workload` interface allows it) or at least using parameterized queries here.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tools/workload/schema/bis/bis_metadata.go` around lines 459 - 487, BuildDeleteSql currently inlines values using fmt.Sprintf while other builders (BuildInsertSqlWithValues, BuildUpdateSqlWithValues) use parameterized SQL and return value slices; change BuildDeleteSql to produce parameterized SQL and a values slice (or add BuildDeleteSqlWithValues if interface permits). Concretely, replace fmt.Sprintf calls that build "DELETE FROM %s WHERE (`id` = '%s')" and the 3-column variant with statements using "?" placeholders (e.g. "DELETE FROM %s WHERE (`id` = ?)" and "DELETE FROM %s WHERE (`entity_id` = ?) AND (`entity_set_id` = ?) AND (`user_id` = ?)"), accumulate corresponding args (id or entityID, entitySetID, userID) in a []interface{} and return both the SQL string and the args slice (or add a new method BuildDeleteSqlWithValues that mirrors BuildInsertSqlWithValues/BuildUpdateSqlWithValues); refer to functions/vars randSeq, newID, userID, entitySetID, entityID, tableName, deleteByID to locate where to collect values.
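A parameterized builder along the lines the prompt describes might look like the following sketch. The function name, table name, and `id` column are illustrative, not the actual bis schema; the point is the shape — placeholder SQL plus an args slice instead of `fmt.Sprintf`-interpolated values.

```go
package main

import (
	"fmt"
	"strings"
)

// buildDeleteSQLWithValues is a hypothetical parameterized variant of
// BuildDeleteSql: it emits "?" placeholders and returns the matching args
// slice, so the driver binds values instead of interpolating them.
func buildDeleteSQLWithValues(tableName string, ids []string) (string, []interface{}) {
	placeholders := make([]string, len(ids))
	args := make([]interface{}, len(ids))
	for i, id := range ids {
		placeholders[i] = "?"
		args[i] = id
	}
	sql := fmt.Sprintf("DELETE FROM %s WHERE `id` IN (%s)",
		tableName, strings.Join(placeholders, ","))
	return sql, args
}

func main() {
	sql, args := buildDeleteSQLWithValues("bis_metadata_0", []string{"a", "b"})
	fmt.Println(sql)  // DELETE FROM bis_metadata_0 WHERE `id` IN (?,?)
	fmt.Println(args) // [a b]
}
```

The caller then passes both return values straight to `db.Exec(sql, args...)`, mirroring how the existing insert/update builders are consumed.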
510-535: Replace custom `min`/`max`/`minUint64`/`maxUint64` helpers with Go 1.21+ builtins.

This project targets Go 1.25.5, which includes the generic built-in `min` and `max` functions. These custom helpers are redundant and shadow the builtins. Replace all usages with the built-in versions and remove these function definitions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tools/workload/schema/bis/bis_metadata.go` around lines 510 - 535, The custom helper functions min, max, minUint64, and maxUint64 are redundant because Go 1.21+ provides generic built-in min and max; remove these four function definitions and replace any calls to minUint64 and maxUint64 with the built-in min and max (calls to min and max need no change, since deleting the shadowing helpers makes them resolve to the builtins) and run `go vet`/`go test` to ensure no shadowing or import issues remain.

tools/workload/schema/bis/bis_metadata_test.go (1)
12-38: Prefer `testify/require` over raw `t.Fatalf` for assertions.

All test functions in this file use `t.Fatalf` for assertions. Per coding guidelines, tests should use `testify/require` for clearer, more idiomatic assertions. For example: `require.Equal(t, maxEntityMediaMetadataSize, w.entityMediaSize, "entityMediaSize mismatch")`. This applies throughout the file. As per coding guidelines, "Use unit test files named `*_test.go` in Go; favor deterministic tests and use `testify/require`".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tools/workload/schema/bis/bis_metadata_test.go` around lines 12 - 38, Replace all t.Fatalf assertions in TestBISMetadataWorkloadRowSizingAndClamping with testify/require assertions: import "github.com/stretchr/testify/require" and use require.Equal(t, expected, actual, "message") (or require.Equalf) for comparisons of w and w2 fields (entityMediaSize, batchAuxSize, batchCjSize, batchMetaSize, perTableUpdateKeySpace) and the clamping checks against maxEntityMediaMetadataSize and maxBatchMetadataSize; keep the same messages but use require so tests are idiomatic and the expected/actual argument order matches require's signature.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tools/workload/schema/bis/bis_metadata_test.go`:
- Around line 1-10: The file bis_metadata_test.go is missing the required Apache
2.0 copyright header causing CI to fail; add the same multi-line copyright
header used in bis_metadata.go to the very top of bis_metadata_test.go (above
the "package bis" line) so the header format, year, and owner match the
project's standard.
In `@tools/workload/schema/bis/bis_metadata.go`:
- Around line 406-422: The bug comes from calling r.Float64() multiple times in
the switch, which skews the intended probabilities; change the switch to sample
p := r.Float64() once (e.g., at the top of the branch where the switch currently
is) and then use cumulative comparisons like "case p <
entityUpdateByPinIDWeight:", "case p <
entityUpdateByPinIDWeight+entityUpdateByCompositeKeyWeight:" to decide between
the pinID, composite-key, and default update branches (matching the approach
used in buildBatchUpdateWithValues); keep the existing SQL strings and returned
parameter lists (pinID, userID/entityID/entitySetID, id, etc.) unchanged.
```go
package bis

import (
	"math/rand"
	"strings"
	"sync"
	"testing"

	"workload/schema"
)
```
Fix missing copyright header — CI is failing.
The pipeline reports: "Copyright information is incorrect in this file as part of 'check-copyright' step." Add the standard Apache 2.0 copyright header at the top, matching the format in bis_metadata.go.
🐛 Proposed fix
```diff
+// Copyright 2026 PingCAP, Inc.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//	http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
 package bis
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```go
// Copyright 2026 PingCAP, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//	http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package bis

import (
	"math/rand"
	"strings"
	"sync"
	"testing"

	"workload/schema"
)
```
🧰 Tools
🪛 GitHub Actions: PR Build and Unit Test
[error] 1-1: Copyright information is incorrect in this file as part of 'check-copyright' step. Please fix copyright headers.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tools/workload/schema/bis/bis_metadata_test.go` around lines 1 - 10, The file
bis_metadata_test.go is missing the required Apache 2.0 copyright header causing
CI to fail; add the same multi-line copyright header used in bis_metadata.go to
the very top of bis_metadata_test.go (above the "package bis" line) so the
header format, year, and owner match the project's standard.
```go
switch {
case r.Float64() < entityUpdateByPinIDWeight:
	pinID := w.pinID(tableIndex, seq)
	sql := fmt.Sprintf("UPDATE %s SET `delete_after` = ?, `updated_at` = ?, `pin_id` = ? WHERE (`pin_id` = ?) AND `delete_after` IS NULL", tableName)
	return sql, []interface{}{nil, now, pinID, pinID}
case r.Float64() < entityUpdateByPinIDWeight+entityUpdateByCompositeKeyWeight:
	userID := w.userID(tableIndex, seq)
	entitySetID := w.entitySetID(seq)
	entityID := w.entityID(seq)
	sql := fmt.Sprintf("UPDATE %s SET `updated_at` = ?, `user_id` = ?, `content_hash` = ?, `entity_id` = ?, `entity_set_id` = ? WHERE (`entity_id` = ?) AND (`entity_set_id` = ?) AND (`user_id` = ?) AND `delete_after` IS NULL", tableName)
	contentHash := w.contentHash(tableIndex, uint64(now.UnixNano()))
	return sql, []interface{}{now, userID, contentHash, entityID, entitySetID, entityID, entitySetID, userID}
default:
	id := w.newID('e', tableIndex, seq)
	sql := fmt.Sprintf("UPDATE %s SET `id` = ?, `delete_after` = ?, `updated_at` = ? WHERE (`id` = ?) AND `delete_after` IS NULL", tableName)
	return sql, []interface{}{id, nil, now, id}
}
```
Bug: probability distribution for entity update types is incorrect due to multiple r.Float64() draws.
Lines 407 and 411 each call r.Float64() independently, meaning the second case evaluates a new random value against a cumulative threshold. This skews the distribution away from what the named weights suggest (0.66 / 0.25 / 0.09):
- P(pinID) = 0.66
- P(composite) ≈ (1−0.66) × 0.91 ≈ 0.31
- P(default) ≈ (1−0.66) × 0.09 ≈ 0.03
Contrast with buildBatchUpdateWithValues (line 436) which correctly draws p := r.Float64() once and uses cumulative thresholds. Apply the same pattern here.
🐛 Proposed fix

```diff
 func (w *BISMetadataWorkload) buildEntityUpdateWithValues(tableIndex int, r *rand.Rand) (string, []interface{}) {
 	tableName := getEntityTableName(tableIndex)
 	now := time.Now()
 	slot := w.slot(tableIndex)
 	upper := minUint64(w.perTableUpdateKeySpace, w.entitySeq[slot].Load())
 	seq := randSeq(r, upper)
+	p := r.Float64()
 	switch {
-	case r.Float64() < entityUpdateByPinIDWeight:
+	case p < entityUpdateByPinIDWeight:
 		pinID := w.pinID(tableIndex, seq)
 		sql := fmt.Sprintf("UPDATE %s SET `delete_after` = ?, `updated_at` = ?, `pin_id` = ? WHERE (`pin_id` = ?) AND `delete_after` IS NULL", tableName)
 		return sql, []interface{}{nil, now, pinID, pinID}
-	case r.Float64() < entityUpdateByPinIDWeight+entityUpdateByCompositeKeyWeight:
+	case p < entityUpdateByPinIDWeight+entityUpdateByCompositeKeyWeight:
 		userID := w.userID(tableIndex, seq)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tools/workload/schema/bis/bis_metadata.go` around lines 406 - 422, The bug
comes from calling r.Float64() multiple times in the switch, which skews the
intended probabilities; change the switch to sample p := r.Float64() once (e.g.,
at the top of the branch where the switch currently is) and then use cumulative
comparisons like "case p < entityUpdateByPinIDWeight:", "case p <
entityUpdateByPinIDWeight+entityUpdateByCompositeKeyWeight:" to decide between
the pinID, composite-key, and default update branches (matching the approach
used in buildBatchUpdateWithValues); keep the existing SQL strings and returned
parameter lists (pinID, userID/entityID/entitySetID, id, etc.) unchanged.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
tools/workload/delete.go (1)
80-113: ⚠️ Potential issue | 🟡 Minor

`fmt.Println` with `zap.Error(err)` prints a struct literal, not the error message.

Lines 87 and 96 pass `zap.Error(err)` to `fmt.Println`. Since `zap.Error()` returns a `zap.Field` struct, this will print something like `{error error <nil>}` rather than the actual error. Use `plog` with zap fields consistently, or use `fmt.Println(..., err)`.

Proposed fix

```diff
-	fmt.Println("connection error detected, reconnecting", zap.Error(err))
+	plog.Info("connection error detected, reconnecting", zap.Error(err))

-	fmt.Println("reconnection failed, wait 5 seconds and retry", zap.Error(err))
+	plog.Info("reconnection failed, wait 5 seconds and retry", zap.Error(err))
```

As per coding guidelines, "Use structured logs via `github.com/pingcap/log` with `zap` fields in Go".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tools/workload/delete.go` around lines 80 - 113, replace the `fmt.Println(...)` calls that currently pass `zap.Error(err)` (inside the loop around runTransaction/doDeleteOnce and reconnection logic) with structured logging using the project's logger (`plog`) and zap fields; specifically, where the code does `fmt.Println("connection error detected, reconnecting", zap.Error(err))` and `fmt.Println("reconnection failed, wait 5 seconds and retry", zap.Error(err))`, change them to `plog.Error` or `plog.Info` with the same descriptive message and include `zap.Error(err)` as a field, and keep the existing reconnection/backoff logic and calls to `conn.Close()`, `context.WithTimeout`, and `app.isConnectionError` unchanged.

tools/workload/update.go (1)

84-114: ⚠️ Potential issue | 🟡 Minor

Same `fmt.Println` + `zap.Error()` issue as in `delete.go`.

Lines 91 and 100 pass `zap.Error(err)` to `fmt.Println`, which will print the raw `zap.Field` struct instead of the error message. Use `plog` consistently.

Proposed fix

```diff
-	fmt.Println("connection error detected, reconnecting", zap.Error(err))
+	plog.Info("connection error detected, reconnecting", zap.Error(err))

-	fmt.Println("reconnection failed, wait 5 seconds and retry", zap.Error(err))
+	plog.Info("reconnection failed, wait 5 seconds and retry", zap.Error(err))
```

As per coding guidelines, "Use structured logs via `github.com/pingcap/log` with `zap` fields in Go".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tools/workload/update.go` around lines 84 - 114, replace the two `fmt.Println` calls that pass `zap.Error(err)` (inside the reconnect loop around `app.runTransaction`) with structured `plog` logging calls: use `plog.Info` or `plog.Warn` (consistent with surrounding logs like `plog.Info` in this function) to log the descriptive message and include `zap.Error(err)` as a `zap.Field`; update the reconnection-failed message and the connection-error-detected message to use `plog` and `zap.Error(err)` instead of `fmt.Println` so the error is logged as structured output (these occur around the `app.isConnectionError(err)` branch and the reconnection attempt after `db.DB.Conn`).
🧹 Nitpick comments (4)

tools/workload/txn.go (2)

47-67: Transaction failures logged at `Info` level — consider `Warn`.

`beginTransaction`, `commitTransaction`, and `rollbackTransaction` all log failures at `Info` level. Transaction failures typically indicate an unexpected condition and may be more appropriately logged at `Warn` to improve visibility during troubleshooting.

Proposed fix

```diff
 func (app *WorkloadApp) beginTransaction(conn *sql.Conn) error {
 	_, err := conn.ExecContext(context.Background(), "BEGIN")
 	if err != nil {
-		plog.Info("begin transaction failed", zap.Error(err))
+		plog.Warn("begin transaction failed", zap.Error(err))
 	}
 	return err
 }

 func (app *WorkloadApp) commitTransaction(conn *sql.Conn) error {
 	_, err := conn.ExecContext(context.Background(), "COMMIT")
 	if err != nil {
-		plog.Info("commit transaction failed", zap.Error(err))
+		plog.Warn("commit transaction failed", zap.Error(err))
 	}
 	return err
 }

 func (app *WorkloadApp) rollbackTransaction(conn *sql.Conn) {
 	if _, err := conn.ExecContext(context.Background(), "ROLLBACK"); err != nil {
-		plog.Info("rollback transaction failed", zap.Error(err))
+		plog.Warn("rollback transaction failed", zap.Error(err))
 	}
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tools/workload/txn.go` around lines 47 - 67, change the logging level for transaction errors from Info to Warn: in beginTransaction, commitTransaction, and rollbackTransaction replace calls to `plog.Info("... failed", zap.Error(err))` with `plog.Warn("... failed", zap.Error(err))` so transaction begin/commit/rollback failures are logged at Warn level (update the three functions: beginTransaction, commitTransaction, rollbackTransaction).
24-45: Consider using `conn.BeginTx()`/`tx.Commit()`/`tx.Rollback()` instead of raw SQL.

The idiomatic Go approach is to use `database/sql`'s `Tx` type, which provides automatic rollback safety on context cancellation and proper connection state tracking. The current raw-SQL approach works but bypasses these built-in safeguards.

That said, if the raw `*sql.Conn` is intentionally shared for statement caching across the worker loop, this may be a deliberate trade-off.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tools/workload/txn.go` around lines 24 - 45, the runTransaction function currently uses raw SQL via helper methods beginTransaction, commitTransaction, rollbackTransaction on a `*sql.Conn`; refactor to use `database/sql`'s Tx by calling `conn.BeginTx(ctx, opts)` to create a `*sql.Tx`, run doOne within the transaction context, and call `tx.Commit()` on success or `tx.Rollback()` on any error (including context cancellation) to get automatic rollback safety and proper connection state tracking; replace beginTransaction/commitTransaction/rollbackTransaction calls with tx lifecycle management in runTransaction (preserve the existing flushedRows return behavior) and ensure you pass a context and handle errors from BeginTx/Commit/Rollback appropriately.

tools/workload/delete.go (1)
131-134: `doDeleteOnce` will block indefinitely if channel is closed or empty — no select/done-channel.

The blocking receive `task := <-input` on a never-closed channel means this goroutine can only be stopped by process exit. This is consistent with the existing infinite-loop pattern in `genDeleteTask`, but worth noting: if the channel were closed (e.g., for graceful shutdown), this would silently return a zero-value `deleteTask` and proceed to execute it.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tools/workload/delete.go` around lines 131 - 134, doDeleteOnce currently blocks on a direct receive from the input channel and will either hang forever or silently process a zero-value deleteTask if the channel is closed; change doDeleteOnce to detect closed/finished input by using the comma-ok receive (e.g., `task, ok := <-input`) or a select with a done/ctx channel, and if the channel is closed/ok==false return immediately (or return a sentinel error) instead of calling processDeleteTask; update callers (e.g., genDeleteTask) to propagate or handle the returned error/stop signal as needed.

tools/workload/update.go (1)
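The comma-ok receive recommended for `doDeleteOnce` in `delete.go` is a small, self-contained pattern. In this sketch the `deleteTask` and `drain` names mirror the review wording but are invented here; the point is that a closed channel terminates the loop cleanly instead of yielding zero-value tasks.

```go
package main

import "fmt"

type deleteTask struct{ id int }

// drain shows the comma-ok receive: ok reports whether the channel is still
// open, so a closed channel can signal shutdown instead of yielding a
// zero-value task that would be silently executed.
func drain(input <-chan deleteTask) []int {
	var processed []int
	for {
		task, ok := <-input
		if !ok {
			return processed // channel closed: stop cleanly
		}
		processed = append(processed, task.id)
	}
}

func main() {
	ch := make(chan deleteTask, 2)
	ch <- deleteTask{id: 1}
	ch <- deleteTask{id: 2}
	close(ch)
	fmt.Println(drain(ch))
}
```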
184-194: Unchecked type assertions on `app.Workload` could panic.

`app.Workload.(*pforwardindex.StagingForwardIndexWorkload)` and `app.Workload.(*pbis.BISMetadataWorkload)` will panic if the concrete type doesn't match. While the dispatch in `executeUpdate` is gated on `app.Config.WorkloadType`, a mismatch between the config and the actual workload instance would cause a runtime crash.

Consider using the two-value assertion form (`w, ok := ...`) with a descriptive error, consistent with defensive practice:

Proposed fix (example for one handler)

```diff
 func (app *WorkloadApp) executeStagingForwardIndexUpdate(conn *sql.Conn, task *updateTask) (sql.Result, error) {
-	updateSQL, values := app.Workload.(*pforwardindex.StagingForwardIndexWorkload).BuildUpdateSqlWithValues(task.UpdateOption)
+	w, ok := app.Workload.(*pforwardindex.StagingForwardIndexWorkload)
+	if !ok {
+		return nil, fmt.Errorf("workload type mismatch: expected StagingForwardIndexWorkload, got %T", app.Workload)
+	}
+	updateSQL, values := w.BuildUpdateSqlWithValues(task.UpdateOption)
 	task.generatedSQL = updateSQL
 	return app.executeWithValues(conn, updateSQL, task.UpdateOption.TableIndex, values)
 }
```

Note: the same pattern applies to `executeBank2Update` (line 179) and `executeSysbenchUpdate` (line 198), which also use unchecked assertions — but those are pre-existing.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tools/workload/update.go` around lines 184 - 194, the functions executeStagingForwardIndexUpdate and executeBISMetadataUpdate perform unchecked type assertions on app.Workload which can panic if the concrete type differs; change each to use the two-value assertion (e.g., `w, ok := app.Workload.(*pforwardindex.StagingForwardIndexWorkload)` and `w, ok := app.Workload.(*pbis.BISMetadataWorkload)`), check ok and return a descriptive error (including the actual app.Config.WorkloadType and a message like "unexpected workload type for executeStagingForwardIndexUpdate"/"executeBISMetadataUpdate") before proceeding, then use the asserted variable (w) to call BuildUpdateSqlWithValues and set task.generatedSQL and call executeWithValues.
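The two-value assertion pattern can be shown with toy types. `Workload` and the two concrete workload structs below are hypothetical simplifications of the tool's real types; the point is that a config/instance mismatch becomes a returned error rather than a panic.

```go
package main

import "fmt"

// Workload is a stand-in for the tool's workload interface; the concrete
// types below are invented only to mirror the review comment.
type Workload interface{ Name() string }

type StagingForwardIndexWorkload struct{}

func (StagingForwardIndexWorkload) Name() string { return "staging_forward_index" }

type BISMetadataWorkload struct{}

func (BISMetadataWorkload) Name() string { return "bis_metadata" }

// buildSQL uses the two-value assertion so a type mismatch surfaces as a
// descriptive error instead of a runtime panic.
func buildSQL(w Workload) (string, error) {
	sfi, ok := w.(StagingForwardIndexWorkload)
	if !ok {
		return "", fmt.Errorf("workload type mismatch: expected StagingForwardIndexWorkload, got %T", w)
	}
	return "UPDATE " + sfi.Name() + " SET ...", nil
}

func main() {
	if sql, err := buildSQL(StagingForwardIndexWorkload{}); err == nil {
		fmt.Println(sql)
	}
	if _, err := buildSQL(BISMetadataWorkload{}); err != nil {
		fmt.Println(err)
	}
}
```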
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
- tools/workload/app.go
- tools/workload/config.go
- tools/workload/delete.go
- tools/workload/readme.md
- tools/workload/schema/bankupdate/readme.md
- tools/workload/schema/sysbench/readme.md
- tools/workload/txn.go
- tools/workload/update.go
🚧 Files skipped from review as they are similar to previous changes (1)
- tools/workload/app.go
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tools/workload/app.go (1)
275-275: ⚠️ Potential issue | 🟡 Minor

`fmt.Println` with `zap.Field` args produces unreadable output — use structured logger.

`zap.Error(err)` returns a `zap.Field` struct. Passing it to `fmt.Println` prints the raw struct representation (e.g. `{error <FieldType> 0 <err>}`) rather than the error message. Both call sites should use `plog.Warn` or `plog.Info` for consistency with the rest of the file.

🔧 Proposed fix

```diff
-	fmt.Println("connection error detected, reconnecting", zap.Error(err))
+	plog.Warn("connection error detected, reconnecting", zap.Error(err))

-	fmt.Println("reconnection failed, wait 5 seconds and retry", zap.Error(err))
+	plog.Warn("reconnection failed, wait 5 seconds and retry", zap.Error(err))
```

As per coding guidelines: "Use structured logs via `github.com/pingcap/log` with `zap` fields in Go".

Also applies to: 284-284

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tools/workload/app.go` at line 275, replace the `fmt.Println` calls that pass `zap.Field` (e.g. `fmt.Println("connection error detected, reconnecting", zap.Error(err))`) with the structured logger `plog` (use `plog.Warn` or `plog.Info` consistently) and pass `zap.Error(err)` as a field: e.g. `plog.Warn("connection error detected, reconnecting", zap.Error(err))`. Update both occurrences referenced (the `fmt.Println` call at the connection-reconnect site and the similar call at the other call site) so they use plog.Warn/Info with `zap.Error(err)` instead of `fmt.Println`.
♻️ Duplicate comments (2)

tools/workload/app.go (1)

141-142: `TotalRowCount` passed as `updateKeySpace` — semantic mismatch.

This concern was already raised in a previous review: `app.Config.TotalRowCount` (total rows to insert) is repurposed as `updateKeySpace` (key range for updates), overloading an existing config field with different semantics.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tools/workload/app.go` around lines 141 - 142, the stagingForwardIndex case is passing `app.Config.TotalRowCount` into `pforwardindex.NewStagingForwardIndexWorkload` as the updateKeySpace, which mixes semantics; change the call to pass an explicit update key-space value instead of TotalRowCount — either add a new config field (e.g., `Config.UpdateKeySpace`) and pass that, or compute the intended updateKeySpace from existing config (not TotalRowCount); update references in the stagingForwardIndex branch and the NewStagingForwardIndexWorkload invocation to use the new/derived updateKeySpace variable rather than `app.Config.TotalRowCount`.

tools/workload/schema/bis/bis_metadata.go (1)

576-601: Probability distribution bug in `buildEntityUpdateWithValues` — already flagged.

Lines 585 and 589 each draw an independent `r.Float64()`, skewing the intended 66/25/9 split. The fix (draw `p := r.Float64()` once and use cumulative thresholds, matching `buildBatchUpdateWithValues`) was already proposed in a prior review.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tools/workload/schema/bis/bis_metadata.go` around lines 576 - 601, in buildEntityUpdateWithValues the probability branches call `r.Float64()` twice which skews the intended entityUpdateByPinIDWeight/entityUpdateByCompositeKeyWeight distribution; fix by sampling `p := r.Float64()` once at the top (e.g., after seq) and then use cumulative threshold checks (if `p < entityUpdateByPinIDWeight`, else if `p < entityUpdateByPinIDWeight+entityUpdateByCompositeKeyWeight`, else) so the three branches follow the intended 66/25/9 split and remove the extra `r.Float64()` calls.
🧹 Nitpick comments (1)
tools/workload/schema/bis/bis_metadata.go (1)
688-700: Custom `min`/`max` shadow Go 1.21+ built-ins — consider removing if targeting Go ≥ 1.21.

The module targets Go 1.25.5, which includes generic built-in `min`/`max`. The custom `int`-typed definitions here shadow them within this package scope. These functions are actively used in the file (e.g., `min(rowSize, maxEntityMediaMetadataSize)`, `max(1, opt.Batch)`), so removal would require updating call sites. The `minUint64`/`maxUint64` functions remain distinct and are necessary.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tools/workload/schema/bis/bis_metadata.go` around lines 688 - 700, the file defines package-scoped int-typed helpers min and max which shadow Go 1.21+ built-in generic min/max; since the module targets Go 1.25.5 remove these custom `func min(a, b int)` and `func max(a, b int)` and update all call sites (e.g., `min(rowSize, maxEntityMediaMetadataSize)`, `max(1, opt.Batch)`) to use the built-in generic min/max instead; keep the distinct minUint64/maxUint64 functions as-is. Ensure no remaining references to the removed symbols remain.
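With Go 1.21 or newer (the module targets 1.25.5), the generic builtins behave exactly like the package-local helpers they would replace. The variable names below echo the reviewed file but are local stand-ins invented for the demo:

```go
package main

import "fmt"

func main() {
	// Since Go 1.21, min and max are generic builtins, so int-only
	// package-level helpers are unnecessary and shadow these names.
	rowSize := 4096
	maxEntityMediaMetadataSize := 1024
	fmt.Println(min(rowSize, maxEntityMediaMetadataSize)) // smaller of the two
	fmt.Println(max(1, 0))                                // clamp to at least 1

	// The builtins are generic, so they also work on other ordered types.
	fmt.Println(min(2.5, 1.5), max("a", "b"))
}
```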
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- tools/workload/app.go
- tools/workload/config.go
- tools/workload/readme.md
- tools/workload/schema/bis/bis_metadata.go
- tools/workload/schema/bis/bis_metadata_test.go
🚧 Files skipped from review as they are similar to previous changes (2)
- tools/workload/schema/bis/bis_metadata_test.go
- tools/workload/config.go
tools/workload/readme.md (Outdated)
```markdown
- Ensure the database is properly configured and has the necessary permissions.
- Adjust the thread and batch-size parameters based on your needs.
- Use `-batch-in-txn` to wrap each batch in a single explicit transaction (BEGIN/COMMIT).
- For `bis_metadata`, use `-bis-metadata-payload-mode zstd` to generate more compression expensive (zstd) payloads, or `-bis-metadata-payload-mode random` for incompressible payloads.
```
Hyphenate "compression-expensive" (compound modifier before a noun).
```diff
-For `bis_metadata`, use `-bis-metadata-payload-mode zstd` to generate more compression expensive (zstd) payloads
+For `bis_metadata`, use `-bis-metadata-payload-mode zstd` to generate more compression-expensive (zstd) payloads
```

🧰 Tools
🪛 LanguageTool
[grammar] ~101-~101: Use a hyphen to join words.
Context: "...-mode zstd` to generate more compression expensive (zstd) payloads, or `-bis-meta..."
(QB_NEW_EN_HYPHEN)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tools/workload/readme.md` at line 101, Update the README sentence describing
bis_metadata to hyphenate the compound modifier: change the phrase "more
compression expensive (zstd)" to "more compression-expensive (zstd)" in the line
mentioning `bis_metadata` and the `-bis-metadata-payload-mode zstd` option so
the modifier correctly binds to "payloads."
[FORMAT CHECKER NOTIFICATION] Notice: To remove the 📖 For more info, you can check the "Contribute Code" section in the development guide. |
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
tools/workload/update.go (1)

84-100: ⚠️ Potential issue | 🟡 Minor

Replace `fmt.Println` with structured logging in update reconnect flow.

`fmt.Println` drops structured `zap` fields and makes this path inconsistent with project logging.

Based on learnings: Applies to `**/*.go`: Use structured logs via `github.com/pingcap/log` with `zap` fields in Go; log message strings should not include function names and should avoid hyphens (use spaces instead).

🪵 Proposed fix

```diff
-	fmt.Println("connection error detected, reconnecting", zap.Error(err))
+	plog.Warn("connection error detected reconnecting", zap.Error(err))
@@
-	fmt.Println("reconnection failed, wait 5 seconds and retry", zap.Error(err))
+	plog.Warn("reconnection failed wait 5 seconds and retry", zap.Error(err))
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tools/workload/update.go` around lines 84 - 100, the reconnect branch uses `fmt.Println` which loses structured zap fields; replace those `fmt.Println` calls with structured logging via `github.com/pingcap/log` (e.g., `log.L().Warn/Info/Error`) and pass the error as a zap field (`zap.Error(err)`) and any contextual keys (e.g., "phase": "reconnect") instead of embedding function names or hyphens in the message; update the two occurrences in the reconnect flow around `app.runTransaction`/`app.doUpdateOnce`/`app.isConnectionError` where `conn.Close()`, `context.WithTimeout` and `db.DB.Conn` are used so that the first message logs the detected connection error with `zap.Error(err)` and the second logs the reconnection failure with `zap.Error(err)` and retry delay info.

tools/workload/app.go (1)
265-283: ⚠️ Potential issue | 🟡 Minor
Use structured logs instead of `fmt.Println` in the reconnect path.
`fmt.Println` here won’t emit structured fields from `zap.Error(err)` and breaks log consistency. As per coding guidelines: Use structured logs via `github.com/pingcap/log` with `zap` fields in Go; log message strings should not include function names and should avoid hyphens (use spaces instead).
🪵 Proposed fix
- fmt.Println("connection error detected, reconnecting", zap.Error(err))
+ plog.Warn("connection error detected reconnecting", zap.Error(err))
  conn.Close()
@@
- fmt.Println("reconnection failed, wait 5 seconds and retry", zap.Error(err))
+ plog.Warn("reconnection failed wait 5 seconds and retry", zap.Error(err))
  time.Sleep(time.Second * 5)
  continue
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tools/workload/app.go` around lines 265 - 283, Replace the fmt.Println calls in the reconnect path with structured logs using github.com/pingcap/log (e.g. log.L() or log.S()) and pass zap fields like zap.Error(err); specifically update the block after app.isConnectionError(err) where conn.Close() and db.DB.Conn(ctx) are used so the reconnect start/failure messages use log.L().Error or log.L().Info with messages using spaces (no hyphens) and no function names, and include the error as zap.Error(err) and any contextual fields (e.g., "reconnecting", "reconnection failed") to preserve structured logging.
♻️ Duplicate comments (1)
tools/workload/schema/jsonzstd/json_zstd.go (1)
584-590: ⚠️ Potential issue | 🟠 Major
Use a single random draw for entity update branch selection.
The switch currently samples `r.Float64()` twice, which distorts the configured weights for update type selection.
🐛 Proposed fix
  func (w *JSONZstdWorkload) buildEntityUpdateWithValues(tableIndex int, r *rand.Rand) (string, []interface{}) {
      tableName := getEntityTableName(tableIndex)
      now := time.Now()
      slot := w.slot(tableIndex)
      upper := minUint64(w.perTableUpdateKeySpace, w.entitySeq[slot].Load())
      seq := randSeq(r, upper)
+     p := r.Float64()
      switch {
-     case r.Float64() < entityUpdateBySecondaryIDWeight:
+     case p < entityUpdateBySecondaryIDWeight:
          secondaryID := w.secondaryID(tableIndex, seq)
          sql := fmt.Sprintf("UPDATE %s SET `delete_after` = ?, `updated_at` = ?, `secondary_id` = ? WHERE (`secondary_id` = ?) AND `delete_after` IS NULL", tableName)
          return sql, []interface{}{nil, now, secondaryID, secondaryID}
-     case r.Float64() < entityUpdateBySecondaryIDWeight+entityUpdateByCompositeKeyWeight:
+     case p < entityUpdateBySecondaryIDWeight+entityUpdateByCompositeKeyWeight:
          userID := w.userID(tableIndex, seq)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tools/workload/schema/jsonzstd/json_zstd.go` around lines 584 - 590, The branch selection currently calls r.Float64() twice in the switch, skewing weights; change the switch to draw once into a local variable (e.g., p := r.Float64()) and then compare that single value against entityUpdateBySecondaryIDWeight and entityUpdateBySecondaryIDWeight+entityUpdateByCompositeKeyWeight to choose the UPDATE-by-secondary-id branch (using w.secondaryID) or the UPDATE-by-composite-key branch (using w.userID), leaving the existing SQL generation logic unchanged.
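The weight-skew bug and its fix can be sketched standalone. The weights and branch names below are made up for illustration (the real constants live in json_zstd.go); what matters is that a single draw is compared against cumulative thresholds, so each branch receives exactly its configured probability.

```go
package main

import (
	"fmt"
	"math/rand"
)

// Illustrative weights only; the real constants live in json_zstd.go.
const (
	bySecondaryIDWeight  = 0.3
	byCompositeKeyWeight = 0.5
)

// pickUpdateBranch compares one random draw against cumulative thresholds.
// Calling r.Float64() inside each case instead would resample per comparison
// and skew the effective probabilities away from the configured weights.
func pickUpdateBranch(p float64) string {
	switch {
	case p < bySecondaryIDWeight:
		return "secondary_id"
	case p < bySecondaryIDWeight+byCompositeKeyWeight:
		return "composite_key"
	default:
		return "primary_key"
	}
}

func main() {
	r := rand.New(rand.NewSource(1))
	counts := map[string]int{}
	for i := 0; i < 100000; i++ {
		counts[pickUpdateBranch(r.Float64())]++ // exactly one draw per decision
	}
	fmt.Println(counts) // frequencies approximate the 0.3 / 0.5 / 0.2 split
}
```

With two independent draws, the second case's condition would be evaluated against a fresh sample, so the composite-key branch's probability would no longer equal its nominal weight.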
🧹 Nitpick comments (1)
tools/workload/schema/jsonzstd/json_zstd_test.go (1)
13-236: Prefer `testify/require` assertions in new test files.
These tests are clear, but switching to `require` will align with repository test conventions and reduce repetitive failure boilerplate. As per coding guidelines:
`**/*_test.go`: Use unit test files named `*_test.go` in Go; favor deterministic tests and use `testify/require`.
Verify each finding against the current code and only fix it if needed. In `@tools/workload/schema/jsonzstd/json_zstd_test.go` around lines 13 - 236, Replace t.Fatalf-based assertions in this test file with testify/require calls: import "github.com/stretchr/testify/require" and use require.Equal/Len/True/Type/NoError as appropriate inside TestJSONZstdWorkloadRowSizingAndClamping, TestJSONZstdWorkloadBuildCreateTableStatement, TestJSONZstdWorkloadBuildInsertSqlWithValues, TestJSONZstdWorkloadBuildUpdateSqlWithValues, and TestJSONZstdWorkloadZstdPayloadMode; update assertions that check lengths and types (e.g., entityMediaSize, batchMetaSize, foundEntity/foundBatch expectations, type switches on values from BuildInsertSqlWithValues, BuildUpdateSqlWithValues, and the internal helpers buildEntityInsertWithValues/buildBatchInsertWithValues) to use require so failures abort the test immediately and remove repetitive t.Fatalf boilerplate.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tools/workload/app.go`:
- Around line 338-340: The returned raw error (the branch that does "if err !=
nil { return 0, err }") must be wrapped to preserve stack traces; change the
return to propagate a wrapped error such as "return 0, errors.Trace(err)" (or
"errors.WrapError(err, \"insert failed\")" per project conventions) so
lower-level DB/library errors carry trace information when propagated from the
insert path.
In `@tools/workload/readme.md`:
- Line 83: Update the README phrase "zstd friendly" to the hyphenated form
"zstd-friendly" in the JSON payload mode description (the line containing
"-json-payload-mode zstd" / "zstd friendly JSON-like payload") so the compound
modifier is grammatically correct; locate the exact string "zstd friendly
JSON-like payload" and replace it with "zstd-friendly JSON-like payload".
In `@tools/workload/schema/jsonzstd/json_zstd_test.go`:
- Line 1: Add the standard project copyright header at the very top of the
json_zstd_test.go file (the file declaring package jsonzstd) so CI recognizes
the file as properly licensed; prepend the canonical header comment block above
the package jsonzstd declaration and ensure it matches the repository's standard
formatting/years/owner exactly.
- Around line 77-86: The media field index is wrong when migrated_at is present:
instead of always using values[colCount-1], compute a mediaIndex that accounts
for the extra migrated_at column (e.g. if strings.Contains(sql, "migrated_at")
set mediaIndex = colCount-2 else mediaIndex = colCount-1) and then use
values[mediaIndex] when extracting media (variables: sql, colCount, values,
media; column names: migrated_at, media_metadata). Update the test to use this
mediaIndex and the same type check/assertion on media.
In `@tools/workload/update.go`:
- Around line 143-146: The error returned from the update path is being passed
raw to app.handleUpdateError and returned unchanged, losing stack context; wrap
the original error with the project tracing wrapper (e.g., errors.Trace(err) or
errors.WrapError(err, "update task")) before calling app.handleUpdateError and
before returning so the propagated error contains a stack trace. Update the code
around handleUpdateError(err, task) and the return 0, err to use the wrappedErr
variable (wrapped via errors.Trace or errors.WrapError) when logging and
returning.
---
Outside diff comments:
In `@tools/workload/app.go`:
- Around line 265-283: Replace the fmt.Println calls in the reconnect path with
structured logs using github.com/pingcap/log (e.g. log.L() or log.S()) and pass
zap fields like zap.Error(err); specifically update the block after
app.isConnectionError(err) where conn.Close() and db.DB.Conn(ctx) are used so
the reconnect start/failure messages use log.L().Error or log.L().Info with
messages using spaces (no hyphens) and no function names, and include the error
as zap.Error(err) and any contextual fields (e.g., "reconnecting", "reconnection
failed") to preserve structured logging.
In `@tools/workload/update.go`:
- Around line 84-100: The reconnect branch uses fmt.Println which loses
structured zap fields; replace those fmt.Println calls with structured logging
via github.com/pingcap/log (e.g., log.L().Warn/Info/Error) and pass the error as
a zap field (zap.Error(err)) and any contextual keys (e.g., "phase":"reconnect")
instead of embedding function names or hyphens in the message; update the two
occurrences in the reconnect flow around
app.runTransaction/app.doUpdateOnce/app.isConnectionError where conn.Close(),
context.WithTimeout and db.DB.Conn are used so that the first message logs the
detected connection error with zap.Error(err) and the second logs the
reconnection failure with zap.Error(err) and retry delay info.
---
Duplicate comments:
In `@tools/workload/schema/jsonzstd/json_zstd.go`:
- Around line 584-590: The branch selection currently calls r.Float64() twice in
the switch, skewing weights; change the switch to draw once into a local
variable (e.g., p := r.Float64()) and then compare that single value against
entityUpdateBySecondaryIDWeight and
entityUpdateBySecondaryIDWeight+entityUpdateByCompositeKeyWeight to choose the
UPDATE-by-secondary-id branch (using w.secondaryID) or the
UPDATE-by-composite-key branch (using w.userID), leaving the existing SQL
generation logic unchanged.
---
Nitpick comments:
In `@tools/workload/schema/jsonzstd/json_zstd_test.go`:
- Around line 13-236: Replace t.Fatalf-based assertions in this test file with
testify/require calls: import "github.com/stretchr/testify/require" and use
require.Equal/Len/True/Type/NoError as appropriate inside
TestJSONZstdWorkloadRowSizingAndClamping,
TestJSONZstdWorkloadBuildCreateTableStatement,
TestJSONZstdWorkloadBuildInsertSqlWithValues,
TestJSONZstdWorkloadBuildUpdateSqlWithValues, and
TestJSONZstdWorkloadZstdPayloadMode; update assertions that check lengths and
types (e.g., entityMediaSize, batchMetaSize, foundEntity/foundBatch
expectations, type switches on values from BuildInsertSqlWithValues,
BuildUpdateSqlWithValues, and the internal helpers
buildEntityInsertWithValues/buildBatchInsertWithValues) to use require so
failures abort the test immediately and remove repetitive t.Fatalf boilerplate.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
- tools/workload/app.go
- tools/workload/config.go
- tools/workload/readme.md
- tools/workload/schema/jsonzstd/json_zstd.go
- tools/workload/schema/jsonzstd/json_zstd_test.go
- tools/workload/schema/jsonzstd/readme.md
- tools/workload/update.go
✅ Files skipped from review due to trivial changes (1)
- tools/workload/schema/jsonzstd/readme.md
🚧 Files skipped from review as they are similar to previous changes (1)
- tools/workload/config.go
if err != nil {
	return 0, err
}
Wrap propagated insert errors to preserve stack traces.
This new execution path returns raw errors from lower-level DB/library calls.
🧩 Proposed fix
if err != nil {
- return 0, err
+ return 0, errors.Trace(err)
}
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
  if err != nil {
-     return 0, err
+     return 0, errors.Trace(err)
  }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tools/workload/app.go` around lines 338 - 340, The returned raw error (the
branch that does "if err != nil { return 0, err }") must be wrapped to preserve
stack traces; change the return to propagate a wrapped error such as "return 0,
errors.Trace(err)" (or "errors.WrapError(err, \"insert failed\")" per project
conventions) so lower-level DB/library errors carry trace information when
propagated from the insert path.
- Ensure the database is properly configured and has the necessary permissions.
- Adjust the thread and batch-size parameters based on your needs.
- Use `-batch-in-txn` to wrap each batch in a single explicit transaction (BEGIN/COMMIT).
- For `json_zstd`, use `-json-payload-mode zstd` to generate a zstd friendly JSON-like payload, or `-json-payload-mode random` for incompressible payloads.
Hyphenate the compound modifier.
Use zstd-friendly to keep the phrase grammatically correct.
✍️ Proposed fix
-- For `json_zstd`, use `-json-payload-mode zstd` to generate a zstd friendly JSON-like payload, or `-json-payload-mode random` for incompressible payloads.
+- For `json_zstd`, use `-json-payload-mode zstd` to generate a zstd-friendly JSON-like payload, or `-json-payload-mode random` for incompressible payloads.
🧰 Tools
🪛 LanguageTool
[grammar] ~83-~83: Use a hyphen to join words.
Context: ...on-payload-mode zstd` to generate a zstd friendly JSON-like payload, or `-json-pa...
(QB_NEW_EN_HYPHEN)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tools/workload/readme.md` at line 83, Update the README phrase "zstd
friendly" to the hyphenated form "zstd-friendly" in the JSON payload mode
description (the line containing "-json-payload-mode zstd" / "zstd friendly
JSON-like payload") so the compound modifier is grammatically correct; locate
the exact string "zstd friendly JSON-like payload" and replace it with
"zstd-friendly JSON-like payload".
@@ -0,0 +1,236 @@
package jsonzstd
Add the standard copyright header to unblock CI.
The current pipeline is already failing on this file for missing/incorrect copyright.
✅ Proposed fix
+// Copyright 2026 PingCAP, Inc.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
package jsonzstd
📝 Committable suggestion
// Copyright 2026 PingCAP, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// See the License for the specific language governing permissions and
// limitations under the License.

package jsonzstd
🧰 Tools
🪛 GitHub Actions: PR Build and Unit Test
[error] 1-1: The copyright information of following files is incorrect: ./tools/workload/schema/jsonzstd/json_zstd_test.go
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tools/workload/schema/jsonzstd/json_zstd_test.go` at line 1, Add the standard
project copyright header at the very top of the json_zstd_test.go file (the file
declaring package jsonzstd) so CI recognizes the file as properly licensed;
prepend the canonical header comment block above the package jsonzstd
declaration and ensure it matches the repository's standard
formatting/years/owner exactly.
if strings.Contains(sql, "migrated_at") {
	colCount = 11
}
if len(values) != 3*colCount {
	t.Fatalf("entity insert values len mismatch: got %d, want %d", len(values), 3*colCount)
}
media, ok := values[colCount-1].(string)
if !ok {
	t.Fatalf("entity media_metadata type mismatch: %T", values[colCount-1])
}
Fix media field indexing when migrated_at is included.
In the migrated branch, values[colCount-1] points to migrated_at, not media_metadata.
🐛 Proposed fix
- media, ok := values[colCount-1].(string)
+ // media_metadata is always the 10th column in both entity insert shapes.
+ media, ok := values[9].(string)
if !ok {
- t.Fatalf("entity media_metadata type mismatch: %T", values[colCount-1])
+ t.Fatalf("entity media_metadata type mismatch: %T", values[9])
}
📝 Committable suggestion
  if strings.Contains(sql, "migrated_at") {
      colCount = 11
  }
  if len(values) != 3*colCount {
      t.Fatalf("entity insert values len mismatch: got %d, want %d", len(values), 3*colCount)
  }
- media, ok := values[colCount-1].(string)
+ // media_metadata is always the 10th column in both entity insert shapes.
+ media, ok := values[9].(string)
  if !ok {
-     t.Fatalf("entity media_metadata type mismatch: %T", values[colCount-1])
+     t.Fatalf("entity media_metadata type mismatch: %T", values[9])
  }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tools/workload/schema/jsonzstd/json_zstd_test.go` around lines 77 - 86, The
media field index is wrong when migrated_at is present: instead of always using
values[colCount-1], compute a mediaIndex that accounts for the extra migrated_at
column (e.g. if strings.Contains(sql, "migrated_at") set mediaIndex = colCount-2
else mediaIndex = colCount-1) and then use values[mediaIndex] when extracting
media (variables: sql, colCount, values, media; column names: migrated_at,
media_metadata). Update the test to use this mediaIndex and the same type
check/assertion on media.
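The index arithmetic the comment describes can be checked in isolation. The helper below is a hypothetical sketch, assuming (as the review does) that `migrated_at`, when present, is appended after `media_metadata`; with the 10 and 11 column shapes from the test, both cases resolve to index 9, which matches the committable suggestion's `values[9]`.

```go
package main

import (
	"fmt"
	"strings"
)

// mediaIndex locates media_metadata in one row's value slice, under the
// assumed column orders: migrated_at, when present, is appended after
// media_metadata as the final column. Names and shapes are illustrative.
func mediaIndex(sql string, colCount int) int {
	if strings.Contains(sql, "migrated_at") {
		return colCount - 2 // last value is migrated_at; media sits just before it
	}
	return colCount - 1 // media_metadata is the final column
}

func main() {
	fmt.Println(mediaIndex("INSERT ... (`media_metadata`)", 10))                // 9
	fmt.Println(mediaIndex("INSERT ... (`media_metadata`, `migrated_at`)", 11)) // 9
}
```

The equivalence of the two results is why the suggested fix can also hardcode index 9 for both insert shapes.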
  if err != nil {
-     return app.handleUpdateError(err, task)
+     app.handleUpdateError(err, task)
+     return 0, err
  }
Trace update execution errors before returning.
The refactored update path now propagates raw errors and loses stack context.
🧩 Proposed fix
import (
"context"
"database/sql"
"fmt"
"math/rand"
"sync"
"time"
+ "github.com/pingcap/errors"
plog "github.com/pingcap/log"
@@
if err != nil {
app.handleUpdateError(err, task)
- return 0, err
+ return 0, errors.Trace(err)
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tools/workload/update.go` around lines 143 - 146, The error returned from the
update path is being passed raw to app.handleUpdateError and returned unchanged,
losing stack context; wrap the original error with the project tracing wrapper
(e.g., errors.Trace(err) or errors.WrapError(err, "update task")) before calling
app.handleUpdateError and before returning so the propagated error contains a
stack trace. Update the code around handleUpdateError(err, task) and the return
0, err to use the wrappedErr variable (wrapped via errors.Trace or
errors.WrapError) when logging and returning.
What problem does this PR solve?
Issue Number: close #xxx
What is changed and how it works?
Check List
Tests
Questions
Will it cause performance regression or break compatibility?
Do you need to update user documentation, design documentation or monitoring documentation?
Release note
Summary by CodeRabbit
Release Notes
New Features
- `--batch-in-txn` flag to wrap each batch in an explicit transaction.
- `--json-payload-mode` flag with options (const, zstd, random) to control payload generation.
Documentation
Tests