
feat(e2e): benchmark mempool and memiavl #481

Open
traviolus wants to merge 11 commits into main from feat/e2e-benchmark

Conversation

@traviolus
Contributor

Description

Closes: #XXXX


Author Checklist

All items are required. Please add a note to the item if the item is not applicable, and please add links to any relevant follow-up issues.

I have...

  • included the correct type prefix in the PR title, you can find examples of the prefixes below:
  • confirmed ! in the type prefix if this is an API- or client-breaking change
  • targeted the correct branch
  • provided a link to the relevant issue or specification
  • reviewed "Files changed" and left comments if necessary
  • included the necessary unit and integration tests
  • updated the relevant documentation or specification, including comments for documenting Go code
  • confirmed all CI checks have passed

Reviewers Checklist

All items are required. Please add a note if the item is not applicable and please add
your handle next to the items reviewed if you only reviewed selected items.

I have...

  • confirmed the correct type prefix in the PR title
  • confirmed all author checklist items have been addressed
  • reviewed state machine logic, API design and naming, documentation accuracy, and tests and test coverage

@traviolus traviolus requested a review from a team as a code owner February 25, 2026 09:47
@github-actions

github-actions bot commented Feb 25, 2026

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

@coderabbitai

coderabbitai bot commented Feb 25, 2026

📝 Walkthrough

Adds a full end-to-end benchmarking suite: Makefile target, README, test harness, configs, load generators, collectors, reporters, Move benchmark sources, and cluster API extensions for measuring mempool, block, throughput, and latency across benchmark variants.

Changes

Cohort / File(s) Summary
Build
Makefile
Added benchmark-e2e target and added it to .PHONY.
Documentation
integration-tests/e2e/benchmark/README.md
New comprehensive benchmark guide (topology, scenarios, metrics, runs, env, JSON schema, examples).
Benchmark tests
integration-tests/e2e/benchmark/benchmark_test.go
New build-tagged e2e benchmark harness with many TestBenchmark* cases and orchestration helpers (setup, run, result write).
Config
integration-tests/e2e/benchmark/config.go
New Variant type and enriched BenchConfig with defaults, factory helpers, GetGasLimit, and TotalTx.
Load generation
integration-tests/e2e/benchmark/load.go
Added load patterns and helpers: BurstLoad, SequentialLoad, OutOfOrderLoad, SingleNodeLoad, Move-exec variants, pre-sign utilities, Warmup, and types TxSubmission/LoadResult.
Collection & analysis
integration-tests/e2e/benchmark/collector.go
Added MempoolPoller, CollectResults, WaitForAllIncluded, CollectInitialMetas, BenchResult, BlockStat, and latency/TPS aggregation helpers.
Reporting
integration-tests/e2e/benchmark/report.go
Added WriteResult, loaders (LoadResults, baseline loaders) and printers (PrintComparisonTable, PrintImprovementTable) plus delta formatting.
Cluster API & types
integration-tests/e2e/cluster.go, integration-tests/e2e/cluster/cluster.go
Re-exported aliases; extended ClusterOptions (MemIAVL, TimeoutCommit, ValidatorCount, MaxBlockGas, NoAllowQueued); added NodeRPCPort, LatestHeight, UnconfirmedTxCount, QueryBlock, account/address helpers, signed-tx generation/broadcast, multi-validator init and genesis utilities.
Move benchmark source
integration-tests/e2e/benchmark/move-bench/sources/BenchHeavyState.move
New Move module Publisher::BenchHeavyState with shared and per-account tables and write_mixed entry for mixed shared/local writes.

Sequence Diagram

sequenceDiagram
    participant Test as Test
    participant Cluster as Cluster
    participant Load as LoadGenerator
    participant Poller as MempoolPoller
    participant Collector as Collector
    participant Reporter as Reporter

    Test->>Cluster: Setup cluster with BenchConfig
    Test->>Cluster: CollectInitialMetas
    Test->>Test: Warmup
    Test->>Cluster: Record start height
    Test->>Poller: Start mempool polling
    Test->>Load: Execute load (Burst/Sequential/PreSigned/MoveExec)
    Load->>Cluster: Submit transactions
    Load-->>Test: Return LoadResult
    Test->>Cluster: WaitForAllIncluded (drain + finality)
    Test->>Poller: Stop (get peak)
    Test->>Collector: CollectResults (query blocks, compute latencies)
    Collector->>Cluster: QueryBlock / LatestHeight
    Collector-->>Test: Return BenchResult
    Test->>Reporter: WriteResult / PrintComparisonTable / PrintImprovementTable

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I hopped through mempools, swift and spry,
I chased each nonce beneath the sky,
I counted blocks and timed the race,
I logged the peaks and found the pace,
A rabbit cheered — benchmarks flew high.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name | Status | Explanation
Title check | ✅ Passed | The title 'feat(e2e): benchmark mempool and memiavl' clearly summarizes the main change—adding end-to-end benchmarks for mempool and MemIAVL. It uses proper conventional commit format with 'feat' prefix and concise, specific language.
Description check | ✅ Passed | The description uses the repository template but is incomplete, with only placeholder text (#XXXX) and unchecked checklists. However, the PR objectives indicate the PR adds end-to-end benchmarks for mempool and memiavl, confirming the description is related to the changeset even if minimal.
Docstring Coverage | ✅ Passed | Docstring coverage is 81.54%, which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/e2e-benchmark
📝 Coding Plan
  • Generate coding plan for human review comments

Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 12

🧹 Nitpick comments (5)
Makefile (1)

247-249: Parameterize the benchmark test filter and keep module mode read-only.

This target is useful as-is, but adding BENCH_RUN and -mod=readonly makes it safer and easier to run focused suites in CI/dev workflows.

♻️ Proposed improvement
+BENCH_RUN ?= TestBenchmark
+
 benchmark-e2e:
-	cd integration-tests/e2e && go test -v -tags benchmark -run TestBenchmark -timeout 30m -count=1 ./benchmark/
+	cd integration-tests/e2e && go test -mod=readonly -v -tags benchmark -run '$(BENCH_RUN)' -timeout 30m -count=1 ./benchmark/
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Makefile` around lines 247 - 249, Update the benchmark-e2e Makefile target to
accept a BENCH_RUN filter and run Go in module readonly mode: change the command
invoked by the benchmark-e2e target (currently invoking go test -v -tags
benchmark -run TestBenchmark -timeout 30m -count=1 ./benchmark/) to use the
BENCH_RUN variable (e.g., -run "$(BENCH_RUN:-TestBenchmark)") and add the
-mod=readonly flag so the go test invocation becomes go test -mod=readonly -v
-tags benchmark -run "$(BENCH_RUN)" -timeout 30m -count=1 ./benchmark/; ensure
the target name benchmark-e2e and variable BENCH_RUN are used exactly as
referenced.
integration-tests/e2e/benchmark/collector.go (1)

199-199: Make the finality wait context-aware.

This fixed sleep ignores cancellation and can keep tests waiting after timeout/cancel paths.

♻️ Proposed improvement
-	// Wait one more block to ensure finality
-	time.Sleep(3 * time.Second)
+	// Wait one more block to ensure finality
+	select {
+	case <-ctx.Done():
+		return 0, ctx.Err()
+	case <-time.After(3 * time.Second):
+	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration-tests/e2e/benchmark/collector.go` at line 199, Replace the fixed
sleep with a context-aware wait: instead of time.Sleep(3 * time.Second) use a
select that waits on time.After(3*time.Second) and ctx.Done() (where ctx is the
request/test context in scope) so the routine aborts promptly when cancelled;
ensure the surrounding function (e.g., the finality wait routine) returns or
propagates ctx.Err() on cancellation rather than blocking until the sleep
completes.
integration-tests/e2e/cluster/cluster.go (1)

294-303: Add a request-level timeout for block RPC calls.

With benchmark flows using long-lived contexts, a stalled RPC can block result collection indefinitely.

♻️ Proposed improvement
 	url := fmt.Sprintf("http://127.0.0.1:%d/block?height=%d", n.Ports.RPC, height)
-	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
+	reqCtx, cancel := context.WithTimeout(ctx, 10*time.Second)
+	defer cancel()
+	req, err := http.NewRequestWithContext(reqCtx, http.MethodGet, url, nil)
 	if err != nil {
 		return BlockResult{}, err
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration-tests/e2e/cluster/cluster.go` around lines 294 - 303, The block
RPC call currently uses the long-lived ctx and can hang; wrap the request in a
short request-level timeout by creating a derived context, e.g. ctxWithTimeout,
cancel := context.WithTimeout(ctx, rpcRequestTimeout) (choose a sensible
duration constant like rpcRequestTimeout), defer cancel(), and use
ctxWithTimeout in http.NewRequestWithContext instead of ctx; ensure you
reference url, req creation, and the http.DefaultClient.Do(req) call so the
request will fail fast on timeout rather than blocking result collection.
integration-tests/e2e/benchmark/report.go (1)

18-20: Avoid overwriting previous benchmark runs with identical labels.

Using label-only filenames drops historical runs and makes trend comparisons harder.

♻️ Proposed improvement
 import (
 	"encoding/json"
 	"fmt"
 	"os"
 	"path/filepath"
 	"strings"
 	"testing"
+	"time"
 )
@@
 	safe := strings.NewReplacer("/", "_", " ", "_", "+", "_").Replace(result.Config.Label)
-	path := filepath.Join(dir, safe+".json")
+	stamp := time.Now().UTC().Format("20060102T150405Z")
+	path := filepath.Join(dir, fmt.Sprintf("%s_%s.json", safe, stamp))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration-tests/e2e/benchmark/report.go` around lines 18 - 20, The current
filename uses only result.Config.Label (via safe :=
strings.NewReplacer(...).Replace(result.Config.Label) and path :=
filepath.Join(dir, safe+".json")), which overwrites previous runs with the same
label; change the filename to include a unique suffix (e.g., a timestamp from
time.Now().UTC().Format with seconds or milliseconds, or a run ID/UUID) so each
run produces a distinct file (e.g., combine safe + "_" + timestamp + ".json")
and update the code that reads/writes using the new path variable accordingly.
integration-tests/e2e/benchmark/load.go (1)

185-197: Duplicate slice initialisation in sequencePattern.

The []uint64{base+2, base, base+1} literal is initialised twice — once for the count <= 3 branch and once before the count > 3 loop. Unifying them removes the duplication.

♻️ Proposed refactor
 func sequencePattern(base uint64, count int) []uint64 {
-	if count <= 3 {
-		seqs := []uint64{base + 2, base, base + 1}
-		return seqs[:count]
-	}
-
 	seqs := []uint64{base + 2, base, base + 1}
+	if count <= 3 {
+		return seqs[:count]
+	}
 	for i := 3; i < count; i++ {
 		seqs = append(seqs, base+uint64(i))
 	}
-
 	return seqs
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration-tests/e2e/benchmark/load.go` around lines 185 - 197, The function
sequencePattern duplicates the literal []uint64{base + 2, base, base + 1} in
both branches; to fix, initialize seqs once (seqs := []uint64{base + 2, base,
base + 1}), then if count <= 3 return seqs[:count], otherwise run the existing
for loop (for i := 3; i < count; i++ { seqs = append(seqs, base+uint64(i)) })
and return seqs; this removes the duplicated initialization while preserving
behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@integration-tests/e2e/benchmark/benchmark_test.go`:
- Around line 253-256: The benchmark creates a MempoolPoller per node but each
MempoolPoller already computes the max across all nodes, causing duplicated
metrics and the saved result uses a hardcoded peak_mempool_size=0; fix by
instantiating a single MempoolPoller via NewMempoolPoller(ctx, cluster,
mempoolPollInterval) (reuse that one for the whole test instead of creating one
per node), and when persisting results read the actual max value from the poller
(e.g., poller.Max or the appropriate accessor on MempoolPoller) into
peak_mempool_size rather than 0; apply the same change for the analogous block
around the other occurrence (lines 266-271).

In `@integration-tests/e2e/benchmark/collector.go`:
- Around line 157-164: The throughput window currently uses the first and last
entries of blockStats which includes empty blocks; instead scan blockStats to
find the first and last entries that actually contain benchmark transactions
(e.g., where a transaction count field such as TxCount > 0 or a Txs slice length
> 0), assign their Time values to first/last, compute blockDuration =
last.Sub(first).Seconds(), and only set totalDuration when blockDuration > 0;
update the code around blockStats, first, last, blockDuration and totalDuration
accordingly so the TPS window excludes empty blocks.

In `@integration-tests/e2e/benchmark/load.go`:
- Line 48: Check and guard cfg.NodeCount before performing the modulo in both
BurstLoad and OutOfOrderLoad: if cfg.NodeCount is zero (or otherwise invalid)
return an error or set a safe default before entering the loop so the expression
i % cfg.NodeCount cannot panic; update the entry checks in BurstLoad and add the
same pre-loop guard to OutOfOrderLoad (the code computing viaNode using i %
cfg.NodeCount) to ensure safe execution.
- Around line 13-19: TxSubmission currently lacks an application-level response
code and non-zero res.Code checks are ignored; add a Code int (or uint32) field
to the TxSubmission struct and populate it wherever responses are processed
(e.g., in BurstLoad where res.Code is checked) so callers can distinguish
success vs app-level failures; update the response handling logic in BurstLoad
to assign res.Code into the TxSubmission entries and apply the same change to
OutOfOrderLoad and SingleNodeLoad so all load paths record the code
consistently.
- Line 135: SingleNodeLoad currently takes targetNode int and passes it straight
to SendBankTxWithSequence without bounds checking; add an entry validation in
SingleNodeLoad to ensure 0 <= targetNode < cfg.NodeCount and handle invalid
values by returning an early LoadResult/error (or logging and aborting the load)
instead of calling SendBankTxWithSequence; apply the same validation to the
other similar usage site referenced in the diff so no out-of-range targetNode
can reach SendBankTxWithSequence and corrupt the benchmark.
- Around line 44-73: The per-account goroutine loops in BurstLoad (and similarly
in OutOfOrderLoad and SingleNodeLoad) do not observe ctx cancellation and will
continue dispatching txs after the context is done; update each loop that
iterates up to cfg.TxPerAccount (the goroutine that calls
cluster.SendBankTxWithSequence and appends to result.Submissions) to check
ctx.Done() at the top of each iteration (e.g., use a non-blocking select on
ctx.Done() or check ctx.Err()) and break/return if cancelled so the goroutine
stops sending work and allows wg.Wait() to complete promptly.
- Around line 185-197: sequencePattern currently emits gaps when count < 3 (e.g.
returns [base+2] for count==1), which leaves some sequence numbers never
submitted and transactions stuck; fix by changing sequencePattern to return a
contiguous sequence for small counts (for count <= 3 or specifically count < 3)
— i.e., when count < 3 return []uint64{base, base+1, ...} truncated to count,
otherwise keep the existing out-of-order logic; alternatively (or in addition)
validate TxPerAccount in OutOfOrderLoad and enforce a minimum of 3 before
calling sequencePattern to avoid calling it with invalid counts. Ensure you
update the function sequencePattern and/or the caller
OutOfOrderLoad/TxPerAccount validation accordingly.
- Around line 200-212: The Warmup function currently ignores the result of
SendBankTxWithSequence and doesn't update the metas map, which hides
transport/app errors and leaves stale on-chain sequence numbers; modify Warmup
to capture and check the return/error from cluster.SendBankTxWithSequence, log
any failures (including context like account name and node), and on success
increment the corresponding metas[name].Sequence (or refresh metas[name] from
chain) so subsequent calls use the updated sequence; also add a brief comment in
Warmup documenting the sequence-side effect and that callers must use the
updated metas or re-fetch account metadata.

In `@integration-tests/e2e/benchmark/README.md`:
- Around line 87-92: Update the README to use one consistent default results
directory everywhere: replace occurrences of `results/` with
`benchmark/results/` (or vice versa based on project convention) so all
references (e.g., the "Results land in `benchmark/results/` as JSON." line and
the "After baseline results exist in `results/`:" line) point to the same path,
and run a quick grep to ensure no remaining mismatched occurrences remain in the
document.
- Around line 107-112: Add language identifiers (e.g., "text") to the fenced
code blocks in the README so Markdown lint (MD040) stops warning: update each
triple-backtick block that contains the table or code samples (the backtick
fences around the tabular benchmark output and the code snippets under the
"benchmark/" section) to start with ```text instead of ```; ensure all
occurrences noted in the comment (the table blocks and the code listing blocks)
are changed consistently.
- Around line 79-81: The E2E_INITIAD_BIN environment variable is currently
applied only to the cd command so go test may not see it; change how the env var
is set so go test receives it (either prefix the go test invocation with
E2E_INITIAD_BIN=./build/initiad or export E2E_INITIAD_BIN before running cd and
go test) to ensure the TestBenchmarkBaseline run uses the intended binary
(reference the E2E_INITIAD_BIN variable and the go test -run
TestBenchmarkBaseline invocation).

In `@integration-tests/e2e/cluster/cluster.go`:
- Around line 332-336: The loop over decoded.Result.Block.Data.Txs currently
ignores base64 decode failures (using continue) which silently under-reports
txs; replace the silent continue with a fail-fast error return that includes
block context and tx index: when base64Decode(txBase64) returns decErr,
construct and return an error (or propagate decErr) that includes
decoded.Result.Block.Header.Height (or block identifier from decoded), the tx
index within the loop, and the original decErr so callers can surface the
failure and metrics remain correct; update any surrounding function signature to
return an error if needed (identify code at the loop over
decoded.Result.Block.Data.Txs and the base64Decode call).

---

Nitpick comments:
In `@integration-tests/e2e/benchmark/collector.go`:
- Line 199: Replace the fixed sleep with a context-aware wait: instead of
time.Sleep(3 * time.Second) use a select that waits on time.After(3*time.Second)
and ctx.Done() (where ctx is the request/test context in scope) so the routine
aborts promptly when cancelled; ensure the surrounding function (e.g., the
finality wait routine) returns or propagates ctx.Err() on cancellation rather
than blocking until the sleep completes.

In `@integration-tests/e2e/benchmark/load.go`:
- Around line 185-197: The function sequencePattern duplicates the literal
[]uint64{base + 2, base, base + 1} in both branches; to fix, initialize seqs
once (seqs := []uint64{base + 2, base, base + 1}), then if count <= 3 return
seqs[:count], otherwise run the existing for loop (for i := 3; i < count; i++ {
seqs = append(seqs, base+uint64(i)) }) and return seqs; this removes the
duplicated initialization while preserving behavior.

In `@integration-tests/e2e/benchmark/report.go`:
- Around line 18-20: The current filename uses only result.Config.Label (via
safe := strings.NewReplacer(...).Replace(result.Config.Label) and path :=
filepath.Join(dir, safe+".json")), which overwrites previous runs with the same
label; change the filename to include a unique suffix (e.g., a timestamp from
time.Now().UTC().Format with seconds or milliseconds, or a run ID/UUID) so each
run produces a distinct file (e.g., combine safe + "_" + timestamp + ".json")
and update the code that reads/writes using the new path variable accordingly.

In `@integration-tests/e2e/cluster/cluster.go`:
- Around line 294-303: The block RPC call currently uses the long-lived ctx and
can hang; wrap the request in a short request-level timeout by creating a
derived context, e.g. ctxWithTimeout, cancel := context.WithTimeout(ctx,
rpcRequestTimeout) (choose a sensible duration constant like rpcRequestTimeout),
defer cancel(), and use ctxWithTimeout in http.NewRequestWithContext instead of
ctx; ensure you reference url, req creation, and the http.DefaultClient.Do(req)
call so the request will fail fast on timeout rather than blocking result
collection.

In `@Makefile`:
- Around line 247-249: Update the benchmark-e2e Makefile target to accept a
BENCH_RUN filter and run Go in module readonly mode: change the command invoked
by the benchmark-e2e target (currently invoking go test -v -tags benchmark -run
TestBenchmark -timeout 30m -count=1 ./benchmark/) to use the BENCH_RUN variable
(e.g., -run "$(BENCH_RUN:-TestBenchmark)") and add the -mod=readonly flag so the
go test invocation becomes go test -mod=readonly -v -tags benchmark -run
"$(BENCH_RUN)" -timeout 30m -count=1 ./benchmark/; ensure the target name
benchmark-e2e and variable BENCH_RUN are used exactly as referenced.

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between 82be464 and de2baaf.

📒 Files selected for processing (9)
  • Makefile
  • integration-tests/e2e/benchmark/README.md
  • integration-tests/e2e/benchmark/benchmark_test.go
  • integration-tests/e2e/benchmark/collector.go
  • integration-tests/e2e/benchmark/config.go
  • integration-tests/e2e/benchmark/load.go
  • integration-tests/e2e/benchmark/report.go
  • integration-tests/e2e/cluster.go
  • integration-tests/e2e/cluster/cluster.go


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

♻️ Duplicate comments (3)
integration-tests/e2e/benchmark/README.md (2)

37-38: ⚠️ Potential issue | 🟡 Minor

Use one consistent default results directory path throughout the doc.

results/ and benchmark/results/ are both documented as defaults, which is ambiguous for users and CI scripts.

✏️ Suggested cleanup
-- The JSON results persist in `results/`. Subsequent `TestBenchmarkFullComparison` runs load them automatically.
+- The JSON results persist in `benchmark/results/`. Subsequent `TestBenchmarkFullComparison` runs load them automatically.
@@
-| `BENCHMARK_RESULTS_DIR` | `results/` | Output directory for JSON results |
+| `BENCHMARK_RESULTS_DIR` | `benchmark/results/` | Output directory for JSON results |

Also applies to: 87-92, 189-189

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration-tests/e2e/benchmark/README.md` around lines 37 - 38, The README
documents two different default results directories ("results/" and
"benchmark/results/"), causing ambiguity for users and CI; pick one canonical
default path and update all mentions to it (including the sentence referencing
TestBenchmarkFullComparison and the other occurrences noted around lines 87-92
and 189) so the doc consistently uses the same directory throughout; ensure any
examples, scripts, and the TestBenchmarkFullComparison description reflect that
single chosen path.

107-112: ⚠️ Potential issue | 🟡 Minor

Add language identifiers to fenced blocks to satisfy MD040.

These blocks should use ```text (or another explicit language) to clear markdownlint warnings.

✏️ Suggested cleanup
-```
+```text
 Config                    | Variant      |   TPS | vs base |  P50ms | vs base |  P95ms | vs base |  P99ms | vs base | Peak Mempool
 ...

-```
+```text
 Config | Variant | TPS | P50ms | P95ms | P99ms | Max ms | Peak Mempool
 ...


-```
+```text
benchmark/
  config.go          Variant definitions, BenchConfig, preset constructors
...


Also applies to: 130-134, 155-163

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In @integration-tests/e2e/benchmark/README.md around lines 107 - 112, The
markdown fenced code blocks triggering MD040 should include an explicit language
identifier (e.g., use text) instead of bare ; update the shown table block
and the other fenced blocks (the blocks around the snippet at lines 130-134 and
155-163) by changing their opening fences from totext (or another
appropriate language) so markdownlint MD040 no longer warns and the content
renders correctly.


integration-tests/e2e/benchmark/collector.go (1)

131-137: ⚠️ Potential issue | 🟠 Major

TPS window is still derived from block non-emptiness, not benchmark tx inclusion.

At Line 133/160, `TxCount` tracks all txs in the block, so duration can still include blocks with no benchmark txs. The window should use first/last block times where matched benchmark txs were actually found.

 

🐛 Proposed fix

```diff
 	var (
 		blockStats []BlockStat
 		latencies  []float64
 		included   int
+		firstIncludedTime time.Time
+		lastIncludedTime  time.Time
 	)
@@
-		for _, txHash := range block.TxHashes {
+		includedInBlock := 0
+		for _, txHash := range block.TxHashes {
 			sub, ok := submitMap[txHash]
 			if !ok {
 				continue
 			}
 			included++
+			includedInBlock++
 			latencyMs := float64(block.BlockTime.Sub(sub.SubmitTime).Milliseconds())
 			if latencyMs < 0 {
 				latencyMs = 0
 			}
 			latencies = append(latencies, latencyMs)
 		}
+		if includedInBlock > 0 {
+			if firstIncludedTime.IsZero() {
+				firstIncludedTime = block.BlockTime
+			}
+			lastIncludedTime = block.BlockTime
+		}
 	}
@@
-	// use span from first to last block that actually included benchmark txs
-	var firstTxTime, lastTxTime time.Time
-	for _, bs := range blockStats {
-		if bs.TxCount == 0 {
-			continue
-		}
-		if firstTxTime.IsZero() {
-			firstTxTime = bs.Time
-		}
-		lastTxTime = bs.Time
-	}
-	if !firstTxTime.IsZero() {
-		blockDuration := lastTxTime.Sub(firstTxTime).Seconds()
+	if !firstIncludedTime.IsZero() {
+		blockDuration := lastIncludedTime.Sub(firstIncludedTime).Seconds()
 		if blockDuration > 0 {
 			totalDuration = blockDuration
 		}
 	}
```


Also applies to: 157-173

🤖 Prompt for AI Agents

```
Verify each finding against the current code and only fix it if needed.

In `@integration-tests/e2e/benchmark/collector.go` around lines 131 - 137,
BlockStat currently uses TxCount: len(block.TxHashes) and block times which
causes the TPS window to include blocks that contain no benchmark transactions;
update the BlockStat population logic (where blockStats is appended) to count
only matched benchmark transactions (e.g., filter block.TxHashes by the
benchmark tx set and set TxCount to that filtered length) and also record the
timestamp of the first/last matched benchmark tx (or a flag/field indicating
presence) so that the overall window computation (the later logic around
computing start/end times and TPS between lines ~157-173) uses the timestamps of
the first and last blocks that actually contained benchmark txs rather than any
non-empty block times. Ensure references to BlockStat, TxCount, blockStats and
the window computation code are updated accordingly.
```

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @integration-tests/e2e/benchmark/collector.go:

  • Around line 48-64: NewMempoolPoller may create a time.Ticker with a
    non-positive interval which panics; in the constructor (NewMempoolPoller) ensure
    p.interval is a positive duration before starting the goroutine—if the provided
    interval is <= 0, replace it with a sensible default (e.g., time.Second or
    another minimum > 0) so that MempoolPoller.run can safely create a
    time.NewTicker(p.interval) without panicking.
  • Around line 206-208: The post-mempool wait currently uses a blocking
    time.Sleep(3 * time.Second) which ignores ctx cancellation; replace it with a
    context-aware timer: create a time.NewTimer(3*time.Second) and use select to
    wait on either timer.C or ctx.Done(), stop the timer when appropriate, and if
    ctx.Done() fires propagate/return the context error instead of proceeding to
    call cluster.LatestHeight(ctx, 0).

In @integration-tests/e2e/cluster/cluster.go:

  • Around line 886-890: The code only sets memiavl.enable to "true" when
    c.opts.MemIAVL is true, leaving the false case to upstream defaults; update the
    logic around the conditional that calls setTOMLValue(appPath, "memiavl",
    "enable", ...) so that it always writes an explicit value: write "true" when
    c.opts.MemIAVL is true and write "false" when it's false (i.e., call
    setTOMLValue with "false" in the else branch), ensuring the memiavl.enable key
    is deterministically set regardless of upstream config.

Duplicate comments:
In @integration-tests/e2e/benchmark/collector.go:

  • Around line 131-137: BlockStat currently uses TxCount: len(block.TxHashes) and
    block times which causes the TPS window to include blocks that contain no
    benchmark transactions; update the BlockStat population logic (where blockStats
    is appended) to count only matched benchmark transactions (e.g., filter
    block.TxHashes by the benchmark tx set and set TxCount to that filtered length)
    and also record the timestamp of the first/last matched benchmark tx (or a
    flag/field indicating presence) so that the overall window computation (the
    later logic around computing start/end times and TPS between lines ~157-173)
    uses the timestamps of the first and last blocks that actually contained
    benchmark txs rather than any non-empty block times. Ensure references to
    BlockStat, TxCount, blockStats and the window computation code are updated
    accordingly.

In @integration-tests/e2e/benchmark/README.md:

  • Around line 37-38: The README documents two different default results
    directories ("results/" and "benchmark/results/"), causing ambiguity for users
    and CI; pick one canonical default path and update all mentions to it (including
    the sentence referencing TestBenchmarkFullComparison and the other occurrences
    noted around lines 87-92 and 189) so the doc consistently uses the same
    directory throughout; ensure any examples, scripts, and the
    TestBenchmarkFullComparison description reflect that single chosen path.
  • Around line 107-112: The markdown fenced code blocks triggering MD040 should
    include an explicit language identifier (e.g., use `text`) instead of a bare
    fence; update the shown table block and the other fenced blocks (the blocks
    around the snippet at lines 130-134 and 155-163) by changing their opening
    fences from ``` to ```text (or another appropriate language) so markdownlint
    MD040 no longer warns and the content renders correctly.

</details>

---

<details>
<summary>ℹ️ Review info</summary>

**Configuration used**: Repository UI

**Review profile**: CHILL

**Plan**: Pro

**Cache: Disabled due to data retention organization setting**

**Knowledge base: Disabled due to data retention organization setting**

<details>
<summary>📥 Commits</summary>

Reviewing files that changed from the base of the PR and between de2baaf1144434fe86ea58dc43a0d066cde3821c and e9fab6663dd0532eb038f3353c191049f39ae171.

</details>

<details>
<summary>📒 Files selected for processing (5)</summary>

* `integration-tests/e2e/benchmark/README.md`
* `integration-tests/e2e/benchmark/benchmark_test.go`
* `integration-tests/e2e/benchmark/collector.go`
* `integration-tests/e2e/benchmark/load.go`
* `integration-tests/e2e/cluster/cluster.go`

</details>

<details>
<summary>🚧 Files skipped from review as they are similar to previous changes (2)</summary>

* integration-tests/e2e/benchmark/benchmark_test.go
* integration-tests/e2e/benchmark/load.go

</details>

</details>

<!-- This is an auto-generated comment by CodeRabbit for review status -->


@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (2)
integration-tests/e2e/cluster/cluster.go (2)

886-892: Past review comment addressed correctly.

memiavl.enable is now always written explicitly, eliminating dependence on upstream defaults for both variants.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration-tests/e2e/cluster/cluster.go` around lines 886 - 892, The code
now always sets memiavl.enable explicitly; ensure the call to setTOMLValue
writes the string value derived from c.opts.MemIAVL by keeping memIAVLEnable and
the if block that sets it based on c.opts.MemIAVL, then pass memIAVLEnable to
setTOMLValue(appPath, "memiavl", "enable", memIAVLEnable) so memiavl.enable is
unambiguous regardless of upstream defaults (verify in the function where
memIAVLEnable, c.opts.MemIAVL, appPath and setTOMLValue are used).

332-336: Past review comment addressed correctly.

The fail-fast decode error with height and idx context directly addresses the previous concern about silent tx under-reporting.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration-tests/e2e/cluster/cluster.go` around lines 332 - 336, The loop
already fails fast on decode errors and includes contextual info; no change
needed—keep the current base64Decode handling inside the for loop that returns
fmt.Errorf("failed to decode tx at height=%d index=%d: %w", height, idx, decErr)
so failures surface with height/idx context.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@integration-tests/e2e/cluster/cluster.go`:
- Around line 886-892: The code now always sets memiavl.enable explicitly;
ensure the call to setTOMLValue writes the string value derived from
c.opts.MemIAVL by keeping memIAVLEnable and the if block that sets it based on
c.opts.MemIAVL, then pass memIAVLEnable to setTOMLValue(appPath, "memiavl",
"enable", memIAVLEnable) so memiavl.enable is unambiguous regardless of upstream
defaults (verify in the function where memIAVLEnable, c.opts.MemIAVL, appPath
and setTOMLValue are used).
- Around line 332-336: The loop already fails fast on decode errors and includes
contextual info; no change needed—keep the current base64Decode handling inside
the for loop that returns fmt.Errorf("failed to decode tx at height=%d index=%d:
%w", height, idx, decErr) so failures surface with height/idx context.

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between e9fab66 and 753bfca.

📒 Files selected for processing (2)
  • integration-tests/e2e/benchmark/collector.go
  • integration-tests/e2e/cluster/cluster.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • integration-tests/e2e/benchmark/collector.go


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (2)
integration-tests/e2e/cluster/cluster.go (1)

886-890: ⚠️ Potential issue | 🟠 Major

Explicitly set memiavl.enable for both true and false paths.

At Line 886, only the true path is written. When c.opts.MemIAVL is false, config falls back to upstream defaults, which can skew baseline vs. variant comparisons.

🔧 Proposed fix
-		if c.opts.MemIAVL {
-			if err := setTOMLValue(appPath, "memiavl", "enable", "true"); err != nil {
-				return err
-			}
-		}
+		memIAVLEnable := "false"
+		if c.opts.MemIAVL {
+			memIAVLEnable = "true"
+		}
+		if err := setTOMLValue(appPath, "memiavl", "enable", memIAVLEnable); err != nil {
+			return err
+		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration-tests/e2e/cluster/cluster.go` around lines 886 - 890, The code
only sets memiavl.enable when c.opts.MemIAVL is true, causing defaults to leak
when false; update the conditional around c.opts.MemIAVL to explicitly call
setTOMLValue(appPath, "memiavl", "enable", "true") when true and call
setTOMLValue(appPath, "memiavl", "enable", "false") when false (preserve error
handling/returns), referencing c.opts.MemIAVL, setTOMLValue and appPath so the
configuration is deterministic for both branches.
integration-tests/e2e/benchmark/collector.go (1)

160-176: ⚠️ Potential issue | 🟠 Major

TPS window is still keyed off total block txs, not benchmark-matched txs.

Line 163 checks bs.TxCount, which includes non-benchmark traffic. That can shift first/last timestamps and distort TPS for this benchmark run.

🔧 Proposed fix
-	// use span from first to last block that actually included benchmark txs
-	var firstTxTime, lastTxTime time.Time
-	for _, bs := range blockStats {
-		if bs.TxCount == 0 {
-			continue
-		}
-		if firstTxTime.IsZero() {
-			firstTxTime = bs.Time
-		}
-		lastTxTime = bs.Time
-	}
+	// use span from first to last block that included matched benchmark txs
+	var firstTxTime, lastTxTime time.Time
+	for h := startHeight; h <= endHeight; h++ {
+		block, err := cluster.QueryBlock(ctx, 0, h)
+		if err != nil {
+			return BenchResult{}, fmt.Errorf("query block %d: %w", h, err)
+		}
+
+		matched := 0
+		for _, txHash := range block.TxHashes {
+			if _, ok := submitMap[txHash]; ok {
+				matched++
+			}
+		}
+		if matched == 0 {
+			continue
+		}
+		if firstTxTime.IsZero() {
+			firstTxTime = block.BlockTime
+		}
+		lastTxTime = block.BlockTime
+	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration-tests/e2e/benchmark/collector.go` around lines 160 - 176, The TPS
window selection currently uses bs.TxCount (which includes non-benchmark
traffic) to find firstTxTime/lastTxTime; change the check to use the count of
benchmark-matched transactions for each block instead (e.g., compute
matchedCount from a field like bs.MatchedTxCount or by filtering bs.TxResults
for the benchmark marker/flag), skip blocks when matchedCount == 0, and use
those matched blocks to set firstTxTime/lastTxTime and compute
blockDuration/totalDuration (update the condition that referenced bs.TxCount and
any related logic in the loop that sets firstTxTime, lastTxTime, blockDuration
and totalDuration).
🧹 Nitpick comments (1)
integration-tests/e2e/benchmark/report.go (1)

118-137: Consider sorting loaded results for deterministic reporting output.

os.ReadDir ordering is filesystem-dependent; sorting results (or filenames) before returning will keep table ordering stable across environments.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration-tests/e2e/benchmark/report.go` around lines 118 - 137, The code
relies on os.ReadDir(dir) ordering which is filesystem-dependent; make the
output deterministic by sorting the entries slice by entry.Name() right after
calling os.ReadDir (before the for _, entry := range entries loop) using
sort.Slice or sort.SliceStable, so the subsequent reads/unmarshals produce a
stable results order (refer to the entries variable and the loop that reads
files into results).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@integration-tests/e2e/benchmark/report.go`:
- Around line 88-92: The check uses only r.Config.Variant so it hides deltas for
any row sharing that Variant; instead detect the actual baseline row itself.
Replace the condition at the block that sets tpsDelta/p50Delta/p95Delta so it
compares the selected row to the baseline (e.g., r == baseline) or compares the
full config identity (e.g., reflect.DeepEqual(r.Config, baseline.Config) or
compare a unique ID field) rather than r.Config.Variant; keep the rest of the
delta assignment logic unchanged and only set "-" when the row equals the
baseline.

---

Duplicate comments:
In `@integration-tests/e2e/benchmark/collector.go`:
- Around line 160-176: The TPS window selection currently uses bs.TxCount (which
includes non-benchmark traffic) to find firstTxTime/lastTxTime; change the check
to use the count of benchmark-matched transactions for each block instead (e.g.,
compute matchedCount from a field like bs.MatchedTxCount or by filtering
bs.TxResults for the benchmark marker/flag), skip blocks when matchedCount == 0,
and use those matched blocks to set firstTxTime/lastTxTime and compute
blockDuration/totalDuration (update the condition that referenced bs.TxCount and
any related logic in the loop that sets firstTxTime, lastTxTime, blockDuration
and totalDuration).

In `@integration-tests/e2e/cluster/cluster.go`:
- Around line 886-890: The code only sets memiavl.enable when c.opts.MemIAVL is
true, causing defaults to leak when false; update the conditional around
c.opts.MemIAVL to explicitly call setTOMLValue(appPath, "memiavl", "enable",
"true") when true and call setTOMLValue(appPath, "memiavl", "enable", "false")
when false (preserve error handling/returns), referencing c.opts.MemIAVL,
setTOMLValue and appPath so the configuration is deterministic for both
branches.

---

Nitpick comments:
In `@integration-tests/e2e/benchmark/report.go`:
- Around line 118-137: The code relies on os.ReadDir(dir) ordering which is
filesystem-dependent; make the output deterministic by sorting the entries slice
by entry.Name() right after calling os.ReadDir (before the for _, entry := range
entries loop) using sort.Slice or sort.SliceStable, so the subsequent
reads/unmarshals produce a stable results order (refer to the entries variable
and the loop that reads files into results).

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between 753bfca and 6bbb4a3.

📒 Files selected for processing (4)
  • integration-tests/e2e/benchmark/collector.go
  • integration-tests/e2e/benchmark/load.go
  • integration-tests/e2e/benchmark/report.go
  • integration-tests/e2e/cluster/cluster.go

@codecov

codecov bot commented Feb 25, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 37.99%. Comparing base (d64a135) to head (48b8195).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff           @@
##             main     #481   +/-   ##
=======================================
  Coverage   37.99%   37.99%           
=======================================
  Files         325      325           
  Lines       30762    30762           
=======================================
  Hits        11689    11689           
  Misses      17189    17189           
  Partials     1884     1884           


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (2)
integration-tests/e2e/cluster/cluster.go (1)

938-942: ⚠️ Potential issue | 🟠 Major

memiavl.enable is still only set when MemIAVL is true — baseline runs rely on upstream defaults.

The code only writes memiavl.enable=true when the flag is set, but never explicitly writes false. For benchmark correctness, both variants must be deterministically configured. If a node's default app.toml ships with memiavl.enable=true, baseline and mempool-only runs would silently use MemIAVL, invalidating comparisons.

Proposed fix
-		if c.opts.MemIAVL {
-			if err := setTOMLValue(appPath, "memiavl", "enable", "true"); err != nil {
-				return err
-			}
+		memIAVLEnable := "false"
+		if c.opts.MemIAVL {
+			memIAVLEnable = "true"
+		}
+		if err := setTOMLValue(appPath, "memiavl", "enable", memIAVLEnable); err != nil {
+			return err
 		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration-tests/e2e/cluster/cluster.go` around lines 938 - 942, The code
only sets memiavl.enable when c.opts.MemIAVL is true, so baseline runs may
inherit an upstream default; change the block so setTOMLValue(appPath,
"memiavl", "enable", ...) is always called with "true" if c.opts.MemIAVL else
"false" (i.e., unconditionally write the explicit setting), and propagate/return
any error from setTOMLValue; update the branch around c.opts.MemIAVL to call
setTOMLValue regardless of its value.
integration-tests/e2e/benchmark/report.go (1)

83-92: ⚠️ Potential issue | 🟡 Minor

Baseline row detection compares Variant, not identity — suppresses deltas for all rows sharing the baseline variant.

If multiple rows have VariantBaseline (e.g., loaded baseline + a re-run), all of them will show "-" instead of actual deltas. Compare the specific row, not just the variant.

Proposed fix
-	for _, r := range results {
+	for i := range results {
+		r := results[i]
 		tpsDelta := deltaStr(baseline.TxPerSecond, r.TxPerSecond)
 		p50Delta := deltaStr(baseline.P50LatencyMs, r.P50LatencyMs)
 		p95Delta := deltaStr(baseline.P95LatencyMs, r.P95LatencyMs)
 
-		if r.Config.Variant == baseline.Config.Variant {
+		if &results[i] == baseline {
 			tpsDelta = "-"
 			p50Delta = "-"
 			p95Delta = "-"
 		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration-tests/e2e/benchmark/report.go` around lines 83 - 92, The code
currently suppresses deltas by comparing r.Config.Variant to
baseline.Config.Variant, which hides deltas for any row sharing the same
variant; change the baseline detection to compare the specific row identity
instead (e.g., compare the result object or its unique run ID) so only the exact
baseline row sets tpsDelta/p50Delta/p95Delta = "-". Update the conditional in
the loop that references results, baseline, and r.Config.Variant to use r ==
baseline or r.RunID/ResultID equality (whichever unique identifier exists) to
locate the baseline row.
🧹 Nitpick comments (5)
integration-tests/e2e/benchmark/load.go (2)

346-357: sequencePattern still produces invalid sequences for count < 3, but the caller guards against it.

The guard in OutOfOrderLoad (line 160-162) ensures count >= 3, so this is safe in practice. Adding a defensive comment or panic inside sequencePattern itself for count < 3 would prevent future misuse if someone calls it directly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration-tests/e2e/benchmark/load.go` around lines 346 - 357, The
sequencePattern function can return invalid slices when called with count < 3;
add a defensive check at the start of sequencePattern to validate count (e.g.,
panic or return an explicit error/empty slice) and document the requirement in a
comment; reference the function name sequencePattern and note the existing
caller OutOfOrderLoad enforces count >= 3 but sequencePattern should enforce or
clearly document this precondition to avoid future misuse.

30-88: Significant structural duplication across load functions.

BurstLoad, SequentialLoad, and SingleNodeLoad share ~80% of their body (NodeCount guard, mutex/wg setup, goroutine-per-account with ctx check, submission recording). The only meaningful differences are how viaNode is computed and whether submissions are sequential or concurrent within an account. Consider extracting a common runLoad helper parameterized by a viaNodeFn and concurrency mode to reduce maintenance burden.

Not blocking for this PR — the duplication is manageable at the current scale.

Also applies to: 94-152, 220-277

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration-tests/e2e/benchmark/load.go` around lines 30 - 88, BurstLoad,
SequentialLoad, and SingleNodeLoad duplicate most logic (NodeCount guard,
mutex/wg setup, goroutine-per-account, ctx check, submission recording);
refactor by extracting a shared helper runLoad that takes a viaNodeFn (func(i
int) int) and a concurrency mode (sequential vs concurrent within an account),
then have BurstLoad/SequentialLoad/SingleNodeLoad call runLoad with the
appropriate viaNodeFn and mode; ensure runLoad performs the NodeCount check,
sets up mu/wg, iterates cluster.AccountNames(), computes seq using
meta.Sequence+uint64(i), calls cluster.SendBankTxWithSequence, and appends
TxSubmission/Errors to result so callers keep the same behavior and signatures.
integration-tests/e2e/cluster/cluster.go (2)

1223-1230: Helpers are fine, but base64Decode is a trivial one-liner wrapper.

base64Decode just calls base64.StdEncoding.DecodeString. Inlining the call at the one usage site in QueryBlock (line 333) would be simpler. Minor nit — not blocking.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration-tests/e2e/cluster/cluster.go` around lines 1223 - 1230, The
helper base64Decode is a trivial one-liner wrapper around
base64.StdEncoding.DecodeString; remove the base64Decode function and replace
its single usage in QueryBlock with a direct call to
base64.StdEncoding.DecodeString, updating any error handling there accordingly
(ensure QueryBlock imports encoding/base64 is available and adjust variable
names if needed).

628-678: SendMoveExecuteJSONWithGas is largely duplicated from MoveExecuteJSONWithSequence.

The only difference is that gas estimation is skipped and gasLimit is passed directly. Consider extracting the shared CLI-invocation logic into a private helper to reduce the ~50 lines of duplication. Not blocking, but worth addressing to keep maintenance manageable as the CLI surface grows.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration-tests/e2e/cluster/cluster.go` around lines 628 - 678,
SendMoveExecuteJSONWithGas duplicates most of MoveExecuteJSONWithSequence;
factor the shared CLI invocation, JSON marshalling, c.exec call,
parseTxResultFromOutput and logging into a private helper (e.g.,
execMoveExecuteJSON or sendMoveExecuteJSONCommon) that accepts parameters for
moduleAddress,moduleName,functionName,typeArgs,args,fromName,accountNumber,sequence,gasLimit,viaNode
and any flags like "--offline" vs estimation behavior; have both
SendMoveExecuteJSONWithGas and MoveExecuteJSONWithSequence call that helper
(preserving their public signatures) and only keep the differing gas-handling
logic in each caller. Ensure the new helper uses c.exec, parseTxResultFromOutput
and the same log format as in the original functions.
integration-tests/e2e/benchmark/README.md (1)

103-111: Fenced code block missing language identifier.

Static analysis (MD040) flags the code block at line 103. Adding `text` as the language identifier would satisfy the linter.

Proposed fix
-```
+```text
 benchmark/
   config.go          Variant definitions, BenchConfig, preset constructors
   ...
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration-tests/e2e/benchmark/README.md` around lines 103 - 111, The fenced
code block starting with the benchmark/ listing is missing a language identifier
and triggers MD040; update the fence that opens the block (the triple backticks
before "benchmark/") to include the language identifier "text" (i.e., change ```
to ```text) so the linter accepts it while preserving the block contents (the
section that lists config.go, load.go, collector.go, report.go,
benchmark_test.go, results/).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@integration-tests/e2e/benchmark/benchmark_test.go`:
- Around line 31-47: setupCluster currently starts the cluster with
cluster.Start() but does not register a cleanup, so if require.NoError(t,
cluster.WaitForReady(...)) fails the test will Goexit and leave node processes
running; fix by calling t.Cleanup(func(){ cluster.Close(context.Background()) })
(or t.Cleanup(func(){ _ = cluster.Close(ctx) }) if context is available)
immediately after successful e2e.NewCluster and before calling
cluster.Start/WaitForReady, ensuring Close is always invoked on test teardown,
and then remove manual defer cluster.Close() calls from runBenchmark,
TestBenchmarkMemIAVLMoveExec, and TestBenchmarkGossipPropagation.

In `@integration-tests/e2e/benchmark/report.go`:
- Around line 12-28: WriteResult can produce a hidden ".json" file when
result.Config.Label is empty; after computing safe :=
strings.NewReplacer(...).Replace(result.Config.Label) add a guard that if safe
== "" you assign a deterministic fallback (e.g. "result-"+timestamp or
"result-"+UnixNano) to avoid collisions, then continue to build path using that
safe value; reference symbols: function WriteResult, variable safe, and
result.Config.Label to locate and update the logic.

---

Duplicate comments:
In `@integration-tests/e2e/benchmark/report.go`:
- Around line 83-92: The code currently suppresses deltas by comparing
r.Config.Variant to baseline.Config.Variant, which hides deltas for any row
sharing the same variant; change the baseline detection to compare the specific
row identity instead (e.g., compare the result object or its unique run ID) so
only the exact baseline row sets tpsDelta/p50Delta/p95Delta = "-". Update the
conditional in the loop that references results, baseline, and r.Config.Variant
to use r == baseline or r.RunID/ResultID equality (whichever unique identifier
exists) to locate the baseline row.

In `@integration-tests/e2e/cluster/cluster.go`:
- Around line 938-942: The code only sets memiavl.enable when c.opts.MemIAVL is
true, so baseline runs may inherit an upstream default; change the block so
setTOMLValue(appPath, "memiavl", "enable", ...) is always called with "true" if
c.opts.MemIAVL else "false" (i.e., unconditionally write the explicit setting),
and propagate/return any error from setTOMLValue; update the branch around
c.opts.MemIAVL to call setTOMLValue regardless of its value.

---

Nitpick comments:
In `@integration-tests/e2e/benchmark/load.go`:
- Around line 346-357: The sequencePattern function can return invalid slices
when called with count < 3; add a defensive check at the start of
sequencePattern to validate count (e.g., panic or return an explicit error/empty
slice) and document the requirement in a comment; reference the function name
sequencePattern and note the existing caller OutOfOrderLoad enforces count >= 3
but sequencePattern should enforce or clearly document this precondition to
avoid future misuse.
- Around line 30-88: BurstLoad, SequentialLoad, and SingleNodeLoad duplicate
most logic (NodeCount guard, mutex/wg setup, goroutine-per-account, ctx check,
submission recording); refactor by extracting a shared helper runLoad that takes
a viaNodeFn (func(i int) int) and a concurrency mode (sequential vs concurrent
within an account), then have BurstLoad/SequentialLoad/SingleNodeLoad call
runLoad with the appropriate viaNodeFn and mode; ensure runLoad performs the
NodeCount check, sets up mu/wg, iterates cluster.AccountNames(), computes seq
using meta.Sequence+uint64(i), calls cluster.SendBankTxWithSequence, and appends
TxSubmission/Errors to result so callers keep the same behavior and signatures.

In `@integration-tests/e2e/benchmark/README.md`:
- Around line 103-111: The fenced code block starting with the benchmark/
listing is missing a language identifier and triggers MD040; update the fence
that opens the block (the triple backticks before "benchmark/") to include the
language identifier "text" (i.e., change ``` to ```text) so the linter accepts
it while preserving the block contents (the section that lists config.go,
load.go, collector.go, report.go, benchmark_test.go, results/).

In `@integration-tests/e2e/cluster/cluster.go`:
- Around line 1223-1230: The helper base64Decode is a trivial one-liner wrapper
around base64.StdEncoding.DecodeString; remove the base64Decode function and
replace its single usage in QueryBlock with a direct call to
base64.StdEncoding.DecodeString, updating any error handling there accordingly
(ensure QueryBlock imports encoding/base64 is available and adjust variable
names if needed).
- Around line 628-678: SendMoveExecuteJSONWithGas duplicates most of
MoveExecuteJSONWithSequence; factor the shared CLI invocation, JSON marshalling,
c.exec call, parseTxResultFromOutput and logging into a private helper (e.g.,
execMoveExecuteJSON or sendMoveExecuteJSONCommon) that accepts parameters for
moduleAddress,moduleName,functionName,typeArgs,args,fromName,accountNumber,sequence,gasLimit,viaNode
and any flags like "--offline" vs estimation behavior; have both
SendMoveExecuteJSONWithGas and MoveExecuteJSONWithSequence call that helper
(preserving their public signatures) and only keep the differing gas-handling
logic in each caller. Ensure the new helper uses c.exec, parseTxResultFromOutput
and the same log format as in the original functions.

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between 6bbb4a3 and 7f1f614.

📒 Files selected for processing (5)
  • integration-tests/e2e/benchmark/README.md
  • integration-tests/e2e/benchmark/benchmark_test.go
  • integration-tests/e2e/benchmark/load.go
  • integration-tests/e2e/benchmark/report.go
  • integration-tests/e2e/cluster/cluster.go

@traviolus
Contributor Author

@coderabbitai review

@coderabbitai
Copy link

coderabbitai bot commented Mar 16, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

♻️ Duplicate comments (2)
integration-tests/e2e/cluster/cluster.go (1)

1172-1176: ⚠️ Potential issue | 🟠 Major

Set memiavl.enable explicitly for both benchmark variants.

Only writing "true" leaves the "false" case dependent on upstream defaults, which can invalidate baseline vs variant comparisons.

💡 Proposed fix
-		if c.opts.MemIAVL {
-			if err := setTOMLValue(appPath, "memiavl", "enable", "true"); err != nil {
-				return err
-			}
-		}
+		memIAVLEnable := "false"
+		if c.opts.MemIAVL {
+			memIAVLEnable = "true"
+		}
+		if err := setTOMLValue(appPath, "memiavl", "enable", memIAVLEnable); err != nil {
+			return err
+		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration-tests/e2e/cluster/cluster.go` around lines 1172 - 1176, The
current code only sets memiavl.enable to "true" when c.opts.MemIAVL is set,
leaving the "false" case to upstream defaults; update the logic around the
setTOMLValue call so that memiavl.enable is explicitly written for both cases
(use setTOMLValue(appPath, "memiavl", "enable", "true") when c.opts.MemIAVL is
true, and setTOMLValue(appPath, "memiavl", "enable", "false") when it is false)
so both benchmark variants have an explicit memiavl.enable value; keep using the
existing appPath and setTOMLValue helper.
integration-tests/e2e/benchmark/benchmark_test.go (1)

31-49: ⚠️ Potential issue | 🟠 Major

Register cleanup inside setupCluster to prevent orphaned node processes on setup failure.

If Start succeeds but WaitForReady fails, require.NoError exits before callers can run defer cluster.Close().

💡 Proposed fix
 func setupCluster(t *testing.T, ctx context.Context, cfg BenchConfig) *e2e.Cluster {
 	t.Helper()

 	cluster, err := e2e.NewCluster(ctx, t, e2e.ClusterOptions{
 		...
 	})
 	require.NoError(t, err)
+	t.Cleanup(cluster.Close)

 	require.NoError(t, cluster.Start(ctx))
 	require.NoError(t, cluster.WaitForReady(ctx, clusterReadyTimeout))

 	return cluster
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration-tests/e2e/benchmark/benchmark_test.go` around lines 31 - 49,
After successfully starting the cluster in setupCluster, register a cleanup to
always close the cluster so orphaned processes aren't left if WaitForReady (or
any later require) fails: after require.NoError(t, cluster.Start(ctx)) call
t.Cleanup with a function that calls cluster.Close(ctx) (or cluster.Close and
ignore its error) so the cluster is cleaned up even when setupCluster exits
early; ensure this registration happens before calling cluster.WaitForReady.
🧹 Nitpick comments (1)
integration-tests/e2e/cluster/cluster.go (1)

366-367: Return a copy from ValidatorAddresses to protect internal state.

Returning the backing slice allows external mutation of cluster internals.

💡 Proposed fix
 func (c *Cluster) ValidatorAddresses() []string {
-	return c.valAddresses
+	return append([]string(nil), c.valAddresses...)
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration-tests/e2e/cluster/cluster.go` around lines 366 - 367, The
ValidatorAddresses method currently returns the backing slice c.valAddresses
allowing callers to mutate internal state; change Cluster.ValidatorAddresses to
return a defensive copy by allocating a new slice of the same length, copying
c.valAddresses into it (using copy) and returning that new slice so the original
c.valAddresses remains unmodified.
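The defensive-copy pattern suggested above can be sketched as a standalone program (the `cluster` struct here is a stripped-down stand-in for the e2e `Cluster` type, not the real one):

```go
package main

import "fmt"

// cluster mirrors only the relevant field of the e2e Cluster type.
type cluster struct {
	valAddresses []string
}

// ValidatorAddresses returns a defensive copy so callers cannot
// mutate the cluster's internal slice.
func (c *cluster) ValidatorAddresses() []string {
	out := make([]string, len(c.valAddresses))
	copy(out, c.valAddresses)
	return out
}

func main() {
	c := &cluster{valAddresses: []string{"val0", "val1"}}
	addrs := c.ValidatorAddresses()
	addrs[0] = "mutated" // only the copy changes
	fmt.Println(c.valAddresses[0])
	fmt.Println(addrs[0])
}
```

With the copy in place, mutating the returned slice leaves `c.valAddresses` untouched.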
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@integration-tests/e2e/benchmark/load.go`:
- Around line 612-621: The pre-sign loop that calls GenerateSignedBankTx keeps
running after context cancellation; modify the loop(s) (the one iterating i <
cfg.TxPerAccount and the similar loop at the other location) to check ctx.Done()
each iteration and break/return immediately when the context is canceled (e.g.,
use select { case <-ctx.Done(): break/return; default: } before calling
GenerateSignedBankTx) so no further work or logging happens after cancellation.

In `@integration-tests/e2e/benchmark/README.md`:
- Around line 107-113: The table is missing the "Included" column referenced by
the sentence about tx drops; update the table header and each row in the block
containing "Config | Variant | TPS ..." so it matches other tables (add an
"Included" column between "Peak Mempool" and the rest or where consistent), then
populate the column values (e.g., set the clist/iavl/seq-move-exec row to show
"2705/5000" to support the claim and add corresponding included counts for
proxy+priority and proxy+priority/memiavl rows) so the text "CList only included
2705/5000 txs..." is backed by the table. Ensure column alignment and any header
separators match the existing markdown table format used elsewhere in the
README.

In `@integration-tests/e2e/cluster/cluster.go`:
- Line 1119: The call to os.WriteFile that writes genesis data using
os.WriteFile(genesisPath, out, 0o644) is flagged by lint (G306) for overly
permissive file mode; change the mode argument to 0o600 (i.e.,
os.WriteFile(genesisPath, out, 0o600)) to restrict permissions, or if
intentional, add a clear `#nosec` comment next to the os.WriteFile call to justify
the exception; update the call referencing genesisPath and out accordingly.
- Around line 1102-1113: The patch that sets max_gas silently no-ops if neither
genesis["consensus"] nor genesis["consensus_params"] exist; update the logic in
cluster.go to detect whether max_gas was actually applied and fail loudly if
not: introduce a boolean (e.g., modified) and set it to true when you assign
block["max_gas"] in either the consensus or consensus_params branches, then
after those checks return an error or call the test failure helper (so the test
run fails) if modified is false, referencing the existing genesis map and the
"consensus"/"consensus_params" keys and the block["max_gas"] assignment spots to
locate where to add the flag and the failure path.

---

Duplicate comments:
In `@integration-tests/e2e/benchmark/benchmark_test.go`:
- Around line 31-49: After successfully starting the cluster in setupCluster,
register a cleanup to always close the cluster so orphaned processes aren't left
if WaitForReady (or any later require) fails: after require.NoError(t,
cluster.Start(ctx)) call t.Cleanup with a function that calls cluster.Close(ctx)
(or cluster.Close and ignore its error) so the cluster is cleaned up even when
setupCluster exits early; ensure this registration happens before calling
cluster.WaitForReady.

In `@integration-tests/e2e/cluster/cluster.go`:
- Around line 1172-1176: The current code only sets memiavl.enable to "true"
when c.opts.MemIAVL is set, leaving the "false" case to upstream defaults;
update the logic around the setTOMLValue call so that memiavl.enable is
explicitly written for both cases (use setTOMLValue(appPath, "memiavl",
"enable", "true") when c.opts.MemIAVL is true, and setTOMLValue(appPath,
"memiavl", "enable", "false") when it is false) so both benchmark variants have
an explicit memiavl.enable value; keep using the existing appPath and
setTOMLValue helper.

---

Nitpick comments:
In `@integration-tests/e2e/cluster/cluster.go`:
- Around line 366-367: The ValidatorAddresses method currently returns the
backing slice c.valAddresses allowing callers to mutate internal state; change
Cluster.ValidatorAddresses to return a defensive copy by allocating a new slice
of the same length, copying c.valAddresses into it (using copy) and returning
that new slice so the original c.valAddresses remains unmodified.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: ecc99e73-b5e1-4eb4-93cb-98f0c146a704

📥 Commits

Reviewing files that changed from the base of the PR and between 7f1f614 and c819956.

⛔ Files ignored due to path filters (1)
  • integration-tests/e2e/benchmark/move-bench/Move.toml is excluded by !**/*.toml
📒 Files selected for processing (8)
  • Makefile
  • integration-tests/e2e/benchmark/README.md
  • integration-tests/e2e/benchmark/benchmark_test.go
  • integration-tests/e2e/benchmark/config.go
  • integration-tests/e2e/benchmark/load.go
  • integration-tests/e2e/benchmark/move-bench/sources/BenchHeavyState.move
  • integration-tests/e2e/cluster.go
  • integration-tests/e2e/cluster/cluster.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • Makefile

@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (2)
integration-tests/e2e/cluster/cluster.go (2)

1172-1176: ⚠️ Potential issue | 🟠 Major

Set memiavl.enable deterministically for both variants.

At Line 1172, config is only written when enabled. For benchmark parity, write explicit "false" when MemIAVL is off.

💡 Proposed fix
-		if c.opts.MemIAVL {
-			if err := setTOMLValue(appPath, "memiavl", "enable", "true"); err != nil {
-				return err
-			}
-		}
+		memIAVLEnable := "false"
+		if c.opts.MemIAVL {
+			memIAVLEnable = "true"
+		}
+		if err := setTOMLValue(appPath, "memiavl", "enable", memIAVLEnable); err != nil {
+			return err
+		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration-tests/e2e/cluster/cluster.go` around lines 1172 - 1176, The code
only calls setTOMLValue(appPath, "memiavl", "enable", "true") when
c.opts.MemIAVL is true, so the TOML key is absent for the false variant; always
write the key explicitly by calling setTOMLValue(appPath, "memiavl", "enable",
"true") when c.opts.MemIAVL is true and setTOMLValue(appPath, "memiavl",
"enable", "false") when c.opts.MemIAVL is false (use the existing c.opts.MemIAVL
check to choose which literal to write), ensuring deterministic memiavl.enable
across both variants.

1102-1113: ⚠️ Potential issue | 🟠 Major

Fail when max_gas is not actually patched.

At Line 1102 and Line 1109, both candidate paths are optional; if neither exists, this silently succeeds and benchmarks may run with unexpected block gas settings.

💡 Proposed fix
 func patchGenesisBlockGas(genesisPath string, maxGas int64) error {
@@
 	gasStr := strconv.FormatInt(maxGas, 10)
+	patched := false
 
 	if cp, ok := genesis["consensus"].(map[string]interface{}); ok {
 		if params, ok := cp["params"].(map[string]interface{}); ok {
 			if block, ok := params["block"].(map[string]interface{}); ok {
 				block["max_gas"] = gasStr
+				patched = true
 			}
 		}
 	}
 	if cp, ok := genesis["consensus_params"].(map[string]interface{}); ok {
 		if block, ok := cp["block"].(map[string]interface{}); ok {
 			block["max_gas"] = gasStr
+			patched = true
 		}
 	}
+	if !patched {
+		return fmt.Errorf("max_gas path not found in genesis: %s", genesisPath)
+	}
@@
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration-tests/e2e/cluster/cluster.go` around lines 1102 - 1113, The
current patching logic for genesis["consensus"] and genesis["consensus_params"]
may silently do nothing if neither optional path exists; add a check that
ensures max_gas was actually set by introducing a boolean (e.g., patched :=
false) and set it to true inside the branches where block["max_gas"] = gasStr is
assigned (referencing cp, params, block, gasStr); after both attempts, if
patched is still false return or panic with a clear error indicating neither
consensus nor consensus_params contained a block entry to modify so the test
fails fast rather than proceeding with unpatched gas.
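The patched-flag fix proposed above can be sketched on the in-memory map alone (this sketch omits the file read/write that the real `patchGenesisBlockGas` performs, and the function name is kept only for illustration):

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// patchBlockGas sets consensus block max_gas in a decoded genesis map and
// fails loudly when neither known path exists.
func patchBlockGas(genesis map[string]interface{}, maxGas int64) error {
	gasStr := strconv.FormatInt(maxGas, 10)
	patched := false

	if cp, ok := genesis["consensus"].(map[string]interface{}); ok {
		if params, ok := cp["params"].(map[string]interface{}); ok {
			if block, ok := params["block"].(map[string]interface{}); ok {
				block["max_gas"] = gasStr
				patched = true
			}
		}
	}
	if cp, ok := genesis["consensus_params"].(map[string]interface{}); ok {
		if block, ok := cp["block"].(map[string]interface{}); ok {
			block["max_gas"] = gasStr
			patched = true
		}
	}
	if !patched {
		return fmt.Errorf("max_gas path not found in genesis")
	}
	return nil
}

func main() {
	var genesis map[string]interface{}
	_ = json.Unmarshal([]byte(`{"consensus_params":{"block":{"max_gas":"-1"}}}`), &genesis)
	fmt.Println(patchBlockGas(genesis, 100000000)) // patches the CometBFT-style path
	fmt.Println(patchBlockGas(map[string]interface{}{}, 1)) // no path: errors out
}
```

Returning an error instead of silently succeeding lets the caller's `require.NoError` turn a missing genesis path into an immediate test failure.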
🧹 Nitpick comments (1)
integration-tests/e2e/cluster/cluster.go (1)

366-368: Return a defensive copy from ValidatorAddresses().

At Line 367, returning the internal slice allows callers to mutate cluster state accidentally.

♻️ Proposed refactor
 func (c *Cluster) ValidatorAddresses() []string {
-	return c.valAddresses
+	out := make([]string, len(c.valAddresses))
+	copy(out, c.valAddresses)
+	return out
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration-tests/e2e/cluster/cluster.go` around lines 366 - 368,
ValidatorAddresses currently returns the internal slice c.valAddresses which
allows external mutation; change ValidatorAddresses() to return a defensive copy
of the slice (allocate a new slice of len(c.valAddresses), copy the contents,
and return the new slice) so callers cannot modify the Cluster's internal state.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@integration-tests/e2e/cluster/cluster.go`:
- Around line 1172-1176: The code only calls setTOMLValue(appPath, "memiavl",
"enable", "true") when c.opts.MemIAVL is true, so the TOML key is absent for the
false variant; always write the key explicitly by calling setTOMLValue(appPath,
"memiavl", "enable", "true") when c.opts.MemIAVL is true and
setTOMLValue(appPath, "memiavl", "enable", "false") when c.opts.MemIAVL is false
(use the existing c.opts.MemIAVL check to choose which literal to write),
ensuring deterministic memiavl.enable across both variants.
- Around line 1102-1113: The current patching logic for genesis["consensus"] and
genesis["consensus_params"] may silently do nothing if neither optional path
exists; add a check that ensures max_gas was actually set by introducing a
boolean (e.g., patched := false) and set it to true inside the branches where
block["max_gas"] = gasStr is assigned (referencing cp, params, block, gasStr);
after both attempts, if patched is still false return or panic with a clear
error indicating neither consensus nor consensus_params contained a block entry
to modify so the test fails fast rather than proceeding with unpatched gas.

---

Nitpick comments:
In `@integration-tests/e2e/cluster/cluster.go`:
- Around line 366-368: ValidatorAddresses currently returns the internal slice
c.valAddresses which allows external mutation; change ValidatorAddresses() to
return a defensive copy of the slice (allocate a new slice of
len(c.valAddresses), copy the contents, and return the new slice) so callers
cannot modify the Cluster's internal state.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: b64c0786-b9c1-4b8b-ae71-0b0344d29487

📥 Commits

Reviewing files that changed from the base of the PR and between c819956 and e2e9045.

📒 Files selected for processing (1)
  • integration-tests/e2e/cluster/cluster.go

@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (4)
integration-tests/e2e/benchmark/benchmark_test.go (3)

287-292: Redundant defer cluster.Close() calls throughout Move exec tests.

Similar to runBenchmark, these tests have redundant defer cluster.Close() calls since setupCluster already registers t.Cleanup(cluster.Close). This pattern appears in multiple tests:

  • Line 288
  • Line 317, 332 (TestBenchmarkSeqComparisonMoveExec subtests)
  • Line 360, 375 (TestBenchmarkBurstComparisonMoveExec subtests)
  • And several others

The redundancy is harmless due to Close() being idempotent, but removing these would clean up the code.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration-tests/e2e/benchmark/benchmark_test.go` around lines 287 - 292,
Remove redundant explicit defer calls to cluster.Close() in Move exec benchmark
tests because setupCluster already registers t.Cleanup(cluster.Close). For each
test using setupCluster (e.g., where variables and functions like setupCluster,
setupMoveExecLoad, moveExecSequential, runBenchmarkWithCluster are used), delete
the line defer cluster.Close() and rely on setupCluster's t.Cleanup registration
to close the cluster; ensure no other callers require an explicit defer and keep
the single cleanup mechanism provided by setupCluster.

209-241: Results accumulation across subtests may be unreliable.

The results slice is declared in the parent test and appended to from subtests. In Go's t.Run, subtests run sequentially by default (unless t.Parallel() is called), so this works correctly here. However, if subtests were ever run in parallel, this would be a data race.

The current code is correct, but consider adding a comment noting this dependency:

var results []BenchResult // accumulated sequentially across subtests

// load baseline from JSON...
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration-tests/e2e/benchmark/benchmark_test.go` around lines 209 - 241,
The results slice in TestBenchmarkSeqComparison is appended to from subtests and
relies on sequential execution; add a brief comment above the declaration of
results (the var results []BenchResult in TestBenchmarkSeqComparison) stating
that results are accumulated sequentially across subtests and that running
subtests in parallel would cause a data race (so do not call t.Parallel() in
these subtests).

107-115: Redundant defer cluster.Close() call.

The setupCluster function already registers t.Cleanup(cluster.Close), and since Cluster.Close() is idempotent, this defer is safe but unnecessary.

 func runBenchmark(t *testing.T, cfg BenchConfig, loadFn func(ctx context.Context, cluster *e2e.Cluster, cfg BenchConfig, metas map[string]e2e.AccountMeta) LoadResult) BenchResult {
 	t.Helper()
 	ctx := context.Background()

 	cluster := setupCluster(t, ctx, cfg)
-	defer cluster.Close()

 	return runBenchmarkWithCluster(t, ctx, cluster, cfg, loadFn)
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration-tests/e2e/benchmark/benchmark_test.go` around lines 107 - 115,
The function runBenchmark contains an unnecessary defer cluster.Close() because
setupCluster already registers t.Cleanup(cluster.Close); remove the redundant
defer from runBenchmark to avoid duplicate cleanup calls—locate the cluster
variable in runBenchmark, delete the line "defer cluster.Close()", and leave the
call to runBenchmarkWithCluster(t, ctx, cluster, cfg, loadFn) intact; ensure
Cluster.Close remains idempotent and no other cleanup logic is removed.
integration-tests/e2e/cluster/cluster.go (1)

1754-1760: Consider simplifying the base64 round-trip.

The code decodes txBase64 to txBytes then immediately re-encodes it back to base64 for the request body. Since txBase64 is already base64-encoded, you could use it directly:

-	txBytes, err := base64.StdEncoding.DecodeString(txBase64)
-	if err != nil {
-		return TxResult{}, fmt.Errorf("decode tx base64: %w", err)
-	}
-
-	reqBody := fmt.Sprintf(`{"jsonrpc":"2.0","id":1,"method":"broadcast_tx_sync","params":{"tx":"%s"}}`, base64.StdEncoding.EncodeToString(txBytes))
+	reqBody := fmt.Sprintf(`{"jsonrpc":"2.0","id":1,"method":"broadcast_tx_sync","params":{"tx":"%s"}}`, txBase64)

The current approach does provide implicit validation that the input is valid base64, so keeping it is defensible if that validation is desired.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration-tests/e2e/cluster/cluster.go` around lines 1754 - 1760, The code
decodes txBase64 to txBytes then immediately re-encodes it into the request body
(variables txBase64, txBytes, reqBody); simplify by using the original txBase64
directly in the JSON-RPC payload (i.e., insert txBase64 into fmt.Sprintf) or, if
you want to retain validation, decode into txBytes but use the original txBase64
for reqBody instead of re-encoding; update error handling accordingly (remove
unnecessary decode/encode or keep decode only for validation) and ensure reqBody
uses the validated base64 string.
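The validate-then-reuse variant suggested there can be sketched as follows (`buildBroadcastBody` is a hypothetical helper name; the JSON-RPC payload shape is taken from the diff above):

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// buildBroadcastBody keeps the decode purely as validation and reuses the
// original base64 string in the payload, avoiding the re-encode round-trip.
func buildBroadcastBody(txBase64 string) (string, error) {
	if _, err := base64.StdEncoding.DecodeString(txBase64); err != nil {
		return "", fmt.Errorf("decode tx base64: %w", err)
	}
	return fmt.Sprintf(`{"jsonrpc":"2.0","id":1,"method":"broadcast_tx_sync","params":{"tx":"%s"}}`, txBase64), nil
}

func main() {
	body, err := buildBroadcastBody(base64.StdEncoding.EncodeToString([]byte("tx")))
	fmt.Println(err, body)

	_, err = buildBroadcastBody("not-base64!!") // invalid input is still rejected
	fmt.Println(err != nil)
}
```

This preserves the implicit input validation the review notes is defensible, while sending the caller's exact bytes.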
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@integration-tests/e2e/benchmark/benchmark_test.go`:
- Around line 287-292: Remove redundant explicit defer calls to cluster.Close()
in Move exec benchmark tests because setupCluster already registers
t.Cleanup(cluster.Close). For each test using setupCluster (e.g., where
variables and functions like setupCluster, setupMoveExecLoad,
moveExecSequential, runBenchmarkWithCluster are used), delete the line defer
cluster.Close() and rely on setupCluster's t.Cleanup registration to close the
cluster; ensure no other callers require an explicit defer and keep the single
cleanup mechanism provided by setupCluster.
- Around line 209-241: The results slice in TestBenchmarkSeqComparison is
appended to from subtests and relies on sequential execution; add a brief
comment above the declaration of results (the var results []BenchResult in
TestBenchmarkSeqComparison) stating that results are accumulated sequentially
across subtests and that running subtests in parallel would cause a data race
(so do not call t.Parallel() in these subtests).
- Around line 107-115: The function runBenchmark contains an unnecessary defer
cluster.Close() because setupCluster already registers t.Cleanup(cluster.Close);
remove the redundant defer from runBenchmark to avoid duplicate cleanup
calls—locate the cluster variable in runBenchmark, delete the line "defer
cluster.Close()", and leave the call to runBenchmarkWithCluster(t, ctx, cluster,
cfg, loadFn) intact; ensure Cluster.Close remains idempotent and no other
cleanup logic is removed.

In `@integration-tests/e2e/cluster/cluster.go`:
- Around line 1754-1760: The code decodes txBase64 to txBytes then immediately
re-encodes it into the request body (variables txBase64, txBytes, reqBody);
simplify by using the original txBase64 directly in the JSON-RPC payload (i.e.,
insert txBase64 into fmt.Sprintf) or, if you want to retain validation, decode
into txBytes but use the original txBase64 for reqBody instead of re-encoding;
update error handling accordingly (remove unnecessary decode/encode or keep
decode only for validation) and ensure reqBody uses the validated base64 string.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 1f40aba7-def7-43ae-ada7-009175aa768c

📥 Commits

Reviewing files that changed from the base of the PR and between e2e9045 and 48b8195.

📒 Files selected for processing (4)
  • integration-tests/e2e/benchmark/README.md
  • integration-tests/e2e/benchmark/benchmark_test.go
  • integration-tests/e2e/benchmark/report.go
  • integration-tests/e2e/cluster/cluster.go

@traviolus traviolus requested a review from beer-1 March 16, 2026 06:45