
feat(e2e): add benchmark suite for mempool, and state db#149

Open
traviolus wants to merge 7 commits into main from feat/e2e-benchmark

Conversation


@traviolus traviolus commented Mar 19, 2026

Description

Closes: #XXXX


Author Checklist

All items are required. Please add a note to the item if the item is not applicable and
please add links to any relevant follow up issues.

I have...

  • included the correct type prefix in the PR title; you can find examples of the prefixes below
  • confirmed ! in the type prefix if API or client breaking change
  • targeted the correct branch
  • provided a link to the relevant issue or specification
  • reviewed "Files changed" and left comments if necessary
  • included the necessary unit and integration tests
  • updated the relevant documentation or specification, including comments for documenting Go code
  • confirmed all CI checks have passed

Reviewers Checklist

All items are required. Please add a note if the item is not applicable and please add
your handle next to the items reviewed if you only reviewed selected items.

I have...

  • confirmed the correct type prefix in the PR title
  • confirmed all author checklist items have been addressed
  • reviewed state machine logic, API design and naming, documentation is accurate, tests and test coverage

Summary by CodeRabbit

  • Tests

    • Added a comprehensive end-to-end benchmarking suite: cluster harness, many benchmark scenarios, varied load patterns (sequential, burst, pre-signed, queued/gap), metrics collection, peak mempool polling, and result persistence/comparison.
  • New Features

    • Added a lightweight benchmark smart contract for controlled Wasm state workloads, plus load generators and pre-signing workflows (bank/wasm flows, wasm exec variants).
  • Documentation

    • Added a detailed benchmarking README with topology, scenarios, execution phases, metrics, and example result tables.
  • Chores

    • Introduced integration test module configuration and supporting utilities for ports/TOML handling.

@traviolus traviolus requested a review from beer-1 March 19, 2026 09:16
@traviolus traviolus requested a review from a team as a code owner March 19, 2026 09:16
@coderabbitai

coderabbitai bot commented Mar 19, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Adds a reproducible end-to-end benchmarking suite: a CosmWasm benchmark contract, a multi-node cluster harness, multiple load generators (including pre-signed flows), result collection/reporting utilities, and many benchmark tests comparing mempool, state DB, queued behavior, gossip, and Wasm execution scenarios.

Changes

  • Documentation — integration-tests/e2e/benchmark/README.md: New README describing topology, benchmark groups, three-phase workflow, metrics, workloads, expected outcomes, and example result tables.
  • CosmWasm contract — integration-tests/e2e/benchmark/bench_heavy_state/Cargo.toml, integration-tests/e2e/benchmark/bench_heavy_state/src/{lib.rs,msg.rs,state.rs,error.rs}: New bench_heavy_state contract providing WriteMixed exec, per-sender nonces, shared/local state maps, a nonce query, and an optimized release profile.
  • Benchmark tests (orchestration) — integration-tests/e2e/benchmark/benchmark_test.go: New build-tagged Go benchmark suite that boots clusters, prepares/instantiates wasm, runs many benchmark scenarios (baseline, comparisons, pre-signed, queued, gossip, Wasm variants), and persists JSON results.
  • Load generators & signing — integration-tests/e2e/benchmark/load.go: Multiple load strategies (burst, sequential, out-of-order, single-node, queued-flood/gap), pre-signing helpers for bank/wasm txs, pre-signed submission flows, wasm load factories, and warmup logic.
  • Benchmark config & collection — integration-tests/e2e/benchmark/config.go, integration-tests/e2e/benchmark/collector.go: BenchConfig/Variant types and constructors; BlockStat/BenchResult types; MempoolPoller; CollectResults, WaitForAllIncluded, CollectInitialMetas, and latency/percentile helpers.
  • Reporting utilities — integration-tests/e2e/benchmark/report.go: Result persistence (JSON), directory loading, comparison and improvement tables, baseline filtering and printing helpers.
  • Cluster harness & utilities — integration-tests/e2e/cluster/cluster.go, integration-tests/e2e/cluster/ports.go, integration-tests/e2e/cluster/toml.go: New cluster harness: bootstrap/start nodes, port allocation, TOML patching, genesis/config updates, REST/CLI wrappers for bank/wasm actions, pre-sign/generate helpers, block/account/tx query helpers, and broadcast for pre-signed txs.
  • Module — integration-tests/go.mod: New Go module for integration-tests with dependencies and multiple replace directives for pinned/forked components.

Sequence Diagram(s)

sequenceDiagram
    participant Test as Test Runner
    participant Cluster as Cluster Harness
    participant Nodes as Node(s)
    participant Load as Load Generator
    participant Collector as Collector
    participant Reporter as Reporter

    Test->>Cluster: NewCluster() / Start()
    Cluster->>Nodes: initialize & start nodes
    Test->>Load: start load (Burst/Sequential/PreSigned)
    Load->>Nodes: submit txs (BankSend / WasmExec / Broadcast)
    Nodes->>Nodes: mempool accept & gossip
    Load-->>Collector: record submissions & timestamps
    Test->>Cluster: WaitForAllIncluded()
    Collector->>Nodes: query blocks per height
    Collector-->>Reporter: BenchResult (metrics, peak)
    Test->>Reporter: WriteResult / PrintComparisonTable
sequenceDiagram
    participant Test as Test Runner
    participant Cluster as Cluster Harness
    participant Chain as Blockchain
    participant Contract as bench_heavy_state

    Test->>Cluster: StoreWasmContract(wasm)
    Cluster->>Chain: submit store tx
    Chain-->>Cluster: return CodeID
    Test->>Cluster: InstantiateWasmContract(CodeID)
    Chain-->>Cluster: return ContractAddr
    Test->>Cluster: ExecuteWasmContract(contractAddr, WriteMixed)
    Contract->>Contract: load nonce → inc → write SHARED_STATE & LOCAL_STATE
    Contract-->>Chain: execution response (events)
    Chain->>Collector: include tx in block (queried later)

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 I hopped to run a mempool race,

I scribbled nonces in shared space,
Nodes hummed, they gossiped, counters climbed,
I counted TPS in careful time,
A tiny benchmark — a rabbit primed!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: Docstring coverage is 53.33%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (2 passed)

  • Description Check — ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title check — ✅ Passed: the title 'feat(e2e): add benchmark suite for mempool, and state db' directly describes the main change, adding a comprehensive benchmark suite for testing mempool and state database performance across multiple configurations.



Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (5)
integration-tests/e2e/benchmark/benchmark_test.go (2)

96-104: Potential double-close of cluster.

setupCluster registers t.Cleanup(cl.Close) at Line 48, but runBenchmark also calls defer cl.Close() at Line 101. Since Cluster.Close() has internal guarding (c.closed flag), this won't cause a crash, but the explicit defer is redundant.

♻️ Remove redundant defer
 func runBenchmark(t *testing.T, cfg BenchConfig, loadFn func(ctx context.Context, cl *cluster.Cluster, cfg BenchConfig, metas map[string]cluster.AccountMeta) LoadResult) BenchResult {
 	t.Helper()
 	ctx := context.Background()

 	cl := setupCluster(t, ctx, cfg)
-	defer cl.Close()

 	return runBenchmarkWithCluster(t, ctx, cl, cfg, loadFn)
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration-tests/e2e/benchmark/benchmark_test.go` around lines 96 - 104, The
runBenchmark function redundantly defers cl.Close() even though setupCluster
already registers t.Cleanup(cl.Close); remove the explicit defer cl.Close() from
runBenchmark so cleanup is only handled via setupCluster's t.Cleanup
registration (referencing runBenchmark and setupCluster and Cluster.Close to
locate the code).

184-186: Consider replacing time.Sleep with deterministic waits.

The 3-second sleeps after WaitForMempoolEmpty are intended to ensure the tx is fully committed. This can be flaky in slow CI environments. Consider polling QueryTxResult until success, or waiting for a specific block height advancement.

Also applies to: 199-201

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration-tests/e2e/benchmark/benchmark_test.go` around lines 184 - 186,
Replace the ad-hoc time.Sleep(3 * time.Second) after cl.WaitForMempoolEmpty with
a deterministic poll that waits until the transaction is committed: call
cl.QueryTxResult (or the client method that queries tx by hash) in a loop with a
short backoff and timeout (e.g., check every 200–500ms up to a few seconds) and
break when the query returns success/confirmed; do this for the occurrence after
WaitForMempoolEmpty and the same pattern at the other location (around lines
199–201) so the test waits for concrete tx confirmation or block height
advancement instead of a fixed sleep.
integration-tests/e2e/benchmark/README.md (1)

264-269: Add language identifiers to fenced code blocks.

Static analysis flagged these code blocks as missing language specifiers. For the table example, use text or plaintext; for the directory structure, use text as well.

🔧 Suggested fix

Line 264:

-```
+```text
 Config                    | Variant      |     TPS | vs base |   P50ms | vs base |   P95ms | vs base | Peak Mempool

Line 301:

-```
+```text
 benchmark/
   config.go          Variant definitions, BenchConfig, preset constructors

Also applies to: 301-310

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration-tests/e2e/benchmark/README.md` around lines 264 - 269, Update the
fenced code blocks in the README so they include a language identifier (use
"text" or "plaintext"); specifically change the triple-backtick markers before
the performance table line starting with "Config                    | Variant"
to ```text and likewise change the triple-backtick before the directory/listing
block that begins with "benchmark/" to ```text so static analysis no longer
flags missing language specifiers.
integration-tests/e2e/cluster/cluster.go (1)

318-321: Consider adding timeout to HTTP client for reliability.

http.DefaultClient has no timeout by default. While acceptable for local E2E tests, a hung RPC endpoint could block indefinitely. Consider using a client with a timeout.

♻️ Optional: Add timeout
var httpClient = &http.Client{
    Timeout: 30 * time.Second,
}

Then replace http.DefaultClient.Do(req) with httpClient.Do(req) throughout.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration-tests/e2e/cluster/cluster.go` around lines 318 - 321, The code
uses http.DefaultClient.Do(req) which has no timeout and can hang; create a
package-level or file-level httpClient variable (e.g., httpClient :=
&http.Client{Timeout: 30 * time.Second}) and replace calls to
http.DefaultClient.Do(req) with httpClient.Do(req) (update any functions that
call DefaultClient, e.g., the function containing the resp, err :=
http.DefaultClient.Do(req) line) so HTTP requests fail fast instead of blocking
indefinitely.
integration-tests/e2e/benchmark/report.go (1)

138-153: Consider logging or returning errors from LoadBaselineResultsByLabel.

Silently returning nil when LoadResults fails (Line 141-143) makes it indistinguishable from "no matching baselines found." Callers may want to know if the directory couldn't be read vs. simply being empty.

♻️ Optional: Return error or log
-func LoadBaselineResultsByLabel(dir, label string) []BenchResult {
+func LoadBaselineResultsByLabel(dir, label string) ([]BenchResult, error) {
 	results, err := LoadResults(dir)
 	if err != nil {
-		return nil
+		return nil, err
 	}

 	var matched []BenchResult
 	for _, r := range results {
 		if r.Config.Variant == VariantBaseline && r.Config.Label == label {
 			matched = append(matched, r)
 		}
 	}

-	return matched
+	return matched, nil
 }

Alternatively, keep the current signature but log the error:

 func LoadBaselineResultsByLabel(dir, label string) []BenchResult {
 	results, err := LoadResults(dir)
 	if err != nil {
+		// Log to help diagnose missing baselines vs. read failures
+		fmt.Fprintf(os.Stderr, "warning: failed to load results from %s: %v\n", dir, err)
 		return nil
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration-tests/e2e/benchmark/report.go` around lines 138 - 153,
LoadBaselineResultsByLabel currently swallows errors from LoadResults by
returning nil, making failures indistinguishable from “no matches”; change its
signature to return ([]BenchResult, error) so you can propagate the error from
LoadResults (i.e., return nil, err when LoadResults fails) and update all
callers to handle the error, or if you must keep the signature instead log the
LoadResults error before returning an empty slice; locate the function
LoadBaselineResultsByLabel and the call to LoadResults to implement the chosen
fix and adjust any caller code accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@integration-tests/e2e/benchmark/config.go`:
- Line 28: The JSON tag for the struct field TimeoutCommit currently declares
milliseconds but time.Duration (TimeoutCommit) is marshaled as nanoseconds,
causing unit mismatch; fix by making the JSON tag reflect nanoseconds (e.g.,
change `json:"timeout_commit_ms,omitempty"` to
`json:"timeout_commit_ns,omitempty"`) or alternatively implement custom
MarshalJSON/UnmarshalJSON on the type that converts between milliseconds and
time.Duration so the external JSON unit matches the tag; locate the
TimeoutCommit field and update the tag or add custom (Un)MarshalJSON methods for
the type to perform the proper unit conversion.

In `@integration-tests/e2e/cluster/ports.go`:
- Around line 23-30: The computed start port (start := basePort + (index *
stride)) can yield values where start+3 > 65535, producing invalid ports; add an
upper-bound check after computing start in the function that builds and returns
NodePorts (using variables basePort, index, stride, start and the NodePorts
return) and if start < 0 or start+3 > 65535 return a clear error instead of
constructing NodePorts; ensure the function returns that error so callers fail
early rather than attempting to bind invalid ports.
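A bounds check of the kind described might look like this Go sketch. The NodePorts fields and the portsForNode name are illustrative; the PR's actual struct and function may differ.

```go
package main

import "fmt"

// NodePorts holds the four consecutive ports a node needs
// (field names are illustrative).
type NodePorts struct {
	P2P, RPC, API, GRPC int
}

// portsForNode computes a node's port block and rejects any assignment
// that would fall outside the valid TCP port range, so callers fail
// early instead of attempting to bind invalid ports.
func portsForNode(basePort, index, stride int) (NodePorts, error) {
	start := basePort + index*stride
	if start < 0 || start+3 > 65535 {
		return NodePorts{}, fmt.Errorf("port range %d-%d for node %d is out of bounds", start, start+3, index)
	}
	return NodePorts{P2P: start, RPC: start + 1, API: start + 2, GRPC: start + 3}, nil
}

func main() {
	// A normal allocation succeeds.
	p, err := portsForNode(26656, 4, 10)
	fmt.Println(p, err)

	// A large index overflows the 16-bit port space and fails early.
	_, err = portsForNode(26656, 5000, 10)
	fmt.Println(err != nil) // true
}
```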

In `@integration-tests/go.mod`:
- Around line 155-157: Update the pinned grpc module version from
google.golang.org/grpc v1.65.0 to the patched release v1.79.3 in go.mod to
address GHSA-p77j-4mvh-x3m3; after updating the version string for the module
dependency, run dependency cleanup (e.g., go mod tidy) and run integration tests
to verify compatibility with the cosmos/relayer code paths that previously
required v1.65.0.

---

Nitpick comments:
In `@integration-tests/e2e/benchmark/benchmark_test.go`:
- Around line 96-104: The runBenchmark function redundantly defers cl.Close()
even though setupCluster already registers t.Cleanup(cl.Close); remove the
explicit defer cl.Close() from runBenchmark so cleanup is only handled via
setupCluster's t.Cleanup registration (referencing runBenchmark and setupCluster
and Cluster.Close to locate the code).
- Around line 184-186: Replace the ad-hoc time.Sleep(3 * time.Second) after
cl.WaitForMempoolEmpty with a deterministic poll that waits until the
transaction is committed: call cl.QueryTxResult (or the client method that
queries tx by hash) in a loop with a short backoff and timeout (e.g., check
every 200–500ms up to a few seconds) and break when the query returns
success/confirmed; do this for the occurrence after WaitForMempoolEmpty and the
same pattern at the other location (around lines 199–201) so the test waits for
concrete tx confirmation or block height advancement instead of a fixed sleep.

In `@integration-tests/e2e/benchmark/README.md`:
- Around line 264-269: Update the fenced code blocks in the README so they
include a language identifier (use "text" or "plaintext"); specifically change
the triple-backtick markers before the performance table line starting with
"Config                    | Variant" to ```text and likewise change the
triple-backtick before the directory/listing block that begins with "benchmark/"
to ```text so static analysis no longer flags missing language specifiers.

In `@integration-tests/e2e/benchmark/report.go`:
- Around line 138-153: LoadBaselineResultsByLabel currently swallows errors from
LoadResults by returning nil, making failures indistinguishable from “no
matches”; change its signature to return ([]BenchResult, error) so you can
propagate the error from LoadResults (i.e., return nil, err when LoadResults
fails) and update all callers to handle the error, or if you must keep the
signature instead log the LoadResults error before returning an empty slice;
locate the function LoadBaselineResultsByLabel and the call to LoadResults to
implement the chosen fix and adjust any caller code accordingly.

In `@integration-tests/e2e/cluster/cluster.go`:
- Around line 318-321: The code uses http.DefaultClient.Do(req) which has no
timeout and can hang; create a package-level or file-level httpClient variable
(e.g., httpClient := &http.Client{Timeout: 30 * time.Second}) and replace calls
to http.DefaultClient.Do(req) with httpClient.Do(req) (update any functions that
call DefaultClient, e.g., the function containing the resp, err :=
http.DefaultClient.Do(req) line) so HTTP requests fail fast instead of blocking
indefinitely.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 48b9cf25-2b87-43c2-92e8-d29a5e83d40e

📥 Commits

Reviewing files that changed from the base of the PR and between 9f89693 and e5eae99.

⛔ Files ignored due to path filters (2)
  • integration-tests/e2e/benchmark/bench_heavy_state/Cargo.lock is excluded by !**/*.lock
  • integration-tests/go.sum is excluded by !**/*.sum
📒 Files selected for processing (15)
  • integration-tests/e2e/benchmark/README.md
  • integration-tests/e2e/benchmark/bench_heavy_state/Cargo.toml
  • integration-tests/e2e/benchmark/bench_heavy_state/src/error.rs
  • integration-tests/e2e/benchmark/bench_heavy_state/src/lib.rs
  • integration-tests/e2e/benchmark/bench_heavy_state/src/msg.rs
  • integration-tests/e2e/benchmark/bench_heavy_state/src/state.rs
  • integration-tests/e2e/benchmark/benchmark_test.go
  • integration-tests/e2e/benchmark/collector.go
  • integration-tests/e2e/benchmark/config.go
  • integration-tests/e2e/benchmark/load.go
  • integration-tests/e2e/benchmark/report.go
  • integration-tests/e2e/cluster/cluster.go
  • integration-tests/e2e/cluster/ports.go
  • integration-tests/e2e/cluster/toml.go
  • integration-tests/go.mod

@codecov

codecov bot commented Mar 19, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 42.03%. Comparing base (9f89693) to head (a667d53).

Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff           @@
##             main     #149   +/-   ##
=======================================
  Coverage   42.03%   42.03%           
=======================================
  Files          49       49           
  Lines        3245     3245           
=======================================
  Hits         1364     1364           
  Misses       1743     1743           
  Partials      138      138           


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@integration-tests/e2e/benchmark/benchmark_test.go`:
- Around line 141-146: When WaitForAllIncluded returns an error you currently
call cl.LatestHeight and ignore its error, which can leave endHeight==0 and
cause CollectResults to run with a bogus range; change the fallback to check the
error returned by cl.LatestHeight(ctx, 0) and fail the test (t.Fatalf or return
the error) if LatestHeight also fails, otherwise assign endHeight from the
successful result; reference WaitForAllIncluded, cl.LatestHeight and
CollectResults to locate the logic and ensure the fallback does not silently
swallow errors.
- Line 70: Tests create a MempoolPoller via NewMempoolPoller (e.g., in
runBenchmarkWithCluster, runPreSignedBenchmark, TestBenchmarkGossipPropagation,
TestBenchmarkGapEviction) but never defer poller.Stop(), so the poller goroutine
can leak if require.NoError(WaitForAllIncluded(...)) fails; fix this by calling
defer poller.Stop() immediately after each NewMempoolPoller(...) invocation so
the background sampling goroutine is always cleaned up regardless of test
failures.
- Around line 518-519: SingleNodeLoad is currently targeting the validator (node
0), bypassing gossip; update the call to target an edge fullnode by passing
cfg.ValidatorCount as the target index (i.e., replace the final argument 0 with
cfg.ValidatorCount), or alternatively modify SingleNodeLoad to internally call
edgeNodeIndex(target, cfg) (the same logic used by BurstLoad and SequentialLoad)
so when cfg.ValidatorCount > 0 and < cfg.NodeCount it selects the first edge
fullnode instead of a validator.

In `@integration-tests/e2e/benchmark/README.md`:
- Around line 54-55: Update the README so the stress-case description and
reported metrics match the actual test implementation: either change the
`TestBenchmarkPreSignedSeqWasmExecStress` entry in the table to state "20 accts
x 200 txs, 100 writes/tx" (because the test uses sharedWrites=20 and
localWrites=80) or change the test to use 200 writes/tx; also correct the P50
values quoted near the paragraph around lines 137-140 so they match the table
above, and make the same consistency fix at the second occurrence near lines
344-345; reference the test name `TestBenchmarkPreSignedSeqWasmExecStress` and
the implementation parameters `sharedWrites` and `localWrites` when making the
change.
- Around line 160-165: The README omits copying the compiled Wasm into the
expected filename and path, causing setupWasmExecCluster to fail looking for
bench_heavy_state/contract.wasm; after building with cargo (target
wasm32-unknown-unknown --release), add an instruction to copy or move
target/wasm32-unknown-unknown/release/bench_heavy_state.wasm to
bench_heavy_state/contract.wasm (or create bench_heavy_state/contract.wasm by
renaming the artifact) so the setupWasmExecCluster lookup succeeds; mention this
step in both the earlier snippet and the similar block around lines 346-350
referencing bench_heavy_state and contract.wasm.
- Around line 264-269: The fenced code blocks containing the benchmark tables
(the blocks starting with "Config | Variant | TPS | vs base | P50ms ..." in
integration-tests/e2e/benchmark/README.md) are missing a language tag and
trigger markdownlint MD040; update both fenced blocks to include a language
identifier such as "text" or "plaintext" (e.g., ```text) to silence the lint
warning and keep formatting, and apply the same change to the other fenced block
mentioned (the table at lines 301-310).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c28797e1-0d70-4827-ab29-058d73b152d5

📥 Commits

Reviewing files that changed from the base of the PR and between e5eae99 and 354c9b1.

📒 Files selected for processing (2)
  • integration-tests/e2e/benchmark/README.md
  • integration-tests/e2e/benchmark/benchmark_test.go


@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (4)
integration-tests/e2e/benchmark/README.md (4)

264-269: ⚠️ Potential issue | 🟡 Minor

Add language identifiers to fenced code blocks.

Both example output blocks (lines 264-269 and 301-310) are missing language specifiers, triggering markdownlint MD040 warnings. Add text or plaintext to silence the linter.

📝 Proposed fix
-```
+```text
 Config                    | Variant      |     TPS | vs base |   P50ms | vs base |   P95ms | vs base | Peak Mempool

Apply the same fix to the fenced block at lines 301-310.

Also applies to: 301-310

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration-tests/e2e/benchmark/README.md` around lines 264 - 269, Add a
language identifier (e.g., "text" or "plaintext") to the fenced code blocks that
contain the benchmark tables in integration-tests/e2e/benchmark/README.md so
markdownlint MD040 is silenced; specifically update the three-backtick openings
for the table blocks beginning around the lines showing "Config | Variant | TPS
..." (the two fenced blocks that display the benchmark tables) to use ```text
and ensure both the block at the one starting with "Config                    |
Variant      |     TPS ..." and the second similar block further down are
updated the same way.

137-140: ⚠️ Potential issue | 🟡 Minor

Correct the P50 latency values to match the table above.

Line 139 quotes P50 values as "(1934 vs 3854ms)" but the table at lines 134-135 shows IAVL P50 = 2047ms and MemIAVL P50 = 1322ms. The quoted values do not match the measured results.

📝 Proposed fix
-At 4000 txs (100 writes/tx): **+96.7% TPS** (795 vs 404) and **-35.4% P50 latency** (1934 vs 3854ms).
+At 4000 txs (100 writes/tx): **+96.7% TPS** (795 vs 404) and **-35.4% P50 latency** (1322 vs 2047ms).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration-tests/e2e/benchmark/README.md` around lines 137 - 140, Update the
P50 latency numbers in the sentence that currently reads "(1934 vs 3854ms)" so
they match the benchmark table above: replace that fragment with "(2047 vs
1322ms)" to reflect IAVL P50 = 2047ms and MemIAVL P50 = 1322ms; ensure the order
corresponds to "IAVL vs MemIAVL" as used elsewhere in the paragraph.

160-165: ⚠️ Potential issue | 🟠 Major

Add the contract.wasm copy step after building.

The build instructions show cargo build but omit the required step to copy the artifact to the expected location. The test harness (setupWasmExecCluster) looks for bench_heavy_state/contract.wasm, but the cargo build leaves the artifact at target/wasm32-unknown-unknown/release/bench_heavy_state.wasm. Without this step, all Wasm exec benchmarks will fail at cluster setup.

📝 Proposed fix
 cd e2e/benchmark/bench_heavy_state
 cargo build --target wasm32-unknown-unknown --release
+cp target/wasm32-unknown-unknown/release/bench_heavy_state.wasm contract.wasm

Apply the same fix at lines 346-350.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration-tests/e2e/benchmark/README.md` around lines 160 - 165, After
building the CosmWasm contract with cargo in bench_heavy_state, add a step to
copy the built artifact from
target/wasm32-unknown-unknown/release/bench_heavy_state.wasm to
bench_heavy_state/contract.wasm so the test harness (setupWasmExecCluster) can
find it; add the same copy instruction in the other README section that mirrors
this one to ensure both places produce bench_heavy_state/contract.wasm for the
Wasm exec benchmarks.

344-344: ⚠️ Potential issue | 🟡 Minor

Fix the inconsistent stress test configuration.

Line 344 states "Stress tests use write_mixed{40, 160} = 200 writes/tx", but Line 55 in the comparison matrix documents TestBenchmarkPreSignedSeqWasmExecStress as using "100 writes/tx". This inconsistency will confuse users trying to understand or reproduce the stress benchmarks.

📝 Proposed fix
-CLI-based tests use `write_mixed{5, 25}` = 30 writes/tx. Pre-signed HTTP tests use `write_mixed{20, 80}` = 100 writes/tx. Stress tests use `write_mixed{40, 160}` = 200 writes/tx.
+CLI-based tests use `write_mixed{5, 25}` = 30 writes/tx. Pre-signed HTTP tests use `write_mixed{20, 80}` = 100 writes/tx. Stress tests use `write_mixed{20, 80}` = 100 writes/tx.

Alternatively, if stress tests are meant to use 200 writes/tx, update Line 55 instead:

-| `TestBenchmarkPreSignedSeqWasmExecStress` | Sequential, Wasm exec, HTTP (stress) | 20 accts x 200 txs, 100 writes/tx |
+| `TestBenchmarkPreSignedSeqWasmExecStress` | Sequential, Wasm exec, HTTP (stress) | 20 accts x 200 txs, 200 writes/tx |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration-tests/e2e/benchmark/README.md` at line 344, The README has
inconsistent documentation for stress test write counts: the description "Stress
tests use `write_mixed{40, 160}` = 200 writes/tx" conflicts with the comparison
matrix entry for TestBenchmarkPreSignedSeqWasmExecStress which lists 100
writes/tx; update the documentation so both places match by choosing the correct
intended value and editing either the README line mentioning `write_mixed{40,
160}` or the comparison matrix entry for TestBenchmarkPreSignedSeqWasmExecStress
to the same write count (100 or 200), ensuring the string `write_mixed{...}` and
the numeric "writes/tx" are consistent.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@integration-tests/e2e/benchmark/README.md`:
- Around line 264-269: Add a language identifier (e.g., "text" or "plaintext")
to the fenced code blocks that contain the benchmark tables in
integration-tests/e2e/benchmark/README.md so markdownlint MD040 is silenced;
specifically update the three-backtick openings for the table blocks beginning
around the lines showing "Config | Variant | TPS ..." (the two fenced blocks
that display the benchmark tables) to use ```text and ensure both the block at
the one starting with "Config                    | Variant      |     TPS ..."
and the second similar block further down are updated the same way.
- Around line 137-140: Update the P50 latency numbers in the sentence that
currently reads "(1934 vs 3854ms)" so they match the benchmark table above:
replace that fragment with "(2047 vs 1322ms)" to reflect IAVL P50 = 2047ms and
MemIAVL P50 = 1322ms; ensure the order corresponds to "IAVL vs MemIAVL" as used
elsewhere in the paragraph.
- Around line 160-165: After building the CosmWasm contract with cargo in
bench_heavy_state, add a step to copy the built artifact from
target/wasm32-unknown-unknown/release/bench_heavy_state.wasm to
bench_heavy_state/contract.wasm so the test harness (setupWasmExecCluster) can
find it; add the same copy instruction in the other README section that mirrors
this one to ensure both places produce bench_heavy_state/contract.wasm for the
Wasm exec benchmarks.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 9414d419-9409-4ae0-96a9-25c93db6d690

📥 Commits

Reviewing files that changed from the base of the PR and between 354c9b1 and d588529.

📒 Files selected for processing (1)
  • integration-tests/e2e/benchmark/README.md


@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (4)
integration-tests/e2e/benchmark/benchmark_test.go (2)

683-700: MemIAVL stress subtest missing DrainTimeoutOverride.

The IAVL subtest (line 667) sets DrainTimeoutOverride = 10 * time.Minute, but the MemIAVL subtest doesn't. With the default formula (180s + 4000/20s = 380s ≈ 6.3min), MemIAVL gets a shorter timeout window.

If this is intentional because MemIAVL is expected to complete faster, consider adding a comment. Otherwise, for consistency:

 	t.Run("MemIAVL", func(t *testing.T) {
 		cfg := CombinedConfig()
 		cfg.AccountCount = 20
 		cfg.TxPerAccount = 200
+		cfg.DrainTimeoutOverride = 10 * time.Minute
 		cfg.Label = "presigned-stress/memiavl/seq-wasm-exec"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration-tests/e2e/benchmark/benchmark_test.go` around lines 683 - 700,
The MemIAVL subtest ("MemIAVL") omits setting DrainTimeoutOverride on the
CombinedConfig returned by CombinedConfig(), causing it to use a shorter default
timeout; update the MemIAVL test to set cfg.DrainTimeoutOverride = 10 *
time.Minute (same as the IAVL subtest) to match timeouts, or if shorter timeout
is intentional, add a one-line comment above the MemIAVL block explaining why it
differs; refer to the local variable cfg in the MemIAVL test and the
CombinedConfig/DrainTimeoutOverride configuration option to locate where to
apply the change.

96-104: Redundant defer cl.Close() call.

setupCluster already registers cl.Close via t.Cleanup (line 48), so the explicit defer cl.Close() here is redundant. Assuming Close() is idempotent this won't cause issues, but it adds noise.

♻️ Suggested cleanup
 func runBenchmark(t *testing.T, cfg BenchConfig, loadFn func(ctx context.Context, cl *cluster.Cluster, cfg BenchConfig, metas map[string]cluster.AccountMeta) LoadResult) BenchResult {
 	t.Helper()
 	ctx := context.Background()

 	cl := setupCluster(t, ctx, cfg)
-	defer cl.Close()

 	return runBenchmarkWithCluster(t, ctx, cl, cfg, loadFn)
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration-tests/e2e/benchmark/benchmark_test.go` around lines 96 - 104, In
runBenchmark, remove the redundant explicit defer cl.Close() because
setupCluster already registers cl.Close via t.Cleanup; locate the function
runBenchmark and delete the line "defer cl.Close()" so cleanup relies on the
t.Cleanup registration from setupCluster (no other changes needed).
integration-tests/e2e/benchmark/README.md (2)

302-311: Add language identifier to fenced code block.

Same issue as the previous block—this directory structure listing needs a language specifier.

-```
+```text
 benchmark/
   config.go          Variant definitions, BenchConfig, preset constructors
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration-tests/e2e/benchmark/README.md` around lines 302 - 311, The fenced
code block that shows the benchmark/ directory listing in README.md lacks a
language identifier; update the opening fence for the block that begins with
"benchmark/" to use a language specifier (e.g., change ``` to ```text) so the
block becomes a text-formatted code block in the README.

265-270: Add language identifier to fenced code block.

Per static analysis (MD040), this fenced code block needs a language specifier. Since it's tabular text output, text or plaintext is appropriate.

-```
+```text
 Config                    | Variant      |     TPS | vs base |   P50ms | vs base |   P95ms | vs base | Peak Mempool
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration-tests/e2e/benchmark/README.md` around lines 265 - 270, The fenced
code block containing the benchmark table in README.md is missing a language
identifier; update the opening fence (the triple backticks that precede the
table) to include a language specifier such as text or plaintext (e.g., change
``` to ```text) so the block passes MD040 static analysis and renders correctly;
locate the fenced code block that contains the "Config | Variant | TPS ..."
table and add the language identifier to its opening backticks.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0f3c5f9f-de84-470e-a48f-d07a0dbc129c

📥 Commits

Reviewing files that changed from the base of the PR and between d588529 and 0549ad7.

📒 Files selected for processing (2)
  • integration-tests/e2e/benchmark/README.md
  • integration-tests/e2e/benchmark/benchmark_test.go


@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (2)
integration-tests/e2e/benchmark/README.md (2)

302-311: ⚠️ Potential issue | 🟡 Minor

Add language identifier to the fenced code block.

This fenced block is missing a language specifier, triggering markdownlint MD040. Adding text or plaintext will resolve the lint warning.

📝 Suggested fix
-```
+```text
 benchmark/

As per coding guidelines, this was flagged in a previous review but remains unaddressed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration-tests/e2e/benchmark/README.md` around lines 302 - 311, The fenced
directory listing in the README.md is missing a language identifier causing
markdownlint MD040; update the fenced code block around the directory tree (the
block that starts with ``` and contains "benchmark/  config.go  load.go ...") by
adding a language specifier such as "text" or "plaintext" (e.g., change the
opening backticks to ```text) so the linter stops flagging MD040.

265-270: ⚠️ Potential issue | 🟡 Minor

Add language identifier to the fenced code block.

This fenced block is missing a language specifier, triggering markdownlint MD040. Adding text or plaintext will resolve the lint warning.

📝 Suggested fix
-```
+```text
 Config                    | Variant      |     TPS | vs base |   P50ms | vs base |   P95ms | vs base | Peak Mempool

As per coding guidelines, this was flagged in a previous review but remains unaddressed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration-tests/e2e/benchmark/README.md` around lines 265 - 270, The
README's fenced code block that starts with "Config                    | Variant
|     TPS ..." is missing a language identifier, causing markdownlint MD040;
edit the fenced block in integration-tests/e2e/benchmark/README.md and add a
language specifier such as "text" or "plaintext" immediately after the opening
backticks (e.g., ```text) so the table block is recognized and the lint warning
is resolved.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 11afa1ce-7519-43f0-a000-9e61827b0f48

📥 Commits

Reviewing files that changed from the base of the PR and between 0549ad7 and e31325c.

📒 Files selected for processing (3)
  • integration-tests/e2e/benchmark/README.md
  • integration-tests/e2e/benchmark/benchmark_test.go
  • integration-tests/e2e/benchmark/config.go
✅ Files skipped from review due to trivial changes (1)
  • integration-tests/e2e/benchmark/benchmark_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • integration-tests/e2e/benchmark/config.go
