Skip to content

refactor: deduplicate storage helpers, eliminate redundant hash computations, and consolidate test utilities#172

Merged
pablodeymo merged 7 commits intomainfrom
simplify-review-cleanup
Mar 2, 2026
Merged

refactor: deduplicate storage helpers, eliminate redundant hash computations, and consolidate test utilities#172
pablodeymo merged 7 commits intomainfrom
simplify-review-cleanup

Conversation

@pablodeymo
Copy link
Collaborator

@pablodeymo pablodeymo commented Mar 2, 2026

Summary

Code quality cleanup after the devnet 3 merge (#107). Ran systematic reviews (code reuse, quality, efficiency) across all changed files and fixed the actionable findings. Net result: no behavior changes, reduced I/O round-trips on hot paths.

  • Deduplicate copy-pasted storage methods for aggregated payloads
  • Eliminate 4 redundant tree_hash_root() computations on hot paths
  • Remove duplicate test helpers in favor of existing public APIs
  • Add a data_root cache to avoid N+1 DB lookups in attestation extraction
  • Single-pass over known aggregated payloads during block production (was scanning the table twice)
  • Move Checkpoint to its own module with direct imports (no re-exports)
  • Batch aggregated payload inserts (single commit instead of N round-trips per validator)
  • Deduplicate prune functions via shared prune_by_slot helper
  • Narrow extract_latest_attestations API to accept only keys (avoids cloning large proof data)
  • Extract ~250 lines of duplicated test fixture types into shared ethlambda-test-fixtures crate

Changes in detail

1. Storage method deduplication (crates/storage/src/store.rs)

iter_known_aggregated_payloads / iter_new_aggregated_payloads and insert_known_aggregated_payload / insert_new_aggregated_payload were character-for-character identical except for the Table enum variant.

Fix: Extracted two private helpers parameterized by Table:

  • iter_aggregated_payloads(table) — shared iterator logic
  • insert_aggregated_payload(table, key, payload) — shared read-modify-write logic

2. Redundant tree_hash_root() elimination (crates/blockchain/src/store.rs)

Four call sites were computing tree_hash_root() on data that had already been hashed:

Location What was redundant
on_gossip_aggregated_attestation aggregated.data.tree_hash_root() computed twice
on_block_core block.tree_hash_root() computed before verification, then recomputed on the clone after
aggregate_committee_signatures data.tree_hash_root() recomputed on data already fetched by its data_root key
on_gossip_attestationinsert_gossip_signature attestation.data.tree_hash_root() computed in caller, then recomputed inside

Fix: Reuse the pre-computed value in each case. For insert_gossip_signature, changed the signature to accept (data_root, slot) instead of &AttestationData.

3. Test helper deduplication

forkchoice_spectests.rs: Removed 20-line duplicate extract_attestations() — calls the public Store::extract_latest_attestations() directly.

rpc/fork_choice.rs + rpc/lib.rs: Extracted shared test_utils::create_test_state().

4. data_root cache in extract_latest_attestations (crates/storage/src/store.rs)

Many validators attest to the same data (often just 1-3 distinct roots). Added a HashMap<H256, Option<AttestationData>> cache to deduplicate get_attestation_data_by_root lookups.

5. Single-pass in produce_block_with_signatures (crates/blockchain/src/store.rs)

Was calling store.iter_known_aggregated_payloads() twice — once for attestation data, then again for proofs. Now collects once and derives both from it.

6. Move Checkpoint to dedicated module (crates/common/types/src/checkpoint.rs)

Extracted Checkpoint struct and deser_dec_str helper from state.rs. All import sites updated to use checkpoint::Checkpoint directly — no pub use re-exports.

7. Batch aggregated payload inserts (crates/storage/src/store.rs)

Each insert_aggregated_payload call performed its own read-modify-write-commit cycle. When storing an aggregated attestation covering N validators, this meant N separate storage round-trips.

Fix: New insert_aggregated_payloads_batch performs a single read-write-commit cycle for all entries. Groups by key to handle duplicate keys correctly. Updated all loop callers in aggregate_committee_signatures, on_gossip_aggregated_attestation, and on_block_core.

8. Deduplicate prune functions (crates/storage/src/store.rs)

prune_gossip_signatures and prune_attestation_data_by_root were structurally identical (iterate table, decode, check slot, collect keys, batch delete). Extracted shared prune_by_slot(table, finalized_slot, get_slot) helper parameterized by a slot-extraction closure.

9. Narrow extract_latest_attestations API

Changed from impl Iterator<Item = (SignatureKey, Vec<StoredAggregatedPayload>)> to impl Iterator<Item = SignatureKey>. The function only uses the keys — passing full payloads forced produce_block_with_signatures to clone all Vec<StoredAggregatedPayload> (containing multi-KB XMSS proof data) unnecessarily.

10. Extract shared test fixture types into ethlambda-test-fixtures crate

blockchain/tests/common.rs and state_transition/tests/types.rs contained ~250 lines of identical serde Deserialize structs and From impls for converting JSON test fixtures into domain types. Since these are integration tests in different crates, they couldn't share code via mod.

Fix: New crates/common/test-fixtures/ crate holds the 12 shared types (Container, Config, Checkpoint, BlockHeader, Validator, TestState, Block, BlockBody, AggregatedAttestation, AggregationBits, AttestationData, TestInfo) plus the deser_pubkey_hex helper. Both consumer files now re-export from the crate and keep only their test-specific types (ProposerAttestation in forkchoice, StateTransitionTestVector/PostState in STF).

11. Stale comment fix (crates/blockchain/src/lib.rs)

Comment said "intervals 0/3" but the 5-interval model promotes attestations at intervals 0 and 4.

How to test

All existing tests cover these changes — no new behavior was introduced.

cargo test --workspace --release
cargo clippy --workspace -- -D warnings
cargo fmt --all -- --check

All 102 tests pass.

…, and consolidate test utilities

- Extract iter_aggregated_payloads() and insert_aggregated_payload() private helpers
  parameterized by Table, replacing 4 copy-pasted methods in storage/store.rs
- Remove duplicate extract_attestations() test helper in forkchoice_spectests.rs,
  use public Store::extract_latest_attestations() directly
- Consolidate create_test_state() into a shared test_utils module in the rpc crate
- Reuse pre-computed tree_hash_root values in on_gossip_aggregated_attestation,
  on_block_core, aggregate_committee_signatures, and insert_gossip_signature
- Cache data_root lookups in extract_latest_attestations to avoid N+1 DB reads
- Single-pass over iter_known_aggregated_payloads in produce_block_with_signatures
  instead of scanning the table twice
- Fix stale comment: intervals 0/3 -> 0/4 to match the 5-interval model
@pablodeymo pablodeymo requested a review from MegaRedHand March 2, 2026 15:10
@github-actions
Copy link

github-actions bot commented Mar 2, 2026

🤖 Kimi Code Review

Review Summary

This PR introduces several improvements to attestation aggregation and storage efficiency. The changes are generally correct and improve code maintainability, but there are a few areas that need attention.

Issues Found

  1. Missing variable declarations in on_block_core (crates/blockchain/src/store.rs:533-534)

    • Lines 533-534 remove the block_root and slot variable declarations, but these variables are still used on line 610 in the info! macro call. This will cause a compilation error.
  2. Inefficient cloning in extract_latest_attestations (crates/storage/src/store.rs:711)

    • The function clones AttestationData even when it might not be necessary. Consider using references or Arc for better performance.
  3. Potential race condition in storage operations

    • The new helper functions iter_aggregated_payloads and insert_aggregated_payloads use expect which could panic. While this follows the existing pattern, consider more graceful error handling.

Positive Changes

  1. Performance improvement: The introduction of data_cache in extract_latest_attestations reduces redundant database lookups for the same attestation data.

  2. Code deduplication: The new helper methods iter_aggregated_payloads and insert_aggregated_payload eliminate duplicated code between known/new aggregated payload handling.

  3. API improvement: Changing insert_gossip_signature to take data_root and slot directly instead of computing them internally is more efficient and follows the principle of doing work at the call site when the values are already available.

  4. Test code cleanup: Moving create_test_state to a shared test utilities module is good practice.

Suggested Fixes

  1. Fix the compilation error:

    // Add back these variables in on_block_core
    let block_root = block.tree_hash_root();
    let slot = block.slot;
  2. Consider optimizing the clone:

    // Instead of cloning, consider:
    result.insert(validator_id, data.clone());
    // Could be optimized with Arc or Cow if performance becomes an issue

The PR is nearly ready but requires fixing the compilation error before merging.


Automated review by Kimi (Moonshot AI) · custom prompt

@github-actions
Copy link

github-actions bot commented Mar 2, 2026

🤖 Codex Code Review

  1. Medium - avoid full payload cloning in block production hot path
    In crates/blockchain/src/store.rs:754-760, known_payloads is collected, then each Vec<StoredAggregatedPayload> is cloned just to call extract_latest_attestations, even though that function ignores the payload list (_payload_list) in crates/storage/src/store.rs:697.
    This adds avoidable CPU+memory pressure in a consensus-critical path (produce_block_with_signatures) and can hurt slot-time margins under load.
    Suggested fix: change extract_latest_attestations to accept an iterator of keys (or borrowed entries) so it only consumes (validator_id, data_root) without cloning payload vectors.

No correctness or security regressions stood out in the signature-root changes (data_root usage) or table-helper refactor. Those changes look consistent with existing XMSS verification flow and attestation storage semantics.

I could not run tests locally due sandboxed rustup temp-file permission errors (/home/runner/.rustup/tmp/... permission denied).


Automated review by OpenAI Codex · custom prompt

@github-actions
Copy link

github-actions bot commented Mar 2, 2026

🤖 Claude Code Review

Now I have enough context to write the review. Let me compile my findings.


Code Review: PR #172 — Storage deduplication, hash elimination, test utilities consolidation

Overall: Clean, well-scoped cleanup. The deduplication and single-pass optimizations are correct. A few minor points below.


crates/blockchain/src/store.rs

aggregate_committee_signatures (line 170):
The removal of let message = data.tree_hash_root() and reuse of data_root is correct — the data was fetched by content-address (get_attestation_data_by_root(&data_root)), so data_root == data.tree_hash_root() is invariant-guaranteed by the storage layer. The comment explaining this is good.

insert_gossip_signature API change (line 897):
Signature changed from (&AttestationData, validator_id, sig) to (data_root, slot, validator_id, sig). The two call sites pass values that were just computed from attestation.data, so they're correct. However, the public API now has an implicit precondition (data_root == attestation_data.tree_hash_root()) that isn't enforced by the type system. If this function is expected to stay public and evolve independently of its callers, consider a doc comment describing the invariant.

produce_block_with_signatures (lines 756-759) — unnecessary clone:

let known_attestations = store.extract_latest_attestations(
    known_payloads
        .iter()
        .map(|(key, payloads)| (*key, payloads.clone())),  // ← clone
);

extract_latest_attestations immediately discards the payload list (_payload_list). The payloads.clone() allocates a full Vec<StoredAggregatedPayload> per entry only to throw it away. The simplest fix: pass an empty vec or change extract_latest_attestations to accept impl Iterator<Item = SignatureKey>. The current code is functionally correct but has avoidable heap allocations on the block-production hot path.

on_block_core — block_root deduplication:
The PR moves the block_root and slot computation to the borrow-not-owned block = &signed_block.message.block phase before the clone. Since Clone doesn't mutate the value, tree_hash_root() produces the same result, so eliminating the post-clone recomputation is correct.

on_gossip_aggregated_attestation (line 434→446):
Eliminating the duplicate tree_hash_root() call by reusing the first result as both the verification message and the storage key is clean and correct. The rename from message to data_root makes the intent clearer.


crates/storage/src/store.rs

iter_aggregated_payloads / insert_aggregated_payload deduplication:
Mechanically clean. Both helpers are fn (non-pub), insert_aggregated_payload takes &mut self — the visibility and mutability rules are preserved correctly across all four delegating public methods.

extract_latest_attestations cache (lines 695, 698-700):
The HashMap<H256, Option<AttestationData>> cache is correct. Caching None explicitly (so a missing root isn't looked up repeatedly) is a good detail. The or_insert_with(|| self.get_attestation_data_by_root(&data_root)) pattern avoids redundant DB reads when multiple validators attest to the same data — which is the common case.

One note: let Some(data) = data else { continue; } where data: &mut Option<AttestationData> gives data: &mut AttestationData, and data.clone() is then correct. This is idiomatic but the lifetime interaction with the HashMap::entry borrow is non-obvious on first read. No bug, just mildly dense.


crates/blockchain/src/lib.rs

Comment fix (line 182): "intervals 0/3" → "intervals 0/4":
Correct. INTERVALS_PER_SLOT = 5, so intervals are 0–4. accept_new_attestations is called at intervals 0 (with proposal) and 4 (end-of-slot). The CLAUDE.md still documents 4 intervals — that's a documentation inconsistency outside this PR's scope but worth a follow-up.


crates/net/rpc/src/lib.rs + fork_choice.rs

test_utils extraction:
The pub(crate) module inside #[cfg(test)] is a reasonable pattern for sharing test helpers within the crate. The fork_choice.rs import use crate::test_utils::create_test_state; is also gated by #[cfg(test)], so the cfg conditions align correctly.

The extracted create_test_state() in lib.rs is slightly richer than the one removed from fork_choice.rs (it uses an explicit genesis_checkpoint variable rather than Checkpoint::default()), which is a minor divergence in style but not a correctness issue.


crates/blockchain/tests/forkchoice_spectests.rs

Removing the duplicate extract_attestations free function in favor of the now-public Store::extract_latest_attestations is the right call. The old free function had a comment acknowledging it mirrored the private implementation — promoting the real method to pub and deleting the duplicate is strictly better.


Summary

  • Correctness: All changes are behaviorally equivalent to the code they replace. No consensus logic is touched.
  • Main actionable item: The unnecessary payloads.clone() in produce_block_with_signatures — minor inefficiency, not a bug, but easy to eliminate.
  • Secondary: Consider a doc comment on insert_gossip_signature for the data_root precondition, or scope it to pub(crate) if it's never called across crate boundaries.

Automated review by Claude (Anthropic) · custom prompt

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 2, 2026

Greptile Summary

Code quality cleanup that eliminates redundant computations and duplicate code across the codebase with no behavior changes.

Key improvements:

  • Performance: Removed 4 redundant tree_hash_root() calls on hot paths (attestation processing, block processing, signature aggregation) by reusing pre-computed hashes
  • Efficiency: Added data_root cache in extract_latest_attestations() to eliminate N+1 database lookups when multiple validators attest to the same data
  • Optimization: Changed produce_block_with_signatures() to scan known aggregated payloads once instead of twice
  • Maintainability: Deduplicated storage methods by extracting parameterized helpers (iter_aggregated_payloads, insert_aggregated_payload)
  • Code reuse: Removed duplicate test utilities in favor of existing public APIs and shared modules

All changes are well-justified mechanical refactorings. The PR description provides excellent traceability with a detailed explanation for each optimization.

Confidence Score: 5/5

  • This PR is safe to merge - pure refactoring with no behavior changes, backed by comprehensive test coverage
  • All changes are mechanical code quality improvements: eliminating redundant computations, deduplicating copy-pasted code, and consolidating test utilities. The optimizations are sound (reusing pre-computed hashes, caching DB lookups, single-pass iteration) and preserve existing behavior. All 102 tests pass.
  • No files require special attention - changes are straightforward refactoring

Important Files Changed

Filename Overview
crates/blockchain/src/store.rs Eliminated 4 redundant tree_hash_root() calls on hot paths and optimized block production to scan known payloads once instead of twice
crates/blockchain/tests/forkchoice_spectests.rs Removed 20-line duplicate extract_attestations() helper in favor of public Store::extract_latest_attestations() API
crates/storage/src/store.rs Deduplicated aggregated payload methods with parameterized helpers and added data_root cache to avoid N+1 lookups

Last reviewed commit: 3b569d6

Extracts Checkpoint struct and its deser_dec_str helper from state.rs
into a new checkpoint.rs file in crates/common/types/src/. Re-exports
from state.rs so all existing imports via state::Checkpoint continue
to work unchanged.
…checkpoint module directly

Replace all `state::Checkpoint` imports with `checkpoint::Checkpoint` across
the codebase (14 files). This removes the backwards-compatibility re-export
from state.rs, making checkpoint.rs the single source of truth for the
Checkpoint type.
…rrow extract_latest_attestations API

- Batch insert_aggregated_payload: new insert_aggregated_payloads_batch
  performs a single read-write-commit cycle instead of N individual
  round-trips per validator. Updated all loop callers in
  aggregate_committee_signatures, on_gossip_aggregated_attestation,
  and on_block_core.

- Narrow extract_latest_attestations to accept Iterator<Item = SignatureKey>
  instead of full (SignatureKey, Vec<StoredAggregatedPayload>) tuples.
  The function only uses the keys, so this eliminates unnecessary cloning
  of large proof data in produce_block_with_signatures.

- Deduplicate prune_gossip_signatures and prune_attestation_data_by_root
  into a shared prune_by_slot helper parameterized by table and a
  slot-extraction closure.

- Make deser_dec_str private in checkpoint.rs (only used by serde internally).
CheckpointInfo was a redundant struct with the same fields as Checkpoint.
Use Checkpoint directly in ForkChoiceResponse.
@MegaRedHand MegaRedHand changed the title Deduplicate storage helpers, eliminate redundant hash computations, and consolidate test utilities refactor: deduplicate storage helpers, eliminate redundant hash computations, and consolidate test utilities Mar 2, 2026
blockchain/tests/common.rs and state_transition/tests/types.rs contained
~250 lines of identical serde Deserialize structs and From impls for
converting JSON test fixtures into domain types. Since these are
integration tests in different crates, they couldn't share code via mod.

New crate crates/common/test-fixtures/ holds the 12 shared types
(Container, Config, Checkpoint, BlockHeader, Validator, TestState, Block,
BlockBody, AggregatedAttestation, AggregationBits, AttestationData,
TestInfo) plus the deser_pubkey_hex helper. Both consumer files now
re-export from the crate and keep only their test-specific types.
@pablodeymo pablodeymo merged commit a91f257 into main Mar 2, 2026
3 checks passed
@pablodeymo pablodeymo deleted the simplify-review-cleanup branch March 2, 2026 17:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants