feat: use Arc cloning in extend_ref instead of deep cloning BranchNodeCompact #82

defistar wants to merge 6 commits into cliff/triedb from
Conversation
```rust
use alloy_primitives::map::DefaultHashBuilder;
use criterion::{black_box, criterion_group, criterion_main, BenchmarkId, Criterion};
use reth_trie_common::{updates::TrieUpdates, BranchNodeCompact, Nibbles};
use std::{collections::HashMap, sync::Arc};
```
Please follow the standard Rust import order: std first, then third-party, then workspace crates. Change the other places as well.
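A minimal sketch of the requested ordering for this file:

```rust
// std first
use std::{collections::HashMap, sync::Arc};

// third-party crates
use alloy_primitives::map::DefaultHashBuilder;
use criterion::{black_box, criterion_group, criterion_main, BenchmarkId, Criterion};

// workspace crates
use reth_trie_common::{updates::TrieUpdates, BranchNodeCompact, Nibbles};
```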
```rust
let mut updates = TrieUpdates::default();

for i in 0..num_nodes {
    let path = Nibbles::from_nibbles(&[i as u8 % 16, (i / 16) as u8 % 16]);
```
For Ethereum account tries the path should be 64 nibbles (64 `u8` values), not 2; better to check against a realistic case.
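A sketch of a more realistic setup; `account_path` is a hypothetical helper that produces full-depth 64-nibble paths, matching the shape of keccak256-hashed account keys:

```rust
use reth_trie_common::Nibbles;

/// Hypothetical helper: derive a deterministic full-depth (64-nibble) path
/// from the benchmark index, so benchmark paths match real account-key depth.
fn account_path(i: u64) -> Nibbles {
    let mut nibbles = [0u8; 64];
    for (j, n) in nibbles.iter_mut().enumerate() {
        // spread the index across positions; each nibble must stay in 0..=15
        *n = (((i >> ((j % 16) * 4)) ^ j as u64) & 0xF) as u8;
    }
    Nibbles::from_nibbles(&nibbles)
}
```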
```rust
    // the important part is the Arc cloning behavior, not node content
    let node = BranchNodeCompact::default();

    updates.account_nodes.insert(path, Arc::new(node));
```
Can we add storage_tries as well? The storage trie is usually much bigger.
Added storage trie updates to the benchmark tests.
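A sketch of what that can look like in the setup loop, assuming the `StorageTrieUpdates` shape shown later in this diff (`storage_nodes` on the storage updates, `storage_tries` keyed by hashed address on `TrieUpdates`; field visibility and the `B256`/`U256` imports from `alloy_primitives` are assumptions):

```rust
// inside the same `for i in 0..num_nodes` setup loop as above
let mut storage = StorageTrieUpdates::default();
// Arc-wrapped nodes, mirroring the account-node change
storage.storage_nodes.insert(path, Arc::new(BranchNodeCompact::default()));
// one storage trie per hashed address (derived from the index here)
updates.storage_tries.insert(B256::from(U256::from(i)), storage);
```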
```diff
-if !branch_nodes_equal(task.as_ref(), regular.as_ref(), database.as_ref())? {
-    diff.account_nodes.insert(key, EntryDiff { task, regular, database });
+if !branch_nodes_equal(task.as_ref().map(|n| &**n), regular.as_ref().map(|n| &**n), database.as_ref())? {
```
`&**n` can just be written as `n.as_ref()`.
Updated `&**n` to `n.as_ref()` at all occurrences.
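For reference, the two spellings are equivalent; a minimal sketch:

```rust
use std::sync::Arc;

use reth_trie_common::BranchNodeCompact;

let task: Option<Arc<BranchNodeCompact>> = Some(Arc::new(BranchNodeCompact::default()));

// both turn an &Arc<BranchNodeCompact> into an &BranchNodeCompact
let a: Option<&BranchNodeCompact> = task.as_ref().map(|n| &**n);
let b: Option<&BranchNodeCompact> = task.as_ref().map(|n| n.as_ref());
assert_eq!(a, b);
```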
```diff
 if !branch_nodes_equal(task.as_ref().map(|n| &**n), regular.as_ref().map(|n| &**n), database.as_ref())? {
-    diff.account_nodes.insert(key, EntryDiff { task, regular, database });
+    diff.account_nodes.insert(key, EntryDiff {
+        task: task.map(|n| (*n).clone()),
```
Possible to define these fields as `Arc` so we don't clone?
Yes, we can change `EntryDiff` to use `Arc` to avoid cloning. This is diagnostic code that only runs when there are differences, so we should avoid the unnecessary clones.
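A minimal sketch of that change; the field set is taken from the diff above, and the `database` side staying owned is an assumption:

```rust
use std::sync::Arc;

use reth_trie_common::BranchNodeCompact;

/// Diagnostic diff entry holding Arcs, so recording a mismatch
/// shares the nodes instead of deep cloning them (shape assumed).
#[derive(Debug)]
pub struct EntryDiff {
    pub task: Option<Arc<BranchNodeCompact>>,
    pub regular: Option<Arc<BranchNodeCompact>>,
    pub database: Option<BranchNodeCompact>,
}
```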
```diff
 // Create account trie updates: one Some (update) and one None (removal)
 let account_nodes = vec![
-    (account_nibbles1, Some(node1.clone())), // This will update existing node
+    (account_nibbles1, Some(Arc::new(node1.clone()))), // This will update existing node
```
This could just be `Arc::new(node1)`, without the clone.
```diff
 storage_nodes: vec![
-    (storage_nibbles1, Some(storage_node1.clone())), // Updated node already in db
-    (storage_nibbles2, Some(storage_node2.clone())), /* Updated node not in db
+    (storage_nibbles1, Some(Arc::new(storage_node1.clone()))), // Updated node already in db
```
Can we clone once and use `Arc::clone()`?

```rust
let storage_node1 = Arc::new(storage_node1.clone());
// then reuse the Arc wherever the node is needed:
Arc::clone(&storage_node1)
Arc::clone(&storage_node1)
```
Removed the redundancy by cloning once into a local variable.
```diff
-    (storage_nibbles1, Some(storage_node1.clone())), // Updated node from overlay
-    (storage_nibbles2, Some(storage_node2.clone())), /* Updated node not in overlay
+    (storage_nibbles1, Some(Arc::new(storage_node1.clone()))), // Updated node from overlay
+    (storage_nibbles2, Some(Arc::new(storage_node2.clone()))), /* Updated node not in overlay
```
There are many clones here; please check whether any of them can be avoided.
```diff
 fn from(value: &'a super::TrieUpdates) -> Self {
     Self {
-        account_nodes: Cow::Borrowed(&value.account_nodes),
+        account_nodes: Cow::Owned(
```
Why do we need to clone and own here, and what would be the overall impact? Is it possible to just borrow?
The clone happens because:

- We have `Arc<BranchNodeCompact>` in the source
- Bincode serialization expects `BranchNodeCompact` (owned, not `Arc`)
- We must unwrap the `Arc` (`.as_ref()`) and clone the inner value

Bincode serialization is infrequent (persistence/snapshots) and the clone cost is acceptable for this use case. The Arc optimization still wins massively on the hot path (`extend_ref`, aggregation).
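A minimal sketch of that conversion; `owned_for_bincode` is a hypothetical helper name, and the std `HashMap` stands in for the actual map type:

```rust
use std::{borrow::Cow, collections::HashMap, sync::Arc};

use reth_trie_common::{BranchNodeCompact, Nibbles};

/// Sketch: bincode needs owned `BranchNodeCompact` values, so each
/// `Arc` is unwrapped (`as_ref`) and its inner node is cloned.
fn owned_for_bincode(
    account_nodes: &HashMap<Nibbles, Arc<BranchNodeCompact>>,
) -> Cow<'static, HashMap<Nibbles, BranchNodeCompact>> {
    Cow::Owned(
        account_nodes
            .iter()
            .map(|(k, v)| (*k, v.as_ref().clone()))
            .collect(),
    )
}
```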
crates/trie/common/src/updates.rs (Outdated)
```diff
     is_deleted: value.is_deleted,
-    storage_nodes: Cow::Borrowed(&value.storage_nodes),
+    storage_nodes: Cow::Owned(
+        value.storage_nodes.iter().map(|(k, v)| (*k, (**v).clone())).collect()
```
The previous version did not need to clone and own.
The trade-off for clone & own:

What we gained:
- Hot path (`extend_ref`, aggregation): 3.08x speedup; just an `Arc::clone` (8 bytes) instead of a full clone (112 bytes)
- Happens thousands of times per block

What we lost:
- Cold path (bincode serialization): must clone the entire trie for serialization
- Happens once per persistence/snapshot operation

This is the correct trade-off for production because:
- Trie aggregation happens continuously (hot)
- Bincode serialization happens rarely (cold: snapshots, persistence)
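A small sketch of the difference on the two paths:

```rust
use std::sync::Arc;

use reth_trie_common::BranchNodeCompact;

let shared = Arc::new(BranchNodeCompact::default());

// Hot path (extend_ref): bumps a refcount and copies an 8-byte handle.
let cheap = Arc::clone(&shared);

// Cold path (bincode serialization): copies the whole ~112-byte node.
let owned: BranchNodeCompact = shared.as_ref().clone();
```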
crates/trie/db/src/trie_cursor.rs (Outdated)
```diff
 self.cursor.upsert(
     self.hashed_address,
-    &StorageTrieEntry { nibbles, node: node.clone() },
+    &StorageTrieEntry { nibbles, node: (**node).clone() },
```
Better to write `(**node).clone()` as `node.as_ref().clone()`; check the other places as well.
Updated to get a new owned value by taking `as_ref()` and cloning it.
```diff
 cursor_entry: Option<(Nibbles, BranchNodeCompact)>,
 /// Forward-only in-memory cursor over storage trie nodes.
-in_memory_cursor: ForwardInMemoryCursor<'a, Nibbles, Option<BranchNodeCompact>>,
+in_memory_cursor: ForwardInMemoryCursor<'a, Nibbles, Option<std::sync::Arc<BranchNodeCompact>>>,
```
```diff
-// then we return the overlay's node.
-return Ok(Some((mem_key, node)))
+// then we return the overlay's node. Clone the Arc to get the actual node.
+return Ok(Some((mem_key, (*node).clone())))
```
That's an additional clone; what is the overall impact?
Cost added:
- Frequency: lower; only during state root calculation or proof generation
- What's added: must clone a `BranchNodeCompact` (112 bytes) when returning it out of the `Arc`
- Cost: one 112-byte clone per node read from the overlay

The savings outweigh the costs:
- Saving: using `Arc` in `extend_ref()` calls when aggregating blocks into the RPC cache
- What's saved: cloning a `BranchNodeCompact` (112 bytes) becomes an `Arc` pointer clone (8 bytes)

Without Arc: 1,000 blocks × 50 nodes × 112 bytes = 5.6 MB + 50,000 expensive clones
With Arc: 1,000 blocks × 50 nodes × 8 bytes = 0.4 MB + 50,000 cheap clones + some read clones
Savings: 5.2 MB of memory + 50,000 fast aggregations
Cost: ~50-500 read clones during proof generation (depending on trie structure)
```diff
 let entry = match (mem_entry, &self.cursor_entry) {
     (Some((mem_key, entry_inner)), _) if mem_key == key => {
-        entry_inner.map(|node| (key, node))
+        entry_inner.as_ref().map(|node| (key, (**node).clone()))
```
Updated to get a new owned value by taking `as_ref()` and cloning it.
```diff
 // collect account updates and sort them in descending order, so that when we pop them
 // off the Vec they are popped in ascending order.
-self.account_nodes.extend(updates.account_nodes);
+self.account_nodes.extend(updates.account_nodes.into_iter().map(|(k, v)| (k, (*v).clone())));
```
Updated to get a new owned value by taking `as_ref()` and cloning it.
Arc Optimization for TrieUpdates
Summary
This PR optimizes `TrieUpdates` aggregation by using `Arc<BranchNodeCompact>` instead of owned `BranchNodeCompact` values, eliminating expensive deep cloning during block processing.

Performance Impact

- 3.08x faster `extend_ref()` operations (440 µs vs 1,357 µs for 1,024 blocks)

Changes Overview
Core Optimization
Primary Change:

- `crates/trie/common/src/updates.rs`: changed `HashMap<Nibbles, BranchNodeCompact>` → `HashMap<Nibbles, Arc<BranchNodeCompact>>` (see the sketch after this list)

Propagation to Trie Components:

- `crates/trie/sparse/src/traits.rs`: updated `SparseTrieUpdates.updated_nodes` to use Arc
- `crates/trie/sparse/src/trie.rs`: wrap branch nodes in `Arc::new()` on insertion
- `crates/trie/trie/src/trie.rs`: map hash builder updates to Arc
- `crates/trie/db/src/trie_cursor.rs`: handle Arc in trie cursor operations
- `crates/trie/sparse-parallel/src/trie.rs`: Arc support in the parallel trie implementation
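A simplified sketch of the new shapes (field sets abridged; `removed_nodes`, the exact hashers, and field visibility are omitted or assumed):

```rust
use std::{collections::HashMap, sync::Arc};

use alloy_primitives::B256;
use reth_trie_common::{BranchNodeCompact, Nibbles};

/// Abridged sketch of TrieUpdates after this PR: nodes are shared via Arc.
pub struct TrieUpdates {
    /// Was HashMap<Nibbles, BranchNodeCompact> before this PR.
    pub account_nodes: HashMap<Nibbles, Arc<BranchNodeCompact>>,
    /// Per-storage-trie updates keyed by hashed address.
    pub storage_tries: HashMap<B256, StorageTrieUpdates>,
}

/// Abridged sketch of per-storage-trie updates.
pub struct StorageTrieUpdates {
    pub is_deleted: bool,
    pub storage_nodes: HashMap<Nibbles, Arc<BranchNodeCompact>>,
}
```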
Benchmark & Profiling

New Additions:

- `crates/trie/common/benches/extend_ref_benchmark.rs`: benchmark demonstrating the 3.08x speedup
- `crates/trie/common/Cargo.toml`: added the criterion dependency for benchmarking
- `crates/chain-state/src/trie_profiler.rs`: profiling instrumentation (318 lines)
- `crates/chain-state/src/lib.rs`: export the profiler module

Run benchmark:
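Presumably something like the following, assuming the Criterion bench target matches the file name and the package name is `reth-trie-common`:

```sh
# package and target names are assumptions from the file paths above
cargo bench -p reth-trie-common --bench extend_ref_benchmark
```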
Propagation Fixes
Required updates to support Arc changes throughout the codebase:
- `crates/storage/provider/src/providers/database/provider.rs`: wrap DB nodes in Arc, update static arrays
- `crates/trie/trie/src/trie_cursor/in_memory.rs`: update the in-memory cursor for Arc types
- `crates/trie/trie/src/node_iter.rs`: unwrap Arc in test helper functions
- `crates/trie/trie/src/verify.rs`: handle Arc in trie verification
- `crates/trie/db/tests/trie.rs`: update test assertions for Arc types
- `crates/engine/tree/src/tree/trie_updates.rs`: handle Arc in trie update comparison

Test Fixes
Required changes to make tests work with Arc types:
- `crates/engine/invalid-block-hooks/src/witness.rs`: wrap test nodes in Arc
- `crates/exex/test-utils/src/lib.rs`: add TriedbProvider to the test setup
- `crates/storage/db-common/src/init.rs`: add TriedbProvider to the test setup
- `crates/chain-state/src/in_memory.rs`: import PlainPostState for tests

Testing
All Core Trie Tests Pass
Test Commands
Technical Details
Before (Deep Clone)
- Each `extend_ref()` call deep clones all `BranchNodeCompact` values

After (Arc)

- Each `extend_ref()` call clones Arc pointers (cheap)

Memory Layout Comparison
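A minimal sketch of the comparison, using the sizes cited elsewhere in this PR (the 112-byte node size is the PR's figure, not verified here):

```rust
use std::{mem::size_of, sync::Arc};

use reth_trie_common::BranchNodeCompact;

// An Arc handle is pointer-sized on 64-bit targets (8 bytes)...
assert_eq!(size_of::<Arc<BranchNodeCompact>>(), 8);
// ...while the node itself is ~112 bytes per the benchmark notes, so each
// aggregation copy shrinks from ~112 bytes to an 8-byte refcount bump.
println!("node size: {} bytes", size_of::<BranchNodeCompact>());
```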
Benchmark Details
What the Benchmark Tests
Compares two implementations of trie node aggregation:
Runs 16 test cases across two scenarios:
How It Works
Arc-based approach:
Deep clone approach:
Measured Results
Block accumulation (50 nodes per block):
Single extend operations:
Key observations:
Statistical data is available under `target/criterion/` with HTML reports.

Validation