Chainbase internal optimizations: session allocs, spinlock, snapshot preallocate#283
Open
Conversation
Replace vector<unique_ptr<abstract_session>> with a lightweight database* + bool pair. The abstract_session / session_impl virtual dispatch layer was redundant — database::undo() and database::squash() already iterate _index_list with the same virtual dispatch through abstract_index. Removes 18 heap allocations per transaction (1 vector + 17 session_impl objects for each registered index type).
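The slimmed-down session can be sketched as follows. This is an illustrative reconstruction, not chainbase's actual code: the `database` stand-in and member names are assumptions, but it shows the shape of the change — the session keeps only a `database*` and a `bool` flag, so construction, push, squash, and RAII rollback all happen with zero heap allocations.

```cpp
// Hypothetical sketch: session as a database* + bool pair instead of a
// vector<unique_ptr<abstract_session>>. Names are illustrative.
struct database {
    int undo_depth = 0;
    void start_undo() { ++undo_depth; }  // stands in for per-index undo setup
    void undo()       { --undo_depth; }  // iterates _index_list in the real code
    void squash()     { --undo_depth; }
};

class session {
    database* db_;       // replaces the vector of abstract_session pointers
    bool      applied_;  // true while this session still holds an undo level
public:
    explicit session(database& db) : db_(&db), applied_(true) { db_->start_undo(); }
    session(session&& other) noexcept
        : db_(other.db_), applied_(other.applied_) { other.applied_ = false; }
    ~session() { if (applied_) db_->undo(); }  // RAII rollback, no allocations
    void push()   { applied_ = false; }        // keep the changes
    void squash() { if (applied_) { db_->squash(); applied_ = false; } }
};

int depth_after_rollback(database& db) {
    { session s(db); }   // destroyed without push() -> undo() runs
    return db.undo_depth;
}

int depth_after_push(database& db) {
    session s(db);
    s.push();            // committed: destructor does nothing
    return db.undo_depth;
}
```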
shared_cow_string is used as shared_blob for binary data (KV keys, values, ABI blobs) where null termination is unnecessary. No c_str() method exists — all access is via data() + size(). Saves 1 byte per allocation; with 8-byte slab bucket rounding, that saves 8 bytes for each allocation that crosses a bucket boundary (e.g., 24-byte keys: 33 -> 32 bytes, fitting in a smaller bucket).
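The bucket arithmetic behind the "33 -> 32" example can be illustrated as below. The 8-byte header prefix is an assumption chosen so the numbers match the description (8 header + 24 key + 1 terminator = 33), not chainbase's actual layout:

```cpp
#include <cstddef>

// With 8-byte slab buckets, every allocation rounds up to a multiple of 8,
// so dropping the null terminator only shrinks the footprint when the
// request size crossed a bucket boundary.
constexpr std::size_t round_to_bucket(std::size_t n) {
    return (n + 7) / 8 * 8;
}

constexpr std::size_t header = 8;  // hypothetical refcount/size prefix

constexpr std::size_t alloc_with_nul(std::size_t key_len) {
    return round_to_bucket(header + key_len + 1);  // old: +1 terminator
}
constexpr std::size_t alloc_without_nul(std::size_t key_len) {
    return round_to_bucket(header + key_len);      // new: data() + size() only
}
```

Under these assumptions a 24-byte key needs a 40-byte bucket with the terminator but fits a 32-byte bucket without it; keys whose size does not straddle a boundary see no change.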
Reduces per-bucket overhead from ~40 bytes (pthread_mutex_t) to 1 byte (atomic_flag), saving ~5KB across 128 buckets in shared memory. Uncontended spinlock is ~7-10ns vs ~25ns for mutex, saving ~15ns per alloc+dealloc cycle.
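A minimal sketch of the atomic_flag spinlock replacing the per-bucket mutex — one byte of state instead of a ~40-byte pthread_mutex_t. Member names and the demo counter are illustrative, not the allocator's actual code:

```cpp
#include <atomic>
#include <thread>

// Single-byte spinlock: test_and_set/acquire to take the lock,
// clear/release to drop it.
class spinlock {
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
public:
    void lock() {
        while (flag_.test_and_set(std::memory_order_acquire)) {
            // busy-wait; a production version may add a pause/yield hint
        }
    }
    void unlock() { flag_.clear(std::memory_order_release); }
};

// Two threads increment a shared counter under the spinlock; mutual
// exclusion means no increments are lost.
long counted_total() {
    spinlock lk;
    long total = 0;
    auto work = [&] {
        for (int i = 0; i < 100000; ++i) { lk.lock(); ++total; lk.unlock(); }
    };
    std::thread t1(work), t2(work);
    t1.join();
    t2.join();
    return total;
}
```

A spinlock is a reasonable trade here because the critical section (a freelist pop/push) is a few instructions long, so contention windows are tiny and the uncontended fast path avoids the mutex's syscall-capable slow path.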
Expose per-section row_count from snapshot readers via section_reader::row_count(), then call preallocate() before the row creation loop for all index types. This batch-allocates node storage from the segment manager upfront, avoiding repeated get_some() calls during row-by-row insertion. Covers controller_index_set, kv_database_index_set, authorization_index_set, and resource_index_set loading paths.
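The preallocation flow can be sketched as below. The `section_reader::row_count()` accessor follows the description; the `index` type and its `grow_calls` counter (standing in for `get_some()` round trips to the segment manager) are illustrative assumptions:

```cpp
#include <cstddef>
#include <vector>

// Reader reports how many rows the section holds before iteration begins.
struct section_reader {
    std::vector<int> rows;
    std::size_t row_count() const { return rows.size(); }
};

// Toy index: reserve() models batch node allocation; grow_calls counts
// trips to the backing allocator (get_some() in the real code).
struct index {
    std::vector<int> nodes;
    std::size_t grow_calls = 0;
    void preallocate(std::size_t n) { nodes.reserve(n); ++grow_calls; }
    void insert(int v) {
        if (nodes.size() == nodes.capacity()) ++grow_calls;  // would grow
        nodes.push_back(v);
    }
};

// With preallocate() called first, the whole row loop costs a single
// allocator round trip instead of one per capacity doubling.
std::size_t load_section(index& idx, const section_reader& r) {
    idx.preallocate(r.row_count());
    for (int v : r.rows) idx.insert(v);
    return idx.grow_calls;
}
```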
Session destructor calls undo(), which throws if _read_only_mode is true. This crashes nodeop when it receives SIGTERM during a read window while a block-building session is still alive. Add undo_from_session() and squash_from_session() that bypass the read-only guard so RAII cleanup always succeeds regardless of database mode.
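A hedged sketch of the fix: the public undo() keeps its read-only guard for external callers, while the session destructor goes through a private undo_from_session() that skips the check, so a session destroyed mid-read-window cleans up instead of throwing. The function names follow the description; the surrounding structure is illustrative:

```cpp
#include <stdexcept>

class database {
    bool read_only_  = false;
    int  undo_depth_ = 0;
    // Bypasses the read-only guard; only the session (a friend) may call it.
    void undo_from_session() { --undo_depth_; }
    friend class session;
public:
    void set_read_only(bool v) { read_only_ = v; }
    void start_undo() { ++undo_depth_; }
    int  undo_depth() const { return undo_depth_; }
    void undo() {  // external API keeps its guard
        if (read_only_) throw std::logic_error("cannot undo in read-only mode");
        --undo_depth_;
    }
};

class session {
    database& db_;
public:
    explicit session(database& db) : db_(db) { db_.start_undo(); }
    ~session() { db_.undo_from_session(); }  // never throws, even read-only
};

// Models the SIGTERM scenario: the read window opens while a session
// is still alive, and its destructor must still succeed.
int cleanup_during_read_window(database& db) {
    session s(db);
    db.set_read_only(true);
    return db.undo_depth();  // destructor runs on return without throwing
}
```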
Summary
- Eliminate per-transaction heap allocations in undo sessions — Replace vector<unique_ptr<abstract_session>> (18 heap allocations per transaction) with a lightweight database* + bool pair. The abstract_session / session_impl virtual dispatch layer was redundant since database::undo() and database::squash() already iterate _index_list with the same dispatch through abstract_index.
- Remove null terminator from shared_cow_string — shared_blob stores binary data (KV keys, values, ABI blobs) where null termination is unnecessary. Saves 1 byte per allocation; with 8-byte slab bucket rounding this saves 8 bytes per allocation that crosses a bucket boundary (e.g., 24-byte keys: 33 -> 32 bytes).
- Replace std::mutex with spinlock in small_size_allocator — Reduces per-bucket overhead from ~40 bytes (pthread_mutex_t) to 1 byte (atomic_flag), saving ~5KB across 128 buckets in shared memory. Uncontended spinlock is ~7-10ns vs ~25ns for mutex, saving ~15ns per alloc+dealloc cycle.
- Preallocate chainbase node storage during snapshot loading — Expose per-section row count from snapshot readers, then batch-allocate node storage upfront before the row creation loop. Avoids repeated get_some() calls to the segment manager during row-by-row insertion. Covers all index loading paths (controller, KV, authorization, resource limits).