Skip to content

VishakBaddur/Custom_Database

Folders and files

NameName
Last commit message
Last commit date

Latest commit

ย 

History

16 Commits
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 

Repository files navigation

๐Ÿš€ DistributedDB - Distributed Key-Value Database with Raft Consensus

A distributed, fault-tolerant key-value database built from scratch in C++17. Features a fully implemented Raft consensus algorithm, asynchronous event-driven networking, and crash-safe WAL persistence.

C++ CMake Boost License


๐ŸŽฏ Empirical Performance Validation

Tested end-to-end over local loopback on an Apple Silicon (M-series) environment.

  • โšก Network Throughput: 28,700+ operations/second fully end-to-end over TCP
  • ๐Ÿ›ก๏ธ Success Rate: 100.0% (50,000 / 50,000 operations completed successfully)
  • ๐Ÿ”„ Concurrency: 50 simultaneous client threads
  • ๐Ÿ’พ WAL Efficiency: ~4.5 MB sequential append-only WAL for 50k dense operations
  • ๐Ÿ—ณ๏ธ Leader Election: Sub-300ms re-election after node failure
  • ๐Ÿ” Fault Tolerance: Cluster survives leader crash and continues serving writes

๐Ÿ—๏ธ Architecture Overview

  Client        Client        Client
    |              |              |
    +โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€+โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€+
                   |
         โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
         โ”‚   DatabaseServer  โ”‚
         โ”‚ (Boost.Asio Loop) โ”‚
         โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
                   โ”‚
         โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
         โ”‚   8x Worker Pool  โ”‚
         โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
                   โ”‚
         โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”        โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
         โ”‚     RaftNode      โ”‚โ—„โ”€โ”€โ”€โ”€โ”€โ”€โ–บโ”‚   RaftNode      โ”‚
         โ”‚  (Leader/Follow)  โ”‚  RPC   โ”‚  (Follower)     โ”‚
         โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜        โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
                   โ”‚
         โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
         โ”‚  Database Engine  โ”‚
         โ”‚  (WAL + ACID)     โ”‚
         โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

Write path: Client โ†’ Server โ†’ Worker โ†’ RaftNode (consensus) โ†’ majority ACK โ†’ Database Engine โ†’ WAL โ†’ response


๐Ÿ—ณ๏ธ Raft Consensus Implementation

Built from scratch following the Raft paper ("In Search of an Understandable Consensus Algorithm", Ongaro & Ousterhout 2014).

What's implemented

  • Leader Election โ€” randomized election timeouts (150โ€“300ms), majority voting, term management
  • Log Replication โ€” AppendEntries RPC with prev_log consistency checks
  • Safety โ€” ยง5.4 log completeness: leader only commits entries from current term
  • Fast Log Backtracking โ€” conflict_term/conflict_index optimization to skip entire terms on retry
  • No-op Entry โ€” leader appends no-op on election to commit previous term entries (ยง8)
  • Heartbeats โ€” 50ms interval to suppress spurious elections

Fault Tolerance Demo Output

โ”โ”โ” Phase 1: Starting 3-node cluster โ”โ”โ”
  โ˜…  NODE 0 ELECTED AS LEADER  โ˜…

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ Node   โ”‚ Role     โ”‚ Term   โ”‚ Leader     โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚ Node 0 โ”‚ LEADER   โ”‚      1 โ”‚ Node 0     โ”‚
โ”‚ Node 1 โ”‚ FOLLOWER โ”‚      1 โ”‚ Node 0     โ”‚
โ”‚ Node 2 โ”‚ FOLLOWER โ”‚      1 โ”‚ Node 0     โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

โ”โ”โ” Phase 2: Writing data to cluster โ”โ”โ”
  โœ“  [Node 0] committed PUT key:A=alpha
  โœ“  [Node 0] committed PUT key:B=beta
  โœ“  [Node 0] committed PUT key:C=gamma
  Committed so far: 9  (3 entries ร— 3 nodes)

โ”โ”โ” Phase 3: KILLING LEADER (Node 0) โ”โ”โ”
  Simulating leader crash...

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ KILLED โ”‚ -------- โ”‚ ------ โ”‚ ---------- โ”‚
โ”‚ Node 1 โ”‚ FOLLOWER โ”‚      1 โ”‚ Node 0     โ”‚
โ”‚ Node 2 โ”‚ FOLLOWER โ”‚      1 โ”‚ Node 0     โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

โ”โ”โ” Phase 4: Waiting for new leader election โ”โ”โ”
  โ˜…  NODE 2 ELECTED AS LEADER  โ˜…
  New leader elected in 202ms

โ”โ”โ” Phase 5: Cluster continues serving writes โ”โ”โ”
  โœ“  [Node 2] committed PUT key:D=delta
  โœ“  [Node 1] committed PUT key:E=epsilon

โ”โ”โ” Summary โ”โ”โ”
  Total leaders elected : 2
  Total entries committed: 13
  Re-election time       : 202ms
  Cluster survived kill  : YES
  โœ… FAULT TOLERANCE TEST PASSED

๐Ÿš€ Quick Start

Prerequisites

  • C++17 compiler (Clang 7+, GCC 8+)
  • CMake 3.15+
  • Boost Libraries (boost::asio)

Building

git clone https://github.com/VishakBaddur/Custom_Database.git
cd Custom_Database
mkdir build && cd build
cmake ..
cmake --build .

Running the Single-Node Server

./distributeddb_server 8080
./distributeddb_client localhost 8080 put "user:24" "Vishak"
./distributeddb_client localhost 8080 get "user:24"
./distributeddb_client localhost 8080 scan "user:" "user:~"

Running the Raft Cluster Test

./raft_cluster_test

Running the Fault Tolerance Demo

./raft_failure_demo

Running the Benchmark

./distributeddb_benchmark 127.0.0.1 8080 50 1000

๐Ÿ› ๏ธ Technical Challenges & Solutions

1๏ธโƒฃ Async Buffer Lifetime & Memory Safety

Challenge: During high-concurrency stress testing, outbound responses corrupted or triggered segfaults. Root cause: stack-allocated buffers passed into boost::asio::async_write were destroyed before the OS completed transmission.

Solution: Restructured response ownership around connection-scoped member buffers managed by std::enable_shared_from_this<ConnectionHandler>, guaranteeing buffer lifetime outlives the async write operation.


2๏ธโƒฃ Cross-Thread Socket Race Conditions

Challenge: Worker threads writing directly to client sockets introduced thread-safety violations against the Boost.Asio event loop.

Solution: All worker thread responses are marshalled back onto the networking strand via boost::asio::post(...). Worker threads never touch socket objects directly.


3๏ธโƒฃ Raft Vote Counting & State Machine Correctness

Challenge: Vote replies arrive asynchronously on detached threads. Naive counting caused races where a node could declare itself leader multiple times or count stale votes from previous terms.

Solution: Vote counting is gated inside handle_vote_reply() under state_mutex_ with term and role checks โ€” only counted if still CANDIDATE and term matches exactly.


4๏ธโƒฃ Raft Shutdown & Thread Coordination

Challenge: Stopping a leader node triggered aborts because the heartbeat thread continued firing after the node's state was destroyed.

Solution: RaftNode::stop() joins both the election timer thread and the heartbeat thread in order, with running_ = false set atomically before either join.


๐Ÿ“Š Benchmark Results

=== Concurrent Benchmark Results ===
Total operations:      50000
Successful operations: 50000
Duration:              1741 ms
Throughput:            28719.1 ops/sec
Success rate:          100%

๐Ÿ”ฎ Roadmap

โœ… Phase 1: High-Performance Single-Node Database

  • Async event-driven TCP server (Boost.Asio)
  • 8-thread worker pool with decoupled I/O pipeline
  • Thread-safe key-value engine (std::shared_mutex)
  • ACID transaction support
  • Write-Ahead Logging (WAL) + crash recovery

โœ… Phase 2: Raft Consensus

  • Leader election with randomized timeouts
  • Log replication with AppendEntries RPC
  • Majority commit with safety guarantees (ยง5.4)
  • Fast log backtracking optimization
  • Fault tolerance: leader failover in <300ms
  • 3-node cluster test with verified replication

๐Ÿ“‹ Phase 3: Production Hardening (In Progress)

  • Wire Raft into DatabaseServer (client writes through consensus)
  • Persist voted_for and currentTerm to disk (fsync)
  • B-tree indexing
  • Docker multi-node cluster setup
  • Client redirect to leader

๐Ÿ”— Connect

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors