vaibhavj2006/In-Memory-Database-Engine

memdb — In-Memory Database Engine

An in-memory database engine written in C++17, using HPX for parallel task execution.

Architecture

┌─────────────────────────────────────────┐
│              StorageEngine              │
│  ┌─────────────┐   ┌─────────────────┐  │
│  │  HashIndex  │   │     B-Tree      │  │
│  │ (O(1) point │   │  (range scans)  │  │
│  │   lookup)   │   │   O(log n + k)  │  │
│  └─────────────┘   └─────────────────┘  │
│         ↑                  ↑            │
│         └──────────────────┘            │
│              shared_mutex               │
└───────────────────┬─────────────────────┘
                    │
        ┌───────────▼────────────┐
        │   TransactionManager   │
        │  ┌──────────────────┐  │
        │  │   Lock Table     │  │
        │  │  (per-row S/X)   │  │
        │  └──────────────────┘  │
        │  ┌──────────────────┐  │
        │  │  Wait-For Graph  │  │
        │  │ (deadlock detect)│  │
        │  └──────────────────┘  │
        │  ┌──────────────────┐  │
        │  │  WAL (log)       │  │
        │  └──────────────────┘  │
        └────────────────────────┘
                    │
        ┌───────────▼────────────┐
        │    HPX Thread Pool     │
        │  (work-stealing sched) │
        └────────────────────────┘

Components

File                         Responsibility
include/btree.hpp            B-Tree of order T: CRUD + range scan
include/hash_index.hpp       Extendible hash index with dynamic bucket splitting
include/transaction.hpp      Lock table (S/X), wait-for graph, deadlock detection, WAL
include/storage_engine.hpp   Combines all of the above into one transactional API
src/main.cpp                 HPX-powered demo: CRUD, range scans, concurrent R/W, deadlock

Features

  • Point lookups via extendible hash index — O(1) average
  • Range queries via B-Tree — O(log n + k)
  • ACID transactions with 2PL (two-phase locking)
  • Shared/Exclusive row locks with lock upgrade support
  • Deadlock detection via DFS on wait-for graph, victim selection
  • Write-Ahead Log (WAL) for durability tracing
  • HPX async tasks for concurrent workloads (work-stealing scheduler)
  • Soft deletes with version counters

Build

With HPX (recommended)

# Install HPX
sudo apt install libhpx-dev    # Ubuntu/Debian
# or build from source: https://github.com/STEllAR-GROUP/hpx

mkdir build && cd build
cmake -DUSE_HPX=ON -DHPX_DIR=/usr/lib/cmake/HPX ..
make -j$(nproc)
./memdb_demo --hpx:threads=4

Without HPX (std::async fallback)

mkdir build && cd build
cmake -DUSE_HPX=OFF ..
make -j$(nproc)
./memdb_demo

Why HPX?

HPX replaces std::thread with lightweight user-space tasks:

  • hpx::async(f) — submits task to work-stealing thread pool
  • hpx::future<T> — composable, supports .then() continuations
  • hpx::wait_all(futures) — barrier synchronization
  • The scheduler dynamically balances tasks across OS threads

This means thousands of concurrent transactions can be submitted as HPX tasks without the overhead of spawning OS threads per transaction.

Interview talking points

  1. Why B-Tree for range queries? Hash maps can't do ordered scans. B-Tree keeps keys sorted, so range(lo, hi) is O(log n + k) vs O(n) full scan.

  2. Why extendible hashing? Avoids rehashing the entire table on resize. Only splits individual buckets, keeping most entries in place.

  3. How does deadlock detection work? We maintain a wait-for graph where edge A→B means "transaction A is waiting for a lock held by B". A deadlock is a cycle in this graph. We detect it with DFS before blocking.

  4. What's the WAL for? In a real system, WAL lets you recover committed transactions after a crash by replaying the log. Here it records every write with an LSN (log sequence number).

  5. Why HPX over std::thread? HPX's work-stealing scheduler means idle threads steal tasks from busy threads. For a database with thousands of short-lived transactions, this is far more efficient than spawning an OS thread per transaction.

About

A from-scratch in-memory database engine built in C++17, designed to handle concurrent transactional workloads efficiently. The storage layer uses a dual-index architecture — an extendible hash index for O(1) point lookups and a B-Tree for O(log n + k) range scans — both operating under a shared read-write latch.
