A simplified, single-threaded key-value storage engine in C++ based on core LSM-Tree ideas.
This project was built to show a clear understanding of how modern storage engines handle writes, reads, recovery, and on-disk data organization. The implementation is intentionally simple and educational, but it still includes the main building blocks that make an LSM-based design work.
- sorted in-memory writes using a
MemTable - durable writes using a
Write-Ahead Log (WAL) - flushing in-memory data to disk as immutable
SSTablefiles - point lookups across memory and disk
- delete handling through tombstones
- Bloom Filter based SSTable skipping
- basic size-tiered compaction
- recovery after restart by replaying the WAL
For each write:
- append the operation to the WAL
- update the in-memory MemTable
- flush the MemTable to a new SSTable when it reaches a small threshold
This keeps writes durable while also making them fast and simple.
For each read:
- check the MemTable first
- if not found, check SSTables from newest to oldest
- use the Bloom Filter to skip SSTables that definitely do not contain the key
- use the SSTable index to seek to the correct record offset
Deletes are stored as tombstones instead of removing data immediately. During compaction, older values are dropped and tombstoned keys can be removed from the final merged output.
On startup, the engine:
- loads the manifest file
- restores the active SSTable list
- replays the WAL
- rebuilds the MemTable state in memory
include/
memtable.h
wal.h
sstable.h
bloom_filter.h
lsm_engine.h
src/
memtable.cpp
wal.cpp
sstable.cpp
bloom_filter.cpp
lsm_engine.cpp
main.cpp
small_test.cpp
From the project root:
cmake -S . -B build
cmake --build buildThis runs the full end-to-end flow with multiple writes, flushes, compactions, crash simulation, and recovery:
./build/lsmtree_demoThis runs a short, clean test with compact output:
./build/lsmtree_small_test-
src/lsm_engine.cpp
Main engine logic for reads, writes, flushing, recovery, and compaction. -
src/sstable.cpp
SSTable file format, index loading, record lookup, and full-file scan for compaction. -
src/wal.cpp
WAL append and replay logic. -
src/bloom_filter.cpp
Simple Bloom Filter implementation used before SSTable lookups. -
src/main.cpp
Larger end-to-end demo run. -
src/small_test.cpp
Short verification run with compact terminal output.
This is a focused educational implementation, so a few things are intentionally kept simple:
- single-threaded execution
- plain manifest format
- basic size-tiered compaction
- no range scan support
- no block cache
- no checksums
- no multi-level compaction strategy
- add timing and throughput metrics
- support range scans
- add checksums for file validation
- support command-line configuration
- extend compaction into multiple levels
