A hands-on tutorial for building a local lakehouse with DuckDB, DuckLake, and SQLMesh.
DuckLake brings lakehouse capabilities (ACID transactions, time travel, Parquet storage) to DuckDB. Combined with SQLMesh for data transformations, you get a lightweight but production-ready data stack that runs entirely on your laptop.
This repo is an unofficial companion to the Tobiko blog post, packaged as a Jupyter notebook for ease of exploration. The blog post covers:
- What lakehouses are and why they matter
- How DuckLake compares to other open table formats
- The layered data architecture (raw → staging → marts)
New to these concepts? Read the blog post first, then come back here to build it yourself.
- Python 3.13+
- uv package manager
# Install dependencies
uv sync
# Launch JupyterLab
uv run jupyter lab
- Open ducklake_tutorial.ipynb
- Run all cells
The notebook will:
- Download NYC Taxi trip data (~50K rows sampled)
- Initialize a DuckLake lakehouse
- Run SQLMesh transformations (staging → dims → facts)
- Query the transformed data
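For readers who want a feel for the first two steps before opening the notebook, here is a minimal sketch of attaching a DuckLake catalog and loading a sampled copy of the trip data. It is not the notebook's actual code, and the helpers in `src/ducklake/` may do this differently. The catalog path, the `lake` alias, the `raw_trips` table name, and the download URL are all assumptions made for illustration. The `ATTACH 'ducklake:…'` form with a `DATA_PATH` option follows the DuckLake extension's documented syntax.

```python
from pathlib import Path

# Assumed public location of the January 2024 Yellow Taxi file; verify before use.
TRIPS_URL = (
    "https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2024-01.parquet"
)


def attach_sql(catalog_path: str, data_path: str, alias: str = "lake") -> str:
    """Build the ATTACH statement for a DuckLake catalog stored in a local file."""
    return f"ATTACH 'ducklake:{catalog_path}' AS {alias} (DATA_PATH '{data_path}')"


def sample_load_sql(table: str, source_url: str, rows: int = 50_000) -> str:
    """Build a CREATE TABLE AS statement that keeps a random sample of the source."""
    return (
        f"CREATE TABLE {table} AS "
        f"SELECT * FROM read_parquet('{source_url}') USING SAMPLE {rows} ROWS"
    )


def build_lakehouse(
    catalog_path: str = "data/catalog.ducklake",  # hypothetical location
    data_path: str = "data/lake/",                # hypothetical location
) -> int:
    """Attach the catalog, load the sample, and return the loaded row count."""
    import duckdb  # imported lazily so the SQL builders work without DuckDB

    # The catalog file's parent directory must exist before attaching.
    Path(catalog_path).parent.mkdir(parents=True, exist_ok=True)

    con = duckdb.connect()
    con.execute("INSTALL ducklake")
    con.execute("LOAD ducklake")
    con.execute("INSTALL httpfs")  # needed to read the Parquet file over HTTPS
    con.execute("LOAD httpfs")
    con.execute(attach_sql(catalog_path, data_path))
    con.execute(sample_load_sql("lake.raw_trips", TRIPS_URL))
    (count,) = con.execute("SELECT count(*) FROM lake.raw_trips").fetchone()
    return count
```

Calling `build_lakehouse()` requires the `duckdb` package and network access. It fails if `raw_trips` already exists in the catalog, so remove the generated files under `data/` before re-running it. Keeping the SQL in small builder functions makes the statements easy to inspect or print without touching a database.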
├── ducklake_tutorial.ipynb # Main tutorial
├── src/ducklake/ # Helper utilities
├── sqlmesh/ # SQLMesh config & models
├── data/ # Generated data (gitignored)
└── pyproject.toml # Dependencies
This tutorial uses the NYC TLC Trip Record Data (Yellow Taxi, January 2024).