Zero-dependency, lazy dataframes for Python.
pyfloe is a pure-Python dataframe library with lazy evaluation, query optimization, and type safety — no external dependencies required.

    import pyfloe as pf

    result = (
        pf.read_csv("orders.csv")
        .filter(pf.col("amount") > 100)
        .with_column(
            "rank",
            pf.row_number().over(partition_by="region", order_by="amount"),
        )
        .select("order_id", "region", "amount", "rank")
        .sort("region", "rank")
        .collect()
    )

Install with pip:

    pip install pyfloe

- Lazy evaluation: operations build a query plan; data flows only when you collect
- Expression API: composable column expressions with arithmetic, comparisons, string methods, and conditionals
- Window functions: row_number, rank, dense_rank, cumsum, lag, lead, and more
- Datetime handling: auto-detection from CSV, .dt accessor for extraction, truncation, and arithmetic
- Streaming I/O: read and write CSV, TSV, JSONL, JSON, and fixed-width files with constant memory
- Query optimizer: filter pushdown and column pruning
- Type safety: TypedDict validation and TypedLazyFrame for IDE-friendly typed results
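
To make the lazy-evaluation and expression ideas in the list above concrete, here is a minimal pure-Python sketch. It is not pyfloe's implementation or API: `ToyLazyFrame`, `Col`, `Lit`, and `BinOp` are hypothetical names invented for this illustration, and the row data is made up. It only shows the general pattern of building an expression tree through operator overloading and deferring all work until `collect()`.

```python
import operator


class Expr:
    """Node in a tiny expression tree (hypothetical, not pyfloe's class)."""

    def __gt__(self, other):
        return BinOp(operator.gt, self, _lit(other))

    def __mul__(self, other):
        return BinOp(operator.mul, self, _lit(other))


class Col(Expr):
    def __init__(self, name):
        self.name = name

    def eval(self, row):
        return row[self.name]


class Lit(Expr):
    def __init__(self, value):
        self.value = value

    def eval(self, row):
        return self.value


class BinOp(Expr):
    def __init__(self, fn, left, right):
        self.fn, self.left, self.right = fn, left, right

    def eval(self, row):
        return self.fn(self.left.eval(row), self.right.eval(row))


def _lit(value):
    # Wrap plain Python values so every operand is an Expr.
    return value if isinstance(value, Expr) else Lit(value)


def _filter_rows(rows, predicate):
    for row in rows:
        if predicate.eval(row):
            yield row


def _add_column(rows, name, expr):
    for row in rows:
        yield {**row, name: expr.eval(row)}


class ToyLazyFrame:
    """Records steps as a plan; nothing runs until collect()."""

    def __init__(self, source, steps=()):
        self._source = source  # zero-argument callable returning dict rows
        self._steps = steps

    def filter(self, predicate):
        return ToyLazyFrame(self._source, self._steps + (("filter", predicate),))

    def with_column(self, name, expr):
        return ToyLazyFrame(self._source, self._steps + (("with_column", name, expr),))

    def collect(self):
        rows = self._source()
        for step in self._steps:
            if step[0] == "filter":
                rows = _filter_rows(rows, step[1])
            else:
                rows = _add_column(rows, step[1], step[2])
        return list(rows)  # data flows only here


reads = []  # records when the source is actually scanned


def source():
    reads.append("scan")
    yield {"order_id": 1, "amount": 50}
    yield {"order_id": 2, "amount": 150}
    yield {"order_id": 3, "amount": 300}


lazy = (
    ToyLazyFrame(source)
    .filter(Col("amount") > 100)
    .with_column("double_amount", Col("amount") * 2)
)
reads_before = list(reads)  # still empty: only a plan exists so far
result = lazy.collect()     # the scan, filter, and new column run now
```

In this sketch each step wraps the previous one in a generator, so rows pass through one at a time instead of being materialized between steps.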
Full documentation is available at edwardvaneechoud.github.io/pyfloe.
Want to understand how a dataframe engine works under the hood? Build Your Own DataFrame is a five-module course that takes pyfloe apart piece by piece — expression trees, plan nodes, the volcano model, hash joins, and the query optimizer — until nothing feels like magic.
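
As a rough taste of two of the topics named above, the sketch below shows a volcano-style (pull-based iterator) pipeline with a hash join, written with plain Python generators. It is an illustrative assumption of how such operators can look, not code from pyfloe or the course; the function names `scan`, `select_where`, and `hash_join` and the sample rows are hypothetical.

```python
def scan(rows):
    """Leaf operator: hand out rows one at a time."""
    for row in rows:
        yield row


def select_where(child, predicate):
    """Pull rows from the child operator and pass on those that match."""
    for row in child:
        if predicate(row):
            yield row


def hash_join(left, right, key):
    """Inner join: build a hash table from `right`, then probe it with `left`."""
    table = {}
    for row in right:  # build phase: consumes the right input fully
        table.setdefault(row[key], []).append(row)
    for row in left:  # probe phase: streams the left input
        for match in table.get(row[key], ()):
            yield {**match, **row}


orders = [
    {"order_id": 1, "region": "EU", "amount": 50},
    {"order_id": 2, "region": "US", "amount": 150},
    {"order_id": 3, "region": "EU", "amount": 300},
]
regions = [
    {"region": "EU", "manager": "Ana"},
    {"region": "US", "manager": "Bo"},
]

plan = hash_join(
    select_where(scan(orders), lambda r: r["amount"] > 100),
    scan(regions),
    key="region",
)
first = next(plan)  # pulling one row drives the whole operator chain
rest = list(plan)   # pull the remaining rows
```

Each operator asks its child for the next row only when its own consumer asks for one, which is why `next(plan)` is enough to trigger the build phase and the first probe.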
License: MIT