Skip to content

Edwardvaneechoud/pyfloe

Repository files navigation

pyfloe

Zero-dependency, lazy dataframes for Python.

pyfloe is a pure-Python dataframe library with lazy evaluation, query optimization, and type safety — no external dependencies required.

import pyfloe as pf

result = (
    pf.read_csv("orders.csv")
    .filter(pf.col("amount") > 100)
    .with_column("rank", pf.row_number()
        .over(partition_by="region", order_by="amount"))
    .select("order_id", "region", "amount", "rank")
    .sort("region", "rank")
    .collect()
)

Installation

pip install pyfloe

Key features

  • Lazy evaluation — operations build a query plan; data flows only when you collect
  • Expression API — composable column expressions with arithmetic, comparisons, string methods, and conditionals
  • Window functionsrow_number, rank, dense_rank, cumsum, lag, lead, and more
  • Datetime handling — auto-detection from CSV, .dt accessor for extraction, truncation, and arithmetic
  • Streaming I/O — read and write CSV, TSV, JSONL, JSON, and fixed-width files with constant memory
  • Query optimizer — filter pushdown and column pruning
  • Type safety — TypedDict validation and TypedLazyFrame for IDE-friendly typed results

Documentation

Full documentation is available at edwardvaneechoud.github.io/pyfloe.

Tutorial

Want to understand how a dataframe engine works under the hood? Build Your Own DataFrame is a five-module course that takes pyfloe apart piece by piece — expression trees, plan nodes, the volcano model, hash joins, and the query optimizer — until nothing feels like magic.

License

MIT

About

A minimal zero dependency dataframe library

Topics

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors

Languages