Flowfile

Documentation: Website - Core - Worker - Frontend - Try Online - Technical Architecture

Flowfile is an open-source data platform that combines a visual pipeline builder, a data catalog with Delta Lake storage, scheduling, Kafka ingestion, sandboxed Python execution, and a Polars-compatible Python API — all installable with pip install Flowfile.

What Flowfile Does

Most ETL tools make you choose: visual and simple, or powerful and complex. Flowfile doesn't.

Visual pipeline builder — Drag-and-drop canvas with 30+ nodes for joins, filters, aggregations, fuzzy matching, pivots, and more
Data catalog with Delta Lake — Every table is stored as Delta Lake with full version history, time travel, and merge/upsert support
Scheduling — Interval-based or triggered by catalog table updates, built into the catalog UI
Kafka ingestion — Read from Kafka/Redpanda topics as a canvas node, with automatic schema inference
Sandboxed Python execution — Run arbitrary Python code in isolated Docker containers via the Python Script node
Flow parameters — ${variable} substitution across any node setting, configurable via UI or CLI
Code generation — Export visual flows as standalone Python/Polars scripts
Cloud storage — Read/write to S3, Azure Data Lake Storage, and Google Cloud Storage
Database connectivity — PostgreSQL, MySQL, SQL Server, Oracle, DuckDB, and more
Python API — Build pipelines programmatically with a Polars-like syntax, then visualize them on the canvas

Every feature serves the canvas. Kafka isn't a separate streaming module — it's a node. Scheduling isn't a separate orchestration UI — it's part of the catalog. Delta Lake isn't a YAML-configured storage backend — it's how the catalog works.

Quick Start

pip install Flowfile
flowfile run ui

That's it. This starts the backend services and opens the web UI in your browser.

Python API

import flowfile as ff
from flowfile import col

df = ff.from_dict({
    "id": [1, 2, 3, 4, 5],
    "category": ["A", "B", "A", "C", "B"],
    "value": [100, 200, 150, 300, 250]
})

result = df.filter(col("value") > 150).with_columns([
    (col("value") * 2).alias("double_value")
])

# Open the pipeline on the visual canvas
ff.open_graph_in_editor(result.flow_graph)

Platform Capabilities

Data Catalog & Delta Lake

The catalog is Flowfile's central hub. Register flows, track runs, browse tables, and manage schedules — all in one place.

Tables are stored as Delta Lake with overwrite, append, upsert, update, and delete modes
Full version history and time travel for every catalog table
Lineage tracking — see which flows produce and consume each table
Namespace hierarchy — organize tables into catalogs and schemas

Scheduling

Schedule registered flows to run on intervals or in response to data changes:

Interval — run every N minutes/hours/days
Table trigger — run when a specific catalog table is updated
Table set trigger — run when all tables in a set have been updated

Kafka / Redpanda

Drop a Kafka Source node on the canvas, pick a connection and topic, and you have streaming data in your pipeline. Schema is inferred automatically from message samples.

Sandboxed Python Execution

The Python Script node runs your code in an isolated Docker container with its own package environment. Write matplotlib visualizations, use scikit-learn models, or run any Python code — the output flows back into the pipeline.

Code Generation

Click "Generate Code" to export any visual flow as a standalone Python script using Polars. Deploy it anywhere, version control it, or hand it to a teammate who prefers code over canvas.

Automatically generate Polars code from visual flows

Flow Parameters

Parameterize any flow with ${variable} syntax. Set defaults in the UI, override from the CLI:

flowfile run flow pipeline.json --param input_dir=/data/2025/q1 --param output_table=summary

Cloud Storage & Databases

Read and write CSV, Parquet, JSON, and Delta Lake files from S3, Azure Data Lake Storage, and Google Cloud Storage. Connect to PostgreSQL, MySQL, SQL Server, Oracle, DuckDB, and more.

Installation Options

Python Package (Recommended)

pip install Flowfile
flowfile run ui

Docker

git clone https://github.com/edwardvaneechoud/Flowfile.git
cd Flowfile
docker compose up -d

Access the app at http://localhost:8080.

Desktop Application

Download from GitHub Releases for Windows, macOS, or Linux.

Note: You may see security warnings since the app isn't signed yet.

Windows: Click "More info" > "Run anyway"

macOS: Run find /Applications/Flowfile.app -exec xattr -c {} \; then open normally

Browser (Lite)

Try the WASM version at demo.flowfile.org — runs entirely in your browser, no server needed. Includes 14 core transformation nodes.

Development Setup

git clone https://github.com/edwardvaneechoud/Flowfile.git
cd Flowfile
poetry install

# Start backend services
poetry run flowfile_worker  # :63579
poetry run flowfile_core    # :63578

# Start frontend (new terminal)
cd flowfile_frontend
npm install && npm run dev:web  # :8080

Architecture

Frontend (Electron/Web/WASM) --> flowfile_core (:63578) --> flowfile_worker (:63579)
                                       |                          |
                                   Catalog DB              Heavy computation
                                   Scheduler               Data processing
                                   Auth (JWT)
                                       |
                                 kernel_runtime (Docker, :9999)
                                   Sandboxed Python execution

flowfile_core — Central FastAPI app: flow engine (DAG execution), catalog, scheduler, auth, connections, secrets
flowfile_worker — Separate FastAPI service for CPU-intensive data operations
kernel_runtime — Docker containers for isolated Python code execution
flowfile_frame — Standalone Python library with lazy evaluation and Polars-compatible API

Flows are directed acyclic graphs (DAGs). Each node is a data operation. Edges carry data between nodes. The engine supports parallel execution, schema prediction, and two execution modes (Development for step-by-step debugging, Performance for optimized batch runs).

For a deeper dive, see Technical Architecture.

License

MIT License

Acknowledgments

Built with Polars, Vue.js, FastAPI, VueFlow, Delta Lake, and Electron.

Name		Name	Last commit message	Last commit date
Latest commit History 489 Commits
.github		.github
build_backends/build_backends		build_backends/build_backends
docs		docs
flowfile		flowfile
flowfile_core		flowfile_core
flowfile_frame		flowfile_frame
flowfile_frontend		flowfile_frontend
flowfile_scheduler/flowfile_scheduler		flowfile_scheduler/flowfile_scheduler
flowfile_wasm		flowfile_wasm
flowfile_worker		flowfile_worker
kernel_runtime		kernel_runtime
shared		shared
test_utils		test_utils
tests		tests
tools		tools
.dockerignore		.dockerignore
.env.example		.env.example
.gitignore		.gitignore
.python-version		.python-version
CLAUDE.md		CLAUDE.md
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
docker-compose.yml		docker-compose.yml
feature-md-catalog-to-delta.md		feature-md-catalog-to-delta.md
main.spec		main.spec
mkdocs.yml		mkdocs.yml
poetry.lock		poetry.lock
pyproject.toml		pyproject.toml
readme-pypi.md		readme-pypi.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Flowfile

What Flowfile Does

Quick Start

Python API

Platform Capabilities

Data Catalog & Delta Lake

Scheduling

Kafka / Redpanda

Sandboxed Python Execution

Code Generation

Flow Parameters

Cloud Storage & Databases

Installation Options

Python Package (Recommended)

Docker

Desktop Application

Browser (Lite)

Development Setup

Architecture

License

Acknowledgments

About

Uh oh!

Releases 46

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Flowfile

What Flowfile Does

Quick Start

Python API

Platform Capabilities

Data Catalog & Delta Lake

Scheduling

Kafka / Redpanda

Sandboxed Python Execution

Code Generation

Flow Parameters

Cloud Storage & Databases

Installation Options

Python Package (Recommended)

Docker

Desktop Application

Browser (Lite)

Development Setup

Architecture

License

Acknowledgments

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 46

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages