A lightweight federated query engine in Python that allows users to write SQL as if all tables exist in a single unified database. The engine then parses the query, analyzes its predicates, and constructs individual sub‑queries for each backend database.
This project is built on top of sqlglot for SQL parsing, with a focus on extensibility and clear, modular internals.
- Uses `sqlglot` to parse SQL queries into an AST.
- Supports extraction of referenced tables, columns, and predicates.
- Safely validates and normalizes user‑provided SQL.
- Breaks down the `WHERE` clause into conjunctive predicates.
- Determines which predicates can be pushed down to individual backend databases.
- Produces optimized per‑source query fragments.
- For each backend database involved, constructs:
  - A minimal `SELECT` clause containing only required columns.
  - Pushed‑down filters.
  - Properly isolated sub‑queries for distributed execution.
- Ensures consistency in aliasing and naming across fragment queries.
- A catalog layer mapping tables to physical database locations.
- Allows the engine to know where each table lives.
- Enables metadata‑aware optimization.
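
A catalog of this kind can be sketched as a plain mapping from logical table names to physical locations. Everything here (`TableLocation`, `Catalog`, the source names `crm_mysql` and `sales_pg`) is a hypothetical illustration rather than the project's actual data model.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TableLocation:
    source: str   # logical name of the backend, e.g. "sales_pg"
    schema: str   # schema or database within that backend
    table: str    # physical table name
    dialect: str  # SQL dialect used when compiling fragments for this source


@dataclass
class Catalog:
    _tables: dict = field(default_factory=dict)

    def register(self, logical_name: str, location: TableLocation) -> None:
        # Names are stored lower-cased so lookups are case-insensitive.
        self._tables[logical_name.lower()] = location

    def resolve(self, logical_name: str) -> TableLocation:
        try:
            return self._tables[logical_name.lower()]
        except KeyError:
            raise KeyError(f"Table {logical_name!r} is not registered in the catalog") from None

    def sources_for(self, logical_names) -> dict:
        """Group logical table names by the backend that hosts them."""
        grouped = {}
        for name in logical_names:
            grouped.setdefault(self.resolve(name).source, []).append(name)
        return grouped


catalog = Catalog()
catalog.register("users", TableLocation("crm_mysql", "crm", "users", "mysql"))
catalog.register("orders", TableLocation("sales_pg", "public", "orders", "postgres"))

print(catalog.sources_for(["users", "orders"]))
```

Grouping tables by source is what lets the planner decide how many sub-queries a federated query needs.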
- Connection pooling and session management via SQLAlchemy.
- Uniform interface for executing sub‑queries across heterogeneous SQL backends.
- Support for multi‑dialect compilation.
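
A uniform execution interface over SQLAlchemy might be sketched as below: one pooled engine per source, and a single `execute` method that returns column names and rows. The class `SourceExecutor` is a hypothetical name, and the demo uses an in-memory SQLite database (with `StaticPool` so every checkout shares the same connection) purely so the example runs without an external server.

```python
from sqlalchemy import create_engine, text
from sqlalchemy.pool import StaticPool


class SourceExecutor:
    """Holds one SQLAlchemy engine (and its connection pool) per backend."""

    def __init__(self):
        self._engines = {}

    def register(self, source: str, url: str, **engine_kwargs):
        engine = create_engine(url, **engine_kwargs)
        self._engines[source] = engine
        return engine

    def execute(self, source: str, sql: str, params: dict = None):
        engine = self._engines[source]
        with engine.connect() as conn:  # connection goes back to the pool on exit
            result = conn.execute(text(sql), params or {})
            return list(result.keys()), [tuple(row) for row in result.fetchall()]


executor = SourceExecutor()
engine = executor.register(
    "demo",
    "sqlite://",  # in-memory SQLite, for illustration only
    poolclass=StaticPool,
    connect_args={"check_same_thread": False},
)

# Seed some demo data inside a transaction that commits on exit.
with engine.begin() as conn:
    conn.execute(text("CREATE TABLE orders (user_id INTEGER, total REAL)"))
    conn.execute(text("INSERT INTO orders VALUES (1, 50.0), (2, 150.0)"))

columns, rows = executor.execute(
    "demo",
    "SELECT user_id, total FROM orders WHERE total > :min_total",
    {"min_total": 100},
)
print(columns, rows)
```

Bound parameters (`:min_total`) keep user-supplied values out of the SQL text that is sent to each backend.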
- Results returned from each database are collected into Arrow tables.
- Enables efficient in‑memory processing.
- Paves the way for zero‑copy interoperability with analytical tools.
- A web‑based interface for browsing catalogs, schemas, and table metadata.
- Interactive preview of tables and columns.
- Automatic version tracking for user queries.
- Ability to inspect and reproduce past queries.
- Built‑in diffing and auditability.
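
Query versioning and diffing can be approximated with the standard library alone. The sketch below assumes an in-memory history keyed by a content hash; `QueryVersion` and `QueryHistory` are hypothetical names, and persistent storage is left out.

```python
import difflib
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class QueryVersion:
    number: int
    sql: str
    digest: str
    created_at: datetime


@dataclass
class QueryHistory:
    versions: list = field(default_factory=list)

    def record(self, sql: str) -> QueryVersion:
        """Store a new version unless the text is identical to the latest one."""
        digest = hashlib.sha256(sql.encode("utf-8")).hexdigest()
        if self.versions and self.versions[-1].digest == digest:
            return self.versions[-1]
        version = QueryVersion(len(self.versions) + 1, sql, digest, datetime.now(timezone.utc))
        self.versions.append(version)
        return version

    def diff(self, old: int, new: int) -> str:
        """Return a unified diff between two version numbers (1-based)."""
        a = self.versions[old - 1].sql.splitlines()
        b = self.versions[new - 1].sql.splitlines()
        return "\n".join(difflib.unified_diff(a, b, f"v{old}", f"v{new}", lineterm=""))


history = QueryHistory()
history.record("SELECT user_id, total\nFROM orders\nWHERE total > 100")
history.record("SELECT user_id, total\nFROM orders\nWHERE total > 100")  # unchanged, no new version
history.record("SELECT user_id, total\nFROM orders\nWHERE total > 200")

print(history.diff(1, 2))
```

Storing the digest alongside each version gives a cheap way to detect resubmitted queries and to verify that a reproduced query matches the recorded text.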
- A lightweight SQL editor UI built into the Flask app.
- Syntax highlighting, validation, and auto‑completion.
- Integrated execution output panel.
- Modular: Each phase—parsing, planning, execution—is isolated and testable.
- Extensible: New database engines and optimizations can be plugged in easily.
- Transparent: Users should understand exactly how queries get split and executed.