Add future feature plans: 9 detailed implementation plans for planned Flowfile features #377
Open
Edwardvaneechoud wants to merge 17 commits into main from
Conversation
… Flowfile features

Overview document (docs/future_features.md) with foundational node containment model analysis and per-feature plan files covering:

- Iterative Nodes (Option B: embedded sub-flow)
- Conditional Execution (Option A: parent pointer)
- Delta Lake Catalog Storage
- Flow Parameters
- Catalog Query & Data Exploration (SQL + GraphicWalker)
- Extended Connectors (PostgreSQL, MySQL, BigQuery, Snowflake)
- Standardized Custom Node Designer
- Flow as Custom Node (Option C: referenced flow)
- Enhanced Code Generation (catalog reads/writes + kernel code wrapping)

https://claude.ai/code/session_01AUuzPPf1NKgNZaWno58wAd
…admap

- Move docs/plans/ → docs/for-developers/roadmap/
- Add roadmap index with Mermaid dependency diagram
- Register all 9 feature pages in mkdocs.yml nav
- Link roadmap from the developer index page
- Remove standalone docs/future_features.md

https://claude.ai/code/session_01AUuzPPf1NKgNZaWno58wAd
…support

The original plan incorrectly stated that Flowfile had only "generic database read/write." In fact, database_reader already supports PostgreSQL, MySQL, MariaDB, SQLite, MSSQL, and Oracle via ConnectorX, with both table and SQL query modes, plus stored connection references and 100+ SQL type mappings.

Rewritten to focus on what's actually missing: partitioned reads, bulk loading, BigQuery/Snowflake cloud warehouse support, and incremental loading.

https://claude.ai/code/session_01AUuzPPf1NKgNZaWno58wAd
…d today

The previous version incorrectly claimed broad database support. In reality:

- Database: only PostgreSQL is tested, implemented in the UI, and documented
- Cloud storage: only AWS S3 is tested and production-ready
- Other databases (MySQL, SQLite, MSSQL) exist only as type hints
- ADLS/GCS exist only as schema definitions, not tested implementations

Restructured the plan around phased delivery: MySQL first, then ADLS/GCS, then BigQuery/Snowflake, then PostgreSQL enhancements.

https://claude.ai/code/session_01AUuzPPf1NKgNZaWno58wAd
… not catalog

The catalog is internal to Flowfile (data produced and consumed within flows), whereas extended connectors are about external database and cloud storage nodes. Removed the false dependency between connectors and catalog query in the dependency diagram.

https://claude.ai/code/session_01AUuzPPf1NKgNZaWno58wAd
… problem

The original plan incorrectly framed the custom node API as inaccessible. In reality, the CustomNodeBase + process() pattern is clean and intuitive. The actual problem is kernel code generation: generate_kernel_code() translates clean process() code into proxy classes (_V, _Self), rewrites return statements, and produces unintuitive generated code.

Refocused the plan on making the code users write be the code that runs in the kernel.

https://claude.ai/code/session_01AUuzPPf1NKgNZaWno58wAd
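A minimal sketch of the CustomNodeBase + process() pattern the commit describes, and of the goal that the user-written body should run unchanged in the kernel. Only `CustomNodeBase` and `process()` come from the commit text; everything else (the `AddGreeting` node, the dict standing in for a DataFrame) is illustrative.

```python
# Hypothetical sketch; plain dicts stand in for DataFrames so the
# example is self-contained.
class CustomNodeBase:
    """Base class for custom nodes (name taken from the commit text)."""
    def process(self, df):
        raise NotImplementedError

class AddGreeting(CustomNodeBase):
    def process(self, df):
        # User-written body. The refocused plan wants THIS exact code to
        # run in the kernel, not a rewritten proxy (_V/_Self) version.
        df = dict(df)
        df["greeting"] = "hello"
        return df

result = AddGreeting().process({"name": "x"})
print(result["greeting"])  # hello
```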
Flow Parameters: rewritten to document the substantial implementation on the feature/add-flow-parameters branch (parameter_resolver.py, FlowParametersPanel, flow_graph integration, 763 lines of tests). Refocused on what remains.

Custom Node Designer: acknowledge that the visual designer already exists in the frontend (a 3-panel drag-and-drop builder with Polars autocompletion) and that custom nodes always use Polars code. Refocused the plan on the kernel code generation gap and on packaging/sharing.

https://claude.ai/code/session_01AUuzPPf1NKgNZaWno58wAd
…not script

Two key design changes:

- Generated code can depend on flowfile (from flowfile import read_from_catalog) rather than trying to be fully standalone.
- Output is a Python package, not a single script file. Each sub-flow, iterator body, condition branch, and custom node gets its own module with a process() function; the main flow imports and calls them.

This is essential for features 1 (iterative), 2 (conditional), and 8 (flow-as-node), where a single script would be unmanageable.

https://claude.ai/code/session_01AUuzPPf1NKgNZaWno58wAd
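A sketch of the generated-package shape this commit describes: one module per sub-unit exposing `process()`, with the main flow importing and calling them. Module and function contents are assumptions; dicts stand in for DataFrames and both "modules" are collapsed into one file to keep the sketch runnable.

```python
# generated_pkg/iterator_body.py — one generated module per sub-flow,
# iterator body, condition branch, or custom node.
def process(df):
    df = dict(df)
    df["doubled"] = df["value"] * 2
    return df

# generated_pkg/main.py — the entry point imports each module and calls
# its process() function (here the import is implicit, same file).
def run_flow(rows):
    # An iterative node becomes a plain loop over the body module.
    return [process(row) for row in rows]

out = run_flow([{"value": 3}, {"value": 5}])
print([r["doubled"] for r in out])  # [6, 10]
```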
…splitting

Condition nodes are if/else at the flow level: evaluate a condition on the whole DataFrame (df.count() == 12, column existence, aggregate checks) and route the ENTIRE DataFrame to one branch; the other branch is skipped entirely. This is fundamentally different from the filter node (row-level subsetting).

Updated execution semantics, code generation (if/else with branch modules), expression examples, and schema inference accordingly.

https://claude.ai/code/session_01AUuzPPf1NKgNZaWno58wAd
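The flow-level semantics above can be sketched as a plain Python if/else over branch modules. Branch names and the dict-based table are illustrative, not Flowfile API; the point is that exactly one branch runs and it receives all rows.

```python
# Branch modules (one generated module each, per the code-generation plan).
def branch_matched(df):
    # Runs only when the condition holds; sees the ENTIRE table.
    return {"rows": df["rows"], "route": "matched"}

def branch_default(df):
    # Skipped entirely when branch_matched runs.
    return {"rows": df["rows"], "route": "default"}

def run_condition(df):
    # Flow-level condition on the whole table, e.g. "row count == 12".
    # Contrast with a filter node, which would subset rows into a stream.
    if len(df["rows"]) == 12:
        return branch_matched(df)
    return branch_default(df)

print(run_condition({"rows": list(range(12))})["route"])  # matched
print(run_condition({"rows": [1, 2]})["route"])           # default
```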
Merge keys are specified in the catalog_writer node settings (write_mode + merge_keys fields), not magically known: the user selects them when configuring the write operation. Optionally, catalog table metadata can declare primary keys for validation.

https://claude.ai/code/session_01AUuzPPf1NKgNZaWno58wAd
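To illustrate what the user-selected merge_keys drive, here is a toy upsert keyed on those columns. Real writes would go through Delta Lake's merge machinery; lists of dicts stand in for tables, and the function name is hypothetical.

```python
def merge_rows(target, incoming, merge_keys):
    """Toy upsert: rows matching on merge_keys are updated in place,
    non-matching rows are appended."""
    index = {tuple(row[k] for k in merge_keys): i
             for i, row in enumerate(target)}
    for row in incoming:
        key = tuple(row[k] for k in merge_keys)
        if key in index:
            target[index[key]].update(row)  # when matched: update
        else:
            target.append(row)              # when not matched: insert
    return target

table = [{"id": 1, "v": "old"}]
merge_rows(table, [{"id": 1, "v": "new"}, {"id": 2, "v": "x"}], ["id"])
print(table)  # [{'id': 1, 'v': 'new'}, {'id': 2, 'v': 'x'}]
```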
…ble API

Polars LazyFrame.sink_delta() natively supports mode='merge' with delta_merge_options for predicate/alias configuration, returning a TableMerger, so there is no need to convert to Arrow or manage DeltaTable objects manually. Note: sink_delta is marked unstable in Polars and should be monitored.

https://claude.ai/code/session_01AUuzPPf1NKgNZaWno58wAd
Extracted cloud catalog storage from the Delta Lake plan (was Phase 4) and expanded it into a proper feature covering:

- PostgreSQL as an alternative to SQLite for catalog metadata
- Cloud storage (S3/ADLS/GCS) for catalog table data
- A shared catalog for multi-user/team deployments
- Catalog federation (external tables without data copying)

The current catalog uses SQLite + local Parquet. The SQLAlchemy ORM and CatalogRepository protocol provide a good foundation for making backends pluggable; Alembic migrations are needed for schema evolution.

https://claude.ai/code/session_01AUuzPPf1NKgNZaWno58wAd
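A sketch of what "pluggable backends behind the CatalogRepository protocol" could look like. Only the name CatalogRepository comes from the commit; the method names and the in-memory backend are assumptions, showing how a SQLite- or PostgreSQL-backed class would satisfy the same structural interface.

```python
from typing import Protocol

class CatalogRepository(Protocol):
    """Structural interface; any backend with these methods conforms."""
    def register_table(self, name: str, location: str) -> None: ...
    def get_table(self, name: str) -> dict: ...

class InMemoryCatalog:
    """Toy backend. A SQLAlchemy-backed SQLite or PostgreSQL class would
    satisfy the same protocol, which is what makes backends swappable."""
    def __init__(self) -> None:
        self._tables: dict = {}

    def register_table(self, name: str, location: str) -> None:
        self._tables[name] = {"name": name, "location": location}

    def get_table(self, name: str) -> dict:
        return self._tables[name]

catalog: CatalogRepository = InMemoryCatalog()
catalog.register_table("sales", "s3://bucket/sales")
print(catalog.get_table("sales")["location"])  # s3://bucket/sales
```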
…plitting

The previous example used df.filter() to split rows into two streams, which contradicts the flow-level branching design (Feature 2). Conditions route the entire DataFrame to one branch via a Python if/else, not to both.

https://claude.ai/code/session_01AUuzPPf1NKgNZaWno58wAd
…tures

The overview now includes:

- A "Recently Shipped" section documenting parallel execution, kernel runtime, named I/O, flow catalog, catalog reader/writer, scheduling, and embeddable WASM
- Implementation order in 5 phases: Foundations → Connectivity → Control Flow → Composition & Generation → Scale
- The feature table reordered by implementation sequence, not original numbering
- An updated Mermaid dependency diagram with all 10 features
- The MkDocs nav reordered to match the implementation sequence

https://claude.ai/code/session_01AUuzPPf1NKgNZaWno58wAd
- Delete 04_flow_parameters.md (shipped)
- Remove "Recently Shipped" section (inaccurate)
- Reorder feature table and nav by implementation sequence:
    - Phase 1 (Storage): Delta Lake
    - Phase 2 (Connectivity): Extended Connectors, Catalog Query
    - Phase 3 (Control Flow): Custom Node Designer, Conditional, Iterative
    - Phase 4 (Composition): Code Generation, Flow as Custom Node
    - Phase 5 (Scale): Cloud & Distributed Catalog
- Update references to Flow Parameters as shipped in other plans

https://claude.ai/code/session_01AUuzPPf1NKgNZaWno58wAd
✅ Deploy Preview for flowfile-wasm ready!