Fix: expand dict transformation#9561
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
There was a problem hiding this comment.
1 issue found across 3 files
Architecture diagram
sequenceDiagram
participant User as User (Marimo UI)
participant DFPlugin as Dataframe Plugin
participant Handler as ExpandDict Handler (handlers.py)
participant Narwhals as Narwhals Layer
participant Polars as Polars Engine
participant Backend as Original Backend (pandas/Ibis/other)
Note over User,Backend: Expand Dict Transformation Flow
User->>DFPlugin: Trigger expand dict on column
DFPlugin->>Handler: handle_expand_dict(df, transform)
Handler->>Narwhals: collect_and_preserve_type(df)
Narwhals->>Backend: Collect actual data from original backend
Backend-->>Narwhals: Data as native type
Narwhals-->>Handler: Collected DataFrame + undo function
Handler->>Polars: collected_df.to_polars()
Note over Handler,Polars: Convert to Polars for unnest support
Polars->>Polars: polars_df.unnest(column_id)
Note over Polars: Handles null dict values correctly
Polars-->>Handler: Unnested Polars DataFrame
Handler->>Narwhals: nw.from_native(unnested)
Narwhals->>Handler: Narwhals wrapper
Handler->>Handler: undo(narwhals_df)
Note over Handler: Convert back to original backend type
Handler-->>DFPlugin: Transformed DataFrame
DFPlugin-->>User: Updated table with expanded columns
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
Re-trigger cubic
| {"A": [{"foo": 1, "bar": "hello"}], "B": [1]} | ||
| {"A": [{"foo": 1, "bar": "hello"}], "B": [1]}, | ||
| ), | ||
| create_test_dataframes( |
There was a problem hiding this comment.
polars should already be created in this create_test_dataframes. so we dont need to add the dataframe below
mscolnick
left a comment
There was a problem hiding this comment.
need to take an optional dep on polars
… rows with the create test dataframes instead.
refactor: adding None == NaN in assert frame equal with nans method to use it in expand_dict test.
Done. |
|
There are some pandas CI errors that i am looking into. |
… errors for mixed object columns.
This is happening because of data conversion mismatch between pandas and narwahls with mixed data types for So my only option is to fallback to pandas backend processing separately for the |
There was a problem hiding this comment.
1 issue found across 1 file (changes from recent commits).
Tip: Review your code locally with the cubic CLI to iterate faster.
Re-trigger cubic
… json normalise following the handlers code.
Bundle ReportBundle size has no change ✅ Affected Assets, Files, and Routes:view changes for bundle: marimo-esmAssets Changed:
|
|
@mscolnick Current Behaviour Proposal I have added the fix to this PR, please let me know your thoughts. |
Screen.Recording.2026-05-21.at.18.07.581.mov |
… of pandas, which raises an error with pd.json_normalise for expand_dict function.
|
@mscolnick Just a gentle nudge. |
|
@kirangadhave I have started the AI code review. It will take a few minutes to complete. |
There was a problem hiding this comment.
Pull request overview
This PR makes the dataframe Expand Dict transform robust to nulls by routing expansion through backend-native implementations (Polars unnest and Pandas json_normalize) and adds/updates tests to validate the behavior, including nested dict values.
Changes:
- Update runtime transform handling to expand dict/struct columns using Pandas-native logic for pandas inputs and Polars
unnestotherwise. - Update generated “print code” for Expand Dict in pandas and polars to match the new implementations.
- Expand test coverage for Expand Dict with nulls and nested dicts; adjust equality helper to optionally treat
NoneandNaNas equivalent.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| marimo/_plugins/ui/_impl/dataframes/transforms/handlers.py | Implements Expand Dict via pandas json_normalize or Polars unnest after collection. |
| marimo/_plugins/ui/_impl/dataframes/transforms/print_code.py | Updates printed code generation for Expand Dict for pandas and polars. |
| tests/_plugins/ui/_impl/dataframes/test_handlers.py | Unskips/extends Expand Dict tests (nulls + nested dicts) and tweaks dataframe comparison helper. |
| tests/_plugins/ui/_impl/dataframes/test_print_code.py | Adds print-code parity tests for Expand Dict with nested dicts for pandas and polars. |
| polars_df = collected_df.to_polars() | ||
| unnested = polars_df.unnest(transform.column_id) | ||
| return undo(nw.from_native(unnested)) |
| result = apply(df, in_transform) | ||
| assert_frame_equal_with_nans(result, expected) | ||
|
|
||
| @staticmethod | ||
| @pytest.mark.parametrize( | ||
| ("df", "expected"), | ||
| list( | ||
| zip( | ||
| create_test_dataframes( | ||
| {"nulls": [1, 2, 3, None, "hello"]}, include=["pandas"] | ||
| ), | ||
| create_test_dataframes({"nulls": [None]}, include=["pandas"]), | ||
| strict=False, | ||
| ) | ||
| ), | ||
| ) | ||
| def test_filter_rows_null_pandas_object( | ||
| df: DataFrameType, expected: DataFrameType | ||
| ) -> None: | ||
| in_transform = FilterRowsTransform( | ||
| type=TransformType.FILTER_ROWS, | ||
| operation="keep_rows", | ||
| where=FilterGroup( | ||
| type="group", | ||
| operator="and", | ||
| children=[ | ||
| FilterCondition( | ||
| type="condition", | ||
| column_id="nulls", | ||
| operator="in", | ||
| value=[None], | ||
| ) | ||
| ], | ||
| ), | ||
| ) | ||
| result = apply(df, in_transform) | ||
| assert_frame_equal_with_nans(result, expected) | ||
|
|
||
| @staticmethod | ||
| @pytest.mark.parametrize( |
There was a problem hiding this comment.
No issues found across 4 files
Architecture diagram
sequenceDiagram
participant UI as DataFrame UI
participant Handler as NarwhalsTransformHandler
participant Narwhals as Narwhals Layer
participant Backend as DataFrame Backend
participant PrintCode as Print Code Generator
Note over UI,PrintCode: Expand Dict Transform Flow - Current State
UI->>Handler: handle_expand_dict(DataFrame, ExpandDictTransform)
Handler->>Narwhals: collect_and_preserve_type(df)
Narwhals-->>Handler: (collected_df, undo)
Handler->>Narwhals: collected_df.to_native()
Narwhals-->>Handler: native_df
alt Pandas Backend
Handler->>Handler: Check if pandas dataframe
Handler->>Backend: result_df = native_df.copy()
Handler->>Backend: expanded = pd.json_normalize(result_df.pop(column_id).map(...), max_level=0)
Backend-->>Handler: expanded DataFrame
Handler->>Backend: expanded.index = result_df.index
Handler->>Backend: result_df.join(expanded)
Backend-->>Handler: joined DataFrame
Handler->>Narwhals: undo(nw.from_native(joined))
Narwhals-->>Handler: original backend type
else Polars Backend
Handler->>Narwhals: collected_df.to_polars()
Narwhals-->>Handler: polars_df
Handler->>Backend: polars_df.unnest(column_id)
Backend-->>Handler: unnested DataFrame
Handler->>Narwhals: undo(nw.from_native(unnested))
Narwhals-->>Handler: original backend type
end
Handler-->>UI: Transformed DataFrame
Note over UI,PrintCode: Print Code Generation
UI->>PrintCode: Generate Python code for transform
PrintCode->>PrintCode: Check backend type
alt Pandas Backend
PrintCode->>PrintCode: Generate: df.join(pd.json_normalize(df.pop(col).map(...), max_level=0).set_axis(...))
else Polars Backend
PrintCode->>PrintCode: Generate: df.unnest(column_id)
end
PrintCode-->>UI: Generated code string
|
@Shamik-07 can you please update the video in the PR to a higher res version? I'm having difficulty reading text in it |




📝 Summary
Using narwahls to convert all backend to polars and then using the
unnestfunction of polars for expanding the dict and then convert it back to the original backend.Closes #4583
Screen.Recording.2026-05-15.at.18.07.461.mov
📋 Pre-Review Checklist
✅ Merge Checklist