Fix: expand dict transformation #9561

Open
Shamik-07 wants to merge 25 commits into marimo-team:main from Shamik-07:fix/expand_dict_transformation

Conversation

@Shamik-07
Contributor

@Shamik-07 Shamik-07 commented May 15, 2026

📝 Summary

Use Narwhals to convert any backend to Polars, expand the dict column with Polars' unnest function, and then convert the result back to the original backend.
Closes #4583

Screen.Recording.2026-05-15.at.18.07.461.mov

📋 Pre-Review Checklist

  • For large changes, or changes that affect the public API: this change was discussed or approved through an issue, on Discord, or the community discussions (Please provide a link if applicable).
  • Any AI generated code has been reviewed line-by-line by the human PR author, who stands by it.
  • Video or media evidence is provided for any visual changes (optional).

✅ Merge Checklist

  • I have read the contributor guidelines.
  • Documentation has been updated where applicable, including docstrings for API changes.
  • Tests have been added for the changes made.

@vercel

vercel Bot commented May 15, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
marimo-docs Ready Ready Preview, Comment May 29, 2026 6:12pm

Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment

1 issue found across 3 files

Architecture diagram
sequenceDiagram
    participant User as User (Marimo UI)
    participant DFPlugin as Dataframe Plugin
    participant Handler as ExpandDict Handler (handlers.py)
    participant Narwhals as Narwhals Layer
    participant Polars as Polars Engine
    participant Backend as Original Backend (pandas/Ibis/other)

    Note over User,Backend: Expand Dict Transformation Flow

    User->>DFPlugin: Trigger expand dict on column
    DFPlugin->>Handler: handle_expand_dict(df, transform)

    Handler->>Narwhals: collect_and_preserve_type(df)
    Narwhals->>Backend: Collect actual data from original backend
    Backend-->>Narwhals: Data as native type
    Narwhals-->>Handler: Collected DataFrame + undo function

    Handler->>Polars: collected_df.to_polars()
    Note over Handler,Polars: Convert to Polars for unnest support

    Polars->>Polars: polars_df.unnest(column_id)
    Note over Polars: Handles null dict values correctly

    Polars-->>Handler: Unnested Polars DataFrame

    Handler->>Narwhals: nw.from_native(unnested)
    Narwhals->>Handler: Narwhals wrapper

    Handler->>Handler: undo(narwhals_df)
    Note over Handler: Convert back to original backend type

    Handler-->>DFPlugin: Transformed DataFrame
    DFPlugin-->>User: Updated table with expanded columns

Comment thread marimo/_plugins/ui/_impl/dataframes/transforms/handlers.py
{"A": [{"foo": 1, "bar": "hello"}], "B": [1]}
{"A": [{"foo": 1, "bar": "hello"}], "B": [1]},
),
create_test_dataframes(
Contributor

Polars should already be created by this create_test_dataframes call, so we don't need to add the dataframe below.

Contributor Author

Fixed

Contributor

@mscolnick mscolnick left a comment

need to take an optional dep on polars

Shamik-07 added 3 commits May 19, 2026 17:45
… rows with the create test dataframes instead.
refactor: adding None == NaN in assert frame equal with nans method to use it in expand_dict test.
@Shamik-07
Contributor Author

need to take an optional dep on polars

Done.
Added allow_none_equals_nan to assert_frame_equal_with_nans, because None != NaN was causing assert_frame_equal_with_nans to fail in test_expand_dict.
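
To make the None-versus-NaN point concrete, here is a minimal, dependency-free sketch of such a comparison. The helper names `is_missing` and `values_equal` are hypothetical and are not the PR's actual `assert_frame_equal_with_nans` implementation; only the `allow_none_equals_nan` flag name is taken from the comment above.

```python
import math
from typing import Any


def is_missing(value: Any) -> bool:
    """Return True for None and for float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def values_equal(
    left: Any, right: Any, *, allow_none_equals_nan: bool = False
) -> bool:
    """Compare two cell values, treating NaN as equal to NaN.

    With allow_none_equals_nan=True, None and NaN also compare equal,
    which is useful when one backend reports missing values as None
    and another reports them as NaN.
    """
    left_nan = isinstance(left, float) and math.isnan(left)
    right_nan = isinstance(right, float) and math.isnan(right)
    if left_nan and right_nan:
        return True
    if allow_none_equals_nan and is_missing(left) and is_missing(right):
        return True
    return left == right


nan = float("nan")
strict = values_equal(None, nan)
relaxed = values_equal(None, nan, allow_none_equals_nan=True)
print(strict, relaxed)
```

Keeping the flag off by default preserves the strict comparison for tests that should still distinguish None from NaN.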

@Shamik-07 Shamik-07 requested a review from mscolnick May 19, 2026 22:30
@Shamik-07
Contributor Author

There are some pandas CI errors that I am looking into.

@Shamik-07
Contributor Author

There are some pandas CI errors that I am looking into.

This is happening because of a data conversion mismatch between pandas and Narwhals on columns with mixed data types:

"Could not convert '3' with type str: tried to convert to double"

for

test_print_code_result_matches_actual_transform_pandas(
    transform=ExpandDictTransform(
        type=TransformType.EXPAND_DICT,
        column_id='strings',
    ),
)

So my only option is to fall back to handling the pandas backend separately for the unnest.
This should fix it.

Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment

1 issue found across 1 file (changes from recent commits).

Comment thread marimo/_plugins/ui/_impl/dataframes/transforms/handlers.py
@codecov

codecov Bot commented May 20, 2026

Bundle Report

Bundle size has no change ✅

Affected Assets, Files, and Routes:

view changes for bundle: marimo-esm

Assets Changed:

Asset Name Size Change Total Size Change (%)
assets/dist-*.js 7 bytes 183 bytes 3.98%
assets/dist-*.js 250 bytes 387 bytes 182.48% ⚠️
assets/dist-*.js 171 bytes 335 bytes 104.27% ⚠️
assets/dist-*.js -6 bytes 177 bytes -3.28%
assets/dist-*.js -100 bytes 176 bytes -36.23%
assets/dist-*.js -13 bytes 164 bytes -7.34%
assets/dist-*.js 299 bytes 403 bytes 287.5% ⚠️
assets/dist-*.js -23 bytes 137 bytes -14.37%
assets/dist-*.js -155 bytes 104 bytes -59.85%
assets/dist-*.js 20 bytes 276 bytes 7.81% ⚠️
assets/dist-*.js -301 bytes 102 bytes -74.69%
assets/dist-*.js 73 bytes 256 bytes 39.89% ⚠️
assets/dist-*.js -231 bytes 104 bytes -68.96%
assets/dist-*.js 79 bytes 183 bytes 75.96% ⚠️
assets/dist-*.js 67 bytes 169 bytes 65.69% ⚠️
assets/dist-*.js 56 bytes 160 bytes 53.85% ⚠️
assets/dist-*.js -65 bytes 104 bytes -38.46%
assets/dist-*.js -128 bytes 259 bytes -33.07%
assets/__vite-*.js -5 bytes 93 bytes -5.1%
assets/__vite-*.js 5 bytes 98 bytes 5.38% ⚠️

@Shamik-07
Contributor Author

@mscolnick
Apart from the cubic AI's review comment: we started this because expand_dict was having trouble with NaN rows.
We have fixed that, but one issue remains with deeply nested dicts. pd.json_normalize expands every nesting level in pandas, whereas Polars unnest expands only a single level by default. This yields discrepancies between the outputs of the two backends.

Current Behaviour
e.g. df with nested columns
image

polars unnest
image

pandas unnest
image

Proposal
Let's unnest only a single level in pandas, matching Polars unnest.
This always gives the same backend-agnostic output.
image

I have added the fix to this PR, please let me know your thoughts.
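
The difference between full and single-level expansion in pandas can be sketched as below. The data is made up, and replacing non-dict rows with an empty dict is an assumption for illustration, since the mapping the PR applies before `pd.json_normalize` is not shown in this thread.

```python
import pandas as pd

# Illustrative data: "A" holds a nested dict in one row and a null in the other.
df = pd.DataFrame(
    {"A": [{"foo": 1, "bar": {"x": 10}}, None], "B": [1, 2]}
)

# Assumption: rows that are not dicts (None/NaN) become empty dicts,
# so they expand to a row of missing values instead of raising.
records = [v if isinstance(v, dict) else {} for v in df["A"]]

# Default behaviour: every nesting level is flattened (column "bar.x").
deep = pd.json_normalize(records)

# max_level=0: only the top level is expanded; "bar" stays a dict.
shallow = pd.json_normalize(records, max_level=0)
shallow.index = df.index  # realign with the source rows before joining

result = df.drop(columns=["A"]).join(shallow)

print(list(deep.columns))
print(list(result.columns))
```

With `max_level=0` the nested value under `bar` is left as a dict, which mirrors what a single Polars `unnest` call produces for a struct nested inside a struct.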

@Shamik-07
Contributor Author

Screen.Recording.2026-05-21.at.18.07.581.mov

… of pandas, which raises an error with pd.json_normalise for expand_dict function.
@Shamik-07
Contributor Author

@mscolnick Just a gentle nudge.

@kirangadhave
Member

@cubic-dev-ai

@cubic-dev-ai
Contributor

cubic-dev-ai Bot commented May 30, 2026

@cubic-dev-ai

@kirangadhave I have started the AI code review. It will take a few minutes to complete.

Contributor

Copilot AI left a comment

Pull request overview

This PR makes the dataframe Expand Dict transform robust to nulls by routing expansion through backend-native implementations (Polars unnest and Pandas json_normalize) and adds/updates tests to validate the behavior, including nested dict values.

Changes:

  • Update runtime transform handling to expand dict/struct columns using Pandas-native logic for pandas inputs and Polars unnest otherwise.
  • Update generated “print code” for Expand Dict in pandas and polars to match the new implementations.
  • Expand test coverage for Expand Dict with nulls and nested dicts; adjust equality helper to optionally treat None and NaN as equivalent.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
marimo/_plugins/ui/_impl/dataframes/transforms/handlers.py Implements Expand Dict via pandas json_normalize or Polars unnest after collection.
marimo/_plugins/ui/_impl/dataframes/transforms/print_code.py Updates printed code generation for Expand Dict for pandas and polars.
tests/_plugins/ui/_impl/dataframes/test_handlers.py Unskips/extends Expand Dict tests (nulls + nested dicts) and tweaks dataframe comparison helper.
tests/_plugins/ui/_impl/dataframes/test_print_code.py Adds print-code parity tests for Expand Dict with nested dicts for pandas and polars.

Comment on lines +524 to +526
polars_df = collected_df.to_polars()
unnested = polars_df.unnest(transform.column_id)
return undo(nw.from_native(unnested))
Comment on lines 2407 to 2411
        result = apply(df, in_transform)
        assert_frame_equal_with_nans(result, expected)

    @staticmethod
    @pytest.mark.parametrize(
        ("df", "expected"),
        list(
            zip(
                create_test_dataframes(
                    {"nulls": [1, 2, 3, None, "hello"]}, include=["pandas"]
                ),
                create_test_dataframes({"nulls": [None]}, include=["pandas"]),
                strict=False,
            )
        ),
    )
    def test_filter_rows_null_pandas_object(
        df: DataFrameType, expected: DataFrameType
    ) -> None:
        in_transform = FilterRowsTransform(
            type=TransformType.FILTER_ROWS,
            operation="keep_rows",
            where=FilterGroup(
                type="group",
                operator="and",
                children=[
                    FilterCondition(
                        type="condition",
                        column_id="nulls",
                        operator="in",
                        value=[None],
                    )
                ],
            ),
        )
        result = apply(df, in_transform)
        assert_frame_equal_with_nans(result, expected)

    @staticmethod
    @pytest.mark.parametrize(
Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment

No issues found across 4 files

Architecture diagram
sequenceDiagram
    participant UI as DataFrame UI
    participant Handler as NarwhalsTransformHandler
    participant Narwhals as Narwhals Layer
    participant Backend as DataFrame Backend
    participant PrintCode as Print Code Generator

    Note over UI,PrintCode: Expand Dict Transform Flow - Current State

    UI->>Handler: handle_expand_dict(DataFrame, ExpandDictTransform)
    Handler->>Narwhals: collect_and_preserve_type(df)
    Narwhals-->>Handler: (collected_df, undo)
    Handler->>Narwhals: collected_df.to_native()
    Narwhals-->>Handler: native_df

    alt Pandas Backend
        Handler->>Handler: Check if pandas dataframe
        Handler->>Backend: result_df = native_df.copy()
        Handler->>Backend: expanded = pd.json_normalize(result_df.pop(column_id).map(...), max_level=0)
        Backend-->>Handler: expanded DataFrame
        Handler->>Backend: expanded.index = result_df.index
        Handler->>Backend: result_df.join(expanded)
        Backend-->>Handler: joined DataFrame
        Handler->>Narwhals: undo(nw.from_native(joined))
        Narwhals-->>Handler: original backend type
    else Polars Backend
        Handler->>Narwhals: collected_df.to_polars()
        Narwhals-->>Handler: polars_df
        Handler->>Backend: polars_df.unnest(column_id)
        Backend-->>Handler: unnested DataFrame
        Handler->>Narwhals: undo(nw.from_native(unnested))
        Narwhals-->>Handler: original backend type
    end
    Handler-->>UI: Transformed DataFrame

    Note over UI,PrintCode: Print Code Generation

    UI->>PrintCode: Generate Python code for transform
    PrintCode->>PrintCode: Check backend type

    alt Pandas Backend
        PrintCode->>PrintCode: Generate: df.join(pd.json_normalize(df.pop(col).map(...), max_level=0).set_axis(...))
    else Polars Backend
        PrintCode->>PrintCode: Generate: df.unnest(column_id)
    end
    PrintCode-->>UI: Generated code string

@kirangadhave
Member

@Shamik-07 can you please update the video in the PR to a higher res version? I'm having difficulty reading text in it

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

New dataframe transform: Polars Native Expand Dict Transformation

4 participants