
Fix: Incorrect PyArrow input syntax #2

Open: andresvalle wants to merge 3 commits into `main` from `ram_optimiztions_fix`


andresvalle (Member) commented Sep 17, 2025

Iván shared some of the `.lag` files with me, so I could finally test this repo. Running the changes from PR #1, I got this error:

```
  File "/Users/polaris/tss/LAGO_data_model/lago_data_model/reader.py", line 47, in save_as_parquet_streaming
    table = pa.table(chunk_data, schema=schema)
  File "pyarrow/table.pxi", line 6200, in pyarrow.lib.table
  File "pyarrow/table.pxi", line 4895, in pyarrow.lib.Table.from_arrays
  File "pyarrow/table.pxi", line 1621, in pyarrow.lib._sanitize_arrays
ValueError: Schema and number of arrays unequal
```

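This happened because `pa.table()` expects columnar data (one array per column), not row-based dictionaries. When it receives a list, it hands it to `Table.from_arrays` and treats each element as a column, so a list of N row dicts is read as N columns and fails the length check against the schema. I originally built `chunk_data` as a list of dictionaries; this PR proposes a fix.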

andresvalle (Member, Author) commented Sep 17, 2025

Did you run into the same problem when trying to execute the code?

andresvalle (Member, Author) commented Sep 17, 2025

@ErickDiaz Also, this is what the Parquet output looks like. Does it match what you were expecting?

[Screenshot of the Parquet output, 2025-09-17 at 2:27:45 AM]
