Change echem caching to .bdf.parquet #1663

Open
be-smith wants to merge 7 commits into main from bes/echem_parquet_cacheing

be-smith (Member) commented Mar 27, 2026

Changes the echem cache file from a .pkl to a .bdf.parquet file.

Fixes a bug in single-file mode where the cache was never loaded, so the raw file was re-parsed on every load.

Also bumps navani to v0.1.17 so that Neware Excel files are supported by the echem block.
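
For readers unfamiliar with the pattern, the single-file cache behaviour described above can be pictured roughly as follows. This is a minimal sketch under stated assumptions, not the code in this PR: `load_with_cache`, `parse_raw`, `read_cache` and `write_cache` are hypothetical names, and JSON stands in for parquet so that the example runs with the standard library alone.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Callable


def load_with_cache(
    raw_path: Path,
    cache_path: Path,
    parse_raw: Callable[[Path], Any],
    read_cache: Callable[[Path], Any],
    write_cache: Callable[[Any, Path], None],
    reload: bool = False,
) -> tuple[Any, bool]:
    """Return (data, cache_hit). Hypothetical helper, not pydatalab's API."""
    # Read the cache only when it exists and no forced reload was requested.
    if not reload and cache_path.exists():
        return read_cache(cache_path), True

    # Otherwise parse the raw file and refresh the cache for next time.
    data = parse_raw(raw_path)
    write_cache(data, cache_path)
    return data, False


# Demo with stand-in callables (JSON instead of parquet).
parse_calls = []  # records every time the "expensive" parser runs


def parse_raw(path: Path) -> dict:
    parse_calls.append(path.name)
    return {"rows": path.read_text().split()}


def read_cache(path: Path) -> dict:
    return json.loads(path.read_text())


def write_cache(data: dict, path: Path) -> None:
    path.write_text(json.dumps(data))


with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp) / "cell.txt"
    raw.write_text("1 2 3")
    cache = Path(tmp) / "cell_cached.json"  # illustrative name only

    first, first_hit = load_with_cache(raw, cache, parse_raw, read_cache, write_cache)
    second, second_hit = load_with_cache(raw, cache, parse_raw, read_cache, write_cache)
    third, third_hit = load_with_cache(
        raw, cache, parse_raw, read_cache, write_cache, reload=True
    )
```

In the demo, only the first call and the forced reload invoke the parser; the second call is served from the cache. With pandas, the reader and writer would presumably be thin wrappers around `pd.read_parquet` and `DataFrame.to_parquet`.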

codecov bot commented Mar 27, 2026

Codecov Report

❌ Patch coverage is 77.77778% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 79.16%. Comparing base (e9775f2) to head (144230b).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| pydatalab/src/pydatalab/apps/echem/blocks.py | 77.77% | 12 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1663      +/-   ##
==========================================
- Coverage   79.24%   79.16%   -0.08%     
==========================================
  Files          79       79              
  Lines        6731     6759      +28     
==========================================
+ Hits         5334     5351      +17     
- Misses       1397     1408      +11     
| Files with missing lines | Coverage Δ |
|---|---|
| pydatalab/src/pydatalab/apps/echem/blocks.py | 79.11% <77.77%> (-2.62%) ⬇️ |

Copilot AI (Contributor) left a comment

Pull request overview

Updates the electrochemistry (echem) caching/export path to use a .bdf.parquet cache (instead of pickle), while still producing a .bdf.csv file for download where appropriate.

Changes:

  • Replace .RAW_PARSED.pkl caching with _cached.bdf.parquet caching in CycleBlock.
  • Add _save_bdf to write parquet (cache) and optionally CSV (download), and adjust load paths accordingly.
  • Add pyarrow to the apps extra and update tests to validate parquet caching behavior.
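
To make the split between the cache file and the download file concrete, here is a small sketch of how the two paths might be derived from a source file. It is an illustration only: the suffixes `_cached.bdf.parquet` and `.bdf.csv` come from the summary above, but `derive_bdf_paths`, the way the base name is built, and the list of extensions treated as "already BDF" are assumptions, not the behaviour of `_save_bdf`.

```python
from pathlib import Path
from typing import Optional

CACHE_SUFFIX = "_cached.bdf.parquet"  # cache suffix named in the overview
CSV_SUFFIX = ".bdf.csv"  # download suffix named in the overview
# Assumption: which extensions count as "already BDF".
BDF_EXTENSIONS = (".bdf.csv", ".bdf.parquet")


def derive_bdf_paths(source: Path) -> tuple[Path, Optional[Path]]:
    """Return (cache_path, csv_download_path) for a source file.

    Hypothetical helper: the CSV path is None when the source is already
    BDF, because there is nothing new to export for download.
    """
    name = source.name
    for ext in BDF_EXTENSIONS:
        if name.endswith(ext):
            base = name[: -len(ext)]
            return source.with_name(base + CACHE_SUFFIX), None

    # Non-BDF source: write both a parquet cache and a CSV for download.
    base = source.stem
    return (
        source.with_name(base + CACHE_SUFFIX),
        source.with_name(base + CSV_SUFFIX),
    )


raw_paths = derive_bdf_paths(Path("/data/cell.mpr"))
bdf_paths = derive_bdf_paths(Path("/data/run.bdf.csv"))
```

Under these assumptions a raw `cell.mpr` file gets both a cache path and a CSV download path, while a file that is already `.bdf.csv` gets a cache path only.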

Reviewed changes

Copilot reviewed 3 out of 4 changed files in this pull request and generated 1 comment.

| File | Description |
|---|---|
| pydatalab/src/pydatalab/apps/echem/blocks.py | Switches echem caching from pickle to parquet and adjusts single/multi load behavior and export paths. |
| pydatalab/tests/apps/test_echem_block.py | Updates/extends tests to assert .bdf.parquet cache creation and removal of pickle artifacts. |
| pydatalab/pyproject.toml | Adds pyarrow dependency to support parquet writes/reads. |
| pydatalab/uv.lock | Locks pyarrow and related resolved artifacts. |

Comments suppressed due to low confidence (1)

pydatalab/src/pydatalab/apps/echem/blocks.py:217

  • The _load_single docstring still says “with pickle caching”, but this PR removes pickle caching in favor of .bdf.parquet caching. Please update the docstring (and any references to “BDF export path” if it’s now specifically the .bdf.csv download path) to match the new caching behavior.
    def _load_single(self, file_id: ObjectId, reload: bool) -> tuple[pd.DataFrame, Path | None]:
        """Parse a single echem file using navani, with pickle caching.

        Returns the raw DataFrame and the BDF export path (or None if the source is already BDF
        or export failed).
        """


cypress bot commented Mar 27, 2026

datalab Run #4775

Run properties:
Project: datalab
Branch Review: bes/echem_parquet_cacheing
Run status: Passed #4775
Run duration: 09m 45s
Commit: c9e03f7518: Merge 144230be5dd71a17928e73c1ff3d5fb205a2e32b into e9775f24ecaace728c9e315ba44b...
Committer: Ben Smith

Test results:
Failures: 0
Flaky: 0
Pending: 0
Skipped: 0
Passing: 488

@be-smith be-smith marked this pull request as ready for review March 30, 2026 12:03
@be-smith be-smith requested a review from ml-evs March 30, 2026 12:04
…he cache was never being hit. Added useful debugging statements. Fixed a bug where the navani "state" column, which has mixed dtypes, couldn't be written to parquet
… successfully be saved as parquet cache files
2 participants