Add note on how to save catalog datasets to zarr3 #15

Open
leifdenby wants to merge 1 commit into main from doc/note-on-saving-to-zarr3

@leifdenby leifdenby commented Nov 27, 2025

Member

Describe your changes

I ran into an issue just now when trying to save a subset of the radklim 5min dataset locally, after starting an ipython REPL with:

uvx --with mlcast-datasets ipython

It turns out the issue arises because our current requirements install zarr==3 by default, but the radklim dataset on EWC is stored in zarr2 format. Apparently zarr3 changed how the encoding (compression) of a dataset is represented, so the encoding attribute must be changed for each variable before saving. This is only an issue when you open a zarr2 dataset from the catalog and try to save it in zarr3 format. Reading the zarr2 dataset with zarr3 installed works just fine. I think we want to support both zarr2 and zarr3 datasets in the catalog, since the zarr community is in a transition phase between the two formats (or at least that is my impression). I suggest adding a note to the README about how to work around this issue.
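For illustration, a minimal sketch of what changing the encoding per variable could look like. It is not necessarily the exact wording that ends up in the README. The helper name `drop_v2_codec_encoding` and the list of encoding keys are assumptions made for this example; the key names are the ones xarray typically uses for zarr codec settings, so adjust them if the error points at something else.

```python
# Encoding keys that typically carry Zarr v2 codec objects.
# Assumption: these are the entries that trigger the codec error.
V2_CODEC_KEYS = ("compressor", "compressors", "filters")


def drop_v2_codec_encoding(ds):
    """Remove codec-related encoding entries from every variable, in place.

    `ds` is expected to be an xarray.Dataset opened from the catalog; any
    object whose `.variables` maps names to objects with an `.encoding`
    dict is handled the same way.
    """
    for var in ds.variables.values():
        for key in V2_CODEC_KEYS:
            # pop() with a default is a no-op when the key is absent
            var.encoding.pop(key, None)
    return ds


# Usage sketch (not executed here; the output path is hypothetical):
#   ds_subset = drop_v2_codec_encoding(ds_subset)
#   ds_subset.to_zarr("radklim_subset.zarr", zarr_format=3)
```

Only the codec entries are removed, so other encoding such as chunking and dtype is kept and zarr3 falls back to its default compression. If keeping the rest of the encoding does not matter, xarray's `Dataset.drop_encoding()` should achieve the same thing more bluntly.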

No change of dependencies needed.

Issue Link

no related issue

Type of change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality, e.g. adding a new dataset)
  • 💥 Breaking change (fix or feature that would cause existing functionality to not work as expected, e.g. removing or moving a dataset in the catalog)
  • 📖 Documentation (Addition or improvements to documentation)

Checklist before requesting a review

  • My branch is up-to-date with the target branch - if not update your fork with the changes from the target branch (use pull with --rebase option if possible).
  • I have performed a self-review of my code
  • For any new/modified functions/classes I have added docstrings that clearly describe its purpose, expected inputs and returned values
  • I have placed in-line comments to clarify the intent of any hard-to-understand passages of my code
  • I have updated the documentation to cover introduced code changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have given the PR a name that clearly describes the change, written in imperative form (context).
  • I have requested a reviewer and an assignee (assignee is responsible for merging)

Checklist for reviewers

Each PR comes with its own improvements and flaws. The reviewer should check the following:

  • the code is readable
  • the code is well tested
  • the code is documented (including return types and parameters)
  • the code is easy to maintain

Author checklist after completed review

  • I have added a line to the CHANGELOG describing this change, in a section
    reflecting type of change (add section where missing):
    • added: when you have added new functionality
    • changed: when default behaviour of the code has been changed
    • fixes: when your contribution fixes a bug

Checklist for assignee

  • PR is up to date with the base branch
  • the tests pass
  • author has added an entry to the changelog (and designated the change as added, changed or fixed)
  • Once the PR is ready to be merged, squash commits and merge the PR.

Explain how to avoid codec errors when saving v2 datasets in the catalog as Zarr v3
