Description
Using an intake catalogue would be a really effective, clear way to document and ingest data when working on Lorenz with Parcels. Exploring the catalogue and looking at data descriptions would be as simple as:
import intake
# Load the catalogue (either local file or from a URL)
cat = intake.open_catalog('https://raw.githubusercontent.com/.../catalog.yaml')
# Explore what’s available
list(cat)
# ['gcm_ocean_data', ...]
# Display a single dataset entry
cat.gcm_ocean_data

gcm_ocean_data:
  args:
    urlpath: '...' # A local path on Lorenz
  description: 'Ocean GCM simulation output including temperature, salinity, and velocity fields.'
  driver: zarr
  metadata:
    institution: 'IMAU'
    frequency: 'monthly'
    model: 'GCM-XYZ'

And reading in an xarray dataset would be as simple as:
# Load the dataset as an xarray object
ds = cat.gcm_ocean_data.to_dask()

Note that because Intake produces Xarray objects, there is limited benefit to doing this work before Parcels v4.
This Intake catalogue can then be built into a website using a script such as the one used in the WCRP KM Scale Hackathon 2025 (website - build workflow). For our use cases, though, since this is internal, perhaps it's just easier to have users look at the raw YAML file. The same information would be surfaced; it just looks slightly nicer on a website. Thoughts @erikvansebille?
Items:
- Define Intake catalogue in a GitHub repo adding all datasets (transfer all info from the wiki into the catalogue)
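
As a rough illustration of that item, here is a minimal sketch of how dataset records copied from the wiki could be rendered into a catalogue file. It assumes the usual Intake YAML layout with a top-level `sources:` key. The helper names (`quote`, `render_entry`, `render_catalog`) and the `/path/to/...` location are hypothetical placeholders, not part of Intake or of the Lorenz setup. It uses only the standard library, so the output should still be checked by loading it with `intake.open_catalog`.

```python
from textwrap import indent


def quote(value):
    """Return a single-quoted YAML scalar (embedded single quotes are doubled)."""
    return "'" + str(value).replace("'", "''") + "'"


def render_entry(name, urlpath, description, driver="zarr", metadata=None):
    """Render one catalogue source as YAML text (hypothetical helper)."""
    lines = [
        f"{name}:",
        "  args:",
        f"    urlpath: {quote(urlpath)}",
        f"  description: {quote(description)}",
        f"  driver: {driver}",
    ]
    if metadata:
        lines.append("  metadata:")
        lines.extend(f"    {key}: {quote(val)}" for key, val in metadata.items())
    return "\n".join(lines)


def render_catalog(entries):
    """Nest every rendered entry under the top-level `sources:` key."""
    body = "\n".join(indent(render_entry(**entry), "  ") for entry in entries)
    return "sources:\n" + body + "\n"


# One record as it might be transferred from the wiki.
# The urlpath is a placeholder, not a real location on Lorenz.
entries = [
    {
        "name": "gcm_ocean_data",
        "urlpath": "/path/to/gcm_ocean_data.zarr",
        "description": "Ocean GCM simulation output including temperature, "
        "salinity, and velocity fields.",
        "metadata": {"institution": "IMAU", "frequency": "monthly", "model": "GCM-XYZ"},
    }
]

catalog_text = render_catalog(entries)
print(catalog_text)
```

Keeping the records as plain dictionaries means the wiki content only has to be transcribed once, and the same list could later feed a website build if we decide we want one.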
Tangential, but related to data ingestion. @erikvansebille, I just want to flag that Pangeo Forge provides recipes for fetching datasets from the original providers and writing them to disk in a unified format. This would be helpful for updating datasets in a standardized way and also for adding new datasets (so that we don't have to manually update them via scripts). Would this be useful? Is updating the Lorenz datasets a burden, or not really, since it's only done occasionally? Also, I'm not 100% sure how "alive" the Pangeo Forge project is, or whether we can get a local runner in place to download these datasets: pangeo-forge/pangeo-forge-recipes#814 (comment)