feat: add plotting subpackage for radar dataset diagnostics #30

Open
franchg wants to merge 10 commits into main from feat/plotting-subpackage

franchg (Member) commented Feb 18, 2026

Summary

  • Adds mlcast_datasets.plotting with 7 modules for radar dataset diagnostics:
    • domain_map — domain overview map with spatial coverage overlay
    • monthly_cycle — monthly precipitation climatology boxplot
    • precipitation_stats — mean/max/std maps + value histogram
    • sample_precipitation — precipitation event snapshot grid
    • spatial_coverage — data coverage fraction heatmap
    • temporal_coverage — monthly completeness heatmap + yearly bar chart
    • summary_table — metadata summary returning pd.DataFrame (CSV-saveable)
  • Makes plotting an optional install: pip install 'mlcast-datasets[plotting]'
  • Adds [tool.isort] profile = "black" to resolve the isort/black pre-commit hook conflict
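As a rough illustration of what the monthly completeness heatmap in temporal_coverage measures, here is a stdlib-only sketch. It is not the PR's implementation: `monthly_completeness` is a hypothetical helper, and it assumes one constant timestep frequency, which the summary table further down shows does not hold across the whole it_dpc_sri_5min record (15min, then 10min, ...).

```python
import calendar
from collections import Counter
from datetime import datetime, timedelta


def monthly_completeness(times, freq_minutes):
    """Fraction of expected timesteps present per (year, month).

    Hypothetical helper: assumes a single constant timestep frequency and
    only reports months that contain at least one timestep.
    """
    # Count unique timestamps per calendar month
    counts = Counter((t.year, t.month) for t in set(times))
    completeness = {}
    for (year, month), n_present in sorted(counts.items()):
        days_in_month = calendar.monthrange(year, month)[1]
        n_expected = days_in_month * 24 * 60 // freq_minutes
        completeness[(year, month)] = n_present / n_expected
    return completeness


# Hourly data for January 2021 with the whole first day missing
times = [datetime(2021, 1, 1) + timedelta(hours=h) for h in range(24, 31 * 24)]
# January 2021: 720 of 744 hourly steps present (about 0.968)
print(monthly_completeness(times, freq_minutes=60))
```

A real implementation would also need to emit zero-completeness cells for months with no data at all, which this sketch skips.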

Test plan

  • uv run pytest src/mlcast_datasets/tests/ -q — all tests pass
  • uv pip install -e . — installs without plotting extras
  • python -c "import mlcast_datasets" — core import works
  • uv pip install -e ".[plotting]" — installs plotting extras
  • python -c "from mlcast_datasets.plotting import plot_domain_map" — plotting import works
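The manual install checks in the test plan could also be automated. A minimal sketch, assuming a hypothetical helper `import_pulls_in` (stdlib only): it imports a package in a fresh interpreter and reports whether a given dependency ended up in `sys.modules`, which would catch the core import accidentally pulling in plotting dependencies.

```python
import subprocess
import sys


def import_pulls_in(package, dependency):
    """Return True if importing `package` in a fresh interpreter also loads `dependency`."""
    code = (
        "import importlib, sys; "
        f"importlib.import_module({package!r}); "
        f"print({dependency!r} in sys.modules)"
    )
    # A separate process keeps modules already imported here from skewing the check
    result = subprocess.run(
        [sys.executable, "-c", code], capture_output=True, text=True, check=True
    )
    return result.stdout.strip() == "True"


# Stdlib demo: importing json also loads its json.decoder submodule
print(import_pulls_in("json", "json.decoder"))  # prints True
```

In the mlcast-datasets environment this could be used as `assert not import_pulls_in("mlcast_datasets", "matplotlib")`.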

Adds mlcast_datasets.plotting with 10 modules covering:
- Domain overview map with spatial coverage overlay
- Monthly precipitation climatology (boxplot)
- Precipitation statistics: mean/max/std maps + value histogram
- Sample precipitation event maps
- Spatial data coverage heatmap
- Temporal completeness heatmap + yearly timestep bar chart
- Summary metadata table (returns pd.DataFrame, saves as CSV)
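For intuition, a spatial coverage fraction can be computed as the share of sampled timesteps in which each pixel holds a valid value. This stdlib sketch treats None/NaN as missing and draws a random subset of frames; both choices, and the `spatial_coverage` name, are assumptions and not necessarily how plot_spatial_coverage(ds, n_samples=...) works.

```python
import math
import random


def spatial_coverage(frames, n_samples, seed=0):
    """Per-pixel fraction of sampled frames holding a valid (non-None, non-NaN) value.

    Hypothetical helper: `frames` is a list of 2-D grids (lists of rows).
    """
    rng = random.Random(seed)
    # Sample without replacement; use every frame if fewer than n_samples exist
    sample = rng.sample(frames, min(n_samples, len(frames)))
    n_rows, n_cols = len(sample[0]), len(sample[0][0])
    counts = [[0] * n_cols for _ in range(n_rows)]
    for frame in sample:
        for i, row in enumerate(frame):
            for j, value in enumerate(row):
                if value is not None and not math.isnan(value):
                    counts[i][j] += 1
    return [[c / len(sample) for c in row] for row in counts]


frames = [
    [[0.0, float("nan")], [1.2, None]],
    [[0.5, 2.0], [float("nan"), None]],
]
print(spatial_coverage(frames, n_samples=10))  # [[1.0, 0.5], [0.5, 0.0]]
```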

Also makes plotting an optional install:
  pip install 'mlcast-datasets[plotting]'
(extra dependencies: cartopy, dask, matplotlib, numpy, pandas)

Removes numpy, pandas, jupyter-server, ipykernel, xarray, cartopy,
matplotlib, tqdm from mandatory core dependencies. Docs-only packages
(jupyter-server, ipykernel) moved to [dependency-groups].docs.
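With these packages no longer in the core dependencies, the plotting subpackage should fail with a clear message when the extra is missing. A minimal sketch of one way to do that, assuming a guard at the top of mlcast_datasets/plotting/__init__.py; `require_plotting_extra` and `missing_modules` are hypothetical names, not code from this PR.

```python
import importlib.util

# Dependencies of the plotting extra, as listed in the commit message above
PLOTTING_DEPS = ("cartopy", "dask", "matplotlib", "numpy", "pandas")


def missing_modules(names):
    """Return the names in `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def require_plotting_extra(names=PLOTTING_DEPS):
    """Raise ImportError with install instructions if any dependency is missing."""
    missing = missing_modules(names)
    if missing:
        raise ImportError(
            "mlcast_datasets.plotting requires the optional 'plotting' extra "
            f"(missing: {', '.join(missing)}). "
            "Install it with: pip install 'mlcast-datasets[plotting]'"
        )


# Hypothetically called at import time in mlcast_datasets/plotting/__init__.py:
# require_plotting_extra()
```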

Adds [tool.isort] profile = "black" to pyproject.toml to resolve
isort/black pre-commit hook conflict.

franchg (Member, Author) commented Feb 18, 2026

EXAMPLE OF USAGE

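```python
import mlcast_datasets
from mlcast_datasets.plotting import (
    generate_summary_table,
    plot_domain_map,
    plot_monthly_cycle,
    plot_precipitation_stats,
    plot_sample_precipitation,
    plot_spatial_coverage,
    plot_temporal_coverage,
)

cat = mlcast_datasets.open_catalog()
ds = cat.precipitation.it_dpc_sri_5min.to_dask()

# Domain overview map with spatial coverage overlay (data from 2025 onwards)
_ = plot_domain_map(ds.sel(time=slice("2025-01-01", None)), n_coverage_samples=1000)
# Spatial data coverage fraction heatmap
_ = plot_spatial_coverage(ds, n_samples=10000)
# Monthly precipitation climatology boxplot
_ = plot_monthly_cycle(ds, n_samples=10000)
# Precipitation event snapshot grid for 2023-07-01 to 2023-07-02
_ = plot_sample_precipitation(
    ds, time_slice=slice("2023-07-01", "2023-07-02"), time_spacing_hours=1
)
# Monthly completeness heatmap + yearly timestep bar chart
_ = plot_temporal_coverage(ds)
# Mean/max/std maps + value histogram (two figures, data from 2020 onwards)
_ = plot_precipitation_stats(ds.sel(time=slice("2020-01-01", None)))
# Metadata summary table, returned as a pd.DataFrame
generate_summary_table(ds)
```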

|    | Property | Value |
|---:|---|---|
| 0 | Time range | 2010-01-01 to 2025-12-31 |
| 1 | Total timesteps | 1,039,785 |
| 2 | Missing timesteps | 15,530 (1.5%) |
| 3 | Grid dimensions | 1400 × 1200 pixels |
| 4 | Spatial resolution | 1 km |
| 5 | Data variable | RR (Total precipitation rate) |
| 6 | Units | kg m-2 h-1 |
| 7 | Data type | float32 |
| 8 | Uncompressed volume | 7.0 TB |
| 9 | Compressed volume | N/A |
| 10 | Compression ratio | N/A |
| 11 | Temporal frequency | 15min (2010-01-01-2014-06-25), 10min (2014-06-... |
| 12 | CRS | Transverse Mercator |
| 13 | License | CC-BY-SA-4.0 |
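The "Uncompressed volume" row can be cross-checked against the other rows: total timesteps × grid pixels × 4 bytes per float32 value. Assuming decimal terabytes (10^12 bytes), this reproduces the 7.0 TB shown. `uncompressed_volume_tb` is a hypothetical helper for the arithmetic, not part of the package.

```python
def uncompressed_volume_tb(n_timesteps, n_y, n_x, itemsize):
    """Uncompressed array size in decimal terabytes (1 TB = 10**12 bytes)."""
    return n_timesteps * n_y * n_x * itemsize / 1e12


# Values from the summary table: 1,039,785 timesteps, 1400 x 1200 grid, float32 (4 bytes)
volume = uncompressed_volume_tb(1_039_785, 1400, 1200, 4)
print(f"{volume:.1f} TB")  # prints "7.0 TB" (6.9873552 TB before rounding)
```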

leifdenby self-requested a review February 24, 2026 13:50

leifdenby (Member) left a comment

Could you add docstrings throughout? I will give it a thorough review once I have those :)

Comment thread src/mlcast_datasets/plotting/_map_helpers.py
franchg and others added 3 commits March 2, 2026 13:57
Convert all 21 functions across 10 files in the plotting subpackage
from one-liner or Google-style docstrings to full numpydoc format
with Parameters, Returns, and Raises sections.

Demonstrates all plotting functions with small sample sizes for CI.
Includes install instructions for the plotting extra.
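For reference, a numpydoc docstring with the Parameters, Returns and Raises sections mentioned in the first commit above looks like the sketch below. The function is a hypothetical example that formats a missing-timestep count in the style of the summary table; it is not code from the PR, and which denominator the PR's table uses for the percentage is not stated.

```python
def format_missing(n_expected, n_present):
    """
    Format a missing-timestep count with its percentage.

    Parameters
    ----------
    n_expected : int
        Number of timesteps expected on a regular time grid.
    n_present : int
        Number of timesteps actually present in the dataset.

    Returns
    -------
    str
        Missing count with thousands separators and the percentage of
        expected timesteps, for example ``"10 (1.0%)"``.

    Raises
    ------
    ValueError
        If `n_expected` is not positive or `n_present` exceeds it.
    """
    if n_expected <= 0 or n_present > n_expected:
        raise ValueError("need n_expected > 0 and n_present <= n_expected")
    missing = n_expected - n_present
    return f"{missing:,} ({missing / n_expected:.1%})"


print(format_missing(1_000, 990))  # prints "10 (1.0%)"
```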
franchg requested a review from leifdenby March 2, 2026 13:46

leifdenby (Member)

This looks great @franchg :)

I have made PR #37, which ensures that notebooks are built in CI for PRs (that doesn't happen now; I had overlooked it). It will also comment with a link to the preview build. Maybe we could merge that first and then use it to check how long your notebooks take to build?

leifdenby added this to the v0.3.0 milestone Apr 10, 2026
leifdenby modified the milestones: v0.3.0, v0.4.0 Apr 14, 2026

github-actions

View preview of built jupyterbooks on https://mlcast-community.github.io/mlcast-datasets/pr-preview/pr-30/
(preview is automatically rebuilt and uploaded on later commits)

leifdenby (Member)

The rendered notebooks look great @franchg, but they increase the execution time of the jupyterbook build from ~4 min (https://github.com/mlcast-community/mlcast-datasets/actions/runs/24443479747/job/71413893785) to ~15 min (https://github.com/mlcast-community/mlcast-datasets/actions/runs/24993116111/job/73183302255).

Maybe we need to think about how we can reduce the long-running computations a bit? Otherwise, we need to work out how to execute the notebook build closer to the data (i.e. on an EWC host).
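One low-effort way to cut the build time, sketched here as an assumption rather than something the PR does: keep larger sample sizes for interactive use and switch to small ones only when the notebook runs in CI. GitHub Actions sets the CI environment variable to true; `sample_size` is a hypothetical helper.

```python
import os


def sample_size(full, ci, env=None):
    """Return `ci` when running under CI (CI=true), otherwise `full`.

    Hypothetical helper; `env` defaults to os.environ and can be
    overridden for testing.
    """
    env = os.environ if env is None else env
    return ci if env.get("CI", "").lower() == "true" else full


# Locally (CI unset) this prints 10000; under GitHub Actions it prints 100
print(sample_size(full=10_000, ci=100))
```

In the notebook this would be used as, for example, `_ = plot_spatial_coverage(ds, n_samples=sample_size(full=10_000, ci=100))`. It does not address the cost of reading remote data, so building closer to the data may still be needed.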
