@lewisjared lewisjared commented Jan 19, 2026

Description

Add RegistryRequest class for fetching datasets from pooch registries (pmp-climatology, obs4ref) instead of ESGF, along with catalog handling improvements and CLI enhancements.

New RegistryRequest Class

  • Fetches datasets from pooch registries (pmp-climatology, obs4ref)
  • Parses registry keys to extract metadata (variable_id, source_id, version, etc.)
  • Automatically filters to latest version of each dataset
  • Satisfies the ESGFRequest protocol for compatibility
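The key parsing and latest-version filtering described above could be sketched roughly as follows. This is not the code from this PR: the key layout `<source_id>/<variable_id>/<version>/<filename>`, the `RegistryEntry` dataclass, and both function names are assumptions made for illustration only.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class RegistryEntry:
    """Metadata extracted from one registry key (hypothetical structure)."""

    key: str
    source_id: str
    variable_id: str
    version: str


def parse_registry_key(key: str) -> RegistryEntry | None:
    """Parse a key of the assumed form source_id/variable_id/vYYYYMMDD/filename."""
    parts = key.split("/")
    if len(parts) != 4 or not re.fullmatch(r"v\d{8}", parts[2]):
        return None  # Key does not follow the assumed layout
    source_id, variable_id, version, _filename = parts
    return RegistryEntry(key, source_id, variable_id, version)


def latest_versions(keys: list[str]) -> list[RegistryEntry]:
    """Keep only the entries that belong to the newest version of each dataset."""
    latest: dict[tuple[str, str], list[RegistryEntry]] = {}
    for key in keys:
        entry = parse_registry_key(key)
        if entry is None:
            continue
        dataset = (entry.source_id, entry.variable_id)
        current = latest.get(dataset)
        # vYYYYMMDD strings sort chronologically when compared as text
        if current is None or entry.version > current[0].version:
            latest[dataset] = [entry]
        elif entry.version == current[0].version:
            current.append(entry)  # Another file of the same dataset version
    return [entry for entries in latest.values() for entry in entries]
```

In this sketch, files that share the newest version are all kept, so a time-chunked dataset is not collapsed to a single file.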

Catalog Improvements

  • Hash-based change detection prevents writing unchanged catalogs
  • Multi-file datasets (e.g., time-chunked data) are preserved with composite keys
  • Added get_catalog_hash() function for checking existing catalog state
  • save_datasets_to_yaml() now returns a bool indicating whether the catalog was written
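A minimal sketch of hash-based change detection, under stated assumptions: it uses JSON instead of YAML to stay within the standard library, and `catalog_hash`, `save_catalog`, and the `instance_id::filename` composite key format are hypothetical stand-ins rather than the PR's actual `get_catalog_hash()` and `save_datasets_to_yaml()`.

```python
import hashlib
import json
from pathlib import Path


def catalog_hash(datasets: dict) -> str:
    """Hash a canonical (key-sorted) serialisation of the catalog contents."""
    payload = json.dumps(datasets, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def save_catalog(datasets: dict, path: Path) -> bool:
    """Write the catalog only if its content hash changed.

    Returns True if the file was written, False if it was left untouched.
    """
    new_hash = catalog_hash(datasets)
    if path.exists():
        stored = json.loads(path.read_text(encoding="utf-8"))
        if stored.get("hash") == new_hash:
            return False  # Unchanged: skip the write
    document = {"hash": new_hash, "datasets": datasets}
    path.write_text(json.dumps(document, indent=2, sort_keys=True), encoding="utf-8")
    return True


def composite_key(instance_id: str, filename: str) -> str:
    """Hypothetical composite key so multi-file datasets do not overwrite each other."""
    return f"{instance_id}::{filename}"
```

Storing the hash alongside the data means an unchanged catalog can be detected without comparing the full contents, and skipping the write avoids spurious diffs in version control.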

CLI Improvements

  • fetch: Added --only-missing and --force flags
  • list: Added catalog/regression status columns
  • run: Refactored to iterate over multiple test cases for a provider
  • run: Added --dry-run, --if-changed, and --clean flags

Checklist

Please confirm that this pull request has done the following:

  • Tests added
  • Documentation added (where applicable)
  • Changelog item added to changelog/

…ndling

- Add RegistryRequest class for fetching datasets from pooch registries
  (pmp-climatology, obs4ref) instead of ESGF
- Add hash-based change detection to skip writing unchanged catalogs
- Support multi-file datasets (same instance_id, different files)
- Improve CLI test-cases commands:
  - Add --only-missing and --force flags to fetch command
  - Add catalog/regression status columns to list command
  - Refactor run command to iterate over multiple test cases
  - Add --dry-run, --if-changed, and --clean flags to run command
- Add comprehensive tests for new functionality
codecov bot commented Jan 19, 2026

- Create _git_utils.py with git status utilities (get_repo_for_path,
  get_git_status, collect_regression_file_info)
- Add format_size() to _utils.py for human-readable file sizes
- Move catalog_changed_since_regression() to climate_ref_core/testing.py
- Update test_cases.py to use extracted functions (729 lines, down from 808)
- Add comprehensive tests for all new/moved functions (100% coverage)
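For illustration, a human-readable size formatter in the spirit of format_size() might look like the sketch below. The unit labels, the base of 1024, and the one-decimal rounding are assumptions; the PR text does not specify the output format.

```python
def format_size(num_bytes: int) -> str:
    """Format a byte count as a short human-readable string (assumed format)."""
    size = float(num_bytes)
    for unit in ("B", "KB", "MB", "GB"):
        if size < 1024:
            # Whole bytes have no fractional part; larger units get one decimal
            return f"{size:.0f} {unit}" if unit == "B" else f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} TB"


# Boundary cases of the kind a parametrized test would cover
assert format_size(0) == "0 B"
assert format_size(1023) == "1023 B"
assert format_size(1024) == "1.0 KB"
assert format_size(1536) == "1.5 KB"
```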

Previously untested functions now have dedicated tests:
- format_size: 14 parametrized tests
- get_git_status: 8 tests
- get_repo_for_path: 2 tests
- collect_regression_file_info: 6 tests
- catalog_changed_since_regression: 6 tests
Mark sections that are impractical to unit test:
- _print_regression_summary: Rich table display code
- fetch command dry-run display and ESGF fetching loop
- run command dry-run display table
@lewisjared lewisjared merged commit ceb9280 into main Jan 20, 2026
12 of 13 checks passed
@lewisjared lewisjared deleted the feat/registry-request-catalog-improvements-clean branch January 20, 2026 03:36
