Skip to content

Adopt ESM1.6's new filenames#241

Open
joshuatorrance wants to merge 30 commits into
mainfrom
esm1p6-filenames
Open

Adopt ESM1.6's new filenames#241
joshuatorrance wants to merge 30 commits into
mainfrom
esm1p6-filenames

Conversation

@joshuatorrance

@joshuatorrance joshuatorrance commented May 21, 2026

Copy link
Copy Markdown
Collaborator

Overview

It has been decided (see Zulip topic) that ESM1.6's output files (ocean, ice and atmosphere files) will follow the naming scheme for OM3.

um2nc needs to use the new filename pattern for single-variable output files for the ESM1.6 driver.
For ESM1.6 we have:
<model>.<component>.<dimension>.<field>.<frequency>.<time_cell_method>.<datestamp>.nc

  • <model>: access-esm1p6
  • <component>: umX.Y where X.Y is the UM version used
  • <dimension>: 2D or 3D, i.e. the number of dimensions for the field, excluding time
  • <field>: The variable/field name
  • <frequency>: The frequency of the data
  • <time_cell_method>: Function applied along the time axis, e.g. mean, max (optional)
  • <datestamp>: Date for the data (optional)

The optional portions are omitted where they are not applicable, e.g. variables without time aggregation, or fx files with no time component.

Implementation

Certain portions of the filename need information from a couple of places:

  • From the input filename: frequency
  • From the source cube : um_version, dimension, field, frequency, time_cell_method, datestamp

This means that the output file name cannot be fully defined until the iris cube for the data is available, i.e. in the process function of um2netcdf.py.

DelayedCubePath

Prior to these changes the output paths were defined well before process was called in the Driver's get_output_path method. For single variable output files, the variable name crudely prepended to the output filename during process - things were implemented that way as a placeholder for a change such as this.

To allow the output file name to be determined later get_output_path now returns a Path (for multi-variable output files) or a DelayedCubePath (for single-variable output files) which has a resolve_cube method to allow process to resolve the remaining unknowns in the filename without having to know anything about which scenario called process.

  • For ESM1.6 a subclass, Esm1p6DelayedCubePath, contains the functionality to resolved the filename as described above
  • For non-ESM1.6 workflows DelayedCubePath replicates the previous behaviour of just prepending the output filename with the variable name.

Changes

This PR messes with some of the nuts and bolts, I've attempted to isolate changes into separate commits.

  • The custom Exceptions needed to be moved to another file to avoid circular dependencies
  • get_ff_date was changed to use a datetime instead of a tuple to support refactoring I later decided not to do but a datetime seems neater so I left that change in reverted
  • Added DelayedCubePath which allows abstraction of not-fully-resolved paths that need info from Iris Cubes

Still to do

  • Improve handling of instance attributes in Esm1p6DelayedCubePath
  • Unit tests for DelayedCubePaths
  • Fix use of unpack_fieldsfile in test_esm1p6_filenames.py - should be shared with test_single_field.py but I was struggling to import it.

@joshuatorrance joshuatorrance requested a review from blimlim May 22, 2026 01:34
@blimlim

blimlim commented May 25, 2026

Copy link
Copy Markdown
Collaborator

Thanks @joshuatorrance for putting this together! I really like the idea of the delayed paths. They look like a good way to add flexibility to the filenames without requiring major restructures.

Reading through the changes, I did find some of the code paths harder to understand under specific options. One example was using the ESM1.6 driver with multi-variable files. In this case, the final path is passed through the ESM1.5 driver’s get_output_path method, the base DelayedCubePath class, and the Esm1p6DelayedCubePath class. Meanwhile the DelayedCubePath and Esm1p6DelayedCubePaths aren't doing too much here, and just provide back the original path. I think it would be nicer to make what's happening a bit more explicit if there's a way.

A rough idea I was thinking could be:

  • Only use DelayedCubePaths when the --one-nc-per-stash-variable option is selected, otherwise just use plain Paths. DelayedCubePaths would then drop the resolve_cubelist method, and I think this could make the DelayedCubePaths conceptually clearer (to me at least!). We could ensure ensure that the correct objects are passed to the process function for each setting with something like:

  • Modify the driver’s get_output_path method to return the right type of object:


     def get_output_path(self, input_path, single_var):
     	if single_var:
     		return self.get_output_path_single()
     	else:
     		return self.get_outptut_path_multi()

    The get_output_path_multi method could match get_output_path from the main branch, while the get_output_path_single method would return a DelayedCubePath.

This is just what thought could be a bit clearer from mulling over it, which obviously doesn't mean it necessarily is actually any clearer, so feel to take this just as a suggestion!

@atteggiani

Copy link
Copy Markdown
Collaborator

Hi @joshuatorrance,
Thank you for working on this.

As there seems to be no issue attached to this PR, I might not have enough background knowledge on the purpose these changes are trying to achieve to be able to fully provide my initial review in the merit of the design and implementation of the changes.

Could you please expand on

Adopt ESM1.6's new filename pattern that follows the OM3 pattern

especially providing a bit more context on:

  • what is the main purpose of this PR and why there is the need for a delayed path?
  • what model it should apply to? (I see the PR mentioning ESM1.6 but I see ESM1.5 driver being changed as well)

Thank you!

@atteggiani atteggiani left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also some small initial comments

Comment thread src/um2nc/drivers/common.py Outdated
Comment thread src/um2nc/drivers/esm1p5.py Outdated
@joshuatorrance

Copy link
Copy Markdown
Collaborator Author

Thanks for the feedback @blimlim and @atteggiani !

  • The top comment has been updated with more detail that hopefully justifies this PR
  • DelayedCubePaths are now only used with --one-nc-per-stash-variable
    • Since they always call resolve_cube when used they no longer need to extent Path either
    • Some logic was pushed into the Drivers to support this
  • Reverted datetime change to get_ff_date
  • More tests!

Comment thread src/um2nc/drivers/esm1p6.py Outdated

@blimlim blimlim left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @joshuatorrance, I think this is looking good. I'm still thinking a little about the tests but wanted to put up some small comments in the meantime.

Comment thread src/um2nc/drivers/common.py Outdated
Comment thread src/um2nc/drivers/esm1p5.py Outdated
Comment thread src/um2nc/drivers/esm1p5.py
Comment thread src/um2nc/drivers/esm1p6.py Outdated
Comment thread src/um2nc/drivers/esm1p6.py Outdated
Comment thread test/drivers/test_common.py Outdated
Comment thread test/test_um2netcdf.py Outdated
…tput pairs, increment filename stem instead of var

- When there's a collision the entire filename is now incremented e.g. var_file.nc -> var_file_1.nc (before var, var-max, var-min would be detected as a collission)
- Colission checking moved into DelayedCubePath
- resolve_cube no longer needs an alternative var name
- tests!
@joshuatorrance joshuatorrance marked this pull request as ready for review June 1, 2026 01:37

@atteggiani atteggiani left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @joshuatorrance for updating this.

I haven't looked at the tests too carefully, but overall this looks good.

I left a few minor comments.

Comment thread src/um2nc/common.py
Comment thread test/drivers/test_esm1p6_filenames.py Outdated
Comment thread test/test_common.py Outdated
joshuatorrance and others added 2 commits June 1, 2026 13:31
Co-authored-by: Davide Marchegiani <davide.marchegiani@gmail.com>

@blimlim blimlim left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @joshuatorrance, I think this is looking great and the tests look thorough. I've just added a few very small comments!

Comment thread test/drivers/test_common.py Outdated
Comment thread src/um2nc/drivers/common.py Outdated
Comment thread src/um2nc/drivers/esm1p6.py Outdated
Comment thread src/um2nc/drivers/esm1p5.py Outdated
Comment thread src/um2nc/drivers/esm1p6.py Outdated
Comment thread test/drivers/test_esm1p6_filenames.py Outdated
Comment thread test/drivers/test_esm1p6_filenames.py
Comment thread test/drivers/test_esm1p6_filenames.py Outdated
Comment thread test/drivers/test_esm1p6_filenames.py Outdated
Comment thread test/drivers/test_esm1p6_filenames.py Outdated
blimlim
blimlim previously approved these changes Jun 15, 2026

@blimlim blimlim left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @joshuatorrance, this looks great. One point we should consider is the instantaneous cell methods for each component of ESM1.6, as discussed here: ACCESS-NRI/esm1.6-scripts#6 (comment)

If you prefer to separate that discussion out into a separate PR, that's ok with me too!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants