Adopt ESM1.6's new filenames#241
Conversation
|
Thanks @joshuatorrance for putting this together! I really like the idea of the delayed paths. They look like a good way to add flexibility to the filenames without requiring major restructures. Reading through the changes, I did find some of the code paths harder to understand under specific options. One example was using the ESM1.6 driver with multi-variable files. In this case, the final path is passed through the ESM1.5 driver’s A rough idea I was thinking could be:
This is just what thought could be a bit clearer from mulling over it, which obviously doesn't mean it necessarily is actually any clearer, so feel to take this just as a suggestion! |
|
Hi @joshuatorrance, As there seems to be no issue attached to this PR, I might not have enough background knowledge on the purpose these changes are trying to achieve to be able to fully provide my initial review in the merit of the design and implementation of the changes. Could you please expand on
especially providing a bit more context on:
Thank you! |
atteggiani
left a comment
There was a problem hiding this comment.
Also some small initial comments
…var output, some logic moved into the drivers, no longer extends Path
|
Thanks for the feedback @blimlim and @atteggiani !
|
blimlim
left a comment
There was a problem hiding this comment.
Thanks @joshuatorrance, I think this is looking good. I'm still thinking a little about the tests but wanted to put up some small comments in the meantime.
Co-authored-by: Spencer Wong <88933912+blimlim@users.noreply.github.com>
…stracted and tested
…tput pairs, increment filename stem instead of var - When there's a collision the entire filename is now incremented e.g. var_file.nc -> var_file_1.nc (before var, var-max, var-min would be detected as a collission) - Colission checking moved into DelayedCubePath - resolve_cube no longer needs an alternative var name - tests!
atteggiani
left a comment
There was a problem hiding this comment.
Thanks @joshuatorrance for updating this.
I haven't looked at the tests too carefully, but overall this looks good.
I left a few minor comments.
Co-authored-by: Davide Marchegiani <davide.marchegiani@gmail.com>
blimlim
left a comment
There was a problem hiding this comment.
Thanks @joshuatorrance, I think this is looking great and the tests look thorough. I've just added a few very small comments!
Co-authored-by: Spencer Wong <88933912+blimlim@users.noreply.github.com>
Co-authored-by: Spencer Wong <88933912+blimlim@users.noreply.github.com>
blimlim
left a comment
There was a problem hiding this comment.
Thanks @joshuatorrance, this looks great. One point we should consider is the instantaneous cell methods for each component of ESM1.6, as discussed here: ACCESS-NRI/esm1.6-scripts#6 (comment)
If you prefer to separate that discussion out into a separate PR, that's ok with me too!
Overview
It has been decided (see Zulip topic) that ESM1.6's output files (ocean, ice and atmosphere files) will follow the naming scheme for OM3.
um2ncneeds to use the new filename pattern for single-variable output files for the ESM1.6 driver.For ESM1.6 we have:
<model>.<component>.<dimension>.<field>.<frequency>.<time_cell_method>.<datestamp>.nc<model>:access-esm1p6<component>:umX.YwhereX.Yis the UM version used<dimension>:2Dor3D, i.e. the number of dimensions for the field, excluding time<field>: The variable/field name<frequency>: The frequency of the data<time_cell_method>: Function applied along the time axis, e.g.mean,max(optional)<datestamp>: Date for the data (optional)The optional portions are omitted where they are not applicable, e.g. variables without time aggregation, or
fxfiles with notimecomponent.Implementation
Certain portions of the filename need information from a couple of places:
frequencyum_version,dimension,field,frequency,time_cell_method,datestampThis means that the output file name cannot be fully defined until the iris cube for the data is available, i.e. in the
processfunction ofum2netcdf.py.DelayedCubePath
Prior to these changes the output paths were defined well before
processwas called in the Driver'sget_output_pathmethod. For single variable output files, the variable name crudely prepended to the output filename during process - things were implemented that way as a placeholder for a change such as this.To allow the output file name to be determined later get_output_path now returns a
Path(for multi-variable output files) or aDelayedCubePath(for single-variable output files) which has aresolve_cubemethod to allowprocessto resolve the remaining unknowns in the filename without having to know anything about which scenario calledprocess.Esm1p6DelayedCubePath, contains the functionality to resolved the filename as described aboveDelayedCubePathreplicates the previous behaviour of just prepending the output filename with the variable name.Changes
This PR messes with some of the nuts and bolts, I've attempted to isolate changes into separate commits.
revertedget_ff_datewas changed to use a datetime instead of a tuple to support refactoring I later decided not to do but a datetime seems neater so I left that change inDelayedCubePathwhich allows abstraction of not-fully-resolved paths that need info from Iris CubesStill to do
unpack_fieldsfileintest_esm1p6_filenames.py- should be shared withtest_single_field.pybut I was struggling to import it.