Merged
30 commits
d9e8767
Added draft of splitnc and a rudimentary readme
joshuatorrance Apr 17, 2026
9f44d2b
Mild improvement to readme
joshuatorrance Apr 20, 2026
9264aee
Fleshed out README
joshuatorrance Apr 21, 2026
cff24ed
Added option to use command line option file instead of command line …
joshuatorrance Apr 21, 2026
2bfad3a
Now that files argument is option need to error out if no files provided
joshuatorrance Apr 22, 2026
1e33405
Removed diagnostic print
joshuatorrance Apr 22, 2026
224d8d4
Added decoder to avoid xarray warning about cftime, removed escape de…
joshuatorrance Apr 22, 2026
2b796de
Added basic test for esm1.6 files
joshuatorrance Apr 22, 2026
8fd3a8d
Added a simple test for automatic field determination.
joshuatorrance Apr 23, 2026
a39ce17
Added some explanatory docstrings, updated regex examples from .* to .+
joshuatorrance Apr 23, 2026
3cabe29
Moved pytest.ini into pyproject.yaml, attempt at ci
joshuatorrance Apr 23, 2026
3aa565d
Added a minimum version to xarray due to cftime usage.
joshuatorrance Apr 23, 2026
7906560
Removed py3.9
joshuatorrance Apr 23, 2026
95c4e7a
Missed committing the readme with the previous example regex change. …
joshuatorrance Apr 23, 2026
f24afc9
Resolve shared-vars before field-vars. Added test for that failure, t…
joshuatorrance Apr 29, 2026
70eb99d
Apply suggestions from code review
joshuatorrance Apr 29, 2026
138a486
Added currently failing tests that check for renaming cell_methods & …
joshuatorrance Apr 29, 2026
459b266
Fixed bug in tests, renaming now updates cell_methods and coordinates…
joshuatorrance Apr 29, 2026
f5cb1ad
Fixed bug in renaming when coord==None, added test case for this case…
joshuatorrance Apr 30, 2026
f64c9d7
Fixed some comments
joshuatorrance Apr 30, 2026
577a1da
Now split coordinates on any whitespace. Added test case for this too
joshuatorrance Apr 30, 2026
ea7a5d8
Update splitnc/splitnc.py
joshuatorrance Apr 30, 2026
0800cee
Added updating of history attr and option to disable this new feature
joshuatorrance May 1, 2026
e913ccd
Added option to exclude-variable from output. Updated tests and readm…
joshuatorrance May 1, 2026
bff2f8c
Update to README and argparse help string
joshuatorrance May 1, 2026
9e9f741
Tests now also test with cmdline files. Tweaked handling of commandli…
joshuatorrance May 1, 2026
83cc246
Update splitnc/splitnc.py
joshuatorrance May 4, 2026
7eb9c78
Cleaning up a lost comment and whitespace
joshuatorrance May 4, 2026
be2d868
Tests now check excluded vars are excluded
joshuatorrance May 4, 2026
8a2f57c
Now using Counter when determining field vars. Suggested by @aidanhee…
joshuatorrance May 4, 2026
34 changes: 34 additions & 0 deletions .github/workflows/ci.yaml
@@ -0,0 +1,34 @@
name: CI

on:
  pull_request:
  push:
    branches:
      - main
  workflow_dispatch:

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.10", "3.11", "3.12", "3.13"]
        script-dir: ["splitnc"]

    steps:
      - name: Install netcdf-bin
        run: sudo apt-get install -y netcdf-bin

      - uses: actions/checkout@v4

      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
          cache: pip

      - name: Install dependencies
        run: cd ${{ matrix.script-dir }}; pip install .[test]

      - name: Run tests
        run: cd ${{ matrix.script-dir }}; python -m pytest
112 changes: 112 additions & 0 deletions splitnc/README.md
@@ -0,0 +1,112 @@
# splitnc
This script splits multi-field netCDF files into single-field files.
It is designed to work on ESM1.6's atmosphere and ice files.

## Automatic Field Identification
By default, `splitnc` attempts to identify the fields in a multi-field netCDF file by looking for variables that no other variables depend on.
A variable that no others depend on is likely to be a field.
E.g. many variables depend on `time`, but none depend on `sea_surface_temperature`.
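
The idea of "variables that no other variables depend on" can be sketched as follows. This is a minimal illustration, not the script's actual implementation: the function name `find_field_vars`, the plain-dict input, and the variable names are all assumptions made for the example. A real file would supply the dependencies from each variable's dimensions, coordinates, and bounds.

```python
from collections import Counter


def find_field_vars(dependencies):
    """Return the variables that no other variable depends on.

    `dependencies` maps each variable name to the names it refers to
    (hypothetical input; e.g. its dimensions, coordinates, and bounds).
    """
    # Count how many *other* variables refer to each name.
    dependents = Counter(
        dep
        for name, deps in dependencies.items()
        for dep in set(deps)
        if dep != name
    )
    # A variable with zero dependents is treated as a field.
    return [name for name in dependencies if dependents[name] == 0]


# Hypothetical layout: two fields sharing time/lat/lon coordinates.
example = {
    "time": ["time", "time_bnds"],
    "time_bnds": ["time"],
    "lat": ["lat"],
    "lon": ["lon"],
    "sea_surface_temperature": ["time", "lat", "lon"],
    "air_temperature": ["time", "lat", "lon"],
}

fields = find_field_vars(example)
```

Here `time`, `lat`, and `lon` each have dependents, so only the two temperature variables are reported as fields.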

Alternatively, the fields to separate into individual files can be specified as a comma-separated list with the `--field-vars` command line option.
`--field-vars` interprets each item as a regex, e.g. one could use `--field-vars fld_.+` to match all variable names that start with the string `fld_`.
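
A sketch of how a comma-separated list of regex patterns might be applied to variable names is shown here. It assumes each pattern must match the whole variable name (`re.fullmatch`); the README does not state which matching mode the script uses, and `select_vars` and the variable names are hypothetical.

```python
import re


def select_vars(patterns_arg, var_names):
    """Return the names matched by any pattern in a comma-separated list."""
    # Splitting on commas means a pattern cannot itself contain a comma.
    patterns = [re.compile(p) for p in patterns_arg.split(",") if p]
    # Assumption: a pattern must match the entire variable name.
    return [n for n in var_names if any(p.fullmatch(n) for p in patterns)]


# Hypothetical variable names for illustration.
names = ["fld_s00i024", "fld_s03i236", "time", "lat"]
selected = select_vars("fld_.+", names)
```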

## "Ancillary" Variables

Some variables with no dependents should not be separated into individual files; these variables must be manually identified with the `--shared-vars` command line option.
These variables will then be present in every output file.
Regex is also supported for this option.

If there are ancillary fields that should be present in only some of the output field files, then multiple invocations of `splitnc` using `--field-vars` and `--shared-vars` will be required.

Examples of these variables are `latitude_longitude`, found in atmosphere files, and the `uarea`, `tmask`, `tarea`, `VGRDb`, `VGRDi`, and `VGRDs` variables from ice files.

## Config File

The `-c`/`--command-line-file` option can be used to supply the path to a file that contains command line options.
If this option is used, all other options supplied on the command line will be ignored.
Newline characters in the file will be treated as whitespace, i.e. newlines can be used as well as spaces to separate command line arguments.

For example, to replicate this command line,
```
python splitnc.py --verbose --overwrite --output-dir /output/directory --shared-vars latitude_longitude --rename-regex "(?P<newname>.+)_\d+" /input/directory/*.nc
```
the following file could be used:
```
--verbose
--overwrite
--output-dir /output/directory
--shared-vars latitude_longitude
--rename-regex "(?P<newname>.+)_\d+"
/input/directory/*.nc
```
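
One way such a file could be turned into an argument list is sketched here with the standard library's `shlex.split`, which treats newlines as whitespace and honours quoting. This is an assumption about the approach, not a description of the script's code, and `read_command_line_file` is a hypothetical name. Note that no shell is involved, so a glob such as `*.nc` stays a literal string unless it is expanded separately.

```python
import shlex


def read_command_line_file(text):
    """Split the text of an options file into a list of arguments."""
    # shlex treats spaces and newlines alike and strips the quotes.
    return shlex.split(text)


file_text = r'''--verbose
--overwrite
--output-dir /output/directory
--shared-vars latitude_longitude
--rename-regex "(?P<newname>.+)_\d+"
/input/directory/*.nc
'''

args = read_command_line_file(file_text)
```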

## Command Line Options

```quote
usage: splitnc [-h] [--field-vars FIELD_VAR1,FIELD_VAR2,...] [--shared-vars SHARED_VAR1,SHARED_VAR2,...]
               [--output-name-pattern OUTPUT_NAME_PATTERN] [--rename-regex REGEX] [--output-dir OUTPUT_DIR] [--overwrite] [-v]
               [-c COMMAND_LINE_FILE]
               [filepaths ...]

Splits a multi-field netCDF file into separate one-field files

positional arguments:
  filepaths             One or more filepaths to process

options:
  -h, --help            show this help message and exit
  --field-vars FIELD_VAR1,FIELD_VAR2,...
                        Specify the names of the field variables to split into separate files - dimensions, bounds, and
                        coordinates of these fields will be included in each file. Disables automatic field variable
                        identification. Regex patterns can be used here.
  --shared-vars SHARED_VAR1,SHARED_VAR2,...
                        Specify the names of variables that should be shared across files that cannot be automatically
                        identified, as a comma separated list. Regex patterns can be used here.
  --excluded-vars EXCLUDED_VAR1,EXCLUDED_VAR2,...
                        Specify the names of variables that should be excluded from files. This option can be used with
                        automatic identification of field variables. Regex patterns can be used here.
  --rename-regex REGEX  Look for duplicated coordinate names that match the given regex and rename them to the first
                        "newname" capture group in the regex. E.g. "(?P<newname>.*)_\d+" will match "time_0" and rename
                        it to "time".
  --output-dir OUTPUT_DIR
                        Output directory for the processed files. If not given output files will be placed in the same
                        directory as the original file.
  --overwrite           Overwrite existing files
  --dont-update-history
                        Disable automatic update of history attribute
  -v, --verbose
  -c COMMAND_LINE_FILE, --command-line-file COMMAND_LINE_FILE
                        A file containing a list of command-line arguments. Newlines in this file will be ignored. If
                        supplied all other command line arguments will be ignored.
```

## Example Usage

`splitnc` only needs the `xarray` and `netCDF4` Python modules.
On Gadi, load any module that provides `xarray`, such as `conda/analysis3`.
Alternatively, create a new Python environment and install `xarray` and `netCDF4`.

### Atmosphere
To use this script to split multi-field atmosphere files from ACCESS-ESM1.6:
```bash
python splitnc.py --shared-vars latitude_longitude --rename-regex "(?P<newname>.+)_\\d+" $INPUT_DIR/*.nc
```

`splitnc` will automatically determine which variables are fields by looking at which variables depend on other variables.
Variables with nothing depending on them are deemed to be fields.
Alternatively one could use `--field-vars fld_.+` to match the variable names in these files.

The `--rename-regex` option with the supplied regex will rename variables like
`time_0` or `pseudo_level_0` to `time` or `pseudo_level`.
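
The renaming rule can be illustrated with Python's `re` module and the `newname` named group. This is a sketch under assumptions: `build_renames` is a hypothetical helper, the names are invented for the example, and whole-name matching via `re.fullmatch` is assumed rather than confirmed by the README.

```python
import re


def build_renames(names, rename_regex):
    """Map each matching name to the text captured by the 'newname' group."""
    pattern = re.compile(rename_regex)
    renames = {}
    for name in names:
        # Assumption: the regex must match the entire name.
        match = pattern.fullmatch(name)
        if match:
            renames[name] = match.group("newname")
    return renames


# Hypothetical names; "fld_s00i024" does not end in "_<digits>", so it is kept.
renames = build_renames(
    ["time_0", "pseudo_level_0", "lat", "fld_s00i024"],
    r"(?P<newname>.+)_\d+",
)
```

Only names ending in an underscore followed by digits are affected, so `lat` and `fld_s00i024` are left unchanged in this example.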

The `--shared-vars` option will ensure that the variable `latitude_longitude` is
included in all files even though none of the field variables depend on it.

### Ice
To use this script to split multi-field ice files from ACCESS-ESM1.6:
```bash
python splitnc.py --shared-vars uarea,tmask,tarea --excluded-vars VGRD. $INPUT_DIR/*.nc
```

In comparison to the atmosphere files, ice files have different shared-vars and there are no duplicated variables that require renaming.
The variables `VGRDb`, `VGRDi`, and `VGRDs` are not required and can thus be excluded from the output.
33 changes: 33 additions & 0 deletions splitnc/pyproject.toml
@@ -0,0 +1,33 @@
[project]
name = "splitnc"
version = "v0.1"
authors = [
    {name = "Joshua Torrance", email="joshua.torrance@anu.edu.au"},
]
maintainers = [
    {name = "Joshua Torrance", email="joshua.torrance@anu.edu.au"},
]
description = "Split multi-field ESM1.6 files into single-field files"
readme = "README.md"
requires-python = ">=3.9"
dependencies = [
    "netcdf4",
    "xarray",
]

[project.optional-dependencies]
test = [
    "pytest",
    "netCDF4",
    "xarray>2025.1.2",
]

[build-system]
build-backend = "setuptools.build_meta"
requires = [
    "setuptools",
]

[tool.pytest.ini_options]
testpaths = "test"
pythonpath = "."