6 changes: 6 additions & 0 deletions configurations/surface-dummy-model_DINI/.gitignore
@@ -0,0 +1,6 @@
*.zip
*.zarr/
inference_artifact/
*.yaml
inference_workdir/
.env
1 change: 1 addition & 0 deletions configurations/surface-dummy-model_DINI/Containerfile
@@ -5,6 +5,7 @@ WORKDIR /workspace
COPY pyproject.toml .
COPY *.yaml ./
COPY entry.sh ./
COPY src/ ./src

# Download inference artifact from S3
ARG DEFAULT_ARTIFACT="s3://mlwm-artifacts/inference-artifacts/surface-dummy-model_DINI.zip"
51 changes: 51 additions & 0 deletions configurations/surface-dummy-model_DINI/DEVELOPING.md
@@ -0,0 +1,51 @@
# Development notes

## Local development

- the image build currently only works on amd64 machines (i.e. not on macOS)

- the image build requires the `aws` CLI, which can be retrieved from https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip

- to load AWS credentials from `~/.aws/credentials` you can use the following script (drop it in e.g. `~/.bashrc`):

```bash
aws-load-creds() {
    local profile=$1
    if [[ -z "$profile" ]]; then
        echo "❌ Usage: aws-load-creds <profile-name>"
        return 1
    fi

    local access_key
    local secret_key

    access_key=$(aws configure get aws_access_key_id --profile "$profile" 2>/dev/null)
    secret_key=$(aws configure get aws_secret_access_key --profile "$profile" 2>/dev/null)

    if [[ -z "$access_key" || -z "$secret_key" ]]; then
        echo "❌ The config profile '$profile' could not be found or is incomplete."
        return 1
    fi

    export AWS_ACCESS_KEY_ID="$access_key"
    export AWS_SECRET_ACCESS_KEY="$secret_key"

    echo "✅ Loaded AWS credentials from profile: $profile"
}

aws-list-profiles() {
    echo "📂 AWS profiles found:"
    grep '^\[profile ' ~/.aws/config 2>/dev/null | sed 's/^\[profile //' | sed 's/\]//'
    grep '^\[' ~/.aws/credentials 2>/dev/null | sed 's/^\[//' | sed 's/\]//'
}
```

- to set the environment variables for `./entry.sh` you can use a `.env` file. E.g. to run with DINI forecast data you would use:

```bash
# .env
ANALYSIS_TIME="2025-09-22T120000Z"
DINI_ZARR="s3://harmonie-zarr/dini/control/${ANALYSIS_TIME}/single_levels.zarr/"
DATASTORE_INPUT_PATHS="danra.danra_surface=${DINI_ZARR},danra.danra_static=${DINI_ZARR}"
TIME_DIMENSIONS="time"
```
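
One way to get the variables from such a `.env` file into the environment of `./entry.sh` is to source the file with auto-export switched on. The sketch below assumes the file contains only plain shell assignments (as in the example above), so that `${ANALYSIS_TIME}` is expanded by the shell when the file is read. The helper name `load_env_file` is hypothetical and not part of this repository.

```bash
# Hypothetical helper (not part of this repo): export every variable that is
# assigned in a dotenv-style file. Assumes the file holds only plain shell
# assignments, since the file is executed by the shell when sourced.
load_env_file() {
    local env_file="${1:-.env}"
    local rc=0
    if [ ! -f "$env_file" ]; then
        echo "env file not found: $env_file" >&2
        return 1
    fi
    set -a                  # auto-export every variable assigned from here on
    . "$env_file" || rc=$?  # read the assignments; ${VAR} references are expanded
    set +a                  # switch auto-export off again
    return "$rc"
}

# Usage (not executed here):
#   load_env_file .env && ./entry.sh
```

Because the file is executed as shell code, this approach should only be used with `.env` files you trust.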
83 changes: 83 additions & 0 deletions configurations/surface-dummy-model_DINI/README.md
@@ -0,0 +1,83 @@
# surface-dummy-model_DINI

The model configuration in this directory is a dummy model trained on surface
variables from DANRA, using only 10 days of data and only 10 epochs of
training. It is intended only as a demonstration of the inference pipeline and
is expected to give very poor results.

## Building image and running inference

To build the image on "superjuice" (`27sj894.dmi.dk`), set the AWS credentials needed to read the inference artifact and point to the local HTTP proxy used for pulling the base image:

```bash
export AWS_SECRET_ACCESS_KEY=<secret-key-to-read-inference-artifact>
export AWS_ACCESS_KEY_ID=<access-key-to-read-inference-artifact>
export MLWM_PULL_PROXY=http://squid1.dmi.dk:3128
```
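
Before starting a build it can help to confirm that these three variables are actually set. The following is a minimal sketch of such a check. The function name `check_build_env` is hypothetical, and the assumption that all three variables are required applies to the "superjuice" setup described here (other hosts may not need the proxy).

```bash
# Hypothetical pre-flight check (not part of this repo): report which of the
# variables named in this README are unset or empty.
check_build_env() {
    local missing=""
    [ -n "${AWS_ACCESS_KEY_ID:-}" ]     || missing="$missing AWS_ACCESS_KEY_ID"
    [ -n "${AWS_SECRET_ACCESS_KEY:-}" ] || missing="$missing AWS_SECRET_ACCESS_KEY"
    [ -n "${MLWM_PULL_PROXY:-}" ]       || missing="$missing MLWM_PULL_PROXY"
    if [ -n "$missing" ]; then
        echo "Missing environment variables:$missing" >&2
        return 1
    fi
}

# Usage (not executed here):
#   check_build_env && echo "environment looks complete"
```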



## Upstream package change requirements

Relative to the `main` branch on both github.com/mllam/mllam-data-prep and
github.com/mllam/neural-lam, a number of pieces of functionality are currently
required to run this configuration:

**mllam-data-prep**:

using branch `feat/inference-cli-args` on
https://github.com/leifdenby/mllam-data-prep@feat/inference-cli-args, which adds:

- functionality to invert datasets created by `mllam-data-prep` back to the
  structure of the input datasets that were used. In the current
  configuration this is used to restructure the forecast zarr dataset that
  `neural-lam` outputs during inference back to the structure of the input
  forecast dataset.

  - also in separate branch and PR: https://github.com/leifdenby/mllam-data-prep/tree/feat/inverse-ops

- use of cf-compliant encoding of `xarray/pandas` `MultiIndex` coordinates to
  store stacked coordinates. This is required since `MultiIndex` coordinates
  can't natively be stored in zarr/netcdf files, but fortunately `cf_xarray`
  has implemented the cf-compliant way of handling this (see
  https://cf-xarray.readthedocs.io/en/latest/coding.html)

  - needs its own branch and PR

- support for supplying statistics from the training dataset during creation of
  the inference dataset, so that the inference dataset can be normalised in the
  same way as the training dataset.

  - needs its own branch and PR

- support for selecting only a single value from a variable/coordinate in the
  configuration. This is used to select only a single analysis time during
  creation of the inference dataset.

  - needs its own branch and PR


**neural-lam**:

using branch `dev/first-inference-image` on
https://github.com/leifdenby/neural-lam/tree/dev/first-inference-image, which
adds:

- support for decoding cf-compliant `MultiIndex` encoded coordinates when reading
  datasets produced with mllam-data-prep.

  - this needs its own branch and PR, and needs to be implemented so datasets
    made with previous versions of `mllam-data-prep` are still usable in `neural-lam`

- support for writing output from inference (i.e. `--eval` mode) to a zarr
  dataset. Needs to be merged after the multiindex decoding above.

  - also in separate branch and PR: https://github.com/leifdenby/neural-lam/tree/feat/write-to-zarr

- support for using forecast data in the mllam-data-prep datastore (`MDPDatastore`)

  - needs its own branch and PR

- make logging of validation steps optional in the training CLI (i.e. `--eval` mode)

  - needs its own branch and PR
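
To try the two development branches named in this README outside the container image, they could be installed straight from the forks with pip's `git+https://` requirement syntax. This is only a sketch: the helper `pip_git_spec` is hypothetical, and whether the image itself installs the packages this way (rather than via `pyproject.toml`) is not stated here.

```bash
# Hypothetical helper (not part of this repo): build a pip VCS requirement
# string for a GitHub repository ("owner/name") and a branch name.
pip_git_spec() {
    local repo="$1"
    local branch="$2"
    echo "git+https://github.com/${repo}.git@${branch}"
}

# Usage (not executed here, requires network access):
#   pip install "$(pip_git_spec leifdenby/mllam-data-prep feat/inference-cli-args)"
#   pip install "$(pip_git_spec leifdenby/neural-lam dev/first-inference-image)"
```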
165 changes: 0 additions & 165 deletions configurations/surface-dummy-model_DINI/datastore.yaml

This file was deleted.
