dmidk · leifdenby · May 14, 2025 · May 15, 2025 · Sep 18, 2025 · Sep 18, 2025
diff --git a/configurations/surface-dummy-model_DINI/.gitignore b/configurations/surface-dummy-model_DINI/.gitignore
@@ -0,0 +1,6 @@
+*.zip
+*.zarr/
+inference_artifact/
+*.yaml
+inference_workdir/
+.env
diff --git a/configurations/surface-dummy-model_DINI/Containerfile b/configurations/surface-dummy-model_DINI/Containerfile
@@ -5,6 +5,7 @@ WORKDIR /workspace
 COPY pyproject.toml .
 COPY *.yaml ./
 COPY entry.sh ./
+COPY src/ ./src
 
 # Download inference artifact from S3
 ARG DEFAULT_ARTIFACT="s3://mlwm-artifacts/inference-artifacts/surface-dummy-model_DINI.zip"

diff --git a/configurations/surface-dummy-model_DINI/DEVELOPING.md b/configurations/surface-dummy-model_DINI/DEVELOPING.md
@@ -0,0 +1,51 @@
+# Development notes
+
+## Local development
+
+- the image build currently only works on amd64 machines (i.e. not on macOS)
+
+- image build requires `aws` cli which can be retrieved from https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip
+
+- to load AWS credentials from `~/.aws/credentials` you can use the following shell functions (drop in e.g. `~/.bashrc`):
+
+```bash
+aws-load-creds() {
+  local profile=$1
+  if [[ -z "$profile" ]]; then
+    echo "❌ Usage: aws-load-creds <profile-name>"
+    return 1
+  fi
+
+  local access_key
+  local secret_key
+
+  access_key=$(aws configure get aws_access_key_id --profile "$profile" 2>/dev/null)
+  secret_key=$(aws configure get aws_secret_access_key --profile "$profile" 2>/dev/null)
+
+  if [[ -z "$access_key" || -z "$secret_key" ]]; then
+    echo "❌ The config profile '$profile' could not be found or is incomplete."
+    return 1
+  fi
+
+  export AWS_ACCESS_KEY_ID="$access_key"
+  export AWS_SECRET_ACCESS_KEY="$secret_key"
+
+  echo "✅ Loaded AWS credentials from profile: $profile"
+}
+
+aws-list-profiles() {
+  echo "📂 AWS profiles found:"
+  grep '^\[profile ' ~/.aws/config 2>/dev/null | sed 's/^\[profile //' | sed 's/\]//'
+  grep '^\[' ~/.aws/credentials 2>/dev/null | sed 's/^\[//' | sed 's/\]//'
+}
+```
+
+- to set the environment variables for `./entry.sh` you can use a `.env` file. E.g. to run with DINI forecast data you would use:
+
+```bash
+# .env
+ANALYSIS_TIME="2025-09-22T120000Z"
+DINI_ZARR="s3://harmonie-zarr/dini/control/${ANALYSIS_TIME}/single_levels.zarr/"
+DATASTORE_INPUT_PATHS="danra.danra_surface=${DINI_ZARR},danra.danra_static=${DINI_ZARR}"
+TIME_DIMENSIONS="time"
+```
diff --git a/configurations/surface-dummy-model_DINI/README.md b/configurations/surface-dummy-model_DINI/README.md
@@ -0,0 +1,83 @@
+# surface-dummy-model_DINI
+
+The model configuration in this directory is a dummy model that was trained on
+surface variables from DANRA, using only 10 days of data and only 10 training
+epochs. It is intended only as a demonstration of the inference pipeline and is
+expected to give very poor results.
+
+## Building image and running inference
+
+To build the image on "superjuice" (`27sj894.dmi.dk`) we need to set the AWS tokens to read the inference artifact and also use the local http proxy for pulling the base image:
+
+```bash
+export AWS_SECRET_ACCESS_KEY=<secret-key-to-read-inference-artifact>
+export AWS_ACCESS_KEY_ID=<access-key-to-read-inference-artifact>
+export MLWM_PULL_PROXY=http://squid1.dmi.dk:3128
+```
+
+
+
+## Upstream package change requirements
+
+Relative to the `main` branch on both github.com/mllam/mllam-data-prep and
+github.com/mllam/neural-lam, a number of pieces of functionality are currently
+required to run this configuration:
+
+**mllam-data-prep**:
+
+using branch `feat/inference-cli-args` on
+https://github.com/leifdenby/mllam-data-prep@feat/inference-cli-args, which adds:
+
+- functionality to invert datasets created by `mllam-data-prep` back to the
+  structure of the input datasets that were used. In the current
+  configuration this is used to restructure the forecast zarr dataset that
+  `neural-lam` outputs during inference back to the structure of the input
+  forecast dataset.
+
+  - also in separate branch and PR: https://github.com/leifdenby/mllam-data-prep/tree/feat/inverse-ops
+
+- use of cf-compliant encoding of `xarray/pandas` `MultiIndex` coordinates to
+  store stacked coordinates. This is required since `MultiIndex` coordinates
+  can't natively be stored in zarr/netcdf files, but fortunately `cf_xarray`
+  has implemented the cf-compliant way of handling this (see
+  https://cf-xarray.readthedocs.io/en/latest/coding.html)
+
+  - needs its own branch and PR
+
+- support for supplying statistics from the training dataset during creation of
+  the inference dataset, so that the inference dataset can be normalised in the
+  same way as the training dataset.
+
+  - needs its own branch and PR
+
+- support for selecting only a single value from a variable/coordinate in the
+  configuration. This is used to select only a single analysis time during
+  creation of the inference dataset.
+
+  - needs its own branch and PR
+
+
+**neural-lam**:
+
+using branch `dev/first-inference-image` on
+https://github.com/leifdenby/neural-lam/tree/dev/first-inference-image, which
+adds:
+
+- support for decoding cf-compliant `MultiIndex` encoded coordinates when reading
+  datasets produced with mllam-data-prep.
+
+  - this needs its own branch and PR, and needs to be implemented so datasets
+    made with previous versions of `mllam-data-prep` are still usable in `neural-lam`
+
+- support for writing output from inference (i.e. `--eval` mode) to a zarr
+  dataset. Needs to be merged after the multiindex decoding above.
+
+  - also in separate branch and PR: https://github.com/leifdenby/neural-lam/tree/feat/write-to-zarr
+
+- support for using forecast data in the mllam-data-prep datastore (`MDPDatastore`)
+
+  - needs its own branch and PR
+
+- make logging of validation steps optional in the training CLI (i.e. `--eval` mode)
+
+  - needs its own branch and PR
diff --git a/configurations/surface-dummy-model_DINI/datastore.yaml b/configurations/surface-dummy-model_DINI/datastore.yaml