dmidk · mfroelund · Nov 27, 2025 · Nov 27, 2025
diff --git a/README.md b/README.md
@@ -1,106 +1,164 @@
 LDCast is a precipitation nowcasting model based on a latent diffusion model (LDM, used by e.g. [Stable Diffusion](https://github.com/CompVis/stable-diffusion)).
 
-This repository contains the code for using LDCast to make predictions and the code used to generate the analysis in the LDCast paper (a preprint is available at https://arxiv.org/abs/2304.12891).
-
-A GPU is recommended for both using and training LDCast, although you may be able to generate some samples with a CPU and enough patience.
+This repository contains the code for using LDCast to make predictions. The code is reworked from https://github.com/MeteoSwiss/ldcast.
 
 # Installation
 
-It is recommended you install the code in its own virtual environment (created with e.g. pyenv or conda).
-
-Clone the repository, then, in the main directory, run
+The package uses the `uv` package manager to handle dependencies. First, install `uv` by executing
 ```bash
-$ pip install -e .
+curl -LsSf https://astral.sh/uv/install.sh | sh
 ```
-This should automatically install the required packages (which might take some minutes). In the paper, we used PyTorch 11.2 but are not aware of any problems with newer versions.
-
-If you don't want the requirements to be installed (e.g. if you installed them manually with conda), use:
+Then, clone this repository and install the dependencies by executing
 ```bash
-$ pip install --no-dependencies -e .
+git clone git@github.com:dmidk/ldcast-dmi.git && cd ldcast-dmi
+uv sync --all-extras
 ```
 
 # Using LDCast
 
-## Pretrained models
+The package defines a command line interface, which can be inspected by executing
+```bash
+uv run ldcast - --help
+```
+Output:
+```
+NAME
+    ldcast - Main CLI class.
+
+SYNOPSIS
+    ldcast - GROUP | COMMAND | VALUE
 
-The pretrained models are available at the Zenodo repository https://doi.org/10.5281/zenodo.7780914. Unzip the file `ldcast-models.zip`. The default is to unzip it to the `models` directory, but you can also use another location.
+DESCRIPTION
+    Main CLI class.
 
-## Producing predictions
+GROUPS
+    GROUP is one of the following:
 
-The easiest way to produce predictions is to use the `ldcast.forecast.Forecast` class, which will set up all models and data transformations and is callable with a past precipitation array.
-```python
-from ldcast import forecast
+     train
+       Cli setup for executing training.
 
-fc = forecast.Forecast(
-    ldm_weights_fn=ldm_weights_fn, autoenc_weights_fn=autoenc_weights_fn
-)
-R_pred = fc(R_past)
-```
-Here, `ldm_weights_fn` is the path to the LDM weights and `autoenc_weights_fn` is the path to the autoencoder weights. `R_past` is a NumPy array of precipitation rates with shape `(timesteps, height, width)` where `timesteps` must be 4 and `height` and `width` must be divisible by 32.
+     visualize
+       Cli setup for visualization.
 
-### Ensemble predictions
+COMMANDS
+    COMMAND is one of the following:
 
-If want to process multiple cases at once and/or generate several ensemble members, there is the `ldcast.forecast.ForecastDistributed` class. The usage is similar to the `Forecast` class, for example:
-```python
-from ldcast import forecast
+     forecast
+       Cli entry point for running forecast without training.
 
-fc = forecast.ForecastDistributed(
-    ldm_weights_fn=ldm_weights_fn, autoenc_weights_fn=autoenc_weights_fn
-)
-R_pred = fc(R_past, ensemble_members=32)
-```
-Here, `R_past` should be of shape `(cases, timesteps, height, width)` where `cases` is the number of cases you want to process. For each case, `ensemble_members` predictions are produced (this is the last axis of `R_pred`). `ForecastDistributed` automatically distributes the workload to multiple GPUs if you have them.
+     sample
+       Cli entry point for running sampling without training.
 
-## Demo
+VALUES
+    VALUE is one of the following:
 
-For a practical example, you can run the demo in the `scripts` directory. First download the `ldcast-demo-20210622.zip` file from the [Zenodo repository](https://doi.org/10.5281/zenodo.7780914), then unzip it in the `data` directory. Then run
-```bash
-$ python forecast_demo.py
+     config
 ```
-A sample output can be found in the file `ldcast-demo-video-20210622.zip` in the data repository. See the function `forecast_demo` in `forecast_demo.py` see how the `Forecast` class works. To run an ensemble mean of 8 members using the `ForecastDistributed` class, you can use:
+
+To show the available commands of the groups, execute e.g.
 ```bash
-$ python forecast_demo.py --ensemble-members=8
+uv run ldcast train --help
+```
+Output:
 ```
+NAME
+    ldcast train - Cli setup for executing training.
 
-The demo for a single ensemble member runs in a couple of minutes on our system using one V100 GPU; with a CPU around 10 minutes or more would be expected. A progress bar will show the status of the generation.
+SYNOPSIS
+    ldcast train COMMAND | VALUE
 
-# Training 
+DESCRIPTION
+    Cli setup for executing training.
 
-## Training data
+COMMANDS
+    COMMAND is one of the following:
 
-The preprocessed training data, needed to rerun the LDCast training, can be found at the [Zenodo repository](https://doi.org/10.5281/zenodo.7780914). Unzip the `ldcast-datasets.zip` file to the `data` directory.
+     all
+       Execute all training pipelines.
 
-## Training the autoencoder
+     autoenc
+       Execute the autoencoder training pipeline.
 
-In the `scripts` directory, run
-```bash
-$ python train_autoenc.py --model_dir="../models/autoenc_train"
-```
-to run the training of the autoencoder with the default parameters. The training checkpoints will be saved in the `../models/autoenc_train` directory (feel free to change this).
+     genforecast
+       Execute the genforecast training pipeline.
 
-It has been reported that this training may encounter a condition where the loss goes to `nan`. If this happens, try restarting from the latest checkpoint:
-```bash
-$ python train_autoenc.py --model_dir="../models/autoenc_train" --ckpt_path="../models/autoenc_train/<checkpoint_file>"
-```
-where `<checkpoint_file>` should be the latest checkpoint in the `../models/autoenc_train/` directory.
+VALUES
+    VALUE is one of the following:
 
-## Training the diffusion model
+     config
 
-In the `scripts` directory, run
-```bash
-$ python train_genforecast.py --model_dir="../models/genforecast_train"
+     num_nodes
+
+     save_model
 ```
-to run the training of the diffusion model with the default parameters, or
+
+As an example, to train the autoencoder one would execute
 ```bash
-$ python train_genforecast.py --model_dir="../models/genforecast_train" --config=<path_to_config_file>
+uv run ldcast train autoenc --config path/to/config.yaml --save_model --num_nodes 1
 ```
-to run the training with different parameters. Some config files can be found in the `config` directory. The training checkpoints will be saved in the `../models/genforecast_train` directory (again, this can be changed freely).
 
-# Evaluation
+# Configuration
+The package specifies configurations in a YAML file.
+An example configuration file can be found at `./example_config.yaml`.
+The configuration file is parsed using the [pydantic](https://pydantic-docs.helpmanual.io/) library.
+The configuration is structured into four main sections: `general`, `datasets`, `preprocessing`, and `models`. The `general` and `datasets` sections are required for all commands.
+The `preprocessing` section is optional, while the `models` section must specify at least one model configuration (i.e. `autoenc`, `genforecast`, or `forecast`).
+In addition to various model parameters, each model section specifies which input datasets/weights it needs and which output datasets/weights it produces.
 
-You can find scripts for evaluating models in the `scripts` directory:
-* `eval_genforecast.py` to evaluate LDCast
-* `eval_dgmr.py` to evaluate DGMR (requires tensorflow installation and the DGMR model from https://github.com/deepmind/deepmind-research/tree/master/nowcasting placed in the `models/dgmr` directory)
-* `eval_pysteps.py` to evaluate PySTEPS (requires pysteps installation)
-* `metrics.py` to produce metrics from the evaluation results produced with the functions in scripts above
-* `plot_genforecast.py` to make plots from the results generated
+# Package structure
+The main structure of the package is outlined below:
+```
+.
+├── DataPreprocessing               # Scripts for splitting radar data into patches
+│   ├── merge_datasets.py
+│   ├── radarForML.py
+│   ├── radarToZarrML.py
+│   └── settings/
+├── ldcast
+│   ├── analysis/                   # Scripts for analyzing model performance
+│   ├── features                    # Various common features used across models
+│   │   ├── radar/
+│   │   ├── data_handling.py        # Main PyTorch data-handling module
+│   │   ├── debug.py
+│   │   ├── io.py
+│   │   ├── patches.py
+│   │   ├── re_patch.py             # Script for re-patching data into custom new patch sizes 
+│   │   ├── sampling.py             # Sampling of dataset, e.g. EqualFrequencySampler
+│   │   ├── split.py                # Script for splitting data into train/val/test sets
+│   │   ├── transform.py
+│   │   └── utils.py
+│   ├── models                      # Main model implementations
+│   │   ├── autoenc
+│   │   │   ├── autoenc.py
+│   │   │   ├── encoder.py
+│   │   │   ├── __init__.py         # Main entry point to run autoencoder
+│   │   │   └── training.py
+│   │   ├── benchmarks/
+│   │   ├── blocks/
+│   │   ├── diffusion/
+│   │   ├── genforecast
+│   │   │   ├── analysis.py
+│   │   │   ├── __init__.py         # Main entry point to run genforecast
+│   │   │   ├── training.py
+│   │   │   └── unet.py
+│   │   ├── nowcast/
+│   │   ├── distributions.py
+│   │   ├── forecast.py             # Main entry point to run forecast
+│   │   └── utils.py
+│   ├── visualization/
+│   ├── __init__.py
+│   ├── __main__.py
+│   ├── cli.py                      # Defines the command line interface
+│   └── config_parser.py            # Configuration parser using pydantic
+├── test/
+├── LICENSE
+├── README.md
+├── leonardo_config.yaml            # Example configuration for running on Leonardo
+├── example_config.yaml             # Example configuration to start with
+├── pyproject.toml
+├── run_autoenc.sh                  # Bash script to run training of the autoencoder on Leonardo
+├── run_forecast.sh                 # Bash script to run forecasting on Leonardo
+├── run_genforecast.sh              # Bash script to run training of the genforecast model on Leonardo    
+├── run_sampler.sh                  # Bash script to run data sampling on Leonardo
+└── uv.lock
+```
diff --git a/ldcast/cli.py b/ldcast/cli.py
@@ -97,9 +97,7 @@ class TrainCLI(object):
     def __init__(self, config: cp.Config, save_model: bool, num_nodes: int):
         self.config = config
         self.save_model = save_model
-        self.sampling_grp = SamplingCLI(config)
-        self.autoenc_grp = AutoencCLI(config, save_model, num_nodes)
-        self.genforecast_grp = GenforecastCLI(config, save_model, num_nodes)
+        self.num_nodes = num_nodes
 
     def autoenc(self):
         """Execute the autoencoder training pipeline.
@@ -110,17 +108,17 @@ def autoenc(self):
         specified configurations.
         """
 
-        self.autoenc_grp.train()
+        AutoencCLI(self.config, self.save_model, self.num_nodes).train()
 
     def genforecast(self):
         """Execute the genforecast training pipeline."""
-        self.genforecast_grp.train()
+        GenforecastCLI(self.config, self.save_model, self.num_nodes).train()
 
     def all(self):
         """Execute all training pipelines."""
-        self.sampling_grp.run()
-        self.autoenc_grp.train()
-        self.genforecast_grp.train()
+        SamplingCLI(self.config).run()
+        AutoencCLI(self.config, self.save_model, self.num_nodes).train()
+        GenforecastCLI(self.config, self.save_model, self.num_nodes).train()
 
 
 class SamplingCLI(object):
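
A note on the `TrainCLI` change above: constructing the sub-CLIs inside each command, rather than in `__init__`, means that only the pipeline actually requested gets built. The following is a minimal, self-contained sketch of that pattern. `Pipeline`, `EagerCLI`, `LazyCLI`, and `built_by` are hypothetical stand-ins, not classes from `ldcast`, and the only assumption is that constructing a pipeline object has some cost or side effect worth avoiding.

```python
"""Sketch: eager vs. lazy construction of sub-command objects.

All names here are hypothetical stand-ins, not part of ldcast.
"""

constructed = []  # records which pipelines were built, for illustration only


class Pipeline:
    """Stand-in for a sub-CLI such as an autoencoder or genforecast runner."""

    def __init__(self, name: str, num_nodes: int):
        constructed.append(name)  # assumed side effect of construction
        self.name = name
        self.num_nodes = num_nodes

    def train(self) -> str:
        return f"trained {self.name} on {self.num_nodes} node(s)"


class EagerCLI:
    """Builds every pipeline up front, as in the removed lines of the diff."""

    def __init__(self, num_nodes: int):
        self.autoenc_grp = Pipeline("autoenc", num_nodes)
        self.genforecast_grp = Pipeline("genforecast", num_nodes)

    def autoenc(self) -> str:
        return self.autoenc_grp.train()


class LazyCLI:
    """Stores only the arguments and builds a pipeline when it is requested."""

    def __init__(self, num_nodes: int):
        self.num_nodes = num_nodes

    def autoenc(self) -> str:
        return Pipeline("autoenc", self.num_nodes).train()


def built_by(cli_cls) -> list:
    """Run the `autoenc` command and report which pipelines were constructed."""
    constructed.clear()
    cli_cls(num_nodes=1).autoenc()
    return list(constructed)


if __name__ == "__main__":
    print("eager:", built_by(EagerCLI))
    print("lazy: ", built_by(LazyCLI))
```

In this sketch the eager variant constructs both pipelines even though only `autoenc` is run, while the lazy variant constructs just the one it needs.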