Accepted at the International Conference on Learning Representations, 2026.
Authors
Sumanth Varambally, Marshall Fisher, Jas Thakker, Yiwei Chen, Zhirui Xia, Yasaman Jafari, Ruijia Niu, Manas Jain, Veeramakali Vignesh Manivannan, Zachary Novack, Luyu Han, Srikar Eranky, Salva Rühling Cachay, Taylor Berg-Kirkpatrick, Duncan Watson-Parris, Yian Ma, and Rose Yu
Links
arXiv | ICLR 2026 Poster | Dataset
Zephyrus pairs an LLM with ZephyrusWorld, a weather-science execution environment that exposes WeatherBench 2 data, geolocation utilities, forecasting, simulation, and climatology tools through Python APIs. This repository contains the agent implementations, the code execution server, the benchmark/evaluation pipeline, and the task-generation code used in the paper.
Figure from the paper: Zephyrus writes code, executes it against weather tools and datasets, observes the result, and iterates before answering.
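The write-execute-observe-iterate cycle in the figure can be sketched as a short loop. This is an illustrative sketch only: the `llm` and `execute` callables and the `FINAL:` stop marker are hypothetical stand-ins, not the repository's actual interfaces.

```python
def reflective_loop(llm, execute, question, max_turns=5):
    """Sketch of a multi-turn execute-observe-refine loop.

    `llm` maps the transcript so far to a generated program; `execute`
    runs that program against the weather tools and returns the
    observation as text. Both callables and the "FINAL:" answer
    convention are assumptions for illustration.
    """
    transcript = [question]
    for _ in range(max_turns):
        code = llm("\n".join(transcript))     # generate a candidate program
        observation = execute(code)           # run it, observe the result
        transcript += [code, observation]
        if observation.startswith("FINAL:"):  # the agent decided to answer
            return observation[len("FINAL:"):].strip()
    return transcript[-1]  # fall back to the last observation
```

Zephyrus-Direct is the degenerate case of this loop with a single turn and no observation feedback.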
- **ZephyrusWorld** unifies WeatherBench 2 access, Natural Earth geolocation, a Stormer forecaster, a JAX-based simulator, and climatology queries.
- **Zephyrus-Direct** solves questions with one generated program; **Zephyrus-Reflective** uses a multi-turn execute-observe-refine loop.
- **ZephyrusBench** contains 2,230 question-answer pairs across 49 weather-science tasks.
- Zephyrus improves correctness over text-only baselines by up to 44.2 percentage points.
Percentage of benchmark questions answered correctly across LLM backbones and model variants.
The FastAPI-based execution server distributes requests across workers and pooled weather tools.
curl -LsSf https://astral.sh/uv/install.sh | sh
uv venv .venv --python 3.11
source .venv/bin/activate
uv sync --active

Download the Natural Earth vector data:

mkdir -p assets/NaturalEarth
cd assets/NaturalEarth
wget https://naciscdn.org/naturalearth/packages/natural_earth_vector.zip
unzip natural_earth_vector.zip
rm natural_earth_vector.zip
cd ../../

Download the WeatherBench 2 ERA5 zarr used by the project and point wb2_path in configs/paths/default.yaml to it. Note that this step requires gsutil to be installed.
#!/bin/bash
DATA_DIR="./data/" # update to the desired path
DATASET="1959-2023_01_10-6h-240x121_equiangular_with_poles_conservative.zarr"
subdirs1=(
    10m_u_component_of_wind
    10m_v_component_of_wind
    2m_temperature
    geopotential
    geopotential_at_surface
    land_sea_mask
    latitude
    level
    longitude
    mean_sea_level_pressure
    soil_type
    specific_humidity
    surface_pressure
    temperature
    time
    u_component_of_wind
    v_component_of_wind
    mean_top_downward_short_wave_radiation_flux
)
DATA_DIR="$DATA_DIR/$DATASET"
mkdir -p "$DATA_DIR"
cd "$DATA_DIR"
gsutil -m cp -n \
    "gs://weatherbench2/datasets/era5/$DATASET/.zattrs" \
    "gs://weatherbench2/datasets/era5/$DATASET/.zgroup" \
    "gs://weatherbench2/datasets/era5/$DATASET/.zmetadata" \
    .
for subdir in "${subdirs1[@]}"; do
    echo "Downloading $subdir"
    gsutil -m cp -r -n "gs://weatherbench2/datasets/era5/$DATASET/$subdir" .
done
echo "Downloaded data to $DATA_DIR"

WeatherBench 2 is large. Plan for roughly 550 GB of free disk space.
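Because gsutil cp -n skips files that already exist, the script above can be re-run to resume an interrupted download. A quick way to check progress is to total the store's on-disk size; this helper is a convenience sketch, not part of the repository:

```python
import os

def store_size_gb(path):
    """Total on-disk size of a directory tree (e.g. the zarr store), in GB."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total / 1e9

# Expect roughly 550 GB once the full ERA5 store is present, e.g.:
# print(store_size_gb("./data/1959-2023_01_10-6h-240x121_equiangular_with_poles_conservative.zarr"))
```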
Install Stormer and fetch the checkpoint:
sh scripts/install_stormer.sh

To download the cache, review and adapt scripts/download_cache.sh before running it so its output directories match your local config, then run:
sh scripts/download_cache.sh

Before running the code, update these configs for your machine:
- configs/paths/default.yaml: wb2_path, natural_earth_path, climatology_cache_dir, model_output_dir, model_output_cache_dir
- configs/model/agent.yaml, configs/model/pal.yaml, configs/model/api_text_only_llm.yaml: base_url, api_llm_model
- configs/server.yaml: gpu_pool, num_workers, port
- configs/eval/evaluator.yaml: evaluator backend/model if you are not using the default OpenAI endpoint
For OpenAI-compatible backends:
export OPENAI_API_KEY="<your-key>"

If your local or self-hosted OpenAI-compatible endpoint ignores auth, a dummy value is enough:
export OPENAI_API_KEY="none"

For Gemini-based configs:
export GOOGLE_API_KEY="<your-key>"

Download the ZephyrusBench dataset from this link, and update the dataset_path variable in configs/paths/default.yaml.
Make sure to move the simulation_outputs folder to cache/simulation_outputs.
python -m src.code_execution.server

The server uses configs/server.yaml and defaults to port 8000.
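For a quick smoke test you can POST code to the running server from Python. The /execute route and the request field names below are assumptions for illustration; check src/code_execution/server.py and configs/server.yaml for the actual schema and port.

```python
import json

def build_execution_request(code, timeout_s=60):
    """Serialize a code-execution request body as JSON bytes.

    The "code" and "timeout" field names are illustrative guesses --
    consult the server implementation for the real request schema.
    """
    return json.dumps({"code": code, "timeout": timeout_s}).encode("utf-8")

# Sending it to the server (default port 8000; "/execute" is a hypothetical route):
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:8000/execute",
#     data=build_execution_request("print(1 + 1)"),
#     headers={"Content-Type": "application/json"},
# )
# print(urllib.request.urlopen(req).read().decode())
```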
The benchmark loader expects a directory containing one or more .json files. The benchmark can be downloaded from the Hugging Face link.
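A minimal loader matching that layout might look like the following. It assumes each .json file holds a JSON list of question records; check the released ZephyrusBench files for the actual schema.

```python
import json
from pathlib import Path

def load_benchmark(benchmark_dir):
    """Concatenate the records from every .json file in a benchmark directory.

    Assumes each file contains a JSON list of question records (an
    assumption -- inspect the released files for the real structure).
    """
    questions = []
    for path in sorted(Path(benchmark_dir).glob("*.json")):
        questions.extend(json.loads(path.read_text()))
    return questions
```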
Run the reflective agent:
python -m src.run model=agent

Run the single-shot PAL baseline (Zephyrus-Direct):
python -m src.run model=pal

Run the text-only baseline:
python -m src.run model=api_text_only_llm

To resume from cached predictions:
python -m src.run model=agent resume=true

api_text_only_llm does not need the execution server; agent and pal do.
python -m src.evaluate

Evaluation scans model_outputs/, writes per-run _processed.json files, and produces summary logs. The evaluator also uses an LLM backend for answer extraction/verification.
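If you want to post-process the per-run files yourself, a tiny summarizer could look like this. The "correct" field name is an assumption for illustration; inspect the _processed.json files the evaluator actually writes for the key it uses.

```python
import json
from pathlib import Path

def summarize_run(processed_path):
    """Fraction of questions judged correct in one _processed.json file.

    Assumes the file is a JSON list of records with a boolean "correct"
    field -- a guess at the schema, not the evaluator's documented format.
    """
    records = json.loads(Path(processed_path).read_text())
    return sum(bool(r.get("correct")) for r in records) / len(records)
```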
Example vLLM server launch:
CUDA_VISIBLE_DEVICES=2,3 python -m vllm.entrypoints.openai.api_server \
    --model /data/qwen_models/qwen3-coder-30b \
    --dtype bfloat16 \
    --tensor-parallel-size 2 \
    --gpu-memory-utilization 0.85 \
    --port 8008

To plug a local model into Zephyrus:
- Point llm_client.base_url to http://localhost:8008/v1.
- Set llm_client.api_llm_model to the model name exposed by vLLM.
- Export OPENAI_API_KEY="none" if your local endpoint does not enforce auth.
- Start the Zephyrus code execution server in another terminal if you are using agent or pal.
- Run inference normally with python -m src.run model=....
This uses the same OpenAI-compatible client path as hosted endpoints, so no separate local-model code path is required.
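Concretely, any client that speaks the OpenAI chat-completions protocol will work against the vLLM server launched above. The helper below builds a standard /v1/chat/completions request body with the stdlib only; the model name passed in should match whatever name vLLM serves (by default, the --model path).

```python
import json

def chat_request_body(model, prompt):
    """Build a standard OpenAI-style /v1/chat/completions request body."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")

# POSTing to the local vLLM server (port 8008 as launched above):
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:8008/v1/chat/completions",
#     data=chat_request_body("/data/qwen_models/qwen3-coder-30b", "Hello"),
#     headers={"Content-Type": "application/json",
#              "Authorization": "Bearer none"},
# )
# print(urllib.request.urlopen(req).read().decode())
```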
If you want to regenerate task data instead of using the released dataset, first edit configs/data/default.yaml. Then run:
python -m src.generate

Generated outputs are written under the configured save directories in configs/paths/default.yaml.
Example output:
Generating prompt for prompt id: HIuSnl question id: dvzBtE
The following data shows the global data over a period of 42 hours, sampled at an interval of 6 hours. {'variables': ['2m_temperature'], 'time_indices': [496474, 496480, 496486, 496492, 496498, 496504, 496510]} Based on the above data, answer the following question: What is the median 2m_temperature in Rîşcani,MD? Based on the provided data, the median 2m_temperature at Rîşcani,MD is 292.8622131347656.
Generating prompt for prompt id: HIuSnl question id: QaajpN
The following data shows the global data over a period of 42 hours, sampled at an interval of 6 hours. {'variables': ['2m_temperature'], 'time_indices': [523641, 523647, 523653, 523659, 523665, 523671, 523677]} Based on the above data, answer the following question: Which continent experienced the lowest 2m_temperature? Based on the provided data, Antarctica experienced the lowest 2m_temperature over the specified time-period, with the lowest 2m_temperature of 205.49594116210938.
If you found our work useful, please consider citing our paper.
@inproceedings{
varambally2026zephyrus,
title={Zephyrus: An Agentic Framework for Weather Science},
author={Sumanth Varambally and Marshall Fisher and Jas Thakker and Yiwei Chen and Zhirui Xia and Yasaman Jafari and Ruijia Niu and Manas Jain and Veeramakali Vignesh Manivannan and Zachary Novack and Luyu Han and Srikar Eranky and Salva R{\"u}hling Cachay and Taylor Berg-Kirkpatrick and Duncan Watson-Parris and Yian Ma and Rose Yu},
booktitle={The Fourteenth International Conference on Learning Representations},
year={2026},
url={https://openreview.net/forum?id=aVeaNahsID}
}

