
Zephyrus: An Agentic Framework for Weather Science


Accepted at the International Conference on Learning Representations, 2026.

Authors
Sumanth Varambally, Marshall Fisher, Jas Thakker, Yiwei Chen, Zhirui Xia, Yasaman Jafari, Ruijia Niu, Manas Jain, Veeramakali Vignesh Manivannan, Zachary Novack, Luyu Han, Srikar Eranky, Salva Rühling Cachay, Taylor Berg-Kirkpatrick, Duncan Watson-Parris, Yian Ma, and Rose Yu

Links
arXiv | ICLR 2026 Poster | Dataset

Zephyrus pairs an LLM with ZephyrusWorld, a weather-science execution environment that exposes WeatherBench 2 data, geolocation utilities, forecasting, simulation, and climatology tools through Python APIs. This repository contains the agent implementations, the code execution server, the benchmark/evaluation pipeline, and the task-generation code used in the paper.

Figure (from the paper): Zephyrus writes code, executes it against weather tools and datasets, observes the result, and iterates before answering.

Highlights

  • ZephyrusWorld unifies WeatherBench 2 access, Natural Earth geolocation, a Stormer forecaster, a JAX-based simulator, and climatology queries.
  • Zephyrus-Direct solves questions with one generated program; Zephyrus-Reflective uses a multi-turn execute-observe-refine loop.
  • ZephyrusBench contains 2,230 question-answer pairs across 49 weather-science tasks.
  • Zephyrus improves correctness over text-only baselines by up to 44.2 percentage points.
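The execute-observe-refine loop behind Zephyrus-Reflective can be sketched in a few lines. This is a hypothetical illustration, not the repository's actual agent code: `llm` and `execute` are stand-in callables, and the `FINAL:` answer convention is an assumption made here for the example.

```python
# Hypothetical sketch of an execute-observe-refine loop in the style of
# Zephyrus-Reflective. `llm` proposes code, `execute` runs it in a sandbox,
# and the loop feeds the observation back until the agent signals an answer.
def reflective_loop(llm, execute, question, max_turns=5):
    transcript = [f"Question: {question}"]
    for _ in range(max_turns):
        code = llm("\n".join(transcript))     # propose a program
        observation = execute(code)           # run it, capture the output
        transcript.append(f"Code:\n{code}\nObservation:\n{observation}")
        if observation.startswith("FINAL:"):  # assumed answer marker
            return observation[len("FINAL:"):].strip()
    return None  # no answer within the turn budget
```

Zephyrus-Direct corresponds to the one-turn case: a single generated program whose output is taken as the answer.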

Figure (from the paper): percentage of benchmark questions answered correctly across LLM backbones and model variants.

Code execution server

The FastAPI-based execution server distributes requests across workers and pooled weather tools.

Setup

1. Create the environment

curl -LsSf https://astral.sh/uv/install.sh | sh
uv venv .venv --python 3.11
source .venv/bin/activate
uv sync --active

2. Download Natural Earth

mkdir -p assets/NaturalEarth
cd assets/NaturalEarth
wget https://naciscdn.org/naturalearth/packages/natural_earth_vector.zip
unzip natural_earth_vector.zip
rm natural_earth_vector.zip
cd ../../

3. Prepare WeatherBench 2

Download the WeatherBench 2 ERA5 zarr used by the project and set wb2_path in configs/paths/default.yaml to its location. Note that this step requires gsutil (part of the Google Cloud SDK).

#!/bin/bash

DATA_DIR="./data/" # update to the desired path
DATASET="1959-2023_01_10-6h-240x121_equiangular_with_poles_conservative.zarr"
subdirs1=(
  10m_u_component_of_wind
  10m_v_component_of_wind
  2m_temperature
  geopotential
  geopotential_at_surface
  land_sea_mask
  latitude
  level
  longitude
  mean_sea_level_pressure
  soil_type
  specific_humidity
  surface_pressure
  temperature
  time
  u_component_of_wind
  v_component_of_wind
  mean_top_downward_short_wave_radiation_flux
)

DATA_DIR="$DATA_DIR/$DATASET"
mkdir -p "$DATA_DIR"
cd "$DATA_DIR"
gsutil -m cp -n \
  "gs://weatherbench2/datasets/era5/$DATASET/.zattrs" \
  "gs://weatherbench2/datasets/era5/$DATASET/.zgroup" \
  "gs://weatherbench2/datasets/era5/$DATASET/.zmetadata" \
  .
for subdir in "${subdirs1[@]}"; do
  echo "Downloading $subdir"
  gsutil -m cp -r -n "gs://weatherbench2/datasets/era5/$DATASET/$subdir" .
done

echo "Downloaded data to $DATA_DIR"

WeatherBench 2 is large. Plan for roughly 550 GB of free disk space.
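Once the copy finishes, a quick sanity check is to confirm that each variable's array directory landed under the store. This is a hedged sketch: it assumes the per-variable subdirectory layout that the script's gsutil copy produces, and lists only a few of the names from the script.

```python
from pathlib import Path

# Subset of the array names the download script copies; extend as needed.
EXPECTED = ["2m_temperature", "geopotential", "temperature", "time"]

def missing_arrays(zarr_root, expected=EXPECTED):
    """Return expected variable directories absent under the zarr store root."""
    root = Path(zarr_root)
    return [name for name in expected if not (root / name).is_dir()]
```

An empty return value means every listed variable is present; rerunning the script with `gsutil cp -n` (no-clobber) fills in whatever is missing.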

4. Optional helper assets

Install Stormer and fetch the checkpoint:

sh scripts/install_stormer.sh

To download the cache, review and adapt scripts/download_cache.sh before running it so its output directories match your local config, then run:

sh scripts/download_cache.sh

5. Update local paths and model endpoints

Before running the code, update these configs for your machine:

  • configs/paths/default.yaml: wb2_path, natural_earth_path, climatology_cache_dir, model_output_dir, model_output_cache_dir
  • configs/model/agent.yaml, configs/model/pal.yaml, configs/model/api_text_only_llm.yaml: base_url, api_llm_model
  • configs/server.yaml: gpu_pool, num_workers, port
  • configs/eval/evaluator.yaml: evaluator backend/model if you are not using the default OpenAI endpoint
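With the configs updated, a small check like this can catch path typos before a run. It is a hypothetical helper, not part of the repository: it assumes you have already parsed the path entries from configs/paths/default.yaml into a plain dict (e.g. with PyYAML or OmegaConf).

```python
from pathlib import Path

def unresolved_paths(cfg):
    """Given parsed path config entries, report any that do not exist on disk."""
    return {key: val for key, val in cfg.items() if not Path(val).exists()}
```

Anything returned here should be fixed in the YAML before starting the execution server.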

6. Set API keys

For OpenAI-compatible backends:

export OPENAI_API_KEY="<your-key>"

If your local or self-hosted OpenAI-compatible endpoint ignores auth, a dummy value is enough:

export OPENAI_API_KEY="none"

For Gemini-based configs:

export GOOGLE_API_KEY="<your-key>"

7. Set up the ZephyrusBench dataset

Download the ZephyrusBench dataset from this link, and update the dataset_path variable in configs/paths/default.yaml.

Make sure to move the simulation_outputs folder to cache/simulation_outputs.

Run Zephyrus

Start the execution server

python -m src.code_execution.server

The server uses configs/server.yaml and defaults to port 8000.

Run benchmark inference

The benchmark loader expects a directory containing one or more .json files. The benchmark can be downloaded from the Hugging Face link.
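The directory layout the loader expects can be sketched as follows. The per-file schema shown here (each .json file holding a list of question records) is an assumption for illustration; the real loader lives in the repository.

```python
import json
from pathlib import Path

def load_benchmark(bench_dir):
    """Collect question records from every .json file in a benchmark directory.

    Assumes each file contains a JSON list of records (hypothetical schema).
    """
    questions = []
    for path in sorted(Path(bench_dir).glob("*.json")):
        questions.extend(json.loads(path.read_text()))
    return questions
```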

Run the reflective agent:

python -m src.run model=agent

Run the single-shot PAL baseline (Zephyrus-Direct):

python -m src.run model=pal

Run the text-only baseline:

python -m src.run model=api_text_only_llm

To resume from cached predictions:

python -m src.run model=agent resume=true

The api_text_only_llm model does not require the execution server; agent and pal do.

Evaluate model outputs

python -m src.evaluate

Evaluation scans model_outputs/, writes per-run _processed.json files, and produces summary logs. The evaluator also uses an LLM backend for answer extraction/verification.
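As a rough picture of what a post-evaluation summary pass might look like, here is a hedged sketch that scans per-run _processed.json files and computes accuracy. The record schema (a boolean "correct" field per question) is an assumption made for this example; the real files may differ.

```python
import json
from pathlib import Path

def accuracy_by_run(output_dir):
    """Summarize accuracy per *_processed.json file (hypothetical schema)."""
    summary = {}
    for path in Path(output_dir).glob("*_processed.json"):
        records = json.loads(path.read_text())
        correct = sum(1 for r in records if r.get("correct"))
        summary[path.stem] = correct / len(records) if records else 0.0
    return summary
```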

Using Local Open Source Models with vLLM

Example vLLM server launch:

CUDA_VISIBLE_DEVICES=2,3 python -m vllm.entrypoints.openai.api_server \
  --model /data/qwen_models/qwen3-coder-30b \
  --dtype bfloat16 \
  --tensor-parallel-size 2 \
  --gpu-memory-utilization 0.85 \
  --port 8008

To plug a local model into Zephyrus:

  1. Point llm_client.base_url to http://localhost:8008/v1.
  2. Set llm_client.api_llm_model to the model name exposed by vLLM.
  3. Export OPENAI_API_KEY="none" if your local endpoint does not enforce auth.
  4. Start the Zephyrus code execution server in another terminal if you are using agent or pal.
  5. Run inference normally with python -m src.run model=....

This uses the same OpenAI-compatible client path as hosted endpoints, so no separate local-model code path is required.
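To illustrate that shared client path, here is a sketch of the request shape that both hosted and local OpenAI-compatible endpoints accept. The URL and model name are the example values from the steps above; the actual code routes requests through the configured llm_client rather than building payloads by hand.

```python
def chat_request(model, prompt, base_url="http://localhost:8008/v1"):
    """Build an OpenAI-compatible chat-completions request (illustrative)."""
    return {
        "url": f"{base_url}/chat/completions",
        "json": {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        },
    }
```

Because vLLM serves this same protocol, switching between a hosted endpoint and a local model is purely a matter of `base_url` and `model`.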

Generate benchmark data

If you want to regenerate task data instead of using the released dataset, first edit the config file configs/data/default.yaml. Then run:

python -m src.generate

Generated outputs are written under the configured save directories in configs/paths/default.yaml.

Example output:

Generating prompt for prompt id: HIuSnl question id: dvzBtE
The following data shows the global data over a period of 42 hours, sampled at an interval of 6 hours. {'variables': ['2m_temperature'], 'time_indices': [496474, 496480, 496486, 496492, 496498, 496504, 496510]} Based on the above data, answer the following question: What is the median 2m_temperature in Rîşcani,MD? Based on the provided data, the median 2m_temperature at Rîşcani,MD is 292.8622131347656.

Generating prompt for prompt id: HIuSnl question id: QaajpN
The following data shows the global data over a period of 42 hours, sampled at an interval of 6 hours. {'variables': ['2m_temperature'], 'time_indices': [523641, 523647, 523653, 523659, 523665, 523671, 523677]} Based on the above data, answer the following question: Which continent experienced the lowest 2m_temperature? Based on the provided data, Antarctica experienced the lowest 2m_temperature over the specified time-period, with the lowest 2m_temperature of 205.49594116210938.
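The answer to a question like the median example above boils down to a simple aggregation over the sampled time steps. This toy sketch uses synthetic temperature values in kelvin, not the real ERA5 data or the project's code:

```python
from statistics import median

# Synthetic 2m_temperature samples (kelvin) at one location across the
# 7 six-hourly time steps, standing in for the real gridded data.
samples_k = [291.9, 292.4, 292.9, 293.1, 292.8, 292.5, 292.0]
answer = median(samples_k)  # -> 292.5
```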

Citation

If you found our work useful, please consider citing our paper.

@inproceedings{
  varambally2026zephyrus,
  title={Zephyrus: An Agentic Framework for Weather Science},
  author={Sumanth Varambally and Marshall Fisher and Jas Thakker and Yiwei Chen and Zhirui Xia and Yasaman Jafari and Ruijia Niu and Manas Jain and Veeramakali Vignesh Manivannan and Zachary Novack and Luyu Han and Srikar Eranky and Salva R{\"u}hling Cachay and Taylor Berg-Kirkpatrick and Duncan Watson-Parris and Yian Ma and Rose Yu},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026},
  url={https://openreview.net/forum?id=aVeaNahsID}
}
