Energy Scenario Evaluation Dataset and Benchmark (Energy-EVA)

Updates!!

[2026-06-01]: Released Energy-EVA 2.0 — added a new day-ahead electricity price forecasting scene (3 sub-datasets); upgraded four third-party baselines (Chronos-2, Moirai-2.0-R-small, Toto-2.0-2.5B, TimesFM-2.5-200M); added EnergyTS V3.0.

Overview

The Energy Scenario Evaluation Dataset and Benchmark (Energy-EVA) is a dedicated evaluation standard for energy-domain applications. It currently focuses on zero-shot time series forecasting tasks such as renewable energy generation, power load, and electricity price forecasting. It provides a consistent framework and datasets for assessing how well models generalize to practical energy settings, and its flexible structure is designed to accommodate multi-modal tasks in the future. Going forward, Energy-EVA plans to cover more task types, including applications of energy-oriented large language models and visual tasks such as power grid inspection.

Key Features

Time series datasets tailored to energy and electricity applications
Customized evaluation metrics for energy and electricity forecasting
Performance comparison of leading open-source models (Moirai, Chronos, TiRex, Sundial, Toto, TimesFM) with our proprietary model EnergyTS V3.0

Scene Description

We provide evaluation benchmarks for four scenarios:

Univariate electricity load forecasting - Single-variable power consumption prediction with datetime information.
Photovoltaic power generation forecasting with meteorological covariates - Solar power prediction incorporating weather data and datetime information.
Wind power generation forecasting with meteorological covariates - Wind power prediction incorporating weather data and datetime information.
Day-ahead electricity price forecasting - Hourly day-ahead electricity price prediction with datetime information.

Each scenario contains several sub-datasets. All data is obtained from publicly accessible and traceable platforms, and the raw data is pre-processed and converted into standardized evaluation data files.

Dataset Description

Download link for Solar/Wind/Load

Download link for Price

Scene	Sub-dataset	Instances	Timesteps	Source
Solar	csg_forecast_competition	107	304,760	Link
Solar	mendeley	8	5,824	Link
Solar	pvod	59	142,085	Link
Solar	solete	10	29,184	Link
Wind	csg_forecast_competition	157	293,977	Link
Wind	mendeley	12	8,752	Link
Wind	europe_offshore_wind	1,160	10,168,560	Link
Load	aemo	40	117,256	Link
Load	entsoe	73	53,352	Link
Load	active_power_load	7	5,160	Link
Load	icsuci	64	90,930	Link
Price	R1_Sim	1	8,640	-
Price	R2_Sim	1	8,832	-
Price	R3_Sim	1	8,640	-

Leaderboard

See the evaluation details for comprehensive results. In each table below, models are sorted by avg_acc in descending order.

Solar Power Generation Forecasting

model_name	gmean_relative_error	avg_rank	avg_acc
EnergyTS_V3.0	0.4452	1.90	0.8296
chronos-2	0.4442	1.50	0.8274
timesfm2.5_xreg_early	0.5910	3.63	0.7938
toto_2.0_2.5B	0.7595	3.62	0.7074
tirex	0.9201	5.70	0.6724
moirai_2.0_R_small	0.9924	6.50	0.6555
sundial_base_128m	0.9749	6.30	0.6452
dummy_model	1.0000	6.85	0.6263

Wind Power Generation Forecasting

model_name	gmean_relative_error	avg_rank	avg_acc
EnergyTS_V3.0	0.0732	1.31	0.8292
chronos-2	0.2393	1.69	0.7263
timesfm2.5_xreg_early	0.3725	3.60	0.6134
tirex	0.6887	4.87	0.3619
sundial_base_128m	0.7046	5.98	0.3617
toto_2.0_2.5B	0.6797	4.60	0.3591
moirai_2.0_R_small	0.7065	5.96	0.3494
dummy_model	1.0000	8.00	0.0462

Power Load Forecasting

model_name	gmean_relative_error	avg_rank	avg_acc
EnergyTS_V3.0	0.5877	3.47	0.7071
toto_2.0_2.5B	0.6125	2.90	0.6938
chronos-2	0.6198	3.03	0.6921
moirai_2.0_R_small	0.6287	3.67	0.6894
timesfm2.5_xreg_early	0.7549	4.70	0.6725
sundial_base_128m	0.7569	5.12	0.6689
tirex	0.7766	5.40	0.6688
dummy_model	1.0000	7.72	0.6163

Day-ahead Electricity Price Forecasting

model_name	gmean_relative_error	avg_rank	avg_acc
EnergyTS_V3.0	0.5974	2.64	0.8874
chronos-2	0.7007	3.51	0.8699
timesfm2.5_xreg_early	0.7237	3.96	0.8478
toto_2.0_2.5B	0.8163	4.39	0.8447
tirex	0.8820	4.98	0.8381
moirai_2.0_R_small	0.9130	5.22	0.8321
sundial_base_128m	0.9590	5.59	0.8274
dummy_model	1.0000	5.71	0.8152
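This README does not define the leaderboard columns. Because dummy_model scores exactly 1.0000 in every table, gmean_relative_error is presumably each model's per-task error divided by dummy_model's error on the same task, combined with a geometric mean across tasks, and avg_rank is presumably the mean of per-task ranks. The sketch below assumes exactly that; the underlying error metric and the definition of avg_acc are not stated here, so they are left out, and the error values are made up purely for illustration.

```python
import math

def gmean_relative_error(model_errors, baseline_errors):
    """Geometric mean over tasks of (model error / baseline error); 1.0 means on par with the baseline."""
    if len(model_errors) != len(baseline_errors) or not model_errors:
        raise ValueError("need one positive error per task for both models")
    log_ratios = [math.log(m / b) for m, b in zip(model_errors, baseline_errors)]
    return math.exp(sum(log_ratios) / len(log_ratios))

def average_ranks(errors_by_model):
    """Rank models per task (1 = lowest error, ties share their mean rank) and average across tasks."""
    models = list(errors_by_model)
    n_tasks = len(errors_by_model[models[0]])
    totals = {m: 0.0 for m in models}
    for t in range(n_tasks):
        order = sorted(models, key=lambda m: errors_by_model[m][t])
        i = 0
        while i < len(order):
            # Extend j over the models whose error ties with order[i] on this task
            j = i
            while j + 1 < len(order) and errors_by_model[order[j + 1]][t] == errors_by_model[order[i]][t]:
                j += 1
            mean_rank = (i + j) / 2 + 1  # 0-based positions i..j -> 1-based mean rank
            for k in range(i, j + 1):
                totals[order[k]] += mean_rank
            i = j + 1
    return {m: totals[m] / n_tasks for m in models}

# Hypothetical per-task errors for two tasks; not taken from the leaderboard
errors = {
    "dummy_model": [2.0, 4.0],
    "model_a": [1.0, 4.0],
    "model_b": [1.0, 1.0],
}
for name, errs in errors.items():
    print(name, round(gmean_relative_error(errs, errors["dummy_model"]), 4))
print(average_ranks(errors))  # {'dummy_model': 2.75, 'model_a': 2.0, 'model_b': 1.25}
```

The geometric mean is used so that a model that halves the baseline error on one task and doubles it on another comes out at 1.0, rather than being dominated by the task with the largest ratio.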

Project Structure

├── Core # Shared modules and utilities
│   ├── Models
│   ├── Utils
│   └── __init__.py
├── LEGAL.md
├── LICENSE
├── README.md # This document
├── pyproject.toml # Project dependencies, managed with `uv`
├── time_series_portal # Main entry point for the time series benchmark
│   ├── __init__.py
│   ├── benchmark_tasks.py  # Loads the benchmark tasks for each scene
│   ├── config.py # Common configuration, e.g. the path where models are stored
│   ├── evaluation.py # Run this script to evaluate models
│   ├── evaluation_methods # Implemented evaluation methods
│   │   └── third_party_methods # v1 baselines (chronos/, moirai/, timesfm/, toto/, tirex/, sundial/) and v2 baselines (chronos2/, moirai2/, timesfm25/, toto2/)
│   ├── evaluation_utils  # Utilities for time series evaluation
│   ├── leaderboard_generate.py # Generates a customized leaderboard
│   └── visualize_multi_model_results.py  # Visualizes predictions from multiple models
└── uv.lock # Generated by `uv`

Energy-EVA uses fev for time series forecasting evaluation. It reads each dataset, splits every series into a Context segment (the history given to the model) and a Future segment (the values to be forecast and scored), and invokes the model on the Context segment for inference.
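To illustrate the Context/Future idea, here is a minimal rolling-origin split in plain Python. It is a hypothetical sketch: ForecastWindow and split_context_future are illustrative names, not part of Energy-EVA or fev, and fev's own task definitions (horizon, number of evaluation windows, covariate handling) are richer and may split differently.

```python
from dataclasses import dataclass
from typing import List, Sequence

@dataclass
class ForecastWindow:
    context: List[float]  # history the model sees
    future: List[float]   # held-out values the forecast is scored against

def split_context_future(series: Sequence[float], horizon: int, num_windows: int = 1) -> List[ForecastWindow]:
    """Rolling-origin split: the last num_windows * horizon points form consecutive Future segments."""
    if horizon <= 0 or num_windows <= 0:
        raise ValueError("horizon and num_windows must be positive")
    if len(series) <= horizon * num_windows:
        raise ValueError("series is too short for the requested windows")
    windows = []
    for w in range(num_windows, 0, -1):
        cutoff = len(series) - w * horizon  # forecast origin of this window
        windows.append(ForecastWindow(context=list(series[:cutoff]),
                                      future=list(series[cutoff:cutoff + horizon])))
    return windows

for window in split_context_future(list(range(10)), horizon=2, num_windows=2):
    print(len(window.context), window.future)
# 6 [6, 7]
# 8 [8, 9]
```

Each later window's Context includes the previous window's Future, so every forecast is made only from data that precedes its forecast origin.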

Custom models must subclass either ArchAdapter or CallableAdapter and implement the generate method. The resulting callable algorithm is then registered in the registry with @registry.register("algorithm_name").
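The decorator-based registration pattern can be sketched with a self-contained stand-in. Everything here is hypothetical: Registry, build, the stand-in CallableAdapter, and the generate(context, horizon) signature only illustrate the mechanism and are not Energy-EVA's actual classes; use the examples in third_party_methods for the real interface.

```python
from typing import Callable, Dict, List, Sequence

class Registry:
    """Hypothetical minimal registry mapping a name to an algorithm class."""

    def __init__(self) -> None:
        self._algorithms: Dict[str, type] = {}

    def register(self, name: str) -> Callable[[type], type]:
        # Returns a class decorator that records the class under `name`
        def decorator(cls: type) -> type:
            if name in self._algorithms:
                raise KeyError(f"algorithm {name!r} is already registered")
            self._algorithms[name] = cls
            return cls
        return decorator

    def build(self, name: str, **kwargs):
        # Look up the class by name and instantiate it
        return self._algorithms[name](**kwargs)

registry = Registry()

class CallableAdapter:
    """Stand-in base class only; Energy-EVA's real CallableAdapter may differ."""

    def generate(self, context: Sequence[float], horizon: int) -> List[float]:
        raise NotImplementedError

@registry.register("last_value")
class LastValueForecaster(CallableAdapter):
    def generate(self, context: Sequence[float], horizon: int) -> List[float]:
        # Repeat the most recent observation over the whole horizon
        return [float(context[-1])] * horizon

model = registry.build("last_value")
print(model.generate([1.0, 2.0, 3.0], horizon=2))  # [3.0, 3.0]
```

Registering by name lets a benchmark script pick a model from a plain string without importing each model class explicitly.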

The benchmarking process comprises the following components:

evaluation.py - Script for batch benchmarking
visualize_multi_model_results.py - Visualization of results from multiple models
leaderboard_generate.py - Leaderboard construction

Pipeline

Create a virtual environment:

# cd into Energy-EVA
pip install uv
uv venv .venv  # uv sync installs into the project's .venv by default
uv sync   # Install the basic requirements. For third-party models, add the packages they need (e.g., chronos-forecasting).
source .venv/bin/activate

Run the evaluation:

python time_series_portal/evaluation.py \
        --dataset_path PATH/TO/YOUR/DATASET/LOCATION \
        --target_path PATH/TO/YOUR/EVALUATION_RESULT/LOCATION \
        --scene wind load solar price \
        --model dummy_model

Run `python time_series_portal/evaluation.py --help` to see all configuration options.
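To evaluate several models one after another into the same result directory (for example, before generating a leaderboard), the command can be scripted. This is a hypothetical helper, not part of Energy-EVA. It uses only the flags shown above and assumes that it runs from the repository root inside the activated environment, that --model is given one name per call, and that model names match those in the leaderboard tables.

```python
import subprocess
import sys
from typing import List, Sequence

def build_eval_command(dataset_path: str, target_path: str, scenes: Sequence[str], model: str) -> List[str]:
    """Assemble one evaluation.py call from the flags shown in this README."""
    return [
        sys.executable, "time_series_portal/evaluation.py",
        "--dataset_path", dataset_path,
        "--target_path", target_path,
        "--scene", *scenes,
        "--model", model,
    ]

def evaluate_models(dataset_path: str, target_path: str, scenes: Sequence[str],
                    models: Sequence[str], dry_run: bool = True) -> List[List[str]]:
    """Evaluate each model in turn; with dry_run=True the commands are only printed."""
    commands = [build_eval_command(dataset_path, target_path, scenes, m) for m in models]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # raises on the first failing model
    return commands

evaluate_models("PATH/TO/YOUR/DATASET/LOCATION", "PATH/TO/YOUR/EVALUATION_RESULT/LOCATION",
                scenes=["load"], models=["dummy_model", "toto_2.0_2.5B"])
```

Passing the command as a list (rather than a shell string) avoids quoting problems with paths that contain spaces.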

Visualize the results of multiple models:

python time_series_portal/visualize_multi_model_results.py \
        --dataset_path PATH/TO/YOUR/DATASET/LOCATION \
        --target_path PATH/TO/YOUR/EVALUATION_RESULT/LOCATION \
        --scene load \
        --model dummy_model toto_2.0_2.5B

Run `python time_series_portal/visualize_multi_model_results.py --help` to see all configuration options.

Generate a leaderboard across multiple models:

python time_series_portal/leaderboard_generate.py \
        --source_path PATH/TO/YOUR/EVALUATION_RESULT/LOCATION \
        --target_path PATH/TO/YOUR/EVALUATION_RESULT/LOCATION/leaderboard \
        --select_column dataset_path

Evaluate proprietary algorithms

Examples are available in time_series_portal/evaluation_methods/third_party_methods. Implement your model in Core/Models/arch_adapter and invoke it via time_series_portal/evaluation_methods/adapter_methods.py.
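A custom model mainly needs a generate method that maps a context window to a forecast. The sketch below uses a seasonal-naive rule (repeat the last 24 values, which suits hourly data such as day-ahead prices) as a placeholder for a real model. The class name and the generate(context, horizon) signature are assumptions; check the examples in third_party_methods for the interface the adapters actually expect.

```python
from typing import List, Sequence

class SeasonalNaiveForecaster:
    """Hypothetical custom model: repeats the last observed season.

    In Energy-EVA it would subclass ArchAdapter or CallableAdapter and be
    registered with @registry.register(...); that wiring is omitted here.
    """

    def __init__(self, season_length: int = 24) -> None:
        if season_length <= 0:
            raise ValueError("season_length must be positive")
        self.season_length = season_length

    def generate(self, context: Sequence[float], horizon: int) -> List[float]:
        if not context:
            raise ValueError("context must not be empty")
        if len(context) < self.season_length:
            # Less than one full season of history: fall back to the last value
            return [float(context[-1])] * horizon
        last_season = list(context[-self.season_length:])
        # Cycle through the last season as many times as the horizon needs
        return [float(last_season[i % self.season_length]) for i in range(horizon)]

hourly_history = [float(h % 24) for h in range(48)]  # two synthetic days of hourly values
forecast = SeasonalNaiveForecaster(season_length=24).generate(hourly_history, horizon=24)
print(forecast[:3], forecast[-1])  # [0.0, 1.0, 2.0] 23.0
```

A seasonal-naive model is also a useful sanity check: a custom model that cannot beat it on hourly energy data likely has a bug in how it reads the context.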
