- [2026-06-01]: Released Energy-EVA 2.0 — added a new day-ahead electricity price forecasting scene (3 sub-datasets); upgraded four third-party baselines (Chronos-2, Moirai-2.0-R-small, Toto-2.0-2.5B, TimesFM-2.5-200M); added EnergyTS V3.0.
The Energy Scenario Evaluation Dataset and Benchmark (Energy-EVA) serves as a dedicated evaluation standard for applications in the energy domain, focusing currently on zero-shot time series forecasting tasks, including renewable energy production and industrial usage. It provides a consistent framework and datasets for assessing model generalization within practical energy environments, and features a versatile structure to facilitate the future incorporation of multi-modal tasks. Looking ahead, Energy-EVA plans to broaden its scope to encompass various task types, including applications for energy-oriented large language models and visual tasks for contexts such as power grid inspections.
- Time series datasets tailored for applications in energy and electricity
- Customized evaluation metrics for precise energy and electricity forecasting
- Performance comparison of leading open-source models (Moirai, Chronos, TiRex, Sundial, TOTO, TimesFM) alongside our proprietary model EnergyTS V3.0.
We provide evaluation benchmarks for four scenarios:
- Univariate electricity load forecasting - Single variable power consumption prediction with datetime information
- Photovoltaic power generation forecasting with meteorological covariates - Solar power prediction incorporating weather data and datetime information
- Wind power generation forecast with meteorological covariates - Wind power prediction incorporating weather data and datetime information.
- Day-ahead electricity price forecasting - Hourly day-ahead electricity price prediction with datetime information.
Every scenario encompasses several sub-datasets. All data is obtained from publicly accessible and traceable platforms. Using data pre-processing and processing methods, the initial raw data is converted to standardized evaluation data files.
Download link for Solar/Wind/Load
| Scene | Sub Dataset Name | Instance Num | Timestep Num | Source |
|---|---|---|---|---|
| Solar | csg_forecast_competition | 107 | 304,760 | Link |
| Solar | mendeley | 8 | 5,824 | Link |
| Solar | pvod | 59 | 142,085 | Link |
| Solar | solete | 10 | 29,184 | Link |
| Wind | csg_forecast_competition | 157 | 293,977 | Link |
| Wind | mendeley | 12 | 8,752 | Link |
| Wind | europe_offshore_wind | 1,160 | 10,168,560 | Link |
| Load | aemo | 40 | 117,256 | Link |
| Load | entsoe | 73 | 53,352 | Link |
| Load | active_power_load | 7 | 5,160 | Link |
| Load | icsuci | 64 | 90,930 | Link |
| Price | R1_Sim | 1 | 8,640 | - |
| Price | R2_Sim | 1 | 8,832 | - |
| Price | R3_Sim | 1 | 8,640 | - |
Access the evaluation detail to examine comprehensive information.
| model_name | gmean_relative_error | avg_rank | avg_acc |
|---|---|---|---|
| EnergyTS_V3.0 | 0.4452 | 1.90 | 0.8296 |
| chronos-2 | 0.4442 | 1.50 | 0.8274 |
| timesfm2.5_xreg_early | 0.5910 | 3.63 | 0.7938 |
| toto_2.0_2.5B | 0.7595 | 3.62 | 0.7074 |
| tirex | 0.9201 | 5.70 | 0.6724 |
| moirai_2.0_R_small | 0.9924 | 6.50 | 0.6555 |
| sundial_base_128m | 0.9749 | 6.30 | 0.6452 |
| dummy_model | 1.0000 | 6.85 | 0.6263 |
| model_name | gmean_relative_error | avg_rank | avg_acc |
|---|---|---|---|
| EnergyTS_V3.0 | 0.0732 | 1.31 | 0.8292 |
| chronos-2 | 0.2393 | 1.69 | 0.7263 |
| timesfm2.5_xreg_early | 0.3725 | 3.60 | 0.6134 |
| tirex | 0.6887 | 4.87 | 0.3619 |
| sundial_base_128m | 0.7046 | 5.98 | 0.3617 |
| toto_2.0_2.5B | 0.6797 | 4.60 | 0.3591 |
| moirai_2.0_R_small | 0.7065 | 5.96 | 0.3494 |
| dummy_model | 1.0000 | 8.00 | 0.0462 |
| model_name | gmean_relative_error | avg_rank | avg_acc |
|---|---|---|---|
| EnergyTS_V3.0 | 0.5877 | 3.47 | 0.7071 |
| toto_2.0_2.5B | 0.6125 | 2.90 | 0.6938 |
| chronos-2 | 0.6198 | 3.03 | 0.6921 |
| moirai_2.0_R_small | 0.6287 | 3.67 | 0.6894 |
| timesfm2.5_xreg_early | 0.7549 | 4.70 | 0.6725 |
| sundial_base_128m | 0.7569 | 5.12 | 0.6689 |
| tirex | 0.7766 | 5.40 | 0.6688 |
| dummy_model | 1.0000 | 7.72 | 0.6163 |
| model_name | gmean_relative_error | avg_rank | avg_acc |
|---|---|---|---|
| EnergyTS_V3.0 | 0.5974 | 2.64 | 0.8874 |
| chronos-2 | 0.7007 | 3.51 | 0.8699 |
| timesfm2.5_xreg_early | 0.7237 | 3.96 | 0.8478 |
| toto_2.0_2.5B | 0.8163 | 4.39 | 0.8447 |
| tirex | 0.8820 | 4.98 | 0.8381 |
| moirai_2.0_R_small | 0.9130 | 5.22 | 0.8321 |
| sundial_base_128m | 0.9590 | 5.59 | 0.8274 |
| dummy_model | 1.0000 | 5.71 | 0.8152 |
├── Core # Modules that are commonly used or various utilities
│ ├── Models
│ ├── Utils
│ └── __init__.py
├── LEGAL.md
├── LICENSE
├── README.md # This document
├── pyproject.toml # Project requirements with `uv`
├── time_series_portal # Primary entrance for time series benchmark
│ ├── __init__.py
│ ├── benchmark_tasks.py # Load difference scene benchmark tasks
│ ├── config.py # Several typical configurations, like the path for storing the model
│ ├── evaluation.py # Execute this for assessment
│ ├── evaluation_methods # Implemented evaluation methods
│ │ └── third_party_methods # Contains v1 baselines (chronos/, moirai/, timesfm/, toto/, tirex/, sundial/) and v2 baselines (chronos2/, moirai2/, timesfm25/, toto2/)
│ ├── evaluation_utils # Utilities for evaluating time series
│ ├── leaderboard_generate.py # Generate customized leaderboard
│ └── visualize_multi_model_results.py # Visualize predictions from multiple models
└── uv.lock # Generated by `uv`
Energy-EVA employs fev for time-series forecasting evaluation purposes. The system operates by interpreting datasets and transforming them into Context and Future segments to invoke models for inference.
Customized models must derive from either ArchAdapter or CallableAdapter and include the generate method. These callable algorithms are then registered within the registry using @registry.register("algorithm_name").
The benchmarking process comprises the following components:
evaluation.py- Script for batch benchmarkingvisualize_multi_model_results.py- Visualization of results from multiple modelsleaderboard_generate.py- Leaderboard construction
Create virtual environment:
# cd into Energy-EVA
pip install uv
uv venv . # Substitute this with the path to your specific virtual environment
uv sync # Install basic requirements. For third-party models, add necessary packages (e.g., chronos-forecasting).
source .venv/bin/activateRun evaluation:
python time_series_portal/evaluation.py \
--dataset_path PATH/TO/YOUR/DATASET/LOCATION \
--target_path PATH/TO/YOUR/EVALUATION_RESULT/LOCATION \
--scene wind load solar price \
--model dummy_model \Use python time_series_portal/evaluation.py --help to view more configuration options
Generate visualization of multi-models:
python time_series_portal/visualize_multi_model_results.py \
--dataset_path PATH/TO/YOUR/DATASET/LOCATION \
--target_path PATH/TO/YOUR/EVALUATION_RESULT/LOCATION \
--scene load \
--model dummy_model toto_2.0_2.5BUse python time_series_portal/visualize_multi_model_results.py --help to view more configuration options
Generate leaderboard among multi-models:
python time_series_portal/leaderboard_generate.py \
--source_path PATH/TO/YOUR/EVALUATION_RESULT/LOCATION \
--target_path PATH/TO/YOUR/EVALUATION_RESULT/LOCATION/leaderboard \
--select_column dataset_pathExamples are available in time_series_portal/evaluation_methods/third_party_methods. Implement your model in Core/Models/arch_adapter and invoke it via time_series_portal/evaluation_methods/adapter_methods.py.