VBVR-EvalKit

The official evaluation toolkit for Very Big Video Reasoning (VBVR). Unified inference and evaluation across 37 video generation models.

37 Models: Commercial APIs (Luma, Veo, Kling, Sora, Runway) and open-source models (LTX-Video, LTX-2, HunyuanVideo, SVD, WAN, CogVideoX, and more)
VBVR-Bench: 100+ rule-based evaluators with deterministic 0–1 scores and no API calls
Coming Soon: Human evaluation (Gradio) and VLM-as-a-Judge (GPT-4o, InternVL, Qwen3-VL)

Quick Start

# Install
git clone https://github.com/Video-Reason/VBVR-EvalKit.git && cd VBVR-EvalKit
python -m venv venv && source venv/bin/activate
pip install -e .

# Set up a model
bash setup/install_model.sh --model svd --validate

# Inference
python examples/generate_videos.py --questions-dir setup/test_assets/ --output-dir ./outputs --model svd

# Evaluation (VBVR-Bench)
python examples/score_videos.py --inference-dir ./outputs

Evaluation

VBVR-Bench matches each task to a rule-based evaluator by the generator name in the directory path. Each evaluator needs the generated video and its reference data side by side in the same run directory:

{model}/{generator_name}/{task_type}/{task_id}/{run_id}/
    ├── video/output.mp4          # generated video
    └── question/                 # reference data
        ├── first_frame.png
        ├── final_frame.png
        ├── prompt.txt
        └── ground_truth.mp4     # optional
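
Before scoring, it can help to confirm that every run directory follows the layout above. The following is a minimal sketch, not part of VBVR-EvalKit: `find_incomplete_runs` is a hypothetical helper, and it assumes that the directory passed as `--inference-dir` contains the `{model}` folders directly, so run directories sit exactly five levels below it. The optional `ground_truth.mp4` is not checked.

```python
from pathlib import Path

# Required reference files named in the layout; ground_truth.mp4 is optional.
REQUIRED_QUESTION_FILES = ("first_frame.png", "final_frame.png", "prompt.txt")


def find_incomplete_runs(inference_dir):
    """Map each incomplete run directory to the list of files it is missing.

    Hypothetical helper (an assumption, not a VBVR-EvalKit API). Assumes the
    structure {model}/{generator_name}/{task_type}/{task_id}/{run_id}/ directly
    under inference_dir.
    """
    problems = {}
    for run_dir in sorted(Path(inference_dir).glob("*/*/*/*/*")):
        if not run_dir.is_dir():
            continue
        missing = []
        if not (run_dir / "video" / "output.mp4").is_file():
            missing.append("video/output.mp4")
        for name in REQUIRED_QUESTION_FILES:
            if not (run_dir / "question" / name).is_file():
                missing.append(f"question/{name}")
        if missing:
            # Report paths relative to inference_dir with forward slashes.
            problems[run_dir.relative_to(inference_dir).as_posix()] = missing
    return problems
```

Calling `find_incomplete_runs("./outputs")` returns an empty dictionary when every run has its video and all three required reference files.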

python examples/score_videos.py --inference-dir ./outputs           # task_specific score only
python examples/score_videos.py --inference-dir ./outputs --full-score  # all 5 dimensions

See docs/En/SCORING.md for the full end-to-end workflow, scoring dimensions, output format, and CLI reference.

API Keys (Inference Only)

cp env.template .env
# LUMA_API_KEY=... OPENAI_API_KEY=... GEMINI_API_KEY=... KLING_API_KEY=... RUNWAYML_API_SECRET=...
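
To see which keys are still unset after copying the template, a small check like the one below may be useful. It is a sketch under stated assumptions, not part of the toolkit: `missing_keys` and `read_env_keys` are hypothetical helpers, the key names are taken from the comment above, and the parser only handles plain `KEY=VALUE` lines. Only the keys for the commercial APIs you actually call need a value.

```python
from pathlib import Path

# Key names copied from the env.template comment above.
KNOWN_KEYS = (
    "LUMA_API_KEY",
    "OPENAI_API_KEY",
    "GEMINI_API_KEY",
    "KLING_API_KEY",
    "RUNWAYML_API_SECRET",
)


def read_env_keys(env_path=".env"):
    """Return the set of keys that have a non-empty value in a KEY=VALUE file.

    Hypothetical helper: ignores blank lines, comments, and lines without '='.
    """
    found = set()
    path = Path(env_path)
    if not path.is_file():
        return found
    for line in path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Treat empty or quote-only values as unset.
        if value.strip().strip("'\""):
            found.add(key.strip())
    return found


def missing_keys(env_path=".env"):
    """List the known key names that are absent or empty in env_path."""
    present = read_env_keys(env_path)
    return [key for key in KNOWN_KEYS if key not in present]
```

`missing_keys()` returns all five names when `.env` does not exist, and an empty list once every key has a value.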

Docs

Topic	Link
Scoring (VBVR-Bench)	docs/SCORING.md
Inference	docs/INFERENCE.md
Supported Models	docs/MODELS.md
Adding Models	docs/ADDING_MODELS.md
End-to-End Workflow	docs/DATA_GENERATOR.md
FAQ	docs/FAQ.md

Links

Website: Video-Reason.com
Paper: A Very Big Video Reasoning Suite
Slack: Join our workspace
HuggingFace: Video-Reason
Contact: hokinxqdeng@gmail.com

Citation

If you use VBVR in your research, please cite:

@article{vbvr2026,
  title   = {A Very Big Video Reasoning Suite},
  author  = {Wang, Maijunxian and Wang, Ruisi and Lin, Juyi and Ji, Ran and
             Wiedemer, Thadd{\"a}us and Gao, Qingying and Luo, Dezhi and
             Qian, Yaoyao and Huang, Lianyu and Hong, Zelong and Ge, Jiahui and
             Ma, Qianli and He, Hang and Zhou, Yifan and Guo, Lingzi and
             Mei, Lantao and Li, Jiachen and Xing, Hanwen and Zhao, Tianqi and
             Yu, Fengyuan and Xiao, Weihang and Jiao, Yizheng and
             Hou, Jianheng and Zhang, Danyang and Xu, Pengcheng and
             Zhong, Boyang and Zhao, Zehong and Fang, Gaoyun and Kitaoka, John and
             Xu, Yile and Xu, Hua and Blacutt, Kenton and Nguyen, Tin and
             Song, Siyuan and Sun, Haoran and Wen, Shaoyue and He, Linyang and
             Wang, Runming and Wang, Yanzhi and Yang, Mengyue and Ma, Ziqiao and
             Milli{\`e}re, Rapha{\"e}l and Shi, Freda and Vasconcelos, Nuno and
             Khashabi, Daniel and Yuille, Alan and Du, Yilun and Liu, Ziming and
             Lin, Dahua and Liu, Ziwei and Kumar, Vikash and Li, Yijiang and
             Yang, Lei and Cai, Zhongang and Deng, Hokin},
  journal = {arXiv preprint arXiv:2602.20159},
  year    = {2026},
  url     = {https://arxiv.org/abs/2602.20159}
}

License

Apache 2.0
