Uni-World VLA: Interleaved World Modeling and Planning for Autonomous Driving

Qiqi Liu^1,2,3*, Huan Xu³*, Jingyu Li^1,2,3, Bin Sun³†, Zhihui Hao³†, Dangen She³, Xiatian Zhu⁴, Li Zhang^1,2‡

¹Fudan University; ²Shanghai Innovation Institute; ³Li Auto Inc.; ⁴University of Surrey

* equal contribution; † project leader; ‡ corresponding author

📰 News

[2026-04] Paper released on arXiv.

🔭 Project Overview

Uni-World VLA is a unified Vision-Language-Action model for autonomous driving that performs interleaved world modeling and planning. It jointly predicts future visual observations and ego trajectories in a single autoregressive sequence, tightly coupling world understanding with planning under temporal causality.

💡 Key Features

Interleaved world modeling and planning: alternates future frame prediction and ego action/trajectory generation step-by-step, forming a closed-loop interaction that keeps planning conditioned on imagined observations.
Unified autoregressive VLA formulation: generates visual tokens and action queries in a single sequence, tightly coupling prediction and control under temporal causality.
Depth integration for geometric cues: augments historical frames with monocular depth maps and fuses geometry features via cross-attention to improve long-horizon scene prediction.

📊 Results

Table 1. Closed-loop planning results on NAVSIM.

Table 2. World modeling / prediction results on NAVSIM.

Visualization.

🛠️ Installation

1. Clone the repository

git clone --recurse-submodules https://github.com/LogosRoboticsGroup/UniWorldVLA.git
cd UniWorldVLA

Note: The DA3 (Depth Anything 3) component is included as a git submodule. If you cloned without --recurse-submodules, run:
git submodule update --init --recursive

2. Create environment

conda create -n uniworld python=3.10 -y
conda activate uniworld

3. Install dependencies

pip install torch==2.2.1 torchvision==0.17.1 --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt

Full pinned environment (including all transitive deps) is available in environment.yaml for exact reproducibility:
conda env create -f environment.yaml

📦 Pretrained Weights & Data

Model weights are released at SII-Rigby/UniWorldVLA on HuggingFace.

See SETUP.md for detailed instructions on:

Downloading backbone model weights (Show-O, Phi-1.5, MagViT-v2, I3D)
Downloading our released checkpoints (VQ tokenizer, pre-trained model, SFT checkpoint)
Preparing the NAVSIM dataset
Setting up the DA3 depth model

⚙️ Configuration

All paths are configured in configs/sft_navsim/navsim.yaml. Before running, set the base_root field to the absolute path of this repository on your machine:

experiment:
    base_root: '/path/to/UniWorldVLA'   # <-- change this

All other paths in the config are derived from base_root via OmegaConf interpolation. See SETUP.md for the expected directory layout.

🚀 Training

Single node, 8 GPUs

cd UniWorldVLA
bash scripts/finetune/navsim/run_sft_navsim_baseline8.sh

The script launches training/fine-tune_navsim.py via accelerate with the DeepSpeed ZeRO-2 config in accelerate_configs/8_gpus_deepspeed_zero2.yaml.

Key training options are controlled via environment variables (see training/fine-tune_navsim.py → configure_experiment_from_env):

Variable	Default	Description
`MAX_TRAIN_STEPS`	160000	Total training steps
`LEARNING_RATE`	1e-5	Base learning rate
`BATCH_SIZE_TRAIN_NUS`	7	Per-GPU batch size
`VIDEO_COEFF`	0.3	Weight for video prediction loss
`TJ_COEFF`	1.0	Weight for trajectory loss
`EVAL_ONLY`	0	Set to 1 to run evaluation only
`LOCAL_RUN_PWM`	0	Set to 1 for local debug mode (small subset)

Example with custom settings:

MAX_TRAIN_STEPS=50000 LEARNING_RATE=3e-5 \
bash scripts/finetune/navsim/run_sft_navsim_baseline8.sh

Local / single-GPU debug

LOCAL_RUN_PWM=1 bash scripts/finetune/navsim/run_sft_navsim_baseline8_local.sh

📐 Evaluation

EVAL_ONLY=1 EVAL_FROM_CHECKPOINT=1 \
EVAL_DIR=/path/to/checkpoint \
bash scripts/finetune/navsim/run_sft_navsim_baseline8.sh

Evaluation computes:

PDMS (PDM Score) for closed-loop planning
FVD for video prediction quality

🧾 TODO

Release arXiv paper
Release code
Release model weights

📖 Citation

@article{liu2026uniworld,
  title   = {Uni-World VLA: Interleaved World Modeling and Planning for Autonomous Driving},
  author  = {Liu, Qiqi and Xu, Huan and Li, Jingyu and Sun, Bin and Hao, Zhihui and She, Dangen and Zhu, Xiatian and Zhang, Li},
  journal = {arXiv preprint arXiv:2603.27287},
  year    = {2026},
}

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
accelerate_configs		accelerate_configs
assets		assets
configs/sft_navsim		configs/sft_navsim
data_utils		data_utils
llava		llava
models		models
navsim		navsim
nuplan		nuplan
nuscenes_kit		nuscenes_kit
scripts/finetune/navsim		scripts/finetune/navsim
tests		tests
training		training
.gitignore		.gitignore
.gitmodules		.gitmodules
README.md		README.md
SETUP.md		SETUP.md
attn_mask.png		attn_mask.png
copy_matching_folder.py		copy_matching_folder.py
environment.yaml		environment.yaml
folder_traversal.py		folder_traversal.py
requirements.txt		requirements.txt
scene_path_dict.py		scene_path_dict.py
try_scene_index.py		try_scene_index.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Uni-World VLA: Interleaved World Modeling and Planning for Autonomous Driving

📰 News

🔭 Project Overview

💡 Key Features

📊 Results

🛠️ Installation

1. Clone the repository

2. Create environment

3. Install dependencies

📦 Pretrained Weights & Data

⚙️ Configuration

🚀 Training

Single node, 8 GPUs

Local / single-GPU debug

📐 Evaluation

🧾 TODO

📖 Citation

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Uni-World VLA: Interleaved World Modeling and Planning for Autonomous Driving

📰 News

🔭 Project Overview

💡 Key Features

📊 Results

🛠️ Installation

1. Clone the repository

2. Create environment

3. Install dependencies

📦 Pretrained Weights & Data

⚙️ Configuration

🚀 Training

Single node, 8 GPUs

Local / single-GPU debug

📐 Evaluation

🧾 TODO

📖 Citation

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages