🌍 Boundless-World-Model

Boundless-World-Model (BWM) is a physically consistent, action-conditioned video world model built on Wan2.2-TI2V-5B. It serves as a low-cost, high-fidelity simulator for robotic manipulation.

🗞️ News

[2026-05] 🚀 Inference code released! Generate action-conditioned robot manipulation videos with BWM. See 🛠️ Usage.
[2026-05] 🎉 Model definition released! The BWM architecture and core model components are now available.

✅ TODO

🏗️ Framework

Coming soon!

🎬 Qualitative Results

CVPR 2026 WorldArena Challenge

The following simulation scenes are generated autoregressively by BWM from initial frames and action sequences in the WorldArena test set, achieving high-fidelity visual realism while maintaining long-horizon physical consistency.

🧩 Scene 1: Compositional Spatial Rearrangement

Task: arrange blocks by size, stack bowls
Challenge: Multi-object spatial ordering, stacking stability, and contact-rich placement
Ours:
- ✅ Preserves object identity and target layout
- ✅ Maintains stable stacking contacts
- ✅ Predicts adaptive gripper control

🚪 Scene 2: Articulated Hinge Interaction

Task: open microwave, open laptop
Challenge: Articulated hinge motion, constrained rotation, and persistent object state
Ours:
- ✅ Captures hinge-constrained opening dynamics
- ✅ Maintains coherent object geometry during rotation
- ✅ Preserves opened states over long-horizon rollouts

🕹️ Scene 3: Fine-Grained Affordance Interaction

Task: turn switch, hang mug, click bell, stamp seal
Challenge: Small contact regions, constrained placement, and precise state-changing interactions
Ours:
- ✅ Captures fine-grained affordance dynamics
- ✅ Aligns contact with object affordances
- ✅ Preserves state-changing interactions

🤝 Scene 4: Bimanual Coordination and Handover

Task: hand over block, hand over mic
Challenge: Dual-arm synchronization, inter-arm occlusion, and coordinated grasp timing
Ours:
- ✅ Models synchronized dual-arm motion
- ✅ Preserves object continuity
- ✅ Avoids close-contact collisions

📦 Scene 5: Long-Horizon Constrained Placement

Task: put object in cabinet, put bottles in dustbin
Challenge: Long-horizon transport, partial occlusion, and constrained final placement
Ours:
- ✅ Maintains long-horizon scene coherence
- ✅ Handles occlusion without object drift
- ✅ Produces stable constrained placement

Out-of-Distribution Generalization

To test generalization beyond the benchmark's initial states, we pair initial scenes created with GPT-Image-2 with the original robot action sequences and let BWM autoregressively roll out future frames under shifts in object appearance.

Task: shake bottle, put object in cabinet
Challenge: Novel initial scenes and object appearance shifts
Ours:
- ✅ Generalizes to GPT-Image-2-created initial scenes
- ✅ Preserves action-conditioned dynamics
- ✅ Maintains coherent robot-object interaction

🛠️ Usage

Quick Start: Video Generation Inference

Environment Setup

# Create conda environment
conda create -n BWM python=3.10.20
conda activate BWM

# Install PyTorch with CUDA support
pip install torch==2.8.0 torchvision==0.23.0 torchaudio==2.8.0 --index-url https://download.pytorch.org/whl/cu128

# Install DiffSynth-Studio
pip install diffsynth==2.0.11

# Install dependencies
pip install -r requirements.txt

Model Weights

Download the Wan2.2-TI2V-5B base model from ModelScope:

modelscope download --model Wan-AI/Wan2.2-TI2V-5B --local_dir models/Wan2.2-TI2V-5B

Download the BWM checkpoint from Hugging Face:

hf download BLM-Lab/Boundless-World-Model step-12000.safetensors --local-dir ckpt/BLM
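
Before running inference, it can help to confirm that both downloads landed where the commands above put them. The following is a minimal sketch, not part of the repository: `check_paths` is a hypothetical helper, and the two paths assume the `--local_dir` and `--local-dir` values shown above were used unchanged.

```shell
# Hypothetical helper (not shipped with BWM): report which paths exist.
# Returns 0 if every path is present, 1 if any is missing.
check_paths() {
  missing=0
  for p in "$@"; do
    if [ -e "$p" ]; then
      echo "ok       $p"
    else
      echo "MISSING  $p" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Assumed locations, taken from the download commands above.
check_paths models/Wan2.2-TI2V-5B ckpt/BLM/step-12000.safetensors \
  || echo "Some weights are missing; re-run the matching download command." >&2
```

If you downloaded to different directories, pass those paths to `check_paths` instead.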

Run Inference

The demo metadata, videos, actions, and normalization statistics are already included under demo/.

Set local paths before running inference:

cp scripts/local.example.sh scripts/local.sh

Update MODEL_PATHS and CKPT_PATH in scripts/local.sh, then run:

bash scripts/infer_example.sh
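
To catch an unset path before launching a run, you could check the configuration first. This is a sketch under stated assumptions: it treats `scripts/local.sh` as a plain shell file that assigns `MODEL_PATHS` and `CKPT_PATH`, and `config_ready` is a hypothetical helper that does not exist in the repository. The format of `MODEL_PATHS` is not documented here, so the sketch only checks that it is non-empty.

```shell
# Hypothetical helper (not shipped with BWM).
# Succeeds only when both variables are non-empty and CKPT_PATH points at
# an existing file. MODEL_PATHS is only checked for being non-empty
# because its exact format is an unknown here.
config_ready() {
  [ -n "${MODEL_PATHS:-}" ] && [ -n "${CKPT_PATH:-}" ] && [ -f "$CKPT_PATH" ]
}

# Assumption: scripts/local.sh contains plain shell variable assignments.
if [ -f scripts/local.sh ]; then
  . scripts/local.sh
fi

if config_ready; then
  echo "Config looks complete; run: bash scripts/infer_example.sh"
else
  echo "Set MODEL_PATHS and CKPT_PATH in scripts/local.sh first." >&2
fi
```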

🏋️ Training

Coming soon!

🙏 Acknowledgements

This project builds upon the following open-source projects and benchmarks. We thank these teams for their contributions:

Wan2.2: https://github.com/Wan-Video/Wan2.2
DiffSynth-Studio: https://github.com/modelscope/DiffSynth-Studio
WorldArena: https://github.com/tsinghua-fib-lab/WorldArena/
ABot-PhysWorld: https://github.com/amap-cvlab/ABot-PhysWorld

We also acknowledge the following engineering contributions:

Wentao Tan: basic architecture design
Zengrong Lin: core code implementation
Yang Sun: code refactoring and software maintainability

📜 Citing

If you find BWM useful in your research or applications, please consider giving us a star 🌟.
