BWM is a physically consistent, action-conditioned video world model built upon Wan2.2-TI2V-5B, serving as a low-cost yet high-fidelity simulator for robotic manipulation.
- [2026-05] Inference code released! Generate action-conditioned robot manipulation videos with BWM. See Usage.
- [2026-05] Model definition released! The BWM architecture and core model components are now available.
- TODO
- Framework
- Qualitative Results
- Usage
- Training
- Acknowledgements
- Citing
- Release inference code
- Release model definition
- Release model weights
- Release training code
- Release technical report
Coming soon!
The following simulation scenes are generated autoregressively by BWM from initial frames and action sequences in the WorldArena test set, achieving high-fidelity visual realism while maintaining long-horizon physical consistency.
- Task: arrange blocks by size, stack bowls
- Challenge: Multi-object spatial ordering, stacking stability, and contact-rich placement
- Ours:
  - Preserves object identity and target layout
  - Maintains stable stacking contacts
  - Predicts adaptive gripper control
- Task: open microwave, open laptop
- Challenge: Articulated hinge motion, constrained rotation, and persistent object state
- Ours:
  - Captures hinge-constrained opening dynamics
  - Maintains coherent object geometry during rotation
  - Preserves opened states over long-horizon rollouts
- Task: turn switch, hang mug, click bell, stamp seal
- Challenge: Small contact regions, constrained placement, and precise state-changing interactions
- Ours:
  - Captures fine-grained affordance dynamics
  - Aligns contact with object affordances
  - Preserves state-changing interactions
- Task: hand over block, hand over mic
- Challenge: Dual-arm synchronization, inter-arm occlusion, and coordinated grasp timing
- Ours:
  - Models synchronized dual-arm motion
  - Preserves object continuity
  - Avoids close-contact collisions
- Task: put object in cabinet, put bottles in dustbin
- Challenge: Long-horizon transport, partial occlusion, and constrained final placement
- Ours:
  - Maintains long-horizon scene coherence
  - Handles occlusion without object drift
  - Produces stable constrained placement
To test generalization beyond the benchmark's initial states, we pair initial scenes created with GPT-Image-2 with the original robot action sequences and let BWM roll out the future autoregressively under the resulting shifts in object appearance.
- Task: shake bottle, put object in cabinet
- Challenge: Novel initial scenes and object appearance shifts
- Ours:
  - Generalizes to GPT-Image-2-created initial scenes
  - Preserves action-conditioned dynamics
  - Maintains coherent robot-object interaction
```bash
# Create conda environment
conda create -n BWM python=3.10.20
conda activate BWM

# Install PyTorch with CUDA support
pip install torch==2.8.0 torchvision==0.23.0 torchaudio==2.8.0 --index-url https://download.pytorch.org/whl/cu128

# Install DiffSynth-Studio
pip install diffsynth==2.0.11

# Install dependencies
pip install -r requirements.txt
```

Download the Wan2.2-TI2V-5B base model from ModelScope:
```bash
modelscope download --model Wan-AI/Wan2.2-TI2V-5B --local_dir models/Wan2.2-TI2V-5B
```

Download the BWM checkpoint from Hugging Face:
```bash
hf download BLM-Lab/Boundless-World-Model step-12000.safetensors --local-dir ckpt/BLM
```

The demo metadata, videos, actions, and normalization statistics are already included under `demo/`.
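Before configuring paths, it can help to confirm that the downloads landed where the commands above put them. The following is a minimal sketch and not part of the BWM repository: `check_paths` is a hypothetical helper, and the three paths are assumptions derived from the `--local_dir` and `--local-dir` arguments above and from the bundled demo directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with BWM): report which of the given
# paths exist and return non-zero if at least one is missing.
check_paths() {
  local missing=0
  local p
  for p in "$@"; do
    if [ -e "$p" ]; then
      echo "ok       $p"
    else
      echo "missing  $p"
      missing=1
    fi
  done
  return "$missing"
}

# Assumed locations, taken from the download commands above.
# Adjust them if you chose different target directories.
check_paths \
  models/Wan2.2-TI2V-5B \
  ckpt/BLM/step-12000.safetensors \
  demo \
  || echo "Some assets are missing; re-run the corresponding download step."
```

Run it from the repository root, since all three paths are relative.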
Set local paths before running inference:
```bash
cp scripts/local.example.sh scripts/local.sh
```

Update `MODEL_PATHS` and `CKPT_PATH` in `scripts/local.sh`, then run:
```bash
bash scripts/infer_example.sh
```

Coming soon!
This project builds upon the following open-source projects and benchmarks. We thank these teams for their contributions:
- Wan2.2: https://github.com/Wan-Video/Wan2.2
- DiffSynth-Studio: https://github.com/modelscope/DiffSynth-Studio
- WorldArena: https://github.com/tsinghua-fib-lab/WorldArena/
- ABot-PhysWorld: https://github.com/amap-cvlab/ABot-PhysWorld
We also acknowledge the following engineering contributions:
- Wentao Tan: basic architecture design
- Zengrong Lin: core code implementation
- Yang Sun: code refactoring and software maintainability
If you find BWM useful in your research or applications, please consider giving us a star.

















