FlashMotion: Few-Step Controllable Video Generation with Trajectory Guidance
Quanhao Li¹, Zhen Xing¹, Rui Wang¹, Haidong Cao¹, Qi Dai², Daoguo Dong¹ and Zuxuan Wu¹
¹ Fudan University; ² Microsoft Research Asia
Recent advances in trajectory-controllable video generation have achieved remarkable progress. Previous methods mainly use adapter-based architectures for precise motion control along predefined trajectories. However, all of these methods rely on a multi-step denoising process, leading to substantial time redundancy and computational overhead. While existing video distillation methods successfully distill multi-step generators into few-step ones, directly applying these approaches to trajectory-controllable video generation results in noticeable degradation in both video quality and trajectory accuracy. To bridge this gap, we introduce FlashMotion, a novel training framework designed for few-step trajectory-controllable video generation. We first train a trajectory adapter on a multi-step video generator for precise trajectory control. Then, we distill the generator into a few-step version to accelerate video generation. Finally, we finetune the adapter using a hybrid strategy that combines diffusion and adversarial objectives, aligning it with the few-step generator to produce high-quality, trajectory-accurate videos. For evaluation, we introduce FlashBench, a benchmark for long-sequence trajectory-controllable video generation that measures both video quality and trajectory accuracy across varying numbers of foreground objects. Experiments on two adapter architectures show that FlashMotion surpasses existing video distillation methods and previous multi-step models in both visual quality and trajectory consistency.
- 2026/03/13 🔥🔥 We released FlashMotion, including its training code, inference code, model weights, and the evaluation benchmark.
- 2026/02 🔥🔥🔥 FlashMotion has been accepted by CVPR 2026!
- 💡 Abstract
- 📣 Updates
- 📑 Table of Contents
- ✅ TODO List
- 🐍 Installation
- 📦 Model Weights
- ⛽️ Dataset Preparation
- 🔄 Inference
- 🏎️ Train
- 🤝 Acknowledgements
- 📚 Contact
- Release our inference code and model weights
- Release our training code
- Release our evaluation benchmark
# Clone this repository.
git clone https://github.com/quanhaol/FlashMotion
cd FlashMotion
# Install requirements
conda create -n flashmotion python=3.10 -y
conda activate flashmotion
pip install -r requirements.txt
pip install flash-attn --no-build-isolation
python setup.py develop

After downloading, the model weights should be organized as follows:
FlashMotion
└── ckpts
    ├── FastGenerator
    │   └── model.pt
    ├── SlowAdapter
    │   ├── ResNet
    │   │   └── model.pt
    │   └── ControlNet
    │       └── model.pt
    └── FastAdapter
        ├── ResNet
        │   └── model.pt
        └── ControlNet
            └── model.pt
Please use the following commands to download the model weights:
pip install "huggingface_hub[hf_transfer]"
HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download quanhaol/FlashMotion --local-dir ckpts
HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download Wan-AI/Wan2.2-TI2V-5B --local-dir wan_models/Wan2.2-TI2V-5B
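After both downloads finish, you can quickly confirm that the weights match the layout above. This is a minimal optional check, using only the paths from the tree and commands shown here:

# Optional sanity check: each adapter/generator directory should contain a model.pt
find ckpts -name model.pt
ls wan_models/Wan2.2-TI2V-5B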
All three training stages of FlashMotion use MagicData, an open-source dataset built for trajectory-controllable video generation. Please follow this README to download and extract the data to a proper path on your machine.

The dataset structure can be organized as follows:
MagicData
├── videos
│ ├── videoid_1.mp4
│ ├── videoid_2.mp4
│ ├── ...
├── masks
│ ├── videoid_1
│ │ ├── annotated_frame_00000.png
│ │ ├── annotated_frame_00001.png
│ │ ├── ...
│ ├── videoid_2
│ │ ├── ...
├── boxs
│ ├── videoid_1
│ │ ├── annotated_frame_00000.png
│ │ ├── annotated_frame_00001.png
│ │ ├── ...
│ ├── videoid_2
│ │ ├── ...
├── MagicData.csv # detailed information of each video
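Since each video must be paired with per-frame mask and box annotations, a small shell check can catch missing directories early. This is a minimal sketch assuming the layout above; DATA_ROOT is a placeholder you should point at your extracted dataset:

# Verify every videos/<id>.mp4 has matching masks/<id>/ and boxs/<id>/ directories
DATA_ROOT=MagicData
for video in "$DATA_ROOT"/videos/*.mp4; do
  vid=$(basename "$video" .mp4)
  [ -d "$DATA_ROOT/masks/$vid" ] || echo "missing masks for $vid"
  [ -d "$DATA_ROOT/boxs/$vid" ] || echo "missing boxs for $vid"
done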
Inference requires around 42 GiB of GPU memory with the ResNet FastAdapter and around 50 GiB with the ControlNet FastAdapter, both measured on a single NVIDIA A100 GPU.
⚡️⚡️⚡️ Denoising a video takes only 11 seconds with the ResNet adapter and around 24 seconds with the ControlNet adapter.
We provide demo scripts for both types of trajectory adapter.
# Demo inference script of each adapter type
bash running_scripts/inference/i2v_control_fewstep_controlnet.sh
bash running_scripts/inference/i2v_control_fewstep_resnet.sh

We also provide a sample input image and trajectory maps in ./assets.
Feel free to replace --prompt, --image, and --trajectory with your own input prompt, input image, and trajectory maps.
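For example, a customized run might look like the sketch below. The flag names come from the note above, but the prompt and paths are placeholders; depending on how the demo script forwards its arguments, you may need to edit these values inside the script instead of passing them on the command line:

# Hypothetical invocation with custom inputs (prompt and paths are placeholders)
bash running_scripts/inference/i2v_control_fewstep_resnet.sh \
  --prompt "a red car driving along a coastal road" \
  --image assets/my_image.png \
  --trajectory assets/my_trajectory_maps/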
Note: If you want to build your own trajectory maps, please refer to the box trajectory construction pipeline introduced in MagicMotion.
We provide scripts for all three training stages of FlashMotion: training the SlowAdapter, the FastGenerator, and the FastAdapter.
In this stage, we first train the SlowAdapter using the mask annotations in MagicData, and then finetune it using bounding boxes as the trajectory-map conditions.
# Demo training script of SlowAdapter
bash running_scripts/train/stage1_mask.sh
bash running_scripts/train/stage1_box.sh

In this stage, we distill the Wan2.2-TI2V-5B model into a 4-step image-to-video generation model, named the FastGenerator.
# Demo training script of FastGenerator
bash running_scripts/train/stage2.sh

In this stage, we train the FastAdapter to fit the FastGenerator and enable few-step trajectory-controllable video generation.
# Demo training script of FastAdapter
bash running_scripts/train/stage3.sh

We would like to express our gratitude to the following open-source projects that have been instrumental in the development of our project:
- Wan: An open-source base video generation model.
- Self-Forcing and CausVid: Two pioneering frameworks for distilling video generation models.
- MagicMotion: An open-source trajectory-controllable video generation framework.
- Wan2.2-TI2V-5B-Turbo: An open-source step-distillation image-to-video generation framework that distills the Wan2.2-TI2V-5B model into 4 steps.
Special thanks to the contributors of these libraries for their hard work and dedication!
If you have any suggestions or find our work helpful, feel free to contact us:
Email: liqh24@m.fudan.edu.cn
If you find our work useful, please consider starring this GitHub repository and citing our paper:
@misc{li2026flashmotionfewstepcontrollablevideo,
title={FlashMotion: Few-Step Controllable Video Generation with Trajectory Guidance},
author={Quanhao Li and Zhen Xing and Rui Wang and Haidong Cao and Qi Dai and Daoguo Dong and Zuxuan Wu},
year={2026},
eprint={2603.12146},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2603.12146},
}