โšก๏ธ LightX2V:
Light Video Generation Inference Framework



[ English | 中文 ]


LightX2V is an advanced, lightweight image/video generation inference framework engineered to deliver efficient, high-performance image and video synthesis. This unified platform integrates multiple state-of-the-art generation techniques and supports diverse tasks, including text-to-video (T2V), image-to-video (I2V), text-to-image (T2I), and image editing (I2I). The name X2V denotes the transformation of different input modalities (X, such as text or images) into visual output (V).

๐ŸŒ Try it online now! Experience LightX2V without installation: LightX2V Online Service - Free, lightweight, and fast AI digital human video generation platform.

👋 Join us on WeChat.

🧾 Community Code Contribution Guidelines

Before submitting, please make sure your code conforms to the project's formatting standard. You can run the following commands to keep the code style consistent:

pip install ruff pre-commit
pre-commit run --all-files
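
Optionally, you can also register the standard pre-commit git hook so these checks run automatically on every commit (regular pre-commit usage, not specific to this repository):

pre-commit install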

Besides contributions from the LightX2V team, we have also received contributions from many community developers.

🔥 Latest News

  • December 27, 2025: 🚀 Supported deployment on MThreads MUSA.

  • December 25, 2025: 🚀 Supported deployment on AMD ROCm and Ascend 910B.

  • December 23, 2025: 🚀 We have supported the Qwen-Image-Edit-2511 image editing model since Day 0. On a single H100 GPU, LightX2V delivers approximately 1.4× speedup and supports CFG parallelism, Ulysses parallelism, and efficient offloading. Our HuggingFace page has been updated with CFG / step-distilled LoRA and FP8 weights; usage examples can be found in the Python scripts. Combining LightX2V with 4-step CFG / step distillation and the FP8 model, the overall acceleration can reach approximately 42×. Feel free to try the LightX2V Online Service with the Image to Image task and the Qwen-Image-Edit-2511 model.

  • December 22, 2025: 🚀 Added Wan2.1 NVFP4 quantization-aware 4-step distilled models; weights are available on HuggingFace: Wan-NVFP4.

  • December 15, 2025: 🚀 Supported deployment on Hygon DCU.

  • December 4, 2025: 🚀 Supported GGUF-format model inference and deployment on Cambricon MLU590 / MetaX C500.

  • November 24, 2025: 🚀 We released 4-step distilled models for HunyuanVideo-1.5! These models enable ultra-fast 4-step inference without CFG, achieving approximately 25x speedup over standard 50-step inference. Both base and FP8-quantized versions are available: Hy1.5-Distill-Models.

  • November 21, 2025: 🚀 We have supported the HunyuanVideo-1.5 video generation model since Day 0. With the same number of GPUs, LightX2V achieves a speedup of over 2x and supports deployment on GPUs with less memory (such as the 24GB RTX 4090). It also supports CFG/Ulysses parallelism, efficient offloading, TeaCache/MagCache, and more. We will soon add more models to our HuggingFace page, including step-distilled, VAE-distilled, and other related models. Quantized models and lightweight VAE models are already available: Hy1.5-Quantized-Models for quantized inference, and LightTAE for HunyuanVideo-1.5 for fast VAE decoding. Refer to this for usage tutorials, or check the examples directory for code examples.

๐Ÿ† Performance Benchmarks (Updated on 2025.12.01)

📊 Cross-Framework Performance Comparison (H100)

| Framework | GPUs | Step Time | Speedup |
| --- | --- | --- | --- |
| Diffusers | 1 | 9.77s/it | 1x |
| xDiT | 1 | 8.93s/it | 1.1x |
| FastVideo | 1 | 7.35s/it | 1.3x |
| SGL-Diffusion | 1 | 6.13s/it | 1.6x |
| LightX2V | 1 | 5.18s/it | 1.9x 🚀 |
| FastVideo | 8 | 2.94s/it | 1x |
| xDiT | 8 | 2.70s/it | 1.1x |
| SGL-Diffusion | 8 | 1.19s/it | 2.5x |
| LightX2V | 8 | 0.75s/it | 3.9x 🚀 |

📊 Cross-Framework Performance Comparison (RTX 4090D)

| Framework | GPUs | Step Time | Speedup |
| --- | --- | --- | --- |
| Diffusers | 1 | 30.50s/it | 1x |
| FastVideo | 1 | 22.66s/it | 1.3x |
| xDiT | 1 | OOM | OOM |
| SGL-Diffusion | 1 | OOM | OOM |
| LightX2V | 1 | 20.26s/it | 1.5x 🚀 |
| FastVideo | 8 | 15.48s/it | 1x |
| xDiT | 8 | OOM | OOM |
| SGL-Diffusion | 8 | OOM | OOM |
| LightX2V | 8 | 4.75s/it | 3.3x 🚀 |

📊 LightX2V Performance Comparison

| Framework | GPU | Configuration | Step Time | Speedup |
| --- | --- | --- | --- | --- |
| LightX2V | H100 | 8 GPUs + cfg | 0.75s/it | 1x |
| LightX2V | H100 | 8 GPUs + no cfg | 0.39s/it | 1.9x |
| LightX2V | H100 | 8 GPUs + no cfg + fp8 | 0.35s/it | 2.1x 🚀 |
| LightX2V | 4090D | 8 GPUs + cfg | 4.75s/it | 1x |
| LightX2V | 4090D | 8 GPUs + no cfg | 3.13s/it | 1.5x |
| LightX2V | 4090D | 8 GPUs + no cfg + fp8 | 2.35s/it | 2.0x 🚀 |

Note: All performance data above were measured on Wan2.1-I2V-14B-480P (40 steps, 81 frames). In each table, the speedup is the step time of the 1x baseline row divided by the step time of the given row, within the same GPU count. We also provide 4-step distilled models on our HuggingFace page.
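
As a quick arithmetic check, the speedup column can be reproduced from the step times in the H100 table above. The snippet below is a small standalone illustration (the numbers are copied from the table; it is not part of LightX2V):

# Reproduce the H100 speedup columns: baseline step time / framework step time.
step_times = {
    1: {"Diffusers": 9.77, "xDiT": 8.93, "FastVideo": 7.35, "SGL-Diffusion": 6.13, "LightX2V": 5.18},
    8: {"FastVideo": 2.94, "xDiT": 2.70, "SGL-Diffusion": 1.19, "LightX2V": 0.75},
}
for gpus, group in step_times.items():
    baseline = max(group.values())  # the 1x row is the slowest framework in each GPU group
    for name, step_time in group.items():
        print(f"{gpus} GPU(s), {name}: {baseline / step_time:.1f}x")
# Prints 1.0x, 1.1x, 1.3x, 1.6x, 1.9x (1 GPU) and 1.0x, 1.1x, 2.5x, 3.9x (8 GPUs), matching the table.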

💡 Quick Start

For comprehensive usage instructions, please refer to our documentation: English Docs | 中文文档

We highly recommend using the Docker environment, as it is the simplest and fastest way to set up the environment. For details, please refer to the Quick Start section in the documentation.

Installation from Git

pip install -v git+https://github.com/ModelTC/LightX2V.git

Building from Source

git clone https://github.com/ModelTC/LightX2V.git
cd LightX2V
uv pip install -v . # pip install -v .

(Optional) Install Attention/Quantize Operators

For installing the attention operators, please refer to our documentation: English Docs | 中文文档
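
For illustration only, the command below is the upstream FlashAttention install (not LightX2V-specific guidance); backends such as SageAttention 2 are typically built from their own source repositories. Follow the documentation linked above for the operator versions this project actually supports.

# Upstream FlashAttention install; assumes a CUDA toolchain compatible with your PyTorch build.
pip install flash-attn --no-build-isolation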

Usage Example

# examples/wan/wan_i2v.py
"""
Wan2.2 image-to-video generation example.
This example demonstrates how to use LightX2V with Wan2.2 model for I2V generation.
"""

from lightx2v import LightX2VPipeline

# Initialize pipeline for Wan2.2 I2V task
# For wan2.1, use model_cls="wan2.1"
pipe = LightX2VPipeline(
    model_path="/path/to/Wan2.2-I2V-A14B",
    model_cls="wan2.2_moe",
    task="i2v",
)

# Alternative: create generator from config JSON file
# pipe.create_generator(
#     config_json="configs/wan22/wan_moe_i2v.json"
# )

# Enable offloading to significantly reduce VRAM usage with minimal speed impact
# Suitable for RTX 30/40/50 consumer GPUs
pipe.enable_offload(
    cpu_offload=True,
    offload_granularity="block",  # For Wan models, supports both "block" and "phase"
    text_encoder_offload=True,
    image_encoder_offload=False,
    vae_offload=False,
)

# Create generator manually with specified parameters
pipe.create_generator(
    attn_mode="sage_attn2",
    infer_steps=40,
    height=480,  # Can be set to 720 for higher resolution
    width=832,  # Can be set to 1280 for higher resolution
    num_frames=81,
    guidance_scale=[3.5, 3.5],  # For wan2.1, guidance_scale is a scalar (e.g., 5.0)
    sample_shift=5.0,
)

# Generation parameters
seed = 42
prompt = "Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard. The fluffy-furred feline gazes directly at the camera with a relaxed expression. Blurred beach scenery forms the background featuring crystal-clear waters, distant green hills, and a blue sky dotted with white clouds. The cat assumes a naturally relaxed posture, as if savoring the sea breeze and warm sunlight. A close-up shot highlights the feline's intricate details and the refreshing atmosphere of the seaside."
# Chinese negative prompt commonly used with Wan models (roughly: camera shake, garish colors, overexposure,
# static/blurry frames, subtitles, gray overall tone, worst/low quality, JPEG artifacts, deformed or
# extra/fused fingers, poorly drawn hands/faces, malformed limbs, cluttered background, three legs, walking backwards).
negative_prompt = "镜头晃动，色调艳丽，过曝，静态，细节模糊不清，字幕，风格，作品，画作，画面，静止，整体发灰，最差质量，低质量，JPEG压缩残留，丑陋的，残缺的，多余的手指，画得不好的手部，画得不好的脸部，畸形的，毁容的，形态畸形的肢体，手指融合，静止不动的画面，杂乱的背景，三条腿，背景人很多，倒着走"
image_path = "/path/to/img_0.jpg"
save_result_path = "/path/to/save_results/output.mp4"

# Generate video
pipe.generate(
    seed=seed,
    image_path=image_path,
    prompt=prompt,
    negative_prompt=negative_prompt,
    save_result_path=save_result_path,
)
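
The same pipeline API can be reused for other tasks. The sketch below adapts the calls from the I2V example above to a Wan2.1 text-to-video run; it is an illustrative sketch rather than an official script: task="t2v", the scalar guidance_scale, and omitting image_path follow the comments in the example above, and all paths and prompt values are placeholders.

# Illustrative T2V sketch, reusing only the calls shown in the I2V example above.
from lightx2v import LightX2VPipeline

pipe = LightX2VPipeline(
    model_path="/path/to/Wan2.1-T2V-14B",  # placeholder path
    model_cls="wan2.1",
    task="t2v",  # assumed task name for text-to-video
)

pipe.create_generator(
    attn_mode="sage_attn2",
    infer_steps=40,
    height=480,
    width=832,
    num_frames=81,
    guidance_scale=5.0,  # scalar for wan2.1, per the comment in the I2V example
    sample_shift=5.0,
)

# No image_path for text-to-video (assumption; see the examples directory for the official scripts).
pipe.generate(
    seed=42,
    prompt="A white cat wearing sunglasses rides a surfboard on a sunny beach.",
    negative_prompt="",
    save_result_path="/path/to/save_results/output_t2v.mp4",
)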

NVFP4 (quantization-aware 4-step) resources

  • Inference examples: examples/wan/wan_i2v_nvfp4.py (I2V) and examples/wan/wan_t2v_nvfp4.py (T2V).
  • NVFP4 operator build/install guide: see lightx2v_kernel/README.md.

💡 More Examples: For more usage examples including quantization, offloading, caching, and other advanced configurations, please refer to the examples directory.

🤖 Supported Model Ecosystem

Official Open-Source Models

Quantized and Distilled Models/LoRAs (🚀 Recommended: 4-step inference)

Lightweight Autoencoder Models (🚀 Recommended: fast inference & low memory usage)

Autoregressive Models

🔔 Follow our HuggingFace page for the latest model releases from our team.

💡 Refer to the Model Structure Documentation to quickly get started with LightX2V

🚀 Frontend Interfaces

We provide multiple frontend interface deployment options:

  • 🎨 Gradio Interface: Clean and user-friendly web interface, perfect for quick experience and prototyping
  • 🎯 ComfyUI Interface: Powerful node-based workflow interface, supporting complex video generation tasks
  • 🚀 Windows One-Click Deployment: Convenient deployment solution designed for Windows users, featuring automatic environment configuration and intelligent parameter optimization

💡 Recommended Solutions:

  • First-time Users: We recommend the Windows one-click deployment solution
  • Advanced Users: We recommend the ComfyUI interface for more customization options
  • Quick Experience: The Gradio interface provides the most intuitive operation experience

🚀 Core Features

🎯 Ultimate Performance Optimization

  • 🔥 SOTA Inference Speed: Achieve ~20x acceleration via step distillation and system optimization (single GPU)
  • ⚡️ Revolutionary 4-Step Distillation: Compress original 40-50 step inference to just 4 steps without CFG requirements
  • 🛠️ Advanced Operator Support: Integrated with cutting-edge operators including Sage Attention, Flash Attention, Radial Attention, q8-kernel, sgl-kernel, vllm

💾 Resource-Efficient Deployment

  • 💡 Breaking Hardware Barriers: Run 14B models for 480P/720P video generation with only 8GB VRAM + 16GB RAM
  • 🔧 Intelligent Parameter Offloading: Advanced disk-CPU-GPU three-tier offloading architecture with phase/block-level granular management
  • ⚙️ Comprehensive Quantization: Support for w8a8-int8, w8a8-fp8, w4a4-nvfp4 and other quantization strategies

🎨 Rich Feature Ecosystem

  • 📈 Smart Feature Caching: Intelligent caching mechanisms to eliminate redundant computations
  • 🔄 Parallel Inference: Multi-GPU parallel processing for enhanced performance
  • 📱 Flexible Deployment Options: Support for Gradio, service deployment, ComfyUI and other deployment methods
  • 🎛️ Dynamic Resolution Inference: Adaptive resolution adjustment for optimal generation quality
  • 🎞️ Video Frame Interpolation: RIFE-based frame interpolation for smooth frame rate enhancement

📚 Technical Documentation

📖 Method Tutorials

๐Ÿ› ๏ธ Deployment Guides

๐Ÿค Acknowledgments

We sincerely thank all the model repositories and research communities that inspired and supported the development of LightX2V; this framework is built on the collective efforts of the open-source community.

🌟 Star History

Star History Chart

โœ๏ธ Citation

If you find LightX2V useful in your research, please consider citing our work:

@misc{lightx2v,
 author = {LightX2V Contributors},
 title = {LightX2V: Light Video Generation Inference Framework},
 year = {2025},
 publisher = {GitHub},
 journal = {GitHub repository},
 howpublished = {\url{https://github.com/ModelTC/lightx2v}},
}

📞 Contact & Support

For questions, suggestions, or support, please feel free to reach out through GitHub Issues or the WeChat group mentioned above.


Built with ❤️ by the LightX2V team
