Unified multimodal model for image editing, generation, and understanding
Welcome to the Skywork-UniPic repository!
This repository hosts the model weights and official implementations of the UniPic unified multimodal series, featuring three distinct modeling paradigms:
- UniPic-3 (README) — 🔥 Open-source SOTA multi-image editing model. A unified framework for single-image editing and multi-image composition, supporting 1–6 input images at flexible resolutions, with 8-step inference and a 12.5× speedup via CM + DMD distillation.
- UniPic-2 (README) — SD3.5M-Kontext and MetaQuery variants built on efficient architectures with diffusion post-training, delivering state-of-the-art performance in text-to-image generation, fine-grained image editing, and multimodal reasoning.
- UniPic-1 (README) — A 1.5B-parameter unified autoregressive model for joint visual understanding and generation, enabling a single transformer to handle both perception and synthesis tasks.
Across the series, the models provide:
- 🎨 Text-to-Image Generation — High-fidelity synthesis from natural language prompts.
- 🛠 Image Editing — Seamless inpainting, outpainting, and object manipulation.
- 🖼 Image Understanding — Robust perception capabilities for various visual tasks.
- ⚡ Efficient Architecture — Optimized for both accuracy and deployability.
This project is licensed under the MIT License — see the LICENSE file for details.