Hierarchical image colorization model combining a Swin Transformer encoder with an EMA-based VQ-VAE bottleneck and a residual decoder. Learns discrete color representations and produces realistic, perceptually consistent colorizations.
computer-vision deep-learning pytorch transformer gan image-colorization representation-learning perceptual-losses image-restoration encoder-decoder lsgan vq-vae generative-modeling swin-transformer lab-color-space vqvae2
Updated Nov 26, 2025 - Jupyter Notebook