Authors: Lorenzo Olearo*, Giorgio Longari*, Simone Melzi, Alessandro Raganato, Rafael Peñaloza
This repository contains the implementation of the paper "How to Blend Concepts in Diffusion Models", presented at KMG@ECCV 2024, and of its extended version "Blending Concepts with Text-to-Image Diffusion Models", currently under review.
Diffusion models have dramatically advanced text-to-image generation in recent years, translating abstract concepts into high-fidelity images with remarkable ease. In this work, we examine whether they can also blend distinct concepts, ranging from concrete objects to intangible ideas, into coherent new visual entities under a zero-shot framework. Specifically, concept blending merges the key attributes of multiple concepts (expressed as textual prompts) into a single, novel image that captures the essence of each concept. We investigate four blending methods, each exploiting different aspects of the diffusion pipeline (e.g., prompt scheduling, embedding interpolation, or layer-wise conditioning). Through systematic experimentation across diverse concept categories, such as merging concrete concepts, synthesizing compound words, transferring artistic styles, and blending architectural landmarks, we show that modern diffusion models indeed exhibit creative blending capabilities without further training or fine-tuning. Our extensive user study, involving 100 participants, reveals that no single approach dominates in all scenarios: each blending technique excels under certain conditions, with factors like prompt ordering, conceptual distance, and random seed affecting the outcome. These findings highlight the remarkable compositional potential of diffusion models while exposing their sensitivity to seemingly minor input variations.
To install the required dependencies, run:

```bash
pip install -r requirements.txt
```

To run the blending process, use the following command:
```bash
python src/main.py <config_path> [--overwrite]
```

- `<config_path>`: Path to the configuration file (default: `config.json`).
- `--overwrite`: Overwrite the output directory if it already exists.
The project supports the following blending methods:

- `SWITCH`: Initializes the Blended Diffusion Pipeline using the `SwitchPipeline` class. It blends the two concepts by switching between different components during the diffusion process.
- `UNET`: Initializes the Blended in UNet Pipeline using the `UnetPipeline` class. It blends the two concepts by combining the base UNet model with a blended UNet model.
- `TEXTUAL`: Initializes the Blended Interpolated Prompts Pipeline using the `TextualPipeline` class. It blends the two concepts by interpolating between the embeddings of the two textual prompts (a minimal sketch of this interpolation idea is shown after this list).
- `ALTERNATE`: Initializes the Blended Alternate UNet Pipeline using the `AlternatePipeline` class. It blends the two concepts by alternating between different UNet models during the diffusion process.
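As a rough illustration of the `TEXTUAL` approach, the sketch below encodes both prompts with the pipeline's CLIP text encoder, linearly interpolates the embeddings, and conditions a standard Stable Diffusion run on the result. It is a minimal example written against the Hugging Face `diffusers` API, not the repository's `TextualPipeline`; the prompts, seed, and `blend_ratio` value simply mirror the sample configuration shown further below.

```python
# Minimal sketch of prompt-embedding interpolation (TEXTUAL-style blending).
# Illustration only; the repository's TextualPipeline may differ in details.
import torch
from diffusers import StableDiffusionPipeline, UniPCMultistepScheduler

device = "cuda:0"
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4").to(device)
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)

def encode(prompt: str) -> torch.Tensor:
    """Encode a single prompt with the pipeline's CLIP text encoder."""
    tokens = pipe.tokenizer(
        prompt,
        padding="max_length",
        max_length=pipe.tokenizer.model_max_length,
        truncation=True,
        return_tensors="pt",
    ).input_ids.to(device)
    with torch.no_grad():
        return pipe.text_encoder(tokens)[0]

# Linearly interpolate the two prompt embeddings.
blend_ratio = 0.5  # weight of the first prompt
embeds = blend_ratio * encode("cat") + (1.0 - blend_ratio) * encode("lion")

generator = torch.Generator(device=device).manual_seed(21)
image = pipe(
    prompt_embeds=embeds,
    num_inference_steps=25,
    guidance_scale=7.5,
    generator=generator,
).images[0]
image.save("textual_blend.png")
```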
The configuration file (`config.json`) should follow the sample below:

```json
{
"device": "cuda:0",
"seeds": [21, 314, 561, 41041, 9746347772161, 1007, 11051999, 27092000, 20071969, 4101957],
"prompt_1": "cat",
"prompt_2": "lion",
"blend_methods": ["SWITCH", "UNET", "TEXTUAL", "ALTERNATE"],
"timesteps": 25,
"scheduler": "UniPCMultistepScheduler",
"model_id": "CompVis/stable-diffusion-v1-4",
"height": 512,
"width": 512,
"latent_scale": 8,
"guidance_scale": 7.5,
"from_timestep": 8,
"to_timestep": 25,
"blend_ratio": 0.5,
"same_base_latent": true
}
```

- `device`: Device on which to run the code (default: `cuda:0`).
- `seeds`: List of seeds for the random number generator.
- `prompt_1`: First textual prompt.
- `prompt_2`: Second textual prompt.
- `blend_methods`: List of blending methods to use.
- `timesteps`: Number of timesteps for the diffusion process.
- `scheduler`: Scheduler to use for the diffusion process.
- `model_id`: Model ID of the diffusion model.
- `height`: Height of the generated image.
- `width`: Width of the generated image.
- `latent_scale`: Latent scale for the diffusion model.
- `guidance_scale`: Guidance scale for the diffusion model.
- `from_timestep`: Exclusive to the `SWITCH` method; controls the timestep at which the switch is performed (see the sketch after this list).
- `to_timestep`: Exclusive to the `SWITCH` method; controls the timestep at which the synthesis of the image is completed.
- `blend_ratio`: The ratio of the first prompt to the second prompt in the blended image.
- `same_base_latent`: Whether to use the same base latent for all blending methods.
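To make the `SWITCH`-specific parameters concrete, the sketch below denoises with `prompt_1` for the first `from_timestep` steps and then switches the conditioning to `prompt_2` until `to_timestep`. It is a simplified manual denoising loop written against the Hugging Face `diffusers` API for illustration only; the repository's `SwitchPipeline` may differ in its details, and all values are taken from the sample configuration above.

```python
# Conceptual sketch of SWITCH-style prompt scheduling (not the repository's SwitchPipeline).
import torch
from PIL import Image
from diffusers import StableDiffusionPipeline, UniPCMultistepScheduler

device = "cuda:0"
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4").to(device)
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)

height, width = 512, 512
guidance_scale = 7.5
from_timestep, to_timestep = 8, 25

def encode(prompt: str) -> torch.Tensor:
    """Encode a prompt with the pipeline's CLIP text encoder."""
    tokens = pipe.tokenizer(prompt, padding="max_length",
                            max_length=pipe.tokenizer.model_max_length,
                            truncation=True, return_tensors="pt").input_ids.to(device)
    with torch.no_grad():
        return pipe.text_encoder(tokens)[0]

uncond = encode("")                       # unconditional embedding for classifier-free guidance
cond_1, cond_2 = encode("cat"), encode("lion")

generator = torch.Generator(device=device).manual_seed(21)
latents = torch.randn((1, pipe.unet.config.in_channels, height // 8, width // 8),
                      generator=generator, device=device)
pipe.scheduler.set_timesteps(to_timestep, device=device)
latents = latents * pipe.scheduler.init_noise_sigma

for i, t in enumerate(pipe.scheduler.timesteps):
    # Condition on the first prompt up to `from_timestep`, then on the second.
    cond = cond_1 if i < from_timestep else cond_2
    embeds = torch.cat([uncond, cond])
    latent_input = pipe.scheduler.scale_model_input(torch.cat([latents] * 2), t)
    with torch.no_grad():
        noise_pred = pipe.unet(latent_input, t, encoder_hidden_states=embeds).sample
    noise_uncond, noise_text = noise_pred.chunk(2)
    noise_pred = noise_uncond + guidance_scale * (noise_text - noise_uncond)
    latents = pipe.scheduler.step(noise_pred, t, latents).prev_sample

with torch.no_grad():
    decoded = pipe.vae.decode(latents / pipe.vae.config.scaling_factor).sample
decoded = (decoded / 2 + 0.5).clamp(0, 1).cpu().permute(0, 2, 3, 1).numpy()[0]
Image.fromarray((decoded * 255).round().astype("uint8")).save("switch_blend.png")
```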
Contributions are welcome! Please open an issue or submit a pull request for any changes or improvements.
```bibtex
@misc{olearo2025blending,
title={Blending Concepts with Text-to-Image Diffusion Models},
author={Lorenzo Olearo and Giorgio Longari and Alessandro Raganato and Rafael Peñaloza and Simone Melzi},
year={2025},
eprint={2506.23630},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2506.23630},
}
```

```bibtex
@misc{olearo2024blending,
title={How to Blend Concepts in Diffusion Models},
author={Lorenzo Olearo and Giorgio Longari and Simone Melzi and Alessandro Raganato and Rafael Peñaloza},
year={2024},
eprint={2407.14280},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2407.14280},
}
```