DualSight: Learning to Disentangle Artifact and Semantic Features for Detection of Diffusion-Generated Images

Official repository of the ICPR 2026 paper "DualSight: Learning to Disentangle Artifact and Semantic Features for Detection of Diffusion-Generated Images"

Ahmed Abdullah, Nikolas Ebert & Oliver Wasenmüller
CeMOS - Research and Transfer Center, Technische Hochschule Mannheim

This repository includes everything needed to evaluate, train and build on DualSight.

Installation (Dataset Preparation)

We use a custom WebDataset-based format for training and evaluation. To convert the OpenSDI dataset to this format, we recommend setting up a separate conda environment as follows:

conda create -n dualsight-data-prep python=3.10.16 -y
conda activate dualsight-data-prep
pip install "packaging==24.1" "pip==24.2" "setuptools==75.1.0" "wheel==0.44.0"
pip install -r requirements_data_prep.txt

Details on how to run the dataset preparation pipeline can be found in the "Data Preparation" section below.

Data Preparation

For training and evaluation we use the WebDataset format and the WebDataset library.
WebDataset files are .tar archives that follow two conventions:
- within each .tar file, the files that together make up one training sample share the same basename once all filename extensions are stripped
- a dataset is split into multiple .tar shards, numbered like 000000.tar to 012345.tar
A longer, more detailed description is available in the WebDataset Format Specification.
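To make the first convention concrete, here is a minimal sketch that uses only Python's standard `tarfile` module to group the members of one shard into samples by their shared basename. It is not the WebDataset library's implementation; the helper names `sample_key`, `group_samples` and `make_demo_shard` are hypothetical, and taking the key as everything before the first dot of the file name is an assumption that follows the convention above.

```python
import io
import tarfile
from collections import defaultdict


def sample_key(member_name: str) -> tuple[str, str]:
    """Split a tar member name into (sample key, extension).

    Hypothetical helper: the key is the directory plus the file name up to its
    first dot, so "abc.json" and "abc.png" belong to the same sample "abc".
    """
    directory, _, filename = member_name.rpartition("/")
    stem, _, extension = filename.partition(".")
    key = f"{directory}/{stem}" if directory else stem
    return key, extension


def group_samples(tar_bytes: bytes) -> dict[str, dict[str, bytes]]:
    """Read an in-memory .tar shard and group its files into samples."""
    samples: dict[str, dict[str, bytes]] = defaultdict(dict)
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue  # skip directories and links
            key, extension = sample_key(member.name)
            samples[key][extension] = tar.extractfile(member).read()
    return dict(samples)


def make_demo_shard() -> bytes:
    """Build a tiny two-sample shard in memory (dummy bytes, not real images)."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        for name, payload in [
            ("img_0001.png", b"\x89PNG..."),
            ("img_0001.json", b'{"fake_type": "real"}'),
            ("img_0002.png", b"\x89PNG..."),
            ("img_0002.json", b'{"fake_type": "diffusion"}'),
        ]:
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


if __name__ == "__main__":
    shard = group_samples(make_demo_shard())
    for key, files in sorted(shard.items()):
        print(key, sorted(files))
    # prints:
    # img_0001 ['json', 'png']
    # img_0002 ['json', 'png']
```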
Ensure that all data is stored in the data/ folder. The folder structure for the different subsets of OpenSDI should be as follows:

data
├── train
│   ├── real
│   │   ├── 000000.tar
│   │   └── ...
│   ├── Stable_Diffusion_1.5_Diffusion_Model_Generated
│   │   ├── 000000.tar
│   │   └── ...
│   └── Stable_Diffusion_1.5_Diffusion_Model_Inpainted
│       ├── 000000.tar
│       └── ...
└── test
    ├── sd15
    │   └── shard-size
    │       ├── 000000.tar
    │       └── ...
    ├── sd2
    │   └── shard-size
    │       ├── 000000.tar
    │       └── ...
    └── ...

The dataset paths in the config files should look similar to this:

DATASET:
  ...
  DATAPATH:
    TRAIN:
      - data/train/real
      - data/train/Stable_Diffusion_1.5_Diffusion_Model_Inpainted
      - data/train/Stable_Diffusion_1.5_Diffusion_Model_Generated
    VAL: 
      - data/test/sd15 
  NUM_SHARDS_TRAIN: ['None', 'None', 'None'] 
  NUM_SHARDS_VAL: ['None']
  
# Test-Set
TEST:
  DATAPATH:
      - data/test/sd15
      - data/test/sd2
      - data/test/sdxl
      - data/test/sd3
      - data/test/flux
  NUM_SHARDS: ['None', 'None', 'None', 'None', 'None']

How to Convert

Each .tar shard contains n images along with their .json labels. Each label corresponds to a single image and contains metadata used for forgery detection. Below is a description of each field:

image_id: Unique identifier for the image.
file_name: File name of the image (typically .png format).
gen_caption: Automatically generated caption describing the full image.
gen_caption_cropped: Caption describing the manipulated region (in case of inpainting).
ds_label: Ground truth class label (e.g., from ImageNet).
crop_bbox: Bounding box of the manipulated region (e.g., the inpainting ROI), in [x, y, width, height] format.
fake_type: Description of how the image was generated or manipulated (e.g., diffusion model, GAN, 'real', etc.).
ds_license: License of the dataset (e.g., MIT).
orig_license: License of the original image sample (if any).
dataset: Source dataset name (e.g., OpenSDI).
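A minimal sketch of a typed record for one label, using only the standard library. The class and method names are hypothetical, the field types are assumptions, and the example values are made up; the only documented detail it relies on is the [x, y, width, height] layout of crop_bbox, which the helper converts to corner coordinates.

```python
import json
from dataclasses import dataclass, fields
from typing import Any, Optional


@dataclass
class LabelEntry:
    """One per-image label; hypothetical class mirroring the documented fields.

    Field types are assumptions: only the [x, y, width, height] layout of
    crop_bbox is documented.
    """
    image_id: Any
    file_name: str
    gen_caption: Optional[str] = None
    gen_caption_cropped: Optional[str] = None
    ds_label: Any = None
    crop_bbox: Optional[list] = None  # [x, y, width, height]
    fake_type: Optional[str] = None
    ds_license: Optional[str] = None
    orig_license: Optional[str] = None
    dataset: Optional[str] = None

    @classmethod
    def from_json(cls, text: str) -> "LabelEntry":
        """Parse one label, ignoring keys outside the documented schema."""
        raw = json.loads(text)
        known = {f.name for f in fields(cls)}
        return cls(**{key: value for key, value in raw.items() if key in known})

    def bbox_corners(self) -> Optional[tuple]:
        """Convert [x, y, width, height] to (left, top, right, bottom)."""
        if not self.crop_bbox:
            return None  # e.g. real images without a manipulated region
        x, y, width, height = self.crop_bbox
        return (x, y, x + width, y + height)


if __name__ == "__main__":
    # Made-up values, for illustration only.
    entry = LabelEntry.from_json(
        '{"image_id": "0001", "file_name": "0001.png", '
        '"crop_bbox": [10, 20, 100, 50], "fake_type": "inpainted"}'
    )
    print(entry.bbox_corners())  # prints (10, 20, 110, 70)
```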

For easier dataset setup, we recommend running our conversion pipeline as follows:

Set up a separate conda environment, as detailed in the installation section above.
Ensure that you have enough storage space for both the original Hugging Face download (~72 GB) and the converted dataset (~270 GB).
Download the ClipCap weights. Refer to the ClipCap GitHub repository for the weight files.
For preparing the OpenSDI train set, run:

python3 process_opensdi_train.py \
    --processed_dataset_path=/path/to/data \
    --clipcap_model_path=/path/to/clipcap/conceptual_weights.pt \
    --shard_n=500

For preparing the OpenSDI test set, run:

python3 process_opensdi_test.py \
    --processed_dataset_path=/path/to/data \
    --clipcap_model_path=/path/to/clipcap/conceptual_weights.pt \
    --shard_n=10000
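--shard_n appears to set how many samples go into each .tar shard (500 for train, 10000 for test). Under that assumption, the number of shards is the ceiling of the sample count divided by shard_n, and the shards are named with six zero-padded digits as in the folder layout above. The sketch below illustrates this with the standard `tarfile` module; it is not the repository's actual writer, and `shard_names` and `write_shards` are hypothetical helpers.

```python
import io
import math
import os
import tarfile
import tempfile
from typing import Iterable


def shard_names(num_samples: int, shard_n: int) -> list[str]:
    """File names of the shards needed for num_samples at shard_n samples per shard."""
    num_shards = math.ceil(num_samples / shard_n)
    return [f"{i:06d}.tar" for i in range(num_shards)]


def write_shards(samples: Iterable[tuple[str, dict[str, bytes]]],
                 out_dir: str, shard_n: int) -> list[str]:
    """Write (key, {extension: payload}) samples into numbered .tar shards.

    Hypothetical writer: files of one sample are stored as "<key>.<extension>",
    so they share a basename as the WebDataset convention requires.
    """
    os.makedirs(out_dir, exist_ok=True)
    written: list[str] = []
    tar = None
    try:
        for index, (key, files) in enumerate(samples):
            if index % shard_n == 0:  # start a new shard every shard_n samples
                if tar is not None:
                    tar.close()
                path = os.path.join(out_dir, f"{len(written):06d}.tar")
                tar = tarfile.open(path, mode="w")
                written.append(path)
            for extension, payload in files.items():
                info = tarfile.TarInfo(name=f"{key}.{extension}")
                info.size = len(payload)
                tar.addfile(info, io.BytesIO(payload))
    finally:
        if tar is not None:
            tar.close()
    return written


if __name__ == "__main__":
    print(shard_names(1234, 500))  # prints ['000000.tar', '000001.tar', '000002.tar']
    with tempfile.TemporaryDirectory() as out_dir:
        demo = ((f"sample_{i:04d}", {"json": b"{}"}) for i in range(5))
        paths = write_shards(demo, out_dir, shard_n=2)
        print([os.path.basename(p) for p in paths])
```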

Installation (Training and Testing)

The train and test code has been tested with CUDA 11.8, Python 3.10, and PyTorch 2.2.0.
Multi-GPU setups are not supported.
Training and testing configurations and scripts are optimized for use within Docker.
The setup and usage instructions for training and testing assume you are running the code inside a Docker container.
Tested in a Docker container on a single NVIDIA H100 GPU.

Clone this Repo

git clone https://github.com/CeMOS-IS/dualsight
cd dualsight

Docker Set-Up

To build the Docker image, run:

docker build -t dualsight .

Adjust the visible GPUs, shared memory size, and mounted local directories to your setup, then start the container:

docker run -it --name dualsight \
    --shm-size 100G --gpus '"device=0"' \
    -v /path/to/repo/:/dualsight \
    -v /path/to/data/:/dualsight/data \
    dualsight bash

Results and Models

Results OpenSDI

Method	SD1.5 Acc.	SD2.1 Acc.	SDXL Acc.	SD3 Acc.	Flux Acc.	Avg. Acc.
CNNDet	85.04	75.94	68.72	67.08	57.57	70.87
GramNet	80.35	76.66	70.76	70.29	63.37	72.29
FreqNet	77.70	68.37	64.02	64.37	57.08	66.31
NPR	79.28	81.84	74.28	75.47	71.36	76.45
UniFD	77.60	81.92	74.83	75.17	69.06	75.72
RINE	90.98	88.12	78.76	76.78	67.02	80.33
MVSS-Net	93.65	82.33	70.42	72.13	56.78	75.06
CAT-Net	96.15	82.46	73.34	73.61	55.26	76.16
PSCC-Net	96.14	80.94	68.81	70.89	67.04	76.76
ObjectFormer	75.22	72.55	62.92	62.54	58.05	66.26
TruFor	97.73	55.62	66.41	67.51	61.62	69.78
DeCLIP	78.31	82.77	70.55	68.40	65.61	73.13
IML-ViT	75.73	61.19	49.95	51.25	43.62	56.35
MaskCLIP	92.72	89.45	81.22	78.01	68.50	81.98
DualSight (CLIP224)	85.00	93.07	91.72	90.99	80.97	88.35
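The Avg. Acc. column is consistent with the unweighted mean of the five per-generator accuracies, rounded to two decimals. The short check below recomputes it for three rows copied from the table; it only verifies the arithmetic and says nothing about how the per-generator accuracies themselves were computed.

```python
from statistics import fmean

# Per-generator accuracies (SD1.5, SD2.1, SDXL, SD3, Flux) and the reported
# average, copied from the table above.
rows = {
    "CNNDet": ([85.04, 75.94, 68.72, 67.08, 57.57], 70.87),
    "MaskCLIP": ([92.72, 89.45, 81.22, 78.01, 68.50], 81.98),
    "DualSight (CLIP224)": ([85.00, 93.07, 91.72, 90.99, 80.97], 88.35),
}

for method, (per_generator, reported_avg) in rows.items():
    mean = fmean(per_generator)
    # Compare at the table's precision (two decimals).
    assert abs(mean - reported_avg) < 0.005, method
    print(f"{method}: {mean:.2f} (reported {reported_avg:.2f})")
```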

Weights for the CLIP-224 variant of DualSight can be downloaded here.

Configuration

All training, evaluation, and model parameters are set in YACS-based configuration files; the provided example configuration file is a good starting point.

Inference

In the configs/ folder, you will find the configuration file for the CLIP-224 variant of DualSight. Download the corresponding model checkpoint from the results section and place it in the checkpoints/ directory. To run inference on a folder of images with a trained DualSight model, call the inference.py script with the following arguments:

python3 inference.py \
    path/to/config.yaml \
    path/to/weights.pth \
    --data path/to/images \
    --save_dir output_folder \
    --save_file output_file_name

Arguments:

cfg (str, required): Path to the YAML config file.
ckpt (str, required): Path to the model checkpoint file (.pth).
--data (str, required): Path to the folder containing images for inference.
--save_dir (str, default=results/): Directory where predictions will be saved.
--save_file (str, default=output.json): Name of the output file (JSON format) storing predictions.

Example:

python3 inference.py configs/example_DualSight_A.yaml checkpoints/dualsight_a.pth --data example/ --save_dir results/ --save_file example.json

This will generate predictions for all images in the example/ folder and save them in results/example.json.

The inference.py script generates a JSON file that contains one entry per input image. Each entry provides classification results and additional metadata. Below is a description of the fields:

[
  {
    "image_name": "example.png",   // Filename of the input image
    "cls": "fake",                 // Predicted class label (real or fake)
    "prob": 0.997,                 // Confidence of the prediction (shown here on a 0–1 scale)
    "clip_score": [                // Optional: CLIP similarity scores (only for fake predictions)
        [0.2143, "FAKE-GAN"],
        [0.5227, "FAKE-DIFFUSION"],
        [0.1790, "FAKE-VAE"],
        [0.0838, "FAKE-OTHERS"]
    ],
    "logit": 0.99748               // Model output after the sigmoid (a probability, despite the field name)
  },
  ...
]
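A minimal sketch for post-processing the output file with the standard `json` module. It assumes the file itself is plain JSON (the comments above are annotations, not part of the file), compares `cls` case-insensitively, and treats `clip_score` as optional; `summarize` is a hypothetical helper name.

```python
import json
from collections import Counter


def summarize(results: list[dict]) -> dict:
    """Count predictions per class and the top CLIP label for each fake image."""
    class_counts = Counter(entry["cls"].lower() for entry in results)
    top_family = {}
    for entry in results:
        scores = entry.get("clip_score")  # may be missing, e.g. for real images
        if scores:
            _, best_label = max(scores, key=lambda pair: pair[0])
            top_family[entry["image_name"]] = best_label
    return {"counts": dict(class_counts), "top_family": top_family}


if __name__ == "__main__":
    # Same structure as the example above, without the annotation comments.
    # In practice: with open("results/example.json") as f: results = json.load(f)
    text = """[
      {"image_name": "example.png", "cls": "fake", "prob": 0.997,
       "clip_score": [[0.2143, "FAKE-GAN"], [0.5227, "FAKE-DIFFUSION"],
                      [0.1790, "FAKE-VAE"], [0.0838, "FAKE-OTHERS"]],
       "logit": 0.99748}
    ]"""
    print(summarize(json.loads(text)))
    # prints {'counts': {'fake': 1}, 'top_family': {'example.png': 'FAKE-DIFFUSION'}}
```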

Evaluation

In the configs/ folder, you will find the example_config.yaml along with the configuration file for the CLIP-224 variant of DualSight. These configs can be freely modified to suit your needs. To run evaluation, download the corresponding model checkpoint from the results section and place it in the checkpoints/ directory. The evaluation data should be stored in the data/ folder, as described in the "Data Preparation" section.

python3 test.py path/to/config.yaml \
    path/to/weights.pth \
    --thresh binary_classification_threshold
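The --thresh value turns the model's continuous score into a real/fake decision. As an illustration only, the sketch below assumes that a sigmoid score at or above the threshold means "fake" and reports plain accuracy; test.py's exact comparison and reported metrics may differ, `accuracy_at_threshold` is a hypothetical name, and the scores and labels are made up.

```python
from typing import Sequence


def accuracy_at_threshold(scores: Sequence[float], labels: Sequence[int],
                          thresh: float = 0.5) -> float:
    """Accuracy when score >= thresh is predicted as fake (label 1)."""
    if not scores or len(scores) != len(labels):
        raise ValueError("scores and labels must be non-empty and equally long")
    predictions = [1 if score >= thresh else 0 for score in scores]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


if __name__ == "__main__":
    scores = [0.10, 0.40, 0.55, 0.90]  # made-up sigmoid outputs
    labels = [0, 1, 0, 1]              # 0 = real, 1 = fake
    for thresh in (0.3, 0.5, 0.7):
        print(thresh, accuracy_at_threshold(scores, labels, thresh))
```

Sweeping a few thresholds like this shows why the value passed to --thresh matters: the same scores give different accuracies depending on where the decision boundary is placed.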

Training

In the configs/ folder, you will find the example_config.yaml along with the configuration file for the CLIP-224 variant of DualSight. These configs can be freely modified to suit your needs. To train a model on OpenSDI from scratch, run:

python3 train.py path/to/config.yaml

Arguments:

--resume: resume training from the last saved checkpoint of the model

Acknowledgement

This codebase borrows from UniFD, Open-SDI and CLIP-LoRA. The dataset preparation pipeline we use employs ClipCap for image-captioning. Many thanks to the authors of these works for their contribution!

This work was partially funded by the German Federal Agency for Breakthrough Innovation (SPRIN-D).

Citing

If you have used DualSight in your research, please cite our work. 🎓

@inproceedings{Abdullah2026DualSight,
    title = {DualSight: Learning to Disentangle Artifact and Semantic Features for Detection of Diffusion-Generated Images},
    author = {Abdullah, Ahmed and Ebert, Nikolas and Wasenm{\"u}ller, Oliver},
    booktitle = {International Conference on Pattern Recognition (ICPR)},
    year = {2026},
}
