DualSight: Learning to Disentangle Artifact and Semantic Features for Detection of Diffusion-Generated Images
Official repository of the ICPR26 paper "DualSight: Learning to Disentangle Artifact and Semantic Features for Detection of Diffusion-Generated Images"
Ahmed Abdullah, Nikolas Ebert & Oliver Wasenmüller
CeMOS - Research and Transfer Center, Technische Hochschule Mannheim
This repository includes everything needed to evaluate, train and build on DualSight.
We use a custom, webdataset-based format for our training and evaluation. For converting the OpenSDI dataset to this format, we recommend setting up a separate conda environment as follows:
conda create -n dualsight-data-prep python=3.10.16 -y
conda activate dualsight-data-prep
pip install "packaging==24.1" "pip==24.2" "setuptools==75.1.0" "wheel==0.44.0"
pip install -r requirements_data_prep.txt
Details on how to run the dataset preparation pipeline can be found in the "Data Preparation" section below.
- For training and evaluation, we use the WebDataset format and the WebDataset library.
- The WebDataset format files are .tar files, with two conventions:
  - within each .tar file, files that belong together and make up a training sample share the same basename when stripped of all filename extensions
  - the shards of a .tar file are numbered like 000000.tar to 012345.tar
- You can find a longer, more detailed specification of the WebDataset format in the WebDataset Format Specification.
- Ensure that all data is stored in the data/ folder. The folder structure for the different subsets of OpenSDI should be as follows:
data
├── train
│   ├── real
│   │   ├── 000000.tar
│   │   └── ...
│   ├── Stable_Diffusion_1.5_Diffusion_Model_Generated
│   │   ├── 000000.tar
│   │   └── ...
│   └── Stable_Diffusion_1.5_Diffusion_Model_Inpainted
│       ├── 000000.tar
│       └── ...
└── test
    ├── sd15
    │   └── shard-size
    │       ├── 000000.tar
    │       └── ...
    ├── sd2
    │   └── shard-size
    │       ├── 000000.tar
    │       └── ...
    └── ...
- The dataset paths in the config files should look similar to:
DATASET:
  ...
  DATAPATH:
    TRAIN:
      - data/train/real
      - data/train/Stable_Diffusion_1.5_Diffusion_Model_Inpainted
      - data/train/Stable_Diffusion_1.5_Diffusion_Model_Generated
    VAL:
      - data/test/sd15
  NUM_SHARDS_TRAIN: ['None', 'None', 'None']
  NUM_SHARDS_VAL: ['None']
# Test-Set
TEST:
  DATAPATH:
    - data/test/sd15
    - data/test/sd2
    - data/test/sdxl
    - data/test/sd3
    - data/test/flux
  NUM_SHARDS: ['None', 'None', 'None', 'None', 'None']
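Before training or evaluation, it can help to confirm that every configured data path actually contains shards. The following is a minimal sketch and not part of the repository: `count_shards` is a hypothetical helper, and it assumes that shards are .tar files with purely numeric names that may sit in a subfolder (such as the shard-size folder of the test sets).

```python
import re
from pathlib import Path

# Assumption: shard files are named with digits only, e.g. 000000.tar.
SHARD_PATTERN = re.compile(r"^\d+\.tar$")


def count_shards(data_paths):
    """Return {path: number of shard files found below that path}."""
    counts = {}
    for path in data_paths:
        root = Path(path)
        if not root.is_dir():
            # A missing folder is reported as zero shards instead of raising.
            counts[str(path)] = 0
            continue
        # rglob also descends into subfolders such as data/test/sd15/<shard-size>/.
        shards = [p for p in root.rglob("*.tar") if SHARD_PATTERN.match(p.name)]
        counts[str(path)] = len(shards)
    return counts


if __name__ == "__main__":
    # Example paths taken from the config excerpt in this README.
    paths = ["data/train/real", "data/test/sd15"]
    for path, n in count_shards(paths).items():
        status = "ok" if n > 0 else "no shards found"
        print(f"{path}: {n} shard(s) ({status})")
```

A path that reports zero shards usually points to a typo in the config or a dataset that has not been converted yet.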
Each .tar file contains n images along with a corresponding .json label file. Each label entry corresponds to a single image and includes metadata used for forgery detection. Below is a description of each field:
- image_id: Unique identifier for the image.
- file_name: File name of the image (typically .png format).
- gen_caption: Automatically generated caption describing the full image.
- gen_caption_cropped: Caption describing the manipulated region (in case of inpainting).
- ds_label: Ground truth class label (e.g., from ImageNet).
- crop_bbox: Bounding box of the manipulated region (e.g., the inpainting ROI), in [x, y, width, height] format.
- fake_type: Description of how the image was generated or manipulated (e.g., diffusion model, GAN, 'real', etc.).
- ds_license: License of the dataset (e.g., MIT).
- orig_license: License of the original image sample (if any).
- dataset: Source dataset name (e.g., OpenSDI).
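To make the shard layout concrete, the sketch below groups the files of one shard into samples by their shared basename, using only the Python standard library. It is an illustration under assumptions, not the loader used in this repository: `read_samples` and `split_name` are hypothetical helpers, and the sketch assumes one .json label file per image, stored under the same basename.

```python
import io
import json
import os
import tarfile
import tempfile
from collections import defaultdict


def split_name(member_name):
    """Split 'dir/0001.seg.png' into ('0001', 'seg.png'), ignoring the directory."""
    basename = member_name.rsplit("/", 1)[-1]
    key, _, extension = basename.partition(".")
    return key, extension


def read_samples(tar_path):
    """Group the files of one shard into samples keyed by shared basename."""
    samples = defaultdict(dict)
    with tarfile.open(tar_path) as tar:
        for member in tar:
            if not member.isfile():
                continue
            key, extension = split_name(member.name)
            data = tar.extractfile(member).read()
            # Assumption: labels are stored as one JSON file per sample.
            samples[key][extension] = json.loads(data) if extension == "json" else data
    return dict(samples)


if __name__ == "__main__":
    # Build a tiny throwaway shard so the sketch runs without the dataset.
    label = {"image_id": "0001", "file_name": "0001.png", "fake_type": "real"}
    files = [("0001.png", b"placeholder-bytes"), ("0001.json", json.dumps(label).encode())]
    with tempfile.TemporaryDirectory() as tmp:
        shard = os.path.join(tmp, "000000.tar")
        with tarfile.open(shard, "w") as tar:
            for name, payload in files:
                info = tarfile.TarInfo(name)
                info.size = len(payload)
                tar.addfile(info, io.BytesIO(payload))
        for key, sample in read_samples(shard).items():
            print(key, sorted(sample), sample["json"]["fake_type"])
```

For actual training, the WebDataset library handles this grouping itself; the sketch is only meant for inspecting or debugging individual shards.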
For easier dataset setup, we recommend running our conversion pipeline as follows:
- Set up a separate conda environment, as detailed in the installation section above.
- Ensure that you have enough storage space for both the original Hugging Face download (~72GB) and the converted dataset (~270GB).
- Download the ClipCap weights. Refer to their GitHub repo for downloading the weight files.
- For preparing the OpenSDI train set, run:
python3 process_opensdi_train.py \
--processed_dataset_path=/path/to/data \
--clipcap_model_path=/path/to/clipcap/conceptual_weights.pt \
--shard_n=500
- For preparing the OpenSDI test set, run:
python3 process_opensdi_test.py \
--processed_dataset_path=/path/to/data \
--clipcap_model_path=/path/to/clipcap/conceptual_weights.pt \
--shard_n=10000
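Since the conversion needs roughly 72GB for the download plus 270GB for the converted dataset, a quick free-space check before starting can save a failed run. The snippet below is a minimal sketch using the standard library; `has_enough_space` is a hypothetical helper, and it assumes that the download and the converted data end up on the same filesystem.

```python
import shutil

# Sizes quoted in this README, in gigabytes.
DOWNLOAD_GB = 72
CONVERTED_GB = 270


def has_enough_space(path, required_gb=DOWNLOAD_GB + CONVERTED_GB):
    """Return True if the filesystem holding `path` has the required free space."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= required_gb


if __name__ == "__main__":
    target = "."  # replace with the directory passed as --processed_dataset_path
    print(f"Enough space for download + conversion: {has_enough_space(target)}")
```

If the two datasets live on different disks, call the helper once per location with the matching size.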
- The train and test code has been tested with CUDA 11.8, Python 3.10, and PyTorch 2.2.0.
- Multi-GPU setups are not supported.
- Train and test configurations and scripts are optimized for usage within a Docker environment.
- The setup and usage instructions for training and testing assume you are running the code inside Docker.
- Tested on a Docker container using 1x Nvidia-H100 GPU.
git clone https://github.com/CeMOS-IS/dualsight
cd dualsight
To build the Docker image, run:
docker build -t dualsight .
Adjust the visible GPUs, shared memory size and mounted local directories according to your setup, then run the container:
docker run -it --name dualsight \
--shm-size 100G --gpus '"device=0"' \
-v /path/to/repo/:/dualsight \
-v /path/to/data/:/dualsight/data \
dualsight bash
| Method | SD1.5 Acc. | SD2.1 Acc. | SDXL Acc. | SD3 Acc. | Flux Acc. | Avg. Acc. |
|---|---|---|---|---|---|---|
| CNNDet | 85.04 | 75.94 | 68.72 | 67.08 | 57.57 | 70.87 |
| GramNet | 80.35 | 76.66 | 70.76 | 70.29 | 63.37 | 72.29 |
| FreqNet | 77.70 | 68.37 | 64.02 | 64.37 | 57.08 | 66.31 |
| NPR | 79.28 | 81.84 | 74.28 | 75.47 | 71.36 | 76.45 |
| UniFD | 77.60 | 81.92 | 74.83 | 75.17 | 69.06 | 75.72 |
| RINE | 90.98 | 88.12 | 78.76 | 76.78 | 67.02 | 80.33 |
| MVSS-Net | 93.65 | 82.33 | 70.42 | 72.13 | 56.78 | 75.06 |
| CAT-Net | 96.15 | 82.46 | 73.34 | 73.61 | 55.26 | 76.16 |
| PSCC-Net | 96.14 | 80.94 | 68.81 | 70.89 | 67.04 | 76.76 |
| ObjectFormer | 75.22 | 72.55 | 62.92 | 62.54 | 58.05 | 66.26 |
| TruFor | 97.73 | 55.62 | 66.41 | 67.51 | 61.62 | 69.78 |
| DeCLIP | 78.31 | 82.77 | 70.55 | 68.40 | 65.61 | 73.13 |
| IML-ViT | 75.73 | 61.19 | 49.95 | 51.25 | 43.62 | 56.35 |
| MaskCLIP | 92.72 | 89.45 | 81.22 | 78.01 | 68.50 | 81.98 |
| DualSight (CLIP224) | 85.00 | 93.07 | 91.72 | 90.99 | 80.97 | 88.35 |
Weights for the CLIP-224 variant of DualSight can be downloaded here.
You can configure all training, evaluation, and model parameters using the provided example configuration file, based on the YACS library.
In the configs/ folder, you will find the configuration file for the CLIP-224 variant of DualSight. To run inference, download the corresponding model checkpoint from the results section and place it in the checkpoints/ directory. To run inference on a folder of test images using a trained DualSight model, use the inference.py script with the following arguments:
python3 inference.py \
path/to/config.yaml \
path/to/weights.pth \
--data path/to/images \
--save_dir output_folder \
--save_file output_file_name
Arguments:
- cfg (str, required): Path to the YAML config file.
- ckpt (str, required): Path to the model checkpoint file (.pth).
- --data (str, required): Path to the folder containing images for inference.
- --save_dir (str, default=results/): Directory where predictions will be saved.
- --save_file (str, default=output.json): Name of the output file (JSON format) storing predictions.
Example:
python3 inference.py configs/example_DualSight_A.yaml checkpoints/dualsight_a.pth --data example/ --save_dir results/ --save_file example.json
This will generate predictions for all images in the example/ folder and save them in results/example.json.
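The resulting file can then be inspected with a few lines of Python. The sketch below is not part of the repository: `load_predictions` and `summarize_predictions` are hypothetical helpers, and they assume that the file is plain JSON (a list of entries) containing the image_name, cls and prob fields documented in this README.

```python
import json
from collections import Counter


def load_predictions(path):
    """Load the list of prediction entries written by inference.py."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def summarize_predictions(entries):
    """Count predicted classes and list fake predictions, most confident first."""
    # Labels are lower-cased so that 'Fake' and 'fake' are treated alike.
    counts = Counter(entry["cls"].lower() for entry in entries)
    fakes = sorted(
        (e for e in entries if e["cls"].lower() == "fake"),
        key=lambda e: e["prob"],
        reverse=True,
    )
    return dict(counts), [(e["image_name"], e["prob"]) for e in fakes]


if __name__ == "__main__":
    # Inline stand-in for load_predictions("results/example.json").
    entries = [
        {"image_name": "a.png", "cls": "fake", "prob": 0.997},
        {"image_name": "b.png", "cls": "real", "prob": 0.912},
        {"image_name": "c.png", "cls": "fake", "prob": 0.731},
    ]
    counts, fakes = summarize_predictions(entries)
    print(counts)
    print(fakes)
```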
The inference.py script generates a JSON file that contains one entry per input image. Each entry provides classification results and additional metadata. Below is a description of the fields:
[
{
"image_name": "example.png", // Filename of the input image
"cls": "fake", // Predicted class label: 'real' or 'fake'
"prob": 0.997, // Confidence of the prediction as a probability (0 to 1)
"clip_score": [ // Optional: CLIP similarity scores (only for 'Fake' predictions)
[0.2143, "FAKE-GAN" ],
[0.5227, "FAKE-DIFFUSION"],
[0.1790, "FAKE-VAE"],
[0.0838, "FAKE-OTHERS"]
],
"logit": 0.99748 // Model output after applying the sigmoid
},
...
]
In the configs/ folder, you will find the example_config.yaml along with the configuration file for the CLIP-224 variant of DualSight. These configs can be freely modified to suit your needs. To run evaluation, download the corresponding model checkpoint from the results section and place it in the checkpoints/ directory. The evaluation data should be stored in the data/ folder, as described in the "Data Preparation" section.
python3 test.py path/to/config.yaml \
path/to/weights.pth \
--thresh binary_classification_threshold
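To illustrate what a binary classification threshold does, the sketch below turns per-image scores into real/fake decisions and computes accuracy. It is a generic illustration, not the logic of test.py: it assumes scores are sigmoid outputs in [0, 1], that "fake" is the positive class (label 1), and that a score equal to the threshold counts as fake. The helper names and the example values are made up.

```python
def classify(scores, thresh=0.5):
    """Label a sample as fake (1) when its score reaches the threshold, else real (0)."""
    return [1 if score >= thresh else 0 for score in scores]


def accuracy(scores, labels, thresh=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    predictions = classify(scores, thresh)
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


if __name__ == "__main__":
    # Made-up scores and ground-truth labels (1 = fake, 0 = real).
    scores = [0.97, 0.40, 0.62, 0.08]
    labels = [1, 1, 0, 0]
    for thresh in (0.3, 0.5, 0.7):
        print(f"thresh={thresh}: accuracy={accuracy(scores, labels, thresh):.2f}")
```

A lower threshold flags more images as fake, trading false negatives for false positives, so the reported accuracy depends on the value passed to --thresh.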
In the configs/ folder, you will find the example_config.yaml along with the configuration file for the CLIP-224 variant of DualSight. These configs can be freely modified to suit your needs. To train a model on OpenSDI from scratch, run:
python3 train.py path/to/config.yaml
Arguments:
--resume: loads the last checkpoint of the model
This codebase borrows from UniFD, Open-SDI and CLIP-LoRA. The dataset preparation pipeline we use employs ClipCap for image-captioning. Many thanks to the authors of these works for their contribution!
This work was partially funded by the German Federal Agency for Breakthrough Innovation (SPRIN-D).
If you have used DualSight in your research, please cite our work. 🎓
@inproceedings{Abdullah2026DualSight,
title = {DualSight: Learning to Disentangle Artifact and Semantic Features for Detection of Diffusion-Generated Images},
author = {Abdullah, Ahmed and Ebert, Nikolas and Wasenm{\"u}ller, Oliver},
booktitle = {International Conference on Pattern Recognition (ICPR)},
year = {2026},
}