DualSight: Learning to Disentangle Artifact and Semantic Features for Detection of Diffusion-Generated Images

Official repository of the ICPR26 paper "DualSight: Learning to Disentangle Artifact and Semantic Features for Detection of Diffusion-Generated Images"

Ahmed Abdullah, Nikolas Ebert & Oliver Wasenmüller
CeMOS - Research and Transfer Center, Technische Hochschule Mannheim

This repository includes everything needed to evaluate, train and build on DualSight.

Installation (Dataset Preparation)

We use a custom, webdataset-based format for our training and evaluation. For converting the OpenSDI dataset to this format, we recommend setting up a separate conda environment as follows:

conda create -n dualsight-data-prep python=3.10.16 -y
conda activate dualsight-data-prep
pip install "packaging==24.1" "pip==24.2" "setuptools==75.1.0" "wheel==0.44.0"
pip install -r requirements_data_prep.txt 

Details on how to run the dataset preparation pipeline can be found in the "Data Preparation" section below.

Data Preparation

  • For training and evaluation we use the WebDataset format and the WebDataset library.
  • WebDataset files are .tar files that follow two conventions:
    • within each .tar file, files that belong together and make up a training sample share the same basename when stripped of all filename extensions
    • the shards of a dataset are numbered sequentially, e.g. 000000.tar to 012345.tar
    • You can find a longer, more detailed specification of the WebDataset format in the WebDataset Format Specification
  • Ensure that all data is stored in the data/ folder. The folder structure for the different subsets of OpenSDI should be as follows:
data
├── train
│   ├── real
│   │   ├── 000000.tar
│   │   └── ...
│   ├── Stable_Diffusion_1.5_Diffusion_Model_Generated
│   │   ├── 000000.tar
│   │   └── ...
│   └── Stable_Diffusion_1.5_Diffusion_Model_Inpainted
│       ├── 000000.tar
│       └── ...
└── test
    ├── sd15
    │   └── shard-size
    │       ├── 000000.tar
    │       └── ...
    ├── sd2
    │   └── shard-size
    │       ├── 000000.tar
    │       └── ...
    └── ...
  • The dataset pathing in the config files should be similar to:
DATASET:
  ...
  DATAPATH:
    TRAIN:
      - data/train/real
      - data/train/Stable_Diffusion_1.5_Diffusion_Model_Inpainted
      - data/train/Stable_Diffusion_1.5_Diffusion_Model_Generated
    VAL: 
      - data/test/sd15 
  NUM_SHARDS_TRAIN: ['None', 'None', 'None'] 
  NUM_SHARDS_VAL: ['None']
  
# Test-Set
TEST:
  DATAPATH:
      - data/test/sd15
      - data/test/sd2
      - data/test/sdxl
      - data/test/sd3
      - data/test/flux
  NUM_SHARDS: ['None', 'None', 'None', 'None', 'None']
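
The basename convention described above can be illustrated with a short sketch that uses only the Python standard library. It is not part of the DualSight code base or the WebDataset library: the helper names (split_key, group_samples, make_demo_shard) and the demo file names and payloads are hypothetical and exist only for illustration.

```python
import io
import tarfile
from collections import defaultdict


def split_key(member_name: str) -> tuple[str, str]:
    """Split 'dir/0001.seg.png' into the sample key 'dir/0001' and 'seg.png'."""
    directory, _, filename = member_name.rpartition("/")
    # Everything after the first dot of the file name counts as the extension.
    base, _, extension = filename.partition(".")
    key = f"{directory}/{base}" if directory else base
    return key, extension


def group_samples(tar_bytes: bytes) -> dict[str, dict[str, bytes]]:
    """Group the members of one shard into samples keyed by basename."""
    samples: dict[str, dict[str, bytes]] = defaultdict(dict)
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r") as tar:
        for member in tar:
            if not member.isfile():
                continue
            key, extension = split_key(member.name)
            samples[key][extension] = tar.extractfile(member).read()
    return dict(samples)


def make_demo_shard() -> bytes:
    """Build a tiny in-memory shard with two hypothetical samples."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        for name, payload in [
            ("0001.png", b"placeholder-image-bytes"),
            ("0001.json", b'{"fake_type": "real"}'),
            ("0002.png", b"placeholder-image-bytes"),
            ("0002.json", b'{"fake_type": "inpainting"}'),
        ]:
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


samples = group_samples(make_demo_shard())
print(sorted(samples))  # sample keys found in the demo shard

# Shard files follow a zero-padded, six-digit numbering scheme.
shard_names = [f"{index:06d}.tar" for index in range(3)]
print(shard_names)
```

For actual training and evaluation, the WebDataset library handles this grouping; the sketch only shows which files end up in the same sample.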

How to Convert

Each .tar file contains n images along with a corresponding .json label file. Each label entry corresponds to a single image and includes metadata used for forgery detection. Below is a description of each field:

  • image_id: Unique identifier for the image.
  • file_name: File name of the image (typically .png format).
  • gen_caption: Automatically generated caption describing the full image.
  • gen_caption_cropped: Caption describing the manipulated region (in case of inpainting).
  • ds_label: Ground truth class label (e.g., from ImageNet).
  • crop_bbox: Bounding box of the manipulated region (e.g. the inpainting ROI), in [x, y, width, height] format.
  • fake_type: Description of how the image was generated or manipulated (e.g., diffusion model, GAN, 'real', etc.).
  • ds_license: License of the dataset (e.g., MIT).
  • orig_license: License of the original image sample (if any).
  • dataset: Source dataset name (e.g., OpenSDI).
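
As a minimal sketch of how such a label entry might be consumed, the snippet below parses one entry and converts crop_bbox from [x, y, width, height] to corner coordinates. The entry and its values are hypothetical, the helper bbox_to_corners is not part of the repository, and the sketch assumes that (x, y) denotes the top-left corner of the box.

```python
import json

# Hypothetical label entry; the values are illustrative, not taken from OpenSDI.
label_json = """
{
  "image_id": "0001",
  "file_name": "0001.png",
  "crop_bbox": [32, 48, 128, 64],
  "fake_type": "inpainting",
  "dataset": "OpenSDI"
}
"""


def bbox_to_corners(crop_bbox):
    """Convert [x, y, width, height] into (left, top, right, bottom).

    Assumes (x, y) is the top-left corner of the box.
    """
    x, y, width, height = crop_bbox
    return x, y, x + width, y + height


label = json.loads(label_json)
corners = bbox_to_corners(label["crop_bbox"])
# Assumption: real images carry the fake_type value 'real'.
is_real = label["fake_type"] == "real"
print(corners, is_real)
```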

For easier dataset setup, we recommend running our conversion pipeline as follows:

  • Set up a separate conda environment, as detailed in the installation section above.
  • Ensure that you have enough storage space for both the original Hugging Face download (~72GB) and the converted dataset (~270GB).
  • Download the ClipCap weights. Refer to their GitHub repo for downloading the weight files.
  • For preparing the OpenSDI train set, run:
python3 process_opensdi_train.py \
    --processed_dataset_path=/path/to/data \
    --clipcap_model_path=/path/to/clipcap/conceptual_weights.pt \
    --shard_n=500
  • For preparing the OpenSDI test set, run:
python3 process_opensdi_test.py \
    --processed_dataset_path=/path/to/data \
    --clipcap_model_path=/path/to/clipcap/conceptual_weights.pt \
    --shard_n=10000

Installation (Training and Testing)

  • The train and test code has been tested with CUDA 11.8, Python 3.10, and PyTorch 2.2.0.
  • Multi-GPU setups are not supported.
  • Train and test configurations and scripts are optimized for usage within a Docker environment.
  • The setup and usage instructions for training and testing assume you are running the code inside Docker.
  • Tested on a Docker container using 1x Nvidia-H100 GPU.

Clone this Repo

git clone https://github.com/CeMOS-IS/dualsight
cd dualsight

Docker Set-Up

To build the Docker image, run:

docker build -t dualsight .

Adjust the visible GPUs, shared memory size and mounted local directories according to your setup, then run the container:

docker run -it --name dualsight \
    --shm-size 100G --gpus '"device=0"' \
    -v /path/to/repo/:/dualsight \
    -v /path/to/data/:/dualsight/data \
    dualsight bash

Results and Models

Results OpenSDI

| Method              | SD1.5 Acc. | SD2.1 Acc. | SDXL Acc. | SD3 Acc. | Flux Acc. | Avg. Acc. |
|---------------------|------------|------------|-----------|----------|-----------|-----------|
| CNNDet              | 85.04      | 75.94      | 68.72     | 67.08    | 57.57     | 70.87     |
| GramNet             | 80.35      | 76.66      | 70.76     | 70.29    | 63.37     | 72.29     |
| FreqNet             | 77.70      | 68.37      | 64.02     | 64.37    | 57.08     | 66.31     |
| NPR                 | 79.28      | 81.84      | 74.28     | 75.47    | 71.36     | 76.45     |
| UniFD               | 77.60      | 81.92      | 74.83     | 75.17    | 69.06     | 75.72     |
| RINE                | 90.98      | 88.12      | 78.76     | 76.78    | 67.02     | 80.33     |
| MVSS-Net            | 93.65      | 82.33      | 70.42     | 72.13    | 56.78     | 75.06     |
| CAT-Net             | 96.15      | 82.46      | 73.34     | 73.61    | 55.26     | 76.16     |
| PSCC-Net            | 96.14      | 80.94      | 68.81     | 70.89    | 67.04     | 76.76     |
| ObjectFormer        | 75.22      | 72.55      | 62.92     | 62.54    | 58.05     | 66.26     |
| TruFor              | 97.73      | 55.62      | 66.41     | 67.51    | 61.62     | 69.78     |
| DeCLIP              | 78.31      | 82.77      | 70.55     | 68.40    | 65.61     | 73.13     |
| IML-ViT             | 75.73      | 61.19      | 49.95     | 51.25    | 43.62     | 56.35     |
| MaskCLIP            | 92.72      | 89.45      | 81.22     | 78.01    | 68.50     | 81.98     |
| DualSight (CLIP224) | 85.00      | 93.07      | 91.72     | 90.99    | 80.97     | 88.35     |

Weights for the CLIP-224 variant of DualSight can be downloaded here.

Configuration

You can configure all training, evaluation, and model parameters using the provided example configuration file, based on the YACS library.

Inference

In the configs/ folder, you will find the configuration file for the CLIP-224 variant of DualSight. Download the corresponding model checkpoint from the results section and place it in the checkpoints/ directory. Then run inference on a folder of test images with the inference.py script and the following arguments:

python3 inference.py \
    path/to/config.yaml \
    path/to/weights.pth \
    --data path/to/images \
    --save_dir output_folder \
    --save_file output_file_name

Arguments:

  • cfg (str, required): Path to the YAML config file.
  • ckpt (str, required): Path to the model checkpoint file (.pth).
  • --data (str, required): Path to the folder containing images for inference.
  • --save_dir (str, default=results/): Directory where predictions will be saved.
  • --save_file (str, default=output.json): Name of the output file (JSON format) storing predictions.

Example:

python3 inference.py configs/example_DualSight_A.yaml checkpoints/dualsight_a.pth --data example/ --save_dir results/ --save_file example.json

This will generate predictions for all images in the example/ folder and save them in results/example.json.

The inference.py script generates a JSON file that contains one entry per input image. Each entry provides classification results and additional metadata. Below is a description of the fields:

[
  {
    "image_name": "example.png",   // Filename of the input image
    "cls": "fake",                 // Predicted class label: "real" or "fake"
    "prob": 0.997,                 // Confidence of the prediction as a probability in [0, 1]
    "clip_score": [                // Optional: CLIP similarity scores (only for "fake" predictions)
        [0.2143, "FAKE-GAN"],
        [0.5227, "FAKE-DIFFUSION"],
        [0.1790, "FAKE-VAE"],
        [0.0838, "FAKE-OTHERS"]
    ],
    "logit": 0.99748               // Model output after applying the sigmoid
  },
  ...
]
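
A results file with the structure documented above could be post-processed roughly as follows. This is a sketch, not part of the repository: the summarize helper is hypothetical, the second entry and all of its values are invented for illustration, and the sample is embedded as a string so that the snippet runs without a real output file. The explanatory comments from the listing above are omitted because standard JSON does not allow comments.

```python
import json
from collections import Counter

# Hypothetical results in the documented structure; the values are invented.
output_json = """
[
  {"image_name": "a.png", "cls": "fake", "prob": 0.997,
   "clip_score": [[0.2143, "FAKE-GAN"], [0.5227, "FAKE-DIFFUSION"],
                  [0.1790, "FAKE-VAE"], [0.0838, "FAKE-OTHERS"]],
   "logit": 0.99748},
  {"image_name": "b.png", "cls": "real", "prob": 0.912, "logit": 0.088}
]
"""


def summarize(predictions):
    """Count predictions per class and pick the best-scoring CLIP label."""
    # Lower-case the label so that 'Fake' and 'fake' are counted together.
    counts = Counter(entry["cls"].lower() for entry in predictions)
    top_sources = {}
    for entry in predictions:
        scores = entry.get("clip_score")  # optional field
        if scores:
            _, source = max(scores, key=lambda pair: pair[0])
            top_sources[entry["image_name"]] = source
    return counts, top_sources


counts, top_sources = summarize(json.loads(output_json))
print(dict(counts))
print(top_sources)
```

To process a real run, replace the embedded string with the contents of the file written to the --save_dir directory, for example results/example.json.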

Evaluation

In the configs/ folder, you will find the example_config.yaml along with the configuration file for the CLIP-224 variant of DualSight. These configs can be freely modified to suit your needs. To run evaluation, download the corresponding model checkpoint from the results section and place it in the checkpoints/ directory. The evaluation data should be stored in the data/ folder, as described in the "Data Preparation" section.

python3 test.py path/to/config.yaml \
    path/to/weights.pth \
    --thresh binary_classification_threshold
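
The effect of a binary classification threshold can be sketched as follows. This is an illustration under assumptions, not the logic of test.py: it assumes the threshold is compared against a per-image score in [0, 1], that scores at or above the threshold count as fake (label 1), and that accuracy is the reported metric. The function name and all scores and labels are hypothetical.

```python
def accuracy_at_threshold(scores, labels, thresh=0.5):
    """Binarize scores at `thresh` and return the fraction of correct labels."""
    # Assumption: a score at or above the threshold means 'fake' (label 1).
    predictions = [1 if score >= thresh else 0 for score in scores]
    correct = sum(int(pred == label) for pred, label in zip(predictions, labels))
    return correct / len(labels)


# Hypothetical scores and ground-truth labels (1 = fake, 0 = real).
scores = [0.95, 0.40, 0.70, 0.10]
labels = [1, 1, 0, 0]

acc_default = accuracy_at_threshold(scores, labels, thresh=0.5)
acc_low = accuracy_at_threshold(scores, labels, thresh=0.3)
print(acc_default, acc_low)
```

Lowering the threshold flags more images as fake, which in this toy example recovers the missed fake image at 0.40 while the real image at 0.70 stays misclassified.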

Training

In the configs/ folder, you will find the example_config.yaml along with the configuration file for the CLIP-224 variant of DualSight. These configs can be freely modified to suit your needs. To train a model on OpenSDI from scratch, run:

python3 train.py path/to/config.yaml

Arguments:

  • --resume: resumes training from the last saved checkpoint of the model.

Acknowledgement

This codebase borrows from UniFD, Open-SDI and CLIP-LoRA. Our dataset preparation pipeline uses ClipCap for image captioning. Many thanks to the authors of these works for their contributions!

This work was partially funded by the German Federal Agency for Breakthrough Innovation (SPRIN-D).

Citing

If you have used DualSight in your research, please cite our work. 🎓

@inproceedings{Abdullah2026DualSight,
    title = {DualSight: Learning to Disentangle Artifact and Semantic Features for Detection of Diffusion-Generated Images},
    author = {Abdullah, Ahmed and Ebert, Nikolas and Wasenm{\"u}ller, Oliver},
    booktitle = {International Conference on Pattern Recognition (ICPR)},
    year = {2026},
}
