
RomeoCorrec/Fake_image_detection


Fake Image Detection

Forensic ensemble detector for AI-generated images, combining three independent analysis pipelines with a Gradio interface that renders the evidence visually.

Built as an academic project exploring what actually distinguishes modern diffusion-model outputs (FLUX, SDXL, Midjourney) from real photographs — at the signal level, not just perceptually.


Results

Evaluated on 400 balanced images (200 real · 200 AI) from Parveshiiii/AI-vs-Real:

| Method                               | AUC-ROC | F1    |
|--------------------------------------|---------|-------|
| FFT spectral curvature               | 0.785   | 0.667 |
| SRM noise residual kurtosis          | 0.584   | 0.663 |
| Frequency ensemble (FFT + SRM)       | 0.753   | 0.667 |
| Semantic (vanishing points + shadow) | 0.456   | 0.443 |
| DINOv2 linear probe                  | 0.998   | 0.975 |

Architecture

Three independent pipelines, each grounded in a different forensic hypothesis:

Image ──→ FFT radial spectrum ──┐
      ──→ SRM noise residual   ─┴──→ Frequency score  ──┐
                                                        │
      ──→ Vanishing points     ─┐                       ├──→ Synthesis
      ──→ Shadow direction     ─┴──→ Semantic score   ──┤
                                                        │
      ──→ DINOv2-Small (frozen)────→ Learned score    ──┘

Frequency analysis · src/frequency/

FFT radial spectrum (fft_analysis.py) — computes the 1D radial average of the 2D log-magnitude FFT, then measures spectral curvature (mean squared second derivative of the normalized spectrum). Modern diffusion models produce smoother spectra than real photographs: real images carry sensor noise, optical aberrations, and JPEG compression texture that all raise curvature. Low curvature → higher AI probability.

This is the inverse of the classical GAN-era result, where periodic upsampling artifacts created peaks in the radial spectrum. Diffusion models have no such artifact; calibration direction depends on the generative family.
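The radial-spectrum and curvature steps described for fft_analysis.py can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the function names are hypothetical, and min-max scaling is an assumed reading of "normalized spectrum".

```python
import numpy as np


def radial_spectrum(gray: np.ndarray) -> np.ndarray:
    """1D radial average of the 2D log-magnitude FFT of a grayscale image."""
    f = np.fft.fftshift(np.fft.fft2(gray.astype(np.float64)))
    log_mag = np.log1p(np.abs(f))
    h, w = gray.shape
    y, x = np.indices((h, w))
    # Truncated integer distance of every frequency bin from the centred DC bin.
    r = np.hypot(y - h // 2, x - w // 2).astype(int)
    sums = np.bincount(r.ravel(), weights=log_mag.ravel())
    counts = np.bincount(r.ravel())
    # Keep radii inside the inscribed circle so every ring has samples.
    r_max = min(h, w) // 2
    return sums[:r_max] / counts[:r_max]


def spectral_curvature(spectrum: np.ndarray) -> float:
    """Mean squared second derivative of the spectrum after min-max scaling.

    Min-max scaling is an assumption; the project may normalize differently.
    """
    s = spectrum - spectrum.min()
    s = s / (s.max() + 1e-12)
    d2 = np.diff(s, n=2)  # discrete second derivative
    return float(np.mean(d2 ** 2))
```

Under the calibration described above, a lower spectral_curvature value would map to a higher AI probability; how that value is turned into a probability is not specified here.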

SRM residuals (srm_residuals.py) — subtracts a Gaussian-blurred copy of the image to isolate the noise layer, then computes the kurtosis of the residual distribution. On this dataset the AI images show higher kurtosis (more structured residuals) — empirically reversing the original "camera noise is spikier" hypothesis, likely because the real images are web/stock photos with heavy JPEG normalization.
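A minimal sketch of the residual-kurtosis measurement, assuming SciPy's gaussian_filter for the blur and Fisher (excess) kurtosis; the blur width sigma=1.0 and the function name are illustrative assumptions, not values taken from srm_residuals.py.

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from scipy.stats import kurtosis


def residual_kurtosis(gray: np.ndarray, sigma: float = 1.0) -> float:
    """Excess kurtosis of the noise residual (image minus its Gaussian blur)."""
    img = gray.astype(np.float64)
    # High-pass residual: what the blur removes is treated as the noise layer.
    residual = img - gaussian_filter(img, sigma=sigma)
    # Fisher definition: Gaussian-like residuals score ~0, heavy tails score > 0.
    return float(kurtosis(residual.ravel(), fisher=True))
```

On this dataset, higher values would point toward AI-generated images, per the empirical reversal noted above.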

Semantic analysis · src/semantic/

Vanishing points (vanishing_points.py) — Canny edges → Hough line transform → pairwise line intersections → 2D histogram clustering over a ±2× image-size grid. Real perspective scenes typically produce 1–6 coherent VP clusters; AI images often yield either zero convergence (many lines, no shared structure) or 7+ scattered clusters indicating incoherent geometry.
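The intersection and clustering stages might look like the sketch below. It assumes lines in the (rho, theta) form produced by cv2.HoughLines (x·cos θ + y·sin θ = ρ) and reads "±2× image-size grid" as x in [-2W, 2W], y in [-2H, 2H]. The function names, the bin count and the min_votes threshold are hypothetical, and each dense histogram cell is simply counted as one cluster.

```python
import itertools

import numpy as np


def line_intersections(lines: np.ndarray) -> np.ndarray:
    """Pairwise intersections of lines given as (rho, theta) rows.

    Each line satisfies x*cos(theta) + y*sin(theta) = rho, the convention
    used by cv2.HoughLines (its output can be reshaped to (N, 2)).
    """
    points = []
    for (r1, t1), (r2, t2) in itertools.combinations(lines, 2):
        a = np.array([[np.cos(t1), np.sin(t1)], [np.cos(t2), np.sin(t2)]])
        if abs(np.linalg.det(a)) < 1e-6:  # (near-)parallel: no finite intersection
            continue
        points.append(np.linalg.solve(a, np.array([r1, r2])))
    return np.array(points).reshape(-1, 2)


def count_vp_clusters(points, width, height, bins=20, min_votes=3):
    """Count histogram cells holding at least min_votes intersections.

    Intersections outside the +/-2x image-size grid are ignored.
    """
    if len(points) == 0:
        return 0
    hist, _, _ = np.histogram2d(
        points[:, 0], points[:, 1], bins=bins,
        range=[[-2 * width, 2 * width], [-2 * height, 2 * height]],
    )
    return int(np.sum(hist >= min_votes))
```

In this reading, a count between 1 and 6 would suggest coherent perspective, while 0 or 7+ would be treated as suspicious.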

Shadow direction (shadow_direction.py) — isolates the top-20% gradient orientations by magnitude, then applies circular statistics (mean resultant length R, circular standard deviation). A tight angular distribution (std < 0.5 rad, ~29°) signals a single dominant light source and is a strong indicator of a real image. Ambiguous or complex scenes return a neutral score (0.5) to avoid false positives.
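The circular statistics reduce to a few lines. This sketch uses np.gradient for the gradients and the standard circular standard deviation sqrt(-2 ln R). It treats orientations as full directions in (-π, π], which is an assumption (the project may handle them axially). The function name is hypothetical.

```python
import numpy as np


def gradient_direction_stats(gray: np.ndarray, top_fraction: float = 0.2):
    """Return (mean_angle, R, circular_std) in radians for the strongest gradients."""
    gy, gx = np.gradient(gray.astype(np.float64))  # axis 0 = rows (y), axis 1 = cols (x)
    mag = np.hypot(gx, gy)
    angles = np.arctan2(gy, gx)
    # Keep the top_fraction of pixels by gradient magnitude.
    theta = angles[mag >= np.quantile(mag, 1.0 - top_fraction)]
    # Mean resultant vector of the unit vectors at each angle.
    c, s = np.mean(np.cos(theta)), np.mean(np.sin(theta))
    r = float(np.clip(np.hypot(c, s), 1e-12, 1.0))  # clip guards log() against rounding
    circ_std = float(np.sqrt(-2.0 * np.log(r)))
    return float(np.arctan2(s, c)), r, circ_std
```

A circ_std below 0.5 rad would then be read as a single dominant light direction. The mapping to the neutral 0.5 score for ambiguous scenes is not reproduced here.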

Visual overlays (visualizer.py) — renders VP lines + clusters on the image, draws a shadow direction arrow color-coded by consistency, and produces a false-colour SRM heatmap using COLORMAP_INFERNO.

DINOv2 linear probe · src/classifier/

facebook/dinov2-small is used as a frozen feature extractor: no fine-tuning, weights unchanged. Its 384-dimensional CLS token is extracted for each image and fed to a scikit-learn LogisticRegression. This is a standard linear probe: on this dataset, DINOv2's pretrained visual representations separate real from AI images almost perfectly (AUC 0.998) with a single linear boundary.
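A linear probe of this kind could be sketched as below, using the Hugging Face AutoImageProcessor / AutoModel classes and taking the CLS token as the first position of last_hidden_state. The function names and the one-image-at-a-time loop are illustrative assumptions, not the project's classifier code. torch and transformers are imported inside the extractor so the probe itself can run without them.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def extract_cls_features(images):
    """Return an (N, 384) array of DINOv2-Small CLS embeddings for PIL images."""
    import torch
    from transformers import AutoImageProcessor, AutoModel

    processor = AutoImageProcessor.from_pretrained("facebook/dinov2-small")
    model = AutoModel.from_pretrained("facebook/dinov2-small").eval()  # frozen backbone
    feats = []
    with torch.no_grad():  # no gradients: weights are never updated
        for img in images:
            inputs = processor(images=img, return_tensors="pt")
            out = model(**inputs)
            feats.append(out.last_hidden_state[:, 0].squeeze(0).numpy())  # CLS token
    return np.stack(feats)


def fit_linear_probe(features, labels):
    """Fit a logistic-regression head; labels: 1 = AI-generated, 0 = real."""
    clf = LogisticRegression(max_iter=1000)
    clf.fit(features, labels)
    return clf
```

With labels encoded this way, clf.predict_proba(features)[:, 1] gives P(AI-generated).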


Interface

python -m src.app

Opens at http://127.0.0.1:7860 — four tabs:

| Tab       | Output                                              |
|-----------|-----------------------------------------------------|
| Frequency | FFT radial spectrum plot + SRM false-colour heatmap |
| Semantic  | Vanishing point overlay + shadow direction arrow    |
| DINOv2    | Classifier score with confidence (P(AI-generated))  |
| Synthesis | Average across all available methods                |

DINOv2 degrades gracefully: if models/dinov2_head.pkl is absent, that tab reports "model not loaded" and the synthesis skips it.


Setup

pip install -r requirements.txt

Python 3.10+ · PyTorch · HuggingFace Transformers · OpenCV · Gradio 4+ · scikit-learn · SciPy


Reproducing the dataset

Downloads and balances Parveshiiii/AI-vs-Real (MIT license). Raw distribution: 3,333 AI / 10,666 real — the script auto-detects the minority class, balances both to 3,333 images, and creates an 80/20 train/test split.

python scripts/build_dataset.py

Output: data/train/{real,ai}/ and data/test/{real,ai}/
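The balancing and splitting logic can be sketched as follows. This is not scripts/build_dataset.py itself: the seed, sampling method and rounding rule are assumptions, and the download step is omitted. With 3,333 images per class and truncating rounding, it yields 2,666 training and 667 test images per class.

```python
import random


def balance_and_split(real, ai, test_fraction=0.2, seed=0):
    """Downsample both classes to the minority size, then split train/test.

    real and ai are lists of items (e.g. file paths). Returns
    {"train": {"real": [...], "ai": [...]}, "test": {"real": [...], "ai": [...]}}.
    """
    rng = random.Random(seed)
    n = min(len(real), len(ai))  # minority class size
    out = {"train": {}, "test": {}}
    for name, items in (("real", real), ("ai", ai)):
        sample = rng.sample(items, n)  # random subset without replacement
        n_train = int(n * (1 - test_fraction))  # truncates, e.g. 2666.4 -> 2666
        out["train"][name] = sample[:n_train]
        out["test"][name] = sample[n_train:]
    return out
```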


Training the DINOv2 head

Upload data/train/ to Google Drive and open notebooks/train_dinov2_colab.ipynb on a free Colab T4. The notebook:

  1. Extracts DINOv2-Small features for all training images (~5 min on GPU)
  2. Fits a LogisticRegression on the 384-dim features (~5 s)
  3. Saves models/dinov2_head.pkl

Feature extraction on CPU takes ~1 hour for 2,666 images; the classifier fit itself is instantaneous.


Evaluation

python -m src.evaluate --real data/test/real --ai data/test/ai --max 200

Reports AUC-ROC and F1 for each method individually and for the frequency and semantic ensembles. Runs DINOv2 automatically if models/dinov2_head.pkl exists.
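Both metrics are available in scikit-learn. This sketch assumes each method outputs P(AI) with label 1 = AI and that F1 uses a 0.5 decision threshold; the threshold used by src.evaluate is not documented here, and evaluate_scores is a hypothetical name.

```python
import numpy as np
from sklearn.metrics import f1_score, roc_auc_score


def evaluate_scores(labels, scores, threshold=0.5):
    """AUC-ROC on raw P(AI) scores and F1 after thresholding (1 = AI)."""
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    auc = roc_auc_score(labels, scores)  # threshold-free ranking quality
    f1 = f1_score(labels, (scores >= threshold).astype(int))
    return float(auc), float(f1)
```

An AUC below 0.5, as for the semantic ensemble, means the score on average ranks real images as more AI-like than AI images.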


Key findings

The frequency scoring direction is inverted for diffusion models. Classical forensic literature (McNally, Frank-Wilson) targets GAN upsampling artifacts, which create spectral peaks. Diffusion models produce no such peaks — instead, they are smoother than real photos at the signal level. The correct hypothesis for this era is: low spectral curvature → AI. Both FFT and SRM scores were inverted during empirical calibration on this dataset.

Semantic methods are conservative by design. Vanishing point and shadow consistency analysis return neutral scores on complex, multi-plane scenes to avoid false positives. This limits their discriminative power on a diverse dataset (AUC ~0.46, slightly below chance) but makes them interpretable and reliable for controlled inputs such as portraits or architectural shots.

DINOv2 makes the ensemble moot for raw accuracy. AUC 0.998 is essentially perfect on this dataset. The frequency and semantic methods contribute something the learned model cannot: visible reasoning — the user can see which lines converge, where the shadow points, and how the spectrum looks.


Project structure

src/
  frequency/          # FFT radial spectrum + SRM noise residuals
  semantic/           # vanishing points, shadow direction, visual overlays
  classifier/         # DINOv2 feature extractor + LogisticRegression predictor
  app.py              # Gradio 4-tab interface
  evaluate.py         # evaluation CLI (AUC-ROC + F1)
scripts/
  build_dataset.py    # HuggingFace download + class balancing
notebooks/
  train_dinov2_colab.ipynb
models/               # trained LogisticRegression head (gitignored)
data/                 # train/ and test/ image splits (gitignored)
tests/                # 13 pytest unit tests
