Forensic ensemble detector for AI-generated images, combining three independent analysis pipelines with a Gradio interface that renders the evidence visually.
Built as an academic project exploring what actually distinguishes modern diffusion-model outputs (FLUX, SDXL, Midjourney) from real photographs — at the signal level, not just perceptually.
Evaluated on 400 balanced images (200 real · 200 AI) from Parveshiiii/AI-vs-Real:
| Method | AUC-ROC | F1 |
|---|---|---|
| FFT spectral curvature | 0.785 | 0.667 |
| SRM noise residual kurtosis | 0.584 | 0.663 |
| Frequency ensemble (FFT + SRM) | 0.753 | 0.667 |
| Semantic (vanishing points + shadow) | 0.456 | 0.443 |
| DINOv2 linear probe | 0.998 | 0.975 |
Three independent pipelines, each grounded in a different forensic hypothesis:

    Image ──→ FFT radial spectrum ──┐
          ──→ SRM noise residual ───┼──→ Frequency score ──┐
                                                           │
          ──→ Vanishing points ─┐                          ├──→ Synthesis
          ──→ Shadow direction ─┴──→ Semantic score ───────┤
                                                           │
          ──→ DINOv2-Small (frozen) ──→ Learned score ─────┘

FFT radial spectrum (fft_analysis.py) — computes the 1D radial average of the 2D log-magnitude FFT, then measures spectral curvature (mean squared second derivative of the normalized spectrum). Modern diffusion models produce smoother spectra than real photographs: real images carry sensor noise, optical aberrations, and JPEG compression texture that all raise curvature. Low curvature → higher AI probability.
This is the inverse of the classical GAN-era result, where periodic upsampling artifacts created peaks in the radial spectrum. Diffusion models have no such artifact; calibration direction depends on the generative family.
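The curvature measure described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the code in fft_analysis.py: the function names, the integer-radius binning, and the min-max normalization are assumptions made for the example.

```python
import numpy as np

def radial_spectrum(gray: np.ndarray) -> np.ndarray:
    """1D radial average of the 2D log-magnitude FFT of a grayscale image."""
    f = np.fft.fftshift(np.fft.fft2(gray.astype(np.float64)))
    log_mag = np.log1p(np.abs(f))
    h, w = gray.shape
    cy, cx = h // 2, w // 2
    y, x = np.indices((h, w))
    r = np.hypot(y - cy, x - cx).astype(np.int64)  # integer radius of each bin
    r_max = min(cy, cx)                            # stay inside the inscribed circle
    mask = r < r_max
    sums = np.bincount(r[mask], weights=log_mag[mask], minlength=r_max)
    counts = np.bincount(r[mask], minlength=r_max)
    return sums / counts

def spectral_curvature(gray: np.ndarray) -> float:
    """Mean squared second difference of the min-max normalized radial spectrum."""
    spec = radial_spectrum(gray)
    span = spec.max() - spec.min()
    norm = (spec - spec.min()) / (span + 1e-12)    # assumption: scale to [0, 1]
    second = np.diff(norm, n=2)                    # discrete second derivative
    return float(np.mean(second ** 2))
```

How a raw curvature value is mapped to an AI probability is a calibration choice that depends on the dataset, so the sketch stops at the raw statistic.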
SRM residuals (srm_residuals.py) — subtracts a Gaussian-blurred copy of the image to isolate the noise layer, then computes the kurtosis of the residual distribution. On this dataset the AI images show higher kurtosis (more structured residuals) — empirically reversing the original "camera noise is spikier" hypothesis, likely because the real images are web/stock photos with heavy JPEG normalization.
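As a rough sketch of the residual statistic, the following NumPy-only example blurs the image, subtracts the blur, and computes excess kurtosis of what remains. The blur width (sigma = 1.5), the reflect padding, and the function names are assumptions for illustration; srm_residuals.py may use different parameters or an OpenCV/SciPy blur.

```python
import numpy as np

def gaussian_kernel(sigma: float) -> np.ndarray:
    """Normalized 1D Gaussian kernel truncated at about 3 sigma."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def gaussian_blur(gray: np.ndarray, sigma: float = 1.5) -> np.ndarray:
    """Separable Gaussian blur with reflect padding; output has the input shape."""
    k = gaussian_kernel(sigma)
    pad = len(k) // 2
    padded = np.pad(gray.astype(np.float64), pad, mode="reflect")
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)

def residual_kurtosis(gray: np.ndarray, sigma: float = 1.5) -> float:
    """Excess (Fisher) kurtosis of the high-frequency residual: 0 for a Gaussian."""
    residual = gray.astype(np.float64) - gaussian_blur(gray, sigma)
    centered = residual - residual.mean()
    m2 = np.mean(centered ** 2)
    m4 = np.mean(centered ** 4)
    return float(m4 / (m2 ** 2 + 1e-12) - 3.0)
```

scipy.stats.kurtosis with its default settings computes the same excess-kurtosis quantity on the flattened residual, if SciPy is preferred over the hand-written moments.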
Vanishing points (vanishing_points.py) — Canny edges → Hough line transform → pairwise line intersections → 2D histogram clustering over a ±2× image-size grid. Real perspective scenes typically produce 1–6 coherent VP clusters; AI images often yield either zero convergence (many lines, no shared structure) or 7+ scattered clusters indicating incoherent geometry.
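The intersection-and-clustering stage can be sketched as follows, assuming the Hough step has already produced lines in (rho, theta) normal form, as cv2.HoughLines does. The bin count, the vote threshold, the parallel-line cutoff, and the reading of "±2× image size" as a margin of two image widths and heights around the frame are all assumptions for this example, not values taken from vanishing_points.py.

```python
import itertools
import numpy as np

def intersections(lines, min_angle=np.deg2rad(5.0)):
    """Pairwise intersections of lines given as (rho, theta): x*cos(t) + y*sin(t) = rho."""
    pts = []
    for (r1, t1), (r2, t2) in itertools.combinations(lines, 2):
        d = abs(t1 - t2) % np.pi
        if min(d, np.pi - d) < min_angle:
            continue  # near-parallel lines give unstable or no intersection
        a = np.array([[np.cos(t1), np.sin(t1)],
                      [np.cos(t2), np.sin(t2)]])
        x, y = np.linalg.solve(a, np.array([r1, r2]))
        pts.append((x, y))
    return np.array(pts, dtype=np.float64).reshape(-1, 2)

def count_vp_clusters(lines, width, height, bins=20, min_votes=3):
    """Count 2D histogram cells that collect at least min_votes intersections."""
    pts = intersections(lines)
    if len(pts) == 0:
        return 0
    # Assumed grid: two image sizes of margin on every side of the frame.
    # Intersections outside this range are ignored by histogram2d.
    hist, _, _ = np.histogram2d(
        pts[:, 0], pts[:, 1], bins=bins,
        range=[[-2 * width, 3 * width], [-2 * height, 3 * height]],
    )
    return int(np.count_nonzero(hist >= min_votes))

def geometry_is_coherent(n_clusters: int) -> bool:
    """1 to 6 clusters is the range described above for real perspective scenes."""
    return 1 <= n_clusters <= 6
```

Counting populated cells is the simplest possible clustering; a cluster that straddles a cell boundary would be counted twice, so a production version would likely merge neighbouring cells.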
Shadow direction (shadow_direction.py) — isolates the top-20% gradient orientations by magnitude, then applies circular statistics (mean resultant length R, circular standard deviation). A tight angular distribution (std < 0.5 rad, ~29°) signals a single dominant light source and is a strong indicator of a real image. Ambiguous or complex scenes return a neutral score (0.5) to avoid false positives.
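The circular statistics mentioned above are short enough to write out. This sketch uses np.gradient as a stand-in for whatever gradient operator shadow_direction.py actually applies, and the function names are hypothetical; the 20% cutoff and the 0.5 rad threshold come from the description above.

```python
import numpy as np

def dominant_gradient_angles(gray: np.ndarray, top_fraction: float = 0.2) -> np.ndarray:
    """Angles (radians) of the strongest gradients, keeping the top fraction by magnitude."""
    gy, gx = np.gradient(gray.astype(np.float64))
    mag = np.hypot(gx, gy)
    threshold = np.quantile(mag, 1.0 - top_fraction)
    keep = mag >= threshold
    return np.arctan2(gy[keep], gx[keep])

def circular_stats(angles: np.ndarray):
    """Return (mean angle, mean resultant length R, circular standard deviation)."""
    c = np.mean(np.cos(angles))
    s = np.mean(np.sin(angles))
    r = min(float(np.hypot(c, s)), 1.0)          # clamp against rounding above 1
    mean_angle = float(np.arctan2(s, c))
    circ_std = float(np.sqrt(-2.0 * np.log(max(r, 1e-12))))
    return mean_angle, r, circ_std

def is_single_light_source(angles: np.ndarray, max_std: float = 0.5) -> bool:
    """Tight angular spread (circular std below 0.5 rad, about 29 degrees)."""
    return circular_stats(angles)[2] < max_std
```

R close to 1 means the selected gradients point the same way; R close to 0 means they are spread around the circle, which is the case the detector treats as ambiguous and scores as neutral.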
Visual overlays (visualizer.py) — renders VP lines + clusters on the image, draws a shadow direction arrow color-coded by consistency, and produces a false-colour SRM heatmap using COLORMAP_INFERNO.
facebook/dinov2-small is used as a frozen feature extractor: no fine-tuning, weights unchanged. Its 384-dimensional CLS token is extracted for each image and fed to a scikit-learn LogisticRegression. This is a standard linear probe. On this dataset, DINOv2's pretrained representations separate real from AI images almost perfectly with a single linear boundary (AUC-ROC 0.998).
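A linear probe of this kind might look like the sketch below. It is an illustration under assumptions rather than the project's classifier code: the function names, batch size, and max_iter setting are made up for the example, and the CLS token is read from position 0 of last_hidden_state.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

MODEL_ID = "facebook/dinov2-small"

def extract_cls_features(image_paths, batch_size=32):
    """Return an (N, 384) array of frozen DINOv2-Small CLS embeddings."""
    # Heavy imports are local so the probe can be fitted without torch installed.
    import torch
    from PIL import Image
    from transformers import AutoImageProcessor, AutoModel

    processor = AutoImageProcessor.from_pretrained(MODEL_ID)
    model = AutoModel.from_pretrained(MODEL_ID).eval()
    chunks = []
    with torch.no_grad():  # frozen backbone: no gradients, weights untouched
        for i in range(0, len(image_paths), batch_size):
            batch = image_paths[i:i + batch_size]
            images = [Image.open(p).convert("RGB") for p in batch]
            inputs = processor(images=images, return_tensors="pt")
            hidden = model(**inputs).last_hidden_state  # (B, 1 + patches, 384)
            chunks.append(hidden[:, 0].cpu().numpy())   # CLS token is position 0
    return np.concatenate(chunks, axis=0)

def fit_linear_probe(features, labels):
    """Fit a logistic-regression head on precomputed features (label 1 = AI)."""
    clf = LogisticRegression(max_iter=1000)  # assumed setting, not from the repo
    clf.fit(features, labels)
    return clf
```

At inference time, `probe.predict_proba(features)[:, 1]` would give P(AI-generated) under the assumed label convention.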

    python -m src.app

Opens at http://127.0.0.1:7860 with four tabs:
| Tab | Output |
|---|---|
| Frequency | FFT radial spectrum plot + SRM false-colour heatmap |
| Semantic | Vanishing point overlay + shadow direction arrow |
| DINOv2 | Classifier score with confidence (P(AI-generated)) |
| Synthesis | Average across all available methods |
DINOv2 degrades gracefully: if models/dinov2_head.pkl is absent, that tab reports "model not loaded" and the synthesis skips it.
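The skip-if-missing behaviour of the synthesis step can be illustrated with a small helper. This is a sketch of the idea only; the function name and the use of None to mark an unavailable method are assumptions, not the interface in app.py.

```python
from statistics import fmean
from typing import Mapping, Optional

def synthesize(scores: Mapping[str, Optional[float]]) -> Optional[float]:
    """Average the P(AI) scores of every method that produced one.

    A value of None marks a method that could not run, for example the
    DINOv2 head when its pickle file is absent; it is skipped, not counted as 0.
    """
    available = [s for s in scores.values() if s is not None]
    return fmean(available) if available else None
```

For example, `synthesize({"frequency": 0.8, "semantic": 0.5, "dinov2": None})` averages only the two available scores.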

    pip install -r requirements.txt

Python 3.10+ · PyTorch · HuggingFace Transformers · OpenCV · Gradio 4+ · scikit-learn · SciPy
Downloads and balances Parveshiiii/AI-vs-Real (MIT license). Raw distribution: 3,333 AI / 10,666 real — the script auto-detects the minority class, balances both to 3,333 images, and creates an 80/20 train/test split.

    python scripts/build_dataset.py

Output: data/train/{real,ai}/ and data/test/{real,ai}/
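The balancing and splitting logic can be sketched with the standard library alone. The function name, the fixed seed, and the per-class split are assumptions for illustration; build_dataset.py also handles the HuggingFace download and file layout, which are omitted here.

```python
import random

def balance_and_split(real, ai, train_fraction=0.8, seed=0):
    """Downsample both classes to the minority size, then split each class.

    Returns {"train": {"real": [...], "ai": [...]}, "test": {...}}.
    """
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    n = min(len(real), len(ai))        # size of the minority class
    cut = int(n * train_fraction)      # number of training items per class
    splits = {"train": {}, "test": {}}
    for label, items in (("real", real), ("ai", ai)):
        sample = rng.sample(list(items), n)   # random subset, without replacement
        splits["train"][label] = sample[:cut]
        splits["test"][label] = sample[cut:]
    return splits
```

With 10,666 real and 3,333 AI items and an 80/20 split, this yields 2,666 training and 667 test items per class.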
Upload data/train/ to Google Drive and open notebooks/train_dinov2_colab.ipynb on a free Colab T4. The notebook:
- Extracts DINOv2-Small features for all training images (~5 min on GPU)
- Fits a LogisticRegression on the 384-dim features (~5 s)
- Saves models/dinov2_head.pkl
Feature extraction on CPU takes ~1 hour for 2,666 images; the classifier fit itself is instantaneous.

    python -m src.evaluate --real data/test/real --ai data/test/ai --max 200

Reports AUC-ROC and F1 for each method individually and for the frequency and semantic ensembles. Runs DINOv2 automatically if models/dinov2_head.pkl exists.
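For readers unfamiliar with the two reported metrics, here is a dependency-free sketch of how they can be computed from labels (1 = AI, 0 = real) and P(AI) scores. It is not the code in evaluate.py; the 0.5 decision threshold for F1 is an assumption, and scikit-learn's roc_auc_score and f1_score compute the same quantities.

```python
def auc_roc(labels, scores):
    """Probability that a random AI image scores above a random real one (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one sample of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def f1_score(labels, scores, threshold=0.5):
    """F1 of the positive (AI) class after thresholding the scores."""
    preds = [int(s >= threshold) for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0
```

AUC-ROC is threshold-free, while F1 depends on the chosen cutoff. For reference, on a balanced set a predictor that labels every image as AI scores F1 = 2/3 ≈ 0.667, so F1 values near that figure are best read alongside the AUC-ROC column.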
The frequency scoring direction is inverted for diffusion models. Classical forensic literature (McNally, Frank-Wilson) targets GAN upsampling artifacts, which create spectral peaks. The diffusion-model images in this dataset show no such peaks; instead, their radial spectra are smoother than those of real photos. The working hypothesis for this era is therefore: low spectral curvature → AI. Both FFT and SRM scores were inverted during empirical calibration on this dataset.
Semantic methods are conservative by design. Vanishing point and shadow consistency analysis return neutral scores on complex, multi-plane scenes to avoid false positives. This limits their discriminative power on a diverse dataset (AUC-ROC 0.456, no better than chance) but keeps them interpretable; they are best suited to controlled inputs such as portraits or architectural shots.
DINOv2 makes the ensemble moot for raw accuracy. AUC 0.998 is essentially perfect on this dataset. The frequency and semantic methods contribute something the learned model cannot: visible reasoning — the user can see which lines converge, where the shadow points, and how the spectrum looks.

    src/
      frequency/         # FFT radial spectrum + SRM noise residuals
      semantic/          # vanishing points, shadow direction, visual overlays
      classifier/        # DINOv2 feature extractor + LogisticRegression predictor
      app.py             # Gradio 4-tab interface
      evaluate.py        # evaluation CLI (AUC-ROC + F1)
    scripts/
      build_dataset.py   # HuggingFace download + class balancing
    notebooks/
      train_dinov2_colab.ipynb
    models/              # trained LogisticRegression head (gitignored)
    data/                # train/ and test/ image splits (gitignored)
    tests/               # 13 pytest unit tests