
abhikarki/Deepfake_detection

Deepfake Detection: An Artifact-Based Approach

An explainable deepfake detection framework that leverages temporal modeling to overcome limitations of frame-level analysis. This system integrates video preprocessing, facial detection, feature extraction, and CNN-based temporal aggregation to classify videos as real or fake with interpretable outputs.

Problem Statement

Existing deepfake detection systems face two critical limitations:

  1. Frame-level analysis: Most systems treat each frame independently, ignoring temporal relationships that are key to detecting subtle artifacts
  2. Lack of interpretability: Systems function as black-boxes, providing only a verdict without explaining which parts of the video drove the decision

This project aims to address both challenges through temporal consistency analysis and explainability, making it suitable for high-stakes applications like legal verification and investigative journalism.

System Architecture

[Figure: system architecture diagram (system_architecture_stage_3)]

Demo

[Video: project demo (Video.Project.3.mp4)]

Performance Results

Dataset

The system is trained and evaluated using the FaceForensics++ (C23 compression) dataset, containing 1,000 original videos and 1,000 deepfake videos generated using the autoencoder-based face-swap technique. All videos were preprocessed through the complete pipeline: FFmpeg decoding at 10 FPS with light denoising, RetinaFace face detection, OpenCV-based cropping and geometric alignment, and ResNeXt-50 feature extraction (final classification head removed, producing 2048-dimensional embeddings per frame). The feature vectors were split at the video level into 80% training, 10% validation, and 10% test sets.
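Splitting at the video level means every frame embedding from a given video lands in exactly one partition, so near-identical frames from the same clip cannot leak from training into the test set. The following is a minimal sketch of such a split, not the repository's actual code. The helper name `split_videos`, the placeholder video IDs, and the fixed seed are all assumptions made for illustration, and the sketch does not stratify by class.

```python
import random


def split_videos(video_ids, train_frac=0.8, val_frac=0.1, seed=0):
    """Partition video IDs into train/val/test lists (hypothetical helper).

    The split operates on video IDs rather than frames, so all frames of
    one video end up in the same partition.
    """
    ids = sorted(video_ids)            # sort so the result is reproducible
    random.Random(seed).shuffle(ids)   # seeded shuffle, no global RNG state
    n_train = round(len(ids) * train_frac)
    n_val = round(len(ids) * val_frac)
    return {
        "train": ids[:n_train],
        "val": ids[n_train:n_train + n_val],
        "test": ids[n_train + n_val:],   # remainder goes to test
    }


# 2,000 placeholder IDs standing in for the 1,000 real + 1,000 fake videos.
video_ids = [f"video_{i:04d}" for i in range(2000)]
splits = split_videos(video_ids)
print({name: len(ids) for name, ids in splits.items()})
```

With 2,000 videos and an 80/10/10 split, the partitions hold 1,600, 200, and 200 videos respectively.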

Baseline Comparison & Fairness

To ensure a fair and rigorous evaluation, the proposed temporal CNN was compared against a frame-level MLP baseline. Both models were trained from scratch on identical data with the exact same train/validation/test split, eliminating factors such as preprocessing differences or data distribution variations. This approach differs from using external third-party models and provides a controlled comparison where the only variable is the temporal modeling strategy.

The baseline model (SimpleFrameMLP) takes individual 2048-dimensional frame embeddings as input, passes them through two fully-connected layers (2048→256→1) with ReLU activation, and produces frame-level predictions that are mean-pooled to generate video-level classifications. Videos with deepfake probabilities ≥0.50 were classified as deepfake.
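The aggregation step of the baseline can be sketched in a few lines. This is an illustration only: it assumes the frame-level predictions are probabilities in [0, 1] (the text does not say whether probabilities or logits are pooled), and the function names and example scores are hypothetical.

```python
from statistics import fmean


def video_probability(frame_probs):
    """Mean-pool per-frame deepfake probabilities into one video-level score."""
    if not frame_probs:
        raise ValueError("video has no frame predictions")
    return fmean(frame_probs)


def classify_video(frame_probs, threshold=0.50):
    """Label a video 'deepfake' when the pooled probability is >= threshold."""
    return "deepfake" if video_probability(frame_probs) >= threshold else "real"


# Illustrative frame scores (made up, not model output).
print(classify_video([0.9, 0.8, 0.2, 0.7]))  # pooled score 0.65
print(classify_video([0.1, 0.3, 0.6, 0.2]))  # pooled score 0.30
```

Because the mean ignores frame order, shuffling the frames of a video leaves the baseline's verdict unchanged, which is the limitation the temporal model is meant to address.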

Results

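| Metric              | Temporal CNN | Baseline (Frame-based) |
|---------------------|--------------|------------------------|
| Accuracy            | 90.00%       | 88.00%                 |
| Precision           | 0.8774       | 0.8958                 |
| Recall              | 0.9300       | 0.8600                 |
| F1-score            | 0.9029       | 0.8776                 |
| AUC-ROC             | 0.9374       | 0.9315                 |
| AUC-PR              | 0.9381       | 0.9435                 |
| False Negative Rate | 7.0%         | 14.0%                  |
| False Positive Rate | 13.0%        | 10.0%                  |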

Key Finding: The temporal CNN achieves higher accuracy (90.00% vs. 88.00%) and higher recall (93% vs. 86%), halving the false negative rate from 14% to 7%, at the cost of slightly lower precision (0.8774 vs. 0.8958) and a higher false positive rate (13% vs. 10%). The recall gain matters most in high-stakes applications, where a missed deepfake has serious consequences.
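The threshold-dependent metrics in the table above can be reproduced from confusion-matrix counts. The sketch below assumes a test set of 200 videos with 100 real and 100 fake. That figure is inferred from the 10% test split of 2,000 videos and matches the reported rates, but the exact counts are not stated in the text, so treat them as an assumption.

```python
def metrics_from_counts(tp, fn, fp, tn):
    """Compute threshold-dependent metrics, treating 'deepfake' as positive."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "fnr": fn / (tp + fn),   # missed deepfakes / all deepfakes
        "fpr": fp / (fp + tn),   # flagged real videos / all real videos
    }


# Assumed counts: 100 fake + 100 real test videos (not stated in the source).
temporal = metrics_from_counts(tp=93, fn=7, fp=13, tn=87)
baseline = metrics_from_counts(tp=86, fn=14, fp=10, tn=90)

for name, m in (("temporal", temporal), ("baseline", baseline)):
    print(name, {k: round(v, 4) for k, v in m.items()})
```

Under these assumed counts, the halving of the false negative rate corresponds to 7 missed deepfakes instead of 14, while the temporal model flags 13 real videos instead of 10.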

Metrics Visualization: AUC-ROC and Confusion Matrix

The following graphs demonstrate the discriminative ability of both models:

[Figure: temp_cnn_roc] Temporal CNN AUC-ROC: 0.9374
[Figure: baseline_roc_pr] Baseline AUC-ROC: 0.9315

The AUC-ROC curves show that the temporal CNN separates real and deepfake videos slightly better than the baseline when averaged over all classification thresholds. The higher area under the curve (0.9374 vs. 0.9315) means the temporal model is marginally more likely to rank a deepfake video above a real one. The gap is small, and the baseline has the higher AUC-PR (0.9435 vs. 0.9381), so the temporal model's main advantage lies in recall at the reported operating point rather than in threshold-independent ranking.
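The ranking interpretation of AUC-ROC can be made concrete: the area equals the probability that a randomly chosen deepfake video receives a higher score than a randomly chosen real video, with ties counted as half. The sketch below computes it by brute force over all pairs. The function name and the example scores are made up for illustration and are not model output.

```python
def roc_auc(scores_fake, scores_real):
    """AUC-ROC as the fraction of (fake, real) pairs ranked correctly.

    A pair counts 1 when the fake video scores higher, 0.5 on a tie.
    Quadratic in the number of videos, which is fine for small test sets.
    """
    wins = 0.0
    for f in scores_fake:
        for r in scores_real:
            if f > r:
                wins += 1.0
            elif f == r:
                wins += 0.5
    return wins / (len(scores_fake) * len(scores_real))


fake = [0.9, 0.8, 0.4]   # illustrative scores for deepfake videos
real = [0.1, 0.5, 0.3]   # illustrative scores for real videos
print(round(roc_auc(fake, real), 4))  # 8 of 9 pairs ranked correctly
```

Because the measure depends only on how videos are ordered, it says nothing about which threshold to deploy. That choice is governed by the false positive and false negative rates at a specific operating point.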

[Figure: temp_cnn_confusion_matrix] Temporal CNN Confusion Matrix
[Figure: baseline_confusion_matrix] Baseline Confusion Matrix

The confusion matrices reveal the trade-offs between the two approaches. While the temporal CNN has a somewhat higher false positive rate (13% vs. 10%), it halves the false negative rate from 14% to 7%. In high-stakes applications such as legal verification and investigative journalism, this trade-off is justified because missing a deepfake carries far greater consequences than falsely flagging a real video for manual review.
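The same trade-off appears within a single model when the decision threshold moves: lowering it catches more deepfakes but flags more real videos. The sketch below illustrates this on made-up scores. The helper name and the data are hypothetical and do not reproduce either model's results.

```python
def rates_at_threshold(scores, labels, threshold):
    """Return (FNR, FPR) when scores >= threshold are called deepfake.

    labels: 1 for deepfake, 0 for real.
    """
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    n_fake = sum(labels)
    n_real = len(labels) - n_fake
    return fn / n_fake, fp / n_real


# Illustrative video-level scores: four deepfakes, then four real videos.
scores = [0.95, 0.80, 0.55, 0.35, 0.60, 0.45, 0.20, 0.10]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

for t in (0.3, 0.5, 0.7):
    fnr, fpr = rates_at_threshold(scores, labels, t)
    print(f"threshold={t:.1f}  FNR={fnr:.2f}  FPR={fpr:.2f}")
```

On this toy data the false negative rate rises from 0.00 to 0.50 as the threshold moves from 0.3 to 0.7, while the false positive rate falls from 0.50 to 0.00.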

Backend Setup

cd backend
pip install -r requirements.txt
python main.py

Frontend Setup

cd frontend
npm install
npm start

Future Enhancements

  • Evaluation on other manipulation methods such as Face2Face and FaceSwap, and on other datasets
  • Exploration of LSTM and GRU architectures
  • Threshold calibration to improve precision-recall tradeoff
  • Real-time streaming inference for live video analysis
  • Ensemble methods combining multiple temporal models

Ethical Considerations

  • False Positives & Real-World Harm: The system provides confidence scores and routes low-confidence predictions to mandatory human review instead of issuing an automatic verdict. Combining automated detection with human oversight is intended to prevent anyone from being wrongfully accused based solely on a machine decision. The system is designed as a decision support tool, not a replacement for human judgment in high-stakes scenarios.

  • Dataset Bias & Responsible Deployment: The current system is trained on a subset of the FaceForensics++ dataset. Responsible deployment requires rigorous evaluation across diverse datasets and manipulation methods to ensure the system performs equitably and does not exhibit disparate performance across different demographic groups, video qualities, or manipulation methods.

  • Evolving Threat Landscape: Deepfake generation techniques continue to evolve rapidly, particularly with the rise of adversarial methods designed to bypass standard classifiers. No detection system should be treated as a permanent or complete solution. The modular architecture of this system is specifically designed to address this reality; components can be updated independently, so that the system can evolve alongside the manipulation techniques.

  • Dual-Use Implications & Responsible AI: Making this system open-source maximizes its benefit to legitimate users. However, open-source availability also exposes the system's detection logic to adversaries who may seek to defeat it. This dual-use consideration requires that developers and deployers of such systems maintain awareness of potential misuse.
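The human-review policy described in the first point above could be sketched as a simple triage rule: predictions whose probability falls close to the decision threshold are marked for review. This is an assumption about how "low confidence" might be defined. The `review_margin` parameter, its default of 0.15, and the function name are hypothetical and not taken from the repository.

```python
def triage(probability, threshold=0.50, review_margin=0.15):
    """Attach a verdict and a human-review flag to a video-level probability.

    A prediction is treated as low-confidence when it lies within
    review_margin of the threshold (hypothetical rule for illustration).
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    return {
        "verdict": "deepfake" if probability >= threshold else "real",
        "probability": probability,
        "needs_human_review": abs(probability - threshold) < review_margin,
    }


for p in (0.95, 0.55, 0.05):
    print(triage(p))
```

A wider margin sends more videos to reviewers and fewer to automatic verdicts, so a deployment would tune it against reviewer capacity.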

License

MIT
