Distributed Fraud Detection Pipeline

End-to-end credit card fraud detection pipeline built on the Kaggle Credit Card Fraud Dataset: 284,807 transactions, of which 492 are fraudulent (a 0.17% fraud rate).

Architecture

creditcard.csv → PySpark Preprocessing → Feature Engineering → Model Training → FastAPI Service → Docker
                                                                      │
                                                         ┌────────────┴────────────┐
                                                         │  Logistic Regression    │
                                                         │  XGBoost                │
                                                         │  PyTorch MLP            │
                                                         └─────────────────────────┘

Predictions are logged to PostgreSQL via SQLAlchemy. API request/response schemas are validated with Pydantic.

Stack

Preprocessing: PySpark
Modeling: scikit-learn, XGBoost, PyTorch
Class imbalance: SMOTE + undersampling (imbalanced-learn)
Serving: FastAPI + Uvicorn
Storage: PostgreSQL + SQLAlchemy
Evaluation: Precision, Recall, F1, AUC-ROC (accuracy is misleading at 0.17% fraud rate)
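
To make the note on accuracy concrete, here is a small dependency-free sketch (not taken from the project's code) that scores a hypothetical baseline labelling every transaction as legitimate, using the class counts quoted at the top of this README. The helper name `binary_metrics` is an illustrative assumption.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against division by zero when no positives are predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


N_TOTAL = 284_807  # transactions in the dataset
N_FRAUD = 492      # fraudulent transactions

# Hypothetical "always predict non-fraud" baseline: it never flags anything,
# so every fraud case becomes a false negative.
baseline = binary_metrics(tp=0, fp=0, fn=N_FRAUD, tn=N_TOTAL - N_FRAUD)
print(f"accuracy={baseline['accuracy']:.4f} recall={baseline['recall']:.4f}")
```

The baseline reaches roughly 99.83% accuracy while catching none of the fraud, which is why the metrics listed above are a better basis for comparing models than accuracy.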

Setup

# Download the dataset from Kaggle and place it at data/creditcard.csv

python -m venv venv
source venv/bin/activate
pip install -r files/requirements.txt

Usage

# EDA
python files/01_eda.py

# Modeling
python files/02_modeling.py

# Or run as notebooks
jupyter notebook

Key Findings (EDA)

Features V14, V17, V12, and V10 are the strongest fraud discriminators based on distribution separation between fraud and non-fraud classes.
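
The README does not say how distribution separation was measured. As one possible way to rank features, the sketch below uses an assumed measure: the absolute difference between class means, scaled by a pooled standard deviation. The function name `separation_score`, the feature names, and the toy values are all made up for illustration and do not come from creditcard.csv.

```python
from statistics import mean, pstdev


def separation_score(fraud_values, legit_values):
    """Absolute difference of class means, scaled by a pooled standard deviation."""
    pooled = ((pstdev(fraud_values) ** 2 + pstdev(legit_values) ** 2) / 2) ** 0.5
    if pooled == 0:
        # Both classes are constant, so no spread to scale by; report no separation.
        return 0.0
    return abs(mean(fraud_values) - mean(legit_values)) / pooled


# Made-up toy values, NOT taken from the real dataset.
toy = {
    "feature_a": {"fraud": [-6.0, -7.5, -5.0, -8.0], "legit": [0.1, -0.2, 0.3, 0.0]},
    "feature_b": {"fraud": [0.2, -0.1, 0.4, 0.0], "legit": [0.1, -0.3, 0.2, 0.1]},
}

# Rank features from most to least separated.
ranking = sorted(
    toy,
    key=lambda name: separation_score(toy[name]["fraud"], toy[name]["legit"]),
    reverse=True,
)
print(ranking)
```

Other measures, such as a two-sample Kolmogorov-Smirnov statistic or histogram overlap, would be equally reasonable choices and may order features differently.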

