knownas-chayan/Fraud_Detection_Project

💳 Credit Card Fraud Detection

Built with: Python · scikit-learn · XGBoost · Streamlit

An end-to-end machine learning pipeline for detecting fraudulent credit card transactions. It combines unsupervised anomaly detection (Isolation Forest, Local Outlier Factor) with supervised classification (XGBoost) and serves predictions through an interactive Streamlit dashboard.


📁 Project Structure

fraud-detection/
│
├── data/                        # Raw and processed datasets
│   └── .gitkeep
│
├── models/                      # Saved model artifacts
│   └── .gitkeep
│
├── notebooks/
│   └── fraud_detection.ipynb    # Full exploratory + training notebook
│
├── src/
│   ├── __init__.py
│   ├── data_loader.py           # Dataset download & loading
│   ├── preprocessor.py          # Scaling, SMOTE balancing
│   ├── anomaly_detection.py     # Isolation Forest & LOF
│   ├── classifier.py            # XGBoost training & evaluation
│   ├── visualizer.py            # ROC curve, confusion matrix plots
│   └── predictor.py             # Inference on new transactions
│
├── tests/
│   └── test_pipeline.py         # Unit tests
│
├── app.py                       # Streamlit dashboard entry point
├── train.py                     # Full training pipeline script
├── requirements.txt
└── README.md

🚀 Quick Start

1. Install Dependencies

pip install -r requirements.txt

2. Download the Dataset

Get the dataset from Kaggle: Credit Card Fraud Detection (dataset slug mlg-ulb/creditcardfraud).

Place creditcard.csv in the data/ directory.

Or use the Kaggle API:

pip install kaggle
kaggle datasets download -d mlg-ulb/creditcardfraud -p data/ --unzip

3. Train the Models

python train.py

4. Launch the Dashboard

streamlit run app.py

📊 Dataset

  • Source: Kaggle - ULB Credit Card Fraud
  • Size: 284,807 transactions
  • Fraud Rate: ~0.17% (highly imbalanced)
  • Features: 30 (V1–V28 PCA features + Time + Amount)
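To see why the imbalance matters, the figures above can be turned into a quick baseline check. This is a sketch using only the numbers listed in this section; the fraud count is derived from the rounded 0.17% rate, so it is approximate, and the helper name `majority_baseline` is made up for illustration.

```python
# Baseline check using only the figures quoted above.
N_TRANSACTIONS = 284_807
FRAUD_RATE = 0.0017  # ~0.17%, rounded as in this README


def majority_baseline(n_total: int, fraud_rate: float) -> dict:
    """Metrics for a model that labels every transaction as legitimate."""
    n_fraud = round(n_total * fraud_rate)  # approximate, from the rounded rate
    n_legit = n_total - n_fraud
    return {
        "approx_fraud_count": n_fraud,
        "accuracy": n_legit / n_total,  # every legitimate row counts as correct
        "recall": 0.0,                  # no fraud case is ever caught
    }


baseline = majority_baseline(N_TRANSACTIONS, FRAUD_RATE)
print(f"accuracy={baseline['accuracy']:.4f}, recall={baseline['recall']:.1f}")
```

A model that never flags anything still scores about 99.8% accuracy while catching no fraud, which is why this project reports precision, recall, and average precision rather than accuracy.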

🧠 Models Used

| Model | Type | Purpose |
|---|---|---|
| Isolation Forest | Unsupervised | Anomaly detection |
| Local Outlier Factor | Unsupervised | Anomaly detection |
| XGBoost Classifier | Supervised | Final classification |
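As a rough illustration of the two unsupervised detectors, the sketch below runs scikit-learn's `IsolationForest` and `LocalOutlierFactor` on synthetic points. It is not the code in `src/anomaly_detection.py`: the data, the `CONTAMINATION` value, and the other hyperparameters are assumptions chosen for the example.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)

# Synthetic stand-in for transaction features (NOT the Kaggle data):
# 980 "normal" points near the origin plus 20 scattered over a wide square.
normal = rng.normal(loc=0.0, scale=1.0, size=(980, 2))
planted = rng.uniform(low=-8.0, high=8.0, size=(20, 2))
X = np.vstack([normal, planted])
is_planted = np.r_[np.zeros(980, dtype=bool), np.ones(20, dtype=bool)]

CONTAMINATION = 0.02  # assumed share of anomalies; tune for real data

iso = IsolationForest(n_estimators=100, contamination=CONTAMINATION, random_state=0)
iso_labels = iso.fit_predict(X)       # -1 = anomaly, 1 = normal
iso_scores = -iso.score_samples(X)    # negated so that higher = more anomalous

lof = LocalOutlierFactor(n_neighbors=20, contamination=CONTAMINATION)
lof_labels = lof.fit_predict(X)       # -1 = anomaly, 1 = normal
lof_scores = -lof.negative_outlier_factor_  # negated so that higher = more anomalous

for name, labels in [("IsolationForest", iso_labels), ("LOF", lof_labels)]:
    flagged = labels == -1
    caught = int((flagged & is_planted).sum())
    print(f"{name}: flagged {int(flagged.sum())}, planted outliers caught {caught}/20")
```

Both detectors need no labels, so their scores can be inspected on their own or passed along as extra signals; how this project combines them with XGBoost is defined in the source files, not here.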

⚖️ Handling Class Imbalance

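  • SMOTE (Synthetic Minority Oversampling Technique) is applied to the training set only, so the test set keeps the original class distribution
  • StandardScaler normalizes Amount and Time (V1–V28 are already PCA outputs)
  • A stratified train/test split preserves the fraud ratio in both partitions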
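The order of these steps matters: split first, fit the scaler on training rows, and oversample only the training set. The sketch below walks through that order on synthetic data. The project itself uses SMOTE from imbalanced-learn; to keep this example free of extra dependencies, `smote_like_oversample` is a hypothetical, simplified stand-in that shows the interpolation idea rather than the library's implementation. The column layout and sizes are also assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Synthetic stand-in data (NOT the Kaggle file): 2,000 rows, 2% positives.
# Columns 0 and 1 play the role of Time and Amount; the rest mimic V-features.
n_rows, n_pos = 2000, 40
X = rng.normal(size=(n_rows, 5))
X[:, 0] = rng.uniform(0, 172_800, size=n_rows)      # "Time" in seconds
X[:, 1] = rng.exponential(scale=80.0, size=n_rows)  # "Amount"
y = np.zeros(n_rows, dtype=int)
y[:n_pos] = 1

# 1) Stratified split keeps the 2% positive rate in both partitions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# 2) Fit the scaler on training rows only, then reuse it for the test rows.
scaler = StandardScaler().fit(X_train[:, :2])
X_train[:, :2] = scaler.transform(X_train[:, :2])
X_test[:, :2] = scaler.transform(X_test[:, :2])


def smote_like_oversample(X_min, n_new, k=5, seed=0):
    """Hypothetical helper: interpolate between minority points and their
    k nearest minority neighbours, the core idea behind SMOTE."""
    gen = np.random.default_rng(seed)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    neighbours = nn.kneighbors(X_min, return_distance=False)[:, 1:]  # drop self
    base = gen.integers(0, len(X_min), size=n_new)
    picked = neighbours[base, gen.integers(0, k, size=n_new)]
    gap = gen.random((n_new, 1))  # interpolation weight in [0, 1)
    return X_min[base] + gap * (X_min[picked] - X_min[base])


# 3) Oversample the minority class in the TRAINING set only.
X_min = X_train[y_train == 1]
n_needed = int((y_train == 0).sum() - len(X_min))
X_syn = smote_like_oversample(X_min, n_needed)
X_bal = np.vstack([X_train, X_syn])
y_bal = np.concatenate([y_train, np.ones(n_needed, dtype=int)])

print("train positives before/after:", int(y_train.sum()), int(y_bal.sum()))
print("test positive rate:", y_test.mean())
```

Leaving the test set untouched means the reported metrics reflect the real fraud rate rather than the artificially balanced one.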

📈 Evaluation Metrics

  • Confusion Matrix
  • ROC curve and ROC-AUC score
  • Precision, Recall, F1-Score
  • Average Precision Score
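For readers who want to see what these metrics measure, here is a dependency-free sketch that computes them by hand on ten made-up predictions. The labels, scores, 0.5 threshold, and function names are all illustrative assumptions, and `average_precision` assumes no tied scores.

```python
def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels where 1 = fraud."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    return tn, fp, fn, tp


def precision_recall_f1(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def roc_auc(y_true, y_score):
    """Chance that a random fraud case outscores a random legitimate one
    (ties count as half a win)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, y_score):
    """Mean of the precision values at the rank of each true fraud case."""
    ranked = sorted(zip(y_score, y_true), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# Made-up example: 7 legitimate transactions, 3 fraudulent ones.
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
y_score = [0.05, 0.10, 0.20, 0.30, 0.35, 0.60, 0.40, 0.45, 0.80, 0.90]
y_pred = [int(s >= 0.5) for s in y_score]  # assumed decision threshold

tn, fp, fn, tp = confusion_counts(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
auc = roc_auc(y_true, y_score)
ap = average_precision(y_true, y_score)
print(f"tn={tn} fp={fp} fn={fn} tp={tp}")
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
print(f"roc_auc={auc:.3f} average_precision={ap:.3f}")
```

In practice scikit-learn's `confusion_matrix`, `precision_recall_fscore_support`, `roc_auc_score`, and `average_precision_score` cover the same quantities; the hand-written versions are only meant to make the definitions concrete.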

🖥️ Dashboard Features

  • Upload or enter transaction data manually
  • Real-time fraud probability prediction
  • Anomaly score visualization
  • Model performance comparison charts
  • Interactive ROC curve

🧪 Running Tests

python -m pytest tests/

📦 Requirements

See requirements.txt for the full list. Key packages:

  • pandas, numpy
  • scikit-learn
  • xgboost
  • imbalanced-learn (SMOTE)
  • streamlit
  • matplotlib, seaborn, plotly
  • joblib
