An end-to-end machine learning pipeline for detecting fraudulent credit card transactions. It combines anomaly detection (Isolation Forest, Local Outlier Factor) with supervised learning (XGBoost) and serves predictions through an interactive Streamlit dashboard.
fraud-detection/
│
├── data/ # Raw and processed datasets
│ └── .gitkeep
│
├── models/ # Saved model artifacts
│ └── .gitkeep
│
├── notebooks/
│ └── fraud_detection.ipynb # Full exploratory + training notebook
│
├── src/
│ ├── __init__.py
│ ├── data_loader.py # Dataset download & loading
│ ├── preprocessor.py # Scaling, SMOTE balancing
│ ├── anomaly_detection.py # Isolation Forest & LOF
│ ├── classifier.py # XGBoost training & evaluation
│ ├── visualizer.py # ROC curve, confusion matrix plots
│ └── predictor.py # Inference on new transactions
│
├── tests/
│ └── test_pipeline.py # Unit tests
│
├── app.py # Streamlit dashboard entry point
├── train.py # Full training pipeline script
├── requirements.txt
└── README.md
pip install -r requirements.txt

Get the dataset from Kaggle: Credit Card Fraud Detection
Place creditcard.csv in the data/ directory.
Or use the Kaggle API:
pip install kaggle
kaggle datasets download -d mlg-ulb/creditcardfraud -p data/ --unzip
python train.py

streamlit run app.py

- Source: Kaggle - ULB Credit Card Fraud
- Size: 284,807 transactions
- Fraud Rate: ~0.17% (highly imbalanced)
- Features: 30 (V1–V28 PCA features + Time + Amount)
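
The figures above can be checked once creditcard.csv is in place. The following is a minimal sketch, not code from src/: `summarize` is a hypothetical helper, the label column is assumed to be named `Class` (1 = fraud), and the small frame is synthetic stand-in data so the snippet runs without the download.

```python
import pandas as pd


def summarize(df: pd.DataFrame, label_col: str = "Class") -> dict:
    """Return row count, feature count, and fraud rate for a labeled frame."""
    return {
        "rows": len(df),
        "features": df.shape[1] - 1,  # every column except the label
        "fraud_rate": float(df[label_col].mean()),
    }


# Synthetic stand-in; with the real data use pd.read_csv("data/creditcard.csv").
demo = pd.DataFrame(
    {
        "Time": [0.0, 1.0, 2.0, 3.0],
        "V1": [-1.3, 1.1, -0.4, 0.2],
        "Amount": [149.62, 2.69, 378.66, 12.50],
        "Class": [0, 0, 1, 0],
    }
)
print(summarize(demo))
```
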
| Model | Type | Purpose |
|---|---|---|
| Isolation Forest | Unsupervised | Anomaly detection |
| Local Outlier Factor | Unsupervised | Anomaly detection |
| XGBoost Classifier | Supervised | Final classification |
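
The two unsupervised detectors in the table can be sketched with scikit-learn as follows. This is an illustration under assumptions, not the contents of anomaly_detection.py: the data is synthetic, and the `contamination`, `n_estimators`, and `n_neighbors` values are placeholders rather than the project's settings.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

# Synthetic stand-in for the scaled feature matrix.
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 5))
X[:10] += 6.0  # shift ten rows far from the rest

# Isolation Forest: fit_predict returns -1 for flagged rows, 1 otherwise;
# lower score_samples values mean "more anomalous".
iso = IsolationForest(n_estimators=100, contamination=0.01, random_state=42)
iso_labels = iso.fit_predict(X)
iso_scores = iso.score_samples(X)

# Local Outlier Factor: same label convention; scores are stored on the
# fitted estimator, again with lower meaning "more anomalous".
lof = LocalOutlierFactor(n_neighbors=20, contamination=0.01)
lof_labels = lof.fit_predict(X)
lof_scores = lof.negative_outlier_factor_

print("Isolation Forest flagged:", int((iso_labels == -1).sum()))
print("LOF flagged:", int((lof_labels == -1).sum()))
```

Both detectors ignore the fraud labels entirely, which is why the table lists them separately from the supervised XGBoost stage.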
- SMOTE (Synthetic Minority Over-sampling Technique) is applied to the training set
- StandardScaler is used to standardize Amount and Time
- Stratified train/test split preserves the fraud ratio
- Confusion Matrix
- ROC Curve and AUC
- Precision, Recall, F1-Score
- Average Precision Score
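
All of the metrics listed above are available in scikit-learn. A minimal sketch follows, assuming the classifier outputs a fraud probability per transaction. `evaluate` and the 0.5 threshold are illustrative choices, not the interface of classifier.py, and the labels and scores are made up.

```python
from sklearn.metrics import (
    average_precision_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)


def evaluate(y_true, y_prob, threshold=0.5):
    """Compute threshold-based and ranking-based metrics for binary labels."""
    y_pred = [int(p >= threshold) for p in y_prob]
    return {
        # Rows are true classes, columns are predicted: [[TN, FP], [FN, TP]].
        "confusion_matrix": confusion_matrix(y_true, y_pred, labels=[0, 1]).tolist(),
        "precision": precision_score(y_true, y_pred, zero_division=0),
        "recall": recall_score(y_true, y_pred, zero_division=0),
        "f1": f1_score(y_true, y_pred, zero_division=0),
        # Ranking metrics use the raw probabilities, not the thresholded labels.
        "roc_auc": roc_auc_score(y_true, y_prob),
        "average_precision": average_precision_score(y_true, y_prob),
    }


# Made-up labels and scores for illustration.
y_true = [0, 0, 0, 1, 1]
y_prob = [0.10, 0.40, 0.60, 0.35, 0.90]
for name, value in evaluate(y_true, y_prob).items():
    print(name, value)
```

With a fraud rate near 0.17%, average precision tends to be more informative than ROC-AUC, because it is not inflated by the large number of true negatives.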
- Upload or enter transaction data manually
- Real-time fraud probability prediction
- Anomaly score visualization
- Model performance comparison charts
- Interactive ROC curve
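
The real-time prediction feature comes down to scoring a single transaction with a fitted model. The sketch below illustrates the idea under stated assumptions: `fraud_probability` is a hypothetical helper (not the API of predictor.py), and because the saved artifact in models/ is not described here, a stand-in scikit-learn LogisticRegression is trained inline on synthetic data. Any estimator exposing `predict_proba` could be passed in its place.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression


def fraud_probability(model, transaction: dict, feature_order: list) -> float:
    """Score one transaction; columns are reordered to match training."""
    row = pd.DataFrame([transaction])[feature_order]
    # Column 1 of predict_proba is the probability of the positive class.
    return float(model.predict_proba(row)[0, 1])


# Stand-in model on synthetic data (assumption: not the project's classifier).
rng = np.random.default_rng(0)
features = ["Time", "V1", "Amount"]
X = pd.DataFrame(rng.normal(size=(200, 3)), columns=features)
y = (X["V1"] > 1.0).astype(int)
model = LogisticRegression().fit(X, y)

# Keys may arrive in any order, for example from a dashboard form.
p = fraud_probability(model, {"Amount": 0.3, "Time": -0.1, "V1": 2.5}, features)
print(f"fraud probability: {p:.3f}")
```

Passing the feature order explicitly guards against a manually entered transaction whose fields arrive in a different order than the training columns.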
python -m pytest tests/

See requirements.txt for the full list. Key packages:
- pandas, numpy
- scikit-learn
- xgboost
- imbalanced-learn (SMOTE)
- streamlit
- matplotlib, seaborn, plotly
- joblib