A desktop machine learning application for fraud detection that trains and compares multiple ML models. It is built with Python and Tkinter and integrates with MLflow for experiment tracking, model logging, and performance visualization.
- Train multiple ML models:
  - Decision Tree
  - Random Forest
  - K-Nearest Neighbors (KNN)
  - Support Vector Machine (SVM)
- Full data preprocessing pipeline:
  - Feature engineering (age extraction from DOB)
  - Label encoding for categorical variables
  - Feature scaling (StandardScaler)
  - Class balancing using upsampling
- Model evaluation metrics:
  - Accuracy
  - Precision
  - Recall
  - F1-score
  - ROC-AUC
  - Confusion Matrix
  - Classification Report
- MLflow integration:
  - Experiment tracking
  - Parameter logging
  - Metric logging
  - Model saving
- Interactive GUI:
  - Load train/test CSV files
  - Select models to train
  - Real-time training logs
  - Progress bar
  - Performance charts
  - Confusion matrix visualization
  - Best model selection
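
As a rough illustration of the evaluation metrics listed above, the sketch below computes them with scikit-learn. The `evaluate` helper and the toy labels are hypothetical and are not taken from `fraud_detection_app.py`; the application may compute or store these values differently.

```python
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)


def evaluate(y_true, y_pred, y_score):
    """Collect the listed metrics in one dict (hypothetical helper).

    y_pred holds hard 0/1 predictions; y_score holds the predicted
    probability (or decision score) for the fraud class.
    """
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, zero_division=0),
        "recall": recall_score(y_true, y_pred, zero_division=0),
        "f1": f1_score(y_true, y_pred, zero_division=0),
        "roc_auc": roc_auc_score(y_true, y_score),
        # Rows are true classes, columns are predicted classes.
        "confusion_matrix": confusion_matrix(y_true, y_pred).tolist(),
        "report": classification_report(
            y_true, y_pred, target_names=["Legit", "Fraud"], zero_division=0
        ),
    }


# Toy labels for illustration only (not real results).
metrics = evaluate(
    y_true=[0, 0, 1, 1],
    y_pred=[0, 1, 1, 1],
    y_score=[0.1, 0.6, 0.7, 0.9],
)
print(metrics["report"])
```

On imbalanced fraud data, accuracy alone can look high even when few fraud cases are caught, which is one reason to read precision, recall, and F1 together.
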
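Install dependencies:

    pip install pandas numpy scikit-learn matplotlib mlflow

Note: tkinter ships with most Python installations. On some Linux distributions it is packaged separately (for example, `python3-tk` on Debian/Ubuntu).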
Run the application:

    python fraud_detection_app.py

The dataset must include:
- is_fraud (0 = Legit, 1 = Fraud)
Supported features:
- Transaction-related fields
- Categorical variables
- Date of birth (dob)
- Metadata fields
Automatically removed columns:
- Unnamed: 0, trans_num, first, last, street, cc_num, trans_date_trans_time
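
The column handling described above could look roughly like the following. This is a minimal sketch, not the application's actual code: the `preprocess` helper, the fixed `reference_date` used to compute age, and the sample rows are all assumptions made for illustration.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype
from sklearn.preprocessing import LabelEncoder

# Columns this README lists as automatically removed.
DROP_COLUMNS = [
    "Unnamed: 0", "trans_num", "first", "last",
    "street", "cc_num", "trans_date_trans_time",
]


def preprocess(df: pd.DataFrame, reference_date: str = "2020-01-01") -> pd.DataFrame:
    """Drop identifier columns, derive age from dob, label-encode the rest."""
    # drop() returns a new frame, so the caller's DataFrame is not modified.
    df = df.drop(columns=[c for c in DROP_COLUMNS if c in df.columns])

    if "dob" in df.columns:
        # Approximate age in whole years relative to an assumed reference date.
        days = (pd.Timestamp(reference_date) - pd.to_datetime(df["dob"])).dt.days
        df["age"] = days // 365
        df = df.drop(columns="dob")

    # Encode every remaining non-numeric column as integer codes.
    for col in list(df.columns):
        if not is_numeric_dtype(df[col]):
            df[col] = LabelEncoder().fit_transform(df[col].astype(str))
    return df


# Made-up rows for illustration only.
sample = pd.DataFrame({
    "Unnamed: 0": [0, 1],
    "trans_num": ["a1", "b2"],
    "category": ["food", "travel"],
    "amt": [10.0, 250.5],
    "dob": ["1990-01-01", "1985-06-15"],
    "is_fraud": [0, 1],
})
clean = preprocess(sample)
print(clean)
```

One caveat with this sketch: fitting a separate `LabelEncoder` on the train and test files can assign different codes to the same category, so a real pipeline would normally fit the encoders on the training data and reuse them.
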
Default tracking database: sqlite:///fraud_mlflow.db
Both the experiment name and the tracking URI can be changed directly from the GUI.
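
For orientation, here is a hedged sketch of how a run might be logged against that tracking database. The `log_run` and `numeric_metrics` helpers and the experiment name `fraud-detection` are hypothetical; only the default URI comes from this README. Model saving is left out of the sketch.

```python
import math

DEFAULT_TRACKING_URI = "sqlite:///fraud_mlflow.db"  # default stated in this README
DEFAULT_EXPERIMENT = "fraud-detection"  # assumed name, not confirmed by the source


def numeric_metrics(metrics):
    """Keep only finite numeric values, since MLflow metrics must be numbers."""
    return {
        key: float(value)
        for key, value in metrics.items()
        if isinstance(value, (int, float))
        and not isinstance(value, bool)
        and math.isfinite(value)
    }


def log_run(model_name, params, metrics,
            tracking_uri=DEFAULT_TRACKING_URI, experiment=DEFAULT_EXPERIMENT):
    """Log one training run's parameters and metrics to MLflow."""
    import mlflow  # imported here so numeric_metrics works without MLflow installed

    mlflow.set_tracking_uri(tracking_uri)
    mlflow.set_experiment(experiment)
    with mlflow.start_run(run_name=model_name):
        mlflow.log_params(params)
        mlflow.log_metrics(numeric_metrics(metrics))


# Text reports and NaN values are filtered out before logging.
filtered = numeric_metrics({"f1": 0.5, "report": "text", "roc_auc": float("nan")})
print(filtered)
```

Logged runs can then be browsed with `mlflow ui --backend-store-uri sqlite:///fraud_mlflow.db`.
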
The application provides:
- Model comparison dashboard
- Performance charts
- Confusion matrices
- Classification reports
- Best model selection (based on F1-score)
- MLflow experiment logs
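
Best model selection by F1-score amounts to taking the maximum over the per-model results. The sketch below assumes results are kept in a dict keyed by model name; that layout, the `pick_best` helper, and the numbers are invented for illustration and are not measured results.

```python
def pick_best(results, metric="f1"):
    """Return the name of the model with the highest value for `metric`."""
    return max(results, key=lambda name: results[name][metric])


# Placeholder scores, not real measurements.
results = {
    "Decision Tree": {"f1": 0.71, "accuracy": 0.93},
    "Random Forest": {"f1": 0.84, "accuracy": 0.95},
    "KNN": {"f1": 0.65, "accuracy": 0.96},
    "SVM": {"f1": 0.78, "accuracy": 0.94},
}

best = pick_best(results)
print(f"Best model by F1: {best}")
```

Note that in this made-up example the model with the highest accuracy (KNN) is not the one with the highest F1, which is the situation where ranking by F1 matters.
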
Workflow:
- Load training and test datasets
- Preprocess data (cleaning, encoding, scaling)
- Balance dataset (fraud upsampling)
- Train selected models
- Evaluate performance
- Log results to MLflow
- Visualize results in GUI
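
The balancing and scaling steps in this workflow can be sketched as follows. This assumes fraud is the minority class and that upsampling means sampling fraud rows with replacement until the classes match; the helper names and toy data are hypothetical, and the application may implement these steps differently.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler


def upsample_fraud(train: pd.DataFrame, target: str = "is_fraud", seed: int = 42) -> pd.DataFrame:
    """Resample fraud rows with replacement to match the legit count."""
    legit = train[train[target] == 0]
    fraud = train[train[target] == 1]
    fraud_up = fraud.sample(n=len(legit), replace=True, random_state=seed)
    # Shuffle so the two classes are interleaved.
    balanced = pd.concat([legit, fraud_up]).sample(frac=1, random_state=seed)
    return balanced.reset_index(drop=True)


def scale(train_X, test_X):
    """Fit the scaler on training features only, then apply it to both sets."""
    scaler = StandardScaler().fit(train_X)
    return scaler.transform(train_X), scaler.transform(test_X)


# Toy data: 8 legit rows and 2 fraud rows.
train = pd.DataFrame({"amt": np.arange(10, dtype=float), "is_fraud": [0] * 8 + [1] * 2})
test = pd.DataFrame({"amt": [3.0, 7.0], "is_fraud": [0, 1]})

balanced = upsample_fraud(train)
train_scaled, test_scaled = scale(balanced[["amt"]], test[["amt"]])
print(balanced["is_fraud"].value_counts().to_dict())
```

Only the training set is upsampled; the test set keeps its original class ratio so the reported metrics reflect realistic conditions.
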
Models used:
- Decision Tree Classifier
- Random Forest Classifier
- K-Nearest Neighbors
- Support Vector Machine (RBF kernel)
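
A minimal sketch of how these four classifiers could be instantiated and trained with scikit-learn is shown below. The hyperparameters, the `build_models` helper, and the synthetic dataset are assumptions; only the model types and the RBF kernel come from this README.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def build_models(seed: int = 42):
    """Return the four model types with assumed (mostly default) settings."""
    return {
        "Decision Tree": DecisionTreeClassifier(random_state=seed),
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=seed),
        "KNN": KNeighborsClassifier(n_neighbors=5),
        # probability=True enables predict_proba, which is convenient for ROC-AUC.
        "SVM": SVC(kernel="rbf", probability=True, random_state=seed),
    }


# Synthetic stand-in for the preprocessed fraud data.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

fitted = {name: model.fit(X, y) for name, model in build_models().items()}
for name, model in fitted.items():
    print(name, model.predict(X[:3]))
```

KNN and SVM are distance-based, so they are the models that benefit most from the feature scaling step; the tree-based models are largely insensitive to it.
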
Project files:
- fraud_detection_app.py
Planned improvements:
- Deep learning models
- Hyperparameter tuning (GridSearch / Optuna)
- REST API deployment (Flask/FastAPI)
- Real-time fraud prediction
- Feature importance visualization
Technologies used:
- Python
- Tkinter
- Scikit-learn
- Matplotlib
- MLflow
This project is open-source and free to use for educational and research purposes.