A reproducible, end-to-end pipeline for training and evaluating a fraud detection model on financial transactions. The project is designed to be easy to adapt to different datasets by defining a clear data contract (via YAML), keeping preprocessing consistent, and tracking experiments.
The pipeline:
- Loads a transaction dataset (CSV/Parquet)
- Validates the dataset using a schema/data contract
- Builds baseline + ML features
- Trains one or more fraud detection models
- Evaluates performance with fraud-appropriate metrics (PR-AUC, recall at fixed precision, etc.)
- Saves artifacts (models, metrics, plots) for reproducibility
```text
.
├── configs/
│   └── dataset_schema.yaml
├── data/
│   ├── raw/              # raw input files (not committed)
│   └── processed/        # cleaned/feature-ready data
├── notebooks/            # exploration / sanity checks
├── src/
│   ├── data/             # loading + validation + splits
│   ├── features/         # feature engineering
│   ├── models/           # training + inference
│   ├── evaluation/       # metrics + plots
│   └── utils/            # helpers (logging, seeding, paths)
├── artifacts/
│   ├── models/
│   └── reports/
├── tests/
├── requirements.txt
├── README.md
└── .gitignore
```
This project expects a dataset with a binary target label (e.g., is_fraud) and a timestamped transaction history.
Example: `configs/dataset_schema.yaml`

```yaml
target: is_fraud
required_columns:
  - transaction_id
  - timestamp
  - amount
  - is_fraud
optional_columns:
  - customer_id
  - merchant_id
  - channel
  - country
  - city
  - device_id
timestamp_format: "auto"  # parse with pandas
split_strategy:
  type: time
  train_end: "2020-09-30"
  val_end: "2020-11-30"
  test_end: "2020-12-31"
```

Notes:
- Time-based splits are recommended to reduce leakage (train on past → test on future).
- If you don’t have a timestamp, use a random split but document the risk.
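As a sketch of what the validation and time-split steps described above might look like in code (the function names and the inline schema dict are illustrative, not the actual `src/data` API; in the repo the schema would be loaded from the YAML file, e.g. via PyYAML):

```python
import pandas as pd

# Schema shown inline for illustration; mirrors configs/dataset_schema.yaml.
schema = {
    "target": "is_fraud",
    "required_columns": ["transaction_id", "timestamp", "amount", "is_fraud"],
    "split_strategy": {
        "type": "time",
        "train_end": "2020-09-30",
        "val_end": "2020-11-30",
        "test_end": "2020-12-31",
    },
}

def validate(df: pd.DataFrame, schema: dict) -> pd.DataFrame:
    """Check the data contract and parse timestamps."""
    missing = set(schema["required_columns"]) - set(df.columns)
    if missing:
        raise ValueError(f"missing required columns: {sorted(missing)}")
    df = df.copy()
    # timestamp_format: "auto" -> let pandas infer the format
    df["timestamp"] = pd.to_datetime(df["timestamp"])
    return df

def time_split(df: pd.DataFrame, schema: dict):
    """Train on the past, validate/test on the future (reduces leakage)."""
    s = schema["split_strategy"]
    ts = df["timestamp"]
    train = df[ts <= s["train_end"]]
    val = df[(ts > s["train_end"]) & (ts <= s["val_end"])]
    test = df[(ts > s["val_end"]) & (ts <= s["test_end"])]
    return train, val, test
```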
```bash
python -m venv .venv
# Windows:
.venv\Scripts\activate
# Mac/Linux:
source .venv/bin/activate
pip install -r requirements.txt
```

Example input location:

```text
data/raw/transactions.csv
```
```bash
python -m src.data.make_dataset \
  --input data/raw/transactions.csv \
  --schema configs/dataset_schema.yaml \
  --out data/processed/transactions.parquet
```

```bash
python -m src.models.train \
  --data data/processed/transactions.parquet \
  --schema configs/dataset_schema.yaml \
  --out artifacts/models/
```

```bash
python -m src.evaluation.evaluate \
  --data data/processed/transactions.parquet \
  --model artifacts/models/model.pkl \
  --out artifacts/reports/
```

Fraud detection is typically imbalanced, so accuracy is not useful by itself. Recommended metrics:
- PR-AUC (Average Precision)
- Recall at fixed precision (e.g., recall when precision ≥ 90%)
- Precision@K / Recall@K (top K alerts)
- Confusion matrix at an operational threshold
- Optional: cost-based evaluation (false positives vs false negatives)
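The first three metrics above can be sketched with scikit-learn; the helper names here are illustrative, not the repo's `src/evaluation` API:

```python
import numpy as np
from sklearn.metrics import average_precision_score, precision_recall_curve

def pr_auc(y_true, y_score):
    """PR-AUC via Average Precision."""
    return average_precision_score(y_true, y_score)

def recall_at_precision(y_true, y_score, min_precision=0.90):
    """Best recall achievable while keeping precision >= min_precision."""
    precision, recall, _ = precision_recall_curve(y_true, y_score)
    ok = precision >= min_precision
    return recall[ok].max() if ok.any() else 0.0

def precision_at_k(y_true, y_score, k):
    """Fraction of true frauds among the top-K highest-scored alerts."""
    top_k = np.argsort(y_score)[::-1][:k]
    return float(np.asarray(y_true)[top_k].mean())
```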
- Fix random seeds in training and data splits.
- Log:
  - dataset version/hash
  - schema used
  - feature set version
  - hyperparameters
  - metrics and plots
- Save artifacts to `artifacts/` (notebooks should be optional, not required).
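A minimal sketch of the seeding, hashing, and run-logging helpers implied above (function names are assumptions, not the actual `src/utils` API):

```python
import hashlib
import json
import os
import random

import numpy as np

def set_seed(seed: int = 42) -> None:
    """Seed Python, NumPy, and hash randomization for reproducible runs."""
    random.seed(seed)
    np.random.seed(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)

def file_hash(path: str) -> str:
    """SHA-256 of a file, for logging the dataset version/hash."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def log_run(out_path: str, **metadata) -> None:
    """Dump run metadata (schema, features, hyperparameters, metrics) as JSON."""
    with open(out_path, "w") as f:
        json.dump(metadata, f, indent=2, sort_keys=True)
```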
- Avoid label leakage (features that directly encode the target or post-event signals).
- Consider fairness and disparate impact if attributes correlate with protected classes.
- Treat this repo as a prototype unless it has been validated with production constraints (latency, drift, monitoring, auditability).
- Add a simple baseline (logistic regression) + stronger model (LightGBM/XGBoost)
- Add feature store-style pipeline (consistent train/serve features)
- Add threshold selection aligned to ops goals (precision target, alert budget)
- Add model monitoring plan (drift, performance decay, data quality)
- Add explainability reports (global + per-transaction)
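The threshold-selection item above could be sketched like this (a hypothetical helper, not part of the repo): pick the lowest score threshold that still meets an ops precision target on validation data, which maximizes recall under that constraint.

```python
import numpy as np

def pick_threshold(y_true, y_score, precision_target=0.90):
    """Lowest threshold whose precision on (y_true, y_score) meets the target."""
    order = np.argsort(y_score)[::-1]          # rank alerts by score, highest first
    y = np.asarray(y_true)[order]
    s = np.asarray(y_score)[order]
    tp = np.cumsum(y)                          # true positives if we alert on top-i
    precision = tp / np.arange(1, len(y) + 1)  # precision of the top-i alerts
    ok = np.where(precision >= precision_target)[0]
    if len(ok) == 0:
        return None                            # target not attainable on this data
    return float(s[ok[-1]])
```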
Choose one:
- MIT (open, permissive)
Maintainer: Nafisat Ibrahim