An end-to-end MLOps project for predicting vehicle insurance cross-sell outcomes — from raw data ingestion to a Dockerized API deployed on AWS EC2 with a full CI/CD pipeline.
This project implements a production-grade machine learning pipeline to predict whether a customer is likely to opt for vehicle insurance. It demonstrates best practices in MLOps: modular code structure, automated training pipelines, cloud model storage, and continuous deployment via GitHub Actions.
MongoDB Atlas (Raw Data)
↓
Data Ingestion → Data Validation → Data Transformation
↓
Model Trainer → Model Evaluation → Model Pusher (AWS S3)
↓
FastAPI Prediction App (Dockerized)
↓
AWS EC2 via GitHub Actions CI/CD
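
The training half of the flow above can be pictured as a chain of stages, each passing its artifacts to the next. The following is only a minimal sketch under that assumption: the stage names mirror the diagram, while `make_stage`, `run_pipeline`, and the artifact strings are hypothetical placeholders rather than the project's actual components.

```python
# Stage names follow the diagram; the stage bodies are placeholders.
STAGES = [
    "data_ingestion",
    "data_validation",
    "data_transformation",
    "model_trainer",
    "model_evaluation",
    "model_pusher",
]


def make_stage(name):
    """Return a hypothetical stage that adds its own artifact to the dict."""
    def stage(artifacts):
        # Each stage sees the artifacts of all earlier stages and adds one entry.
        return {**artifacts, name: f"{name}_artifact"}
    return stage


def run_pipeline(stage_names):
    """Run the stages in order, threading artifacts from one to the next."""
    artifacts = {}
    for name in stage_names:
        artifacts = make_stage(name)(artifacts)
    return artifacts


result = run_pipeline(STAGES)
print(list(result))  # stage names in execution order
```

Keeping each stage as a function of the accumulated artifacts makes the order of the diagram explicit and lets a single stage be swapped out or rerun in isolation.
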
| Category | Tools |
|---|---|
| Language | Python 3.10 |
| ML & Data | Scikit-learn, Pandas, NumPy |
| Database | MongoDB Atlas |
| Cloud Storage | AWS S3 |
| Deployment | AWS EC2, Docker, ECR |
| CI/CD | GitHub Actions (Self-hosted Runner) |
| API | Flask / FastAPI |
| Environment | Conda |
vehicle-insurance-mlops/
├── src/
│ ├── components/ # Pipeline components (ingestion, validation, etc.)
│ ├── configuration/ # MongoDB & AWS connection configs
│ ├── cloud_storage/ # AWS S3 storage utilities
│ ├── data_access/ # Data fetch & transform layer
│ ├── entity/ # Config & artifact entities, estimator
│ ├── pipeline/ # Training pipeline orchestration
│ └── utils/ # Shared utilities
├── config/
│ └── schema.yaml # Dataset schema for validation
├── notebook/ # EDA & feature engineering notebooks
├── static/ & templates/ # Frontend for prediction UI
├── app.py # Application entry point
├── demo.py # Pipeline test runner
├── Dockerfile
├── requirements.txt
├── setup.py
└── pyproject.toml
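
The `config/schema.yaml` file listed above drives data validation. As a rough illustration of what a schema check can look like, the sketch below compares one record against an expected set of columns and types. The column names and the in-code `SCHEMA` dict are assumptions made for the example; the real schema lives in the YAML file and may be structured differently.

```python
# Hypothetical schema for illustration; the real one is config/schema.yaml.
SCHEMA = {"Age": int, "Annual_Premium": float, "Vehicle_Damage": str}


def validate_columns(schema, row):
    """Return a list of problems found when checking one record against a schema."""
    problems = []
    for column, expected_type in schema.items():
        if column not in row:
            problems.append(f"missing column: {column}")
        elif not isinstance(row[column], expected_type):
            problems.append(
                f"wrong type for {column}: expected {expected_type.__name__}"
            )
    # Columns present in the data but absent from the schema are also reported.
    for column in row:
        if column not in schema:
            problems.append(f"unexpected column: {column}")
    return problems


good = {"Age": 44, "Annual_Premium": 40454.0, "Vehicle_Damage": "Yes"}
bad = {"Age": "44", "Annual_Premium": 1.0}
print(validate_columns(SCHEMA, good))
print(validate_columns(SCHEMA, bad))
```
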
Clone the repository:

    git clone https://github.com/<your-username>/vehicle-insurance-mlops.git
    cd vehicle-insurance-mlops

Create the Conda environment and install dependencies:

    conda create -n vehicle python=3.10 -y
    conda activate vehicle
    pip install -r requirements.txt

Set the environment variables.

Bash:

    export MONGODB_URL="mongodb+srv://<username>:<password>@cluster.mongodb.net/"
    export AWS_ACCESS_KEY_ID="your-access-key"
    export AWS_SECRET_ACCESS_KEY="your-secret-key"

PowerShell:

    $env:MONGODB_URL = "mongodb+srv://<username>:<password>@cluster.mongodb.net/"
    $env:AWS_ACCESS_KEY_ID = "your-access-key"
    $env:AWS_SECRET_ACCESS_KEY = "your-secret-key"

Run the pipeline test runner:

    python demo.py

Start the application:

    python app.py

MongoDB Atlas setup:
- Create a free M0 cluster
- Add `0.0.0.0/0` to Network Access for broad connectivity
- Push data to the cluster using `notebook/mongoDB_demo.ipynb`
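
Connection problems with the cluster often come down to an unset variable. The sketch below checks that the three environment variables exported during setup are present before anything tries to connect. The helper name `missing_env_vars` is hypothetical and not part of the project.

```python
import os

# Variable names taken from the setup commands in this README.
REQUIRED_VARS = ("MONGODB_URL", "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_env_vars(environ=None):
    """Return the required variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing environment variables: " + ", ".join(missing))
else:
    print("All required environment variables are set.")
```

Passing a plain dict as `environ` makes the check easy to exercise without touching the real environment.
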

AWS setup:
- IAM: Create a user with `AdministratorAccess` and generate access keys
- S3: Create a bucket named `my-model-mlopsproj` (region: `us-east-1`) for model registry
- ECR: Create a repository named `vehicleproj` to store Docker images
- EC2: Launch an Ubuntu 24.04 `t2.medium` instance with 30 GB storage; expose port `5080`

The pipeline is triggered on every push to the main branch via GitHub Actions.
Workflow:
- Build Docker image
- Push to AWS ECR
- Pull and run on self-hosted EC2 runner
Required GitHub Secrets:
| Secret | Description |
|---|---|
| `AWS_ACCESS_KEY_ID` | AWS IAM access key |
| `AWS_SECRET_ACCESS_KEY` | AWS IAM secret key |
| `AWS_DEFAULT_REGION` | `us-east-1` |
| `ECR_REPO` | ECR repository URI |
| Route | Description |
|---|---|
| `/` | Prediction UI |
| `/training` | Trigger model training |
App is accessible at: http://<EC2-PUBLIC-IP>:5080
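
For scripted checks against a deployed instance, the route URLs can be built and requested from Python. This is a sketch only: `route_url` and `fetch_status` are hypothetical helpers, the host shown is a documentation placeholder address, and it is an assumption that the routes respond to plain GET requests.

```python
from urllib.parse import urlunsplit
from urllib.request import urlopen

APP_PORT = 5080  # port exposed on the EC2 instance, per the setup notes


def route_url(host, route="/", port=APP_PORT):
    """Build the URL for one of the app's routes."""
    if not route.startswith("/"):
        route = "/" + route
    return urlunsplit(("http", f"{host}:{port}", route, "", ""))


def fetch_status(url, timeout=10.0):
    """Send a GET request and return the HTTP status code.

    urlopen raises an exception for network failures and 4xx/5xx responses.
    """
    with urlopen(url, timeout=timeout) as response:
        return response.status


# Placeholder host; substitute the instance's public IP.
print(route_url("203.0.113.10", "/training"))
```
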
| Component | Responsibility |
|---|---|
| Data Ingestion | Fetches data from MongoDB, splits into train/test sets |
| Data Validation | Schema checks against schema.yaml |
| Data Transformation | Feature engineering, preprocessing |
| Model Trainer | Trains and evaluates a candidate model |
| Model Evaluation | Compares the candidate against the current production model (threshold: 0.02) |
| Model Pusher | Pushes best model to AWS S3 registry |
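
One plausible reading of the 0.02 threshold in the table above is that a candidate model is promoted only when its score beats the production model's score by more than that margin. The sketch below encodes that reading. The function name, the choice of score, and the exact comparison are assumptions, not the project's confirmed logic.

```python
IMPROVEMENT_THRESHOLD = 0.02  # value from the component table


def should_promote(candidate_score, production_score=None,
                   threshold=IMPROVEMENT_THRESHOLD):
    """Decide whether the candidate model should replace the production model.

    Assumes higher scores are better and that a missing production model
    (production_score is None) means the candidate is accepted outright.
    """
    if production_score is None:
        return True
    return (candidate_score - production_score) > threshold


print(should_promote(0.90, None))
print(should_promote(0.90, 0.85))
print(should_promote(0.86, 0.85))
```

Requiring a margin rather than any improvement at all avoids replacing the production model over differences that are likely noise.
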
This project is for educational and portfolio purposes.