An end-to-end machine learning system for predicting patient churn risk in healthcare settings — enabling proactive retention before disengagement occurs.
Live App → patientchurnpredictor.streamlit.app
Healthcare providers lose patients silently — missed appointments escalate into full disengagement. This system scores individual patients on their likelihood to churn using behavioral, clinical, financial, and satisfaction signals, then surfaces actionable retention recommendations.
Input a patient profile → get a real-time churn probability + risk tier + intervention guidance.
- Risk Assessment — Real-time churn probability score with Low / Medium / High risk classification
- Engineered Metrics — Engagement Score, Cost-Per-Visit, Satisfaction Average, and Visit Frequency computed from raw inputs
- Detailed Analysis — Feature-level breakdown showing which factors are driving risk for each patient
- Intervention Guide — Actionable retention recommendations based on risk profile
- Batch Prediction — Upload a CSV of patients and get churn scores across a full cohort
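The Low / Medium / High classification can be pictured as a simple binning of the churn probability. The sketch below is illustrative only: the function name `risk_tier` and the cut-offs 0.33 and 0.66 are assumptions, not the values used by the app.

```python
def risk_tier(probability: float, low_cut: float = 0.33, high_cut: float = 0.66) -> str:
    """Map a churn probability in [0, 1] to a Low / Medium / High tier.

    The default cut-offs are illustrative assumptions, not the app's actual values.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if probability < low_cut:
        return "Low"
    if probability < high_cut:
        return "Medium"
    return "High"


tiers = [risk_tier(p) for p in (0.10, 0.50, 0.90)]
print(tiers)  # ['Low', 'Medium', 'High']
```

In practice the cut-offs would be tuned against the cost of an unnecessary outreach versus a missed churner.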
| Category | Features |
|---|---|
| Demographics | Age, Gender, State |
| Clinical | Specialty, Insurance Type, Tenure (months), Referrals Made |
| Engagement | Visits Last Year, Missed Appointments, Days Since Last Visit, Patient Portal Usage |
| Satisfaction | Overall, Wait Time, Staff Satisfaction (1–5 scale) |
| Detail | Value |
|---|---|
| Algorithm | Random Forest (300 estimators) |
| Baseline comparison | Logistic Regression, XGBoost (500 estimators) |
| Best ROC-AUC | 0.647 (Random Forest) |
| Training records | 2,000 |
| Class imbalance handling | Stratified split (preserves the churn ratio in each split; does not resample) |
| Evaluation metrics | ROC-AUC, MAE, Precision, Recall, F1 |
Note on model performance: The dataset is synthetic, generated to simulate realistic healthcare churn patterns. A ROC-AUC of 0.647 (0.5 is chance level) reflects the limited signal available in this simulated data. Real-world performance would depend on the richness of the EHR and claims data available.
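For readers who want to see the shape of the comparison, here is a minimal sketch of a stratified split followed by ROC-AUC scoring of two of the listed models. It is not the code in `churn_analysis.py`: the data comes from scikit-learn's `make_classification` as a stand-in for the synthetic patient dataset, XGBoost is left out to keep dependencies small, and every parameter other than the 2,000 records and 300 estimators is an assumption.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data: 2,000 rows with an imbalanced, noisy target.
# Feature counts, class weights, and label noise are assumptions for illustration.
X, y = make_classification(
    n_samples=2000, n_features=15, n_informative=5,
    weights=[0.75, 0.25], flip_y=0.2, random_state=42,
)

# Stratifying on y keeps the churn rate the same in the train and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

models = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "Random Forest": RandomForestClassifier(n_estimators=300, random_state=42),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    churn_proba = model.predict_proba(X_test)[:, 1]  # probability of the positive class
    results[name] = roc_auc_score(y_test, churn_proba)
    print(f"{name}: ROC-AUC = {results[name]:.3f}")
```

ROC-AUC is computed from predicted probabilities rather than hard labels, so it does not depend on the classification threshold.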
Engineered features (not present in raw data — derived during preprocessing):
- Engagement Score: composite of visit frequency, portal usage, and appointment adherence
- Cost-Per-Visit: total cost normalized by visit count
- Satisfaction Average: mean across three satisfaction dimensions
- Risk Score: 3-tier label (Low / Medium / High) derived from churn probability
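The derived columns above could be computed along these lines. This is a sketch under stated assumptions, not the repository's preprocessing code: all column names (`visits_last_year`, `total_cost`, `portal_usage`, and so on) are hypothetical, portal usage is assumed to be a 0/1 flag, and the equal-weight Engagement Score formula is invented for illustration since the README does not specify the weights.

```python
import pandas as pd


def add_engineered_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with illustrative engineered columns added."""
    out = df.copy()

    # Cost-Per-Visit: total cost divided by visit count; zero visits give NaN, not inf.
    visits = out["visits_last_year"].where(out["visits_last_year"] > 0)
    out["cost_per_visit"] = out["total_cost"] / visits

    # Satisfaction Average: mean of the three 1-5 satisfaction ratings.
    sat_cols = ["overall_satisfaction", "wait_time_satisfaction", "staff_satisfaction"]
    out["satisfaction_avg"] = out[sat_cols].mean(axis=1)

    # Engagement Score (assumed formula): equal-weight mean of three 0-1 components.
    visit_freq = (out["visits_last_year"] / 12).clip(upper=1.0)  # visits per month, capped at 1
    scheduled = out["visits_last_year"] + out["missed_appointments"]
    adherence = (out["visits_last_year"] / scheduled.where(scheduled > 0)).fillna(0.0)
    portal = out["portal_usage"].astype(float)  # assumed 0/1 flag
    out["engagement_score"] = (visit_freq + adherence + portal) / 3
    return out


patients = pd.DataFrame({
    "visits_last_year": [6, 0],
    "missed_appointments": [2, 3],
    "portal_usage": [1, 0],
    "total_cost": [1200.0, 0.0],
    "overall_satisfaction": [4, 2],
    "wait_time_satisfaction": [3, 2],
    "staff_satisfaction": [5, 3],
})
features = add_engineered_features(patients)
print(features[["cost_per_visit", "satisfaction_avg", "engagement_score"]])
```

Leaving Cost-Per-Visit as NaN for patients with no visits defers the decision to the imputation step instead of producing an infinite value.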
| Layer | Tools |
|---|---|
| ML Pipeline | Python, Scikit-learn, XGBoost, Pandas, NumPy |
| Model Serialization | pickle (.pkl) |
| Web App | Streamlit |
| Visualization | Plotly, Matplotlib, Seaborn |
| Deployment | Streamlit Community Cloud |
patient-churn-ml/
│
├── app.py                  # Streamlit application (main entry point)
├── churn_analysis.py       # ML pipeline: preprocessing, training, evaluation
├── eda.py                  # Exploratory data analysis
├── requirements.txt
│
├── model/
│   ├── churn_model.pkl     # Trained Random Forest model
│   ├── model_columns.pkl   # Feature schema for inference
│   └── best_threshold.pkl  # Optimized classification threshold
│
├── data/
│   ├── patient_churn_main.csv  # Training dataset (synthetic)
│   ├── patient_churn_validation.csv
│   └── patient_conversion_marketing.csv
│
└── docs/
    ├── report.md
    └── dashboard_screenshot.png
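The three files under `model/` suggest an inference flow of load, align, score, threshold. The sketch below shows one way that could look. It makes several assumptions that are not confirmed by the repository: that `model_columns.pkl` holds a list of column names, that `best_threshold.pkl` holds a single float, that categorical inputs were one-hot encoded with `pd.get_dummies`, and the helper names `load_pickle`, `align_features`, and `score` are hypothetical.

```python
import pickle
from pathlib import Path

import pandas as pd


def load_pickle(path: Path):
    """Load one pickled artifact. Only unpickle files from a trusted source."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


def align_features(raw: pd.DataFrame, model_columns: list[str]) -> pd.DataFrame:
    """One-hot encode raw inputs and reorder them to match the training schema."""
    encoded = pd.get_dummies(raw)
    # Training columns missing at inference are filled with 0; unknown extras are dropped.
    return encoded.reindex(columns=model_columns, fill_value=0)


def score(raw: pd.DataFrame, model_dir: Path = Path("model")) -> pd.DataFrame:
    """Attach a churn probability and a thresholded prediction to each row."""
    model = load_pickle(model_dir / "churn_model.pkl")
    model_columns = list(load_pickle(model_dir / "model_columns.pkl"))  # assumed: list of names
    threshold = float(load_pickle(model_dir / "best_threshold.pkl"))    # assumed: single float

    X = align_features(raw, model_columns)
    proba = model.predict_proba(X)[:, 1]
    return raw.assign(churn_probability=proba, predicted_churn=proba >= threshold)
```

Under the same assumptions, batch scoring a cohort would reduce to `score(pd.read_csv(path_to_cohort_csv))`, provided the CSV carries the same raw columns the model was trained on.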
git clone https://github.com/prabhasteja007/patient-churn-ml.git
cd patient-churn-ml
pip install -r requirements.txt
streamlit run app.py

- End-to-end ML pipeline (EDA → feature engineering → modeling → deployment)
- Feature engineering from domain knowledge (healthcare engagement signals)
- Multi-model comparison with evaluation on imbalanced data
- Production-style Streamlit deployment with real-time and batch inference
- Preprocessing pipelines: imputation, encoding, IQR-based outlier removal
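
As an illustration of the preprocessing steps named in the last item, here is a minimal sketch of IQR-based outlier removal followed by imputation and one-hot encoding. The function names, the 1.5 multiplier, the median/mode imputation strategy, and the example columns are all assumptions. The actual pipeline in `churn_analysis.py` may differ.

```python
import pandas as pd


def remove_iqr_outliers(df: pd.DataFrame, columns: list[str], k: float = 1.5) -> pd.DataFrame:
    """Drop rows where any listed column falls outside [Q1 - k*IQR, Q3 + k*IQR]."""
    keep = pd.Series(True, index=df.index)
    for col in columns:
        q1, q3 = df[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        # between() is False for NaN, so missing values are kept explicitly for imputation.
        keep &= df[col].between(q1 - k * iqr, q3 + k * iqr) | df[col].isna()
    return df[keep]


def impute_and_encode(df: pd.DataFrame) -> pd.DataFrame:
    """Fill numeric gaps with the median, categorical gaps with the mode, then one-hot encode."""
    out = df.copy()
    num_cols = out.select_dtypes(include="number").columns
    cat_cols = out.select_dtypes(exclude="number").columns
    out[num_cols] = out[num_cols].fillna(out[num_cols].median())
    for col in cat_cols:
        out[col] = out[col].fillna(out[col].mode().iloc[0])
    return pd.get_dummies(out, columns=list(cat_cols))


raw = pd.DataFrame({
    "days_since_last_visit": [10, 20, 30, 40, 5000, None],
    "insurance_type": ["Private", "Medicare", None, "Private", "Private", "Medicaid"],
})
clean = impute_and_encode(remove_iqr_outliers(raw, ["days_since_last_visit"]))
print(clean)
```

When used for training, the medians, modes, and IQR bounds should be computed on the training split only and then reused on validation data, so that no information leaks across the split.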
