
prabhasteja007/Patient_Churn_Predictor

Patient Churn Predictor

An end-to-end machine learning system for predicting patient churn risk in healthcare settings — enabling proactive retention before disengagement occurs.

Live App → patientchurnpredictor.streamlit.app

[Screenshot: Patient Churn Predictor Dashboard]


What It Does

Healthcare providers lose patients silently — missed appointments escalate into full disengagement. This system scores individual patients on their likelihood to churn using behavioral, clinical, financial, and satisfaction signals, then surfaces actionable retention recommendations.

Input a patient profile → get a real-time churn probability + risk tier + intervention guidance.
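The step from probability to risk tier and guidance can be sketched as below. The cutoffs (0.33 / 0.66) and the guidance strings are hypothetical; this README does not state the values the app actually uses.

```python
# Hypothetical tier cutoffs: the app's real boundaries are not documented here.
LOW_CUTOFF = 0.33
HIGH_CUTOFF = 0.66

# Hypothetical guidance text keyed by tier, for illustration only.
GUIDANCE = {
    "Low": "Routine follow-up; no extra outreach needed.",
    "Medium": "Send appointment reminders and a satisfaction check-in.",
    "High": "Prioritize personal outreach from care coordination staff.",
}


def risk_tier(churn_probability: float,
              low_cutoff: float = LOW_CUTOFF,
              high_cutoff: float = HIGH_CUTOFF) -> str:
    """Map a churn probability in [0, 1] to a Low / Medium / High tier."""
    if not 0.0 <= churn_probability <= 1.0:
        raise ValueError(f"probability must be in [0, 1], got {churn_probability}")
    if churn_probability < low_cutoff:
        return "Low"
    if churn_probability < high_cutoff:
        return "Medium"
    return "High"


def assess(churn_probability: float) -> dict:
    """Bundle probability, tier, and guidance the way a results panel might show them."""
    tier = risk_tier(churn_probability)
    return {"probability": churn_probability, "tier": tier, "guidance": GUIDANCE[tier]}


print(assess(0.72))
```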


Features

  • Risk Assessment — Real-time churn probability score with Low / Medium / High risk classification
  • Engineered Metrics — Engagement Score, Cost-Per-Visit, Satisfaction Average, and Visit Frequency computed from raw inputs
  • Detailed Analysis — Feature-level breakdown showing which factors are driving risk for each patient
  • Intervention Guide — Actionable retention recommendations based on risk profile
  • Batch Prediction — Upload a CSV of patients and get churn scores across a full cohort
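Batch prediction amounts to applying the same scoring function to every row of an uploaded CSV and ranking the cohort. The sketch below uses only the standard library. The column names and `toy_score` are hypothetical stand-ins for the trained model and its real input schema.

```python
import csv
import io
from typing import Callable, Dict, List


def score_cohort(csv_text: str,
                 score_fn: Callable[[Dict[str, str]], float]) -> List[Dict[str, object]]:
    """Score each patient row, then return rows sorted by churn probability (highest first)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    scored = []
    for row in reader:
        prob = score_fn(row)
        scored.append({**row, "churn_probability": round(prob, 3)})
    scored.sort(key=lambda r: r["churn_probability"], reverse=True)
    return scored


# Hypothetical stand-in for the trained model: risk rises with missed appointments.
def toy_score(row: Dict[str, str]) -> float:
    missed = int(row["missed_appointments"])
    return min(1.0, 0.1 + 0.15 * missed)


data = """patient_id,missed_appointments
P001,0
P002,4
P003,2
"""

for r in score_cohort(data, toy_score):
    print(r["patient_id"], r["churn_probability"])
```

Sorting by probability puts the patients who most need outreach at the top of the output, which is usually how a retention team works through a list.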

Input Features

| Category | Features |
|---|---|
| Demographics | Age, Gender, State |
| Clinical | Specialty, Insurance Type, Tenure (months), Referrals Made |
| Engagement | Visits Last Year, Missed Appointments, Days Since Last Visit, Patient Portal Usage |
| Satisfaction | Overall, Wait Time, Staff Satisfaction (1–5 scale) |
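Before scoring, it helps to reject profiles that fall outside the expected ranges, for example satisfaction scores outside the 1–5 scale. Below is a minimal validation sketch that covers only the numeric fields from the table above. The field names are hypothetical, and the categorical fields (Gender, State, Specialty, Insurance Type) are left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class PatientProfile:
    # Hypothetical field names for the numeric inputs listed above.
    age: int
    tenure_months: int
    visits_last_year: int
    missed_appointments: int
    days_since_last_visit: int
    overall_satisfaction: int
    wait_time_satisfaction: int
    staff_satisfaction: int

    def __post_init__(self) -> None:
        # Counts and durations cannot be negative.
        for name in ("age", "tenure_months", "visits_last_year",
                     "missed_appointments", "days_since_last_visit"):
            if getattr(self, name) < 0:
                raise ValueError(f"{name} must be non-negative")
        # Satisfaction inputs use a 1-5 scale.
        for name in ("overall_satisfaction", "wait_time_satisfaction",
                     "staff_satisfaction"):
            value = getattr(self, name)
            if not 1 <= value <= 5:
                raise ValueError(f"{name} must be on the 1-5 scale, got {value}")


profile = PatientProfile(45, 24, 6, 1, 30, 4, 3, 5)
print(profile)
```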

Model

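| Detail | Value |
|---|---|
| Algorithm | Random Forest (300 estimators) |
| Baseline comparison | Logistic Regression, XGBoost (500 estimators) |
| Best ROC-AUC | 0.647 (Random Forest) |
| Training records | 2,000 |
| Train/test split | Stratified by churn label (preserves the class ratio in both sets; does not by itself rebalance classes) |
| Evaluation metrics | ROC-AUC, MAE, Precision, Recall, F1 |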
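A stratified split shuffles and splits each class separately, so the test set keeps the same churn rate as the full dataset. In scikit-learn this is `train_test_split(X, y, test_size=..., stratify=y)`. The standard-library sketch below shows the idea. The 20% churn rate and the `seed` are illustrative, not the project's data.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_size=0.2, seed=42):
    """Return (train_idx, test_idx), splitting each class in the same proportion."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_size)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


labels = [1] * 20 + [0] * 80  # illustrative: 20% churners
train, test = stratified_split(labels)
print(len(test), sum(labels[i] for i in test))  # 20 4
```

The test set keeps the 20% churn rate (4 of 20). Note that the classes stay imbalanced. Stratification only makes the evaluation representative.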

Note on model performance: The dataset used is synthetic, generated to simulate realistic healthcare churn patterns. A ROC-AUC of 0.647 reflects the signal available in this simulated data. Real-world performance would depend on the richness of EHR and claims data available.
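To interpret the 0.647 figure, ROC-AUC equals the probability that a randomly chosen churner receives a higher score than a randomly chosen non-churner (ties count half). A score of 0.5 is chance, so 0.647 means the model orders roughly 65% of such pairs correctly. The pairwise sketch below computes it directly on made-up scores. For real evaluation, `sklearn.metrics.roc_auc_score` is the standard tool.

```python
def roc_auc(y_true, y_score):
    """ROC-AUC via pairwise comparison of positive vs negative scores (O(n*m))."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # churner ranked above non-churner
            elif p == n:
                wins += 0.5   # tie counts as half a correct ordering
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 0, 0, 0]                 # made-up labels
y_score = [0.8, 0.4, 0.5, 0.3, 0.2]      # made-up predicted probabilities
print(roc_auc(y_true, y_score))  # 5 of 6 pairs ordered correctly, about 0.833
```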

Engineered features (not present in raw data; derived during preprocessing, except Risk Score, which is derived from the predicted churn probability):

  • Engagement Score — composite of visit frequency, portal usage, and appointment adherence
  • Cost-Per-Visit — total cost normalized by visit count
  • Satisfaction Average — mean across three satisfaction dimensions
  • Risk Score — 3-tier label (Low / Medium / High) derived from churn probability
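The sketch below computes the input-derived features. Cost-Per-Visit and Satisfaction Average follow the descriptions above. The Engagement Score weights, the 0/1 portal flag, the `total_cost` field, and the visits-per-month definition of visit frequency are all hypothetical, because the README does not publish the actual formulas.

```python
from statistics import mean

# Hypothetical weights: the project's actual composite formula is not documented.
ENGAGEMENT_WEIGHTS = {"visits": 0.4, "portal": 0.3, "adherence": 0.3}


def engineer_features(visits_last_year: int, missed_appointments: int,
                      uses_portal: bool, total_cost: float,
                      satisfaction_scores: tuple) -> dict:
    scheduled = visits_last_year + missed_appointments
    # Appointment adherence: share of scheduled visits actually attended.
    adherence = visits_last_year / scheduled if scheduled else 0.0
    visit_frequency = visits_last_year / 12  # visits per month (assumed definition)
    engagement = (
        ENGAGEMENT_WEIGHTS["visits"] * min(visit_frequency, 1.0)
        + ENGAGEMENT_WEIGHTS["portal"] * (1.0 if uses_portal else 0.0)
        + ENGAGEMENT_WEIGHTS["adherence"] * adherence
    )
    return {
        "visit_frequency": visit_frequency,
        # Guard against division by zero for patients with no visits.
        "cost_per_visit": total_cost / max(visits_last_year, 1),
        # Mean of overall, wait-time, and staff satisfaction.
        "satisfaction_avg": mean(satisfaction_scores),
        "engagement_score": round(engagement, 3),
    }


features = engineer_features(6, 2, True, 1200.0, (4, 3, 5))
print(features["engagement_score"], features["cost_per_visit"])  # 0.725 200.0
```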

Tech Stack

| Layer | Tools |
|---|---|
| ML Pipeline | Python, Scikit-learn, XGBoost, Pandas, NumPy |
| Model Serialization | pickle (.pkl) |
| Web App | Streamlit |
| Visualization | Plotly, Matplotlib, Seaborn |
| Deployment | Streamlit Community Cloud |

Repository Structure

patient-churn-ml/
│
├── app.py                        # Streamlit application (main entry point)
├── churn_analysis.py             # ML pipeline: preprocessing, training, evaluation
├── eda.py                        # Exploratory data analysis
├── requirements.txt
│
├── model/
│   ├── churn_model.pkl           # Trained Random Forest model
│   ├── model_columns.pkl         # Feature schema for inference
│   └── best_threshold.pkl        # Optimized classification threshold
│
├── data/
│   ├── patient_churn_main.csv    # Training dataset (synthetic)
│   ├── patient_churn_validation.csv
│   └── patient_conversion_marketing.csv
│
└── docs/
    ├── report.md
    └── dashboard_screenshot.png
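The three files in `model/` work together at inference time. The fitted model produces a probability. The saved column list fixes the feature order and fills any one-hot columns missing from a single input. The saved threshold turns the probability into a churn flag. The sketch below shows one way to round-trip them with `pickle`. The dump/load order, the column names, and the 0.42 threshold are hypothetical, and a plain dict stands in for the fitted model.

```python
import pickle
import tempfile
from pathlib import Path

# File names mirror the repo's model/ folder; everything else here is illustrative.
ARTIFACT_FILES = ("churn_model.pkl", "model_columns.pkl", "best_threshold.pkl")


def save_artifacts(model_dir: Path, model, columns: list, threshold: float) -> None:
    """Pickle the model, feature schema, and decision threshold into model_dir."""
    model_dir.mkdir(parents=True, exist_ok=True)
    for name, obj in zip(ARTIFACT_FILES, (model, columns, threshold)):
        with open(model_dir / name, "wb") as fh:
            pickle.dump(obj, fh)


def load_artifacts(model_dir: Path) -> tuple:
    """Load all three artifacts. Only unpickle trusted files: pickle can run code."""
    loaded = []
    for name in ARTIFACT_FILES:
        with open(model_dir / name, "rb") as fh:
            loaded.append(pickle.load(fh))
    return tuple(loaded)


def align_to_schema(features: dict, columns: list) -> list:
    """Order inputs by the training schema; absent (e.g. one-hot) columns default to 0."""
    return [features.get(col, 0) for col in columns]


def is_flagged(churn_probability: float, threshold: float) -> bool:
    """Flag a patient when the predicted probability meets the tuned threshold."""
    return churn_probability >= threshold


with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp) / "model"
    # A plain dict stands in for the fitted Random Forest in this demo.
    save_artifacts(model_dir, {"stand_in": "model"},
                   ["age", "gender_Male", "missed_appointments"], 0.42)
    model, columns, threshold = load_artifacts(model_dir)
    row = align_to_schema({"age": 58, "missed_appointments": 3}, columns)

print(row, threshold, is_flagged(0.5, threshold))  # [58, 0, 3] 0.42 True
```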

Run Locally

git clone https://github.com/prabhasteja007/patient-churn-ml.git
cd patient-churn-ml
pip install -r requirements.txt
streamlit run app.py

Skills Demonstrated

  • End-to-end ML pipeline (EDA → feature engineering → modeling → deployment)
  • Feature engineering from domain knowledge (healthcare engagement signals)
  • Multi-model comparison with evaluation on imbalanced data
  • Production-style Streamlit deployment with real-time and batch inference
  • Preprocessing pipelines: imputation, encoding, IQR-based outlier removal
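Two of the preprocessing steps listed above, median imputation and IQR-based outlier removal, can be sketched with the standard library. The fence multiplier `k=1.5` is the conventional Tukey default and is an assumption; the project's actual setting is not stated. In a DataFrame pipeline you would drop whole rows rather than filter a single list.

```python
from statistics import median, quantiles


def impute_median(values):
    """Replace missing values (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def iqr_filter(values, k=1.5):
    """Keep values inside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles (default 'exclusive' method)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if lo <= v <= hi]


days_since_last_visit = [10, 12, None, 15, 11, 14, 13, 400]  # made-up values
filled = impute_median(days_since_last_visit)
print(iqr_filter(filled))  # [10, 12, 13, 15, 11, 14, 13]
```

Imputing before filtering keeps the row count stable for the IQR step. The extreme value (400 days) falls outside the upper fence and is dropped.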
