Predict whether a customer will churn (leave a telecom company) using logistic regression.
This project includes data cleaning, exploratory data analysis (EDA), feature engineering, model building, and actionable business insights.
- Source: Kaggle Telco Customer Churn
- Number of records: 7,043 customers
- Features:
  - Customer demographic info (gender, senior citizen, partner, dependents)
  - Service details (internet service, phone service, streaming)
  - Billing info (monthly charges, total charges, payment method, contract type)
- Target: `Churn` (Yes/No)

Data Loading & Cleaning
- Handled missing and inconsistent values
- Converted `TotalCharges` to numeric
- Cleaned categorical features
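
A minimal pandas sketch of the cleaning step, assuming the Kaggle CSV has already been loaded with `pd.read_csv`. It uses the dataset's `customerID`, `tenure`, `Contract`, `TotalCharges`, and `Churn` columns. The `clean_telco` helper, the fill-with-zero rule for missing `TotalCharges`, and the sample rows are illustrative assumptions, not the project's exact code.

```python
import pandas as pd


def clean_telco(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of a raw Telco churn DataFrame (illustrative sketch)."""
    df = df.copy()
    # TotalCharges is read as text; blank entries become NaN after coercion.
    df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")
    # Assumption: a missing TotalCharges means the customer has not been billed yet,
    # so fill it with 0.0 instead of dropping the row.
    df["TotalCharges"] = df["TotalCharges"].fillna(0.0)
    # Strip stray whitespace from every non-numeric column so "Yes " matches "Yes".
    for col in df.columns:
        if not pd.api.types.is_numeric_dtype(df[col]):
            df[col] = df[col].astype(str).str.strip()
    # customerID is a unique identifier with no predictive signal.
    return df.drop(columns=["customerID"], errors="ignore")


# Tiny hand-made frame standing in for the real CSV (values are made up).
sample = pd.DataFrame({
    "customerID": ["0001-A", "0002-B", "0003-C"],
    "tenure": [0, 12, 40],
    "Contract": ["Two year ", "Month-to-month", "One year"],
    "TotalCharges": [" ", "845.5", "3046.05"],
    "Churn": ["No", "Yes", "No"],
})
cleaned = clean_telco(sample)
print(cleaned["TotalCharges"].tolist())  # [0.0, 845.5, 3046.05]
```

Coercing with `errors="coerce"` surfaces the non-numeric entries as NaN so they can be inspected before deciding whether to fill or drop them.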

Exploratory Data Analysis (EDA)
- Visualized churn distribution
- Studied relationships between churn and key features:
  - Tenure
  - Contract type
  - Monthly charges
  - Services subscribed
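
The relationships above mostly come down to comparing churn rates across groups. Below is a sketch under stated assumptions: the `churn_rate_by` and `add_tenure_band` helpers and the tenure band edges are hypothetical choices, and the rows are made up. The column names follow the Kaggle dataset.

```python
import pandas as pd


def churn_rate_by(df: pd.DataFrame, column: str) -> pd.Series:
    """Share of customers with Churn == "Yes" within each level of `column`."""
    is_churn = df["Churn"].eq("Yes")
    # observed=True skips empty categories (e.g. an unused tenure band).
    rates = is_churn.groupby(df[column], observed=True).mean()
    return rates.sort_values(ascending=False)


def add_tenure_band(df: pd.DataFrame) -> pd.DataFrame:
    """Bucket tenure (months) into bands; the edges are an illustrative choice."""
    out = df.copy()
    out["TenureBand"] = pd.cut(
        out["tenure"],
        bins=[-1, 12, 24, 48, 72],
        labels=["0-12", "13-24", "25-48", "49-72"],
    )
    return out


# Made-up rows for demonstration only.
sample = pd.DataFrame({
    "tenure": [1, 5, 30, 60, 2, 70],
    "Contract": ["Month-to-month", "Month-to-month", "One year",
                 "Two year", "Month-to-month", "Two year"],
    "Churn": ["Yes", "Yes", "No", "No", "No", "No"],
})
print(churn_rate_by(sample, "Contract"))
print(churn_rate_by(add_tenure_band(sample), "TenureBand"))
```

Binning tenure turns a continuous feature into a small table that reads directly alongside the churn-rate-by-contract table.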

Feature Encoding & Engineering
- Converted categorical features to numeric (one-hot encoding and 0/1 mapping)
- Encoded the target variable as `ChurnFlag`
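
A possible version of the encoding step with pandas. Yes/No columns and the target get a 0/1 mapping, and the remaining text columns are one-hot encoded with `pd.get_dummies`. `BINARY_COLS`, `encode_features`, and the use of `drop_first=True` are assumptions made for this sketch, not confirmed details of the project.

```python
import pandas as pd

# Illustrative subset of Yes/No columns in the Kaggle data; adjust to the real list.
BINARY_COLS = ["Partner", "Dependents", "PhoneService", "PaperlessBilling"]
YES_NO = {"Yes": 1, "No": 0}


def encode_features(df: pd.DataFrame) -> pd.DataFrame:
    """0/1-map Yes/No columns and the target, one-hot encode the rest."""
    out = df.copy()
    # Target: ChurnFlag = 1 if the customer churned, else 0.
    out["ChurnFlag"] = out.pop("Churn").map(YES_NO)
    for col in BINARY_COLS:
        if col in out.columns:
            out[col] = out[col].map(YES_NO)
    # Every remaining text column (e.g. Contract, PaymentMethod) gets one-hot encoded.
    # drop_first=True drops one redundant dummy per feature (an added choice).
    text_cols = [c for c in out.columns if not pd.api.types.is_numeric_dtype(out[c])]
    return pd.get_dummies(out, columns=text_cols, drop_first=True, dtype=int)


sample = pd.DataFrame({
    "Partner": ["Yes", "No"],
    "Contract": ["Month-to-month", "Two year"],
    "MonthlyCharges": [70.35, 20.0],
    "Churn": ["Yes", "No"],
})
print(encode_features(sample).columns.tolist())
# ['Partner', 'MonthlyCharges', 'ChurnFlag', 'Contract_Two year']
```

Dropping one dummy per feature avoids perfectly collinear columns. With scikit-learn's default L2 penalty, keeping all dummies also works, but the coefficients are then harder to read against a baseline category.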

Model Building
- Logistic Regression
- Trained on 80% of the data, tested on the remaining 20%
- Evaluated using:
  - Accuracy (~85%)
  - Confusion matrix
  - ROC-AUC
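
The training and evaluation steps can be sketched with scikit-learn as follows. The 80/20 split and the three metrics come from the list above. The `train_and_evaluate` helper, the stratified split, the `StandardScaler`, and `random_state=42` are assumptions added here. The demo runs on synthetic data, so its scores say nothing about the ~85% accuracy reported for the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def train_and_evaluate(X: pd.DataFrame, y: pd.Series, seed: int = 42):
    # 80/20 split; stratify keeps the churn ratio the same in both parts.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    # Scaling is an added assumption: it helps the solver converge when tenure
    # and charges sit on very different scales from the 0/1 dummy columns.
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(X_train, y_train)

    y_pred = model.predict(X_test)
    y_prob = model.predict_proba(X_test)[:, 1]  # estimated P(churn = 1)
    metrics = {
        "accuracy": accuracy_score(y_test, y_pred),
        "confusion_matrix": confusion_matrix(y_test, y_pred),
        "roc_auc": roc_auc_score(y_test, y_prob),  # needs probabilities, not labels
    }
    return model, metrics


# Synthetic demo data (made up): shorter tenure makes churn more likely.
rng = np.random.default_rng(0)
n = 500
tenure = rng.integers(0, 73, n)
charges = rng.uniform(18, 120, n)
X_demo = pd.DataFrame({"tenure": tenure, "MonthlyCharges": charges})
p_churn = 1 / (1 + np.exp(-(1.5 - 0.08 * tenure + 0.01 * charges)))
y_demo = pd.Series((rng.uniform(size=n) < p_churn).astype(int))

model, metrics = train_and_evaluate(X_demo, y_demo)
print(f"accuracy={metrics['accuracy']:.3f}, ROC-AUC={metrics['roc_auc']:.3f}")
print(metrics["confusion_matrix"])
```

ROC-AUC is computed from predicted probabilities, so it measures ranking quality independently of the 0.5 cut-off that accuracy and the confusion matrix depend on.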

Insights & Recommendations
- Customers on month-to-month contracts are the most likely to churn
- Higher monthly charges slightly increase churn probability
- Long-tenure customers are less likely to churn
- Actionable strategies:
  - Encourage month-to-month customers to switch to longer contracts
  - Offer retention incentives to high-paying customers
  - Promote value-added services (Tech Support, Online Security)
  - Onboard new customers proactively

Visualizations
- Churn Distribution
- Tenure vs. Churn
- Monthly Charges vs. Churn
- Contract Type vs. Churn
- Feature Importance (Logistic Regression Coefficients)
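
Feature importance for logistic regression is usually read off the fitted coefficients. This sketch ranks them by absolute size and adds odds ratios. The `coefficient_table` helper and the toy data are hypothetical. `coef_` is the standard scikit-learn attribute for a fitted `LogisticRegression`.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression


def coefficient_table(model: LogisticRegression, feature_names) -> pd.DataFrame:
    """Rank features by the size of their logistic-regression coefficients."""
    coefs = pd.Series(model.coef_[0], index=list(feature_names))
    table = pd.DataFrame({
        "coef": coefs,  # change in log-odds of churn per one-unit increase
        "odds_ratio": np.exp(coefs),  # multiplicative change in churn odds
    })
    # Sort by absolute value so strong negative effects rank alongside positive ones.
    return table.loc[coefs.abs().sort_values(ascending=False).index]


# Made-up encoded features; in the project these would be the training columns.
X = pd.DataFrame({
    "tenure": [1, 3, 2, 45, 60, 50, 5, 70],
    "Contract_Two year": [0, 0, 0, 1, 1, 1, 0, 1],
    "MonthlyCharges": [90.0, 85.5, 70.2, 25.0, 60.3, 30.1, 99.9, 20.4],
})
y = [1, 1, 0, 0, 0, 0, 1, 0]
clf = LogisticRegression(max_iter=1000).fit(X, y)
print(coefficient_table(clf, X.columns))
```

If the model is wrapped in a scikit-learn `Pipeline`, pass its final step (for example `pipeline[-1]`). With a scaler in front, the coefficients then describe one-standard-deviation changes, which makes features on different scales comparable.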

Tip: Save plots in the `outputs/` folder and reference them in the README with