HR Analytics: The $12.25M Hidden Cost of Employee Turnover

A data-driven analysis quantifying employee attrition and proving the ROI of retention programs. Stack: Python (pandas, scipy, scikit-learn) · SQL · Power BI

🎯 The Business Question

How much does employee turnover really cost a company — and can a retention program pay for itself?

This project analyzes the IBM HR Analytics dataset (1,470 employees, 35 features) to quantify the financial impact of attrition and identify the operational levers that drive it.

📊 Key Findings

Metric	Value	Insight
Annual Cost of Turnover	$12.25M	Hidden in HR, recruiting, and productivity budgets
Cost per Departure	$52k	~90% of annual salary (industry benchmark)
Attrition Rate	16.1%	Above the 10-15% market benchmark
High-Risk Employees	17	Identified through composite risk scoring
Retention Program ROI	819%	$200k investment → $1.84M annual savings
Payback Period	1.3 months	Faster than most marketing investments

🔥 The Three Critical Drivers (statistically validated)

1. Overtime: the 3x Multiplier

Employees working overtime are 3x more likely to quit. Despite being only 28% of the workforce, they account for 54% of all departures.

Recommended action: Strict overtime policy with quarterly review.

2. R&D + Sales = 96% of the Total Cost

Out of 3 departments, two concentrate almost all the financial damage:

R&D: $5.9M annual turnover cost
Sales: $5.87M annual turnover cost
HR: ~$0.5M

Recommended action: Focus 100% of retention programs on R&D and Sales first.

3. The Critical First Two Years

29.82% of new hires leave within their first 2 years — nearly 4x higher than employees with 10+ years (8.13%).

Recommended action: Mandatory mentoring program for the first 24 months.

💡 Three Counter-Intuitive Insights

These findings broke common assumptions and are worth highlighting:

Counter-Intuitive #1: Perfect Work-Life Balance Doesn't Retain

Employees who rate their work-life balance as "4 — excellent" have a 17.65% attrition rate — higher than those who rate it 2 or 3 (~16%).

Hypothesis: too much balance may signal under-engagement.

Counter-Intuitive #2: Salary Hikes Above 20% Don't Help Either

Low hike (<12%): 19.52% attrition (expected)
Top hike (20%+): 16.37% attrition (still high!)

Hypothesis: large raises may be "retention counter-offers" that come too late, or signal increased pressure.

Counter-Intuitive #3: HR Itself Has a Turnover Problem

The HR department has an attrition rate of 19.05% — the very team responsible for retention is failing to retain itself.

🛠️ Technical Approach

Data Pipeline

Raw CSV (Kaggle)
    ↓
Notebook 1: Data Cleaning + EDA (pandas, seaborn)
    ↓
Notebook 2: Statistical Validation (scipy)
    ↓
Notebook 3: Cost Analysis + Risk Scoring (pandas, numpy)
    ↓
Processed CSVs (fact + dim tables)
    ↓
Power BI Dashboard (5 pages, 15+ DAX measures)

Statistical Tests Applied

Mann-Whitney U: salary differences between leavers and stayers
Chi-square + Cramér's V: categorical relationships (Overtime × Attrition)
Multivariate ranking: identified top 5 predictors of turnover

Cost Model Assumptions

Based on SHRM and Gallup research, the turnover cost was estimated as 90% of annual salary, broken down into:

Recruitment (20%)
Onboarding (15%)
Productivity loss (40%)
Intangible cost (15%)

Risk Scoring (composite score)

Each active employee received a 0-10 risk score based on weighted factors:

Overtime (+2 points)
Distance >20km (+2 points)
Low satisfaction (+2 points)
Low salary (+1 point)
Tenure <3 years (+1 point)
Age <30 (+1 point)

Resulting classification:

HIGH RISK: score ≥ 6 → 17 employees identified
MEDIUM RISK: score 3-5
LOW RISK: score 0-2

📂 Project Structure

hr-analytics/
├── README.md
├── .gitignore
├── data/
│   ├── raw/                                  (not pushed — see .gitignore)
│   └── processed/
│       ├── hr_employees_processed.csv
│       ├── hr_at_risk.csv
│       ├── financial_summary.csv
│       ├── statistical_ranking.csv
│       ├── numeric_tests.csv
│       └── categorical_tests.csv
├── notebooks/
│   ├── 01_data_cleaning_eda.ipynb
│   ├── 02_statistical_analysis.ipynb
│   └── 03_cost_analysis.ipynb
├── sql/
│   └── hr_analysis.sql
├── dashboard/
│   └── hr_dashboard.pbix
└── docs/
    ├── 01_executive_summary.png
    ├── 02_demographics.png
    ├── 03_compensation.png
    ├── 04_satisfaction.png
    ├── 05_cost_action.png
    └── hr_dashboard.pdf

🚀 How to Reproduce

Prerequisites

Python 3.9+
Power BI Desktop (free)
Jupyter Notebook

Setup

git clone https://github.com/igorrocha-tech/hr-analytics.git
cd hr-analytics

# Install Python dependencies
pip install pandas numpy matplotlib seaborn scipy scikit-learn jupyter

# Download the dataset from Kaggle (link in /data/raw/README.md)
# Place WA_Fn-UseC_-HR-Employee-Attrition.csv in data/raw/

# Run the notebooks in order
jupyter notebook

Run order:

01_data_cleaning_eda.ipynb → generates hr_employees_processed.csv
02_statistical_analysis.ipynb → generates statistical results
03_cost_analysis.ipynb → generates hr_at_risk.csv + financial_summary.csv
Open dashboard/hr_dashboard.pbix in Power BI Desktop

🖼️ Dashboard Preview

Page 1 — Executive Summary

The hero number ($12.25M) reframes turnover from "HR problem" to "CFO problem."

Page 2 — Demographics & Tenure

Reveals the critical 0-2 year window, Single status as risk factor, and HR's own retention issue.

Page 3 — Compensation & Career

The Salary × Tenure scatter shows the "danger zone": low salary + short tenure concentrates attrition.

Page 4 — Satisfaction & Work-Life

Home of the 3x overtime multiplier insight — the single most actionable finding.

Page 5 — Cost & Action Plan

Three retention scenarios with quantified savings, top 17 high-risk employees, and 5-point action plan.

📝 Recommended Actions (in priority order)

Overtime Policy (impact: $$$$) Implement strict overtime limits with quarterly review. Expected: 5-10% turnover reduction.
High-Risk 1:1s (impact: $$$) Schedule conversations with the 17 high-risk employees within 30 days.
R&D + Sales Focus (impact: $$$) These departments concentrate 96% of total turnover cost. Prioritize all retention programs here.
Mentoring (0-2 years) (impact: $$) 30% of new hires leave in the first 2 years. Implement onboarding mentorship.
Engagement Survey (impact: $) Quarterly pulse surveys to catch issues early.

🧠 Skills Demonstrated

Category	Skills
Python	pandas, numpy, matplotlib, seaborn, scipy.stats, scikit-learn
Statistical Analysis	Hypothesis testing (t-test, chi-square, Mann-Whitney), correlation analysis, Cramér's V
Data Engineering	Data cleaning, feature engineering, dimensional modeling, CSV transformations
Power BI	15+ DAX measures, conditional formatting, treemap, what-if scenarios, calculated columns
Business Analysis	Cost modeling, ROI calculation, risk scoring, scenario simulation
Storytelling	Executive dashboard design, actionable recommendations, counter-intuitive insights

📬 Contact

Igor Rocha Nunes — Data Analyst

💼 LinkedIn: igor-rocha-nunes
🐙 GitHub: igorrocha-tech
📊 Other portfolio project: Olist E-commerce Analysis

Open to remote freelance and contract analytics projects.

📄 License

This project uses the publicly available IBM HR Analytics Dataset from Kaggle (synthetic data). Analysis, code, and dashboard are licensed under MIT — feel free to learn from and adapt.

⚖️ Disclaimer

This is a portfolio project using synthetic data from IBM. Cost figures, ROI calculations, and risk scores are based on industry benchmarks (SHRM, Gallup) applied to the dataset values. The analysis methodology is real and reproducible; the specific numbers should be interpreted as illustrative of the approach, not as findings from a real company.

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
dashboard		dashboard
data		data
docs		docs
notebooks		notebooks
sql		sql
.gitignore		.gitignore
README.md		README.md

Folders and files

Latest commit

History

Repository files navigation

HR Analytics: The $12.25M Hidden Cost of Employee Turnover

🎯 The Business Question

📊 Key Findings

🔥 The Three Critical Drivers (statistically validated)

1. Overtime: the 3x Multiplier

2. R&D + Sales = 96% of the Total Cost

3. The Critical First Two Years

💡 Three Counter-Intuitive Insights

Counter-Intuitive #1: Perfect Work-Life Balance Doesn't Retain

Counter-Intuitive #2: Salary Hikes Above 20% Don't Help Either

Counter-Intuitive #3: HR Itself Has a Turnover Problem

🛠️ Technical Approach

Data Pipeline

Statistical Tests Applied

Cost Model Assumptions

Risk Scoring (composite score)

📂 Project Structure

🚀 How to Reproduce

Prerequisites

Setup

Run order:

🖼️ Dashboard Preview

Page 1 — Executive Summary

Page 2 — Demographics & Tenure

Page 3 — Compensation & Career

Page 4 — Satisfaction & Work-Life

Page 5 — Cost & Action Plan

📝 Recommended Actions (in priority order)

🧠 Skills Demonstrated

📬 Contact

📄 License

⚖️ Disclaimer

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages