Skip to content

igorrocha-tech/hr-analytics

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

HR Analytics: The $12.25M Hidden Cost of Employee Turnover

A data-driven analysis quantifying employee attrition and proving the ROI of retention programs. Stack: Python (pandas, scipy, scikit-learn) · SQL · Power BI


🎯 The Business Question

How much does employee turnover really cost a company — and can a retention program pay for itself?

This project analyzes the IBM HR Analytics dataset (1,470 employees, 35 features) to quantify the financial impact of attrition and identify the operational levers that drive it.


📊 Key Findings

Metric Value Insight
Annual Cost of Turnover $12.25M Hidden in HR, recruiting, and productivity budgets
Cost per Departure $52k ~90% of annual salary (industry benchmark)
Attrition Rate 16.1% Above the 10-15% market benchmark
High-Risk Employees 17 Identified through composite risk scoring
Retention Program ROI 819% $200k investment → $1.84M annual savings
Payback Period 1.3 months Faster than most marketing investments

🔥 The Three Critical Drivers (statistically validated)

1. Overtime: the 3x Multiplier

Employees working overtime are 3x more likely to quit. Despite being only 28% of the workforce, they account for 54% of all departures.

Recommended action: Strict overtime policy with quarterly review.

2. R&D + Sales = 96% of the Total Cost

Out of 3 departments, two concentrate almost all the financial damage:

  • R&D: $5.9M annual turnover cost
  • Sales: $5.87M annual turnover cost
  • HR: ~$0.5M

Recommended action: Focus 100% of retention programs on R&D and Sales first.

3. The Critical First Two Years

29.82% of new hires leave within their first 2 years — nearly 4x higher than employees with 10+ years (8.13%).

Recommended action: Mandatory mentoring program for the first 24 months.


💡 Three Counter-Intuitive Insights

These findings broke common assumptions and are worth highlighting:

Counter-Intuitive #1: Perfect Work-Life Balance Doesn't Retain

Employees who rate their work-life balance as "4 — excellent" have a 17.65% attrition rate — higher than those who rate it 2 or 3 (~16%).

Hypothesis: too much balance may signal under-engagement.

Counter-Intuitive #2: Salary Hikes Above 20% Don't Help Either

  • Low hike (<12%): 19.52% attrition (expected)
  • Top hike (20%+): 16.37% attrition (still high!)

Hypothesis: large raises may be "retention counter-offers" that come too late, or signal increased pressure.

Counter-Intuitive #3: HR Itself Has a Turnover Problem

The HR department has an attrition rate of 19.05% — the very team responsible for retention is failing to retain itself.


🛠️ Technical Approach

Data Pipeline

Raw CSV (Kaggle)
    ↓
Notebook 1: Data Cleaning + EDA (pandas, seaborn)
    ↓
Notebook 2: Statistical Validation (scipy)
    ↓
Notebook 3: Cost Analysis + Risk Scoring (pandas, numpy)
    ↓
Processed CSVs (fact + dim tables)
    ↓
Power BI Dashboard (5 pages, 15+ DAX measures)

Statistical Tests Applied

  • Mann-Whitney U: salary differences between leavers and stayers
  • Chi-square + Cramér's V: categorical relationships (Overtime × Attrition)
  • Multivariate ranking: identified top 5 predictors of turnover

Cost Model Assumptions

Based on SHRM and Gallup research, the turnover cost was estimated as 90% of annual salary, broken down into:

  • Recruitment (20%)
  • Onboarding (15%)
  • Productivity loss (40%)
  • Intangible cost (15%)

Risk Scoring (composite score)

Each active employee received a 0-10 risk score based on weighted factors:

  • Overtime (+2 points)
  • Distance >20km (+2 points)
  • Low satisfaction (+2 points)
  • Low salary (+1 point)
  • Tenure <3 years (+1 point)
  • Age <30 (+1 point)

Resulting classification:

  • HIGH RISK: score ≥ 6 → 17 employees identified
  • MEDIUM RISK: score 3-5
  • LOW RISK: score 0-2

📂 Project Structure

hr-analytics/
├── README.md
├── .gitignore
├── data/
│   ├── raw/                                  (not pushed — see .gitignore)
│   └── processed/
│       ├── hr_employees_processed.csv
│       ├── hr_at_risk.csv
│       ├── financial_summary.csv
│       ├── statistical_ranking.csv
│       ├── numeric_tests.csv
│       └── categorical_tests.csv
├── notebooks/
│   ├── 01_data_cleaning_eda.ipynb
│   ├── 02_statistical_analysis.ipynb
│   └── 03_cost_analysis.ipynb
├── sql/
│   └── hr_analysis.sql
├── dashboard/
│   └── hr_dashboard.pbix
└── docs/
    ├── 01_executive_summary.png
    ├── 02_demographics.png
    ├── 03_compensation.png
    ├── 04_satisfaction.png
    ├── 05_cost_action.png
    └── hr_dashboard.pdf

🚀 How to Reproduce

Prerequisites

  • Python 3.9+
  • Power BI Desktop (free)
  • Jupyter Notebook

Setup

git clone https://github.com/igorrocha-tech/hr-analytics.git
cd hr-analytics

# Install Python dependencies
pip install pandas numpy matplotlib seaborn scipy scikit-learn jupyter

# Download the dataset from Kaggle (link in /data/raw/README.md)
# Place WA_Fn-UseC_-HR-Employee-Attrition.csv in data/raw/

# Run the notebooks in order
jupyter notebook

Run order:

  1. 01_data_cleaning_eda.ipynb → generates hr_employees_processed.csv
  2. 02_statistical_analysis.ipynb → generates statistical results
  3. 03_cost_analysis.ipynb → generates hr_at_risk.csv + financial_summary.csv
  4. Open dashboard/hr_dashboard.pbix in Power BI Desktop

🖼️ Dashboard Preview

Page 1 — Executive Summary

Executive Summary

The hero number ($12.25M) reframes turnover from "HR problem" to "CFO problem."

Page 2 — Demographics & Tenure

Demographics

Reveals the critical 0-2 year window, Single status as risk factor, and HR's own retention issue.

Page 3 — Compensation & Career

Compensation

The Salary × Tenure scatter shows the "danger zone": low salary + short tenure concentrates attrition.

Page 4 — Satisfaction & Work-Life

Satisfaction

Home of the 3x overtime multiplier insight — the single most actionable finding.

Page 5 — Cost & Action Plan

Cost & Action

Three retention scenarios with quantified savings, top 17 high-risk employees, and 5-point action plan.


📝 Recommended Actions (in priority order)

  1. Overtime Policy (impact: $$$$) Implement strict overtime limits with quarterly review. Expected: 5-10% turnover reduction.

  2. High-Risk 1:1s (impact: $$$) Schedule conversations with the 17 high-risk employees within 30 days.

  3. R&D + Sales Focus (impact: $$$) These departments concentrate 96% of total turnover cost. Prioritize all retention programs here.

  4. Mentoring (0-2 years) (impact: $$) 30% of new hires leave in the first 2 years. Implement onboarding mentorship.

  5. Engagement Survey (impact: $) Quarterly pulse surveys to catch issues early.


🧠 Skills Demonstrated

Category Skills
Python pandas, numpy, matplotlib, seaborn, scipy.stats, scikit-learn
Statistical Analysis Hypothesis testing (t-test, chi-square, Mann-Whitney), correlation analysis, Cramér's V
Data Engineering Data cleaning, feature engineering, dimensional modeling, CSV transformations
Power BI 15+ DAX measures, conditional formatting, treemap, what-if scenarios, calculated columns
Business Analysis Cost modeling, ROI calculation, risk scoring, scenario simulation
Storytelling Executive dashboard design, actionable recommendations, counter-intuitive insights

📬 Contact

Igor Rocha Nunes — Data Analyst

Open to remote freelance and contract analytics projects.


📄 License

This project uses the publicly available IBM HR Analytics Dataset from Kaggle (synthetic data). Analysis, code, and dashboard are licensed under MIT — feel free to learn from and adapt.


⚖️ Disclaimer

This is a portfolio project using synthetic data from IBM. Cost figures, ROI calculations, and risk scores are based on industry benchmarks (SHRM, Gallup) applied to the dataset values. The analysis methodology is real and reproducible; the specific numbers should be interpreted as illustrative of the approach, not as findings from a real company.

About

$12.25M hidden cost of employee turnover — IBM HR Analytics with Python, SQL & Power BI. ROI of 819% on retention.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors