SQL and Power BI analysis of cervical cancer risk factors
This project presents an end-to-end data analysis of cervical cancer risk factors using a real-world medical dataset containing 858 patient records across 36 clinical variables.
The analysis identifies key risk factors associated with cervical cancer diagnosis (measured via Biopsy) and visualises findings in an interactive Power BI dashboard.
- Identify the strongest risk factors for cervical cancer
- Compare cancer rates across patient groups (smokers vs non-smokers, STD positive vs negative, IUD users vs non-users)
- Visualise findings in a clear, professional dashboard
- Derive actionable insights from medical data using SQL
| Tool | Purpose |
|---|---|
| SQL Server 2025 | Data storage and querying |
| SSMS | SQL development environment |
| Power BI Desktop | Data visualisation and dashboard |
| Microsoft Excel/CSV | Raw data preparation |
| Property | Detail |
|---|---|
| Source | UCI Machine Learning Repository — Cervical Cancer Risk Factors |
| Rows | 858 patients |
| Columns | 36 clinical variables |
| Target Variable | Biopsy (0 = Negative, 1 = Positive) |
Key Variables:
- Demographics: Age
- Lifestyle: Smokes, Smokes_years, Smokes_packs_year
- Medical history: STDs, IUD, Hormonal_Contraceptives
- Diagnosis: Hinselmann, Schiller, Citology, Biopsy
| Result | Count | Percentage |
|---|---|---|
| Negative (0) | 803 | 93.6% |
| Positive (1) | 55 | 6.4% |
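The outcome split above can be reproduced with a single aggregate query. This is a minimal sketch, not necessarily the query in `Cervical_Cancer_Analysis.sql`; the table name `dbo.cervical_cancer` is an assumption and should be replaced with whatever name was chosen during import.

```sql
-- Assumption: the CSV was imported into a table named dbo.cervical_cancer.
-- Counts patients per Biopsy result and each result's share of all rows.
SELECT
    Biopsy,
    COUNT(*) AS patients,
    CAST(100.0 * COUNT(*) / SUM(COUNT(*)) OVER () AS decimal(5, 1)) AS pct_of_total
FROM dbo.cervical_cancer
GROUP BY Biopsy
ORDER BY Biopsy;
```

`SUM(COUNT(*)) OVER ()` totals the per-group counts across the whole result set, so no separate subquery is needed for the denominator.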
| Risk Factor | Cancer Rate | Risk Level |
|---|---|---|
| Non-Smoker | 6.1% | 🟢 Low |
| No STDs | 6.1% | 🟢 Low |
| No IUD | 6.5% | 🟢 Low |
| No Contraceptives | 7.1% | 🟢 Low |
| Smoker | 8.1% | 🟡 Medium |
| Uses Contraceptives | 7.5% | 🟡 Medium |
| Uses IUD | 10.8% | 🟠 High |
| Has STDs | 15.2% | 🔴 Highest |
Having an STD is the strongest risk factor in this dataset: patients with STDs have a 15.2% biopsy-positive rate, about 2.5 times the 6.1% rate among patients without STDs.
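A group comparison like the STD rows in the table can be sketched as follows. The table name `dbo.cervical_cancer` is an assumption, and so is the handling of missing values: the raw UCI file marks them with `?`, so this sketch uses `TRY_CAST` to turn anything non-numeric into `NULL` and excludes those rows. The project's own script may treat missing values differently, in which case the rates will not match exactly.

```sql
-- Assumption: table dbo.cervical_cancer with columns STDs and Biopsy,
-- possibly imported as text. TRY_CAST returns NULL for values such as '?'.
WITH flagged AS (
    SELECT
        TRY_CAST(STDs AS float)   AS std_flag,
        TRY_CAST(Biopsy AS float) AS biopsy_flag
    FROM dbo.cervical_cancer
)
SELECT
    CASE WHEN std_flag = 1 THEN 'Has STDs' ELSE 'No STDs' END AS patient_group,
    COUNT(*)                                         AS patients,
    SUM(biopsy_flag)                                 AS positive_biopsies,
    CAST(100.0 * AVG(biopsy_flag) AS decimal(5, 1))  AS positive_rate_pct
FROM flagged
WHERE std_flag IS NOT NULL          -- drop rows where STD status is unknown
GROUP BY CASE WHEN std_flag = 1 THEN 'Has STDs' ELSE 'No STDs' END
ORDER BY positive_rate_pct DESC;
```

Swapping `STDs` for `Smokes`, `IUD`, or `Hormonal_Contraceptives` (and adjusting the labels) gives the other comparisons in the same shape.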
The dashboard includes:
- 🍩 Donut Chart — Overall cancer positive vs negative rate
- 📊 Bar Chart — Smoking vs cancer count
- 📈 Column Chart — Age distribution of patients
- 📊 Bar Chart — STDs vs cancer rate
- 📊 Bar Chart — IUD usage vs cancer rate
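As an illustration of the kind of query that could feed the age distribution chart, the sketch below groups patients into ten-year age bands. The table name `dbo.cervical_cancer` and the band width are assumptions; the dashboard itself may bin ages inside Power BI instead.

```sql
-- Assumption: table dbo.cervical_cancer with an Age column.
-- Integer division by 10 maps each age to the start of its ten-year band.
WITH ages AS (
    SELECT TRY_CAST(Age AS int) AS age
    FROM dbo.cervical_cancer
)
SELECT
    (age / 10) * 10 AS age_band_start,
    COUNT(*)        AS patients
FROM ages
WHERE age IS NOT NULL
GROUP BY (age / 10) * 10
ORDER BY age_band_start;
```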
Cervical-Cancer-SQL-Analysis/
│
├── 📄 README.md
├── 📊 Cervical_Cancer_Analysis.sql # All SQL queries
├── 📈 Cervical_Cancer_Dashboard.pbix # Power BI dashboard
└── 📁 data/
    └── cervical_cancer.csv # Raw dataset (CSV)
SQL Analysis:
- Install SQL Server 2025 and SSMS
- Create a database called `cervicalsqltable`
- Import `cervical_cancer.csv` using the SSMS Import Flat File wizard
- Open and run `Cervical_Cancer_Analysis.sql`
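The database step and a quick post-import check can be sketched in T-SQL as below. The database name comes from the steps above; the table name `dbo.cervical_cancer` is an assumption and depends on what was entered in the import wizard.

```sql
-- Create the database used by the project, then switch to it.
CREATE DATABASE cervicalsqltable;
GO
USE cervicalsqltable;
GO

-- Run after the Import Flat File wizard has finished.
-- Assumption: the imported table is named dbo.cervical_cancer.
SELECT COUNT(*) AS row_count
FROM dbo.cervical_cancer;

SELECT COUNT(*) AS column_count
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'dbo'
  AND TABLE_NAME = 'cervical_cancer';
```

If the import went through cleanly, the two counts should agree with the dataset summary: 858 rows and 36 columns.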
Power BI Dashboard:
- Install Power BI Desktop
- Open `Cervical_Cancer_Dashboard.pbix`
- Update the SQL Server connection to your local server
- Refresh data
- STD screening should be prioritised in cervical cancer prevention programs
- Smokers show a positive rate about 33% higher than non-smokers (8.1% vs 6.1%), so smoking cessation programs are important
- IUD users show elevated risk (10.8%) — regular screening is recommended
- The youngest positive patient was 16 — early screening programs are critical
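Two of the figures above can be checked directly. The sketch below derives the smoker comparison from the reported rates (8.1% and 6.1%) and looks up the youngest biopsy-positive patient. The table name `dbo.cervical_cancer` is an assumption.

```sql
-- Relative increase in positive rate for smokers vs non-smokers,
-- computed from the rates reported in the risk factor table.
SELECT CAST((8.1 / 6.1 - 1) * 100 AS decimal(5, 1)) AS smoker_relative_increase_pct;

-- Youngest patient with a positive biopsy.
-- Assumption: table dbo.cervical_cancer with Age and Biopsy columns.
SELECT MIN(TRY_CAST(Age AS int)) AS youngest_positive_age
FROM dbo.cervical_cancer
WHERE TRY_CAST(Biopsy AS float) = 1;
```

The first query works from rounded percentages, so it only approximates what the underlying counts would give.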
Uzoma
- 🎓 Masters Student — Data Science, University of Europe for Applied Sciences (Potsdam, Germany)
- 💼 10+ years experience in the downstream energy sector
- 🌍 LinkedIn: [uzomaeze]
- 📧 Email: [uzomaneze@gmail.com]
This project is open source and available under the MIT License.
This project was completed as part of an ongoing journey transitioning from energy sector expertise into data science.