Cervical-Cancer-SQL-Analysis

SQL and Power BI analysis of cervical cancer risk factors

🏥 Cervical Cancer Risk Factor Analysis

SQL Server | Power BI | Data Analysis



📋 Project Overview

This project presents an end-to-end analysis of cervical cancer risk factors using a real-world medical dataset of 858 patient records and 36 clinical variables.

The analysis identifies key risk factors associated with cervical cancer diagnosis (measured via Biopsy) and visualises findings in an interactive Power BI dashboard.


🎯 Objectives

  • Identify the strongest risk factors for cervical cancer
  • Compare cancer rates across patient groups (smokers vs non-smokers, STD positive vs negative, IUD users vs non-users)
  • Visualise findings in a clear, professional dashboard
  • Derive actionable insights from medical data using SQL

🛠️ Tools & Technologies

| Tool | Purpose |
| --- | --- |
| SQL Server 2025 | Data storage and querying |
| SSMS | SQL development environment |
| Power BI Desktop | Data visualisation and dashboard |
| Microsoft Excel/CSV | Raw data preparation |

📂 Dataset

| Property | Detail |
| --- | --- |
| Source | UCI Machine Learning Repository, Cervical Cancer Risk Factors |
| Rows | 858 patients |
| Columns | 36 clinical variables |
| Target Variable | Biopsy (0 = Negative, 1 = Positive) |

Key Variables:

  • Demographics: Age
  • Lifestyle: Smokes, Smokes_years, Smokes_packs_year
  • Medical history: STDs, IUD, Hormonal_Contraceptives
  • Diagnosis: Hinselmann, Schiller, Citology, Biopsy

📊 Key Findings

Cancer Detection Rate

| Result | Count | Percentage |
| --- | --- | --- |
| Negative (0) | 803 | 93.6% |
| Positive (1) | 55 | 6.4% |
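
A minimal T-SQL sketch of how the counts and percentages above could be reproduced. The table name `dbo.cervical_cancer` is an assumption (the name the import wizard would typically derive from the CSV file); the actual name used in Cervical_Cancer_Analysis.sql may differ.

```sql
-- Assumption: the CSV was imported as dbo.cervical_cancer (hypothetical table name).
-- Counts patients per Biopsy outcome and expresses each count as a share of all rows.
SELECT
    Biopsy,
    COUNT(*) AS patient_count,
    CAST(100.0 * COUNT(*) / SUM(COUNT(*)) OVER () AS decimal(5, 1)) AS pct_of_total
FROM dbo.cervical_cancer
GROUP BY Biopsy
ORDER BY Biopsy;
```

The window aggregate `SUM(COUNT(*)) OVER ()` gives the grand total alongside each group, so no separate subquery is needed for the denominator.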

Risk Factor Analysis

| Patient Group | Cancer Rate | Risk Level |
| --- | --- | --- |
| Non-Smoker | 6.1% | 🟢 Low |
| No STDs | 6.1% | 🟢 Low |
| No IUD | 6.5% | 🟢 Low |
| No Contraceptives | 7.1% | 🟢 Low |
| Uses Contraceptives | 7.5% | 🟡 Medium |
| Smoker | 8.1% | 🟡 Medium |
| Uses IUD | 10.8% | 🟠 High |
| Has STDs | 15.2% | 🔴 Highest |
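
The group rates above can be computed in one pass by unpivoting the four flags. This is a sketch, not the project's actual query: the table name `dbo.cervical_cancer` is assumed, and `TRY_CAST` is used on the assumption that some columns may have been imported as text (the raw UCI file marks missing values with `?`). Rows with a missing flag are excluded from that factor's denominator, so results may differ slightly from the table depending on how missing values were handled during preparation.

```sql
-- Assumption: table name dbo.cervical_cancer; column names as listed under Key Variables.
-- One row per (risk factor, exposed = 0/1) with the biopsy-positive rate for that group.
SELECT
    f.risk_factor,
    f.exposed,
    COUNT(*) AS patients,
    SUM(b.biopsy_positive) AS positives,
    CAST(100.0 * SUM(b.biopsy_positive) / COUNT(*) AS decimal(5, 1)) AS positive_rate_pct
FROM dbo.cervical_cancer AS c
CROSS APPLY (VALUES (TRY_CAST(c.Biopsy AS int))) AS b (biopsy_positive)
CROSS APPLY (VALUES
    ('Smokes',                  TRY_CAST(c.Smokes AS float)),
    ('STDs',                    TRY_CAST(c.STDs AS float)),
    ('IUD',                     TRY_CAST(c.IUD AS float)),
    ('Hormonal_Contraceptives', TRY_CAST(c.Hormonal_Contraceptives AS float))
) AS f (risk_factor, exposed)
WHERE f.exposed IS NOT NULL          -- drop rows where this flag is missing
GROUP BY f.risk_factor, f.exposed
ORDER BY f.risk_factor, f.exposed;
```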

🔑 Key Insight

STDs are the strongest risk factor for cervical cancer in this dataset: patients with STDs have a 15.2% cancer rate, about 2.5 times the 6.1% rate among patients without STDs.
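
As a rough check on this comparison, the ratio of the two rates can be computed directly. The sketch below again assumes a table named `dbo.cervical_cancer`; with the figures reported here (15.2% and 6.1%) the ratio works out to roughly 2.5.

```sql
-- Assumption: table name dbo.cervical_cancer.
-- Computes the biopsy-positive rate with and without STDs, then their ratio.
WITH rates AS (
    SELECT
        TRY_CAST(STDs AS float) AS has_std,
        AVG(TRY_CAST(Biopsy AS float)) AS positive_rate
    FROM dbo.cervical_cancer
    WHERE TRY_CAST(STDs AS float) IS NOT NULL
    GROUP BY TRY_CAST(STDs AS float)
)
SELECT
    e.positive_rate AS rate_with_stds,
    u.positive_rate AS rate_without_stds,
    e.positive_rate / NULLIF(u.positive_rate, 0) AS rate_ratio  -- NULLIF guards against division by zero
FROM rates AS e
JOIN rates AS u
    ON e.has_std = 1 AND u.has_std = 0;
```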


📈 Power BI Dashboard

The dashboard includes:

  • 🍩 Donut Chart — Overall cancer positive vs negative rate
  • 📊 Bar Chart — Smoking vs cancer count
  • 📈 Column Chart — Age distribution of patients
  • 📊 Bar Chart — STDs vs cancer rate
  • 📊 Bar Chart — IUD usage vs cancer rate
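
For the age distribution chart, one option is to band ages in SQL before loading them into Power BI. The sketch below is illustrative only: the table name `dbo.cervical_cancer` and the band boundaries are assumptions, not taken from the project files.

```sql
-- Assumption: table name dbo.cervical_cancer; age bands are hypothetical.
-- Numeric prefixes on the labels keep the bands in order when sorted as text.
SELECT
    b.age_band,
    COUNT(*) AS patients,
    SUM(TRY_CAST(c.Biopsy AS int)) AS biopsy_positive
FROM dbo.cervical_cancer AS c
CROSS APPLY (VALUES (TRY_CAST(c.Age AS int))) AS v (age)
CROSS APPLY (VALUES (
    CASE
        WHEN v.age < 20 THEN '1. Under 20'
        WHEN v.age < 30 THEN '2. 20-29'
        WHEN v.age < 40 THEN '3. 30-39'
        WHEN v.age < 50 THEN '4. 40-49'
        ELSE '5. 50 and over'
    END
)) AS b (age_band)
WHERE v.age IS NOT NULL
GROUP BY b.age_band
ORDER BY b.age_band;
```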

🗂️ Project Structure

Cervical-Cancer-SQL-Analysis/
│
├── 📄 README.md
├── 📊 Cervical_Cancer_Analysis.sql       # All SQL queries
├── 📈 Cervical_Cancer_Dashboard.pbix     # Power BI dashboard
└── 📁 data/
    └── cervical_cancer.csv               # Raw dataset (CSV)

🚀 How to Run This Project

SQL Analysis:

  1. Install SQL Server 2025 and SSMS
  2. Create a database called cervicalsqltable
  3. Import cervical_cancer.csv using the SSMS Import Flat File wizard
  4. Open and run Cervical_Cancer_Analysis.sql
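
Steps 2 and 3 can be scripted and sanity-checked along these lines. The database name comes from the steps above; the table name `cervical_cancer` is an assumption about what the import wizard creates, so adjust it if you chose a different name. `GO` is the batch separator understood by SSMS and sqlcmd.

```sql
-- Step 2: create the database named in the instructions.
CREATE DATABASE cervicalsqltable;
GO

USE cervicalsqltable;
GO

-- After step 3 (Import Flat File), confirm the import before running the analysis script.
-- Assumption: the wizard created a table called dbo.cervical_cancer.
SELECT COUNT(*) AS row_count
FROM dbo.cervical_cancer;            -- the dataset is described as having 858 rows

-- Inspect the inferred column types; text columns may need casting in queries.
SELECT COLUMN_NAME, DATA_TYPE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'cervical_cancer'
ORDER BY ORDINAL_POSITION;
```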

Power BI Dashboard:

  1. Install Power BI Desktop
  2. Open Cervical_Cancer_Dashboard.pbix
  3. Update the SQL Server connection to your local server
  4. Refresh data

💡 Insights & Recommendations

  1. STD screening should be prioritised in cervical cancer prevention programs
  2. Smokers have a cancer rate about 33% higher than non-smokers in relative terms (8.1% vs 6.1%), so smoking cessation programs are important
  3. IUD users show an elevated cancer rate (10.8% vs 6.5% for non-users), so regular screening is recommended
  4. The youngest positive patient was 16, which supports early screening programs
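
The age figure in point 4 can be checked with a short query. As elsewhere, `dbo.cervical_cancer` is an assumed table name, and `TRY_CAST` is there in case the columns were imported as text.

```sql
-- Assumption: table name dbo.cervical_cancer.
-- Youngest patient with a positive biopsy.
SELECT MIN(TRY_CAST(Age AS int)) AS youngest_positive_age
FROM dbo.cervical_cancer
WHERE TRY_CAST(Biopsy AS int) = 1;
```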

👤 Author

Uzoma

  • 🎓 Masters Student — Data Science, University of Europe for Applied Sciences (Potsdam, Germany)
  • 💼 10+ years experience in the downstream energy sector
  • 🌍 LinkedIn: [uzomaeze]
  • 📧 Email: [uzomaneze@gmail.com]

📜 License

This project is open source and available under the MIT License.


This project was completed as part of an ongoing journey transitioning from energy sector expertise into data science.
