HealthWithSevgi

An interactive, browser-based machine learning education tool for healthcare professionals.

SENG 430 - Software Quality Assurance | Çankaya University | Spring 2025-2026 | Instructor: Dr. Sevgi Koyuncu Tunç

HealthWithSevgi guides clinicians through a complete ML pipeline in 7 steps — from selecting a medical specialty to training a model, interpreting predictions with SHAP, and auditing fairness — all with zero coding required.

Live Demo | Jira Board | Figma Designs | Setup Guide

Overview

Healthcare professionals increasingly encounter AI/ML in clinical settings but rarely get hands-on experience with how these systems work. HealthWithSevgi bridges that gap by providing an intuitive, wizard-style interface that walks users through every stage of the machine learning lifecycle using real clinical datasets.

Key capabilities:

20 medical specialties with real-world clinical datasets (Cardiology, Oncology, Nephrology, Neurology, ICU/Sepsis, Dermatology, and more)
8 ML classifiers with interactive hyperparameter tuning via sliders
SHAP-based explainability — global feature importance and single-patient waterfall explanations
Fairness auditing — subgroup performance analysis across demographics with bias detection
EU AI Act compliance checklist with downloadable PDF certificate
No persistent server-side storage: session data is held only in memory and evicted automatically
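Automatic eviction of this kind is usually a bounded least-recently-used (LRU) cache. The sketch below assumes an `OrderedDict`-based store capped at 50 entries, the session limit shown in the architecture diagram. The class name `SessionStore` and its methods are hypothetical and are not the project's actual code.

```python
from collections import OrderedDict
from typing import Any, Optional


class SessionStore:
    """Hypothetical in-memory session store with LRU eviction."""

    def __init__(self, max_sessions: int = 50) -> None:
        self.max_sessions = max_sessions
        self._data: "OrderedDict[str, Any]" = OrderedDict()

    def put(self, session_id: str, payload: Any) -> None:
        # Re-inserting an existing key marks it as most recently used.
        if session_id in self._data:
            self._data.move_to_end(session_id)
        self._data[session_id] = payload
        # Evict the least recently used sessions once the cap is exceeded.
        while len(self._data) > self.max_sessions:
            self._data.popitem(last=False)

    def get(self, session_id: str) -> Optional[Any]:
        if session_id not in self._data:
            return None
        # Reading a session also refreshes its position in the LRU order.
        self._data.move_to_end(session_id)
        return self._data[session_id]

    def __len__(self) -> int:
        return len(self._data)
```

Nothing is ever written to disk in this design. Once the cap is reached, a session that has not been touched recently simply disappears.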

The 7-Step Pipeline

Step	Name	What Happens
1	Clinical Context	Introduces the medical problem the AI will address. Displays the clinical question, why it matters, and the 7-step roadmap.
2	Data Exploration	Upload a CSV file (up to 50 MB) or load a built-in clinical dataset. Inspect column statistics, missing values, and class distribution. Confirm the target variable.
3	Data Preparation	Configure preprocessing: train/test split ratio, missing value strategy (median/mode/drop), normalization (z-score/min-max), SMOTE for class imbalance, and outlier handling (IQR/z-score clipping).
4	Model & Parameters	Choose from 8 ML models. Adjust hyperparameters with intuitive sliders. Optionally enable hyperparameter tuning (RandomizedSearchCV) and feature selection (VarianceThreshold + SelectKBest).
5	Results & Evaluation	View accuracy, sensitivity, specificity, precision, F1, AUC-ROC, and MCC. Explore interactive ROC curves, precision-recall curves, and confusion matrices. Detect overfitting via cross-validation comparison.
6	Explainability	Global feature importance ranking with clinical name mapping. Single-patient SHAP waterfall charts with plain-language summaries (e.g., "High glucose increases diabetes risk by 0.23").
7	Ethics & Bias	Subgroup fairness audit (by age, gender, ethnicity). Bias warnings for performance gaps >10%. EU AI Act compliance checklist. Real-world case studies of AI bias in healthcare. Downloadable PDF compliance certificate.
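All of the Step 5 threshold metrics derive from the binary confusion matrix. The following is a minimal scikit-learn sketch, assuming binary labels with 1 as the positive (diseased) class. The helper name `clinical_metrics` is hypothetical.

```python
from sklearn.metrics import (
    confusion_matrix,
    f1_score,
    matthews_corrcoef,
    precision_score,
    roc_auc_score,
)


def clinical_metrics(y_true, y_pred, y_score):
    """Return Step 5-style metrics for a binary classifier (positive class = 1)."""
    # labels=[0, 1] fixes the layout as [[TN, FP], [FN, TP]].
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # recall on the positive (diseased) class
        "specificity": tn / (tn + fp),  # recall on the negative (healthy) class
        "precision": precision_score(y_true, y_pred, zero_division=0),
        "f1": f1_score(y_true, y_pred, zero_division=0),
        "auc_roc": roc_auc_score(y_true, y_score),  # needs scores, not hard labels
        "mcc": matthews_corrcoef(y_true, y_pred),
    }
```

Note that AUC-ROC is computed from predicted probabilities, while the other metrics use thresholded predictions. This is why a model can show a high AUC alongside a modest sensitivity.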

Supported Specialties

#	Specialty	Prediction Task	Dataset	Samples
1	Cardiology	30-day heart failure mortality	Heart Failure Clinical Records	~300
2	Radiology	Pneumonia detection (chest X-ray metadata)	NIH Chest X-ray	100K+
3	Nephrology	Chronic kidney disease detection	UCI CKD	400
4	Oncology - Breast	Malignant vs. benign biopsy	Wisconsin Breast Cancer	569
5	Neurology - Parkinson's	Parkinson's from voice biomarkers	UCI Parkinson's	195
6	Endocrinology - Diabetes	Diabetes onset within 5 years	Pima Indians	768
7	Hepatology - Liver	Liver disease detection	Indian Liver Patient	583
8	Cardiology - Stroke	Stroke risk prediction	Kaggle Stroke Prediction	5,110
9	Mental Health	Depression severity (PHQ-9)	Kaggle Depression	~1,000
10	Pulmonology - COPD	COPD exacerbation risk	PhysioNet + Kaggle	~1,000
11	Haematology - Anaemia	Anaemia type classification	Kaggle Anaemia	~400
12	Dermatology	Benign vs. malignant skin lesion	HAM10000 metadata	~10K
13	Ophthalmology	Diabetic retinopathy detection	UCI Diabetic Retinopathy	1,151
14	Orthopaedics - Spine	Disc herniation / spondylolisthesis	UCI Vertebral Column	310
15	ICU / Sepsis	Sepsis onset within 6 hours	PhysioNet Sepsis	~40K
16	Obstetrics - Fetal Health	Fetal health classification (CTG)	UCI Fetal Health	2,126
17	Cardiology - Arrhythmia	Arrhythmia detection (ECG)	UCI Arrhythmia	452
18	Oncology - Cervical	Cervical cancer risk	UCI Cervical Cancer	858
19	Thyroid / Endocrinology	Thyroid function classification	UCI Thyroid	9,172
20	Pharmacy - Readmission	Hospital readmission risk	UCI Diabetes 130-US	101,766

ML Models

Model	Category	Key Hyperparameters
K-Nearest Neighbors	Instance-based	k (1-25), distance metric
Support Vector Machine	Boundary-based	C (0.01-100), kernel (linear/rbf/poly)
Decision Tree	Tree-based	max_depth (1-20), criterion (gini/entropy)
Random Forest	Ensemble	n_estimators (10-500), max_depth
Logistic Regression	Linear	C (0.001-100), solver (lbfgs/saga)
Naive Bayes	Probabilistic	var_smoothing (1e-12 to 1e-3)
XGBoost	Gradient Boosting	n_estimators, max_depth, learning_rate
LightGBM	Gradient Boosting	n_estimators, max_depth, learning_rate

All models are trained with balanced class weights where supported. Optional hyperparameter tuning uses RandomizedSearchCV (20 iterations, 3-fold CV). Feature selection combines VarianceThreshold with SelectKBest (mutual information).
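Below is a minimal scikit-learn sketch of that configuration. It wraps feature selection and the model in a `Pipeline` so that selection is refit inside every CV fold. The step names, `k=5`, the logistic-regression search space, and the `roc_auc` scoring choice are illustrative assumptions, not the project's actual settings.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, VarianceThreshold, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline

pipeline = Pipeline([
    ("variance", VarianceThreshold(threshold=0.0)),               # drop constant columns
    ("kbest", SelectKBest(score_func=mutual_info_classif, k=5)),  # top-5 by mutual information
    ("model", LogisticRegression(class_weight="balanced", max_iter=1000)),
])

search = RandomizedSearchCV(
    pipeline,
    param_distributions={"model__C": loguniform(1e-3, 1e2)},  # same range as the C slider
    n_iter=20,           # 20 sampled configurations
    cv=3,                # 3-fold cross-validation
    scoring="roc_auc",   # scoring metric is an assumption
    random_state=42,
)

# Synthetic, imbalanced toy data standing in for a clinical dataset.
X, y = make_classification(n_samples=200, n_features=10, weights=[0.8, 0.2], random_state=0)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Putting the selectors inside the pipeline means the held-out fold never influences which features are kept. Selecting on the full dataset first would leak information into the CV score.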

Tech Stack

Layer	Technology	Purpose
Frontend	React 18, TypeScript, Vite	Single-page wizard application
UI Components	Recharts, Lucide Icons, react-dropzone	Charts, icons, file uploads
State Management	TanStack React Query	Server state caching and synchronization
Backend	FastAPI, Python 3.12	REST API with auto-generated OpenAPI docs
ML Engine	scikit-learn, XGBoost, LightGBM	Model training, evaluation, cross-validation
Explainability	SHAP	TreeExplainer (tree models), KernelExplainer (linear), permutation importance
Data Processing	pandas, numpy, imbalanced-learn	Data cleaning, normalization, SMOTE
PDF Generation	ReportLab	Compliance certificate export
Containerization	Docker (multi-stage)	Production deployment
Hosting	HuggingFace Spaces	Live demo environment
Package Manager	pnpm (frontend), pip (backend)	Dependency management

Architecture

📐 Full Architecture Diagrams (Google Drive) — C4 model diagrams (System Context, Container, Component, Code levels), toolchain diagrams, and data flow sequences.

                          +---------------------+
                          |   Browser (React)   |
                          |   Wizard UI (SPA)   |
                          +----------+----------+
                                     |
                            HTTP/REST (JSON)
                                     |
                          +----------v----------+
                          |   FastAPI Backend   |
                          +----------+----------+
                                     |
          +------------------+-------+----------+------------------+
          |                  |                  |                  |
  +-------v-------+  +-------v-------+  +-------v-------+  +-------v-------+
  | DataService   |  | MLService     |  | ExplainSvc    |  | EthicsService |
  |               |  |               |  |               |  |               |
  | - Explore     |  | - Train       |  | - SHAP        |  | - Subgroup    |
  | - Prepare     |  | - Evaluate    |  | - Waterfall   |  | - Bias detect |
  | - SMOTE       |  | - Compare     |  | - Clinical    |  | - EU AI Act   |
  +-------+-------+  +-------+-------+  +-------+-------+  +-------+-------+
          |                  |                  |                  |
          v                  v                  v                  v
  +---------------+  +---------------+  +---------------+  +---------------+
  | In-Memory     |  | In-Memory     |  | SHAP          |  | ReportLab     |
  | Sessions      |  | Models        |  | Library       |  | PDF Gen       |
  | (LRU 50)      |  | (LRU 100+)    |  |               |  |               |
  +---------------+  +---------------+  +---------------+  +---------------+

Data flow: Upload CSV -> Explore columns -> Preprocess (split, normalize, SMOTE) -> Train model -> Evaluate metrics -> SHAP explanations -> Fairness audit -> PDF certificate
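The order of the preprocessing steps matters. The split happens first so that imputation and scaling statistics are learned from the training rows only, which prevents data leakage. The sketch below is an illustration with scikit-learn, assuming median imputation and z-score scaling. The helper name `prepare` is hypothetical. SMOTE from imbalanced-learn would be applied afterwards, to the training split only; it is omitted here to keep the example to a single library.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def prepare(X, y, test_size=0.2, seed=42):
    """Split first, then fit preprocessing on the training split only."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed
    )
    imputer = SimpleImputer(strategy="median")  # "median" missing-value strategy
    scaler = StandardScaler()                   # z-score normalization
    # Medians, means, and standard deviations are learned from training rows only.
    X_train = scaler.fit_transform(imputer.fit_transform(X_train))
    # The test split reuses the training statistics, so no test information leaks.
    X_test = scaler.transform(imputer.transform(X_test))
    return X_train, X_test, y_train, y_test


# Example: 100 synthetic rows with missing values in the second column.
rng = np.random.default_rng(0)
X = rng.normal(50, 10, size=(100, 3))
X[::7, 1] = np.nan
y = np.array([0, 1] * 50)
X_train, X_test, y_train, y_test = prepare(X, y)
```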

Project Structure

HealthWithSevgi/
|
+-- frontend/                         # React 18 + Vite + TypeScript
|   +-- src/
|   |   +-- pages/                    # Step 1-7 wizard pages
|   |   |   +-- Step1ClinicalContext.tsx
|   |   |   +-- Step2DataExploration.tsx
|   |   |   +-- Step3DataPreparation.tsx
|   |   |   +-- Step4ModelParameters.tsx
|   |   |   +-- Step5Results.tsx
|   |   |   +-- Step6Explainability.tsx
|   |   |   +-- Step7Ethics.tsx
|   |   +-- components/               # Reusable UI components
|   |   |   +-- NavBar.tsx            # Specialty switcher, glossary
|   |   |   +-- WizardProgress.tsx    # Step progress tracker
|   |   |   +-- SpecialtySelector.tsx # 20-specialty grid
|   |   |   +-- ColumnMapperModal.tsx # Target column confirmation
|   |   |   +-- ErrorModal.tsx       # Error display modal
|   |   |   +-- charts/              # Visualization components
|   |   |       +-- ConfusionMatrixChart.tsx  # 2x2 confusion matrix
|   |   |       +-- KNNScatterCanvas.tsx     # KNN decision boundary
|   |   |       +-- PRCurveChart.tsx         # Precision-Recall curve
|   |   |       +-- ROCCurveChart.tsx        # ROC curve with AUC badge
|   |   +-- api/                      # API client layer
|   |   |   +-- client.ts            # Axios instance + interceptors
|   |   |   +-- specialties.ts       # Specialty endpoints
|   |   |   +-- data.ts              # Explore + Prepare endpoints
|   |   |   +-- ml.ts                # Train + Compare endpoints
|   |   |   +-- explain.ts           # Explainability + Ethics + Certificate
|   |   +-- types/index.ts           # Shared TypeScript interfaces
|   |   +-- styles/globals.css        # Global CSS + theme variables
|   |   +-- App.tsx                   # Main wizard state manager
|   |   +-- main.tsx                  # Application entry point
|   +-- package.json
|   +-- vite.config.ts
|
+-- backend/                          # FastAPI REST API + ML engine
|   +-- app/
|   |   +-- main.py                   # FastAPI setup, CORS, routers
|   |   +-- routers/
|   |   |   +-- data_router.py        # /specialties, /explore, /prepare
|   |   |   +-- ml_router.py          # /train, /compare, /models
|   |   |   +-- explain_router.py     # /explain/*, /ethics, /certificate
|   |   +-- services/
|   |   |   +-- data_service.py       # Dataset loading, exploration, preprocessing
|   |   |   +-- ml_service.py         # Model building, training, evaluation
|   |   |   +-- explain_service.py    # SHAP explanations, clinical mapping
|   |   |   +-- ethics_service.py     # Fairness audit, bias detection
|   |   |   +-- certificate_service.py # PDF certificate generation
|   |   |   +-- specialty_registry.py # 20 specialty definitions + datasets
|   |   +-- models/
|   |   |   +-- schemas.py            # Data exploration/preparation DTOs
|   |   |   +-- ml_schemas.py         # Training/evaluation DTOs
|   |   |   +-- explain_schemas.py    # Explainability/ethics DTOs
|   |   +-- utils/                    # Utility modules
|   +-- data_cache/                   # Cached clinical CSV datasets
|   +-- datasets/                     # Additional dataset storage
|   +-- tests/                        # pytest test suite (178 tests)
|   |   +-- conftest.py              # Shared fixtures
|   |   +-- test_step1_clinical_context.py
|   |   +-- test_step2_data_exploration.py
|   |   +-- test_step3_data_preparation.py
|   |   +-- test_step6_explainability.py
|   |   +-- test_step7_ethics.py
|   |   +-- test_certificate.py
|   +-- pytest.ini
|   +-- requirements.txt
|
+-- hf-space/                         # HuggingFace Spaces deployment
|   +-- main_hf.py                    # Combined API + SPA entrypoint
|   +-- Dockerfile                    # HF-specific Docker build
|   +-- README.md                     # HF Space metadata
|
+-- docs/                             # Documentation & design specs
|   +-- ML_Tool_User_Guide.md         # Course user manual
|   +-- Sprint_1_Assignment.md        # Sprint 1 requirements
|   +-- Clinical_Specialties_Dataset_Collection.pdf
|   +-- diagrams/                     # C4 architecture + toolchain PDFs
|   +-- drawio/                       # Editable draw.io source files
|   +-- mermaid/                      # C4 architecture (Mermaid source)
|   +-- iso42001/                     # ISO 42001 AI governance report
|   +-- seng430-sprints/              # Sprint requirements from instructor
|   +-- qa/                           # QA test reports (PDF)
|   +-- reports/                      # Progress reports + screenshots
|
+-- jira/                             # Jira backlog documentation
|   +-- JIRA.md                       # Product backlog report
|   +-- SPRINT_1_TASK_BOARD.md        # Sprint 1 task breakdown
|
+-- local/                            # Local-only extensions
|   +-- model-arena/                  # Model Arena comparison feature
|       +-- arena/                    # Backend (router, service, schemas)
|       +-- frontend/                 # Frontend (ArenaPage, charts, hooks)
|
+-- .github/
|   +-- pull_request_template.md      # PR template linked to Jira
|   +-- workflows/deploy-hf.yml      # Auto-deploy to HuggingFace on release
|
+-- Dockerfile                        # Multi-stage build (Node + Python)
+-- docker-compose.yml                # Local development orchestration
+-- .dockerignore
+-- .gitignore
+-- CLAUDE.md                         # AI coding assistant context
+-- SETUP.md                          # Local development setup guide
+-- README.md

Live Demo & Docker

🌐 Live Demo

The application is deployed on HuggingFace Spaces — no installation required:

➡️ 0xbatuhan4-healthwithsevgi.hf.space

🐳 Docker (single command)

Pull and run the pre-built container image from GitHub Container Registry:

docker run -p 7860:7860 ghcr.io/eudalabs/healthwithsevgi:latest

Open http://localhost:7860 — that's it.

Alternatively, build from source:

git clone https://github.com/EudaLabs/HealthWithSevgi.git
cd HealthWithSevgi
docker build -t healthwithsevgi .
docker run -p 7860:7860 healthwithsevgi

Docker Compose (local development)

git clone https://github.com/EudaLabs/HealthWithSevgi.git
cd HealthWithSevgi
docker-compose up --build

This starts both the backend API and frontend dev server with hot-reload.

Quick Start

Prerequisites (for local development)

Tool	Version	Required For
Python	>= 3.10	Backend
Node.js	>= 18	Frontend (with pnpm as package manager)
Git	latest	Version control

Local Development

Backend:

cd backend

# Create and activate virtual environment
python -m venv venv
source venv/bin/activate        # macOS / Linux
# venv\Scripts\activate         # Windows

# Install dependencies
pip install -r requirements.txt

# Start the API server
uvicorn app.main:app --reload --port 8001

API docs available at: http://localhost:8001/docs (Swagger UI)

Frontend (in a separate terminal):

cd frontend

# Install dependencies
pnpm install

# Start the dev server
pnpm dev

App available at: http://localhost:5173 (proxies /api requests to port 8001)

Environment Variables

Create a .env file in the project root:

# Backend
BACKEND_PORT=8001
DEBUG=true

# Frontend (Vite uses VITE_ prefix)
VITE_API_URL=http://localhost:8001
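How the backend consumes these variables is not documented here. As a hypothetical sketch, a small Python settings helper could read them with the defaults shown above. The names `Settings` and `load_settings`, and the accepted truthy spellings, are assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    backend_port: int
    debug: bool


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Hypothetical reader for BACKEND_PORT and DEBUG (defaults match the .env above)."""
    env = os.environ if env is None else env
    return Settings(
        backend_port=int(env.get("BACKEND_PORT", "8001")),
        # Treat common truthy spellings as True; anything else is False.
        debug=env.get("DEBUG", "false").strip().lower() in {"1", "true", "yes"},
    )
```

A `.env` file is not read into the process environment on its own. Something has to load it first, for example python-dotenv or an `env_file` entry in Docker Compose.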

API Reference

All endpoints except the health checks are prefixed with /api. Full interactive documentation is available at /docs when the backend is running.

Specialties

Method	Endpoint	Description
`GET`	`/api/specialties`	List all 20 specialties
`GET`	`/api/specialties/{id}`	Get specialty details (description, features, clinical context)

Data

Method	Endpoint	Description
`POST`	`/api/explore`	Upload CSV or load built-in dataset; returns column stats + class distribution
`POST`	`/api/prepare`	Preprocess data (split, normalize, SMOTE); returns `session_id`

ML Training

Method	Endpoint	Description
`POST`	`/api/train`	Train a model; returns `model_id` + evaluation metrics
`POST`	`/api/compare/{model_id}`	Add model to comparison table
`GET`	`/api/compare/{session_id}`	Get all compared models for a session
`DELETE`	`/api/compare/{session_id}`	Clear comparison table
`GET`	`/api/models/{model_id}`	Get model metadata

Explainability

Method	Endpoint	Description
`GET`	`/api/explain/global/{model_id}`	Global feature importance (top 10 features + clinical names)
`GET`	`/api/explain/patient/{model_id}/{index}`	Single-patient SHAP waterfall explanation
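The plain-language summaries in Step 6 (for example, "High glucose increases diabetes risk by 0.23") can be generated from a patient's per-feature SHAP values. The following is a hypothetical sketch. It assumes the SHAP values have already been computed and that a clinical-name mapping dictionary exists. The function name, the mapping, and the sentence template are illustrative. The unit of each number depends on the explainer's output space; it may be log-odds rather than probability.

```python
from typing import Dict, List


def summarize_contributions(
    shap_values: Dict[str, float],
    clinical_names: Dict[str, str],
    outcome: str,
    top_n: int = 3,
) -> List[str]:
    """Turn the largest per-feature SHAP contributions into short sentences."""
    # Rank features by the magnitude of their contribution, largest first.
    ranked = sorted(shap_values.items(), key=lambda item: abs(item[1]), reverse=True)
    sentences = []
    for feature, value in ranked[:top_n]:
        name = clinical_names.get(feature, feature)  # fall back to the raw column name
        direction = "increases" if value > 0 else "decreases"
        sentences.append(f"{name} {direction} {outcome} risk by {abs(value):.2f}")
    return sentences
```

For instance, `summarize_contributions({"glu": 0.23, "bmi": -0.05}, {"glu": "Glucose", "bmi": "BMI"}, "diabetes")` returns `["Glucose increases diabetes risk by 0.23", "BMI decreases diabetes risk by 0.05"]`.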

Ethics & Certificate

Method	Endpoint	Description
`GET`	`/api/ethics/{model_id}`	Subgroup fairness audit + bias warnings + checklist
`POST`	`/api/ethics/checklist`	Update EU AI Act checklist item
`POST`	`/api/certificate`	Generate and download PDF compliance certificate
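A subgroup audit computes one metric per demographic group and flags the model when the gap between groups exceeds a threshold. Below is a minimal pandas sketch. It assumes sensitivity (recall) as the audited metric and interprets the ">10%" rule as an absolute gap of 0.10 between the best and worst subgroup. The function name and the `y_true` / `y_pred` column names are hypothetical.

```python
import pandas as pd
from sklearn.metrics import recall_score


def subgroup_audit(df: pd.DataFrame, group_col: str, threshold: float = 0.10):
    """Per-subgroup sensitivity plus a bias flag when the best-worst gap exceeds `threshold`.

    Expects hypothetical columns `y_true`, `y_pred`, and `group_col`.
    """
    per_group = {
        group: float(recall_score(part["y_true"], part["y_pred"], zero_division=0))
        for group, part in df.groupby(group_col)
    }
    gap = max(per_group.values()) - min(per_group.values())
    return per_group, gap, gap > threshold
```

Sensitivity is a natural choice for a clinical audit, since a group with lower recall gets more missed diagnoses. The same loop works for any other metric function.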

Health

Method	Endpoint	Description
`GET`	`/`	Status check (`{"status": "ok"}`)
`GET`	`/health`	Health probe (`{"status": "healthy"}`)
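Below is a minimal standard-library client for the read-only `/api` endpoints. It is a sketch under stated assumptions: the base URL defaults to the local backend on port 8001, and responses are decoded as plain JSON without assuming any field names. Request bodies for the POST endpoints are not documented here, so they are left out. `BASE_URL`, `endpoint`, and `get_json` are hypothetical names.

```python
import json
from typing import Any
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://localhost:8001"  # local backend from the Quick Start


def endpoint(*parts: Any) -> str:
    """Build an /api URL, percent-encoding each path segment."""
    return BASE_URL + "/api/" + "/".join(quote(str(part), safe="") for part in parts)


def get_json(url: str, timeout: float = 10.0) -> Any:
    """GET a URL and decode its JSON body (no field names are assumed)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Example calls (require a running backend; model_id comes from POST /api/train):
# specialties = get_json(endpoint("specialties"))
# importance = get_json(endpoint("explain", "global", model_id))
# patient = get_json(endpoint("explain", "patient", model_id, 0))
```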

Testing

The project includes a pytest suite of 178 tests across 6 test files, covering Steps 1-3, 6, and 7 of the pipeline plus PDF certificate generation.

cd backend

# Run all tests
pytest -v

# Run a specific test file
pytest -v tests/test_step1_clinical_context.py

# Run only slow tests (domain context validation)
pytest -v -m slow

Test coverage:

Test File	Covers	Key Assertions
`test_step1_clinical_context.py`	Specialty registry	All 20 specialties present, required fields non-empty, clinical context > 50 chars, 404 handling
`test_step2_data_exploration.py`	Data exploration	CSV upload validation, missing value detection, class distribution, imbalance warnings
`test_step3_data_preparation.py`	Preprocessing	Missing strategies (median/mode/drop), normalization, train/test split, SMOTE, data leakage prevention
`test_step6_explainability.py`	SHAP explanations	Global importance, patient explanation, What-If analysis, sample patient selection
`test_step7_ethics.py`	Fairness audit	Ethics endpoint, case study severity, checklist toggle, bias detection thresholds
`test_certificate.py`	PDF generation	Certificate content type, PDF magic bytes, checklist state persistence

Total: 178 tests — all passing.

Deployment

HuggingFace Spaces

The production deployment runs on HuggingFace Spaces as a Docker container. The multi-stage Dockerfile:

Stage 1 — Builds the React frontend with pnpm
Stage 2 — Installs Python dependencies
Stage 3 — Combines both into a slim Python 3.12 runtime serving the SPA + API on port 7860

hf-space/main_hf.py serves both the FastAPI backend and the static React build from a single process.

Live demo: 0xbatuhan4-healthwithsevgi.hf.space

Branch Strategy

Branch	Purpose
`main`	Production-ready, protected
`develop`	Integration branch for sprint work
`feature/US-XXX`	One branch per user story

Rules:

All changes go through Pull Requests (use the PR template)
PRs require at least 1 approval
main and develop are protected — no direct pushes
PR titles follow the format `type(US-XXX): description`, where type is `feat`, `fix`, or `docs`
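The PR title convention can be checked mechanically, for example in a CI step. The following is a hypothetical regex sketch. It assumes `US-XXX` stands for `US-` followed by one or more digits and that exactly one space follows the colon.

```python
import re

# <type>(US-<number>): <description>, where type is feat, fix, or docs.
PR_TITLE = re.compile(r"(feat|fix|docs)\(US-\d+\): \S.*")


def is_valid_pr_title(title: str) -> bool:
    """Return True if the title matches the team's PR naming convention."""
    return PR_TITLE.fullmatch(title) is not None
```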

Team

Role	Name	Student ID
Product Owner + Developer	Efe Çelik	202128016
UX Designer	Burak Aydoğmuş	202128028
Lead Developer + Scrum Master	Batuhan Bayazıt	202228008
Developer	Berat Mert Gökkaya	202228019
QA / Documentation Lead	Berfin Duru Alkan	202228005

Links

Live Demo: 0xbatuhan4-healthwithsevgi.hf.space
Jira Board: Jira
Figma Designs: Figma
GitHub Wiki: Wiki
API Docs: http://localhost:8001/docs (when running locally)
