HealthPlatform 🫀

Intelligent health analytics from your Apple Watch — powered by LSTM + XGBoost + GPT-4o

HealthPlatform is a full-stack AI health analytics application that ingests personal wearable data from Apple Watch, processes it through a multi-stage machine learning pipeline, and delivers personalised insights through an interactive Bento Grid dashboard and a GPT-4o conversational assistant.

✨ Features

Feature	Description
📱 Apple Health parser	Streaming XML parser handles `export.xml` files of 800 MB and more in roughly constant memory (~50 MB)
🔄 Universal pipeline	Auto-detects columns from any health CSV via fuzzy matching (SequenceMatcher ≥ 75%)
🤖 Hybrid ML model	Bidirectional LSTM + XGBoost ensemble with 88.4% accuracy on real Apple Watch data from 24 people
🛡️ Data validation	4-layer validation that computes a Trust Score (0–1) and blocks bad data before prediction
💬 GPT-4o assistant	Health context + mood journal injected into LLM for personalised conversational advice
📊 Bento Grid dashboard	Glassmorphism UI with expandable bar charts, Activity Rings, BMI scale, and insights
👤 User profile	Body metrics, fitness goals, allergies, emotional wellness journal
💡 Health Tips	15 personalised tips across 5 categories with completion tracking
🔐 JWT auth	Secure login with bcrypt-hashed passwords and 7-day token sessions

🖥️ Screenshots

Dashboard · Import · Profile · Health Tips

🏗️ Architecture

HealthPlatform/
├── app/
│   ├── api/
│   │   └── routes/
│   │       ├── upload.py       
│   │       ├── predict.py      
│   │       ├── analysis.py    
│   │       ├── chat.py        
│   │       └── auth.py        
│   ├── core/
│   │   ├── parser.py           
│   │   ├── universal_data_pipeline.py 
│   │   └── data_validator.py  
│   ├── models/
│   │   ├── lstm_model.py      
│   │   ├── xgb_model.py      
│   │   └── hybrid_model.py    
│   └── services/
│       ├── auth_service.py    
│       └── gpt_service.py      
├── frontend/
│   └── src/
│       └── App.jsx            
├── data/
│   └── parsed_health_120days.csv 
└── run.py

🔄 Data Pipeline

Apple Health XML / CSV
        │
        ▼
┌───────────────┐
│  1. Parse     │  Streaming iterparse — constant ~50 MB memory for 800 MB files
└──────┬────────┘
       │
       ▼
┌───────────────┐
│  2. Validate  │  4-layer check → Trust Score 0.0–1.0
│               │  Types · Ranges · Semantics · Identity
└──────┬────────┘
       │  trust < 0.45 → BLOCKED
       ▼
┌───────────────┐
│  3. Pipeline  │  Schema detect → Column map (fuzzy) → Feature engineer
└──────┬────────┘
       │
       ▼
┌───────────────────────────────────┐
│  17-dimensional feature vector    │
│  HR · Steps · Sleep · Energy ·    │
│  BMI · fatigue_index · ...        │
└──────┬────────────────────────────┘
       │
       ├──────────────────┐
       ▼                  ▼
┌────────────┐    ┌────────────────┐
│  XGBoost   │    │  LSTM (30h     │
│  snapshot  │    │  window)       │
│  84.7%     │    │  86.2%         │
└──────┬─────┘    └──────┬─────────┘
       │                  │
       └────────┬─────────┘
                ▼
       ┌────────────────┐
       │  Hybrid 70/30  │  88.4% accuracy · F1 = 0.89
       └────────┬───────┘
                │
                ▼
       Dashboard + GPT-4o Chat
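The diagram gives the LSTM a 30-hour window (matching `SEQUENCE_LENGTH=30`) while XGBoost works on a single snapshot. A minimal sketch of how such windows could be cut from time-ordered hourly rows, assuming each row is already a 17-value feature vector; `make_windows` is a hypothetical helper, not the project's code:

```python
from typing import List, Sequence, Tuple

Row = Sequence[float]  # one hourly feature vector (17 values in the README)


def make_windows(rows: Sequence[Row], seq_len: int = 30) -> List[Tuple[List[Row], Row]]:
    """Cut time-ordered rows into (window, snapshot) pairs.

    The window holds seq_len consecutive rows for a sequence model; the
    snapshot is the window's last row, i.e. what a tabular model would see.
    Hypothetical helper for illustration only.
    """
    if seq_len < 1:
        raise ValueError("seq_len must be >= 1")
    pairs = []
    # Slide one row at a time; the first complete window ends at index seq_len - 1.
    for end in range(seq_len, len(rows) + 1):
        window = list(rows[end - seq_len:end])
        pairs.append((window, window[-1]))
    return pairs


hourly = [[float(h)] * 17 for h in range(48)]  # 48 fake hourly rows
pairs = make_windows(hourly)
print(len(pairs))       # 19
print(pairs[0][1][0])   # 29.0: the first window ends at hour 29
```

Consecutive windows overlap by 29 rows, so n rows yield n − 29 windows, and fewer than 30 rows yield none.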

🧠 Multimodal Data Analysis

HealthPlatform combines 6 data modalities; the first five are encoded into a unified 17-dimensional feature vector:

Modality	Signals	Role
Physiological	HeartRate, HRV, BloodOxygen	Primary LSTM temporal input
Activity	Steps, ActiveEnergy, Distance	XGBoost top features
Sleep	Awake / Light / REM / Deep	Encoded as `sleep_stage` + `fatigue_index`
Body composition	Weight, Height, BMI	Forward-filled, personalises advice
Workout events	Type, Duration, Energy	Encoded as `workout_encoded`, `is_workout`
User-reported	Mood emoji, journal, goal	Injected into GPT-4o system prompt

The last modality (qualitative, user-reported) is not fed into the ML model. Instead, it enables multimodal reasoning at the language-model level: physiological data plus mood state yields personalised natural-language advice.
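A minimal sketch of how numeric health context and the user-reported modality could be combined into one system prompt. The function name, field names and wording are illustrative assumptions; the project's real prompt (presumably built in `gpt_service.py`) is not shown in this README:

```python
from typing import Mapping, Optional


def build_system_prompt(
    stats: Mapping[str, float],
    mood: Optional[str] = None,
    journal: Optional[str] = None,
    goal: Optional[str] = None,
) -> str:
    """Combine pipeline statistics with user-reported context (illustrative only)."""
    lines = [
        "You are a health assistant. Use the data below to give personalised advice.",
        "Recent health summary:",
    ]
    # Numeric modalities, one metric per line, sorted for a stable prompt.
    for name in sorted(stats):
        lines.append(f"- {name}: {stats[name]}")
    # Qualitative modality: each part is added only if the user provided it.
    if mood:
        lines.append(f"Current mood: {mood}")
    if journal:
        lines.append(f"Journal entry: {journal.strip()}")
    if goal:
        lines.append(f"Goal: {goal}")
    return "\n".join(lines)


prompt = build_system_prompt({"steps": 8450, "avg_heart_rate": 72.5}, mood="tired", goal="sleep better")
print(prompt)
```

The returned string would be sent as the `system` message of a chat request, with the user's question as the `user` message.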

🛡️ Data Validation — Trust Score

Every uploaded file passes through 4 validation layers before the model runs:

Layer 1 — Type validation     (weight 20%)
  • All numeric columns parseable
  • datetime column valid

Layer 2 — Range validation     (weight 30%)
  • HeartRate: 28–220 bpm
  • BloodOxygen: 70–100 %
  • Steps: 0–80,000
  • Weight: 20–300 kg
  • ... 10 more physiological bounds

Layer 3 — Semantic validation  (weight 35%)
  • Minimum 24 rows (hourly records)
  • At least 6 distinct hours
  • HR standard deviation > 2 bpm
  • No impossible HR jumps > 60 bpm

Layer 4 — Identity check       (weight 15%)
  • Rejects research datasets (patient_id, os_months, fiber, collagen…)
  • Rejects multi-user data (> 5 unique IDs)
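Layers 2 and 3 reduce to simple checks over the columns. A sketch using only the bounds and rules listed above; scoring each layer as the fraction of passing values or checks is an assumption, as is reading "distinct hours" as hour-of-day values (0–23). `range_score` and `semantic_score` are hypothetical names:

```python
import statistics
from typing import Dict, List, Optional, Sequence

# Bounds copied from the README; the other ten bounds are not listed there.
RANGE_BOUNDS = {
    "HeartRate": (28, 220),
    "BloodOxygen": (70, 100),
    "Steps": (0, 80_000),
    "Weight": (20, 300),
}


def range_score(columns: Dict[str, Sequence[Optional[float]]]) -> float:
    """Layer 2 sketch: fraction of present values that fall inside their bounds."""
    checked = passed = 0
    for name, values in columns.items():
        if name not in RANGE_BOUNDS:
            continue  # columns without a known bound are not range-checked
        lo, hi = RANGE_BOUNDS[name]
        for v in values:
            if v is None:  # missing values are not range violations
                continue
            checked += 1
            passed += lo <= v <= hi
    return passed / checked if checked else 1.0


def semantic_score(hours: Sequence[int], heart_rate: Sequence[float]) -> float:
    """Layer 3 sketch: share of the four semantic checks that pass."""
    hr_diffs = [abs(b - a) for a, b in zip(heart_rate, heart_rate[1:])]
    checks: List[bool] = [
        len(hours) >= 24,                                            # at least 24 hourly rows
        len(set(hours)) >= 6,                                        # at least 6 distinct hours
        len(heart_rate) >= 2 and statistics.stdev(heart_rate) > 2,  # HR actually varies
        all(d <= 60 for d in hr_diffs),                              # no jump > 60 bpm between rows
    ]
    return sum(checks) / len(checks)
```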

Trust Score = weighted average of all layers (0.0 – 1.0)
  ≥ 0.80  →  high    — full confidence predictions
  ≥ 0.60  →  medium  — predictions with warnings
  ≥ 0.45  →  low     — predictions with capped confidence
  < 0.45  →  BLOCKED — model does not run
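Because the four weights sum to 1.0, the weighted sum of the layer scores is already a weighted average. A sketch of the scoring and thresholds above, assuming each layer has already produced a score in [0, 1]; `trust_score` and the layer keys are hypothetical names:

```python
from typing import Dict, Tuple

# Layer weights as stated in the README (they sum to 1.0).
LAYER_WEIGHTS = {"types": 0.20, "ranges": 0.30, "semantics": 0.35, "identity": 0.15}


def trust_score(layer_scores: Dict[str, float]) -> Tuple[float, str]:
    """Weighted average of per-layer scores in [0, 1], plus the confidence level."""
    missing = LAYER_WEIGHTS.keys() - layer_scores.keys()
    if missing:
        raise ValueError(f"missing layer scores: {sorted(missing)}")
    score = sum(w * layer_scores[name] for name, w in LAYER_WEIGHTS.items())
    if score >= 0.80:
        level = "high"      # full-confidence predictions
    elif score >= 0.60:
        level = "medium"    # predictions with warnings
    elif score >= 0.45:
        level = "low"       # predictions with capped confidence
    else:
        level = "blocked"   # model does not run
    return score, level


print(trust_score({"types": 1, "ranges": 0.5, "semantics": 0.4, "identity": 1}))
```

In the example, a file with clean types and identity but half its values out of range and weak semantics lands at about 0.64, i.e. "medium".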

📊 Model Performance

Dataset	Model	Accuracy	F1 Macro
Synthetic (15,000 rows)	Hybrid	99.8%	0.99
Weka Apple Watch (24 people)	LSTM	86.2%	0.87
Weka Apple Watch (24 people)	XGB	84.7%	0.85
Weka Apple Watch (24 people)	Hybrid	88.4%	0.89
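The pipeline describes the hybrid as a 70/30 combination but does not say which model receives the larger share or exactly what is combined. The sketch below assumes weighted soft voting over the two models' class probabilities, with 70% on the LSTM as an adjustable default; `hybrid_predict` is a hypothetical name:

```python
from typing import List, Sequence, Tuple


def hybrid_predict(
    p_lstm: Sequence[float],
    p_xgb: Sequence[float],
    w_lstm: float = 0.7,
) -> Tuple[int, List[float]]:
    """Weighted soft voting over two class-probability vectors.

    w_lstm = 0.7 is an assumption about how the 70/30 split is assigned.
    """
    if len(p_lstm) != len(p_xgb):
        raise ValueError("both models must score the same classes")
    if not 0.0 <= w_lstm <= 1.0:
        raise ValueError("w_lstm must be in [0, 1]")
    blended = [w_lstm * a + (1.0 - w_lstm) * b for a, b in zip(p_lstm, p_xgb)]
    # Predicted class = index of the highest blended probability.
    best = max(range(len(blended)), key=blended.__getitem__)
    return best, blended


label, probs = hybrid_predict([0.2, 0.7, 0.1], [0.6, 0.3, 0.1])
print(label)  # 1
print(hybrid_predict([0.2, 0.7, 0.1], [0.6, 0.3, 0.1], w_lstm=0.3)[0])  # 0
```

Since both inputs are probability distributions and the weights sum to 1, the blend is also a distribution; the two calls show that moving the weight can flip the predicted class.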
Stress test (15% missing data)	Hybrid	43.9%	0.43

🚀 Getting Started

Prerequisites

Python 3.10+
Node.js 18+
OpenAI API key (for chat feature)

Installation

# 1. Clone
git clone https://github.com/yourusername/HealthPlatform.git
cd HealthPlatform

# 2. Backend
pip install -r requirements.txt

# 3. Environment
cp .env.example .env
# Edit .env: set SECRET_KEY and add your OpenAI API key

# 4. Frontend
cd frontend
npm install
cd ..

Running

# Terminal 1 — Backend (port 8000)
python run.py

# Terminal 2 — Frontend (port 5173)
cd frontend && npm run dev

Open http://localhost:5173

Environment variables

SECRET_KEY=your-secret-key-here
OPENAI_API_KEY=sk-...
DATABASE_URL=sqlite:///./users.db
SEQUENCE_LENGTH=30
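A minimal stdlib sketch of reading these variables at startup, with defaults taken from the example values. `load_settings` and the `Settings` fields are hypothetical; the project's actual configuration code is not shown in this README:

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    secret_key: str
    openai_api_key: Optional[str]
    database_url: str
    sequence_length: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the variables listed above; defaults mirror the example values."""
    secret = env.get("SECRET_KEY")
    if not secret:
        # Tokens signed with a missing or empty key would be insecure, so fail fast.
        raise RuntimeError("SECRET_KEY must be set")
    return Settings(
        secret_key=secret,
        openai_api_key=env.get("OPENAI_API_KEY"),  # only needed for the chat feature
        database_url=env.get("DATABASE_URL", "sqlite:///./users.db"),
        sequence_length=int(env.get("SEQUENCE_LENGTH", "30")),
    )
```

Passing a plain dict instead of `os.environ` makes the loader easy to test.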

📡 API Endpoints

Method	Endpoint	Description
`POST`	`/api/v1/auth/register`	Create account → JWT token
`POST`	`/api/v1/auth/login`	Sign in → JWT token
`PATCH`	`/api/v1/auth/update`	Update name / email
`POST`	`/api/v1/auth/change-password`	Change password (bcrypt)
`POST`	`/api/v1/upload`	Upload health file → validate → parse
`GET`	`/api/v1/analysis/{user_id}`	Health stats + chart data
`POST`	`/api/v1/predict/{user_id}`	Run LSTM + XGB → activity prediction
`POST`	`/api/v1/chat`	GPT-4o health assistant
`GET`	`/health`	Server health check

Interactive docs: http://localhost:8000/docs
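For scripted access, requests can be built with the standard library alone. The JSON field names for login and the `Bearer` authorization scheme are assumptions, and `login_request` / `analysis_request` are hypothetical helpers; check the interactive docs for the real request and response schemas:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"


def login_request(email: str, password: str) -> urllib.request.Request:
    """POST /api/v1/auth/login; the JSON field names are assumptions."""
    body = json.dumps({"email": email, "password": password}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/auth/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analysis_request(user_id: str, token: str) -> urllib.request.Request:
    """GET /api/v1/analysis/{user_id} with the JWT as a Bearer token (assumed scheme)."""
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/analysis/{user_id}",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


# With the backend running, send a request and decode the JSON reply:
# with urllib.request.urlopen(login_request("me@example.com", "secret")) as resp:
#     payload = json.load(resp)
```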

🧪 Supported File Formats

Format	Parser	Contents
`export.xml`	Streaming iterparse (800 MB+)	All Apple Health metrics
`export_cda.xml`	Regex chunked (multi-root XML)	HR, Weight, Height (HL7/CDA)
`*.csv`	pandas + fuzzy column mapping	Any smartwatch export
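A common way to stream a very large `export.xml` is `iterparse` plus clearing each element once it has been read, so memory stays roughly constant instead of growing with the file. A minimal sketch; `iter_records` is a hypothetical name, while the `Record` element and its `type`, `startDate`, `value` and `unit` attributes follow the Apple Health export format:

```python
import io
import xml.etree.ElementTree as ET
from typing import IO, Iterator, Optional, Set, Union


def iter_records(
    source: Union[str, IO[bytes]],
    wanted_types: Optional[Set[str]] = None,
) -> Iterator[dict]:
    """Stream <Record> elements from an Apple Health export.xml."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # first event is the start of <HealthData>
    for event, elem in context:
        if event != "end" or elem.tag != "Record":
            continue
        rtype = elem.get("type")
        if wanted_types is None or rtype in wanted_types:
            yield {
                "type": rtype,
                "start": elem.get("startDate"),
                "value": elem.get("value"),
                "unit": elem.get("unit"),
            }
        # Free the record and drop the root's references to processed children.
        elem.clear()
        root.clear()


SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<HealthData locale="en_US">
 <Record type="HKQuantityTypeIdentifierHeartRate" unit="count/min" startDate="2026-01-01 08:00:00 +0000" value="72"/>
 <Record type="HKQuantityTypeIdentifierStepCount" unit="count" startDate="2026-01-01 08:00:00 +0000" value="120"/>
 <Record type="HKQuantityTypeIdentifierHeartRate" unit="count/min" startDate="2026-01-01 09:00:00 +0000" value="80"><MetadataEntry key="k" value="v"/></Record>
</HealthData>"""

heart = list(iter_records(io.BytesIO(SAMPLE), {"HKQuantityTypeIdentifierHeartRate"}))
print([r["value"] for r in heart])  # ['72', '80']
```

For a real export, pass the file path (e.g. `iter_records("export.xml")`) instead of the in-memory sample.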

The universal pipeline accepts columns named, for example, `HeartRate`, `heart_rate`, `Heart Rate (BPM)` or `HR`; fuzzy matching maps a header to a known column when the similarity is at least 75%.
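A sketch of threshold-based column mapping with `difflib.SequenceMatcher`. A plain ratio scores `hr` against `heartrate` at only about 0.36, so this version compares normalised headers against a small alias list; the `ALIASES` table and `map_columns` are hypothetical, and how the project itself resolves abbreviations is not stated here:

```python
import re
from difflib import SequenceMatcher
from typing import Dict, Iterable

# Hypothetical alias table: canonical column -> normalised spellings to compare against.
ALIASES = {
    "HeartRate": ["heartrate", "hr", "heartratebpm", "pulse"],
    "Steps": ["steps", "stepcount"],
    "ActiveEnergy": ["activeenergy", "activeenergyburned", "calories"],
}


def normalise(name: str) -> str:
    """Lower-case and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def map_columns(columns: Iterable[str], threshold: float = 0.75) -> Dict[str, str]:
    """Map raw CSV headers to canonical names when the best similarity >= threshold."""
    mapping = {}
    for col in columns:
        key = normalise(col)
        best_name, best_score = None, 0.0
        for canonical, aliases in ALIASES.items():
            for alias in aliases:
                score = SequenceMatcher(None, key, alias).ratio()
                if score > best_score:
                    best_name, best_score = canonical, score
        if best_name is not None and best_score >= threshold:
            mapping[col] = best_name
    return mapping


print(map_columns(["Heart Rate (BPM)", "HR", "heart_rte", "Step Count", "Timestamp"]))
# {'Heart Rate (BPM)': 'HeartRate', 'HR': 'HeartRate', 'heart_rte': 'HeartRate', 'Step Count': 'Steps'}
```

The misspelt `heart_rte` still maps (similarity ≈ 0.94 to `heartrate`), while `Timestamp` stays unmapped because it is not similar to any alias.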

🛠️ Tech Stack

Layer	Technology
ML — Sequential	TensorFlow / Keras — Bidirectional LSTM
ML — Tabular	XGBoost
ML — Preprocessing	scikit-learn StandardScaler
Backend	FastAPI + Uvicorn
Auth	python-jose (JWT) + bcrypt
Database	SQLAlchemy + SQLite
AI Chat	OpenAI API — GPT-4o
Frontend	React 18 + Vite
Styling	Glassmorphism + Bento Grid + Playfair Display / Nunito
XML Parsing	xml.etree.ElementTree iterparse
Data	pandas + numpy

📁 Data

The repository includes data/parsed_health_120days.csv — 2,802 real hourly records from Apple Watch (December 2025 – April 2026) used for model fine-tuning and evaluation.

⚠️ This file contains anonymised personal health data. Do not use it for purposes other than model evaluation.

🗺️ Roadmap

Fine-tune models on individual user data for personalisation
Data Mining module — clustering, anomaly detection, weekly patterns
Weekly trend analysis with longitudinal LSTM window
Docker + cloud deployment (AWS / GCP) with PostgreSQL
Mobile PWA with push notifications for daily health tips
Sleep stage detailed breakdown and sleep debt tracker

📄 License

MIT License — see LICENSE for details.

👤 Author

Nataliia Nechyporenko

GitHub: @Ffightik
Email: nechyporenkonatalia31@gmail.com

Built with ❤️ and a lot of Apple Watch data
