A production-ready machine learning pipeline for detecting sleep apnea from ECG signals using real clinical data from PhysioNet's Apnea-ECG Database.
Author: Brian Smith
Year: 2026
This project implements a complete end-to-end system for sleep apnea detection, including:
- Data ingestion from PhysioNet databases
- Signal quality assessment and automated quality control
- Feature extraction (HRV time/frequency domain, QRS morphology)
- Machine learning models for apnea detection
- Patient-level AHI estimation from minute-level predictions
- Comprehensive evaluation framework
Real Clinical Data: 70 overnight ECG recordings from PhysioNet
Automated Quality Control: Signal quality classifier with gating
Rich Feature Engineering: 36 HRV and QRS features per minute
Production-Ready: Modular architecture, model persistence, comprehensive testing
Transparent Evaluation: AUROC, sensitivity, specificity, calibration analysis
| Metric | Value |
|---|---|
| AUROC | 0.755 |
| Sensitivity | 55.8% |
| Specificity | 79.1% |
| Accuracy | 72.7% |
| AHI Correlation | +0.565 |
| AHI MAE | 11.81 events/hour |
Patient a16: True AHI=39.8 → Predicted=39.3 (Error: -0.5)
Patient a18: True AHI=53.3 → Predicted=54.2 (Error: +0.9)
Patient b03: True AHI=9.9 → Predicted=6.5 (Error: -3.4)
Patient b05: True AHI=7.9 → Predicted=9.1 (Error: +1.2)
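The signed errors for these four patients, and a mean absolute error over just these four, can be reproduced with a short sketch. It covers only the examples listed here, so its MAE is not comparable to the 11.81 events/hour reported over the full evaluation set.

```python
# (record, true AHI, predicted AHI) copied from the four examples in this README
examples = [
    ("a16", 39.8, 39.3),
    ("a18", 53.3, 54.2),
    ("b03", 9.9, 6.5),
    ("b05", 7.9, 9.1),
]

# Signed error = predicted - true, rounded to one decimal like the listing
errors = {name: round(pred - true, 1) for name, true, pred in examples}

# Mean absolute error over these four patients only
mae = sum(abs(e) for e in errors.values()) / len(errors)

for name, err in errors.items():
    print(f"Patient {name}: error {err:+.1f}")
print(f"MAE (4 patients): {mae:.2f} events/hour")
```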
- Python 3.8+
- pip
- Install dependencies
pip install -r requirements.txt
- Download PhysioNet data
python scripts/download_physionet.py
This will download 70 ECG recordings (~500MB) to ../data/raw/.
Quick training (5 patients, ~2 minutes):
python scripts/train_apnea_model.py
Full training (22 patients, balanced, ~10 minutes):
python scripts/train_balanced_model.py
Model comparison (Logistic vs Random Forest):
python scripts/compare_models_no_xgb.py
from models.baseline import ApneaClassifier
from features.signal_quality import SignalQualityMetrics
# Load trained model
model = ApneaClassifier.load('apnea_model_balanced.pkl')
# Check signal quality
qm = SignalQualityMetrics(fs=100)
quality = qm.comprehensive_quality_report(ecg_signal)
print(f"Signal Quality: {quality['quality_grade']} ({quality['overall_quality_score']:.1f}/100)")
# Make predictions (if quality is good)
if quality['overall_quality_score'] >= 60:
    predictions = model.predict_proba(features)
    ahi_estimate = (predictions >= 0.5).sum() / (len(predictions) / 60)
    print(f"Estimated AHI: {ahi_estimate:.1f} events/hour")
sleepsignalops/
├── ingestion/                      # Data loading modules
│   ├── wfdb_loader.py              # Generic WFDB loader
│   ├── physionet_apnea.py          # PhysioNet-specific loader
│   └── csv_loader.py               # CSV time-series loader
├── features/                       # Feature extraction
│   ├── signal_quality.py           # Quality metrics (SNR, flatline, etc.)
│   ├── rr_intervals.py             # HRV features (time & frequency)
│   ├── qrs_amplitude.py            # QRS morphology features
│   └── demographic_features.py
├── models/                         # ML models
│   ├── baseline.py                 # Logistic/RF/XGBoost classifiers
│   └── quality_classifier.py
├── evaluation/                     # Evaluation metrics
│   ├── metrics.py                  # AUROC, sensitivity, specificity
│   ├── calibration.py              # ECE, Brier score
│   ├── subgroup.py                 # Quality-stratified analysis
│   └── patient_level.py            # AHI estimation
├── serving/
│   └── quality_gate.py             # Quality-gated inference
├── scripts/                        # Training & testing scripts
│   ├── download_physionet.py
│   ├── validate_pipeline.py
│   ├── test_feature_extraction.py
│   ├── test_quality_gate.py
│   ├── train_apnea_model.py
│   ├── train_balanced_model.py     # Best model
│   └── compare_models_no_xgb.py
├── tests/                          # Unit tests
│   └── test_signal_quality.py
├── requirements.txt
├── LICENSE
└── README.md
- Data Ingestion: Load ECG signals and apnea annotations from PhysioNet
- Signal Quality Assessment: Compute SNR, flatline %, outliers, artifacts
- Feature Extraction: Extract HRV and QRS features in 60-second windows
- Quality Gating: Reject predictions on poor-quality signals
- Apnea Classification: Predict apnea minute-by-minute
- AHI Estimation: Aggregate to patient-level AHI
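The windowing and quality-gating steps above can be sketched as follows. This is a minimal illustration, not the code in `serving/quality_gate.py`: the function names, the toy quality score, and the toy predictor are all hypothetical. Only the 100 Hz sampling rate, the 60-second window, and the quality threshold of 60 are taken from elsewhere in this README.

```python
from typing import Callable, List, Optional, Sequence


def split_into_minutes(signal: Sequence[float], fs: int = 100) -> List[Sequence[float]]:
    """Cut a signal into non-overlapping 60-second windows, dropping any partial tail."""
    n = fs * 60
    return [signal[i:i + n] for i in range(0, len(signal) - n + 1, n)]


def gated_predict(
    windows: List[Sequence[float]],
    quality_fn: Callable[[Sequence[float]], float],
    predict_fn: Callable[[Sequence[float]], float],
    min_quality: float = 60.0,
) -> List[Optional[float]]:
    """Return one probability per window, or None where quality is below the gate."""
    return [predict_fn(w) if quality_fn(w) >= min_quality else None for w in windows]


# Toy stand-ins (hypothetical, for illustration only)
def toy_quality(window: Sequence[float]) -> float:
    # Treat a flatline as unusable, anything else as perfect
    return 0.0 if max(window) == min(window) else 100.0


def toy_predict(window: Sequence[float]) -> float:
    return 0.5


# One varying minute, one flatline minute, and a 5-second tail that is dropped
signal = [float(i % 7) for i in range(6000)] + [0.0] * 6000 + [1.0] * 500
windows = split_into_minutes(signal, fs=100)
print(gated_predict(windows, toy_quality, toy_predict))
```

Returning `None` for rejected minutes, rather than dropping them, keeps the output aligned with the minute index so that later aggregation can decide how to treat gaps.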
HRV Time-Domain (5 features):
- SDNN, RMSSD, pNN50, mean HR, CV
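As a rough sketch of how these five features are commonly defined, the snippet below computes them from a list of RR intervals in milliseconds. It is not the code in `features/rr_intervals.py`; the function name, the use of the sample standard deviation, and the mean-HR approximation are assumptions.

```python
import math
from statistics import mean, stdev


def hrv_time_domain(rr_ms):
    """Compute five time-domain HRV features from RR intervals in milliseconds."""
    # Successive differences between adjacent RR intervals
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    mean_rr = mean(rr_ms)
    sdnn = stdev(rr_ms)  # sample standard deviation (assumption)
    rmssd = math.sqrt(mean(d * d for d in diffs))
    # Percentage of successive differences larger than 50 ms
    pnn50 = 100.0 * sum(abs(d) > 50 for d in diffs) / len(diffs)
    return {
        "sdnn": sdnn,
        "rmssd": rmssd,
        "pnn50": pnn50,
        "mean_hr": 60000.0 / mean_rr,  # beats per minute, from the mean RR interval
        "cv": sdnn / mean_rr,          # coefficient of variation of RR
    }


hrv = hrv_time_domain([800, 810, 790, 900, 800])
print({name: round(value, 2) for name, value in hrv.items()})
```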
HRV Frequency-Domain (5 features):
- VLF, LF, HF power, total power, LF/HF ratio
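The band powers can be illustrated with a plain periodogram. The sketch below assumes the RR series has already been resampled to an even rate (4 Hz here) and uses the short-term band edges from the Task Force (1996) guidelines cited in this README. The function name and the direct DFT are illustrative choices; the project's own estimator may differ (Welch's method is a common alternative).

```python
import cmath
import math

# Band edges in Hz (Task Force 1996 short-term recommendations)
BANDS = {"vlf": (0.003, 0.04), "lf": (0.04, 0.15), "hf": (0.15, 0.40)}


def band_powers(x, fs):
    """Periodogram band powers of an evenly sampled series x (sampled at fs Hz)."""
    n = len(x)
    mu = sum(x) / n
    x = [v - mu for v in x]  # remove the mean so it does not leak into VLF
    powers = {name: 0.0 for name in BANDS}
    for k in range(1, n // 2 + 1):
        f = k * fs / n  # frequency of DFT bin k
        X = sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(x))
        psd = 2.0 * abs(X) ** 2 / (fs * n)  # one-sided PSD estimate
        for name, (lo, hi) in BANDS.items():
            if lo <= f < hi:
                powers[name] += psd * fs / n  # PSD times bin width
    powers["total"] = powers["vlf"] + powers["lf"] + powers["hf"]
    powers["lf_hf"] = powers["lf"] / powers["hf"] if powers["hf"] > 0 else float("inf")
    return powers


# Synthetic check: a unit-amplitude 0.25 Hz oscillation sampled at 4 Hz for 60 s
fs = 4
series = [math.sin(2 * math.pi * 0.25 * i / fs) for i in range(60 * fs)]
bp = band_powers(series, fs)
print({name: round(value, 3) for name, value in bp.items()})
```

With a 60-second window the frequency resolution is about 0.017 Hz, so only one or two bins fall inside the VLF band and that estimate is correspondingly coarse.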
QRS Amplitude (26 features):
- R-peak amplitude (mean, std, min, max, CV)
- QRS width, area, morphology variability
- Signal Quality Classifier: Random Forest (100% accuracy)
- Apnea Detector: Random Forest (AUROC: 0.755)
- Alternative: Logistic Regression, XGBoost (optional)
python scripts/validate_pipeline.py
python scripts/test_feature_extraction.py
python scripts/test_quality_gate.py
from models.baseline import ApneaClassifier
import numpy as np
# Load model
model = ApneaClassifier.load('apnea_model_balanced.pkl')
# Your features (36 features per minute)
features = np.array([...]) # Shape: (n_minutes, 36)
# Predict
apnea_probabilities = model.predict_proba(features)
apnea_predictions = (apnea_probabilities >= 0.5).astype(int)
# Estimate AHI
duration_hours = len(features) / 60
ahi = apnea_predictions.sum() / duration_hours
print(f"Estimated AHI: {ahi:.1f} events/hour")
- Feature Availability
  - Only ECG/HRV features available
  - Missing SpO2 (oxygen desaturation - strongest apnea indicator)
  - Missing respiratory effort signals
  - Missing sleep stage information
- Model Performance
  - AHI MAE: 11.81 events/hour (moderate error)
  - Severity agreement: 38.5% (room for improvement)
  - Some patients have large prediction errors
- Clinical Use
  - NOT approved for clinical use
  - Research and educational purposes only
  - Clinical systems use multi-modal signals (ECG + SpO2 + respiratory)
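For context on the severity agreement figure, a metric of this kind can be computed by mapping each AHI to a severity class and counting matches. The sketch below assumes the commonly used adult cut-offs (below 5 normal, 5 to 15 mild, 15 to 30 moderate, 30 and above severe); the thresholds and function names are assumptions and may not match `evaluation/patient_level.py`. The AHI values are made up for illustration.

```python
def ahi_severity(ahi):
    """Map an AHI (events/hour) to a severity class using common adult cut-offs."""
    if ahi < 5:
        return "normal"
    if ahi < 15:
        return "mild"
    if ahi < 30:
        return "moderate"
    return "severe"


def severity_agreement(true_ahi, pred_ahi):
    """Percentage of patients whose predicted severity class matches the true one."""
    matches = sum(ahi_severity(t) == ahi_severity(p) for t, p in zip(true_ahi, pred_ahi))
    return 100.0 * matches / len(true_ahi)


# Hypothetical values, not results from this project
true_ahi = [3.0, 12.0, 22.0, 40.0]
pred_ahi = [6.0, 10.0, 31.0, 45.0]
agreement = severity_agreement(true_ahi, pred_ahi)
print(f"Severity agreement: {agreement:.1f}%")
```

Because the classes are defined by hard cut-offs, a small AHI error near a boundary (3.0 vs 6.0 above) counts as a full disagreement, which is one reason this metric can look low even when the MAE is moderate.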
Sleep apnea detection from ECG alone is inherently limited because:
- Apnea is primarily a respiratory event (breathing cessation)
- ECG captures cardiac response (secondary effect)
- SpO2 desaturation is the gold standard indicator
- Clinical systems use ECG + SpO2 + respiratory effort + sleep staging
This project demonstrates best practices for ECG-only detection but acknowledges these fundamental limitations.
- Add SpO2 Features (if data available)
  - Oxygen desaturation detection
  - Desaturation index calculation
  - Would significantly improve performance
- Episode-Level Detection
  - Detect apnea episodes (sequences) instead of minutes
  - More clinically meaningful
  - Better signal-to-noise ratio
- Deep Learning Models
  - CNN-BiLSTM for temporal patterns
  - Transformer with attention
  - Learn features automatically from raw signals
- Cross-Validation
  - K-fold CV on all 70 records
  - More robust performance estimates
- Threshold Optimization
  - ROC curve analysis
  - Find optimal operating point
- Calibration Improvement
  - Platt scaling or isotonic regression
- API Deployment
  - FastAPI endpoints
  - Docker containerization
  - Streamlit dashboard
- Additional Datasets
  - Test on other PhysioNet databases
  - Cross-dataset validation
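As one possible starting point for the threshold optimization item, the sketch below scans candidate thresholds and keeps the one that maximizes Youden's J (sensitivity + specificity - 1). Youden's J is only one of several operating-point criteria, and the function name and toy data are hypothetical. In practice the threshold would be chosen on a validation split, not on the test records.

```python
def best_threshold(y_true, y_prob):
    """Return (threshold, sensitivity, specificity) maximizing Youden's J."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    best = (None, 0.0, 0.0, -1.0)
    # Every distinct predicted probability is a candidate threshold
    for t in sorted(set(y_prob)):
        tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= t)
        tn = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p < t)
        sens, spec = tp / pos, tn / neg
        j = sens + spec - 1.0
        if j > best[3]:  # ties keep the lowest threshold
            best = (t, sens, spec, j)
    return best[:3]


# Hypothetical minute-level labels and probabilities
y_true = [0, 0, 0, 0, 0, 1, 1, 1, 1]
y_prob = [0.1, 0.2, 0.3, 0.35, 0.6, 0.4, 0.7, 0.8, 0.9]
thr, sens, spec = best_threshold(y_true, y_prob)
print(f"threshold={thr}, sensitivity={sens:.2f}, specificity={spec:.2f}")
```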
Run unit tests:
pytest tests/
Run specific test:
pytest tests/test_signal_quality.py -v
- PhysioNet Apnea-ECG Database
- https://physionet.org/content/apnea-ecg/1.0.0/
- 70 overnight ECG recordings
- Expert-annotated apnea events
- Goldberger et al. (2000)
- Penzel T, et al. "The Apnea-ECG Database" (2000)
- Task Force of ESC/NASPE. "Heart rate variability" (1996)
- Mendez MO, et al. "Sleep apnea screening by autoregressive models" (2007)
- Python 3.8+
- scikit-learn, pandas, numpy
- WFDB Python package
- PhysioNet databases
This project is licensed under the MIT License - see the LICENSE file for details.
- PhysioNet for providing the Apnea-ECG Database
- WFDB team for the excellent Python package
- scikit-learn community for ML tools
If you use this code in your research, please cite:
@software{sleepsignalops2026,
  title={SleepSignalOps: Sleep Apnea Detection from ECG Signals},
  author={Brian Smith},
  year={2026},
  url={https://github.com/B3smoove/SleepSignalOps}
}
Built with real data. Evaluated rigorously. Documented thoroughly.