A research codebase for building and evaluating time-series models for neonatal sepsis detection.
Features Federated Learning simulations, Secure Aggregation, and Transformer-based architectures.
Neonatal-Sepsis addresses the critical challenge of early sepsis detection in neonates using time-series clinical data. This repository implements a complete pipeline that allows researchers to:
- Preprocess raw clinical logs (pipe-separated values) into deep-learning-ready tensors.
- Train state-of-the-art baselines (Transformers, GRU-D for missing data).
- Simulate a Federated Learning environment to preserve patient privacy.
- Visualize predictions via an interactive web dashboard.
- Key Features
- Repository Structure
- System Architecture
- Dataset & Format
- Installation
- Usage
- Evaluation Results
- Contributors
- Contact
| Component | Description |
|---|---|
| Preprocessing | Parallelized pipeline converting .psv to .pt objects. |
| Model | Includes Transformers and GRU-D (handling missingness via decay). |
| Federated Learning | Simulation of Server-Client architecture with local networking. |
| Privacy PoC | Secure Aggregation Proof-of-Concept using additive masking. |
| Visualization | Complete dashboard for AUROC/AUPRC metrics and real-time inference. |
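As a rough illustration of the additive-masking idea named in the Privacy PoC row, the sketch below has each pair of clients share a random mask that one client adds and the other subtracts, so all masks cancel in the server's sum. It is not taken from `secure_agg_poc.py`; the function names, the modulus, and the integer-encoded updates are assumptions made for this example, and a seeded generator stands in for real pairwise key agreement.

```python
import random

MODULUS = 2**31 - 1  # assumed modulus; updates are treated as integers mod MODULUS


def pairwise_masks(num_clients, dim, seed=0):
    """Return one mask vector per client; the masks sum to zero mod MODULUS."""
    rng = random.Random(seed)  # stand-in for secrets shared between client pairs
    masks = [[0] * dim for _ in range(num_clients)]
    for i in range(num_clients):
        for j in range(i + 1, num_clients):
            shared = [rng.randrange(MODULUS) for _ in range(dim)]
            for k in range(dim):
                masks[i][k] = (masks[i][k] + shared[k]) % MODULUS  # client i adds
                masks[j][k] = (masks[j][k] - shared[k]) % MODULUS  # client j subtracts
    return masks


def mask_update(update, mask):
    """Hide a client's update behind its mask before sending it to the server."""
    return [(u + m) % MODULUS for u, m in zip(update, mask)]


def aggregate(masked_updates):
    """Sum masked updates; the pairwise masks cancel, leaving the true sum."""
    dim = len(masked_updates[0])
    return [sum(vec[k] for vec in masked_updates) % MODULUS for k in range(dim)]


# Three hypothetical clients with small integer updates.
updates = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
masks = pairwise_masks(len(updates), dim=3)
masked = [mask_update(u, m) for u, m in zip(updates, masks)]
total = aggregate(masked)
print(total)  # [111, 222, 333]
```

The server only ever sees the `masked` vectors, which look random individually, yet their sum equals the sum of the raw updates. Real-valued model weights would need to be quantized to integers first, which this sketch leaves out.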
Neonatal-Sepsis/
├── app.py # Streamlit entry point
├── app_pages/ # Dashboard UI pages
│ ├── 1_Project_Summary.py
│ ├── 2_Predict.py
│ └── 3_Model_Metrics.py
├── src/
│ ├── parallel_preprocess.py # Data cleaning pipeline
│ ├── model.py # Transformer Architecture
│ ├── model_grud.py # GRU-D Architecture
│ ├── fl_server.py # Federated Server Logic
│ ├── fl_client.py # Federated Client Logic
│ └── secure_agg_poc.py # Privacy Preservation Logic
├── data/ # Dataset storage (Gitignored)
└── requirements.txt # Python dependencies
graph TD
A[Raw Clinical Data .psv] -->|Parallel Preprocess| B(PyTorch Tensors .pt)
B --> C{Training Mode}
C -->|Local| D[Train Baseline<br>Transformer / GRU-D]
C -->|Federated| E[FL Simulation]
E --> F[Server Aggregation]
E --> G[Client Updates]
D --> H[Evaluation & Metrics]
F --> H
H --> I[Streamlit Dashboard]
The pipeline expects Pipe-Separated Values (.psv). Each file represents one patient encounter.
- Location: Place raw files in data/raw/ (e.g., data/raw/patient_01.psv).
- Key Columns: HR, O2Sat, Temp, SBP, MAP, Resp, Lactate, Age, HospAdmTime.
- Target: SepsisLabel (Binary: 0 or 1).

Note: The parallel_preprocess.py script automatically handles NaN values and generates masking features required for the GRU-D model.
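To make the file format and the masking features concrete, here is a minimal sketch that parses a pipe-separated excerpt, builds a 0/1 observation mask per value, and forward-fills missing entries. It is only an illustration and not the logic of `parallel_preprocess.py`: the sample rows, the function names, and the choice of forward-filling with a default of 0.0 are assumptions.

```python
import csv
import io
import math

# Small hypothetical excerpt of a .psv file; real files have more columns.
SAMPLE_PSV = """HR|Temp|SepsisLabel
120|NaN|0
NaN|36.8|0
131|NaN|1
"""


def load_psv(text, feature_cols, label_col="SepsisLabel"):
    """Parse pipe-separated rows into values, observation masks, and labels."""
    reader = csv.DictReader(io.StringIO(text), delimiter="|")
    values, masks, labels = [], [], []
    for row in reader:
        raw = [float(row[c]) for c in feature_cols]  # float("NaN") yields nan
        masks.append([0 if math.isnan(v) else 1 for v in raw])  # 1 = observed
        values.append(raw)
        labels.append(int(float(row[label_col])))
    return values, masks, labels


def forward_fill(values, masks, default=0.0):
    """Replace missing entries with the last observed value in each column."""
    last = [default] * len(values[0])
    filled = []
    for row, mask in zip(values, masks):
        last = [v if m else prev for v, m, prev in zip(row, mask, last)]
        filled.append(list(last))
    return filled


values, masks, labels = load_psv(SAMPLE_PSV, ["HR", "Temp"])
filled = forward_fill(values, masks)
print(masks)   # [[1, 0], [0, 1], [1, 0]]
print(filled)  # [[120.0, 0.0], [120.0, 36.8], [131.0, 36.8]]
print(labels)  # [0, 0, 1]
```

A GRU-D style model consumes the mask alongside the filled values, so it can tell a measured reading from a carried-forward one.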
- Python 3.8+
- CUDA (Optional, for GPU acceleration)
git clone https://github.com/pranay9981/Neonatal-Sepsis.git
cd Neonatal-Sepsis
# Create virtual environment
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\Activate.ps1
# Install dependencies
pip install -r requirements.txt
python src/parallel_preprocess.py \
--raw_folder data/raw \
--out_folder data/processed/patients \
--seq_len 48 \
--nprocs 8
Access the prediction interface locally:
streamlit run app.py
*Or visit the live deployment: neonatal-sepsis.streamlit.app*
Local Transformer Baseline:
python src/train_local.py --index data/processed/patients/index.pt --model transformer
Federated Simulation (Server):
python src/fl_server.py --model transformer --rounds 5 --min_clients 2
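The README does not state which rule the server uses to combine client updates each round. One common choice is federated averaging, where client parameters are averaged with weights proportional to local dataset size. The sketch below shows that rule on plain Python lists. The function name and the example numbers are hypothetical, and `fl_server.py` may aggregate differently.

```python
def federated_average(client_weights, client_sizes):
    """Average client parameter vectors, weighted by local sample counts."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[k] * n for w, n in zip(client_weights, client_sizes)) / total
        for k in range(dim)
    ]


# Two hypothetical clients with flattened parameters and local dataset sizes.
client_weights = [[1.0, 2.0], [3.0, 6.0]]
client_sizes = [100, 300]
global_weights = federated_average(client_weights, client_sizes)
print(global_weights)  # [2.5, 5.0]
```

In a real run the same weighted average would be applied to each tensor in the model's state rather than to a flat list.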
The table below summarizes the performance metrics of our Global Best (Federated) model compared to the Model Best (Local) baseline.
| Model | AUROC | AUPRC | Accuracy | F1-Score | Precision | Recall |
|---|---|---|---|---|---|---|
| Global Best | 0.894 | 0.567 | 0.947 | 0.579 | 0.712 | 0.487 |
| Model Best | 0.829 | 0.410 | 0.739 | 0.299 | 0.187 | 0.749 |
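As a quick consistency check on the table, the F1-Score can be recomputed from the reported Precision and Recall as their harmonic mean. The snippet below is a small sketch for that check only. The dictionary simply copies values from the table, and the function name is an assumption.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Values copied from the results table above.
reported = {
    "Global Best": {"precision": 0.712, "recall": 0.487, "f1": 0.579},
    "Model Best": {"precision": 0.187, "recall": 0.749, "f1": 0.299},
}

derived = {
    name: f1_score(m["precision"], m["recall"]) for name, m in reported.items()
}

for name, m in reported.items():
    # Inputs are rounded to three decimals, so small differences are expected.
    print(f"{name}: derived F1 = {derived[name]:.3f}, reported = {m['f1']:.3f}")
```

Both derived values agree with the table to within rounding. The table also shows the two models making opposite trade-offs: the local baseline has higher recall but much lower precision than the federated model.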
This project is developed and maintained by:
- Pranay Bagaria - Maintainer
- Ninad Amane - Collaborator
- Rakshak - Collaborator
- Kushagra - Collaborator
If you encounter any bugs or have feature requests, please open an issue on our GitHub Issues page.
Distributed under the MIT License. See LICENSE for more information.