This project is a hybrid, explainable, end-to-end toxicity prediction system using:
- Classical chemical descriptors + Morgan fingerprints
- Graph neural networks (GNN) on molecular graphs
- Ensemble learning for performance and robustness
- Explainability with SHAP + substructure highlighting
- Streamlit demo UI for real-time prediction and insight

Setup:
- Create and activate a Python environment (3.10+).
- Install dependencies: `pip install -r requirements.txt`
- Prepare data:
  - Place the Tox21 CSV in `data/tox21.csv` with at least a `smiles` column and all 12 endpoint labels: `NR-AR`, `NR-AR-LBD`, `NR-AhR`, `NR-Aromatase`, `NR-ER`, `NR-ER-LBD`, `NR-PPAR-gamma`, `SR-ARE`, `SR-ATAD5`, `SR-HSE`, `SR-MMP`, `SR-p53`.
- Train: `python train.py --config configs/config.yaml`
- Run the app: `streamlit run app/streamlit_app.py`

Project structure:
- `data/`: raw datasets
- `features/`: feature generation
- `models/`: baseline models, GNN, ensemble
- `explainability/`: SHAP + GNN explanations
- `app/`: Streamlit demo
- `utils/`: chemistry utilities, logging
- `configs/`: experiment config
- `outputs/`: trained artifacts and metrics
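
Before training, it can help to confirm that the CSV actually has the `smiles` column and the 12 endpoint columns named in the data preparation step. The following is a minimal sketch using only the standard library; the helper name `missing_columns` is hypothetical and is not part of this repository.

```python
import csv
import io

# The 12 Tox21 endpoint columns named in the data preparation step.
TOX21_ENDPOINTS = [
    "NR-AR", "NR-AR-LBD", "NR-AhR", "NR-Aromatase", "NR-ER", "NR-ER-LBD",
    "NR-PPAR-gamma", "SR-ARE", "SR-ATAD5", "SR-HSE", "SR-MMP", "SR-p53",
]
REQUIRED_COLUMNS = ["smiles"] + TOX21_ENDPOINTS


def missing_columns(csv_file):
    """Return the required columns that are absent from the CSV header.

    Hypothetical helper for illustration; csv_file is any open text file.
    """
    reader = csv.reader(csv_file)
    header = next(reader, [])  # empty file -> empty header
    present = {name.strip() for name in header}
    return [col for col in REQUIRED_COLUMNS if col not in present]


# Usage with an in-memory sample. For the real file, pass
# open("data/tox21.csv", newline="") instead.
sample = io.StringIO("smiles,NR-AR,NR-ER\nCCO,0,1\n")
print(missing_columns(sample))
```

An empty list means every required column is present. The check only looks at the header, so it does not validate SMILES strings or label values.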

Design rationale:
- Descriptors capture global physicochemical properties (MW, logP, TPSA, HBD/HBA) that are strongly linked to membrane permeability, bioavailability, and off-target toxicity.
- Morgan fingerprints encode substructures that can reveal toxicophores.
- Graph neural networks model local atomic environments and bond topology for mechanistic signal.
- Ensembles reduce variance and improve generalization across chemical space.
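
To illustrate the ensemble point above, here is a minimal sketch of soft voting: a weighted average of per-endpoint probabilities across models. This README does not say which ensembling method the project uses, so treat the averaging scheme, the function name `average_probabilities`, and the model names `baseline` and `gnn` as assumptions made for illustration.

```python
def average_probabilities(model_probs, weights=None):
    """Weighted average of per-endpoint probabilities across models.

    model_probs: dict mapping model name -> list of probabilities,
                 one per endpoint, all lists the same length.
    weights:     optional dict mapping model name -> non-negative weight;
                 defaults to equal weights.
    """
    if not model_probs:
        raise ValueError("need at least one model")
    names = list(model_probs)
    if weights is None:
        weights = {name: 1.0 for name in names}
    total = sum(weights[name] for name in names)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_endpoints = len(model_probs[names[0]])
    if any(len(model_probs[name]) != n_endpoints for name in names):
        raise ValueError("all models must predict the same endpoints")
    return [
        sum(weights[name] * model_probs[name][i] for name in names) / total
        for i in range(n_endpoints)
    ]


# Hypothetical predictions for two endpoints from two models.
preds = {"baseline": [0.25, 0.75], "gnn": [0.75, 0.25]}
print(average_probabilities(preds))  # [0.5, 0.5]
```

Averaging pulls disagreeing models toward a middle estimate, which is where the variance reduction comes from. With a single entry in `model_probs`, the function returns that model's probabilities unchanged.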

Evaluation:
- Reliability diagrams: `python calibration.py --config configs/config.yaml`
- Ablation study: `python ablation.py --config configs/config.yaml`

Notes:
- GNN training uses PyTorch + PyTorch Geometric. If not installed, the pipeline still runs baseline + ensemble.
- All explainability modules are designed to be lightweight and hackathon-demo friendly.
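
For readers unfamiliar with reliability diagrams, the sketch below shows the underlying computation for one endpoint: predictions are grouped into equal-width probability bins, and each bin's mean predicted probability is compared with the observed fraction of positives. It is an independent illustration, not the code in `calibration.py`; the function names and the equal-width binning are assumptions. Pairs with a missing label should be dropped before calling it.

```python
def reliability_bins(y_true, y_prob, n_bins=10):
    """Group predictions into equal-width probability bins.

    Returns (mean_predicted, fraction_positive, count) tuples, one per
    non-empty bin, in order of increasing probability.
    """
    stats = [[0.0, 0, 0] for _ in range(n_bins)]  # prob sum, positives, count
    for label, prob in zip(y_true, y_prob):
        idx = min(int(prob * n_bins), n_bins - 1)  # prob == 1.0 -> last bin
        stats[idx][0] += prob
        stats[idx][1] += label
        stats[idx][2] += 1
    return [(s / c, pos / c, c) for s, pos, c in stats if c > 0]


def expected_calibration_error(y_true, y_prob, n_bins=10):
    """Count-weighted mean gap between predicted and observed frequency."""
    bins = reliability_bins(y_true, y_prob, n_bins)
    total = sum(count for _, _, count in bins)
    if total == 0:
        raise ValueError("no predictions given")
    return sum(count * abs(conf - freq) for conf, freq, count in bins) / total


# Hypothetical labels and predicted probabilities for one endpoint.
labels = [0, 0, 1, 1]
probs = [0.1, 0.3, 0.7, 0.9]
for mean_pred, frac_pos, count in reliability_bins(labels, probs, n_bins=2):
    print(f"predicted={mean_pred:.2f} observed={frac_pos:.2f} n={count}")
print(f"ECE={expected_calibration_error(labels, probs, n_bins=2):.2f}")
```

Plotting `mean_predicted` against `fraction_positive` gives the reliability diagram; a perfectly calibrated model lies on the diagonal.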