TR2-DDI

TR2-DDI is a drug-drug interaction prediction project built around three feature streams:

fingerprint-based drug similarity features
molecular encoder similarity features
MoleculeSTM-derived LLM similarity features
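
To make the first stream concrete, here is a minimal sketch of a fingerprint similarity profile. It assumes Tanimoto similarity over sets of on-bit indices; this README does not state which similarity measure or fingerprint type the preprocessing uses, and the drug names and bit indices below are made up for illustration.

```python
def tanimoto(bits_a: set[int], bits_b: set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    if not bits_a and not bits_b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(bits_a & bits_b)
    return shared / (len(bits_a) + len(bits_b) - shared)


def similarity_profile(drug: str, fingerprints: dict[str, set[int]]) -> list[float]:
    """Similarity of one drug to every drug in the set, in sorted-name order."""
    return [tanimoto(fingerprints[drug], fingerprints[other])
            for other in sorted(fingerprints)]


# Toy fingerprints: hypothetical on-bit indices, not real molecules.
toy = {"drug_a": {1, 2, 3, 4}, "drug_b": {3, 4, 5}, "drug_c": {9}}
profile = similarity_profile("drug_a", toy)
print(profile)
```

Each drug ends up with one similarity vector per stream, which is the kind of input the fusion step consumes.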

The model fuses the feature streams for each drug pair, combines them with relation embeddings, and predicts DDI event labels with a residual MLP scorer.
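
The data flow described above can be sketched as follows. This is an illustration only, written in plain Python so it runs without PyTorch. Fusion by concatenation, a single residual block, a single sigmoid score per (drug, drug, relation) triple, and all dimensions and helper names are assumptions; the model code in each dataset directory defines the real layers.

```python
import math
import random


def linear(x, weights, bias):
    """Dense layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def residual_block(x, w1, b1, w2, b2):
    """x + W2 * relu(W1 * x); the skip connection preserves the input size."""
    hidden = [max(0.0, v) for v in linear(x, w1, b1)]
    return [xi + hi for xi, hi in zip(x, linear(hidden, w2, b2))]


def init_params(dim, hidden, rng):
    """Small random weights for one residual block plus an output layer."""
    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    return {"w1": mat(hidden, dim), "b1": [0.0] * hidden,
            "w2": mat(dim, hidden), "b2": [0.0] * dim,
            "out_w": mat(1, dim), "out_b": [0.0]}


def score_pair(streams_a, streams_b, relation_emb, p):
    """Fuse both drugs' stream vectors with the relation embedding and score."""
    # Assumed fusion: plain concatenation of every stream and the relation vector.
    x = [v for s in streams_a for v in s] + [v for s in streams_b for v in s] + relation_emb
    h = residual_block(x, p["w1"], p["b1"], p["w2"], p["b2"])
    logit = linear(h, p["out_w"], p["out_b"])[0]
    return 1.0 / (1.0 + math.exp(-logit))


rng = random.Random(0)
stream_dim, n_streams, rel_dim = 4, 3, 4  # toy sizes, not the repository's
streams_a = [[rng.random() for _ in range(stream_dim)] for _ in range(n_streams)]
streams_b = [[rng.random() for _ in range(stream_dim)] for _ in range(n_streams)]
relation_emb = [rng.uniform(-1, 1) for _ in range(rel_dim)]
params = init_params(dim=2 * n_streams * stream_dim + rel_dim, hidden=16, rng=rng)
prob = score_pair(streams_a, streams_b, relation_emb, params)
print(f"interaction score: {prob:.3f}")
```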

Graphical Abstract

Overview

Repository Layout

drugbank/      DrugBank experiments and preprocessing pipeline
twosides/      TWOSIDES experiments and preprocessing pipeline
deng/          DrugBank experimental copy

Each dataset directory contains its own model, dataset loader, preprocessing script, training script, and logger utilities.

Environment

The code expects Python 3.10+ and PyTorch. The default training scripts assume a CUDA-capable GPU.

Main Python packages:

torch
numpy
pandas
scikit-learn
tqdm
rdkit
subword-nmt

Install the dependencies in your preferred environment before running preprocessing or training.
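
A quick way to confirm the environment before a long run is to check the interpreter version and that each package is importable. The sketch below uses only the standard library; the function names are illustrative and not part of this repository. Note that scikit-learn imports as sklearn and subword-nmt as subword_nmt.

```python
import importlib.util
import sys

# Import names for the packages listed above.
REQUIRED_MODULES = ["torch", "numpy", "pandas", "sklearn", "tqdm", "rdkit", "subword_nmt"]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_ok(version_info=sys.version_info, minimum=(3, 10)):
    """True when the interpreter is at least the minimum (major, minor) version."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    print("Python 3.10+:", python_ok())
    print("Missing packages:", missing_modules(REQUIRED_MODULES) or "none")
```

Once torch imports, torch.cuda.is_available() reports whether a GPU is visible to PyTorch.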

Data Preparation

Run preprocessing from inside the dataset directory.

DrugBank:

cd drugbank
python data_preprocessing.py -d drugbank -o all

TWOSIDES:

cd twosides
python data_preprocessing.py -d twosides -o all

The preprocessing pipeline writes generated feature files and train/test splits under data/preprocessed/.
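
To verify that preprocessing produced output, you can list what landed in that directory. This is a small sketch, assuming data/preprocessed/ is resolved relative to the dataset directory the script was run from; the helper name and the example file name are hypothetical.

```python
import tempfile
from pathlib import Path


def preprocessed_files(dataset_dir):
    """Sorted relative paths of all files under <dataset_dir>/data/preprocessed/."""
    root = Path(dataset_dir) / "data" / "preprocessed"
    if not root.is_dir():
        return []  # preprocessing has not been run yet
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())


# Demonstration on a throwaway directory rather than a real dataset folder.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "data" / "preprocessed"
    out.mkdir(parents=True)
    (out / "example.bin").write_bytes(b"")  # hypothetical file name
    found = preprocessed_files(tmp)
    empty = preprocessed_files(Path(tmp) / "missing")

print(found, empty)
```

In a checkout, preprocessed_files("drugbank") would be the equivalent call from the repository root.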

Training

DrugBank:

cd drugbank
python train.py --fold 0 --epochs 100 --batch_size 256 --hidden_size 384

TWOSIDES:

cd twosides
python train.py --fold 0 --epochs 200 --batch_size 256 --hidden_size 384

Use --save_model to write model checkpoints under save/. Use --load_model path/to/checkpoint.pt to initialize from a compatible checkpoint.
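
For running several folds, the commands above can be generated instead of typed by hand. The sketch below only builds the argument lists. It assumes --save_model is a boolean switch and uses three folds purely as an example, since this README does not say how many folds the preprocessing creates; the helper name is hypothetical.

```python
import shlex

# Epoch counts copied from the example commands above.
DEFAULT_EPOCHS = {"drugbank": 100, "twosides": 200}


def train_command(dataset, fold, save_model=False, load_model=None):
    """Build the train.py argument list for one fold of one dataset."""
    args = ["python", "train.py",
            "--fold", str(fold),
            "--epochs", str(DEFAULT_EPOCHS[dataset]),
            "--batch_size", "256",
            "--hidden_size", "384"]
    if save_model:
        args.append("--save_model")  # assumed to take no value
    if load_model is not None:
        args += ["--load_model", load_model]
    return args


# Three folds is an arbitrary choice for this example.
commands = [train_command("drugbank", fold, save_model=True) for fold in range(3)]
for cmd in commands:
    print(shlex.join(cmd))
```

Each list can be passed to subprocess.run(cmd, cwd="drugbank", check=True) to launch the run from the dataset directory.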

Molecular Pretraining

If you need to regenerate the molecular encoder checkpoint:

cd drugbank
python train_pretrain.py --data_path pretrain_data.csv --save_path pretrained_molecular_model.pth

The same pretraining scripts are available under the other dataset directories.

Notes

Generated caches, logs, checkpoints, and preprocessed binary artifacts are excluded by .gitignore. Keep large datasets and trained weights outside normal source commits unless they are intentionally released through Git LFS or a separate artifact link.
