Yilong-sudo/TR2-DDI


TR2-DDI

TR2-DDI is a drug-drug interaction prediction project built around three feature streams:

  • fingerprint-based drug similarity features
  • molecular encoder similarity features
  • MoleculeSTM-derived LLM similarity features
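To make the idea of a similarity feature concrete, here is a minimal sketch of the fingerprint stream. It assumes Tanimoto similarity over binary fingerprints and assumes that each drug is described by its similarity to a set of reference drugs. The README does not state which fingerprint or metric the repository uses, so the function names and toy fingerprints below are purely illustrative.

```python
def tanimoto(bits_a: set[int], bits_b: set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    if not bits_a and not bits_b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(bits_a & bits_b)
    return shared / (len(bits_a) + len(bits_b) - shared)


def similarity_profile(query: set[int], reference_fps: list[set[int]]) -> list[float]:
    """Describe one drug by its similarity to every reference drug."""
    return [tanimoto(query, ref) for ref in reference_fps]


# Toy fingerprints (made-up bit indices, not real molecules).
fingerprints = {
    "drug_a": {1, 4, 7, 9},
    "drug_b": {1, 4, 8},
    "drug_c": {2, 3},
}
names = list(fingerprints)
references = [fingerprints[n] for n in names]

# Feature vector for drug_a: one similarity value per reference drug.
profile_a = similarity_profile(fingerprints["drug_a"], references)
print(dict(zip(names, profile_a)))
```

In practice the on-bit sets would come from a cheminformatics toolkit such as RDKit rather than being written by hand.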

The model fuses the feature streams for each drug pair, combines them with relation embeddings, and predicts DDI event labels with a residual MLP scorer.
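The description above can be sketched in PyTorch as follows. This is an illustrative approximation, not the repository's model: the class names, the summation used for fusion, the layer sizes, and the choice of one logit per (drug, relation, drug) triple are all assumptions made for the example.

```python
import torch
from torch import nn


class ResidualBlock(nn.Module):
    """Two-layer MLP with a skip connection and layer normalization."""

    def __init__(self, dim: int, dropout: float = 0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(), nn.Dropout(dropout), nn.Linear(dim, dim)
        )
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.norm(x + self.net(x))


class PairScorer(nn.Module):
    """Hypothetical scorer: fuse streams per drug, add a relation, score the pair."""

    def __init__(self, stream_dims, num_relations, hidden_size=384, num_blocks=2):
        super().__init__()
        # One projection per feature stream (fingerprint, encoder, LLM).
        self.projections = nn.ModuleList(
            [nn.Linear(d, hidden_size) for d in stream_dims]
        )
        self.relation_emb = nn.Embedding(num_relations, hidden_size)
        self.input = nn.Linear(3 * hidden_size, hidden_size)
        self.blocks = nn.Sequential(
            *[ResidualBlock(hidden_size) for _ in range(num_blocks)]
        )
        self.out = nn.Linear(hidden_size, 1)

    def encode_drug(self, streams):
        # Assumed fusion: project each stream to a shared size, then sum.
        projected = [proj(x) for proj, x in zip(self.projections, streams)]
        return torch.stack(projected, dim=0).sum(dim=0)

    def forward(self, head_streams, tail_streams, relation):
        head = self.encode_drug(head_streams)
        tail = self.encode_drug(tail_streams)
        rel = self.relation_emb(relation)
        x = self.input(torch.cat([head, tail, rel], dim=-1))
        return self.out(self.blocks(x)).squeeze(-1)  # one logit per triple


# Small random batch to check shapes; the dimensions are arbitrary.
dims = (32, 16, 8)
model = PairScorer(stream_dims=dims, num_relations=5, hidden_size=64)
model.eval()
batch = 4
head = [torch.randn(batch, d) for d in dims]
tail = [torch.randn(batch, d) for d in dims]
relation = torch.randint(0, 5, (batch,))
with torch.no_grad():
    logits = model(head, tail, relation)
print(logits.shape)
```

Summation keeps the fused vector at a fixed size regardless of how many streams are used; concatenation or attention-based fusion would be equally plausible readings of the description.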

Graphical Abstract

[Figure: graphical abstract]

Overview

[Figure: overview]

Repository Layout

drugbank/      DrugBank experiments and preprocessing pipeline
twosides/      TWOSIDES experiments and preprocessing pipeline
deng/          DrugBank experimental copy

Each dataset directory contains its own model, dataset loader, preprocessing script, training script, and logger utilities.

Environment

The code expects Python 3.10+ and PyTorch; the default training scripts additionally assume a CUDA-enabled environment.

Main Python packages:

  • torch
  • numpy
  • pandas
  • scikit-learn
  • tqdm
  • rdkit
  • subword-nmt

Install the dependencies in your preferred environment before running preprocessing or training.
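Before running anything, it can help to confirm that the packages listed above are importable. The following is a small convenience sketch and not part of the repository; the helper names are hypothetical. Note that two packages use an import name that differs from the install name (scikit-learn imports as sklearn, subword-nmt as subword_nmt).

```python
import importlib.util
import sys

# Install name -> import name for the packages listed in this README.
REQUIRED_MODULES = {
    "torch": "torch",
    "numpy": "numpy",
    "pandas": "pandas",
    "scikit-learn": "sklearn",
    "tqdm": "tqdm",
    "rdkit": "rdkit",
    "subword-nmt": "subword_nmt",
}


def missing_packages(required=REQUIRED_MODULES):
    """Return the install names whose top-level module cannot be found."""
    return [
        package
        for package, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


def python_is_supported(minimum=(3, 10), version=None):
    """Check the interpreter version against the minimum stated above."""
    current = tuple(version or sys.version_info[:2])
    return current >= minimum


print("Python 3.10+:", python_is_supported())
print("Missing packages:", missing_packages() or "none")
```

This only checks that the modules can be located; it does not verify versions or CUDA availability.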

Data Preparation

Run preprocessing from inside the dataset directory.

DrugBank:

cd drugbank
python data_preprocessing.py -d drugbank -o all

TWOSIDES:

cd twosides
python data_preprocessing.py -d twosides -o all

The preprocessing pipeline writes generated feature files and train/test splits under data/preprocessed/.
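To check what preprocessing produced, a short script like the one below can list everything under data/preprocessed/ with file sizes. It makes no assumption about the file names the pipeline writes; the helper name is hypothetical, and the script should be run from inside the dataset directory.

```python
from pathlib import Path


def list_artifacts(root="data/preprocessed"):
    """Return (relative path, size in bytes) for every file under root."""
    root_path = Path(root)
    if not root_path.is_dir():
        return []  # preprocessing has not been run from this directory
    return sorted(
        (p.relative_to(root_path).as_posix(), p.stat().st_size)
        for p in root_path.rglob("*")
        if p.is_file()
    )


for name, size in list_artifacts():
    print(f"{size:>12,d}  {name}")
```

An empty listing usually means the script was run from the wrong directory or preprocessing has not completed.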

Training

DrugBank:

cd drugbank
python train.py --fold 0 --epochs 100 --batch_size 256 --hidden_size 384

TWOSIDES:

cd twosides
python train.py --fold 0 --epochs 200 --batch_size 256 --hidden_size 384

Use --save_model to write model checkpoints under save/. Use --load_model path/to/checkpoint.pt to initialize from a compatible checkpoint.
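When training several folds, the command lines above can be generated rather than typed by hand. The sketch below only builds and prints the commands using the flags shown in this README. The helper name and the number of folds in the loop are hypothetical, and it assumes --save_model is a bare switch that takes no value.

```python
import shlex


def build_train_command(fold, epochs=100, batch_size=256, hidden_size=384,
                        save_model=False, load_model=None):
    """Assemble a train.py invocation from the flags documented above."""
    cmd = [
        "python", "train.py",
        "--fold", str(fold),
        "--epochs", str(epochs),
        "--batch_size", str(batch_size),
        "--hidden_size", str(hidden_size),
    ]
    if save_model:
        cmd.append("--save_model")  # assumed to be a flag without a value
    if load_model is not None:
        cmd += ["--load_model", str(load_model)]
    return cmd


# The fold count here is illustrative; use the folds your preprocessing produced.
for fold in range(3):
    print(shlex.join(build_train_command(fold, save_model=True)))
```

To execute a command instead of printing it, pass the list to subprocess.run(cmd, cwd="drugbank", check=True), changing cwd to the dataset directory you are training on.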

Molecular Pretraining

If you need to regenerate the molecular encoder checkpoint:

cd drugbank
python train_pretrain.py --data_path pretrain_data.csv --save_path pretrained_molecular_model.pth

The same pretraining scripts are available under the other dataset directories.

Notes

Generated caches, logs, checkpoints, and preprocessed binary artifacts are excluded by .gitignore. Keep large datasets and trained weights outside normal source commits unless they are intentionally released through Git LFS or a separate artifact link.
