TRACE-GFN: Transformer for Reaction-Aware Compound Exploration with GFlowNet in QSAR-Guided Molecular Design

TRACE-GFN is a generative flow network (GFlowNet) framework for designing drug-like molecules through interpretable chemical reaction pathways. Please refer to the paper for more detailed information.

Installation

TRACE-GFN has been tested on Ubuntu with Python 3.11 and CUDA 12.1.

Quick Install

bash install.sh
source .venv/bin/activate

Manual Installation

# Install dependencies using uv
uv sync

# Activate virtual environment
source .venv/bin/activate

# Install PyTorch Geometric dependencies (CUDA 12.1)
uv pip install torch_scatter torch_sparse torch_cluster \
  -f https://data.pyg.org/whl/torch-2.1.2+cu121.html
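After installation, one quick check is whether the compiled PyTorch Geometric extensions are importable from the active virtual environment. The script below is a minimal sketch that uses only the standard library. The module list is an assumption: the import names of the packages installed above, plus torch itself.

```python
import importlib.util

# Import names of the packages installed in the steps above (assumed list).
REQUIRED_MODULES = ["torch", "torch_scatter", "torch_sparse", "torch_cluster"]


def missing_modules(names):
    """Return the module names that cannot be found in the active environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        import torch  # only imported once we know it is installed

        print("All modules found; CUDA available:", torch.cuda.is_available())
```

Run it from the repository root after `source .venv/bin/activate`. If the script reports a missing `torch_*` module, the usual cause is that the PyG wheel index does not match the installed PyTorch/CUDA build.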

For CPU-only Installation

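If CUDA is not available, change the PyTorch dependency in pyproject.toml to a CPU-only build. Then install the PyTorch Geometric extensions from the CPU wheel index that matches your PyTorch version, not from the cu121 index shown above.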

Download Model Parameters

Download the trained Transformer weights from Figshare and save them as Transformer.pth in the src/gflownet/models/ckpts/Transformer/ directory. You can fetch the weights directly with:

mkdir -p src/gflownet/models/ckpts/Transformer
wget -O src/gflownet/models/ckpts/Transformer/Transformer.pth https://ndownloader.figshare.com/files/46402633

The models/ directory (src/gflownet/models/) should then look like this:

models/
└── ckpts/
    ├── GCN/
    │    └── GCN.pth
    └── Transformer/
         └── Transformer.pth
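Before launching a run, you can confirm that both checkpoints are in place. The function below is a hypothetical helper, not part of the repository. It is a minimal sketch that checks the layout shown above. It also treats a zero-byte file as missing, because an interrupted download can leave one behind.

```python
from pathlib import Path

# Expected checkpoint files, relative to src/gflownet/models/ (layout shown above).
EXPECTED_CHECKPOINTS = [
    Path("ckpts/GCN/GCN.pth"),
    Path("ckpts/Transformer/Transformer.pth"),
]


def missing_checkpoints(models_dir):
    """Return the expected checkpoints that are absent or empty under models_dir."""
    root = Path(models_dir)
    missing = []
    for rel in EXPECTED_CHECKPOINTS:
        path = root / rel
        # A zero-byte file usually means the download failed part-way.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(rel.as_posix())
    return missing


if __name__ == "__main__":
    problems = missing_checkpoints("src/gflownet/models")
    print("Checkpoints OK" if not problems else f"Missing: {problems}")
```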

Usage

Basic Usage

Generate molecules optimized for DRD2 binding starting from a specific compound:

python -u src/gflownet/tasks/qsar_reactions.py \
  --protein_name "DRD2" \
  --init_compound_idx 1 \
  --condition 16.0 \
  --max_depth 5

Command-Line Arguments

| Argument | Description | Default |
|---|---|---|
| `--protein_name` | Target protein: "DRD2", "AKT1", or "CXCR4" | Required |
| `--init_compound_idx` | Index of the starting material (the {IDX} in init_compound_{IDX}.smi) | Required |
| `--condition` | Temperature parameter (higher = more exploitation) | 16.0 |
| `--max_depth` | Maximum number of reaction steps | 5 |
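The table says only that `--condition` is a temperature and that higher values favor exploitation. In many GFlowNet setups, a parameter like this acts as an inverse temperature β, so that a molecule x is sampled with probability proportional to R(x)^β. Assuming TRACE-GFN uses a reward exponent of this kind (see the paper for the exact form), the sketch below shows why larger values concentrate sampling on the highest-reward molecules. The reward values are made up for illustration.

```python
import math


def tempered_distribution(rewards, beta):
    """Return p(x) proportional to R(x)**beta, for strictly positive rewards.

    Computed in log space (beta * log R) with a max-shift so large beta does not overflow.
    """
    logits = [beta * math.log(r) for r in rewards]
    shift = max(logits)
    weights = [math.exp(l - shift) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    rewards = [0.5, 0.6, 0.9]  # hypothetical QSAR scores for three molecules
    for beta in (1.0, 4.0, 16.0):
        probs = tempered_distribution(rewards, beta)
        print(beta, [round(p, 3) for p in probs])
```

With β = 1, the three molecules receive comparable probability. At β = 16, almost all of the mass goes to the molecule with reward 0.9. This is the exploration/exploitation trade-off that the table describes.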

Starting Materials

Starting materials are specified in SMILES format in:

src/gflownet/data/{PROTEIN_NAME}/init_compound_{IDX}.smi

For example, src/gflownet/data/DRD2/init_compound_1.smi contains:

OC1CCc2cc(F)ccc21

You can create custom starting materials by adding new .smi files with your desired SMILES strings.
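The helpers below are a minimal sketch for creating and reading custom starting materials. They follow the path pattern given above. The function names are hypothetical. The reader assumes the common .smi convention: a SMILES string on the first line, optionally followed by a whitespace-separated name. The SMILES string is not validated here. One way to check it is RDKit's `Chem.MolFromSmiles`, which returns None for unparsable input.

```python
from pathlib import Path

DATA_DIR = Path("src/gflownet/data")


def init_compound_path(protein_name, idx, data_dir=DATA_DIR):
    """Build the path {data_dir}/{PROTEIN_NAME}/init_compound_{IDX}.smi."""
    return Path(data_dir) / protein_name / f"init_compound_{idx}.smi"


def write_init_compound(protein_name, idx, smiles, data_dir=DATA_DIR):
    """Create (or overwrite) a starting-material file holding a single SMILES string."""
    path = init_compound_path(protein_name, idx, data_dir)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(smiles.strip() + "\n")
    return path


def read_init_compound(protein_name, idx, data_dir=DATA_DIR):
    """Return the SMILES on the first non-empty line, ignoring any trailing name column."""
    path = init_compound_path(protein_name, idx, data_dir)
    for line in path.read_text().splitlines():
        if line.strip():
            return line.split()[0]
    raise ValueError(f"{path} contains no SMILES")
```

For example, `write_init_compound("DRD2", 99, "OC1CCc2cc(F)ccc21")` creates src/gflownet/data/DRD2/init_compound_99.smi. You would then select that file with `--init_compound_idx 99`.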

Example Commands

DRD2 optimization starting from compound 1:

python -u src/gflownet/tasks/qsar_reactions.py \
  --protein_name "DRD2" \
  --init_compound_idx 1 \
  --condition 16.0 \
  --max_depth 5

AKT1 optimization starting from compound 6:

python -u src/gflownet/tasks/qsar_reactions.py \
  --protein_name "AKT1" \
  --init_compound_idx 6 \
  --condition 16.0 \
  --max_depth 5

CXCR4 optimization starting from compound 11:

python -u src/gflownet/tasks/qsar_reactions.py \
  --protein_name "CXCR4" \
  --init_compound_idx 11 \
  --condition 16.0 \
  --max_depth 5
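If you want to run several targets one after another, a small wrapper can build each command from the documented flags. The script below is a hypothetical sketch, not part of the repository. It covers the three example runs above. By default it only prints the commands. The `--run` switch belongs to this wrapper, not to qsar_reactions.py.

```python
import shlex
import subprocess
import sys

SCRIPT = "src/gflownet/tasks/qsar_reactions.py"
PROTEINS = {"DRD2", "AKT1", "CXCR4"}


def build_command(protein_name, init_compound_idx, condition=16.0, max_depth=5):
    """Return the argv list for one qsar_reactions.py run, using the documented flags and defaults."""
    if protein_name not in PROTEINS:
        raise ValueError(f"unsupported protein: {protein_name}")
    return [
        sys.executable, "-u", SCRIPT,
        "--protein_name", protein_name,
        "--init_compound_idx", str(init_compound_idx),
        "--condition", str(condition),
        "--max_depth", str(max_depth),
    ]


# The three example runs above, as (protein, starting-material index) pairs.
EXAMPLES = [("DRD2", 1), ("AKT1", 6), ("CXCR4", 11)]

if __name__ == "__main__":
    dry_run = "--run" not in sys.argv[1:]
    for protein, idx in EXAMPLES:
        cmd = build_command(protein, idx)
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # sequential; stops at the first failing run
```

The wrapper passes an argv list rather than a shell string. This avoids quoting problems, and the printed `shlex.join` output can still be copied into a terminal.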

Output

Training outputs are saved to:

./logs/{PROTEIN_NAME}_reactions_{TIMESTAMP}/
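When several runs exist for the same target, the newest one can be located by modification time. The format of {TIMESTAMP} is not documented here, so the sketch avoids parsing it. The helper name is hypothetical.

```python
from pathlib import Path


def latest_run_dir(protein_name, logs_dir="./logs"):
    """Return the most recently modified {PROTEIN_NAME}_reactions_* directory, or None."""
    runs = [p for p in Path(logs_dir).glob(f"{protein_name}_reactions_*") if p.is_dir()]
    if not runs:
        return None
    # Use modification time rather than parsing {TIMESTAMP}, whose exact format is not documented.
    return max(runs, key=lambda p: p.stat().st_mtime)
```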

Monitoring Training

The training progress is logged to Weights & Biases (wandb) under the project {PROTEIN_NAME}_TRACER-GFN. Metrics include:

- Rewards (binding affinity predictions)
- Loss values (trajectory balance, GCN, Transformer)
- Sampling diversity (unique molecule rate)
- Training time and throughput
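As an illustration of the diversity and reward metrics listed above, the sketch below computes a per-batch summary. It assumes that "unique molecule rate" means the fraction of distinct molecules in a sampled batch. The metric keys are hypothetical. A dict like this is the kind of value you would pass to `wandb.log`.

```python
from statistics import mean


def unique_rate(smiles_batch):
    """Fraction of distinct SMILES strings in a batch (0.0 for an empty batch).

    Strings are compared as-is. In practice they should be canonicalized first
    (for example with RDKit); otherwise two spellings of one molecule count twice.
    """
    if not smiles_batch:
        return 0.0
    return len(set(smiles_batch)) / len(smiles_batch)


def batch_metrics(smiles_batch, rewards):
    """Summarize one sampling batch; the key names are illustrative."""
    return {
        "unique_rate": unique_rate(smiles_batch),
        "reward_mean": mean(rewards),
        "reward_max": max(rewards),
    }
```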

Configuration

Key hyperparameters can be modified in src/gflownet/config.py or via command-line arguments.
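config.py is described as holding configuration dataclasses. One common pattern for combining dataclass defaults with command-line overrides is sketched below. The class and its fields are hypothetical stand-ins that reuse the documented flag names. This is not a copy of the real config.py.

```python
import argparse
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class HypotheticalConfig:
    """Stand-in for a dataclass in src/gflownet/config.py; field names are illustrative."""
    protein_name: str = "DRD2"
    condition: float = 16.0
    max_depth: int = 5


def apply_cli_overrides(cfg, argv):
    """Return a copy of cfg in which any field passed as --<field_name> is replaced."""
    parser = argparse.ArgumentParser()
    for f in fields(cfg):
        # default=None lets us tell "not given" apart from an explicit value.
        parser.add_argument(f"--{f.name}", type=f.type, default=None)
    args = parser.parse_args(argv)
    overrides = {k: v for k, v in vars(args).items() if v is not None}
    return replace(cfg, **overrides)
```

Because the dataclass is frozen and `dataclasses.replace` returns a new instance, the defaults stay untouched. Only the values given on the command line differ from them.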

Supported Protein Targets

TRACE-GFN includes pre-trained QSAR models for three protein targets:

DRD2
- Relevant for: Antipsychotics, Parkinson's disease treatments
- QSAR model: src/gflownet/models/qsar_DRD2_optimized.pkl
AKT1
- Relevant for: Cancer therapies, metabolic disorders
- QSAR model: src/gflownet/models/qsar_AKT1_optimized.pkl
CXCR4
- Relevant for: HIV treatments, cancer metastasis inhibitors
- QSAR model: src/gflownet/models/qsar_CXCR4_optimized.pkl

Adding Custom Targets

To add a new protein target:

1. Train a QSAR model (e.g., using Morgan fingerprints and Random Forest)
2. Save the model as src/gflownet/models/qsar_{PROTEIN_NAME}_optimized.pkl
3. Update the protein name options in src/gflownet/tasks/qsar_reactions.py
4. Prepare starting materials in src/gflownet/data/{PROTEIN_NAME}/init_compound_*.smi
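Before the first run, a quick file check can catch a misnamed model or an empty data folder for a new target. The helper below is a hypothetical sketch built on the naming conventions above. It cannot verify the code change in qsar_reactions.py, and it does not train or load the QSAR model.

```python
from pathlib import Path


def check_custom_target(protein_name, root="src/gflownet"):
    """Return a list of problems for a new target; an empty list means the expected files exist."""
    root = Path(root)
    problems = []
    # The QSAR model must follow the qsar_{PROTEIN_NAME}_optimized.pkl naming.
    model = root / "models" / f"qsar_{protein_name}_optimized.pkl"
    if not model.is_file():
        problems.append(f"missing QSAR model: {model}")
    # At least one init_compound_*.smi file must exist in the target's data folder.
    data_dir = root / "data" / protein_name
    smi_files = sorted(data_dir.glob("init_compound_*.smi")) if data_dir.is_dir() else []
    if not smi_files:
        problems.append(f"no init_compound_*.smi files in {data_dir}")
    return problems
```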

Reaction Templates

TRACE-GFN uses reaction templates derived from the USPTO dataset:

- Template library: src/gflownet/data/label_template.json (1000 templates)
- Training data: src/gflownet/data/USPTO/ (tokenized reaction examples)

Reaction templates are represented as SMARTS patterns that define molecular transformations. The GCN learns to predict which templates are applicable to each molecule based on structural features.
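The paper describes how the GCN's predictions are turned into a template choice. As one plausible illustration, suppose the GCN emits one score per template in label_template.json. Applicable templates could then be ranked with a softmax and the top k kept. The function below sketches that assumed selection step; it is not the repository's code.

```python
import math


def top_k_templates(logits, k):
    """Return (template_index, probability) pairs for the k highest-scoring templates.

    Probabilities come from a softmax over all templates, computed with a max-shift for stability.
    """
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(i, probs[i]) for i in ranked[:k]]
```

With 1000 templates, the logits list would have 1000 entries. Each returned index would point to a SMARTS pattern to try on the current molecule.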

Project Structure

TRACE-GFN/
├── src/gflownet/
│   ├── tasks/
│   │   └── qsar_reactions.py       # Main entry point
│   ├── models/
│   │   ├── GCN/                    # Graph convolution network
│   │   ├── Transformer/            # Product generation model
│   │   ├── mlp.py                  # Partition function predictor
│   │   └── qsar_*.pkl              # Pre-trained QSAR models
│   ├── algo/
│   │   ├── trajectory_balance_synthesis.py  # Training objective
│   │   └── reaction_sampling.py    # Trajectory generation
│   ├── data/
│   │   ├── DRD2/, AKT1/, CXCR4/   # Protein-specific data
│   │   ├── USPTO/                  # Reaction templates
│   │   └── sampling_iterator.py    # Data loading
│   ├── trainer.py                  # Base trainer class
│   ├── online_trainer.py           # Online training implementation
│   └── config.py                   # Configuration dataclasses
├── install.sh                      # Installation script
└── pyproject.toml                  # Dependencies
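trajectory_balance_synthesis.py is listed as the training objective, and mlp.py as a partition function predictor. For orientation, the standard trajectory balance loss from the GFlowNet literature is sketched below for a single trajectory. It is the squared difference between log Z plus the forward log-probabilities and log R plus the backward log-probabilities. How the repository computes each term, including how `--condition` enters the reward, is not reproduced here. In a tempered setup, log R would typically be replaced by β log R.

```python
import math


def trajectory_balance_loss(log_z, log_pf, log_pb, log_reward):
    """Squared trajectory balance residual for one trajectory.

    log_z:      log partition function (e.g. the output of a learned predictor)
    log_pf:     forward log-probabilities log P_F(s_{t+1} | s_t), one per step
    log_pb:     backward log-probabilities log P_B(s_t | s_{t+1}), one per step
    log_reward: log R(x) of the terminal molecule
    """
    residual = log_z + sum(log_pf) - log_reward - sum(log_pb)
    return residual ** 2


if __name__ == "__main__":
    # Two forward steps with probability 0.5 each and a deterministic backward policy:
    # Z = 2 and R = 0.5 satisfy the balance exactly, so the loss is (numerically) zero.
    print(trajectory_balance_loss(math.log(2.0), [math.log(0.5)] * 2, [0.0, 0.0], math.log(0.5)))
```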
