Venus-ProSST

Code for ProSST: A Pre-trained Protein Sequence and Structure Transformer with Disentangled Attention. (NeurIPS 2024)

News

Our MSA-enhanced model ProtREM achieves a Spearman's rho of 0.518 on the ProteinGym benchmark.

1 Install

git clone https://github.com/Cassis-P/ProSST.git
cd ProSST
pip install -r requirements.txt
export PYTHONPATH=$PYTHONPATH:$(pwd)

2 Structure quantizer

from prosst.structure.quantizer import PdbQuantizer
processor = PdbQuantizer(structure_vocab_size=2048)  # can be 20, 128, 512, 1024, 2048, 4096
result = processor("path/to/protein.pdb", return_residue_seq=False)  # placeholder path: point this at your own PDB file

Output:

[407, 998, 1841, 1421, 653, 450, 117, 822, ...]
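Before passing the tokens on, it can help to confirm that they are consistent with the chosen vocabulary size. The following is a minimal sketch, assuming structure tokens are zero-based integer ids in the range [0, structure_vocab_size); the README does not state this explicitly, and the helper `check_structure_tokens` is hypothetical rather than part of the repository.

```python
def check_structure_tokens(tokens, structure_vocab_size):
    """Return True if every token is an int id in [0, structure_vocab_size).

    Assumption (not confirmed by the README): ids are zero-based and
    bounded by the vocabulary size passed to PdbQuantizer.
    """
    return all(
        isinstance(t, int) and 0 <= t < structure_vocab_size for t in tokens
    )


# First eight tokens from the sample output above.
sample = [407, 998, 1841, 1421, 653, 450, 117, 822]

print(check_structure_tokens(sample, 2048))  # True
print(check_structure_tokens(sample, 1024))  # False: 1841 and 1421 are out of range
```

A failed check usually suggests that the quantizer and the downstream model were configured with different vocabulary sizes.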

3 ProSST models have been uploaded to Hugging Face 🤗 Transformers

from transformers import AutoModelForMaskedLM, AutoTokenizer
model = AutoModelForMaskedLM.from_pretrained("AI4Protein/ProSST-2048", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("AI4Protein/ProSST-2048", trust_remote_code=True)

See AI4Protein/ProSST-* for more models.
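The suffix of the model name appears to correspond to the structure vocabulary size used by the quantizer. Below is a small sketch for building the model id from that size, assuming one checkpoint is published for each vocabulary size listed in the quantizer section; check the AI4Protein page on the Hub to confirm which ones exist. The helper `prosst_model_id` is hypothetical.

```python
# Vocabulary sizes listed in the PdbQuantizer comment above.
SUPPORTED_VOCAB_SIZES = (20, 128, 512, 1024, 2048, 4096)


def prosst_model_id(structure_vocab_size):
    """Build a Hub model id such as 'AI4Protein/ProSST-2048'.

    Assumption: a checkpoint exists for every listed vocabulary size.
    """
    if structure_vocab_size not in SUPPORTED_VOCAB_SIZES:
        raise ValueError(
            f"unsupported structure vocabulary size: {structure_vocab_size}"
        )
    return f"AI4Protein/ProSST-{structure_vocab_size}"


print(prosst_model_id(2048))  # AI4Protein/ProSST-2048
```

The returned string can be passed to `AutoModelForMaskedLM.from_pretrained` and `AutoTokenizer.from_pretrained` together with `trust_remote_code=True`, using the same vocabulary size that was given to `PdbQuantizer`.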

4 Zero-shot mutant effect prediction

4.1 Example notebook

Zero-shot mutant effect prediction
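For orientation, zero-shot scoring with a masked language model is often done by comparing the log-probability of the mutant residue with that of the wild-type residue at each mutated position. The sketch below illustrates that general rule on toy logits. It is not taken from the notebook, and the function names, the toy vocabulary, and the `A1G` mutation format (wild type, 1-based position, mutant) are assumptions made for illustration.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def mutant_score(position_logits, vocab, mutations):
    """Sum of log p(mutant) - log p(wild type) over all substitutions.

    position_logits: one list of logits per sequence position.
    vocab: maps residue letters to indices into each logits list.
    mutations: strings such as 'A1G' (assumed format, 1-based position).
    """
    score = 0.0
    for mut in mutations:
        wt, pos, mt = mut[0], int(mut[1:-1]), mut[-1]
        log_probs = log_softmax(position_logits[pos - 1])
        score += log_probs[vocab[mt]] - log_probs[vocab[wt]]
    return score


# Toy example: two residue types, one position that favors the wild type 'A'.
toy_vocab = {"A": 0, "G": 1}
toy_logits = [[2.0, 0.0]]
print(round(mutant_score(toy_logits, toy_vocab, ["A1G"]), 6))  # -2.0
```

Under this rule a negative score means the model considers the mutant less likely than the wild type at that position, and a higher score predicts a more favorable mutation.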

4.2 Run ProteinGym Benchmark

Download the dataset from Google Drive. (This file contains the quantized structures for ProteinGym.)

cd example_data
unzip <downloaded_archive>.zip   # the file downloaded from Google Drive
cd ..

# run the ProteinGym benchmark script in zero_shot/
python zero_shot/<benchmark_script>.py --model_path AI4Protein/ProSST-2048 \
--structure_dir example_data/structure_sequence/2048
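ProteinGym results are commonly summarized as Spearman's rho between predicted scores and measured fitness values, which is the metric quoted in the News section. The following is a dependency-free sketch of that metric, assuming average ranks for ties; it is an illustration only, not the evaluation code used by the benchmark script, and the example numbers are made up.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation of the rank vectors of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one input is constant")
    return cov / math.sqrt(var_x * var_y)


predicted = [-2.0, 0.5, 1.2, -0.3]  # made-up model scores
measured = [0.1, 0.9, 0.7, 0.4]     # made-up fitness values
print(round(spearman_rho(predicted, measured), 3))  # 0.8
```

Because the metric depends only on ranks, it rewards a model for ordering mutants correctly even when the predicted scores are on a different scale from the measurements.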

Citation

If you use ProSST in your research, please cite the following paper:

@inproceedings{
li2024prosst,
title={ProSST: Protein Language Modeling with Quantized Structure and Disentangled Attention},
author={Mingchen Li and Yang Tan and Xinzhu Ma and Bozitao Zhong and Huiqun Yu and Ziyi Zhou and Wanli Ouyang and Bingxin Zhou and Pan Tan and Liang Hong},
booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
year={2024}
}

This project is licensed under the terms of the CC-BY-NC-ND-4.0 license.
