41 commits
8b18680
added lassa and nipahvirus data and score calculations, reordered sit…
nikkithadani Apr 17, 2023
d0b1f25
update gisaid data
sarahgurev Apr 18, 2023
8238e4f
update gisaid mutation results
sarahgurev Apr 18, 2023
144665f
Update README.md
sarahgurev Apr 18, 2023
5b15903
Update README.md
sarahgurev Apr 18, 2023
c73ba23
Update EVEscape summary figure
sarahgurev Apr 18, 2023
1fdd415
Update README.md
sarahgurev Apr 19, 2023
c1408ec
Update README.md
sarahgurev Apr 19, 2023
d8e5bbf
Update README.md
sarahgurev Apr 19, 2023
e65fda7
Update README.md
sarahgurev Apr 19, 2023
2c4387d
requirements
sarahgurev Apr 19, 2023
b03b2c5
Merge branch 'main' of https://github.com/OATML-Markslab/EVEscape int…
sarahgurev Apr 19, 2023
74d3836
Update README.md
sarahgurev Apr 19, 2023
5bdb8e3
Update README.md
sarahgurev Apr 19, 2023
29a3315
Update README.md
sarahgurev Apr 19, 2023
cb2d135
Update README.md
sarahgurev Apr 19, 2023
f8014b5
Update README.md
sarahgurev Apr 19, 2023
9d815a8
Update README.md
sarahgurev Apr 19, 2023
f6b8e57
updated GISAID processing code to drop sequences with >2 month gap be…
nikkithadani Apr 19, 2023
5c34a87
added neutralization experiment data from Beguir et al.
nikkithadani Apr 19, 2023
c1df24c
Merge branch 'main' of github.com:OATML-Markslab/EVEscape into main
nikkithadani Apr 19, 2023
37e99c6
Update README.md
sarahgurev Apr 19, 2023
cfc1b56
Update requirements.txt
sarahgurev Apr 19, 2023
c414ffc
Update requirements.txt
sarahgurev Apr 19, 2023
0ae0030
Update README.md
sarahgurev Apr 19, 2023
9bd9b5c
Update requirements.txt
sarahgurev Apr 19, 2023
3145cc5
Update README.md
sarahgurev Apr 19, 2023
6d58890
removed evcouplings dependencies
nikkithadani Apr 19, 2023
ace7cd4
Merge branch 'main' of github.com:OATML-Markslab/EVEscape into main
nikkithadani Apr 19, 2023
c89c6d9
Remove old results
sarahgurev Apr 19, 2023
de22f0d
Delete .gitattributes
sarahgurev Apr 19, 2023
77619c3
Update acknowledgements.md
sarahgurev Apr 19, 2023
64db45e
Update README.md
sarahgurev Apr 19, 2023
d310183
Update README.md
sarahgurev Apr 19, 2023
004ab32
added openpyxl to requirements
nikkithadani Apr 19, 2023
370e0c1
rounded summary data columns to 7 decimal places
nikkithadani Apr 19, 2023
bcc305a
Merge branch 'main' of github.com:OATML-Markslab/EVEscape into main
nikkithadani Apr 19, 2023
a22cf99
removed lassa glycoprotein SSP from summary tables
nikkithadani Jun 30, 2023
e7f6fd3
Updated README.md with model checkpoints
pascalnotin Aug 4, 2023
5e09eb7
Update README.md
pascalnotin Aug 11, 2023
fc312ff
Update README.md
sarahgurev Oct 11, 2023
1 change: 0 additions & 1 deletion .gitattributes

This file was deleted.

52 changes: 45 additions & 7 deletions README.md
@@ -1,6 +1,6 @@
# EVEscape

This is the official code repository for the paper ["Learning from pre-pandemic data to forecast viral antibody escape"](https://www.biorxiv.org/content/10.1101/2022.07.21.501023v1). This paper is a joint collaboration between the [Marks Lab](https://www.deboramarkslab.com/) and the [OATML group](https://oatml.cs.ox.ac.uk/).
This is the official code repository for the paper ["Learning from pre-pandemic data to forecast viral escape"](https://www.nature.com/articles/s41586-023-06617-0). This paper is a joint collaboration between the [Marks Lab](https://www.deboramarkslab.com/) and the [OATML group](https://oatml.cs.ox.ac.uk/).

## Overview
EVEscape is a model that computes the predicted likelihood of a given viral protein variant to induce immune escape from antibodies. For each protein, EVEscape predicts escape from data sources available pre-pandemic: sequence likelihood predictions from broader viral evolution, antibody accessibility information from protein structures, and changes in binding interaction propensity from residue chemical properties.
@@ -13,20 +13,26 @@ Computing EVEscape scores consists of three components:

The components are then standardized and fed into a temperature-scaled logistic function, and we take the log transform of the product of the three terms to obtain the final EVEscape scores.
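
As a rough illustration of the combination step described above, the following Python sketch standardizes three per-mutation component vectors, passes each through a temperature-scaled logistic, and takes the log of the product. The function names, the component argument names, and the temperature values of 1.0 are assumptions made for illustration; they are not taken from the repository, and [evescape_scores.py](scripts/evescape_scores.py) remains the reference for the actual computation.

```python
import math
from statistics import mean, pstdev


def standardize(values):
    """Z-score a list of numbers (population standard deviation)."""
    mu = mean(values)
    sigma = pstdev(values)  # raises ZeroDivisionError below if all values are equal
    return [(v - mu) / sigma for v in values]


def scaled_logistic(z, temperature):
    """Logistic function with the input divided by a temperature."""
    return 1.0 / (1.0 + math.exp(-z / temperature))


def evescape_sketch(fitness, accessibility, dissimilarity,
                    temperatures=(1.0, 1.0, 1.0)):
    """Log of the product of three standardized, logistic-transformed terms.

    Argument names and default temperatures are hypothetical placeholders.
    """
    components = (fitness, accessibility, dissimilarity)
    terms = [
        [scaled_logistic(z, t) for z in standardize(component)]
        for component, t in zip(components, temperatures)
    ]
    return [math.log(f * a * d) for f, a, d in zip(*terms)]


# Toy inputs for three mutations (made-up numbers, not real data).
scores = evescape_sketch([0.1, 0.5, 0.9], [0.2, 0.4, 0.8], [1.0, 2.0, 3.0])
print(scores)
```

Because each logistic term lies strictly between 0 and 1, the resulting scores in this sketch are always negative, and a mutation that ranks higher on all three components receives a higher (less negative) score.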

We also provide EVEscape scores for all single mutation variants of SARS-CoV-2 Spike and aggregate strain-level predictions for all PANGO lineages in our paper, and EVEscape rankings of newly occurring GISAID strains and visualization of likely future mutations will be available at evescape.org.
We also provide EVEscape scores for all single mutation variants of SARS-CoV-2 Spike and aggregate strain-level predictions for all GISAID strains, and EVEscape rankings of newly occurring GISAID strains and visualization of likely future mutations will be available at evescape.org.

## Scripts
The scripts folder contains python scripts to calculate EVEscape scores for all single mutations and aggregate deep mutational scanning data for SARS-CoV-2 RBD, Flu HA, and HIV Env from [data](/data).
The scripts folder contains python scripts to calculate EVEscape scores for all single mutations and aggregate available deep mutational scanning data for SARS-CoV-2 RBD, Flu HA, HIV Env, Lassa glycoprotein, and Nipah fusion and glycoproteins from [data](/data).
Specifically this includes the following two scripts:
- [process_protein_data.py](scripts/process_protein_data.py) calculates the three EVEscape components
- [evescape_scores.py](scripts/evescape_scores.py) creates the final EVEscape scores and outputs scores and processed DMS data in [summaries_with_scores](./results/summaries_with_scores)

The scripts folder also contains a python script [score_pandemic_strains.py](scripts/score_pandemic_strains.py) to calculate EVEscape scores for all strains in GISAID. The output strain scores (~150MB unzipped) can be downloaded as follows:
```
curl -o strain_scores_20230318.zip https://marks.hms.harvard.edu/evescape/strain_scores_20230318.zip
unzip strain_scores_20230318.zip
rm strain_scores_20230318.zip
```
The workflow of the scripts to create the data tables in [results](./results) needed for the main figures of the EVEscape paper is available in [evescape_summary.pdf](./evescape_summary.pdf). Additional data tables are available in the paper supplement.

## Data requirements
The data required to obtain EVEscape scores is one or multiple PDB files, EVE scores (see next subsection) and a fasta file of the wildtype sequence for the viral protein of interest.
The training data required to obtain EVEscape scores are one or more PDB files of the viral antigen, EVE scores (see next subsection), and a FASTA file of the wildtype sequence for the viral protein of interest.

To download the RBD escape data used in this project (~120MB unzipped):
To download the RBD escape data used as validation data in this project (~120MB unzipped):
```
curl -o escape_dms_data_20220109.zip https://marks.hms.harvard.edu/evescape/escape_dms_data_20220109.zip
unzip escape_dms_data_20220109.zip
@@ -50,16 +56,48 @@ We train 5 independent models with different random seeds.
### Model scoring
For the 5 independently trained models, we compute [evolutionary indices](https://github.com/OATML-Markslab/EVE/blob/master/compute_evol_indices.py) by sampling 20k times from the approximate posterior distribution (i.e., num_samples_compute_evol_indices=20000). We then average the resulting scores across the 5 models to obtain the final EVE scores used in EVEscape.
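
The averaging step described above can be sketched as follows. This is a minimal illustration that assumes each model's evolutionary indices have already been loaded into a dictionary keyed by mutation; the function name `average_eve_scores` and the dictionary layout are hypothetical and do not reflect the output format of the EVE scripts.

```python
from statistics import mean


def average_eve_scores(per_model_scores):
    """Average evolutionary indices across independently trained models.

    per_model_scores: list of dicts, one per model, mapping a mutation
    label (e.g. "N501Y") to that model's evolutionary index.
    Only mutations scored by every model are kept.
    """
    shared = set(per_model_scores[0])
    for scores in per_model_scores[1:]:
        shared &= set(scores)
    return {
        mutation: mean(scores[mutation] for scores in per_model_scores)
        for mutation in sorted(shared)
    }


# Toy example with two models and made-up values.
models = [
    {"A1C": 1.0, "A1D": 3.0},
    {"A1C": 3.0, "A1D": 5.0},
]
print(average_eve_scores(models))
```

Restricting the average to mutations present in every model is a design choice of this sketch; it avoids silently averaging over fewer than the full set of models.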

### Model checkpoints
We provide open access to the EVE models we trained for the various viruses discussed in the paper. These model checkpoints were obtained following the training procedure described above. To download checkpoints for a viral protein of interest, please adapt the following example with the relevant filename (filenames are listed in the table underneath).
```
curl -o EVE_checkpoints_I4EPC4.zip https://marks.hms.harvard.edu/evescape/EVE_checkpoints_I4EPC4.zip
unzip EVE_checkpoints_I4EPC4.zip
rm EVE_checkpoints_I4EPC4.zip
```
| UniProt ID | Organism | Protein name | Filename |
| :---------------- | :---------------- | :---------------- | :---------------- |
| I4EPC4 | Influenza A virus | Hemagglutinin | EVE_checkpoints_I4EPC4.zip |
| P0DTC2 (pre2020 sequences only) | SARS-CoV-2 | Spike glycoprotein | EVE_checkpoints_P0DTC2_full_pre2020.zip |
| P0DTC2 | SARS-CoV-2 | Spike glycoprotein | EVE_checkpoints_P0DTC2_full.zip |
| Q2N0S5 | HIV-1 | Envelope glycoprotein gp160 | EVE_checkpoints_Q2N0S5.zip |
| FUS_NIPAV | Nipah virus | Fusion glycoprotein F0 | EVE_checkpoints_FUS_NIPAV.zip |
| GLYC_LASSJ | Lassa virus | Pre-glycoprotein polyprotein GP complex | EVE_checkpoints_GLYC_LASSJ.zip |
| GLYCP_NIPAV | Nipah virus | Glycoprotein G | EVE_checkpoints_GLYCP_NIPAV.zip |
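
To adapt the download example to another protein, one option is to generate the commands from the filenames in the table. The sketch below assumes every checkpoint is served from the same base URL as the I4EPC4 example, which is the only URL shown explicitly above; the dictionary keys (in particular `P0DTC2_pre2020`) and the function names are hypothetical labels chosen for this illustration.

```python
# Base URL taken from the I4EPC4 example; assumed to hold for all files.
CHECKPOINT_BASE_URL = "https://marks.hms.harvard.edu/evescape/"

# Filenames copied from the table above; keys are illustrative labels.
CHECKPOINT_FILES = {
    "I4EPC4": "EVE_checkpoints_I4EPC4.zip",
    "P0DTC2_pre2020": "EVE_checkpoints_P0DTC2_full_pre2020.zip",
    "P0DTC2": "EVE_checkpoints_P0DTC2_full.zip",
    "Q2N0S5": "EVE_checkpoints_Q2N0S5.zip",
    "FUS_NIPAV": "EVE_checkpoints_FUS_NIPAV.zip",
    "GLYC_LASSJ": "EVE_checkpoints_GLYC_LASSJ.zip",
    "GLYCP_NIPAV": "EVE_checkpoints_GLYCP_NIPAV.zip",
}


def checkpoint_url(key):
    """Return the assumed download URL for a checkpoint label."""
    return CHECKPOINT_BASE_URL + CHECKPOINT_FILES[key]


def download_commands(key):
    """Return the three shell commands from the README example for one file."""
    filename = CHECKPOINT_FILES[key]
    return [
        f"curl -o {filename} {checkpoint_url(key)}",
        f"unzip {filename}",
        f"rm {filename}",
    ]


for command in download_commands("Q2N0S5"):
    print(command)
```

The script only prints the commands; running them is left to the user so that the URL can be checked first.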


## Software requirements
The entire codebase is written in Python. The corresponding environment may be created via conda and the provided [requirements.txt](./requirements.txt) file as follows:
```
conda config --add channels conda-forge
conda create --name evescape_env --file requirements.txt
conda activate evescape_env
```
The environment installs in minutes.

## Runtime
After collecting the training data, generating EVEscape scores for all single mutations runs in minutes. Scoring all GISAID strains takes 2 hours with 64 GB of memory.

## License
This project is available under the MIT license.

## Reference
If you use this code, please cite the following paper:

Nicole N. Thadani*, Sarah Gurev*, Pascal Notin*, Noor Youssef, Nathan J. Rollins, Chris Sander, Yarin Gal, Debora S. Marks. Learning from pre-pandemic data to forecast viral antibody escape. BioRxiv. 2022.
Nicole N. Thadani*, Sarah Gurev*, Pascal Notin*, Noor Youssef, Nathan J. Rollins, Daniel Ritter, Chris Sander, Yarin Gal, Debora S. Marks. Learning from pre-pandemic data to forecast viral escape. _Nature_. 2023.

(* equal contribution)

Links:
- Pre-print: https://www.biorxiv.org/content/10.1101/2022.07.21.501023v1
- Publication: https://www.nature.com/articles/s41586-023-06617-0
- Website: https://www.evescape.org/

See new work using EVEscape to design infectious Spike proteins that forecast future neutralizing antibody escape on [BioRxiv](https://www.biorxiv.org/content/10.1101/2023.10.08.561389v1).
3 changes: 3 additions & 0 deletions acknowledgements.md
@@ -10,3 +10,6 @@ EVEscape benchmarks against data from many published works, cited in the preprin

EVEscape uses RBD escape data compiled at [SARS2_RBD_Ab_escape_maps](https://github.com/jbloomlab/SARS2_RBD_Ab_escape_maps)

We acknowledge the authors and originating and submitting laboratories of the sequences from GISAID for sharing sequencing data (detailed acknowledgements in Data S6).


@@ -0,0 +1,22 @@
Lineage,WHO Nomenclature,Mutations,Reduction
B.1.1.7,Alpha,H69- V70- Y144- N501Y A570D D614G P681H T716I S982A D1118H,22.92%
B.1.1.7+E484K,Alpha,H69- V70- Y144- E484K N501Y A570D D614G P681H T716I S982A D1118H,74.65%
B.1.351,Beta,L18F D80A D215G L242- A243- L244- R246I K417N E484K N501Y D614G A701V,80.04%
B.1.351*,Beta,D80A D215G L242H K417N E484K N501Y D614G A701V,47.19%
B.1.351**,Beta,D80A D215G L242- A243- L244- K417N E484K N501Y D614G A701V,77.45%
P.1,Gamma,L18F T20N P26S D138Y R190S K417T E484K N501Y H655Y T1027I V1176F,47.66%
B.1.617.2,Delta,T19R G142D E156G F157- R158- K417N L452R T478K D614G P681R D950N,49.43%
AY.1,Delta,T19R T95I G142D E156G F157- R158- W258L K417N L452R T478K K558N D614G P681R D950N,48.09%
B.1.427/B.1.429,Epsilon,S13I W152C L452R D614G,52.57%
B.1.526,Iota,L5F T95I D253G E484K D614G A701V,10.94%
B.1.617.1,Kappa,L452R E484Q D614G P681R,18.34%
C.37,Lambda,G75V T76I R246- S247- Y248- L249- T250- P251- G252- D253N L452Q F490S D614G T859N,34.22%
C.37*,Lambda,G75V T76I L452Q F490S D614G T859N,16.88%
BA.1,Omicron,A67V H69- V70- T95I G142D V143- Y144- Y145- N211I L212V ins214EPE G339D S371L S373P S375F K417N N440K G446S S477N T478K E484A Q493R G496S Q498R N501Y Y505H T547K D614G H655Y N679K P681H N764K D796Y N856K Q954H N969K L981F,97.52%
BA.2,Omicron,T19I L24- P25- P26- A27S G142D V213G G339D S371L S373P S375F T376A D405N R408S K417N N440K S477N T478K E484A Q493R Q498R N501Y Y505H D614G H655Y N679K P681H N764K D796Y Q954H N969K,91.16%
BA.4/5,Omicron,T19I L24- P25- P26- A27S H69- V70- G142D V213G G339D S371F S373P S375F T376A D405N R408S K417N N440K L452R S477N T478K E484A F486V Q498R N501Y Y505H D614G H655Y N679K P681H N764K D796Y Q954H N969K,94.38%
A.VOI.V2,,D80Y Y144- I210- D215G R246- S247- Y248- L249M W258L R346K T478R E484K H655Y P681H Q957H,64.54%
B.1.1.298,,Y453F D614G I692V M1229I,5.42%
B.1.160,,S477N S494P D614G K1191N,3.81%
B.1.258,,H69- V70- L189F N439K D614G V772I,-9.28%
B.1.517,,G181V G252V N501T D614G P812L,-7.87%
2 changes: 1 addition & 1 deletion data/gisaid/covidcg_consensus_mutations.json

Large diffs are not rendered by default.
