diff --git a/README.md b/README.md
index 5c453de..fa8abd7 100644
--- a/README.md
+++ b/README.md
@@ -106,7 +106,59 @@ Here is the summary of methods we have in AIRS. More methods will be included as
OpenODE
- - IFL-DIF
+ - 📦 https://github.com/luost26/3D-Generative-SBDD/blob/main/data/README.md (MIT License)
+
+Download and place the data under the `data/` directory before running training or evaluation.
+
+### Protein Encoder
+
+Frag2Seq uses the pre-trained **ESM-IF1** (GVP-Transformer) inverse folding model to compute protein pocket embeddings, which are then integrated into the model via cross-attention. The ESM-IF1 weights are publicly available at:
+
+> 🔗 https://github.com/facebookresearch/esm (MIT License)
+
+### Data Directory Structure
+
+After downloading, your `data/` folder should look like:
+
+```
+data/
+├── crossdocked_pocket10/   # Processed protein-ligand complexes
+│   ├── train.lmdb
+│   ├── test.lmdb
+│   └── ...
+└── split_by_name.pt        # Train/test split indices
+```
+
+### Citation
+
+If you use the CrossDocked2020 dataset, please also cite the original dataset paper:
+
+```bibtex
+@article{francoeur2020three,
+ title={Three-dimensional convolutional neural networks and a cross-docked data set for structure-based drug design},
+ author={Francoeur, Paul G and Masuda, Tomohide and Sunseri, Jocelyn and Jia, Andrew and Iovanisci, Richard B and Snyder, Ian and Koes, David R},
+ journal={Journal of Chemical Information and Modeling},
+ volume={60},
+ number={9},
+ pages={4200--4215},
+ year={2020},
+}
+```
+ - IFL-DIF