This repository contains the code used in the study Text Mining-Based Profiling of Chemical Environments in Protein–Ligand Binding Assays Across Analytical Techniques.
The data used in this study is available at:
- Hugging Face: ChemScope-Dataset
- GitHub: Integrated_Physicochemical_Dataset.csv
Running the project requires Python 3.8 or later and the libraries listed in requirements.txt. Clone the repository and install the dependencies:
git clone https://github.com/erdemonal/ChemScope.git
cd ChemScope
pip install -r requirements.txt

Download chemical property datasets from OSF (required as data/raw is not version controlled).
python scripts/fetch_resources.py

Mine Europe PMC for protein-ligand associations.
Define your search queries in queries.txt (format: Name, "Search Query").
Example:
ITC, "isothermal titration calorimetry" AND ("protein-ligand binding" OR "binding affinity")
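The query-file format above can be illustrated with a minimal sketch. It is not the repository's actual parser: it assumes that everything after the first comma on a line is the search query, passed through verbatim (including its quotation marks, which Europe PMC treats as phrase delimiters). The helper names `parse_queries` and `build_search_url` are hypothetical, and the use of the Europe PMC REST `search` endpoint with `cursorMark` paging is an assumption about how such a query might be submitted, not a description of `literature_mining.py`.

```python
from typing import Dict
from urllib.parse import urlencode

# Public Europe PMC REST search endpoint (assumed target, see lead-in).
EUROPE_PMC_SEARCH = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def parse_queries(text: str) -> Dict[str, str]:
    """Parse lines of the form `Name, <search query>` into a name -> query map."""
    queries: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:  # skip blank lines
            continue
        # Split on the FIRST comma only; the query itself may contain commas.
        name, sep, query = line.partition(",")
        if not sep or not name.strip() or not query.strip():
            raise ValueError(f"Malformed query line: {raw!r}")
        queries[name.strip()] = query.strip()
    return queries


def build_search_url(query: str, page_size: int = 100) -> str:
    """Build a URL for the first page of JSON results for one query."""
    params = {
        "query": query,
        "format": "json",
        "pageSize": page_size,
        "cursorMark": "*",  # "*" requests the first page of a cursor-paged result set
    }
    return f"{EUROPE_PMC_SEARCH}?{urlencode(params)}"


sample = (
    'ITC, "isothermal titration calorimetry" AND '
    '("protein-ligand binding" OR "binding affinity")\n'
)
parsed = parse_queries(sample)
for name, query in parsed.items():
    print(name, "->", build_search_url(query))
```

Splitting on the first comma only keeps the technique name short while leaving the Boolean query untouched, so it can be pasted directly from the Europe PMC search box into queries.txt.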
Run the miner:
python scripts/literature_mining.py
python scripts/data_processing.py -i data/interim -t folder
python scripts/chemometrics_analysis.py
python scripts/static_visualization.py
python scripts/interactive_visualization.py -i data/processed

If you use this code in your research, please cite the following paper:
Text Mining-Based Profiling of Chemical Environments in Protein–Ligand Binding Assays Across Analytical Techniques
Erdem Önal, Zeynep Kalaycıoğlu
Chemometrics and Intelligent Laboratory Systems, 2026, 105659
DOI: 10.1016/j.chemolab.2026.105659
@article{ONAL2026105659,
title = {Text Mining-Based Profiling of Chemical Environments in Protein–Ligand Binding Assays Across Analytical Techniques},
journal = {Chemometrics and Intelligent Laboratory Systems},
pages = {105659},
year = {2026},
issn = {0169-7439},
doi = {https://doi.org/10.1016/j.chemolab.2026.105659},
url = {https://www.sciencedirect.com/science/article/pii/S0169743926000328},
author = {Erdem Önal and Zeynep Kalaycıoğlu},
keywords = {Affinity, bibliometrics, drug, visualization}