Developed and tested on Windows 10 inside a venv using Python 3.7.7 and pip 19.2.3.
Set up your environment by installing the requirements using pip.
pip install -r requirements.txt
Copy the config.example file to config.py. A database with dummy data is provided here; place the file at /Database/miner_database.db. Data used in the research is available upon request (contact: a.j.vanaltena@amsterdamumc.nl) or may be collected from PubMed using the qrel files from the 2017 CLEF eHealth Lab. Follow the steps below to perform the experiments.
- Clean the raw articles
python clean_articles.py
- Build the feature matrices
python create_feature_matrices.py
- Do grid searches
python Grid_search/leaveoneout/rf_random_search.py
python Grid_search/onevsone/rf_random_search.py
Note: the results of the grid searches are written to a CSV file in the Grid_search/leaveoneout/ and Grid_search/onevsone/ directories, respectively.
Create a folder named after the experiment run and edit CLASSIFIER_LOCATION in the config.py file to match. The config.example file uses the folder name run1.
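The folder for a run can also be created from Python. The sketch below is only an illustration: the helper name `prepare_run_folder` is hypothetical, and because neither the folder's location relative to the repository nor the exact value expected by CLASSIFIER_LOCATION is stated here, the base directory is left as a parameter. Editing config.py remains a manual step.

```python
from pathlib import Path


def prepare_run_folder(base_dir, run_name):
    """Create the folder for one experiment run and return its path.

    Hypothetical helper, not part of the repository.
    """
    run_dir = Path(base_dir) / run_name
    # exist_ok=True makes the call safe to repeat for an existing run.
    run_dir.mkdir(parents=True, exist_ok=True)
    return run_dir


# Example: prepare_run_folder(".", "run1") creates ./run1,
# the folder name used in config.example.
```

After creating the folder, set CLASSIFIER_LOCATION in config.py by hand so that it refers to the new run.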
- Run the classifiers
python run_leaveoneout.py
python run_onevsone.py
python run_nvsone.py
python run_nvsone_random.py
# Fetch timing difference results for two training set sizes
python run_nvsone_timing.py
- Interpret the outcomes
Note: calculating the correlations requires a metadata file. The file for the fifty reviews used in our research is available here. For testing purposes, we also provide a dummy set.
python make_plots.py
python calculate_correlations.py
- When writing the paper
python prepare_metadata.py
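The steps above can be chained in a small driver script. This is a minimal sketch and not part of the repository: the names `PIPELINE`, `build_commands`, and `run_pipeline` are hypothetical. It assumes the script is started from the repository root, that config.py and the run folder are already in place, and that the metadata file needed by calculate_correlations.py is available. prepare_metadata.py is left out because it is only needed when writing the paper.

```python
import subprocess
import sys

# Scripts in the order in which the steps above list them.
PIPELINE = [
    "clean_articles.py",
    "create_feature_matrices.py",
    "Grid_search/leaveoneout/rf_random_search.py",
    "Grid_search/onevsone/rf_random_search.py",
    "run_leaveoneout.py",
    "run_onevsone.py",
    "run_nvsone.py",
    "run_nvsone_random.py",
    "run_nvsone_timing.py",
    "make_plots.py",
    "calculate_correlations.py",
]


def build_commands(scripts, python=sys.executable):
    """Turn script paths into argument lists for subprocess.run."""
    return [[python, script] for script in scripts]


def run_pipeline(scripts=PIPELINE, dry_run=True):
    """Print the commands (dry run) or execute them one after another."""
    commands = build_commands(scripts)
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # check=True raises CalledProcessError at the first failing step.
            subprocess.run(command, check=True)
    return commands
```

Calling `run_pipeline()` only prints the commands; `run_pipeline(dry_run=False)` executes them and stops at the first script that exits with an error.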