This script scrapes PhD positions from the first five pages of the website Fellowship Board. These positions are first saved in a CSV file. After that, there are two options to process the data:
- Use the data_processing.ipynb Jupyter notebook to filter and format the PhD positions based on specified keywords and countries.
- Use the chat_gpt_summary.py script to send the scraped data to OpenAI's GPT model for filtering using natural language.
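
As a rough illustration of the first option, keyword and country filtering over the scraped CSV could look like the sketch below. This is not the notebook's actual code: the column names `title` and `country`, the sample rows, and the `filter_positions` helper are assumptions made for the example, and it assumes keywords are matched against the title while listed countries are excluded.

```python
import csv
import io

# Hypothetical sample standing in for the scraped CSV; the real column
# names and contents may differ.
SAMPLE_CSV = """title,country
PhD in Machine Learning,Germany
PhD in Marine Biology,Norway
PhD in Deep Learning for Robotics,France
"""


def filter_positions(rows, keywords, excluded_countries):
    """Keep rows whose title contains any keyword and whose country is not excluded."""
    keywords = [k.lower() for k in keywords]
    excluded = {c.lower() for c in excluded_countries}
    kept = []
    for row in rows:
        title = row["title"].lower()
        country = row["country"].lower()
        if country in excluded:
            continue  # drop positions in excluded countries
        if any(k in title for k in keywords):
            kept.append(row)
    return kept


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
matches = filter_positions(rows, keywords=["learning"], excluded_countries=["France"])
for row in matches:
    print(f"{row['title']} ({row['country']})")
```

Matching is case-insensitive here so that a keyword such as "learning" also matches "Learning" in a title; the notebook may apply different rules.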
- Navigate to the project directory:
  cd "/path/to/PhD Scraper"
- Install the required libraries:
  pip install -r requirements.txt
- Run the scraper:
  python3 phd_scraper.py
- Process the data:
  - Using the chat_gpt_summary.py script:
    First add a .env file with your OpenAI API key to the project directory:
    OPENAI_API_KEY=your_api_key_here
    Then run the script:
    python3 chat_gpt_summary.py
    Enter your prompt in the terminal when running the script.
  - Using the data_processing.ipynb Jupyter notebook:
    Open the data_processing.ipynb notebook, specify the keywords and countries to exclude at the top, and run the whole notebook to filter and format the PhD positions.
- Find the filtered results in the output folder (filtered_phd_positions.txt for the data_processing.ipynb notebook and gpt_summary.txt for the chat_gpt_summary.py script).
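
The .env file from the processing step is a plain KEY=value file. How chat_gpt_summary.py actually loads it is not shown in this README; the sketch below is only one standard-library way to read such a file and fail early when the key is missing. The `read_env_file` helper is hypothetical, and the file is written to a temporary directory so the example can run anywhere.

```python
import os
import tempfile


def read_env_file(path):
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Strip surrounding whitespace and optional quotes from the value.
            values[key.strip()] = value.strip().strip('"').strip("'")
    return values


# Demonstrate on a throwaway file instead of a real project directory.
with tempfile.TemporaryDirectory() as tmp:
    env_path = os.path.join(tmp, ".env")
    with open(env_path, "w", encoding="utf-8") as handle:
        handle.write("# OpenAI credentials\nOPENAI_API_KEY=your_api_key_here\n")
    settings = read_env_file(env_path)

api_key = settings.get("OPENAI_API_KEY")
if not api_key:
    raise SystemExit("OPENAI_API_KEY is missing from the .env file")
print("API key loaded:", bool(api_key))
```

Many projects use the python-dotenv package for this instead of a hand-written parser; whether this project does is an assumption worth checking against requirements.txt.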