A Python console application that retrieves clean text data from multiple websites at once and saves it to a text file.
Running The Text Scraper Program:
- Run the script
- Enter a search keyword - try "emoji" ;)
The text output is saved to a file named "text_data.txt".
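The saving step could be sketched as follows. This is a minimal illustration, not the script's actual code: the function names `normalize_whitespace` and `save_text` and the one-line-per-page layout are assumptions made for the example. Writing with UTF-8 matters for keywords such as "emoji", since the results may contain non-ASCII characters.

```python
from pathlib import Path
from typing import Mapping


def normalize_whitespace(text: str) -> str:
    """Collapse runs of spaces, tabs, and newlines into single spaces."""
    return " ".join(text.split())


def save_text(pages: Mapping[str, str], path: str = "text_data.txt") -> int:
    """Write one cleaned line per non-empty page; return the number of lines written.

    `pages` maps a URL to the text scraped from it (hypothetical structure).
    """
    lines = [normalize_whitespace(text) for text in pages.values()]
    lines = [line for line in lines if line]  # skip pages that yielded no text
    # UTF-8 keeps emoji and other non-ASCII characters intact on every platform.
    body = "\n".join(lines) + ("\n" if lines else "")
    Path(path).write_text(body, encoding="utf-8")
    return len(lines)
```

Calling `save_text({"https://example.com": "some   scraped text"})` would create "text_data.txt" in the current directory with a single cleaned line.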
Requires Requests and Beautiful Soup 4:
pip install requests
pip install beautifulsoup4
Requests Documentation: https://requests.readthedocs.io/en/master/
Beautiful Soup Documentation: https://www.crummy.com/software/BeautifulSoup/bs4/doc/#
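For orientation, here is a rough sketch of how the two libraries can be combined to turn a page into clean text. It is not the script's actual implementation: the function names `html_to_text` and `fetch_text`, the 10-second timeout, and the choice to return an empty string on a failed request are all assumptions made for illustration.

```python
import requests
from bs4 import BeautifulSoup


def html_to_text(html: str) -> str:
    """Strip markup, scripts, and styles from an HTML string; return plain text."""
    soup = BeautifulSoup(html, "html.parser")
    # Script and style bodies are not visible page text, so drop them first.
    for tag in soup(["script", "style", "noscript"]):
        tag.decompose()
    # get_text joins the remaining text nodes; split/join collapses whitespace.
    return " ".join(soup.get_text(separator=" ").split())


def fetch_text(url: str, timeout: float = 10.0) -> str:
    """Download one page and return its cleaned text, or "" if the request fails."""
    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()
    except requests.RequestException:
        # Assumption: a failed page is skipped rather than aborting the whole run.
        return ""
    return html_to_text(response.text)
```

A call such as `fetch_text("https://example.com")` would return the visible text of that page as a single whitespace-normalized string.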
Example result text from "covid19" keyword searches:
Covid19 Text Corpus: https://drive.google.com/open?id=1YS8UJ-Qeamdo9aAcpIgUqVb0ohrKijHy