A Python application that retrieves clean text data from websites.

pythonpaul/Python-Webscraper

Python-Web-Text-Scraper

A Python console application that retrieves clean text data from multiple websites at once and saves it to a text file.

Running The Text Scraper Program:

  1. Run the script
  2. Enter a search keyword - try "emoji" ;)

The text output is saved to a file named "text_data.txt".
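
For readers who want a feel for the fetch, clean, and save flow described above, here is a minimal sketch built on the two libraries this README lists as requirements. It is not the repository's actual code: the function names `extract_clean_text` and `scrape_to_file`, the choice of tags to strip, and the 10-second timeout are all assumptions made for illustration. How the script turns a search keyword into a list of URLs is not described here, so the sketch simply takes the URLs as input.

```python
import re

import requests
from bs4 import BeautifulSoup


def extract_clean_text(html: str) -> str:
    """Return the visible text of an HTML page with whitespace collapsed.

    Hypothetical helper: the tags removed below are an assumption about
    what "clean" means, not the repository's actual rule.
    """
    soup = BeautifulSoup(html, "html.parser")
    # Drop elements whose contents are never meant to be read as text.
    for tag in soup(["script", "style", "noscript"]):
        tag.decompose()
    # Join the remaining text nodes, then squeeze runs of whitespace.
    text = soup.get_text(separator=" ")
    return re.sub(r"\s+", " ", text).strip()


def scrape_to_file(urls, out_path="text_data.txt", timeout=10):
    """Fetch each URL and write its clean text to out_path, one page per line.

    Returns the number of pages saved. Pages that fail to download are
    skipped so one bad site does not stop the whole run.
    """
    saved = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for url in urls:
            try:
                response = requests.get(url, timeout=timeout)
                response.raise_for_status()
            except requests.RequestException as exc:
                print(f"Skipping {url}: {exc}")
                continue
            out.write(extract_clean_text(response.text) + "\n")
            saved += 1
    return saved
```

Calling `scrape_to_file(["https://example.com/"])` would write the cleaned text of that page to "text_data.txt" in the current directory. Skipping failed downloads rather than raising is a design choice in this sketch; a stricter version could collect the failures and report them at the end.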

Requires Requests and Beautiful Soup 4:

pip install requests
pip install beautifulsoup4

Requests Documentation: https://requests.readthedocs.io/en/master/

Beautiful Soup Documentation: https://www.crummy.com/software/BeautifulSoup/bs4/doc/#

Example output from "covid19" keyword searches:
Covid19 Text Corpus: https://drive.google.com/open?id=1YS8UJ-Qeamdo9aAcpIgUqVb0ohrKijHy
