shhhoaib/Web-Scraper
# Smart Web Scraper (GUI Based)

A beginner-friendly yet powerful **GUI-based web scraper** built using Python. This tool allows users to extract useful information like **titles, links, posting time (e.g., "2 days ago"), and prices** from websites and save the data into **CSV and Excel files**.

---

## Features

* Simple and user-friendly GUI (Tkinter)
* Scrapes data from websites using BeautifulSoup
* Extracts:
  * Titles
  * Links
  * Posted time (e.g., "3 days ago")
  * Prices (Rs / $)
* Saves data into:
  * `output.csv`
  * `output.xlsx`
* Fast and lightweight (no browser automation required)

---

## Technologies Used

* **Python**
* **Requests** (for fetching webpage data)
* **BeautifulSoup (bs4)** (for HTML parsing)
* **Pandas** (for data handling & export)
* **Tkinter** (for GUI interface)
* **Regex (re)** (for pattern extraction)

---

## Project Structure

```
web-scraper-project/
│
├── bs4_scraper_gui.py     # Main application file
├── requirements.txt       # Dependencies
├── README.md              # Project documentation
├── output.csv             # Sample output (optional)
└── output.xlsx            # Sample output (optional)
```

---

## Installation

Clone the repository or download the files, then install dependencies:

```bash
pip install -r requirements.txt
```

---

## How to Run

```bash
python bs4_scraper_gui.py
```

---

## How It Works

1. User enters a website URL in the GUI
2. The scraper sends a request to the website
3. HTML content is parsed using BeautifulSoup
4. Data is extracted using:
   * HTML tags (h1, h2, a, etc.)
   * Regular expressions (for time & price)
5. Data is stored in a structured format
6. Output is saved as CSV and Excel files

---

## Limitations

* Does NOT work on JavaScript-heavy websites (e.g., Amazon, Daraz)
* Cannot scrape dynamically loaded content
* Works best on:
  * Blogs
  * News websites
  * Job listings
  * Static HTML pages

---

## Future Improvements

* Add **Selenium support** for dynamic websites
* Add **live data preview in GUI**
* Multi-page scraping support
* Export customization (choose file location)

---

## Output Example

| Title          | Link        | Posted     | Price  |
| -------------- | ----------- | ---------- | ------ |
| Sample Product | example.com | 2 days ago | Rs 500 |

---

## Author

**Mohammad Shoaib**
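
---

The regular-expression part of step 4 in How It Works (pulling the posted time and price out of page text) might look roughly like the sketch below. The patterns, the function name `extract_posted_and_price`, and the sample strings are illustrative assumptions; the actual patterns in `bs4_scraper_gui.py` may differ.

```python
import re

# Hypothetical pattern for relative times such as "2 days ago" or "1 hour ago".
POSTED_RE = re.compile(
    r"\b\d+\s+(?:minute|hour|day|week|month|year)s?\s+ago\b", re.IGNORECASE
)

# Hypothetical pattern for prices written as "Rs 500", "Rs. 1,200" or "$1,299.50".
PRICE_RE = re.compile(r"(?:\bRs\.?|\$)\s*\d+(?:,\d+)*(?:\.\d+)?")


def extract_posted_and_price(text):
    """Return (posted, price) as matched strings; a missing field is None."""
    posted = POSTED_RE.search(text)
    price = PRICE_RE.search(text)
    return (
        posted.group(0) if posted else None,
        price.group(0) if price else None,
    )


if __name__ == "__main__":
    print(extract_posted_and_price("Sample Product - Rs 500 - posted 2 days ago"))
    # ('2 days ago', 'Rs 500')
```

In a scraper like this one, the text passed to such a function would typically be the visible text of a listing element after parsing. Returning `None` for a missing field keeps rows aligned when a page shows a price but no posting time, or the reverse.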