GitHub - shhhoaib/Web-Scraper: Python GUI Web Scraper using BeautifulSoup and Tkinter


# 🕷️ Smart Web Scraper (GUI Based)

A beginner-friendly **GUI-based web scraper** built with Python. It extracts **titles, links, posting times (e.g., "2 days ago"), and prices** from static web pages and saves the results to **CSV and Excel files**.

---

## 🚀 Features

* 🖥️ Simple and user-friendly GUI (Tkinter)
* 🌐 Scrapes data from websites using BeautifulSoup
* 📌 Extracts:

  * Titles
  * Links
  * Posted time (e.g., "3 days ago")
  * Prices (Rs / $)
* 💾 Saves data into:

  * `output.csv`
  * `output.xlsx`
* ⚡ Fast and lightweight (no browser automation required)
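
The posted-time and price extraction listed above can be sketched with Python's built-in `re` module. The patterns and the helper names `find_posted_time` and `find_price` are illustrative assumptions, not the expressions used in this project's source:

```python
import re

# Hypothetical patterns for illustration; the project's real expressions may differ.
# Matches relative times such as "3 days ago" or "1 hour ago".
TIME_RE = re.compile(
    r"\b\d+\s+(?:minute|hour|day|week|month|year)s?\s+ago\b",
    re.IGNORECASE,
)
# Matches prices such as "Rs 500", "Rs. 1,200" or "$19.99".
PRICE_RE = re.compile(r"(?:\bRs\.?|\$)\s*\d[\d,]*(?:\.\d+)?", re.IGNORECASE)


def find_posted_time(text):
    """Return the first relative time found in text, or None."""
    match = TIME_RE.search(text)
    return match.group(0) if match else None


def find_price(text):
    """Return the first Rs or $ price found in text, or None."""
    match = PRICE_RE.search(text)
    return match.group(0) if match else None
```

Each helper returns `None` when nothing matches, so a row can still be written with an empty cell when a page has no price or timestamp.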

---

## 🛠️ Technologies Used

* **Python**
* **Requests** (for fetching webpage data)
* **BeautifulSoup (bs4)** (for HTML parsing)
* **Pandas** (for data handling & export)
* **Tkinter** (for the GUI)
* **Regex (re)** (for pattern extraction)

---

## 📂 Project Structure

```
web-scraper-project/
│
├── web_scraper.py          # Main application file
├── requirements.txt.txt    # Dependencies
├── README.txt              # Project documentation
├── output.csv              # Sample output (optional)
└── output.xlsx             # Sample output (optional)
```

---

## 📦 Installation

Clone the repository or download the files, then install dependencies:

```bash
pip install -r requirements.txt.txt
```

---

## ▶️ How to Run

```bash
python web_scraper.py
```

---

## 🧑‍💻 How It Works

1. User enters a website URL in the GUI
2. The scraper sends a request to the website
3. HTML content is parsed using BeautifulSoup
4. Data is extracted using:

   * HTML tags (h1, h2, a, etc.)
   * Regular expressions (for time & price)
5. Data is stored in a structured format
6. Output is saved as CSV and Excel files
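
Steps 2 to 4 can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names `fetch_html` and `extract_titles_and_links`, the choice of heading tags, and the `User-Agent` header are all assumptions.

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup


def fetch_html(url, timeout=10):
    """Download a page and return its HTML text (hypothetical helper)."""
    # Some sites reject requests without a browser-like User-Agent (assumed header).
    response = requests.get(
        url, timeout=timeout, headers={"User-Agent": "Mozilla/5.0"}
    )
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    return response.text


def extract_titles_and_links(html, base_url):
    """Return (titles, links) found in a static HTML document."""
    soup = BeautifulSoup(html, "html.parser")
    # Heading text, in document order, with surrounding whitespace stripped.
    titles = [h.get_text(strip=True) for h in soup.find_all(["h1", "h2", "h3"])]
    # Only anchors that carry an href; relative URLs are resolved against base_url.
    links = [urljoin(base_url, a["href"]) for a in soup.find_all("a", href=True)]
    return titles, links
```

A typical call would be `extract_titles_and_links(fetch_html(url), url)`. Keeping the download separate from the parsing lets the parsing half be tried on a saved HTML string without network access.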

---

## ⚠️ Limitations

* ❌ Does NOT work on JavaScript-heavy websites (e.g., Amazon, Daraz)
* ❌ Cannot scrape dynamically loaded content, because only the HTML returned by the initial request is parsed and no JavaScript is executed
* ✅ Works best on:

  * Blogs
  * News websites
  * Job listings
  * Static HTML pages

---

## 💡 Future Improvements

* Add **Selenium support** for dynamic websites
* Add **live data preview in GUI**
* Multi-page scraping support
* Export customization (choose file location)

---

## 📸 Output Example

| Title          | Link        | Posted     | Price  |
| -------------- | ----------- | ---------- | ------ |
| Sample Product | example.com | 2 days ago | Rs 500 |
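
A table like the one above could be produced with pandas along these lines. This is a sketch under assumptions: the helper names `rows_to_frame` and `save_outputs` are hypothetical, and writing `.xlsx` files requires an Excel engine such as `openpyxl` to be installed.

```python
import pandas as pd

COLUMNS = ["Title", "Link", "Posted", "Price"]


def rows_to_frame(rows):
    """Build a DataFrame with a fixed column order from a list of dicts."""
    # Keys missing from a row become empty (NaN) cells rather than errors.
    return pd.DataFrame(rows, columns=COLUMNS)


def save_outputs(df, csv_path="output.csv", xlsx_path="output.xlsx"):
    """Write the same data to CSV and Excel (hypothetical helper)."""
    df.to_csv(csv_path, index=False)
    df.to_excel(xlsx_path, index=False)  # needs an Excel engine, e.g. openpyxl


rows = [
    {
        "Title": "Sample Product",
        "Link": "example.com",
        "Posted": "2 days ago",
        "Price": "Rs 500",
    }
]
df = rows_to_frame(rows)
```

Passing `index=False` keeps the pandas row index out of the exported files, so the columns match the table shown here.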

---

## 👨‍💻 Author

**Mohammad Shoaib**