
shhhoaib/Web-Scraper


# 🕷️ Smart Web Scraper (GUI Based)

A beginner-friendly **GUI-based web scraper** built with Python. It extracts **titles, links, posting times (e.g., "2 days ago"), and prices** from websites and saves the results to **CSV and Excel files**.

---

## 🚀 Features

* 🖥️ Simple and user-friendly GUI (Tkinter)
* 🌐 Scrapes data from websites using BeautifulSoup
* 📌 Extracts:

  * Titles
  * Links
  * Posted time (e.g., "3 days ago")
  * Prices (Rs / $)
* 💾 Saves data into:

  * `output.csv`
  * `output.xlsx`
* ⚡ Fast and lightweight (no browser automation required)

---

## 🛠️ Technologies Used

* **Python**
* **Requests** (for fetching webpage data)
* **BeautifulSoup (bs4)** (for HTML parsing)
* **Pandas** (for data handling & export)
* **Tkinter** (for GUI interface)
* **Regex (re)** (for pattern extraction)

---

## 📂 Project Structure

```
web-scraper-project/
│
├── bs4_scraper_gui.py      # Main application file
├── requirements.txt        # Dependencies
├── README.md               # Project documentation
├── output.csv              # Sample output (optional)
└── output.xlsx             # Sample output (optional)
```

---

## 📦 Installation

Clone the repository or download the files, then install dependencies:

```bash
pip install -r requirements.txt
```
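
To confirm that the installation worked, a quick import check along these lines may help. The module list is an assumption based on the technologies named in this README (plus `openpyxl`, which pandas commonly uses to write `.xlsx` files); `requirements.txt` remains the authoritative list.

```python
from importlib.util import find_spec

# Assumed import names; check requirements.txt for the authoritative list.
ASSUMED_MODULES = ["requests", "bs4", "pandas", "openpyxl", "tkinter"]


def missing_modules(names):
    """Return the module names from `names` that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(ASSUMED_MODULES)
    print("Missing:", ", ".join(missing) if missing else "none")
```

Tkinter is not installed through pip. It ships with most Python installers, but some Linux distributions package it separately (for example `python3-tk` on Debian and Ubuntu).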

---

## ▢️ How to Run

```bash
python bs4_scraper_gui.py
```

---

## 🧑‍💻 How It Works

1. User enters a website URL in the GUI
2. The scraper sends a request to the website
3. HTML content is parsed using BeautifulSoup
4. Data is extracted using:

   * HTML tags (h1, h2, a, etc.)
   * Regular expressions (for time & price)
5. Data is stored in a structured format
6. Output is saved as CSV and Excel files
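
Steps 1 to 5 can be sketched roughly as follows. This is a minimal illustration, not the code in `bs4_scraper_gui.py`: the function names `fetch_html` and `extract_rows`, the regex patterns, and the choice to read time and price from each link's parent element are all assumptions.

```python
import re

import requests
from bs4 import BeautifulSoup

# Hypothetical patterns and selectors; the actual script may use different ones.
POSTED_RE = re.compile(r"\b\d+\s+(?:minute|hour|day|week|month|year)s?\s+ago\b", re.I)
PRICE_RE = re.compile(r"(?:\bRs\.?|\$)\s*\d+(?:,\d+)*(?:\.\d+)?", re.I)


def fetch_html(url):
    """Steps 1-2: download the page, raising on HTTP errors."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text


def extract_rows(html):
    """Steps 3-5: parse the HTML and build one dict per link."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for link in soup.find_all("a", href=True):
        # Assumption: time and price appear in the text of the link's parent.
        context = link.parent.get_text(" ", strip=True)
        posted = POSTED_RE.search(context)
        price = PRICE_RE.search(context)
        rows.append({
            "Title": link.get_text(strip=True),
            "Link": link["href"],
            "Posted": posted.group(0) if posted else "",
            "Price": price.group(0) if price else "",
        })
    return rows


sample_html = """
<ul>
  <li><a href="https://example.com/item">Sample Product</a> 2 days ago Rs 500</li>
</ul>
"""
print(extract_rows(sample_html))
```

On a live site the input would come from `fetch_html(url)` instead of `sample_html`. The explicit `timeout` keeps the GUI from hanging indefinitely on an unresponsive server.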

---

## ⚠️ Limitations

* ❌ Does NOT work on JavaScript-heavy websites (e.g., Amazon, Daraz)
* ❌ Cannot scrape dynamically loaded content
* βœ… Works best on:

  * Blogs
  * News websites
  * Job listings
  * Static HTML pages
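
The JavaScript limitation follows from how BeautifulSoup works: it parses the HTML exactly as delivered and never runs scripts. The made-up page below illustrates this; its markup is purely hypothetical.

```python
from bs4 import BeautifulSoup

# A made-up page whose product list is filled in by JavaScript at runtime.
html = """
<html><body>
  <h1>Shop</h1>
  <div id="products"></div>
  <script>
    document.getElementById("products").innerHTML = "<a href='/p/1'>Phone</a>";
  </script>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")
container = soup.find("div", id="products")

# The script is never executed, so the container is still empty.
static_links = container.find_all("a")
print(len(static_links))  # 0
```

A browser would show the "Phone" link after running the script, but the parsed document contains no `<a>` tags at all, which is why such sites need a browser automation tool instead.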

---

## 💡 Future Improvements

* Add **Selenium support** for dynamic websites
* Add **live data preview in GUI**
* Multi-page scraping support
* Export customization (choose file location)

---

## 📸 Output Example

| Title          | Link        | Posted     | Price  |
| -------------- | ----------- | ---------- | ------ |
| Sample Product | example.com | 2 days ago | Rs 500 |
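
A row like the one above could be written to both output files with pandas roughly as follows. The helper `save_rows` and its parameters are hypothetical; the real script may organize the export differently.

```python
import pandas as pd

COLUMNS = ["Title", "Link", "Posted", "Price"]


def save_rows(rows, csv_path="output.csv", xlsx_path="output.xlsx"):
    """Write scraped rows to CSV and, if xlsx_path is given, to Excel."""
    frame = pd.DataFrame(rows, columns=COLUMNS)
    frame.to_csv(csv_path, index=False)
    if xlsx_path is not None:
        # Writing .xlsx needs an Excel engine such as openpyxl to be installed.
        frame.to_excel(xlsx_path, index=False)
    return frame


rows = [
    {"Title": "Sample Product", "Link": "example.com",
     "Posted": "2 days ago", "Price": "Rs 500"},
]
```

Calling `save_rows(rows)` writes `output.csv` and `output.xlsx` to the current directory. Passing `xlsx_path=None` skips the Excel file, which is convenient when no Excel engine is available.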

---

## 👨‍💻 Author

**Mohammad Shoaib**

