A machine learning-powered phishing detection system with a Chrome extension for real-time website analysis. Built using real-world data from the PhiUSIIL Phishing URL Dataset (UCI Machine Learning Repository).
This project provides an end-to-end solution for detecting phishing websites:
- A Random Forest classifier trained on 100,000+ real URLs
- A Flask REST API for URL predictions
- A Chrome Extension for automatic, real-time protection while browsing
- Algorithm: Random Forest Classifier (100 estimators)
- Dataset: PhiUSIIL Phishing URL Dataset from UCI
- Validation: 5-fold cross-validation with train/test split
- Output: Classification report, confusion matrix, and accuracy metrics
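
The training setup listed above can be sketched roughly as follows. This is a minimal illustration with scikit-learn, not the project's actual training script: the synthetic data from `make_classification` stands in for features extracted from the PhiUSIIL dataset, and the 80/20 split and `random_state` values are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in for URL features (assumption: the real project
# would load the PhiUSIIL CSV here instead).
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Hold out a test set; the 80/20 ratio is an assumption.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model = RandomForestClassifier(n_estimators=100, random_state=42)

# 5-fold cross-validation on the training portion only.
cv_scores = cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy")

# Fit on the full training set, then evaluate once on the held-out test set.
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
matrix = confusion_matrix(y_test, y_pred)

print(f"CV accuracy: {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")
print(f"Test accuracy: {accuracy:.3f}")
print(matrix)
print(classification_report(y_test, y_pred))
```

Running cross-validation only on the training portion keeps the test set untouched until the final evaluation.
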
- Automatic Scanning: Checks every page you visit
- Visual Indicators: Badge shows OK, DIE, or ERR (and "..." while a scan is in progress)
- Notifications: Desktop alerts when phishing is detected
- Warning Banner: Injects a dismissible warning banner on phishing pages
- Caching: 24-hour cache to avoid redundant API calls
- Whitelist: Pre-configured list of trusted domains (Google, Facebook, Amazon, etc.)
- Re-scan Button: Manual rescan option in the popup
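
The interplay of the whitelist and the 24-hour cache can be sketched as below. The extension itself would implement this in JavaScript; this Python version only illustrates the logic. The names `check_url`, `is_whitelisted`, and `classify`, the choice of the full URL as cache key, and the three-domain whitelist are all hypothetical.

```python
import time
from urllib.parse import urlparse

CACHE_TTL_SECONDS = 24 * 60 * 60  # 24-hour cache lifetime
TRUSTED_DOMAINS = {"google.com", "facebook.com", "amazon.com"}  # illustrative subset

_cache = {}  # url -> (stored_at, result); keying by URL is an assumption


def is_whitelisted(url):
    """Return True if the URL's host is a trusted domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in TRUSTED_DOMAINS)


def get_cached(url, now=None):
    """Return the cached result, or None if it is missing or older than the TTL."""
    now = time.time() if now is None else now
    entry = _cache.get(url)
    if entry is None:
        return None
    stored_at, result = entry
    if now - stored_at >= CACHE_TTL_SECONDS:
        del _cache[url]  # expired entry
        return None
    return result


def check_url(url, classify, now=None):
    """Whitelist first, then cache, then the classifier (e.g. an API call)."""
    if is_whitelisted(url):
        return "legitimate"
    cached = get_cached(url, now)
    if cached is not None:
        return cached
    result = classify(url)
    _cache[url] = (time.time() if now is None else now, result)
    return result
```

Checking the whitelist before the cache means trusted domains never trigger an API call or occupy cache space.
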
- Ensure the Flask API is running (`python src/api.py`)
- Browse the web as normal
- The extension automatically scans each page:
  - Green "OK" badge: Site appears legitimate
  - Red "DIE" badge: Phishing detected - a warning banner will appear
  - Orange "..." badge: Scan in progress
  - Gray "ERR" badge: Could not reach the API
- Click the extension icon to see details and confidence score
- Use the Re-scan button to manually recheck a page
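
The mapping from a scan outcome to a badge can be sketched as follows. This is an illustration in Python rather than the extension's own code. The `/predict` path, port 5000 (Flask's default), the `{"url": ...}` request body, and the `is_phishing` response field are all assumptions; check `src/api.py` for the actual contract.

```python
import json
from urllib import request

API_URL = "http://localhost:5000/predict"  # hypothetical endpoint

# Badge text and colour for each state described in the list above.
BADGES = {
    "legitimate": ("OK", "green"),
    "phishing": ("DIE", "red"),
    "scanning": ("...", "orange"),
    "error": ("ERR", "gray"),
}


def fetch_prediction(url, api_url=API_URL, timeout=5):
    """POST the URL to the API as JSON and return the decoded JSON response."""
    body = json.dumps({"url": url}).encode("utf-8")
    req = request.Request(
        api_url, data=body, headers={"Content-Type": "application/json"}
    )
    with request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def badge_for_page(url, fetch=fetch_prediction):
    """Return (text, colour) for the badge; any API failure maps to ERR."""
    try:
        result = fetch(url)
        is_phishing = bool(result["is_phishing"])  # hypothetical field name
    except (OSError, ValueError, KeyError, TypeError):
        # Unreachable API, malformed JSON, or an unexpected response shape.
        return BADGES["error"]
    return BADGES["phishing"] if is_phishing else BADGES["legitimate"]
```

Passing `fetch` as a parameter lets the mapping be exercised without a running API, for example `badge_for_page(url, fetch=lambda u: {"is_phishing": True})`.
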