LinkForensics is a Chrome extension plus a Flask ML API that detects phishing and malicious links in real time. It combines rule-based analysis with a LightGBM model to protect users while they browse the web.
- Dual Detection Engine: Rule-based heuristics + LightGBM AI model for binary classification (Safe/Phishing)
- Real-time Link Scanning: Automatically scans all links on pages, including dynamic content in SPAs
- Smart Warnings: Visual alerts for risky links with detailed threat analysis
- Quick Lookup: Paste any URL in the popup for instant AI prediction
- Dashboard: Full statistics and analytics of scanned links by severity
- Local Processing: All analysis happens locally - no data sent to external servers
- Whitelist Support: Trusted domain whitelist for safe sites
- Chrome Extension v5: Modern Manifest v3 architecture
Chrome Web Store: LinkForensics - Friendly Link Checker
Dataset: Kaggle - malicious-url-classification-706k
LinkForensics/
├── server.js                  # Express web server (port 9091) for dashboard
├── model_api.py               # Flask ML API (port 5001) for predictions
├── requirements.txt           # Python dependencies
├── package.json               # Node.js dependencies
├── download_whitelist.py      # Utility to manage trusted domain whitelist
│
├── extension/                 # Chrome Extension files (Manifest v3)
│   ├── manifest.json          # Extension config
│   ├── background.js          # Service worker for API bridge & network monitoring
│   ├── content-script.js      # Injected script for link scanning
│   ├── popup.html/.js/.css    # Extension popup interface
│   ├── dashboard.html/.js     # Full dashboard with ML analysis
│   ├── warning.html/.js       # Safe landing page for risky links
│   └── README.md              # Extension-specific documentation
│
├── public/                    # Web dashboard static files
│   ├── index.html
│   ├── app.js
│   └── style.css
│
└── artifacts/                 # Pre-trained ML models & features
    ├── models/
    │   ├── RUN6_LGB_LEX_4CLASS.joblib          # 4-class classifier
    │   ├── RUN7_LGB_LEX_TFIDF_BINARY.joblib    # Binary classifier (active)
    │   └── RUN8_LGB_LEX_TFIDF_4CLASS.joblib    # 4-class with TF-IDF
    ├── tfidf/
    │   ├── tfidf_vectorizer.joblib             # TF-IDF feature extraction
    │   └── svd_tfidf.joblib                    # TF-IDF SVD transformation
    └── lex/
        ├── lexical_feature_columns_binary.json # Binary model features
        ├── lexical_feature_columns_4class.json # 4-class model features
        └── whitelist.txt                       # Trusted domains list
- Python 3.8+ (for ML API)
- Node.js 14+ (for web dashboard)
- Chrome browser (for extension)
pip install -r requirements.txt
Required packages:
- flask - REST API framework
- flask-cors - Cross-origin requests
- lightgbm - LightGBM ML models
- scikit-learn - Feature processing
- numpy & joblib - Data handling
python model_api.py
Expected output:
==================================================
LinkForensics Model API
http://localhost:5001
POST /predict { url: '...' }
GET /inspect
==================================================
The API will:
- Load pre-trained LightGBM models and artifacts
- Listen on http://localhost:5001 for prediction requests
- Support the /predict endpoint for URL classification
npm install
npm start
Web dashboard available at http://localhost:9091
- Open Chrome and go to chrome://extensions/
- Enable Developer mode (toggle, top-right)
- Click Load unpacked
- Select the extension/ folder
- Pin the extension to your toolbar for easy access
POST /predict
Request:
{ "url": "https://example.com" }
Response:
{
"url": "https://example.com",
"risk_score": 0.15,
"label": "safe",
"confidence": 0.92,
"features_used": 42
}
GET /inspect
Returns API health and loaded model information.
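The request and response shapes above can be exercised from a small Python client. This is a minimal sketch using only the standard library; the helper names (`build_predict_request`, `parse_prediction`, `predict`) are hypothetical and not part of the project, and the base URL assumes the default port from the startup banner.

```python
import json
import urllib.request

# Assumed default from the startup banner; change it if the API runs elsewhere.
API_BASE = "http://localhost:5001"


def build_predict_request(url: str, base: str = API_BASE) -> urllib.request.Request:
    """Build a POST /predict request carrying {"url": ...} as JSON."""
    payload = json.dumps({"url": url}).encode("utf-8")
    return urllib.request.Request(
        f"{base}/predict",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_prediction(body: bytes) -> dict:
    """Decode a /predict response and keep the fields shown in the example."""
    data = json.loads(body.decode("utf-8"))
    return {key: data.get(key) for key in ("url", "risk_score", "label", "confidence")}


def predict(url: str, timeout: float = 3.0) -> dict:
    """Send the request; raises urllib.error.URLError if the API is not running."""
    with urllib.request.urlopen(build_predict_request(url), timeout=timeout) as resp:
        return parse_prediction(resp.read())
```

With the API running, `predict("https://example.com")` should return a dictionary with the `label` and `risk_score` fields from the response.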
The Express server (server.js, port 9091) serves a static dashboard with link statistics and analytics.
- Content Script → Scans all links on page
- Rule-based Analysis → Checks domain against whitelist, pattern matching
- Feature Extraction → Generates lexical + TF-IDF features
- LightGBM Model → Classifies link as Safe/Phishing
- Risk Score → Combines rule + ML confidence
- Visual Alert → Shows warning for high-risk links
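The whitelist check and the score-combination step can be pictured with a short Python sketch. The README does not state how the rule and ML scores are actually combined, so the weighted average, the `ml_weight` value, the function names, and the sample whitelist entries below are all assumptions made for illustration. The extension's own logic lives in its JavaScript files.

```python
from urllib.parse import urlparse

# Hypothetical whitelist entries; the project keeps its real list in whitelist.txt.
WHITELIST = {"example.com", "wikipedia.org"}


def is_whitelisted(url: str, whitelist=WHITELIST) -> bool:
    """True if the URL's host is a whitelisted domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in whitelist)


def combined_risk(url: str, rule_score: float, ml_score: float,
                  ml_weight: float = 0.7) -> float:
    """Blend rule and ML scores (both in [0, 1]); whitelisted hosts score 0.

    The weighted average is an assumed combination rule, not the project's.
    """
    if is_whitelisted(url):
        return 0.0
    return ml_weight * ml_score + (1.0 - ml_weight) * rule_score
```

Matching on the parsed hostname rather than the raw string keeps a URL such as `https://wikipedia.org.evil.test/` from passing as a trusted domain.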
- Active Model: RUN7_LGB_LEX_TFIDF_BINARY.joblib
- Classification: Binary (Safe / Phishing)
- Features: 42 lexical + TF-IDF features
- Framework: LightGBM (fast, interpretable, low overhead)
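To give a sense of what "lexical features" means here, the sketch below computes a handful of string-level signals from a URL. The actual feature columns used by the model are defined in `lexical_feature_columns_binary.json` and are not listed in this README, so every feature name below is illustrative rather than the model's real input.

```python
import re
from urllib.parse import urlparse


def lexical_features(url: str) -> dict:
    """Compute a few illustrative lexical features from the raw URL string.

    These names are assumptions; the model's real columns are in the JSON file.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots": host.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at_symbol": int("@" in url),
        "uses_https": int(parsed.scheme == "https"),
        # Loose dotted-quad check; it does not validate octet ranges.
        "host_is_ipv4": int(bool(re.fullmatch(r"(\d{1,3}\.){3}\d{1,3}", host))),
    }
```

Features like these are cheap to compute from the string alone, which fits the low-overhead goal stated above.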
✅ All processing is local - No data sent to external servers
✅ Offline capable - Fallback to rule-based detection if API is unavailable
✅ No tracking - Extension doesn't collect or store user history
✅ Open source - Transparent detection logic
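The fallback behaviour described above (use rules when the API is unavailable) follows a common try-then-degrade pattern. The sketch below shows that pattern in Python under stated assumptions: `classify_with_fallback` and the `source` field are hypothetical names, and the extension itself implements its fallback in JavaScript, which may differ in detail.

```python
from typing import Callable


def classify_with_fallback(
    url: str,
    ml_predict: Callable[[str], dict],
    rule_check: Callable[[str], dict],
) -> dict:
    """Try the ML API first; fall back to rule-based detection on failure."""
    try:
        result = dict(ml_predict(url))
        result["source"] = "ml"
    except OSError:
        # urllib.error.URLError, refused connections, and socket timeouts are
        # all OSError subclasses, so a stopped or unreachable API lands here.
        result = dict(rule_check(url))
        result["source"] = "rules"
    return result
```

Passing the two classifiers in as callables keeps the sketch independent of any particular HTTP client or rule set.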
See LICENSE file for details.
Contributions are welcome! Areas for improvement:
- Additional 4-class model integration
- Whitelist expansion
- Rule refinements
- Performance optimization
- Extension README - Detailed extension setup and usage
- Chrome Web Store - Official listing