Production-grade phishing defense platform built as a Chrome Extension + Rust API Gateway + Python ML Inference Service.
This repository is the DP2 workspace for the Intellithon 2025 project and includes:
- Browser-side real-time threat monitoring
- Backend token-gated API and analytics persistence
- ML inference with ensemble models and engineered features
- Operator dashboard and test harness payloads
- Project Overview
- What This System Detects
- High-Level Architecture
- Component Deep Dive
- Data and Control Flows
- API Contracts
- Database and Analytics Model
- Machine Learning Pipeline
- Repository Structure
- Local Development Runbook
- Testing and Validation
- Security and Privacy Notes
- Performance Notes
- Known Gaps and Technical Debt
- Troubleshooting Guide
- Operational Scripts and Patch Utilities
- Contribution Guidelines
- License
PhishGuard AI is an end-to-end phishing detection system with three cooperating layers:
- Extension Layer (Chrome Manifest V3)
- Injects real-time client-side detectors into pages
- Monitors network behavior, fingerprinting, and suspicious DOM/UX patterns
- Surfaces in-page warnings and tracks telemetry
- Gateway Layer (Rust, Actix-Web)
- Serves as a local API facade for extension clients
- Enforces local API token auth (control plane bootstrap + rotation)
- Routes URL scoring requests to the Python ML service
- Persists user analytics and threat events to SQLite
- ML Layer (Python, FastAPI)
- Performs URL feature extraction and model inference
- Supports sensitivity-aware classification thresholds
- Returns detailed confidence and timing metrics
The system is designed for low-latency local operation and transparent risk visibility in the dashboard.
- Behavioral and UX phishing patterns
  - Immediate password prompts
  - Cross-origin credential form posts
  - Auto-submit forms
  - Redirect bursts
  - Clipboard abuse
- Visual/DOM spoofing patterns
  - Brand spoofing on non-official domains
  - Hollow credential-harvesting DOM structures
  - Clickjacking overlays
- Cryptographic/evasion indicators
  - Homograph/punycode-like URL patterns
  - Inline-script obfuscation bombs
- Network and protocol abuse
  - C2-like URL/IP/port patterns
  - Excessive exfiltration-size POSTs
  - Suspicious WebSocket and DoH-like traffic indicators
- Fingerprinting behavior
  - Canvas/WebGL probing
  - Audio context abuse
  - Font enumeration patterns
  - High navigator/storage probing density
- URL classification using model ensemble confidence
- Sensitivity-mode thresholds (conservative, balanced, aggressive)
- Persistent analytics aggregation per user and globally
flowchart LR
subgraph Browser[Chrome Browser]
CS[Content Scripts\ncontent_script.js\nfingerprint_detector.js\nnetwork_monitor.js]
BG[Service Worker\nbackground.js]
UI[Popup + Dashboard\npopup-enhanced.html\ndashboard.html]
end
CS -->|runtime messages| BG
UI -->|X-PhishGuard-Token| API
BG -->|POST /api/check-url| API
BG -->|POST /api/user/:id/activity| API
subgraph Gateway[Rust API Gateway :8080]
API[Actix Web API]
AUTH[API Auth Middleware\nControl-plane token validation]
RL[Rate Limit Middleware]
CACHE[Redis Cache Service]
DB[(SQLite via Diesel)]
end
API --> AUTH
API --> RL
API --> CACHE
API --> DB
API -->|POST /api/predict| ML
subgraph Inference[Python ML Service]
ML[FastAPI app.py :8888 default\n/api/predict]
FE[ProductionFeatureExtractor\nUltimateFeatureIntegrator]
MODELS[ModelCache\nLightGBM + XGBoost]
end
ML --> FE
ML --> MODELS
Source: manifest.json
- Manifest V3 service worker architecture
- Content scripts injected on <all_urls> at document_start
- Key permissions: tabs, activeTab, storage, webRequest, notifications
| File | Role | Key behaviors |
|---|---|---|
| background.js | Service worker control plane | Token bootstrap/rotation, ML request orchestration, analytics logging, blacklist/state persistence |
| content_script.js | Behavioral/DOM detectors + in-page warning UI | Form/redirect/password heuristics, visual/NLP/homograph/obfuscation scans, Safety Abort overlay action |
| fingerprint_detector.js | Browser fingerprinting detector | Hooks canvas, WebGL, Audio, font checks, storage/nav probing |
| network_monitor.js | Network abuse detector | Detects suspicious patterns, exfiltration-size uploads, C2 indicators |
| popup-enhanced.js | Popup telemetry panel | Lightweight status and recent activity feed |
| app.js | Full dashboard logic | Metrics fetching, chart rendering, history filters, settings persistence |
- dashboard.html: Multi-page app with Dashboard/History/Analytics/Settings/Help sections
- popup-enhanced.html: Compact quick-view status and activity panel
Source: backend/src/main.rs
Startup responsibilities:
- Load env/config
- Initialize Redis cache client (optional)
- Initialize ML HTTP client
- Initialize optional GeoIP DB (geodb/GeoLite2-City.mmdb)
- Initialize SQLite pool (falls back to in-memory mode if unavailable)
- Ensure control-plane credential table exists
- Register middleware + routes and bind host/port
Source:
- backend/src/middleware/api_auth.rs
- backend/src/middleware/rate_limit.rs
Behavior:
- The bootstrap endpoint is exempt from token checks
- Non-bootstrap /api/* requires X-PhishGuard-Token (or token query fallback)
- Rate limiting is route-bucketed:
  - default: 120 req/min
  - /api/check-url: 90 req/min
  - /api/user/:id/activity POST: 60 req/min
- Request body max enforced at 64KB
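The token lookup and route buckets described above can be illustrated with a small Python sketch. It is not the Rust middleware: the sliding-window algorithm, the per-client keying, and the names `extract_token`, `bucket_for`, and `SlidingWindowLimiter` are assumptions made for illustration. Only the header name, the `token` query fallback, and the three per-minute limits come from this README.

```python
import time
from collections import defaultdict, deque
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# Per-minute limits quoted in this README; bucket names are hypothetical.
LIMITS = {"default": 120, "check_url": 90, "activity_post": 60}
WINDOW_SECONDS = 60.0


def extract_token(headers: dict, url: str) -> Optional[str]:
    """Prefer the X-PhishGuard-Token header, then fall back to ?token=..."""
    for name, value in headers.items():
        if name.lower() == "x-phishguard-token" and value:
            return value
    values = parse_qs(urlsplit(url).query).get("token")
    return values[0] if values else None


def bucket_for(method: str, path: str) -> str:
    """Map a request to its rate-limit bucket."""
    if path == "/api/check-url":
        return "check_url"
    parts = path.strip("/").split("/")
    is_activity = (
        len(parts) == 4 and parts[:2] == ["api", "user"] and parts[3] == "activity"
    )
    if method.upper() == "POST" and is_activity:
        return "activity_post"
    return "default"


class SlidingWindowLimiter:
    """Assumed algorithm: keep timestamps of the last 60 seconds per (client, bucket)."""

    def __init__(self) -> None:
        self._hits = defaultdict(deque)

    def allow(self, client: str, method: str, path: str,
              now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        bucket = bucket_for(method, path)
        hits = self._hits[(client, bucket)]
        # Drop timestamps that have left the window.
        while hits and now - hits[0] >= WINDOW_SECONDS:
            hits.popleft()
        if len(hits) >= LIMITS[bucket]:
            return False
        hits.append(now)
        return True
```

Passing `now` explicitly keeps the limiter deterministic in tests; in a running service the default monotonic clock would be used instead.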
Source: backend/src/handlers/control_plane.rs
- Bootstrap and rotate require:
  - valid install_id and extension_id format
  - strict Origin == chrome-extension://<extension_id>
- Rotations require valid existing token
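As a rough illustration of the bootstrap checks listed above, the following Python sketch validates the identifiers and the origin binding. The extension ID pattern reflects Chrome's 32-character a-p format; the install_id pattern is a hypothetical placeholder, since the real rule lives in control_plane.rs and is not documented here.

```python
import re

# Chrome extension IDs are 32 characters drawn from the letters a-p.
EXTENSION_ID_RE = re.compile(r"[a-p]{32}")
# Hypothetical install_id rule, for illustration only.
INSTALL_ID_RE = re.compile(r"[A-Za-z0-9_-]{8,64}")


def validate_bootstrap(install_id: str, extension_id: str, origin: str) -> bool:
    """Return True only if both identifiers are well formed and the
    Origin header is exactly chrome-extension://<extension_id>."""
    if not INSTALL_ID_RE.fullmatch(install_id):
        return False
    if not EXTENSION_ID_RE.fullmatch(extension_id):
        return False
    return origin == f"chrome-extension://{extension_id}"
```

The origin comparison is an exact string match, which mirrors the "strict" wording above: a trailing slash or a different extension ID is rejected.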
Source: ml-service/app.py
- Startup loads ModelCache() and ProductionFeatureExtractor()
- Prediction endpoint: POST /api/predict
- Health endpoint: GET /health
- Sensitivity thresholds are dynamic:
  - conservative: 0.80
  - balanced: 0.50
  - aggressive: 0.30
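A minimal sketch of how the sensitivity thresholds could be applied to a model confidence score. The threshold values come from the list above; the `>=` comparison, the fallback to balanced for unknown modes, and the function name `classify` are assumptions, not confirmed behavior of app.py.

```python
# Threshold values as documented in this README.
THRESHOLDS = {"conservative": 0.80, "balanced": 0.50, "aggressive": 0.30}


def classify(confidence: float, sensitivity_mode: str = "balanced") -> dict:
    """Flag a URL as phishing when confidence reaches the mode's threshold.

    Unknown modes fall back to "balanced" (an assumption for this sketch).
    """
    mode = sensitivity_mode if sensitivity_mode in THRESHOLDS else "balanced"
    threshold = THRESHOLDS[mode]
    return {
        "is_phishing": confidence >= threshold,
        "sensitivity_mode": mode,
        "threshold_used": threshold,
    }
```

With these values, a confidence of 0.6 is flagged in balanced and aggressive modes but not in conservative mode, so a lower threshold trades more warnings for fewer missed pages.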
app.py prints docs/health URLs as :8000, but the direct script runner currently binds port 8888 via:
uvicorn.run("app:app", host="0.0.0.0", port=8888, ...)
The backend .env currently aligns to this by setting:
ML_SERVICE_URL=http://127.0.0.1:8888
sequenceDiagram
participant U as Extension UI/Caller
participant BG as background.js
participant API as Rust API :8080
participant REDIS as Redis cache
participant ML as Python ML :8888
U->>BG: action=checkURL(url)
BG->>API: POST /api/check-url (token)
API->>REDIS: cache.get(url_hash)
alt cache hit
REDIS-->>API: cached URLCheckResponse
API-->>BG: response(cached=true)
else cache miss
API->>ML: POST /api/predict
ML-->>API: confidence + threat_level + metrics
API->>REDIS: cache.set(ttl)
API-->>BG: response(cached=false)
end
BG->>API: POST /api/user/:id/activity (encrypted metadata)
BG-->>U: scored result
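The cache-hit and cache-miss branches in the sequence above follow a cache-aside pattern. The Python sketch below models it with an in-memory stand-in for Redis; the SHA-256 key, the 300-second TTL, and the names `TTLCache` and `check_url` are illustrative assumptions rather than details taken from the gateway code.

```python
import hashlib
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """In-memory stand-in for the Redis cache service."""

    def __init__(self) -> None:
        self._store: Dict[str, Tuple[dict, float]] = {}

    def get(self, key: str, now: float) -> Optional[dict]:
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if now >= expires_at:
            del self._store[key]  # Expired entries count as misses.
            return None
        return value

    def set(self, key: str, value: dict, ttl: float, now: float) -> None:
        self._store[key] = (value, now + ttl)


def url_hash(url: str) -> str:
    # Assumed key derivation; the real hash function is not documented here.
    return hashlib.sha256(url.encode("utf-8")).hexdigest()


def check_url(url: str, cache: TTLCache, predict: Callable[[str], dict],
              ttl_seconds: float = 300.0, now: Optional[float] = None) -> dict:
    """Return a cached result when available, otherwise call the ML service."""
    now = time.monotonic() if now is None else now
    key = url_hash(url)
    cached = cache.get(key, now)
    if cached is not None:
        return {**cached, "cached": True}
    result = predict(url)  # Stands in for POST /api/predict.
    cache.set(key, result, ttl_seconds, now)
    return {**result, "cached": False}
```

Only the miss path pays the inference cost, which is why the cache hit rate has a direct effect on observed latency.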
sequenceDiagram
participant EXT as Extension
participant API as Rust API
participant DB as SQLite control_plane_credentials
EXT->>API: POST /api/control-plane/bootstrap\n{install_id, extension_id}\nOrigin: chrome-extension://<id>
API->>API: validate identifiers + origin match
API->>DB: upsert token
API-->>EXT: token
Note over EXT,API: On 401 or update, extension rotates token
EXT->>API: POST /api/control-plane/rotate + X-PhishGuard-Token
API->>DB: validate previous token + rotate
API-->>EXT: new token
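The bootstrap upsert and the rotation step in the sequence above can be sketched against an in-memory SQLite database. The column names follow the control_plane_credentials entity in this README; the token format, the lookup by install_id, and the function names are assumptions for illustration, not the behavior of control_plane_store.rs.

```python
import secrets
import sqlite3
import time
from typing import Optional


def open_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE IF NOT EXISTS control_plane_credentials (
               install_id   TEXT PRIMARY KEY,
               extension_id TEXT NOT NULL,
               token        TEXT NOT NULL,
               issued_at    BIGINT NOT NULL
           )"""
    )
    return conn


def bootstrap(conn: sqlite3.Connection, install_id: str, extension_id: str) -> str:
    """Issue a token, replacing any existing row for this install (upsert)."""
    token = secrets.token_urlsafe(32)
    conn.execute(
        """INSERT INTO control_plane_credentials
               (install_id, extension_id, token, issued_at)
           VALUES (?, ?, ?, ?)
           ON CONFLICT(install_id) DO UPDATE SET
               extension_id = excluded.extension_id,
               token = excluded.token,
               issued_at = excluded.issued_at""",
        (install_id, extension_id, token, int(time.time())),
    )
    conn.commit()
    return token


def rotate(conn: sqlite3.Connection, install_id: str,
           current_token: str) -> Optional[str]:
    """Swap in a new token only if the presented token matches the stored one."""
    row = conn.execute(
        "SELECT token FROM control_plane_credentials WHERE install_id = ?",
        (install_id,),
    ).fetchone()
    if row is None or not secrets.compare_digest(row[0], current_token):
        return None
    new_token = secrets.token_urlsafe(32)
    conn.execute(
        "UPDATE control_plane_credentials SET token = ?, issued_at = ? "
        "WHERE install_id = ?",
        (new_token, int(time.time()), install_id),
    )
    conn.commit()
    return new_token
```

After a successful rotation the previous token stops validating, which matches the "validate previous token + rotate" step in the diagram.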
flowchart LR
DASH[dashboard app.js]
DASH -->|GET /api/stats/global| API[Rust API]
DASH -->|GET /api/user/:id/analytics| API
DASH -->|GET /health| API
API --> DB[(SQLite)]
DASH --> CHARTS[Chart.js visualizations]
Current implementation uses polling (2 seconds) for active dashboard/history/analytics views.
| Endpoint | Method | Auth | Source | Purpose |
|---|---|---|---|---|
| / | GET | No | backend/src/handlers/root.rs | Basic service metadata |
| /health | GET | No | backend/src/handlers/health.rs | Redis/ML health snapshot |
| /api/control-plane/bootstrap | POST | No | backend/src/handlers/control_plane.rs | First token issuance |
| Endpoint | Method | Auth | Source | Purpose |
|---|---|---|---|---|
| /api/control-plane/rotate | POST | Yes | control_plane.rs | Rotate extension token |
| /api/check-url | POST | Yes | url_check.rs | URL phishing scoring |
| /api/stats | GET | Yes | stats.rs | Cache stats (currently minimal) |
| /api/stats/global | GET | Yes | global_stats.rs | Global aggregate metrics |
| /api/stats/user/{user_id} | GET | Yes | global_stats.rs | Per-user stats |
| /api/user/{user_id}/analytics | GET | Yes | user_analytics.rs | User recent activity + threat breakdown |
| /api/user/{user_id}/activity | POST | Yes | user_analytics.rs | Append new threat/activity event |
| /api/user/{user_id}/threats/live | GET | Yes | user_analytics.rs | SSE live threats stream |
curl -X POST "http://localhost:8080/api/control-plane/bootstrap" \
-H "Content-Type: application/json" \
-H "Origin: chrome-extension://aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa" \
-d '{"install_id":"install_12345678","extension_id":"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"}'
curl -X POST "http://localhost:8080/api/check-url" \
-H "Content-Type: application/json" \
-H "X-PhishGuard-Token: <token>" \
-d '{"url":"https://example.com","sensitivity_mode":"balanced"}'
Current runtime is SQLite via Diesel (backend/src/db/connection.rs), defaulting to phishguard.db when DATABASE_URL is absent.
Source: backend/src/db/schema.rs
- users
- user_activity
- device_metrics
- user_threat_stats
- user_threat_sources
- user_scan_queue
- user_model_updates
- user_privacy_settings
Control-plane persistence table is ensured separately by service startup:
- control_plane_credentials via backend/src/services/control_plane_store.rs
erDiagram
users ||--o{ user_activity : has
users ||--o{ user_threat_stats : has
users ||--o{ user_threat_sources : has
users ||--o{ device_metrics : has
users ||--o{ user_scan_queue : has
users ||--o{ user_model_updates : has
users ||--o{ user_privacy_settings : has
users {
text user_id PK
text extension_id
text sensitivity_mode
int is_active
int total_scans
int total_threats_blocked
}
user_activity {
text activity_id PK
text user_id FK
text encrypted_url
text encrypted_domain
int is_phishing
text threat_level
double confidence
bigint timestamp
}
control_plane_credentials {
text install_id PK
text extension_id
text token
bigint issued_at
}
- PostgreSQL-style migration: backend/migrations/.../up.sql
- SQLite migrations/scripts:
  - backend/migrations/.../up_sqlite.sql
  - backend/migrations/.../up_sqlite_complete.sql
There are legacy Postgres-oriented SQL scripts in root (database-schema.sql, setup_database.sql) alongside current SQLite runtime support.
ml-service/app.py -> ml-model/deployment/model_cache.py + ml-model/deployment/production_feature_extractor.py
- LightGBM (lightgbm_159features.pkl)
- XGBoost (xgboost_159features.pkl)
ProductionFeatureExtractor wraps UltimateFeatureIntegrator, which combines:
- URL features
- SSL/TLS features
- DNS features
- Content features
- Behavioral features
- Network features
The integrator reports a 159-feature vector in current implementation.
POST /api/predict returns:
- is_phishing
- confidence
- threat_level
- sensitivity_mode
- threshold_used
- details (feature extraction and model timing)
- performance_metrics
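For callers that want to validate this payload, a typed container along the following lines may help. The field names come from the list above; the Python types, the `PredictResponse` name, and the parsing helper are assumptions, since the exact schema is defined in app.py.

```python
from dataclasses import dataclass
from typing import Any, Dict

# Field names as listed in this README; types below are assumed.
REQUIRED_FIELDS = (
    "is_phishing",
    "confidence",
    "threat_level",
    "sensitivity_mode",
    "threshold_used",
    "details",
    "performance_metrics",
)


@dataclass
class PredictResponse:
    is_phishing: bool
    confidence: float
    threat_level: str
    sensitivity_mode: str
    threshold_used: float
    details: Dict[str, Any]
    performance_metrics: Dict[str, Any]


def parse_predict_response(payload: Dict[str, Any]) -> PredictResponse:
    """Build a PredictResponse, failing loudly if a documented field is missing."""
    missing = [name for name in REQUIRED_FIELDS if name not in payload]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    return PredictResponse(**{name: payload[name] for name in REQUIRED_FIELDS})
```

Extra keys in the payload are ignored, so the helper keeps working if the service adds fields later.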
DP2/
├── manifest.json
├── background.js
├── content_script.js
├── fingerprint_detector.js
├── network_monitor.js
├── popup-enhanced.html
├── popup-enhanced.js
├── dashboard.html
├── app.js
├── style.css
├── index.html
├── test-payloads/
│ ├── index.html
│ ├── 1_visual_spoof.html
│ ├── 2_nlp_spear_phishing.html
│ ├── 3_clickjack_obfuscation.html
│ └── 4_web3_drainer.html
├── backend/
│ ├── Cargo.toml
│ ├── .env
│ ├── .env.example
│ ├── migrations/
│ └── src/
│ ├── main.rs
│ ├── handlers/
│ ├── middleware/
│ ├── services/
│ ├── db/
│ └── models/
├── ml-model/
│ ├── features/
│ ├── deployment/
│ ├── models/
│ └── requirements.txt
├── ml-service/
│ ├── app.py
│ ├── requirements.txt
│ └── patch/fix utility scripts
└── root patch/fix scripts
- Chrome/Chromium browser (Manifest V3 support)
- Rust toolchain (stable)
- Python 3.9+
- Optional Redis if you want cache enabled
cd ml-service
pip install -r requirements.txt
python app.py
Default script path binds to http://localhost:8888 in current app.py.
cd backend
cargo run
Default local bind: http://localhost:8080
Current checked-in backend/.env points to ML on 127.0.0.1:8888.
- Open chrome://extensions
- Enable Developer Mode
- Click Load unpacked
- Select the DP2 directory
cd DP2
python -m http.server 8081
Then open:
- http://localhost:8081/ (root test portal)
- http://localhost:8081/test-payloads/index.html (payload index)
curl http://localhost:8080/health
curl http://localhost:8888/health
If ML is instead on 8000, update backend/.env accordingly:
ML_SERVICE_URL=http://127.0.0.1:8000
| Payload | File | Intent |
|---|---|---|
| Visual spoofing | test-payloads/1_visual_spoof.html | Brand impersonation + credential lure |
| NLP spear phishing | test-payloads/2_nlp_spear_phishing.html | Urgency + financial language trigger |
| Clickjacking + obfuscation | test-payloads/3_clickjack_obfuscation.html | Overlay + eval-heavy script trigger |
| Web3 drainer (legacy test artifact) | test-payloads/4_web3_drainer.html | Legacy wallet-drain simulation page |
- Open extension popup to confirm status and recent events
- Open dashboard.html to validate:
  - global stats
  - history table
  - chart rendering
  - service status indicators
- Control-plane token gating for /api/* routes
- Origin-bound bootstrap for extension identity check
- AES-GCM encrypted URL payload logging from extension before analytics write
- Optional GeoIP enrichment gate (ALLOW_CLIENT_IP_ANALYTICS)
- Payload size limits and per-route rate limiting middleware
- Keep sensitive browsing details encrypted in analytics path
- Favor local-first operation
- Limit API surface by requiring locally bootstrapped token
Runtime behavior and performance are shaped by:
- Redis cache hit path in Rust gateway (cache.rs)
- ML inference timeout settings (ml_client.rs)
- feature extraction overhead in Python (production_feature_extractor.py)
- dashboard polling frequency (app.js currently every 2 seconds)
Source-level metrics snapshot (excluding build artifacts):
- Total lines (.js, .py, .rs, .html, .css, .sql, .md): 23,454
- File counts:
  - JavaScript: 31
  - Python: 21
  - Rust: 26
  - HTML: 8
  - CSS: 2
  - SQL: 6
  - Markdown: 4
This section is intentionally explicit to keep maintainers and reviewers aligned with current code reality.
- ML port mismatch risk
  - ml-service/app.py script mode binds to 8888
  - printed startup text still references 8000
  - .env.example also references 8000
- Dashboard update method
  - Dashboard currently uses polling every 2 seconds (app.js)
  - SSE endpoint exists (/api/user/:id/threats/live) but is not currently wired into dashboard code
- Action routing mismatch from content_script.js
  - content script reports actions such as suspiciousActivity, statusReport, popupAttempt, visibilityChange
  - background message switch does not currently implement dedicated handlers for those action names
- Legacy Web3 remnants
  - Active Web3 interceptor engine was removed from the runtime phase block
  - Legacy constants/artifacts remain in warning maps and test payload files
- Patch utility script drift
  - Multiple patch_*.js / fix_*.js scripts were one-off repair tools
  - Some contain hardcoded paths from prior workspace layout and are not production runtime code
- Stats endpoint limitations
  - /api/stats currently reports cache stats with placeholder behavior in cache.get_stats()
- SQL/documentation divergence
  - Root SQL files include older PostgreSQL-focused setup patterns
  - Runtime backend currently defaults to SQLite unless overridden
Check:
- Rust server running on :8080
- Browser can reach GET /health
- Control-plane token present in extension storage
Likely ML port mismatch.
Verify:
- backend/.env -> ML_SERVICE_URL
- actual ML bind port from app.py or uvicorn command
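The port comparison above can be scripted. The sketch below reads ML_SERVICE_URL from the text of an env file and compares its port with the port the ML service is actually bound to; the helper names and the simple KEY=VALUE parsing are assumptions and do not cover every dotenv syntax.

```python
from typing import Optional
from urllib.parse import urlsplit


def read_env_value(env_text: str, key: str) -> Optional[str]:
    """Return the value for key from simple KEY=VALUE lines, skipping comments."""
    for raw_line in env_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == key:
            return value.strip().strip('"').strip("'")
    return None


def ml_port_matches(env_text: str, actual_bind_port: int) -> bool:
    """True when the port in ML_SERVICE_URL equals the ML service's bind port."""
    url = read_env_value(env_text, "ML_SERVICE_URL")
    if url is None:
        return False
    return urlsplit(url).port == actual_bind_port
```

For example, an env file containing `ML_SERVICE_URL=http://127.0.0.1:8000` would fail the check when app.py is started in script mode on 8888.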
Control-plane token invalid or absent.
Resolution:
- reload extension
- allow bootstrap to run from extension origin
- ensure Origin rules in backend are satisfied
Check:
- tokened calls from popup/dashboard are succeeding
- writes to /api/user/:id/activity are returning 200
- SQLite database file is writable and schema present
Current implementation sends closeCurrentTab to background and attempts window.close() in content script.
Verify:
- extension reloaded after latest background/content updates
- tabs permission still present in manifest
Repository includes many patch/fix scripts from iterative development and debugging sessions.
Examples:
- patch_dashboard_interval.js
- patch_bg_close_tab.js
- patch_web3.js
- fix_safety_abort.js
- remove_web3*.js
- ml-service/patch_*.js and ml-service/fix_*.js
Guidance:
- Treat them as maintenance utilities, not core runtime dependencies
- Review script internals before executing in a different environment
- Prefer small focused commits
- Keep runtime changes separate from docs-only changes
- Include reproducible validation steps in PR description
cd backend
cargo fmt
cargo clippy -- -D warnings
cargo test
cd ml-service
python -m pip install -r requirements.txt
python -m py_compile app.py
- Reload extension in chrome://extensions
- Open popup and dashboard
- Validate /health, /api/stats/global, /api/user/:id/analytics all respond
MIT License.
Project author: Sri Vishnu ---2523