sdalal11/scam-detection-app

Real-Time Scam Call Detection

A full-stack web application that analyses phone calls in real-time to detect scam indicators — using browser microphone capture, ElevenLabs speech-to-text, and Google Gemini for AI-powered analysis.

Overview

This application provides real-time detection of common scam patterns in phone conversations:

  • Audio Capture — Records microphone input in chunks using the Web Audio API
  • Speech-to-Text — Converts audio chunks to text via the ElevenLabs API
  • Context Management — Maintains a rolling 30-second window of conversation history
  • AI Analysis — Uses Google Gemini to detect scam indicators in the transcript
  • Real-Time Feedback — Displays risk levels (Low / Medium / High) with explanations

Architecture

┌──────────────────────────────────────────────────────────────┐
│                      Frontend (HTML/JS)                      │
│         Web Audio API · Transcript display · Risk UI         │
└────────────────────────┬─────────────────────────────────────┘
                         │ HTTP
┌────────────────────────▼─────────────────────────────────────┐
│                  Backend (FastAPI · Python)                  │
│  ┌──────────────────┬──────────────────┬──────────────────┐  │
│  │  Transcription   │ Context Manager  │   LLM Analysis   │  │
│  │   (ElevenLabs)   │  (30 s rolling)  │ (Google Gemini)  │  │
│  └──────────────────┴──────────────────┴──────────────────┘  │
│                         │    ▲                               │
│              RAG miss   │    │ store medium/high results     │
│                         ▼    │                               │
│  ┌────────────────────────────────────────────────────────┐  │
│  │        RAG Cache (ChromaDB · all-MiniLM-L6-v2)         │  │
│  │ query_rag() → hit: skip LLM   store_attack() → persist │  │
│  └────────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────────┘
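A minimal, dependency-free sketch of the cache-then-LLM flow in the diagram is shown below. It is illustrative only, not the project's code: `ScamCache`, `bag_of_words` and `analyze_transcript` are hypothetical names, a bag-of-words vector stands in for the all-MiniLM-L6-v2 embeddings, a linear scan stands in for ChromaDB, and reporting `similarity` as `1 - distance` is an assumption inferred from the example responses.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Optional, Tuple

Vector = Dict[str, float]


def bag_of_words(text: str) -> Vector:
    """Toy stand-in for all-MiniLM-L6-v2: lowercase word counts as a sparse vector."""
    return {word: float(n) for word, n in Counter(text.lower().split()).items()}


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity: 0.0 means same direction, 1.0 means no shared words."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / norm if norm else 1.0


class ScamCache:
    """In-memory analogue of the RAG cache: nearest stored verdict within a threshold."""

    def __init__(self, embed: Callable[[str], Vector] = bag_of_words, threshold: float = 0.4):
        self.embed = embed
        self.threshold = threshold  # maximum cosine distance that counts as a hit
        self.entries: List[Tuple[Vector, dict]] = []

    def query(self, transcript: str) -> Optional[dict]:
        if not self.entries:
            return None
        q = self.embed(transcript)
        dist, verdict = min(
            ((cosine_distance(q, vec), stored) for vec, stored in self.entries),
            key=lambda pair: pair[0],
        )
        if dist > self.threshold:
            return None  # too dissimilar: treat as a cache miss
        return {
            **verdict,
            "transcript": transcript,
            "reasons": [f"[Previously seen] {r}" for r in verdict["reasons"]],
            "source": "rag",
            "similarity": 1.0 - dist,
        }

    def store(self, transcript: str, verdict: dict) -> None:
        self.entries.append((self.embed(transcript), verdict))


def analyze_transcript(transcript: str, cache: ScamCache,
                       call_llm: Callable[[str], dict]) -> dict:
    """Try the cache first; on a miss, call the LLM and cache medium/high verdicts."""
    hit = cache.query(transcript)
    if hit is not None:
        return hit  # cache hit: the LLM is never called
    verdict = {**call_llm(transcript), "transcript": transcript,
               "source": "llm", "similarity": None}
    if verdict["risk_level"] in ("medium", "high"):
        cache.store(transcript, verdict)
    return verdict
```

Because the LLM is passed in as a plain callable, a stub that returns a dict with `risk_level` and `reasons` is enough to exercise both the hit and the miss paths without any API key.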

Key endpoints:

Method  Path            Description
POST    /process-audio  Transcribe a base64-encoded audio chunk
POST    /analyze        Analyse a transcript for scam indicators
POST    /reset          Clear context and start a new call session
GET     /health         Health check

Tech Stack

Layer             Technology
Backend           Python 3.12, FastAPI, asyncio
Frontend          HTML5, vanilla JavaScript, Web Audio API
Speech-to-Text    ElevenLabs API
AI Analysis       Google Gemini API
Containerisation  Docker (python:3.12-slim, multi-stage build)
CI/CD             Jenkins, 7-stage pipeline → Docker Hub (sdalal11/scam-detection-api)
RAG Cache         ChromaDB · sentence-transformers (all-MiniLM-L6-v2) · cosine similarity
Testing           pytest, pytest-asyncio, pytest-cov (94 % coverage)

Prerequisites

  • Python 3.10+ or Docker Desktop
  • A modern browser with microphone support (Chrome / Firefox recommended)
  • API keys for ElevenLabs (speech-to-text) and Google Gemini (AI analysis)

Quick Start

Option A: Docker (Recommended)

The fastest way to run the backend without setting up a Python environment.

1. Create your .env file

cp backend/.env.example .env
# Open .env and fill in your API keys:
#   ELEVENLABS_API_KEY=your_key_here
#   GEMINI_API_KEY=your_key_here

2. Pull and run the pre-built image

docker pull sdalal11/scam-detection-api:latest

docker run -d \
  --name scam-detection-api \
  --env-file .env \
  -p 8000:8000 \
  sdalal11/scam-detection-api:latest

3. Verify the backend is running

curl http://localhost:8000/health
# → {
#     "status": "ok",
#     "service": "scam-detection-api",
#     "context_items": 0,
#     "rag": { "total_stored": 2, "threshold": 0.4, "model": "all-MiniLM-L6-v2" }
#   }
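For scripts or CI, the one-off curl check in step 3 can be turned into a polling loop that waits for the container to come up. This is a stdlib-only sketch; `wait_for_health` and `fetch_json` are hypothetical helper names, and the retry count and delay are arbitrary choices.

```python
import json
import time
import urllib.request
from typing import Callable, Optional


def fetch_json(url: str, timeout: float = 2.0) -> dict:
    """GET a URL and decode its JSON body using only the standard library."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_for_health(url: str = "http://localhost:8000/health", attempts: int = 10,
                    delay: float = 1.0,
                    fetch: Callable[[str], dict] = fetch_json) -> dict:
    """Poll the health endpoint until it reports "status": "ok" or attempts run out."""
    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            body = fetch(url)
            if body.get("status") == "ok":
                return body
        except OSError as exc:  # refused connections, timeouts, URLError/HTTPError
            last_error = exc
        if attempt < attempts - 1:
            time.sleep(delay)
    raise RuntimeError(f"backend not healthy after {attempts} attempts") from last_error

# Usage, once the container has been started:
# health = wait_for_health()
# print(health["rag"]["total_stored"])
```

The `fetch` parameter exists so the loop can be tested with a stub instead of a live server.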

4. Start the frontend

cd frontend
python3 -m http.server 3000
# Open http://localhost:3000 in your browser

Stop when done:

docker stop scam-detection-api && docker rm scam-detection-api

Why do I need to provide API keys?
The .env file is excluded from the Docker image (via .dockerignore) and from source control (via .gitignore). Each user must supply their own keys — never hardcode secrets in images or repositories.


Option B: Run from Source

1. Set up the backend

cd backend
python3 -m venv .venv
source .venv/bin/activate          # Windows: .venv\Scripts\activate
pip install -r requirements.txt
cp .env.example .env
# Edit .env and add your API keys

2. Start the backend

python main.py
# Uvicorn running on http://0.0.0.0:8000

3. Start the frontend

cd ../frontend
python3 -m http.server 3000
# Open http://localhost:3000

See DOCKER.md for full Docker usage and JENKINS.md for CI/CD setup.


API Reference

Full interactive docs: http://localhost:8000/docs (Swagger UI)

POST /process-audio

// Request
{ "audio_data": "<base64>", "chunk_id": 1 }

// Response 200
{ "transcript": "Hello, this is your bank calling...", "chunk_id": 1, "timestamp": 1234567890.1 }

// Response 503
{ "detail": "Transcription failed" }
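A client-side sketch of building the /process-audio request body from raw chunk bytes with the standard library follows. `build_process_audio_request` is a hypothetical helper; the README does not state which audio container format the backend expects, so the bytes are treated as opaque.

```python
import base64
import json
import urllib.request


def build_process_audio_request(audio_bytes: bytes, chunk_id: int,
                                base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Wrap one audio chunk in the JSON body documented for POST /process-audio."""
    payload = {
        "audio_data": base64.b64encode(audio_bytes).decode("ascii"),
        "chunk_id": chunk_id,
    }
    return urllib.request.Request(
        f"{base_url}/process-audio",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Building the request does not contact the server; send it with
# urllib.request.urlopen(req) once the backend is running.
req = build_process_audio_request(b"\x00\x01fake-audio", chunk_id=1)
```

A 503 from the backend surfaces in `urllib` as `urllib.error.HTTPError`, whose body carries the `detail` message shown above.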

POST /analyze

// Request
{ "transcript": "Verify your account immediately", "context": "..." }

// Response 200 — LLM path (first time seen)
{
  "transcript": "Verify your account immediately",
  "risk_level": "high",
  "scam_type": "bank scam",
  "reasons": ["Urgency detected", "Account verification request"],
  "confidence": 0.95,
  "source": "llm",
  "similarity": null
}

// Response 200 — RAG cache hit (pattern seen before)
{
  "transcript": "Verify your account immediately",
  "risk_level": "high",
  "scam_type": "bank scam",
  "reasons": ["[Previously seen] Urgency detected", "[Previously seen] Account verification request"],
  "confidence": 0.95,
  "source": "rag",
  "similarity": 0.97
}
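When consuming these responses from Python, a small typed wrapper can make the `source` field, and therefore whether a verdict came from the cache, explicit. This is a hypothetical client-side mirror of the documented fields; the server's own Pydantic models live in backend/models.py and may differ, and the lowercase risk strings are assumed from the examples.

```python
import json
from dataclasses import dataclass
from typing import List, Optional

PREFIX = "[Previously seen] "


@dataclass
class AnalysisResult:
    """Hypothetical client-side view of a POST /analyze response."""
    transcript: str
    risk_level: str               # assumed "low" / "medium" / "high", as in the examples
    scam_type: Optional[str]
    reasons: List[str]
    confidence: float
    source: str                   # "llm" (fresh analysis) or "rag" (cache hit)
    similarity: Optional[float]   # null on the LLM path

    @property
    def from_cache(self) -> bool:
        return self.source == "rag"

    def plain_reasons(self) -> List[str]:
        """Reasons with the cache marker stripped, for uniform display."""
        return [r[len(PREFIX):] if r.startswith(PREFIX) else r for r in self.reasons]


def parse_analysis(raw: str) -> AnalysisResult:
    data = json.loads(raw)
    return AnalysisResult(
        transcript=data["transcript"],
        risk_level=data["risk_level"],
        scam_type=data.get("scam_type"),
        reasons=list(data.get("reasons", [])),
        confidence=float(data["confidence"]),
        source=data["source"],
        similarity=data.get("similarity"),
    )
```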

POST /reset

// Response 200
{ "message": "Context reset for new call" }

GET /health

// Response 200
{ "status": "ok", "service": "scam-detection-api", "context_items": 5, "rag": { "total_stored": 12, "threshold": 0.4, "model": "all-MiniLM-L6-v2" } }

How It Works

  1. Audio Capture — The browser records audio in 4-second chunks using the Web Audio API with noise suppression and echo cancellation, encodes each chunk as base64, and POSTs it to /process-audio.

  2. Transcription — The backend decodes the base64 audio and forwards it to ElevenLabs (scribe_v2 model). The resulting transcript is returned and added to the in-memory context window.

  3. Context Management — A rolling 30-second window of transcript history is maintained in memory. Entries older than 300 seconds are automatically removed, and the window is capped at 10 items.

  4. Scam Detection — Before calling the LLM, the transcript is encoded with all-MiniLM-L6-v2 and queried against a local ChromaDB vector store. If a similar pattern has been seen before (cosine distance ≤ 0.40), the cached result is returned immediately with source: "rag" — no API call needed. On a cache miss, the transcript and context are sent to Google Gemini. Medium and high-risk results are automatically persisted to the vector store for future hits.

  5. UI Updates — The frontend displays the live transcript, updates the colour-coded risk indicator (green / amber / red), and lists detected indicators with a confidence score.
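The context window described in step 3 can be sketched with a bounded deque that combines expiry by age with a hard item cap. `RollingContext` is a hypothetical class, not the contents of services/context_manager.py. The clock is injectable so the behaviour can be tested without waiting, and both limits are constructor arguments rather than hard-coded values.

```python
import time
from collections import deque
from typing import Callable, Deque, Tuple


class RollingContext:
    """Keep recent transcript snippets, dropping ones that are too old or over the cap."""

    def __init__(self, max_age_s: float, max_items: int,
                 clock: Callable[[], float] = time.time) -> None:
        self.max_age_s = max_age_s
        self.clock = clock
        # deque(maxlen=...) silently evicts the oldest entry once the cap is reached.
        self._items: Deque[Tuple[float, str]] = deque(maxlen=max_items)

    def _expire(self) -> None:
        cutoff = self.clock() - self.max_age_s
        while self._items and self._items[0][0] < cutoff:
            self._items.popleft()

    def add(self, text: str) -> None:
        self._items.append((self.clock(), text))
        self._expire()

    def text(self) -> str:
        """Concatenate the surviving snippets, oldest first."""
        self._expire()
        return " ".join(snippet for _, snippet in self._items)

    def reset(self) -> None:
        """Equivalent in spirit to POST /reset: start a fresh call."""
        self._items.clear()
```

With `max_age_s=300` and `max_items=10` (the figures in step 3), twelve 4-second chunks leave only the last ten in the window, and advancing the fake clock by 300 seconds empties it.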


Testing

cd backend
source .venv/bin/activate
python -m pytest tests/unit/ -v --cov

Coverage summary (94 % overall):

Module                       Statements  Coverage
config.py                    15          100 %
models.py                    27          100 %
services/context_manager.py  44          100 %
services/rag_store.py        84          99 %
services/transcription.py    55          93 %
services/llm_analysis.py     62          87 %
main.py                      138         68 %
Total                        988         94 %

Simulate a scam call without speaking:

curl -X POST http://localhost:8000/analyze \
  -H "Content-Type: application/json" \
  -d '{"transcript": "Verify your credit card details now or your account will be suspended"}'
# Expected: high risk — urgency + financial info indicators
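The same smoke test can be run from Python with only the standard library. `build_analyze_request` and `post_analyze` are hypothetical helpers; `context` is optional, mirroring the documented request body, which the curl example omits.

```python
import json
import urllib.request
from typing import Optional


def build_analyze_request(transcript: str, context: Optional[str] = None,
                          base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build a POST /analyze request equivalent to the curl smoke test."""
    body = {"transcript": transcript}
    if context is not None:
        body["context"] = context  # optional field from the documented request
    return urllib.request.Request(
        f"{base_url}/analyze",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_analyze(transcript: str, context: Optional[str] = None,
                 base_url: str = "http://localhost:8000", timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON verdict (requires a running backend)."""
    req = build_analyze_request(transcript, context, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the backend running, `post_analyze(...)` returns the same JSON as the curl command; calling it twice with the same high-risk sentence should show `source` changing from `"llm"` to `"rag"` once the verdict has been cached.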

Project Structure

scam-detection-app/
├── Jenkinsfile                  # CI/CD pipeline (7 stages → Docker Hub)
├── ARCHITECTURE.md              # Detailed architecture notes
├── DOCKER.md                    # Docker usage guide
├── JENKINS.md                   # Jenkins setup guide
│
├── backend/
│   ├── main.py                  # FastAPI application
│   ├── config.py                # Environment variable management
│   ├── models.py                # Pydantic request/response models
│   ├── requirements.txt         # Python dependencies
│   ├── pytest.ini               # Test configuration
│   ├── Dockerfile               # Multi-stage production build
│   ├── .env.example             # Environment variable template
│   │
│   ├── services/
│   │   ├── context_manager.py   # Rolling conversation context
│   │   ├── llm_analysis.py      # Google Gemini integration
│   │   ├── rag_store.py         # ChromaDB vector cache (query_rag / store_attack)
│   │   └── transcription.py     # ElevenLabs integration
│   │
│   ├── data/rag_db/             # Persisted ChromaDB vector store
│   │
│   └── tests/unit/
│       ├── test_config.py
│       ├── test_context_manager.py
│       ├── test_llm_analysis.py
│       ├── test_main_flows.py
│       ├── test_rag_store.py
│       └── test_transcription.py
│
└── frontend/
    └── index.html               # Single-page application

Configuration

backend/.env

ELEVENLABS_API_KEY=your_elevenlabs_key
GEMINI_API_KEY=your_gemini_key
BACKEND_HOST=0.0.0.0
BACKEND_PORT=8000
FRONTEND_URL=http://localhost:3000
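A sketch of how these variables might be parsed and validated, including the "no extra spaces or quotes around keys" rule from Troubleshooting. `parse_env_lines`, `Settings` and `load_settings` are hypothetical; the defaults simply reuse the example values above and are not necessarily what config.py does.

```python
from dataclasses import dataclass
from typing import Dict, Mapping

REQUIRED = ("ELEVENLABS_API_KEY", "GEMINI_API_KEY")


def parse_env_lines(text: str) -> Dict[str, str]:
    """Minimal KEY=value parser: skips blank lines and # comments, keeps values verbatim."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        stripped = raw.strip()
        if not stripped or stripped.startswith("#") or "=" not in stripped:
            continue
        key, _, value = raw.partition("=")
        values[key.strip()] = value  # not stripped, so stray spaces/quotes stay visible
    return values


@dataclass
class Settings:
    elevenlabs_api_key: str
    gemini_api_key: str
    backend_host: str = "0.0.0.0"
    backend_port: int = 8000
    frontend_url: str = "http://localhost:3000"


def load_settings(env: Mapping[str, str]) -> Settings:
    """Validate the API keys, then fill the optional settings with defaults."""
    missing = [k for k in REQUIRED if not env.get(k)]
    if missing:
        raise ValueError("API Key not configured: " + ", ".join(missing))
    for key in REQUIRED:
        value = env[key]
        if value != value.strip() or value.startswith(("'", '"')):
            raise ValueError(f"{key} has surrounding spaces or quotes")
    return Settings(
        elevenlabs_api_key=env["ELEVENLABS_API_KEY"],
        gemini_api_key=env["GEMINI_API_KEY"],
        backend_host=env.get("BACKEND_HOST", "0.0.0.0"),
        backend_port=int(env.get("BACKEND_PORT", "8000")),
        frontend_url=env.get("FRONTEND_URL", "http://localhost:3000"),
    )
```

Since `os.environ` is a mapping, `load_settings(os.environ)` works the same way when the variables are injected by `docker run --env-file`.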

Frontend — edit these constants at the top of frontend/index.html:

const API_ENDPOINT      = 'http://localhost:8000';
const CHUNK_DURATION_MS = 4000;
const SAMPLE_RATE       = 16000;

Troubleshooting

Symptom                        Fix
Backend won't start            Check Python ≥ 3.10; run pip install -r requirements.txt; verify port 8000 is free (lsof -i :8000)
"Backend not available" in UI  Confirm python main.py is running; check CORS settings in config.py
Microphone not working         Allow microphone access in browser settings; use Chrome or Firefox; HTTPS is required in production
"API Key not configured"       Verify .env exists in backend/; no extra spaces or quotes around keys; restart the backend after editing
ElevenLabs errors              Check key validity at elevenlabs.io; verify quota; confirm the key starts with xi_
Gemini errors                  Regenerate the key at aistudio.google.com; check rate limits
Container already exists       Run docker rm -f scam-detection-api, then re-run
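A stdlib stand-in for the `lsof -i :8000` check, usable where lsof is unavailable (for example on Windows). It is a sketch: it reports whether something accepts TCP connections on the port, which is close to, but not exactly the same as, whether the port can be bound. `port_is_free` is a hypothetical helper.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing is accepting TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 when a listener accepted the connection,
        # and an error code (e.g. connection refused) otherwise.
        return sock.connect_ex((host, port)) != 0


# Example: print(port_is_free(8000))
```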

Security & Privacy

  • All secrets are loaded from environment variables — never hardcoded
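  • Audio is sent only to ElevenLabs (transcription) and transcripts only to Google Gemini (analysis); no other third-party storage is used
  • Conversation context is held in memory only and expires after 30 seconds; medium- and high-risk analysis results are, however, persisted to the local ChromaDB store (backend/data/rag_db/)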
  • Docker image runs as a non-root user
  • HTTPS is required for microphone access in production environments

Disclaimer

This tool is designed to assist in identifying potential scam patterns and should not be used as the sole indicator of fraudulent activity. Always verify caller identity through official channels and report suspected scams to the relevant authorities.
