Real-Time Scam Call Detection

A full-stack web application that analyses phone calls in real-time to detect scam indicators — using browser microphone capture, ElevenLabs speech-to-text, and Google Gemini for AI-powered analysis.

Overview

This application provides real-time detection of common scam patterns in phone conversations:

Audio Capture — Records microphone input in chunks using the Web Audio API
Speech-to-Text — Converts audio chunks to text via the ElevenLabs API
Context Management — Maintains a rolling 30-second window of conversation history
AI Analysis — Uses Google Gemini to detect scam indicators in the transcript
Real-Time Feedback — Displays risk levels (Low / Medium / High) with explanations

Architecture

┌──────────────────────────────────────────────────────────────┐
│                     Frontend (HTML/JS)                        │
│         Web Audio API · Transcript display · Risk UI         │
└────────────────────────┬─────────────────────────────────────┘
                         │ HTTP
┌────────────────────────▼─────────────────────────────────────┐
│                  Backend (FastAPI · Python)                    │
│  ┌─────────────────┬──────────────────┬────────────────────┐  │
│  │  Transcription  │ Context Manager  │   LLM Analysis     │  │
│  │  (ElevenLabs)   │  (30 s rolling)  │ (Google Gemini)    │  │
│  └─────────────────┴──────────────────┴────────────────────┘  │
│                          │    ▲                                │
│              RAG miss    │    │ store medium/high results      │
│                          ▼    │                                │
│  ┌──────────────────────────────────────────────────────────┐ │
│  │         RAG Cache (ChromaDB · all-MiniLM-L6-v2)          │ │
│  │   query_rag() → hit: skip LLM   store_attack() → persist │ │
│  └──────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────┘

Key endpoints:

| Method | Path | Description |
| --- | --- | --- |
| `POST` | `/process-audio` | Transcribe a base64-encoded audio chunk |
| `POST` | `/analyze` | Analyse a transcript for scam indicators |
| `POST` | `/reset` | Clear context and start a new call session |
| `GET` | `/health` | Health check |

Tech Stack

| Layer | Technology |
| --- | --- |
| Backend | Python 3.12, FastAPI, asyncio |
| Frontend | HTML5, vanilla JavaScript, Web Audio API |
| Speech-to-Text | ElevenLabs API |
| AI Analysis | Google Gemini API |
| Containerisation | Docker (python:3.12-slim, multi-stage build) |
| CI/CD | Jenkins, 7-stage pipeline → Docker Hub (`sdalal11/scam-detection-api`) |
| RAG Cache | ChromaDB · sentence-transformers (`all-MiniLM-L6-v2`) · cosine similarity |
| Testing | pytest, pytest-asyncio, pytest-cov (94 % coverage) |

Prerequisites

Python 3.10+ or Docker Desktop
A modern browser with microphone support (Chrome / Firefox recommended)
API keys:
- ElevenLabs
- Google Gemini

Quick Start

Option A: Docker (Recommended)

The fastest way to run the backend without setting up a Python environment.

1. Create your .env file

cp backend/.env.example .env
# Open .env and fill in your API keys:
#   ELEVENLABS_API_KEY=your_key_here
#   GEMINI_API_KEY=your_key_here

2. Pull and run the pre-built image

docker pull sdalal11/scam-detection-api:latest

docker run -d \
  --name scam-detection-api \
  --env-file .env \
  -p 8000:8000 \
  sdalal11/scam-detection-api:latest

3. Verify the backend is running

curl http://localhost:8000/health
# → {
#     "status": "ok",
#     "service": "scam-detection-api",
#     "context_items": 0,
#     "rag": { "total_stored": 2, "threshold": 0.4, "model": "all-MiniLM-L6-v2" }
#   }

4. Start the frontend

cd frontend
python3 -m http.server 3000
# Open http://localhost:3000 in your browser

Stop when done:

docker stop scam-detection-api && docker rm scam-detection-api

Why do I need to provide API keys?
The .env file is excluded from the Docker image (via .dockerignore) and from source control (via .gitignore). Each user must supply their own keys — never hardcode secrets in images or repositories.

Option B: Run from Source

1. Set up the backend

cd backend
python3 -m venv .venv
source .venv/bin/activate          # Windows: .venv\Scripts\activate
pip install -r requirements.txt
cp .env.example .env
# Edit .env and add your API keys

2. Start the backend

python main.py
# Uvicorn running on http://0.0.0.0:8000

3. Start the frontend

cd ../frontend
python3 -m http.server 3000
# Open http://localhost:3000

See DOCKER.md for full Docker usage and JENKINS.md for CI/CD setup.

API Reference

Full interactive docs: http://localhost:8000/docs (Swagger UI)

`POST /process-audio`

// Request
{ "audio_data": "<base64>", "chunk_id": 1 }

// Response 200
{ "transcript": "Hello, this is your bank calling...", "chunk_id": 1, "timestamp": 1234567890.1 }

// Response 503
{ "detail": "Transcription failed" }
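
As a rough illustration of the request shape above, the following sketch builds a `/process-audio` body from raw bytes. The helper name `build_process_audio_payload` is hypothetical and not part of the project; only the `audio_data` and `chunk_id` field names come from the example above.

```python
import base64
import json


def build_process_audio_payload(audio_bytes: bytes, chunk_id: int) -> str:
    """Return a JSON body for POST /process-audio.

    Hypothetical helper: only the field names are taken from the README.
    """
    # The endpoint expects the audio chunk as a base64 string.
    encoded = base64.b64encode(audio_bytes).decode("ascii")
    return json.dumps({"audio_data": encoded, "chunk_id": chunk_id})


# Stand-in bytes; a real client would pass one recorded audio chunk.
body = build_process_audio_payload(b"\x00\x01\x02\x03", chunk_id=1)
print(body)
```

The resulting string can be sent with any HTTP client using the `Content-Type: application/json` header.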

`POST /analyze`

// Request
{ "transcript": "Verify your account immediately", "context": "..." }

// Response 200 — LLM path (first time seen)
{
  "transcript": "Verify your account immediately",
  "risk_level": "high",
  "scam_type": "bank scam",
  "reasons": ["Urgency detected", "Account verification request"],
  "confidence": 0.95,
  "source": "llm",
  "similarity": null
}

// Response 200 — RAG cache hit (pattern seen before)
{
  "transcript": "Verify your account immediately",
  "risk_level": "high",
  "scam_type": "bank scam",
  "reasons": ["[Previously seen] Urgency detected", "[Previously seen] Account verification request"],
  "confidence": 0.95,
  "source": "rag",
  "similarity": 0.97
}
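
The two response shapes above differ only in `source` and `similarity`. A minimal sketch of how such a decision might be made is shown below, assuming the cache compares embeddings by cosine distance against the 0.4 threshold reported by `/health`, and assuming `similarity` is simply `1 - distance`. The function names and toy vectors are illustrative and are not taken from `rag_store.py`.

```python
import math
from typing import Optional


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 - cosine similarity for two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def choose_source(
    query: list[float],
    cached: list[list[float]],
    threshold: float = 0.40,  # assumption: same value /health reports
) -> tuple[str, Optional[float]]:
    """Return ("rag", similarity) on a cache hit, else ("llm", None)."""
    # Closest cached vector, or None when the cache is empty.
    best = min((cosine_distance(query, v) for v in cached), default=None)
    if best is not None and best <= threshold:
        return "rag", round(1.0 - best, 2)
    return "llm", None


# Toy 2-D vectors stand in for real sentence embeddings.
print(choose_source([1.0, 0.0], [[0.9, 0.1]]))  # near-identical direction: hit
print(choose_source([1.0, 0.0], [[0.0, 1.0]]))  # orthogonal: miss
```

With real embeddings the vectors would have several hundred dimensions, but the threshold comparison stays the same.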

`POST /reset`

// Response 200
{ "message": "Context reset for new call" }

`GET /health`

// Response 200
{ "status": "ok", "service": "scam-detection-api", "context_items": 5, "rag": { "total_stored": 12, "threshold": 0.4, "model": "all-MiniLM-L6-v2" } }

How It Works

Audio Capture — The browser records audio in 4-second chunks using the Web Audio API with noise suppression and echo cancellation, encodes each chunk as base64, and POSTs it to /process-audio.
Transcription — The backend decodes the base64 audio and forwards it to ElevenLabs (scribe_v2 model). The resulting transcript is returned and added to the in-memory context window.
Context Management — A rolling 30-second window of transcript history is maintained in memory. Entries older than 30 seconds are automatically removed, and the window is capped at 10 items.
Scam Detection — Before calling the LLM, the transcript is encoded with all-MiniLM-L6-v2 and queried against a local ChromaDB vector store. If a similar pattern has been seen before (cosine distance ≤ 0.40), the cached result is returned immediately with source: "rag" — no API call needed. On a cache miss, the transcript and context are sent to Google Gemini. Medium and high-risk results are automatically persisted to the vector store for future hits.
UI Updates — The frontend displays the live transcript, updates the colour-coded risk indicator (green / amber / red), and lists detected indicators with a confidence score.
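
The context-management step above can be pictured as a timestamped queue with an age limit and a size cap. The sketch below is one possible shape for it, assuming a 30-second window and a 10-item cap; the class name `RollingContext` and its methods are hypothetical and do not mirror `services/context_manager.py`.

```python
import time
from collections import deque
from typing import Optional


class RollingContext:
    """Hypothetical rolling transcript window (age limit plus size cap)."""

    def __init__(self, max_age_s: float = 30.0, max_items: int = 10) -> None:
        self.max_age_s = max_age_s
        # deque(maxlen=...) silently drops the oldest entry once full.
        self._items = deque(maxlen=max_items)

    def add(self, text: str, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        self._items.append((now, text))
        self._prune(now)

    def _prune(self, now: float) -> None:
        # Entries are in arrival order, so expired ones sit at the left.
        while self._items and now - self._items[0][0] > self.max_age_s:
            self._items.popleft()

    def text(self, now: Optional[float] = None) -> str:
        self._prune(time.time() if now is None else now)
        return " ".join(t for _, t in self._items)

    def __len__(self) -> int:
        return len(self._items)


ctx = RollingContext()
ctx.add("Hello", now=0.0)
ctx.add("verify your account", now=20.0)
print(ctx.text(now=40.0))  # only the entry from t=20 is still in the window
```

Passing `now` explicitly keeps the example deterministic; in a running service the default `time.time()` would be used.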

Testing

cd backend
source .venv/bin/activate
python -m pytest tests/unit/ -v --cov

Coverage summary (94 % overall):

| Module | Statements | Coverage |
| --- | --- | --- |
| `config.py` | 15 | 100 % |
| `models.py` | 27 | 100 % |
| `services/context_manager.py` | 44 | 100 % |
| `services/rag_store.py` | 84 | 99 % |
| `services/transcription.py` | 55 | 93 % |
| `services/llm_analysis.py` | 62 | 87 % |
| `main.py` | 138 | 68 % |
| Total | 988 | 94 % |

Simulate a scam call without speaking:

curl -X POST http://localhost:8000/analyze \
  -H "Content-Type: application/json" \
  -d '{"transcript": "Verify your credit card details now or your account will be suspended"}'
# Expected: high risk — urgency + financial info indicators

Project Structure

scam-detection-app/
├── Jenkinsfile                  # CI/CD pipeline (7 stages → Docker Hub)
├── ARCHITECTURE.md              # Detailed architecture notes
├── DOCKER.md                    # Docker usage guide
├── JENKINS.md                   # Jenkins setup guide
│
├── backend/
│   ├── main.py                  # FastAPI application
│   ├── config.py                # Environment variable management
│   ├── models.py                # Pydantic request/response models
│   ├── requirements.txt         # Python dependencies
│   ├── pytest.ini               # Test configuration
│   ├── Dockerfile               # Multi-stage production build
│   ├── .env.example             # Environment variable template
│   │
│   ├── services/
│   │   ├── context_manager.py   # Rolling conversation context
│   │   ├── llm_analysis.py      # Google Gemini integration
│   │   ├── rag_store.py         # ChromaDB vector cache (query_rag / store_attack)
│   │   └── transcription.py     # ElevenLabs integration
│   │
│   ├── data/rag_db/             # Persisted ChromaDB vector store
│   │
│   └── tests/unit/
│       ├── test_config.py
│       ├── test_context_manager.py
│       ├── test_llm_analysis.py
│       ├── test_main_flows.py
│       ├── test_rag_store.py
│       └── test_transcription.py
│
└── frontend/
    └── index.html               # Single-page application

Configuration

backend/.env

ELEVENLABS_API_KEY=your_elevenlabs_key
GEMINI_API_KEY=your_gemini_key
BACKEND_HOST=0.0.0.0
BACKEND_PORT=8000
FRONTEND_URL=http://localhost:3000
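
One way these variables could be read at startup is sketched below. It is not the project's `config.py`: the `Settings` class, the `load_settings` function, and the fallback defaults (copied from the example values above) are assumptions for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    elevenlabs_api_key: str
    gemini_api_key: str
    backend_host: str = "0.0.0.0"
    backend_port: int = 8000
    frontend_url: str = "http://localhost:3000"


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from a mapping, failing fast if an API key is missing."""
    required = ("ELEVENLABS_API_KEY", "GEMINI_API_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing required settings: {', '.join(missing)}")
    return Settings(
        elevenlabs_api_key=env["ELEVENLABS_API_KEY"],
        gemini_api_key=env["GEMINI_API_KEY"],
        backend_host=env.get("BACKEND_HOST", "0.0.0.0"),
        backend_port=int(env.get("BACKEND_PORT", "8000")),
        frontend_url=env.get("FRONTEND_URL", "http://localhost:3000"),
    )


# A plain dict stands in for the process environment here.
demo = load_settings({"ELEVENLABS_API_KEY": "a", "GEMINI_API_KEY": "b"})
print(demo.backend_host, demo.backend_port)
```

Failing at startup when a key is absent gives a clearer error than a rejected API call later in the request path.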

Frontend — edit these constants at the top of frontend/index.html:

const API_ENDPOINT      = 'http://localhost:8000';
const CHUNK_DURATION_MS = 4000;
const SAMPLE_RATE       = 16000;
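
To get a feel for what these constants imply, the arithmetic below estimates the size of one chunk. It assumes uncompressed 16-bit mono PCM, which the README does not state; if the browser sends a compressed format the real payload would be much smaller, so treat the result as an upper-bound estimate only.

```python
import math

CHUNK_DURATION_MS = 4000
SAMPLE_RATE = 16000
BYTES_PER_SAMPLE = 2  # assumption: 16-bit mono PCM, not confirmed by the source

# Samples captured in one chunk at the configured rate.
samples_per_chunk = SAMPLE_RATE * CHUNK_DURATION_MS // 1000
# Raw size of those samples in bytes.
raw_bytes = samples_per_chunk * BYTES_PER_SAMPLE
# Base64 emits 4 characters for every 3 input bytes, padded up.
base64_chars = 4 * math.ceil(raw_bytes / 3)

print(samples_per_chunk)  # 64000
print(raw_bytes)          # 128000
print(base64_chars)       # 170668
```

Under that assumption, each 4-second chunk would add roughly 170 KB to the request body sent to `/process-audio`.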

Troubleshooting

| Symptom | Fix |
| --- | --- |
| Backend won't start | Check Python ≥ 3.10; run `pip install -r requirements.txt`; verify port 8000 is free (`lsof -i :8000`) |
| "Backend not available" in UI | Confirm `python main.py` is running; check CORS settings in `config.py` |
| Microphone not working | Allow microphone in browser settings; use Chrome or Firefox; HTTPS required in production |
| "API Key not configured" | Verify `.env` exists in `backend/`; no extra spaces or quotes around keys; restart backend after editing |
| ElevenLabs errors | Check key validity at elevenlabs.io; verify quota; confirm key starts with `xi_` |
| Gemini errors | Regenerate key at aistudio.google.com; check rate limits |
| Container already exists | `docker rm -f scam-detection-api` then re-run |

Security & Privacy

All secrets are loaded from environment variables — never hardcoded
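Audio is sent only to ElevenLabs for transcription, and transcripts only to Google Gemini for analysis; no other external service receives call data
Conversation context is held in memory only and expires after 30 seconds; results rated medium or high risk are additionally persisted to the local ChromaDB vector store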
Docker image runs as a non-root user
HTTPS is required for microphone access in production environments

Disclaimer

This tool is designed to assist in identifying potential scam patterns and should not be used as the sole indicator of fraudulent activity. Always verify caller identity through official channels and report suspected scams to the relevant authorities:

FTC (US): https://reportfraud.ftc.gov/
IC3: https://www.ic3.gov/
