A full-stack web application that analyses phone calls in real time to detect scam indicators, using browser microphone capture, ElevenLabs speech-to-text, and Google Gemini for AI-powered analysis.
- Overview
- Architecture
- Tech Stack
- Prerequisites
- Quick Start
- API Reference
- How It Works
- Testing
- Project Structure
- Configuration
- Troubleshooting
- Security & Privacy
- Disclaimer
This application provides real-time detection of common scam patterns in phone conversations:
- Audio Capture — Records microphone input in chunks using the Web Audio API
- Speech-to-Text — Converts audio chunks to text via the ElevenLabs API
- Context Management — Maintains a rolling 30-second window of conversation history
- AI Analysis — Uses Google Gemini to detect scam indicators in the transcript
- Real-Time Feedback — Displays risk levels (Low / Medium / High) with explanations
┌──────────────────────────────────────────────────────────────┐
│ Frontend (HTML/JS) │
│ Web Audio API · Transcript display · Risk UI │
└────────────────────────┬─────────────────────────────────────┘
│ HTTP
┌────────────────────────▼─────────────────────────────────────┐
│ Backend (FastAPI · Python) │
│ ┌─────────────────┬──────────────────┬────────────────────┐ │
│ │ Transcription │ Context Manager │ LLM Analysis │ │
│ │ (ElevenLabs) │ (30 s rolling) │ (Google Gemini) │ │
│ └─────────────────┴──────────────────┴────────────────────┘ │
│ │ ▲ │
│ RAG miss │ │ store medium/high results │
│ ▼ │ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ RAG Cache (ChromaDB · all-MiniLM-L6-v2) │ │
│ │ query_rag() → hit: skip LLM store_attack() → persist │ │
│ └──────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────┘
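The cache-first flow in the diagram can be sketched in a few lines. This is a minimal illustration, not the project's `rag_store.py`: `ToyVectorCache` and `analyse` are hypothetical names, the embeddings are plain lists rather than all-MiniLM-L6-v2 vectors, and reporting similarity as `1 - distance` is an assumption.

```python
import math
from typing import Callable, Optional

DISTANCE_THRESHOLD = 0.40  # cosine-distance threshold quoted in this README


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 - cosine similarity for two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


class ToyVectorCache:
    """Hypothetical in-memory stand-in for the ChromaDB-backed cache."""

    def __init__(self) -> None:
        self._entries: list[tuple[list[float], dict]] = []

    def query(self, embedding: list[float]) -> Optional[dict]:
        """Return the closest stored result if it lies within the threshold."""
        best: Optional[tuple[float, dict]] = None
        for stored_embedding, result in self._entries:
            distance = cosine_distance(embedding, stored_embedding)
            if best is None or distance < best[0]:
                best = (distance, result)
        if best is not None and best[0] <= DISTANCE_THRESHOLD:
            distance, result = best
            # Assumption: similarity is reported as 1 - cosine distance.
            return {**result, "source": "rag", "similarity": round(1.0 - distance, 4)}
        return None

    def store(self, embedding: list[float], result: dict) -> None:
        self._entries.append((embedding, result))


def analyse(embedding: list[float], cache: ToyVectorCache,
            call_llm: Callable[[], dict]) -> dict:
    """Try the cache first; fall back to the LLM and persist risky results."""
    cached = cache.query(embedding)
    if cached is not None:
        return cached  # cache hit: the LLM is never called
    result = {**call_llm(), "source": "llm", "similarity": None}
    if result["risk_level"] in ("medium", "high"):
        cache.store(embedding, result)  # only medium/high results are kept
    return result
```

Passing the LLM call in as a callable keeps the sketch testable without network access; the real service would call Gemini at that point.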
Key endpoints:
| Method | Path | Description |
|---|---|---|
| POST | /process-audio | Transcribe a base64-encoded audio chunk |
| POST | /analyze | Analyse a transcript for scam indicators |
| POST | /reset | Clear context and start a new call session |
| GET | /health | Health check |
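As a rough illustration of how a client might call the transcription endpoint from Python, the sketch below builds a `POST /process-audio` request with the standard library. The helper names (`build_process_audio_request`, `send`) are hypothetical, and the base URL assumes the default local backend address used elsewhere in this README.

```python
import base64
import json
import urllib.request

API_BASE = "http://localhost:8000"  # assumed default backend address


def build_process_audio_request(audio_bytes: bytes, chunk_id: int) -> urllib.request.Request:
    """Wrap raw audio bytes in the JSON body expected by /process-audio."""
    payload = {
        "audio_data": base64.b64encode(audio_bytes).decode("ascii"),
        "chunk_id": chunk_id,
    }
    return urllib.request.Request(
        f"{API_BASE}/process-audio",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response (needs a running backend)."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

Building the request separately from sending it makes the payload easy to inspect before any audio leaves the machine.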
| Layer | Technology |
|---|---|
| Backend | Python 3.12, FastAPI, asyncio |
| Frontend | HTML5, vanilla JavaScript, Web Audio API |
| Speech-to-Text | ElevenLabs API |
| AI Analysis | Google Gemini API |
| Containerisation | Docker (python:3.12-slim, multi-stage build) |
| CI/CD | Jenkins — 7-stage pipeline → Docker Hub (sdalal11/scam-detection-api) |
| RAG Cache | ChromaDB · sentence-transformers (all-MiniLM-L6-v2) · cosine similarity |
| Testing | pytest, pytest-asyncio, pytest-cov — 94 % coverage |
- Python 3.10+ or Docker Desktop
- A modern browser with microphone support (Chrome / Firefox recommended)
- API keys for ElevenLabs (speech-to-text) and Google Gemini (analysis)
The fastest way to run the backend without setting up a Python environment.
1. Create your .env file
cp backend/.env.example .env
# Open .env and fill in your API keys:
# ELEVENLABS_API_KEY=your_key_here
# GEMINI_API_KEY=your_key_here
2. Pull and run the pre-built image
docker pull sdalal11/scam-detection-api:latest
docker run -d \
--name scam-detection-api \
--env-file .env \
-p 8000:8000 \
sdalal11/scam-detection-api:latest
3. Verify the backend is running
curl http://localhost:8000/health
# → {
# "status": "ok",
# "service": "scam-detection-api",
# "context_items": 0,
# "rag": { "total_stored": 2, "threshold": 0.4, "model": "all-MiniLM-L6-v2" }
# }
4. Start the frontend
cd frontend
python3 -m http.server 3000
# Open http://localhost:3000 in your browser
Stop when done:
docker stop scam-detection-api && docker rm scam-detection-api
Why do I need to provide API keys?
The .env file is excluded from the Docker image (via .dockerignore) and from source control (via .gitignore). Each user must supply their own keys; never hardcode secrets in images or repositories.
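One way to enforce this at startup is to read the keys from the environment and fail fast when one is missing. The sketch below is an assumption about how such a check could look, not the contents of the project's `config.py`; `load_api_keys` is a hypothetical helper.

```python
import os
from typing import Mapping, Optional

REQUIRED_KEYS = ("ELEVENLABS_API_KEY", "GEMINI_API_KEY")


def load_api_keys(environ: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Return the required API keys, raising if any is missing or blank."""
    env = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_KEYS if not env.get(name, "").strip()]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    # Strip stray whitespace, a common cause of "key not configured" errors.
    return {name: env[name].strip() for name in REQUIRED_KEYS}
```

Accepting an optional mapping lets tests pass a plain dictionary instead of touching the real process environment.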
1. Set up the backend
cd backend
python3 -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -r requirements.txt
cp .env.example .env
# Edit .env and add your API keys
2. Start the backend
python main.py
# Uvicorn running on http://0.0.0.0:8000
3. Start the frontend
cd ../frontend
python3 -m http.server 3000
# Open http://localhost:3000
See DOCKER.md for full Docker usage and JENKINS.md for CI/CD setup.
Full interactive docs: http://localhost:8000/docs (Swagger UI)
POST /process-audio
// Request
{ "audio_data": "<base64>", "chunk_id": 1 }
// Response 200
{ "transcript": "Hello, this is your bank calling...", "chunk_id": 1, "timestamp": 1234567890.1 }
// Response 503
{ "detail": "Transcription failed" }
POST /analyze
// Request
{ "transcript": "Verify your account immediately", "context": "..." }
// Response 200 — LLM path (first time seen)
{
"transcript": "Verify your account immediately",
"risk_level": "high",
"scam_type": "bank scam",
"reasons": ["Urgency detected", "Account verification request"],
"confidence": 0.95,
"source": "llm",
"similarity": null
}
// Response 200 — RAG cache hit (pattern seen before)
{
"transcript": "Verify your account immediately",
"risk_level": "high",
"scam_type": "bank scam",
"reasons": ["[Previously seen] Urgency detected", "[Previously seen] Account verification request"],
"confidence": 0.95,
"source": "rag",
"similarity": 0.97
}
POST /reset
// Response 200
{ "message": "Context reset for new call" }
GET /health
// Response 200
{ "status": "ok", "service": "scam-detection-api", "context_items": 5, "rag": { "total_stored": 12, "threshold": 0.4, "model": "all-MiniLM-L6-v2" } }
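For clients written in Python, the `/analyze` response shown above can be parsed into a small typed object. This is a sketch only: `AnalysisResult` and `parse_analysis` are hypothetical names and are not the Pydantic models in the project's `models.py`.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AnalysisResult:
    """Typed view of the /analyze response fields documented above."""
    transcript: str
    risk_level: str
    scam_type: Optional[str]
    reasons: list[str]
    confidence: float
    source: str
    similarity: Optional[float]

    @property
    def from_cache(self) -> bool:
        """True when the result came from the RAG cache rather than the LLM."""
        return self.source == "rag"


def parse_analysis(body: str) -> AnalysisResult:
    """Parse a JSON response body into an AnalysisResult."""
    data = json.loads(body)
    return AnalysisResult(
        transcript=data["transcript"],
        risk_level=data["risk_level"],
        scam_type=data.get("scam_type"),
        reasons=list(data.get("reasons", [])),
        confidence=float(data["confidence"]),
        source=data["source"],
        similarity=data.get("similarity"),
    )
```

The `from_cache` property gives callers a simple way to tell the two response paths apart without comparing strings themselves.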
- Audio Capture: The browser records audio in 4-second chunks using the Web Audio API with noise suppression and echo cancellation, encodes each chunk as base64, and POSTs it to /process-audio.
- Transcription: The backend decodes the base64 audio and forwards it to ElevenLabs (scribe_v2 model). The resulting transcript is returned and added to the in-memory context window.
- Context Management: A rolling 30-second window of transcript history is maintained in memory. Entries older than 30 seconds are automatically removed, and the window is capped at 10 items.
- Scam Detection: Before calling the LLM, the transcript is encoded with all-MiniLM-L6-v2 and queried against a local ChromaDB vector store. If a similar pattern has been seen before (cosine distance ≤ 0.40), the cached result is returned immediately with source: "rag", so no Gemini API call is needed. On a cache miss, the transcript and context are sent to Google Gemini. Medium- and high-risk results are automatically persisted to the vector store for future hits.
- UI Updates: The frontend displays the live transcript, updates the colour-coded risk indicator (green / amber / red), and lists detected indicators with a confidence score.
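The context-management step can be illustrated with a small rolling window built on `collections.deque`. This is a sketch under stated assumptions (a 30-second window and a 10-item cap, as described in this README); `RollingContext` is a hypothetical class, not the project's `context_manager.py`.

```python
import time
from collections import deque
from typing import Callable


class RollingContext:
    """Keep recent transcript chunks, bounded by age and by item count."""

    def __init__(self, window_seconds: float = 30.0, max_items: int = 10,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._window = window_seconds
        # deque(maxlen=...) silently drops the oldest item once the cap is hit.
        self._items: deque[tuple[float, str]] = deque(maxlen=max_items)
        self._clock = clock

    def _prune(self) -> None:
        """Drop entries older than the window."""
        cutoff = self._clock() - self._window
        while self._items and self._items[0][0] < cutoff:
            self._items.popleft()

    def add(self, transcript: str) -> None:
        self._prune()
        self._items.append((self._clock(), transcript))

    def text(self) -> str:
        """Return the surviving transcripts joined in chronological order."""
        self._prune()
        return " ".join(chunk for _, chunk in self._items)

    def reset(self) -> None:
        """Clear everything, as a new call session would."""
        self._items.clear()

    def __len__(self) -> int:
        self._prune()
        return len(self._items)
```

The clock is injectable so that expiry can be exercised in tests without sleeping.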
cd backend
source .venv/bin/activate
python -m pytest tests/unit/ -v --cov
Coverage summary (94 % overall):
| Module | Statements | Coverage |
|---|---|---|
| config.py | 15 | 100 % |
| models.py | 27 | 100 % |
| services/context_manager.py | 44 | 100 % |
| services/rag_store.py | 84 | 99 % |
| services/transcription.py | 55 | 93 % |
| services/llm_analysis.py | 62 | 87 % |
| main.py | 138 | 68 % |
| Total | 988 | 94 % |
Simulate a scam call without speaking:
curl -X POST http://localhost:8000/analyze \
-H "Content-Type: application/json" \
-d '{"transcript": "Verify your credit card details now or your account will be suspended"}'
# Expected: high risk (urgency + financial info indicators)
scam-detection-app/
├── Jenkinsfile # CI/CD pipeline (7 stages → Docker Hub)
├── ARCHITECTURE.md # Detailed architecture notes
├── DOCKER.md # Docker usage guide
├── JENKINS.md # Jenkins setup guide
│
├── backend/
│ ├── main.py # FastAPI application
│ ├── config.py # Environment variable management
│ ├── models.py # Pydantic request/response models
│ ├── requirements.txt # Python dependencies
│ ├── pytest.ini # Test configuration
│ ├── Dockerfile # Multi-stage production build
│ ├── .env.example # Environment variable template
│ │
│ ├── services/
│ │ ├── context_manager.py # Rolling conversation context
│ │ ├── llm_analysis.py # Google Gemini integration
│ │ ├── rag_store.py # ChromaDB vector cache (query_rag / store_attack)
│ │ └── transcription.py # ElevenLabs integration
│ │
│ ├── data/rag_db/ # Persisted ChromaDB vector store
│ │
│ └── tests/unit/
│ ├── test_config.py
│ ├── test_context_manager.py
│ ├── test_llm_analysis.py
│ ├── test_main_flows.py
│ ├── test_rag_store.py
│ └── test_transcription.py
│
└── frontend/
└── index.html # Single-page application
backend/.env
ELEVENLABS_API_KEY=your_elevenlabs_key
GEMINI_API_KEY=your_gemini_key
BACKEND_HOST=0.0.0.0
BACKEND_PORT=8000
FRONTEND_URL=http://localhost:3000
Frontend: edit these constants at the top of frontend/index.html:
const API_ENDPOINT = 'http://localhost:8000';
const CHUNK_DURATION_MS = 4000;
const SAMPLE_RATE = 16000;
| Symptom | Fix |
|---|---|
| Backend won't start | Check Python ≥ 3.10; run pip install -r requirements.txt; verify port 8000 is free (lsof -i :8000) |
| "Backend not available" in UI | Confirm python main.py is running; check CORS settings in config.py |
| Microphone not working | Allow microphone in browser settings; use Chrome or Firefox; HTTPS required in production |
| "API Key not configured" | Verify .env exists in backend/; no extra spaces or quotes around keys; restart backend after editing |
| ElevenLabs errors | Check key validity at elevenlabs.io; verify quota; confirm key starts with sk_ |
| Gemini errors | Regenerate key at aistudio.google.com; check rate limits |
| Container already exists | docker rm -f scam-detection-api then re-run |
- All secrets are loaded from environment variables — never hardcoded
- Audio is sent only to ElevenLabs (transcription) and transcript text only to Google Gemini (analysis); no other third-party service is involved
- Conversation context is held in memory only and expires after 30 seconds
- Docker image runs as a non-root user
- HTTPS is required for microphone access in production environments
This tool is designed to assist in identifying potential scam patterns and should not be used as the sole indicator of fraudulent activity. Always verify caller identity through official channels and report suspected scams to the relevant authorities:
- FTC (US): https://reportfraud.ftc.gov/
- IC3: https://www.ic3.gov/