SOLV.ai — AI-Powered Voice Complaint Management System

Intelligent Complaint Classification, Resolution Recommendation & Voice Agent for the Wellness Industry Built for Tark Shaastra | LDCE Hackathon

Why This Exists — The Necessity
The Business Case
What SOLV.ai Does
System Architecture
Technology Stack
Module Architecture & Deep Dive
Why This Architecture Was the Best Choice
LLM Ablation Study — Model Selection
End-to-End Data Flow
Performance Benchmarks
Cost of Scaling & Real-World Implementation
Scope of Improvement
Quick Start Guide
Deployment Modes
Repository Structure
Team Members

Why This Exists — The Necessity

The Problem in the Indian FMCG & Wellness Industry

India's wellness and FMCG sector serves over 1.4 billion consumers across thousands of product lines — from Ayurvedic health supplements to packaged food to personal care. Every day, these companies receive thousands of customer complaints through phone calls, emails, web forms, and walk-ins. The current state of complaint handling is broken:

What happens today:

A customer calls to report a damaged product. They wait on hold for 8-12 minutes. A call center agent manually types the complaint into a spreadsheet. The agent guesses the priority. The complaint sits in a queue for hours or days before anyone reviews it.
An email complaint about an allergic reaction to a product — something that needs immediate attention — gets the same priority as a bulk pricing inquiry because no one reads every email in real time.
The same complaint patterns repeat across weeks and months, but no one connects the dots because there is no systematic classification or trend analysis.

The real cost of this broken process:

Problem	Impact
Manual classification	Agents misfile 15-25% of complaints — wrong category, wrong priority, wrong team
No priority triage	Critical complaints (allergic reactions, contamination) treated the same as low-priority inquiries
Phone call overhead	60-70% of call center time is spent on data entry, not resolution
No resolution guidance	Junior agents don't know what resolution steps to take — they escalate everything
Zero real-time visibility	Managers learn about complaint spikes days later, after SLA deadlines have passed
Language barriers	Indian consumers speak in mixed Hindi-English; existing systems can't handle code-switched speech
No audit trail	When a complaint is disputed, there's no timeline of who did what and when

What We Set Out to Solve

We asked a simple question: What if every customer complaint — whether spoken on a phone call or typed in a form — could be classified, prioritized, and given a specific resolution plan in under 5 seconds, with zero human intervention?

That question drove every architectural decision in SOLV.ai.

What Our Team Contributed

This project was built from scratch in a hackathon timeframe by a team of four second-year engineering students. Every line of code, every model decision, every system integration was designed, implemented, and tested by us:

Contribution	What Was Done
5 independent microservices	Each with its own API, tests, Docker config, and documentation
ONNX-accelerated NLP pipeline	Dual-model ensemble (DistilBERT + MiniLM) running at ~12ms per prediction
10-model LLM ablation study	120 API calls across 4 scenarios and 3 tasks, with auto-generated comparative graphs
Real-time voice agent	End-to-end phone call handling with STT, dialog extraction, classification, resolution, and ticket creation
Edge-deployable system	Entire stack runs offline on 4GB RAM — no cloud dependency required
Full-stack web application	Next.js 16 with role-based dashboards, SSE real-time updates, Prisma ORM, and Supabase
Production-grade security	4-layer guardrails, prompt injection detection, JWT auth, rate limiting, input validation

The Business Case

Why SOLV.ai Generates Value — Beyond Cost Savings

When a company implements SOLV.ai, the ROI is not just financial. It spans time, quality, compliance, and customer retention.

Financial Impact

Current Cost	With SOLV.ai	Savings
Call center agent: ~INR 18,000/month per agent	AI handles classification + resolution in 5s	60-70% reduction in agent workload — fewer agents needed for same volume
Misclassification rework: ~15-25% complaints need re-routing	NLP ensemble achieves 100% category accuracy (ablation-tested)	Eliminates rework — complaints reach the right team on first pass
SLA breach penalties: Companies pay contractual penalties for missed deadlines	Auto-priority with SLA tracking + real-time SSE alerts	Near-zero SLA breaches — high-priority complaints are flagged instantly
Training new agents: 2-4 weeks onboarding per agent	AI generates specific resolution steps for every complaint	Day-1 productivity — new agents follow AI-recommended actions

Time Impact

Process	Before SOLV.ai	After SOLV.ai	Time Saved
Complaint classification	2-5 minutes (manual reading + categorization)	12 milliseconds (ONNX inference)	99.9%
Resolution planning	10-30 minutes (lookup manuals, ask seniors)	1.4 seconds (LLM with context)	99%
Phone complaint intake	8-15 minutes (manual transcription + data entry)	2-4 seconds per turn (STT + dialog extraction)	90%
Manager dashboard review	End-of-day reports, often next-day	Real-time (SSE push to dashboards)	Instant
Trend detection	Weekly/monthly manual reviews	Continuous (DailyMetric aggregation)	Always current

Operational Impact

Factor	Benefit
Consistency	Every complaint gets the same rigorous classification — no human bias, no fatigue-driven mistakes
Scalability	Handle 10x complaint volume without hiring 10x agents
Audit compliance	Every complaint has a timestamped timeline (ComplaintTimeline model) — who did what, when
Multi-channel unification	Phone, email, web, walk-in — all feed into the same pipeline, same dashboard
Customer satisfaction	Faster response + correct first-time resolution = higher NPS scores
Data-driven decisions	Trend analytics reveal systemic issues (e.g., "Packaging complaints spiked 300% this month for Product X")

The Bottom Line

A mid-size FMCG company handling 500 complaints/day could expect:

INR 15-25 lakh/year in direct labor cost savings
70% reduction in average complaint resolution time
Near-zero misclassification rate (currently 15-25%)
Real-time operational visibility instead of next-day reports

What SOLV.ai Does

SOLV.ai is a production-grade, multi-layer AI system that ingests customer complaints via voice, text, or web — classifies them with ONNX-accelerated NLP, generates empathetic resolution plans with LLMs, and persists everything to a role-based dashboard with SLA tracking.

Key Metrics

Metric	Value
NLP Classification Accuracy	100% category accuracy (ablation-tested)
NLP Inference Latency	~12ms per prediction (ONNX + CUDA)
LLM Resolution Latency	~1.4s average (Groq Llama 3.3 70B)
Ablation Study Scale	10 models x 4 scenarios x 3 tasks = 120 API calls
Ablation Winner Score	96.9% (Llama 3.3 70B on Groq)
Voice Agent Latency	2-4s per turn (end-to-end, CPU)
Edge RAM Footprint	~2.8GB (runs on 4GB hardware)
Cost per NLP Prediction	~$0.00000175 on AWS T4 ($1.83 per million)
Dashboard Roles	4 (Admin, Operational, Call Center, QA)
SLA Tracking	Automated (High: 4h, Medium: 8h, Low: 24h)

System Architecture

                              USER
                               |
              +----------------+----------------+
              |                |                |
         PHONE CALL        WEB FORM         WALK-IN
              |                |                |
              v                v                v
  +-------------------+  +-----------------------------+
  |   TWILIO          |  |   WEBSITE (Next.js :3000)   |
  |   Media Stream    |  |   - Landing page            |
  |   WebSocket PCM16 |  |   - Role-based dashboards   |
  +--------+----------+  |   - Complaint intake API    |
           |             |   - SSE real-time updates   |
           v             |   - Supabase/PostgreSQL     |
  +-------------------+  +--------------+--------------+
  |  ORCHESTRATOR     |                 |
  |  (:8003)          |                 |
  |                   |                 |
  |  - FSM State Mgmt |                 |
  |  - Agent Router   |                 |
  |  - LLM: Ollama/Groq|                |
  |  - TTS: Piper/Edge|                 |
  +--------+----------+                 |
           |                            |
    +------+----------+                 |
    |      |          |                 |
    v      v          v                 v
+------+ +------+ +--------+    +--------------+
| STT  | | NLP  | | GenAI  |    |  SUPABASE /  |
|:8001 | |:8002 | |:8004   |    |  PostgreSQL  |
|Whisper| |ONNX  | |Groq    |    |  + Prisma    |
|+VAD  | |+VADER| |Llama70B|    |              |
+------+ +------+ +--------+    +--------------+

The system is composed of five independent microservices, each self-contained with its own API, deployment configuration, and documentation.

Technology Stack

Core Technologies

Layer	Technology	Purpose
NLP Classifier	Python, FastAPI, ONNX Runtime, CUDA, DistilBERT-MNLI, MiniLM-L6, VADER, scikit-learn	GPU-accelerated complaint classification (category + sentiment + priority)
GenAI Engine	Python, FastAPI, Groq API, Llama 3.3 70B, LangSmith	LLM-powered resolution recommendations, email generation, guardrails
Speech-to-Text	Python, FastAPI, Faster-Whisper (CTranslate2), Silero VAD (ONNX)	Audio transcription with voice activity detection
Voice Agent	Python, FastAPI, Ollama, Piper TTS, Edge TTS, Twilio	Multi-agent orchestrator for phone call handling
Website	Next.js 16, React 19, TypeScript, Tailwind CSS v4, Prisma ORM, Supabase	Role-based dashboards, complaint intake, analytics
Database	PostgreSQL (Supabase)	Cloud-hosted relational database with row-level security
Observability	LangSmith (LangChain), Prometheus	LLM tracing, latency monitoring, request metrics

AI & ML Models

Model	Type	Size	Purpose
DistilBERT-MNLI	Zero-Shot NLI	~260MB	Category classification via entailment
MiniLM-L6	Sentence Embeddings	~80MB	Semantic similarity classification
VADER	Lexicon-based	~500KB	Sentiment scoring
DecisionTree	sklearn	~10KB	Priority prediction (5 features, depth=6)
Llama 3.3 70B	LLM (Groq)	70B params	Resolution generation, dialog, tickets
phi3.5	LLM (Ollama, local)	1.5B params	Offline fallback for voice agent
Faster-Whisper Tiny	ASR	~75MB	Speech-to-text transcription
Silero VAD	RNN	~400KB	Voice activity detection
Piper TTS	Neural TTS	~30MB	Offline text-to-speech

Module Architecture & Deep Dive

1. GenAI Resolution Engine (`genai/`)

Port: 8004 | Stack: FastAPI + Groq Llama 3.3 70B + LangSmith

The GenAI layer takes the NLP classifier's output (category, sentiment, priority) and uses the ablation-study-selected LLM to generate empathetic, actionable resolution recommendations.

Architecture

genai/
|
+-- main.py              FastAPI server (4 endpoints)
+-- config.py            Environment configuration
+-- llm.py               Groq/OpenAI-compatible LLM client + LangSmith tracing
+-- models.py            Pydantic schemas (ClassifierOutput, ResolutionResponse)
+-- prompts.py           System + user prompt templates
+-- guardrails.py        4-layer security (sanitize, inject-detect, validate, parse)
+-- email_html.py        Branded HTML email generator (solv.ai theme)
+-- run_server.py        Startup script
+-- requirements.txt
|
+-- comparative_analysis/
    +-- config.py         10-model configuration
    +-- run_ablation.py   120 test cases across 4 scenarios x 3 tasks
    +-- evaluate.py       5-metric auto-evaluation
    +-- generate_report.py   8 comparative graphs
    +-- test_scenarios.py    Complaint test data
    +-- prompts.py           Task-specific prompts
    +-- model_clients.py     Multi-provider API clients
    +-- graphs/              8 PNG charts (auto-generated)
    +-- results/             Raw JSON results

Processing Pipeline

RECEIVE   Classifier output (complaint_id, text, category, sentiment, priority)
    |
SANITIZE  Strip control chars, escape HTML, truncate (2000 chars)
    |
DETECT    Scan for prompt injection (13 regex patterns)
    |
VALIDATE  Verify category/priority/sentiment are in valid ranges
    |
BUILD     Construct system + user prompt with complaint context
    |
LLM CALL  Groq Llama 3.3 70B (temp=0.25, max_tokens=2048)
    |
PARSE     Extract JSON (direct parse -> markdown fence -> first {...} block)
    |
VALIDATE  Cross-check: required fields, team validity, escalation rules
    |
RESPOND   Structured ResolutionResponse with SLA + metadata

Key Features

Feature	Description
Empathetic Responses	AI-generated customer messages tailored to complaint severity and sentiment
Role-Tagged Steps	Each action step assigned to a specific team (Support, QA, Logistics, Sales)
Root Cause Hypothesis	AI generates initial root cause analysis
Branded HTML Emails	solv.ai black+orange themed customer reply emails, Gmail-paste-ready
4-Layer Guardrails	Input sanitization, prompt injection detection (13 regex rules), output validation, safe JSON parsing
LangSmith Observability	Full LLM call tracing — prompts, outputs, latency, token usage

API Endpoints

Method	Path	Description
`GET`	`/health`	Service health with LLM connectivity check
`POST`	`/resolve`	Full resolution from classifier output + customer metadata
`POST`	`/resolve/quick`	Shorthand — accepts raw classifier output
`POST`	`/reply/email-html`	Generate branded HTML email from resolution data

SLA Management

Priority	Response SLA	Resolution SLA	Escalation
High	1 hour	4 hours	Mandatory — immediate to senior management
Medium	4 hours	24 hours	If unresolved within 12 hours
Low	24 hours	72 hours	Only if complaint is repeated

2. NLP Text Classifier (`text_classifier/`)

Port: 8002 | Stack: FastAPI + ONNX Runtime + CUDA + VADER + scikit-learn

The NLP layer is the first stage of the processing pipeline — classifying raw complaint text into categories, computing sentiment, and predicting priority in ~12ms using ONNX-accelerated inference.

Architecture

text_classifier/
|
+-- server.py              FastAPI server (5 endpoints + Prometheus metrics)
+-- inference_engine.py    Dual-model ensemble engine (ONNX inference)
+-- run_server.py          Startup script (CLI args: port, host, workers)
+-- requirements.txt
+-- Dockerfile
+-- brief.md
|
+-- models/                Pre-trained ONNX models
|   +-- distilbert-mnli/   Zero-shot NLI model
|   +-- minilm-l6/         Sentence embedding model
|   +-- priority_tree.pkl  DecisionTree classifier
|
+-- data/                  Reference embeddings + training data

Inference Pipeline

INPUT TEXT
    |
    +---> [DistilBERT-MNLI ONNX] ---> Zero-shot entailment logits --------+
    |                                                                      |
    +---> [MiniLM-L6 ONNX] ---------> 384-dim embedding -> Cosine sim ----+
    |                                                                      |
    |                                                          ENSEMBLE   |
    |                                                          (50/50)    |
    |                                                             |       |
    |                                                   +---------v------+
    |                                                   |  CATEGORY      |
    |                                                   |  Trade/Product/|
    |                                                   |  Packaging     |
    |                                                   +--------+-------+
    |                                                            |
    +---> [VADER Lexicon] ---------> Sentiment score [-1, +1] ---+
    |                                                            |
    |                                                   +--------v-------+
    |                                                   |  PRIORITY      |
    |                                                   |  High/Med/Low  |
    |                                                   |  DecisionTree  |
    |                                                   |  (5 features)  |
    |                                                   +----------------+
    v
OUTPUT: {complaint_id, text, category, sentiment_score, priority, latency_ms}

Mathematical Foundations

Why 50/50 Ensemble? Zero-shot NLI captures semantic reasoning (does the text logically entail the category?), while similarity captures surface-level pattern matching (is the text similar to known examples?). Both signals are complementary — the ensemble achieves 100% accuracy in our ablation tests.

How classification works:

DistilBERT-MNLI — For each category, constructs a hypothesis ("This text is about Packaging") and measures entailment probability via softmax over logits
MiniLM-L6 — Encodes text to 384-dim embedding, computes cosine similarity to reference embeddings, normalizes to probabilities
Ensemble — 50/50 weighted average of both probability distributions
VADER — Lexicon-based compound sentiment score with negation, intensifier, and punctuation rules
DecisionTree — 5 features (sentiment, |sentiment|, category, text_length, word_count), max_depth=6, O(1) inference

Computational Efficiency

Approach	Latency	GPU Memory	Cost per 1M predictions
Our ONNX+CUDA system	~12ms	~500MB	$1.83
PyTorch eager (no ONNX)	~35ms	~800MB	$5.30
Full BERT-base	~25ms	~1GB	$3.79
GPT-3.5 API call	~1500ms	N/A	~$1,500

3. Speech-to-Text Service (`stt/`)

Port: 8001 | Stack: FastAPI + Faster-Whisper (CTranslate2) + Silero VAD (ONNX)

The STT layer converts spoken audio into transcribed text — the voice input gateway for the entire system.

Architecture

stt/
|
+-- server.py              FastAPI server (5 endpoints including WebSocket)
+-- inference_engine.py    Whisper inference + VAD + text post-processing
+-- run_server.py          Startup script
+-- requirements.txt
+-- Dockerfile
+-- brief.md
|
+-- models/                Auto-downloaded on first run
    +-- silero_vad.onnx    Voice Activity Detection model (~400KB)

Processing Pipeline

AUDIO INPUT (wav/mp3/ogg/flac/PCM16)
    |
    v
[1. AUDIO PREPROCESSING]
    - Resample to 16kHz (librosa)
    - Peak normalisation
    - Decode PCM16 -> float32 (/32768)
    |
    v
[2. SILENCE REMOVAL - Silero VAD ONNX]
    - Sliding window: 512 samples at 16kHz (~32ms)
    - Speech probability threshold: 0.5
    - RNN state propagation (h, c vectors: 2x1x64)
    |
    v
[3. TRANSCRIPTION - Faster-Whisper Tiny]
    - CTranslate2 backend (INT8/FP16 quantised)
    - Beam search: beam_size=1 (greedy, fastest)
    - Device auto-detect: CUDA -> CPU fallback
    |
    v
[4. TEXT POST-PROCESSING]
    - Remove filler words (um, uh, uhh, er, ah)
    - Number word -> digit conversion (one->1)
    - Sentence case capitalisation
    - Punctuation fixing
    |
    v
OUTPUT: {text, confidence, latency_ms, model_used}

Streaming WebSocket

For real-time phone calls, /ws/transcribe uses a fixed-window chunker with overlap:

Window: 4000ms (64,000 samples at 16kHz)
Overlap: 450ms (7,200 samples) — prevents word-boundary truncation
Flush on disconnect — remaining buffer transcribed with is_final: true

Model Rationale

Component	Model	Size	Why Chosen
Transcription	Faster-Whisper Tiny	~75MB	Smallest Whisper variant — optimal for edge on 4GB RAM. CTranslate2 gives 4x speedup via INT8/FP16 quantisation
VAD	Silero VAD (ONNX)	~400KB	Ultra-lightweight RNN. 32ms window granularity. Runs in microseconds on CPU

4. Voice Agent Orchestrator (`voice-agent/`)

Port: 8003 | Stack: FastAPI + Ollama + Piper TTS + Edge TTS + Twilio

The Voice Agent is the central orchestration layer — managing the entire voice complaint lifecycle from dial-in to ticket creation.

Architecture

voice-agent/
|
+-- orchestrator/
|   +-- main.py              FastAPI server (webhooks, WebSocket, test endpoints)
|   +-- config.py            Service URLs, Twilio, LLM, TTS configuration
|   +-- requirements.txt
|   +-- Dockerfile
|   |
|   +-- agents/              5 specialised agents
|   |   +-- dialog.py        Multi-turn complaint extraction (LLM)
|   |   +-- classify_client.py   HTTP client to NLP classifier
|   |   +-- resolve.py       Resolution step generation (LLM)
|   |   +-- ticket_client.py HTTP client to backend API
|   |   +-- genai_client.py  HTTP client to GenAI service
|   |   +-- llm_router.py    Ollama <-> Groq fallback chain
|   |   +-- stt_client.py    HTTP client to STT service
|   |   +-- http_pool.py     Connection pooling
|   |
|   +-- pipeline/            Call flow state machine
|   |   +-- coordinator.py   FSM state management (6 states)
|   |   +-- session.py       Session data + conversation history
|   |   +-- confidence.py    Extraction confidence scoring
|   |
|   +-- telephony/           Twilio integration
|   |   +-- twilio_handler.py  Webhook + media stream handler
|   |   +-- audio_utils.py     mu-law <-> PCM16 conversion
|   |   +-- barge_in.py        User speech detection during TTS
|   |
|   +-- tts/                 Text-to-speech
|   |   +-- piper_tts.py     Offline TTS (Piper ONNX)
|   |   +-- edge_tts_client.py  Cloud TTS (Microsoft Edge)
|   |   +-- audio_convert.py    Format conversion utilities
|   |
|   +-- prompts/             Prompt templates for each agent
|   +-- tests/               Unit + integration tests
|
+-- dashboard/               Static HTML role-based dashboards
+-- piper/                   Piper TTS voice models
+-- docker-compose.yml       Cloud deployment
+-- docker-compose.edge.yml  Edge/offline deployment
+-- start.sh                 Cloud startup script
+-- start-edge.sh            Edge startup script
+-- .env.example

Call Flow — Session State Machine

greeting -> collecting -> confirming -> classifying -> resolving -> ticket_created -> done
                ^             |
                +-------------+  (user says "no" -> re-collect)

State	Description	Automated?
greeting	Welcome TTS plays	Yes
collecting	Dialog Agent extracts complaint details (max 4 turns)	LLM-driven
confirming	System reads back, user confirms or corrects	User-driven
classifying	ONNX ensemble runs classification	Yes
resolving	LLM generates resolution steps	Yes
ticket_created	Ticket persisted, confirmation TTS plays	Yes

Five Specialised Agents

Agent	Responsibility	Backend
Dialog Agent	Multi-turn complaint extraction with structured schema	LLM (Ollama/Groq)
Classify Agent	HTTP call to NLP classifier `/predict` endpoint	text_classifier service
Resolve Agent	Generates specific resolution steps (escalate/refund/follow-up/exchange)	LLM (Ollama/Groq)
Ticket Agent	CRUD against backend database, SLA state management	Backend API
Analytics Agent	Incremental KPI counters, WebSocket broadcast to dashboards	In-memory

Dual-Mode Operation

Mode	LLM	TTS	Telephony	RAM
Online	Groq (0.5-1.5s) with Ollama fallback	Edge TTS (~200ms) with Piper fallback	Twilio	~2.8GB
Offline	Ollama only (2-4s)	Piper only (~50ms)	Unavailable	~2.8GB

FMCG Domain Corrections

The system corrects 40+ common misrecognitions from Indian-accented speech:

Misrecognized	Corrected	Misrecognized	Corrected
parley g	Parle-G	surf excell	Surf Excel
curry cure	Kurkure	dairy milk	Dairy Milk
maggy	Maggi	(40+ total)	...

5. Website & Dashboard (`website/`)

Port: 3000 | Stack: Next.js 16 + React 19 + TypeScript + Prisma ORM + Supabase/PostgreSQL

The website provides the web-facing interface — public landing page, role-based dashboards, complaint intake API, and real-time analytics.

Architecture

website/
|
+-- src/
|   +-- app/
|   |   +-- layout.tsx           Root layout (Oswald + Inter fonts, dark theme)
|   |   +-- globals.css          Tailwind CSS v4 styles
|   |   +-- login/page.tsx       Authentication page
|   |   |
|   |   +-- (dashboard)/        Role-based dashboard views
|   |   |   +-- admin/           Full system control, employee management
|   |   |   +-- operational/     Ticket management, status updates
|   |   |   +-- call-center/     Complaint intake, live queue
|   |   |   +-- quality-assurance/  QA review workflows
|   |   |   +-- voice-agent/     Voice agent monitoring
|   |   |
|   |   +-- api/                 Next.js Route Handlers
|   |       +-- auth/            NextAuth v5 (JWT sessions)
|   |       +-- complaints/      CRUD + AI pipeline trigger
|   |       +-- analytics/       Dashboard KPIs, trends, products
|   |       +-- admin/           Employee + SLA management
|   |       +-- export/          CSV export
|   |       +-- sse/             Server-Sent Events (real-time)
|   |       +-- webhooks/brevo/  Email webhook handler
|   |       +-- health/          Health check
|   |       +-- voice-agent/     Voice agent proxy
|   |
|   +-- components/
|       +-- landing/             Hero, CTA, Footer, Testimonials, Navbar
|       +-- dashboard/           Sidebar, charts, tables
|       +-- providers/           SessionProvider (NextAuth)
|
+-- prisma/
|   +-- schema.prisma            8 models (User, Product, Customer, Complaint, etc.)
|   +-- seed.ts                  Database seeding script
|
+-- supabase/
|   +-- migrations/              SQL migration files
|
+-- public/                      Static assets
+-- package.json                 Dependencies
+-- next.config.ts
+-- tsconfig.json

Database Schema (Prisma + PostgreSQL)

Model	Purpose	Key Fields
User	Employee authentication	email, password_hash, role (admin/operational/call_center/quality_assurance)
Product	Product catalog	name, SKU, category
Customer	Customer records	name, email, phone
Complaint	Core entity	text, category, priority, sentiment_score, source, status, resolution data
ComplaintTimeline	Audit trail	complaint_id, action, performed_by, metadata (JSON)
DailyMetric	Pre-aggregated KPIs	date, total_complaints, priority/category counts, avg_sentiment
SLAConfig	SLA thresholds	priority, response_hours, resolve_hours

Complaint Intake Pipeline

When a complaint is submitted via POST /api/complaints:

VALIDATE     Zod schema validation (text, source, product_id, customer_name)
    |
TRANSCRIBE   If source="call" with audio_base64 -> STT service
    |
CLASSIFY     NLP service returns category, sentiment, priority
    |
RESOLVE      GenAI service generates resolution steps + customer response
    |
PERSIST      Insert into Supabase complaints table with all AI outputs
    |
TIMELINE     Create initial timeline entry
    |
BROADCAST    SSE push to all connected dashboard clients
    |
ALERT        If priority="High" -> broadcast high-priority alert

Frontend Stack

Technology	Purpose
Next.js 16 (App Router)	Server-side rendering, API routes, middleware
React 19	Component framework
TypeScript	Type safety
Tailwind CSS v4	Utility-first styling
Framer Motion	Page transitions and animations
GSAP	Scroll-triggered landing page animations
Lenis	Smooth scrolling
Recharts	Dashboard charts and analytics
Lucide React	Icon library
NextAuth v5	JWT authentication with role-based middleware
Prisma	Type-safe ORM for PostgreSQL
Zod	Runtime schema validation

API Endpoints

Method	Path	Description
`POST`	`/api/auth/login`	JWT authentication
`POST`	`/api/auth/register`	User registration
`GET/POST`	`/api/complaints`	List (paginated, filtered) / Create (triggers AI pipeline)
`GET`	`/api/complaints/search`	Full-text search
`GET`	`/api/complaints/[id]`	Complaint detail
`GET`	`/api/analytics/dashboard`	KPI aggregation
`GET`	`/api/analytics/trends`	Time-series analytics
`GET`	`/api/export/complaints`	CSV export
`GET/PATCH`	`/api/admin/employees`	Employee management
`PATCH`	`/api/admin/sla-config`	SLA configuration
`GET`	`/api/sse/complaints`	SSE complaint event stream
`GET`	`/api/sse/notifications`	SSE notification stream
`POST`	`/api/webhooks/brevo`	Brevo email webhook

Why This Architecture Was the Best Choice

Microservices Over Monolith — For a Hackathon

At first glance, microservices seem like overkill for a hackathon. Here's why it was the right call:

Factor	Monolith	Our Microservices	Why We Chose This
Parallel development	One codebase, merge conflicts	4 team members, 5 independent repos	Each team member worked on a separate service with zero conflicts
Language flexibility	Single runtime	Python for ML/AI, TypeScript for web	Used the right tool for each job — Python for ML inference, TypeScript for React
Independent deployment	All or nothing	Each service deploys independently	Failed service doesn't bring down the entire system
Offline fallback	Hard to isolate cloud dependencies	Each service has local fallback	Voice agent works fully offline (Ollama + Piper) when internet is unavailable
Testing	Integration tests for everything	Each service has isolated unit tests	Easier to validate correctness per-service
Demo flexibility	Must demo everything or nothing	Can demo any layer independently	Showed NLP accuracy independently, then voice pipeline, then full integration

Why ONNX Over Raw PyTorch for NLP

Factor	PyTorch Eager	ONNX Runtime
Inference latency	~35ms	~12ms (3x faster)
GPU memory	~800MB	~500MB (37% less)
Quantisation	Manual, complex	Built-in INT8/FP16
Deployment	Requires full PyTorch	Standalone runtime
Docker image size	~2GB+	~500MB

Why Groq Over Self-Hosted LLM for GenAI

Factor	Self-hosted (Ollama)	Groq API
Latency	2-4s (CPU)	1.4s
Quality (ablation-tested)	Good (phi3.5 1.5B)	96.9% (Llama 3.3 70B)
RAM requirement	~1.5GB	0 (API call)
Cost	Free	Free (Groq free tier)
Offline support	Yes	No

Our solution: Use Groq as primary, Ollama as automatic offline fallback. Best of both worlds.

Why Next.js 16 for the Website

Factor	Why
API routes	Backend logic (auth, complaint pipeline, analytics) colocated with frontend — no separate Express server needed
SSR	Dashboard pages server-rendered for fast initial load
Middleware	Role-based route protection at the edge layer
React 19	Latest React with improved performance and streaming
Prisma integration	Type-safe database queries generated from schema

LLM Ablation Study — Model Selection

10 models x 4 complaint scenarios x 3 tasks = 120 total API calls, auto-evaluated on 5 weighted metrics.

Models Evaluated

#	Model	Provider	Parameters
1	Llama 3.3 70B	Groq	70B
2	Qwen 2.5 72B	HuggingFace	72B
3	MiniMax M2.5	OpenRouter	—
4	Qwen 3.5 Plus	OpenRouter	—
5	Gemini 2.5 Flash	Google	—
6	MiniMax M2.7	OpenRouter	—
7	GLM 5 (Zhipu)	OpenRouter	—
8	GLM 5.1 (Zhipu)	OpenRouter	—
9	MiMo V2 Pro (Xiaomi)	OpenRouter	—
10	MiMo V2 Omni (Xiaomi)	OpenRouter	—

Evaluation Metrics

Metric	Weight	What It Measures
Classification Accuracy	30%	Correct complaint category
Priority Accuracy	25%	Correct urgency level (adjacent = 50% credit)
Resolution Quality	25%	Completeness and actionability of resolution steps
Format Compliance	10%	Valid JSON with all required schema fields
Response Quality	10%	Appropriate length, no refusals, coherent output

Results

Rank	Model	Overall Score	Avg Latency	Category Acc.	Priority Acc.
1	Llama 3.3 70B (Groq)	96.9%	1.4s	100%	88%
2	Qwen 2.5 72B (HuggingFace)	96.9%	11.6s	100%	88%
3	MiniMax M2.5	96.0%	13.5s	100%	88%
4	Qwen 3.5 Plus	92.3%	51.7s	92%	79%
5	Gemini 2.5 Flash	89.9%	7.4s	91%	82%
6	MiniMax M2.7	89.3%	15.1s	83%	88%
7-10	GLM 5/5.1, MiMo V2 Pro/Omni	32-49%	13-19s	33-58%	38-62%

Why Llama 3.3 70B on Groq?

Tied for highest accuracy (96.9%) with Qwen 2.5 72B
8x faster than the runner-up (1.4s vs 11.6s)
100% category accuracy — zero misclassification
100% format compliance — always valid JSON
Free tier — no API costs for production use
Consistent latency — no cold starts or spikes

Visual Results

Overall Score Comparison

Multi-Metric Radar

Response Latency

Per-Task Breakdown

Quality vs Speed (Efficiency Frontier)

Detailed Metric Heatmap

Token Usage

Scenario Accuracy

How to Reproduce

cd genai
export OPENCODE_API_KEY=your_key_here

# 1. Test API connectivity
python -m comparative_analysis.test_api_keys

# 2. Run the ablation study (120 API calls, ~10-15 min)
python -m comparative_analysis.run_ablation

# 3. Generate 8 graphs and markdown report
python -m comparative_analysis.generate_report

End-to-End Data Flow

Voice Call Path

User speaks -> Twilio Media Stream -> Orchestrator (mu-law -> PCM16)
    -> STT Service (Whisper + VAD) -> transcribed text
    -> Dialog Agent (LLM extraction) -> structured complaint
    -> Classifier Service (ONNX ensemble) -> category + sentiment + priority
    -> Resolve Agent (Groq Llama 3.3 70B) -> resolution steps
    -> Backend API -> persisted to PostgreSQL
    -> TTS (Piper/Edge) -> confirmation spoken to user
    -> SSE broadcast -> dashboard updates in real-time

Web Form Path

User submits form -> Next.js API Route (/api/complaints)
    -> (optional) STT if audio attached
    -> NLP Classifier -> category + sentiment + priority
    -> GenAI Resolution Engine -> resolution steps + customer response
    -> Supabase/PostgreSQL -> persisted with timeline entry
    -> SSE broadcast -> all connected dashboards receive update
    -> High-priority alert -> pushed to operations dashboard

Dashboard View Path

Browser -> Next.js middleware (role check) -> role-specific dashboard page
    -> API call to /api/complaints (filtered by role)
    -> Prisma query -> Supabase/PostgreSQL
    -> JSON response -> React components render
    -> SSE connection -> real-time updates without page refresh

Performance Benchmarks

Operation	Latency	Notes
NLP classification (single)	~12ms	ONNX + CUDA, dual-model ensemble
GenAI resolution	~1.4s	Groq Llama 3.3 70B
STT transcription (GPU)	300-500ms	Faster-Whisper Tiny, INT8+FP16
STT transcription (CPU)	1-2s	CPU fallback
Voice agent (end-to-end, online)	2-3s/turn	Groq + Edge TTS
Voice agent (end-to-end, offline)	4-6s/turn	Ollama + Piper TTS
Dashboard API response	<100ms	Prisma + Supabase
SSE event propagation	<50ms	Server-Sent Events

Resource Requirements

Deployment	RAM	Disk	GPU
Full stack (online)	~2.8GB	~1.3GB	Optional
Full stack (offline)	~4.3GB	~2.3GB	Recommended
Website only	~500MB	~200MB	None

Per-Service Breakdown

Component	RAM	Disk	Offline?
STT (Whisper-tiny)	~300MB	~50MB	Yes
NLP Classifier (ONNX)	~200MB	~200MB	Yes
GenAI (API client)	~100MB	~10MB	Needs API
Orchestrator	~100MB	~5MB	Yes
Ollama (phi3.5)	~1.5GB	~1GB	Yes
Piper TTS	~100MB	~30MB	Yes
Website (Next.js)	~500MB	~200MB	Partial

Cost of Scaling & Real-World Implementation

Deployment Cost Breakdown

Small Scale (Startup / Pilot — 100 complaints/day)

Component	Option	Monthly Cost
Compute	1x AWS t3.medium (2 vCPU, 4GB RAM)	~$30
Database	Supabase Free tier (500MB)	$0
LLM	Groq Free tier	$0
Telephony	Twilio Pay-as-you-go (~100 calls x $0.02/min)	~$4
Domain + SSL	Cloudflare	$0
Total		~$34/month

Medium Scale (Enterprise — 1,000 complaints/day)

Component	Option	Monthly Cost
Compute	2x AWS t3.large (2 vCPU, 8GB RAM) behind ALB	~$120
GPU	1x AWS g4dn.xlarge (T4 GPU for ONNX)	~$380
Database	Supabase Pro (8GB, daily backups)	$25
LLM	Groq paid tier (~1000 calls/day x 2048 tokens)	~$50
Telephony	Twilio (~1000 calls x $0.02/min)	~$40
Monitoring	LangSmith team plan	$39
Total		~$654/month

Large Scale (Enterprise — 10,000+ complaints/day)

Component	Option	Monthly Cost
Compute	Kubernetes cluster (4-8 nodes, auto-scaling)	~$800
GPU	2x T4 instances (ONNX) + 1x A10G (if self-hosting LLM)	~$1,200
Database	Supabase Team or AWS RDS (Multi-AZ, read replicas)	~$200
LLM	Self-hosted Llama 70B on vLLM or continued Groq enterprise	~$300-500
Telephony	Twilio enterprise volume pricing	~$400
CDN + WAF	Cloudflare Pro	$20
Monitoring	LangSmith + Grafana + Prometheus	~$100
Total		~$3,000-3,300/month

Cost Comparison vs. Manual Process

Scale	Manual Cost (agents + overhead)	SOLV.ai Cost	Savings
100/day	~INR 50,000/month (2 agents)	~INR 2,800/month	94%
1,000/day	~INR 3,00,000/month (15 agents)	~INR 54,000/month	82%
10,000/day	~INR 25,00,000/month (100+ agents)	~INR 2,70,000/month	89%

Scaling Considerations

Factor	Approach
Horizontal scaling	Each microservice scales independently — add more NLP instances for classification throughput, more GenAI instances for resolution generation
Database	Read replicas for dashboard queries, write primary for complaint intake. Connection pooling via PgBouncer
Caching	Redis for frequently accessed complaint lists, dashboard KPIs, and classifier model outputs
Queue-based processing	For high volume: decouple intake from AI processing via Redis/RabbitMQ job queue
CDN	Static assets (Next.js frontend) served from edge CDN
Rate limiting	Per-IP rate limiting already implemented (60 req/min default)

Scope of Improvement

Near-Term Enhancements

Area	Improvement	Impact
Multi-language STT	Add Hindi, Gujarati, Tamil Whisper models	Handle India's linguistic diversity
Few-shot prompting	Gold-standard resolution examples in system prompts	Higher resolution quality
Email integration	Auto-send branded resolution emails to customers via Brevo	Close the loop without agent intervention
RAG for product manuals	Vector search over product documentation for grounded resolutions	Reduce hallucination, more specific advice
Sentiment trend alerts	Auto-detect when negative sentiment spikes for a product	Proactive quality control
Voice language detection	Auto-detect caller's language and switch STT model	Seamless multilingual experience

Production Hardening

Area	Improvement	Impact
Kubernetes	Containerized deployment with auto-scaling, health checks, rolling updates	Production reliability
Redis caching	Cache classifier outputs, dashboard KPIs, frequently accessed complaints	Reduce database load
CI/CD pipeline	GitHub Actions for automated testing, linting, deployment	Faster iteration, fewer regressions
Prometheus + Grafana	Infrastructure monitoring (CPU, memory, latency percentiles)	Operational visibility
Load testing	k6 or Locust stress tests to validate scaling assumptions	Confidence in production capacity
Database migrations	Automated Prisma migration pipeline with rollback support	Safe schema evolution

Advanced AI Capabilities

Area	Improvement	Impact
Fine-tuned classifier	Train on company's historical complaint data instead of zero-shot	Higher accuracy on domain-specific categories
Complaint clustering	Unsupervised clustering to discover new complaint categories	Detect emerging product issues
Predictive analytics	Time-series forecasting of complaint volumes	Staff planning, proactive inventory adjustments
Self-critique loop	Second LLM call reviews first output for hallucinations	Stronger safety guarantees
Customer sentiment journey	Track sentiment evolution across a customer's complaint history	Identify at-risk customers before churn
Agent performance scoring	Compare AI resolutions with human agent outcomes	Continuous improvement loop
Chain-of-thought reasoning	Step-by-step LLM reasoning before JSON output	Better resolution quality for complex complaints

Infrastructure Expansion

Area	Improvement	Impact
Dedicated vector DB	Pinecone or Weaviate for RAG at scale	Sub-100ms retrieval over large document corpus
WebSocket dashboards	Replace SSE with WebSocket for bidirectional communication	Real-time collaborative ticket management
Mobile app	React Native app for field agents to update ticket status	Faster resolution for walk-in complaints
Webhook integrations	Slack, Teams, Jira integrations for ticket routing	Fit into existing enterprise workflows
Multi-tenancy	Tenant isolation for SaaS deployment	Serve multiple companies from one instance

Quick Start Guide

Prerequisites

Python 3.11+
Node.js 18+
NVIDIA GPU (optional, for GPU acceleration)
Ollama (for local LLM — ollama pull phi3.5:latest)
Supabase project (or PostgreSQL database)

Start All Services

# 1. STT Service (port 8001)
cd stt && pip install -r requirements.txt && python run_server.py

# 2. NLP Classifier (port 8002)
cd text_classifier && pip install -r requirements.txt && python run_server.py --port 8002

# 3. GenAI Resolution Engine (port 8004)
cd genai && pip install -r requirements.txt && cp .env.example .env  # add LLM_API_KEY
python run_server.py

# 4. Voice Agent Orchestrator (port 8003)
cd voice-agent/orchestrator && pip install -r requirements.txt
python -m uvicorn main:app --port 8003

# 5. Website (port 3000)
cd website && npm install && npm run dev

Docker Deployment

# Cloud mode (with Twilio telephony)
cd voice-agent && cp .env.example .env && docker-compose up -d

# Edge mode (fully offline, no telephony)
docker-compose -f docker-compose.edge.yml up -d

Test the Pipeline

# Test NLP classifier directly
curl -X POST http://localhost:8002/predict \
  -H "Content-Type: application/json" \
  -d '{"text": "My Parle-G biscuit packet was torn and the biscuits were all broken"}'

# Test end-to-end via voice agent
curl -X POST http://localhost:8003/test/pipeline \
  -H "Content-Type: application/json" \
  -d '{"text": "Box was broken during delivery"}'

# Test website complaint API
curl -X POST http://localhost:3000/api/complaints \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Product expired before printed date",
    "source": "email",
    "product_id": "pw-col-001",
    "customer_name": "Test User",
    "customer_email": "test@example.com"
  }'

Verify Services

Service	URL	Check
STT	http://localhost:8001/health	Model loaded, VAD status
NLP Classifier	http://localhost:8002/health	ONNX models loaded, GPU/CPU
GenAI	http://localhost:8004/health	LLM connectivity
Voice Agent	http://localhost:8003/health	All downstream services
Website	http://localhost:3000	Landing page renders

Deployment Modes

Mode	Description	Dependencies	Monthly Cost
Full Cloud	All services + Twilio telephony + Groq LLM + Edge TTS	Twilio, Groq API key, Supabase	~$55-75
Edge/Offline	Fully local — Ollama LLM, Piper TTS, Whisper STT, ONNX classifier	4GB RAM, NVIDIA GPU (optional)	$0
Hybrid	Local STT + Classifier, cloud LLM (Groq) for faster resolution	Groq API key	~$0 (Groq free tier)

Repository Structure

lakshya-ldce/
|
+-- genai/                          GenAI Resolution Engine (FastAPI)
|   +-- main.py                     Server (4 endpoints)
|   +-- llm.py                      LLM client + LangSmith tracing
|   +-- guardrails.py               4-layer security
|   +-- prompts.py                  System + user prompts
|   +-- models.py                   Pydantic schemas
|   +-- email_html.py               Branded HTML emails
|   +-- comparative_analysis/       10-model ablation study
|       +-- graphs/                 8 auto-generated PNG charts
|
+-- text_classifier/                NLP Classifier (FastAPI + ONNX)
|   +-- server.py                   Server (5 endpoints + Prometheus)
|   +-- inference_engine.py         Dual-model ONNX ensemble
|   +-- models/                     DistilBERT-MNLI, MiniLM-L6, DecisionTree
|
+-- stt/                            Speech-to-Text (FastAPI + Whisper)
|   +-- server.py                   Server (5 endpoints + WebSocket)
|   +-- inference_engine.py         Whisper + VAD + post-processing
|
+-- voice-agent/                    Voice Agent Orchestrator
|   +-- orchestrator/
|   |   +-- main.py                 Server (webhooks, WebSocket, test)
|   |   +-- agents/                 5 specialised agents
|   |   +-- pipeline/               FSM state machine (6 states)
|   |   +-- telephony/              Twilio handler + audio utils
|   |   +-- tts/                    Piper + Edge TTS
|   +-- dashboard/                  Static HTML dashboards (3 roles)
|   +-- docker-compose.yml          Cloud deployment
|   +-- docker-compose.edge.yml     Edge deployment
|
+-- website/                        Web Application (Next.js 16)
|   +-- src/app/                    App Router (pages + API routes)
|   +-- src/components/             React components (landing + dashboard)
|   +-- prisma/schema.prisma        8 database models
|   +-- supabase/                   Migrations + seed data
|
+-- workflow1.png                   System architecture diagram
+-- readme.md                       This file

Team Members

Name	Email	University	Graduation Year	Role
Priyanshu Doshi (leader)	24bam050@nirmauni.ac.in	Nirma University	2028	Conversational ML Engineer
Tirth Patel	24btm028@nirmauni.ac.in	Nirma University	2028	Full Stack Developer
Neal Daftary	24bam019@nirmauni.ac.in	Nirma University	2028	Conversational ML Engineer
Parshva Shah	24bam043@nirmauni.ac.in	Nirma University	2028	Data Engineer & Researcher

What We Bring

What	Evidence
End-to-end ML	ONNX-accelerated dual-model ensemble achieving 100% accuracy at ~12ms, with full mathematical documentation
Production GenAI	4-layer guardrails, prompt injection detection, LangSmith observability, 10-model ablation study (120 API calls)
Full-stack engineering	Next.js 16 + React 19 + Prisma + Supabase with role-based auth, SSE real-time updates, and 15+ API routes
Voice AI	End-to-end phone call pipeline: Twilio -> STT -> Dialog -> Classify -> Resolve -> Ticket -> TTS, running on 4GB RAM
System design	5 independently deployable microservices with offline fallback, dual-mode operation, and Docker orchestration

Built for Tark Shaastra | LDCE Hackathon — SOLV.ai: AI-Powered Complaint Classification & Resolution Recommendation Engine

Name		Name	Last commit message	Last commit date
Latest commit History 86 Commits
genai		genai
stt		stt
text_classifier		text_classifier
voice-agent		voice-agent
website		website
.gitignore		.gitignore
package-lock.json		package-lock.json
readme.md		readme.md
workflow1.png		workflow1.png

Folders and files

Latest commit

History

Repository files navigation

SOLV.ai — AI-Powered Voice Complaint Management System

Table of Contents

Why This Exists — The Necessity

The Problem in the Indian FMCG & Wellness Industry

What We Set Out to Solve

What Our Team Contributed

The Business Case

Why SOLV.ai Generates Value — Beyond Cost Savings

Financial Impact

Time Impact

Operational Impact

The Bottom Line

What SOLV.ai Does

Key Metrics

System Architecture

Technology Stack

Core Technologies

AI & ML Models

Module Architecture & Deep Dive

1. GenAI Resolution Engine (genai/)

Architecture

Processing Pipeline

Key Features

API Endpoints

SLA Management

2. NLP Text Classifier (text_classifier/)

Architecture

Inference Pipeline

Mathematical Foundations

Computational Efficiency

3. Speech-to-Text Service (stt/)

Architecture

Processing Pipeline

Streaming WebSocket

Model Rationale

4. Voice Agent Orchestrator (voice-agent/)

Architecture

Call Flow — Session State Machine

Five Specialised Agents

Dual-Mode Operation

FMCG Domain Corrections

5. Website & Dashboard (website/)

Architecture

Database Schema (Prisma + PostgreSQL)

Complaint Intake Pipeline

Frontend Stack

API Endpoints

Why This Architecture Was the Best Choice

Microservices Over Monolith — For a Hackathon

Why ONNX Over Raw PyTorch for NLP

Why Groq Over Self-Hosted LLM for GenAI

Why Next.js 16 for the Website

LLM Ablation Study — Model Selection

Models Evaluated

Evaluation Metrics

Results

Why Llama 3.3 70B on Groq?

Visual Results

Overall Score Comparison

Multi-Metric Radar

Response Latency

Per-Task Breakdown

Quality vs Speed (Efficiency Frontier)

Detailed Metric Heatmap

Token Usage

Scenario Accuracy

How to Reproduce

End-to-End Data Flow

Voice Call Path

Web Form Path

Dashboard View Path

Performance Benchmarks

Resource Requirements

Per-Service Breakdown

Cost of Scaling & Real-World Implementation

Deployment Cost Breakdown

1. GenAI Resolution Engine (`genai/`)

2. NLP Text Classifier (`text_classifier/`)

3. Speech-to-Text Service (`stt/`)

4. Voice Agent Orchestrator (`voice-agent/`)

5. Website & Dashboard (`website/`)

Packages