HearLink — Backend Services (Flask, Whisper, Vision, SQLite)
HearLink is a cloud‑native, production‑ready EdTech backend that turns classrooms into inclusive, intelligent learning spaces. It powers:
- Real‑time multilingual speech‑to‑text (Whisper / faster‑whisper)
- AI content automation (notes, summaries, quizzes, flashcards, exercises)
- Emotion detection and analytics from video frames (OpenCV‑based)
- Classroom chat with AI tutor and persistent chat history
- User accounts, class records, and emotion insights stored in SQLite
Recognition: Top 10 Project — Pragati META AI Hackathon 2025
Live resources (placeholders):
There are two Flask servers provided:
- `Server.py` — main API server (default port 5009). Includes transcription, content generation, chat endpoints, user management, and emotion insights.
- `VoiceChatbotServer.py` — auxiliary voice chatbot API (default port 5000). Uses a custom OpenAI‑compatible LLaMA endpoint for responses and manages classroom chat history in SQLite.
- Language: Python 3.12 (Docker base) / Python 3.x locally
- Frameworks: Flask, Flask‑CORS, Flask‑Login, Flask‑SQLAlchemy
- AI/ML: PyTorch (`torch`), Whisper / faster‑whisper, OpenCV, deepface, transformers
- LLM providers: Google Gemini (`google-generativeai`), custom OpenAI‑compatible LLaMA endpoints (via `openai` / `OpenAI` clients)
- Data: SQLite
  - Main API: `sqlite:///made_with_hardwork.db` (SQLAlchemy)
  - Chat helpers: `classroom_data.db`
- Media: FFmpeg, moviepy, pydub, PortAudio
- Package manager: pip via `requirements.txt`
- Containerization: `Dockerfile` (Ubuntu‑based Python 3.12 image)
System prerequisites (for local, non‑Docker runs):
- Python 3.10+ (3.12 recommended)
- FFmpeg (required by media and Whisper pipelines)
- OpenCV runtime libs (if using local camera/video processing)
- PortAudio (for certain audio features)
- Optional GPU: CUDA‑capable GPU + compatible PyTorch for faster inference (`torch.cuda.is_available()` is used to auto‑select the device)
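The device auto‑selection mentioned above follows the usual PyTorch pattern; a minimal sketch (the actual selection logic lives inside the server modules and may differ):

```python
def select_device():
    """Prefer CUDA when a GPU build of PyTorch is available; otherwise fall back to CPU."""
    try:
        import torch  # optional: only needed for GPU inference
        return "cuda" if torch.cuda.is_available() else "cpu"
    except ImportError:
        # PyTorch not installed (e.g. a minimal environment): CPU-only
        return "cpu"
```

CPU‑only installs still work; inference is simply slower.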
Python dependencies:
- Install from `requirements.txt`: `pip install -r requirements.txt`
Data and working directories (created at runtime if missing):
- `uploads/` — uploaded media
- `results/` — generated assets (PDFs, notes, etc.)
- `registered_faces/` — face images and related assets
- `instance/` — Flask instance folder
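The on‑demand creation of these folders can be reproduced with a few lines of standard‑library Python (a sketch of the pattern; the server modules create the folders themselves):

```python
import os

# Folders the backend expects at runtime (created if missing)
RUNTIME_DIRS = ("uploads", "results", "registered_faces", "instance")

def ensure_runtime_dirs(base="."):
    """Create each runtime directory under `base`; existing directories are left untouched."""
    for name in RUNTIME_DIRS:
        os.makedirs(os.path.join(base, name), exist_ok=True)
```

`exist_ok=True` makes the call idempotent, so it is safe to run at every startup.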
SQLite databases used:
- Main API (`Server.py`): `sqlite:///made_with_hardwork.db`
- Chatbot helpers: `classroom_data.db`
Some features require API keys for LLMs:
- `GOOGLE_API_KEY` — for Gemini models used in translation/content generation
- `E2E_API_KEY` — for custom OpenAI‑compatible LLaMA endpoints (used in `Voice_Chat_helper.py`, `VoiceChatbotServer.py`, and parts of `ContentGeneration.py`)
How to set (Windows PowerShell):
```powershell
$env:GOOGLE_API_KEY = "<your_key>"
$env:E2E_API_KEY = "<your_key>"
```
Or create a .env file in the project root (dotenv is loaded in several modules):
```
GOOGLE_API_KEY=<your_key>
E2E_API_KEY=<your_key>
```
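The key‑loading pattern used by the dotenv‑aware modules looks roughly like this (a sketch; the function name `load_api_keys` is illustrative, not from the repo):

```python
import os

def load_api_keys():
    """Read LLM keys from the environment; pick up a .env file when python-dotenv is installed."""
    try:
        from dotenv import load_dotenv  # optional dependency (python-dotenv)
        load_dotenv()
    except ImportError:
        pass  # fall back to plain environment variables
    return {
        "GOOGLE_API_KEY": os.getenv("GOOGLE_API_KEY"),
        "E2E_API_KEY": os.getenv("E2E_API_KEY"),
    }
```

Missing keys come back as `None`, so callers can fail fast with a clear error before touching the LLM providers.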
- Create and activate a virtual environment
python -m venv .venv
.\.venv\Scripts\Activate.ps1   # PowerShell on Windows; use `source .venv/bin/activate` on Linux/macOS
- Install dependencies
pip install --upgrade pip
pip install -r requirements.txt
- Set env vars (see above), and ensure FFmpeg is installed and accessible in PATH
- Initialize runtime folders (optional — created on demand): `uploads/`, `results/`, `registered_faces/`
Run the main server (port 5009):
python Server.py
Health check:
curl http://localhost:5009/api/health
Run the voice chatbot server (port 5000):
python VoiceChatbotServer.py
Note: You can run both services in parallel on different ports if needed.
Build the image from the provided Dockerfile:
docker build -t hearlink-backend:local .
Run the container (exposes 5009):
docker run --rm -p 5009:5009 `
  -e GOOGLE_API_KEY=$env:GOOGLE_API_KEY `
  -e E2E_API_KEY=$env:E2E_API_KEY `
  hearlink-backend:local
Notes:
- The Dockerfile copies and runs `Server.py` and exposes port 5009 by default.
- System packages installed in the image include FFmpeg, OpenCV libs, PortAudio, and build tools.
- The `apt-get install` section currently lacks continuation characters in comments; ensure it builds in your Docker version. If it fails, remove inline comments or merge packages into a single line. See Troubleshooting.
- GPU support would require a CUDA‑enabled base image and matching PyTorch wheels (not covered here).
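If the build fails at the package step, a single merged `RUN` line with no inline comments avoids the continuation problem. A sketch of the shape (the exact package names in the repo's Dockerfile may differ):

```dockerfile
# One RUN, backslash continuations only, no inline comments between lines
RUN apt-get update && apt-get install -y --no-install-recommends \
        ffmpeg \
        libgl1 \
        portaudio19-dev \
        build-essential \
    && rm -rf /var/lib/apt/lists/*
```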
pipeline.yaml references a pre‑built image and sets environment variables:
```yaml
services:
  hearlink:
    image: deeppriyo/hearlinkapp:latest
    command: ["python", "Server.py"]
    ports:
      - "5009:5009"
    environment:
      - GOOGLE_API_KEY=${GOOGLE_API_KEY}
      - E2E_API_KEY=${E2E_API_KEY}
```
Notes:
- Treat this as a service descriptor; validate syntax for your orchestrator (Docker Compose, Nomad, Kubernetes via Kompose, etc.). It may need formatting fixes.
- Ensure the image tag matches your build pipeline if you are not pulling `deeppriyo/hearlinkapp:latest`.
- Real‑Time Multilingual Speech‑to‑Text: Live captions across 50+ languages using Whisper
- Emotion Detection (Live & Group): Camera‑based detection of confusion, frustration, boredom; aggregated analytics
- Class Recording & Smart Notes: Auto‑generated notes and downloadable quizzes
- AI Chatbot for Learning Support: Students ask context‑aware questions in class chat
- AI‑Generated Study Materials: Notes, summaries, quizzes, flashcards, exercises in Indian languages
- Multi‑Source Content Generation: Uploads and YouTube links merged into comprehensive resources (single downloadable PDF)
- Teacher Decision Support: Real‑time engagement analytics and suggested interventions
- Device‑Agnostic Access: Web optimized; mobile app in development
- Advanced Full‑Class Analytics: Trend dashboards for class‑wide sentiment and participation
- Mobile App: Cross‑platform Flutter app
- Smart Assistive Hardware: Affordable wearables and classroom devices
- EdTech Integrations: Google Classroom, Moodle, Microsoft Teams APIs
High‑level files in the repo root:
- `Server.py` — main Flask API (port 5009). Endpoints include:
  - Health: `GET /api/health`
  - Auth: `POST /api/login`, `POST /api/logout`, `POST /api/register`
  - Transcription: `POST /api/transcribe`, `POST /api/transcribelink`
  - Content: `GET /api/summary`, `GET /api/flashcards`, `GET /api/quiz`, `GET /api/exercise`, `GET /download/<id>/<file_type>`
  - Notes: `POST /api/generate-note`
  - Emotion/video: `GET /api/emotion_dashboard`, `POST /api/upload_video`, `GET /api/analysis/<user_id>`, `POST /api/regenerate_insights/<user_id>`, `POST /api/batch_regenerate_insights`
  - Users: `GET /api/users`, `GET /api/current_user`
  - Chat/classes: `POST /api/process-audio`, `POST /api/chat`, `GET /api/classes`, `POST /api/search-classes`, `GET /api/get-chat-history/<session_id>`, `GET /api/stats`, `GET /api/download/pdf/<filename>`
- `VoiceChatbotServer.py` — auxiliary Flask app (port 5000) for chat using a custom LLaMA endpoint
- `Voice_Chat_helper.py` — helpers for chat, DB init, audio handling
- `ContentGeneration.py` — ingest audio/video/text; Whisper/Gemini/LLaMA integrations
- `emotion_helper.py` — emotion analysis and media utilities
- `requirements.txt` — Python dependencies
- `Dockerfile` — container build for `Server.py`
- `pipeline.yaml` — service/image descriptor (see above)
- Runtime directories: `uploads/`, `results/`, `registered_faces/`
- Databases: `classroom_data.db`, `instance/made_with_hardwork.db`, `instance/sever_data.db`
- Working artifacts: `output_files/`, `summary.txt`, `flashcards.txt`, `transcript.txt`, `translated_transcript.txt`, `emotion.txt`, `detailed_notes.txt`
- Local run:
  - `python Server.py` (main API, port 5009)
  - `python VoiceChatbotServer.py` (chatbot API, port 5000)
- Docker run:
  - Entrypoint runs `Server.py` (`CMD ["python", "Server.py"]`)
No additional standalone CLI scripts; functionality is exposed via HTTP endpoints.
Transcribe an uploaded audio file:
curl -X POST http://localhost:5009/api/transcribe ^
-F video=@sample.mp4 ^
-F target_language=en
Transcribe from YouTube and generate notes:
curl -X POST http://localhost:5009/api/transcribelink ^
-F youtube_link=https://www.youtube.com/watch?v=dQw4w9WgXcQ ^
-F target_language=en
Get classes listing:
curl http://localhost:5009/api/classes
Chat message:
curl -X POST http://localhost:5009/api/chat ^
  -H "Content-Type: application/json" ^
  -d "{\"session_id\":\"abc123\",\"message\":\"Explain photosynthesis\"}"
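The same chat request can be issued from Python with only the standard library. A sketch: the endpoint path and field names follow the curl example above, and the response is assumed to be JSON.

```python
import json
import urllib.request

def build_chat_request(session_id, message, base_url="http://localhost:5009"):
    """Build the POST request for /api/chat with a JSON body."""
    body = json.dumps({"session_id": session_id, "message": message}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/api/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def send_chat(session_id, message, base_url="http://localhost:5009"):
    """POST a chat message and return the parsed JSON response."""
    with urllib.request.urlopen(build_chat_request(session_id, message, base_url)) as resp:
        return json.loads(resp.read())
```

Splitting request construction from sending keeps the payload shape easy to inspect without a running server.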
Note: Request/response shapes may evolve; inspect the route implementations for full payload details.
No automated tests are currently included in the repository.
Suggested next steps:
- Add smoke tests for `GET /api/health` and core endpoint contracts
- Unit tests for `ContentGeneration.py` and `emotion_helper.py` (mock external APIs and model calls)
- Include sample media and golden outputs for reproducibility
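A minimal smoke check for the health endpoint could look like this (a sketch using only the standard library; it returns `None` rather than raising when the server is unreachable):

```python
import json
import urllib.request
import urllib.error

def check_health(base_url="http://localhost:5009", timeout=5):
    """Return the parsed /api/health payload, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(base_url + "/api/health", timeout=timeout) as resp:
            return json.loads(resp.read())
    except (urllib.error.URLError, OSError, ValueError):
        return None
```

Wrapping this in a pytest assertion (`assert check_health() is not None`) gives a first contract test once the server is running.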
Primary contributor: Backend development, DevOps, and frontend integration.
- Backend architecture: Designed Flask services (`Server.py`, `VoiceChatbotServer.py`), modular helpers, and data flows for STT, content generation, and emotion analytics
- Database design: Modeled users and emotion analyses in SQLite via SQLAlchemy; separate chat history DB for isolation
- AI/ML integration: Wired Whisper/faster‑whisper pipelines, Gemini for generation/translation, and a custom LLaMA endpoint for chat
- DevOps: Authored the Dockerfile (system deps: FFmpeg, OpenCV, PortAudio), environment management via `.env`, and deployment `pipeline.yaml`; built and published images used for live demos
- Frontend integration: Defined REST endpoints and payloads enabling teacher dashboards, class PDFs, and student chat experiences
TODO: Add a LICENSE file (e.g., MIT, Apache‑2.0) and reference it here.
- Docker build fails at system package install: ensure `apt-get install` lines are valid for your Docker version. Consider merging packages into one line or removing inline comments to avoid "Missing continuation character".
- Whisper model downloads can be large; ensure network access and sufficient disk space.
- If GPU is expected but not used, verify NVIDIA drivers, CUDA toolkit, and install a CUDA build of PyTorch compatible with your base image.