Bridging the Gap: Real-time Indian Sign Language to Speech Translation with Neural Intelligence.
SignBridge AI v2.0 is a high-performance, real-time sign language translation system that converts hand gestures into natural spoken English. Built for the modern web and optimized for legacy hardware, v2.0 features an ultra-low-latency "Neural Link" interface, deterministic finger-state heuristics, and deep integration with Google Gemini AI for grammatically correct, natural-language translation.
It specifically targets Indian Sign Language (ISL), converting ISL's Subject-Object-Verb (SOV) word order into fluent English sentences with Subject-Verb-Object (SVO) order.
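The deterministic finger-state heuristics mentioned above can be as simple as comparing landmark positions. The sketch below is an illustration, not SignBridge's code: it assumes MediaPipe's 21-point hand landmark layout with normalized coordinates (image y grows downward) and a roughly upright hand, and `Landmark`, `finger_states` and the pinky-base distance rule for the thumb are hypothetical choices.

```python
import math
from typing import NamedTuple, Sequence


class Landmark(NamedTuple):
    x: float  # normalized 0..1, left to right in the image
    y: float  # normalized 0..1, top to bottom in the image


# MediaPipe hand landmark indices as (tip, PIP joint) for the four fingers.
FINGER_JOINTS = {
    "index": (8, 6),
    "middle": (12, 10),
    "ring": (16, 14),
    "pinky": (20, 18),
}
THUMB_TIP, THUMB_IP, PINKY_MCP = 4, 3, 17


def finger_states(lm: Sequence[Landmark]) -> dict[str, bool]:
    """Hypothetical heuristic: True means the finger is judged extended."""
    if len(lm) != 21:
        raise ValueError(f"expected 21 hand landmarks, got {len(lm)}")
    # Assumes an upright hand: an extended finger's tip lies above its PIP
    # joint, i.e. it has a smaller y because image y grows downward.
    states = {name: lm[tip].y < lm[pip].y for name, (tip, pip) in FINGER_JOINTS.items()}
    # The thumb bends sideways, so compare distances to the pinky base instead:
    # an extended thumb tip lies farther from it than the thumb's IP joint does.
    states["thumb"] = math.dist(lm[THUMB_TIP], lm[PINKY_MCP]) > math.dist(
        lm[THUMB_IP], lm[PINKY_MCP]
    )
    return states
```

Matching a static handshape then reduces to comparing the returned dictionary with a per-sign template; signs that involve movement still need a temporal model.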
- Zero-Latency Neural Link: Optimized frame throttling ensures the UI remains responsive (60 FPS target) by preventing buffer bloat.
- Perfect Skeleton Alignment: Adaptive coordinate mapping handles `object-contain` video scaling for pixel-perfect landmark overlays.
- Ultra-Fast ML Engine: Powered by MediaPipe Hands for maximum speed on CPUs such as the Intel i5-4440.
- Gemini AI Integration: Uses Gemini 2.0 Flash to transform ISL sign sequences into natural English sentences, inferring tenses and articles.
- Real-time Audio: Integrated neural Text-to-Speech (TTS) provides immediate verbal feedback.
- ADK Powered: Built on the Agent Development Kit (ADK) for systematic evaluation and agentic workflow management.
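With `object-contain` (the Tailwind utility for CSS `object-fit: contain`), the video is scaled uniformly and centered, leaving letterbox bars, so normalized landmarks must be scaled and offset before they are drawn. A minimal sketch of that mapping, written in Python for consistency with the other examples (the arithmetic carries over to the frontend unchanged); `contain_fit`, `to_overlay_px` and the `mirrored` flag for a flipped selfie view are hypothetical:

```python
from typing import NamedTuple


class ContainFit(NamedTuple):
    scale: float     # displayed pixels per video pixel
    offset_x: float  # width of each letterbox bar on the left/right
    offset_y: float  # height of each letterbox bar on the top/bottom


def contain_fit(video_w: int, video_h: int, elem_w: float, elem_h: float) -> ContainFit:
    """Replicate CSS `object-fit: contain`: scale uniformly to fit, then center."""
    scale = min(elem_w / video_w, elem_h / video_h)
    return ContainFit(
        scale,
        (elem_w - video_w * scale) / 2,
        (elem_h - video_h * scale) / 2,
    )


def to_overlay_px(
    x: float,
    y: float,
    video_w: int,
    video_h: int,
    elem_w: float,
    elem_h: float,
    mirrored: bool = False,
) -> tuple[float, float]:
    """Map a normalized landmark (x, y in 0..1 of the video frame) to element pixels."""
    fit = contain_fit(video_w, video_h, elem_w, elem_h)
    if mirrored:  # e.g. the video is flipped with CSS but the overlay is not
        x = 1.0 - x
    return (
        fit.offset_x + x * video_w * fit.scale,
        fit.offset_y + y * video_h * fit.scale,
    )
```

For a 640x480 frame in an 800x800 element, the scale is 1.25 with 100 px bars above and below, so the frame centre (0.5, 0.5) maps to (400.0, 400.0).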
The system is hosted on Render.com (backend) and GitHub Pages (frontend), providing globally accessible, auto-scaling infrastructure.
The frontend and backend communicate over a WebSocket carrying JSON and binary messages:
- Frontend: Next.js 16 (Redesigned). Neural Link App → Adaptive Overlay → Translation Panel.
- Backend: FastAPI (Async), hosted on Render.com. MediaPipe (Holistic) → LSTM / Heuristic Engine → Gemini 2.0 Flash Agent.
- Message 1 (frontend to backend): Camera frames (raw or resized).
- Message 2 (backend to frontend): Landmarks and confidence scores.
- Message 3 (backend to frontend): Translated audio (Base64).
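The final backend stage hands the recognized sign sequence to Gemini 2.0 Flash for SOV-to-SVO rewriting. A minimal sketch, assuming the `google-genai` Python SDK and calling it directly rather than through ADK; the prompt wording, `build_prompt` and `translate` are hypothetical and not SignBridge's actual agent:

```python
from typing import Sequence

# Hypothetical prompt wording; SignBridge's real prompt is not shown in this README.
PROMPT_TEMPLATE = (
    "The following glosses were recognized from Indian Sign Language (ISL), "
    "which typically uses Subject-Object-Verb order and may leave tense and "
    "articles implicit.\n"
    "Glosses: {glosses}\n"
    "Rewrite them as one natural English sentence in Subject-Verb-Object order, "
    "inferring tense and articles. Reply with the sentence only."
)


def build_prompt(glosses: Sequence[str]) -> str:
    """Normalize recognized sign labels and embed them in the prompt."""
    cleaned = [g.strip().upper() for g in glosses if g.strip()]
    if not cleaned:
        raise ValueError("no glosses to translate")
    return PROMPT_TEMPLATE.format(glosses=" ".join(cleaned))


def translate(glosses: Sequence[str], api_key: str, model: str = "gemini-2.0-flash") -> str:
    """Send the prompt to Gemini via the google-genai SDK and return the sentence."""
    from google import genai  # imported lazily; requires `pip install google-genai`

    client = genai.Client(api_key=api_key)
    response = client.models.generate_content(model=model, contents=build_prompt(glosses))
    return (response.text or "").strip()
```

For example, `translate(["I", "SCHOOL", "GO"], api_key=os.environ["GEMINI_API_KEY"])` asks for an SVO rendering along the lines of "I am going to school."; the exact wording depends on the model.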
- Python 3.11+
- uv (recommended for managing Python and dependencies)
- Node.js 18+
- Google Gemini API Key
- Root Configuration: Create a `.env.local` file in the root directory containing `GEMINI_API_KEY=your_key_here` and `JWT_SECRET=your_secret_here` (one per line).
- Start Backend Engine: `cd backend`, then `uv run python main.py`
- Authentication: The backend is secured via JWT.
  - Obtain a token via the `/login` endpoint (default credentials: `admin` / `password`).
  - The WebSocket connection requires this token, passed as a `token` query parameter or in the `Authorization` header during the handshake.
- Navigate to the frontend directory: `cd web-frontend`
- Install dependencies and build for production: `npm install`, then `npm run build`
- Deploy to GitHub Pages (automated via GitHub Actions).
This project uses `google-agents-cli` for systematic agent testing:
- Install ADK: `uv tool install google-agents-cli`
- Sync dependencies: `uv sync --extra eval`
- Execute evaluations: `agents-cli eval run`
- Target Hardware: Intel i5-4440 (60 FPS Goal)
- Current Performance: ~25 FPS (Headless Profiling)
- Bottleneck: Neural-network inference on the CPU.
- Optimizations: `model_complexity=0` in MediaPipe; `asyncio.to_thread` for non-blocking ML inference.
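The frame throttling described under the Neural Link feature and the `asyncio.to_thread` optimization fit together in the backend's receive path. One common way to throttle, assumed here rather than taken from SignBridge, is a single "latest frame wins" slot: stale frames are overwritten instead of queued, and inference runs in a worker thread so the event loop keeps reading the socket. `LatestFrame`, `worker` and `fake_inference` are hypothetical names, and `fake_inference` merely sleeps in place of the real MediaPipe call:

```python
import asyncio
import time
from collections.abc import Callable


class LatestFrame:
    """Single-slot buffer: a newer frame replaces one that is still waiting."""

    def __init__(self) -> None:
        self._frame: bytes | None = None
        self._ready = asyncio.Event()
        self.dropped = 0  # stale frames discarded instead of queued

    def put(self, frame: bytes) -> None:
        if self._frame is not None:
            self.dropped += 1
        self._frame = frame
        self._ready.set()

    async def get(self) -> bytes:
        await self._ready.wait()
        self._ready.clear()
        frame, self._frame = self._frame, None
        if frame is None:  # cannot happen with a single consumer
            raise RuntimeError("event set without a frame")
        return frame


def fake_inference(frame: bytes) -> int:
    """Hypothetical stand-in for blocking MediaPipe + classifier work."""
    time.sleep(0.05)  # simulate ~50 ms of CPU-bound processing
    return len(frame)


async def worker(
    buf: LatestFrame, infer: Callable[[bytes], int], results: list[int], count: int
) -> None:
    for _ in range(count):
        frame = await buf.get()
        # Run the blocking call in a thread so the event loop keeps receiving frames.
        results.append(await asyncio.to_thread(infer, frame))


async def demo() -> tuple[list[int], int]:
    buf = LatestFrame()
    results: list[int] = []
    task = asyncio.create_task(worker(buf, fake_inference, results, count=2))
    buf.put(b"a")
    await asyncio.sleep(0.01)  # worker takes frame "a" and starts inference
    for frame in (b"bb", b"ccc", b"dddd"):  # arrive while inference is busy
        buf.put(frame)
    await task
    return results, buf.dropped


if __name__ == "__main__":
    print(asyncio.run(demo()))  # ([1, 4], 2): "bb" and "ccc" were dropped
```

Dropping stale frames keeps each frame's delay bounded (roughly at most two inference times) instead of letting a backlog grow, at the cost of skipped frames.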
- Credential Protection: API keys are isolated in `.env` and strictly excluded from version control via `.gitignore`.
- Stateless Processing: Video frames are processed in memory and discarded immediately after landmark extraction.
Licensed under the Apache License 2.0.