SPARC (Indian Sign Language Recognition System)
Team: NAMO NIRVANA (Team ID: 94943)
Problem Statement ID: SIH25247
Theme: Miscellaneous | Category: Hardware
Smart India Hackathon 2025
- Harsh Yadav (Team Lead)
- Avishkar Jaiswal
- Samriddhi Ganguly
- Samyak Jain
- Harshit Singh
- Thakur Akshayakumar Raj
SPARC is a specialized Indian Sign Language (ISL) Recognition System. It is designed to interpret the complex, dynamic, and bimanual gestures unique to ISL, translating them into text/speech in real-time.
Unlike generic sign language models, SPARC focuses specifically on the temporal dynamics of ISL—where the movement is just as important as the pose.
- Over 5 million deaf individuals in India.
- Fewer than 250 certified ISL interpreters.
- The Gap: Most existing solutions focus on static alphabets (A-Z). ISL is a full language with grammar and continuous motion. SPARC solves for words and sentences.
This repository implements a multi-stage deep learning pipeline, evolving from standard LSTMs to State-of-the-Art (SOTA) architectures.
- Input normalization: every video is resampled to a fixed length of 45 frames.
- Feature Extraction: MediaPipe Holistic extracts 258 Keypoints per frame:
- Pose (132): 33 landmarks × 4 values (x, y, z, visibility) capturing body orientation & arm movement.
- Left Hand (63) + Right Hand (63): 21 landmarks × 3 coordinates each, for fine-grained finger articulation.
- Augmentation Strategy: To ensure robustness, we implement:
- Gaussian Noise Injection (Simulating sensor noise).
- Spatial Scaling (Handling different body sizes).
- Temporal Warping (Handling different signing speeds).
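The normalization and augmentation steps above can be sketched in NumPy, assuming each clip is stored as a `(num_frames, 258)` keypoint array (function names and noise/scale ranges here are illustrative, not the repo's tuned values):

```python
import numpy as np

def sample_to_fixed_length(seq: np.ndarray, target: int = 45) -> np.ndarray:
    """Uniformly resample a (frames, 258) sequence to exactly `target` frames."""
    idx = np.linspace(0, len(seq) - 1, target).round().astype(int)
    return seq[idx]

def augment(seq: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Apply the three augmentations described above."""
    # 1. Gaussian noise injection (simulates sensor/landmark jitter).
    out = seq + rng.normal(0.0, 0.01, size=seq.shape)
    # 2. Spatial scaling (handles different body sizes).
    out = out * rng.uniform(0.9, 1.1)
    # 3. Temporal warping: resample at a random speed, then back to 45 frames.
    speed = rng.uniform(0.8, 1.2)
    warped_len = max(2, int(len(out) * speed))
    idx = np.linspace(0, len(out) - 1, warped_len).round().astype(int)
    return sample_to_fixed_length(out[idx])

rng = np.random.default_rng(0)
clip = rng.normal(size=(60, 258))     # a raw 60-frame clip of keypoints
x = sample_to_fixed_length(clip)      # (45, 258) network input
x_aug = augment(x, rng)               # same shape, perturbed copy
print(x.shape, x_aug.shape)           # (45, 258) (45, 258)
```

Resampling by index keeps the augmented clip aligned with the fixed 45-frame input the models expect.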
We researched and implemented three distinct tiers of models:
- Structure: 3 stacked LSTM layers (64-128-256 units) + Dense classification head.
- Use Case: Fast, lightweight recognition for basic vocabulary.
- Current Deployment: Optimized for low-latency CPU inference.
- Improvements: Added Batch Normalization, Dropout (0.3), and L2 Regularization.
- Activation: Switched to `tanh` for stable gradient flow.
- Result: Higher accuracy on unseen test subjects.
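A minimal Keras sketch of this improved baseline, assuming 45×258 inputs and the 16-class vocabulary (layer sizes follow the 64-128-256 stack described above; the repo's exact hyperparameters may differ):

```python
import tensorflow as tf
from tensorflow.keras import layers, models, regularizers

def build_improved_lstm(num_classes: int = 16) -> tf.keras.Model:
    """3 stacked LSTMs (64-128-256) with BatchNorm, Dropout(0.3), L2, tanh."""
    model = models.Sequential([
        layers.Input(shape=(45, 258)),
        layers.LSTM(64, return_sequences=True, activation="tanh",
                    kernel_regularizer=regularizers.l2(1e-4)),
        layers.BatchNormalization(),
        layers.Dropout(0.3),
        layers.LSTM(128, return_sequences=True, activation="tanh",
                    kernel_regularizer=regularizers.l2(1e-4)),
        layers.BatchNormalization(),
        layers.Dropout(0.3),
        layers.LSTM(256, activation="tanh",
                    kernel_regularizer=regularizers.l2(1e-4)),
        layers.Dense(64, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_improved_lstm()
print(model.output_shape)  # (None, 16)
```

Dropout and L2 together are what push generalization to unseen signers; the final softmax maps to the 16-word vocabulary.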
- SOTA Architecture: A hybrid Spatial-Temporal design.
- Stream 1 (Spatial): LSTM with Self-Attention mechanisms to focus on hand-face interaction.
- Stream 2 (Temporal): Temporal Convolutional Networks (TCN) to capture fast motion dynamics.
- Fusion: Attention-based fusion layer combines both streams for the final prediction.
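The attention-based fusion can be illustrated in NumPy: given one embedding per stream, a scoring vector weights the spatial and temporal features before the classifier sees them (dimensions and names here are illustrative, and the scoring vector would be learned during training):

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_fuse(spatial: np.ndarray, temporal: np.ndarray,
                   score_w: np.ndarray) -> np.ndarray:
    """Weight the two stream embeddings by attention scores and combine."""
    streams = np.stack([spatial, temporal])   # (2, d)
    scores = streams @ score_w                # one scalar score per stream
    weights = softmax(scores)                 # (2,), sums to 1
    return weights @ streams                  # (d,) fused embedding

rng = np.random.default_rng(1)
d = 256
spatial_feat = rng.normal(size=d)   # e.g. from the LSTM + self-attention stream
temporal_feat = rng.normal(size=d)  # e.g. from the TCN stream
score_w = rng.normal(size=d)        # hypothetical learned scoring vector
fused = attention_fuse(spatial_feat, temporal_feat, score_w)
print(fused.shape)  # (256,)
```

Because the weights are input-dependent, the model can lean on the spatial stream for held poses and the temporal stream for fast motion.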
Alongside the AI model, we engineered a Rule-Based Heuristic Engine (realtime-detection.py) that gives instant feedback on static cultural signs using geometric rules:
- Namaste: Calculates wrist-to-wrist distance and palm symmetry.
- I am Indian: Triangulates Hand-Eyebrow-Shoulder positions.
- Water/Doctor/Home: Custom geometric signatures.
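As a sketch of this geometric approach, a "Namaste" check over normalized MediaPipe coordinates might look like the following (the thresholds and landmark arguments are illustrative, not the repo's tuned values):

```python
import numpy as np

def is_namaste(left_wrist, right_wrist, left_palm, right_palm,
               dist_thresh: float = 0.08, sym_thresh: float = 0.05) -> bool:
    """Heuristic: wrists close together and palms vertically symmetric."""
    lw, rw = np.asarray(left_wrist), np.asarray(right_wrist)
    lp, rp = np.asarray(left_palm), np.asarray(right_palm)
    wrist_dist = np.linalg.norm(lw - rw)   # wrist-to-wrist distance
    palm_offset = abs(lp[1] - rp[1])       # vertical palm asymmetry
    return bool(wrist_dist < dist_thresh and palm_offset < sym_thresh)

# Hands pressed together at chest height -> detected
print(is_namaste([0.48, 0.55], [0.52, 0.55], [0.49, 0.45], [0.51, 0.45]))  # True
# Hands far apart -> not detected
print(is_namaste([0.30, 0.55], [0.70, 0.55], [0.30, 0.45], [0.70, 0.45]))  # False
```

Each sign in the engine reduces to a handful of such distance and symmetry tests, which is why it runs instantly without a neural network.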
- Dataset: INCLUDE 50 + Custom NAMO NIRVANA Dataset.
- Vocabulary: 16 Classes (Hello, Thank you, Please, Good Morning, etc.).
- Training Scale: 1000+ Videos with 5x Augmentation.
- Accuracy:
- Validation: 74.6%
- Real-Time Test: 84.0%
- Clone the repository
- Install Dependencies:
pip install -r requirements.txt
This uses the LSTM network to recognize dynamic words ("Hello", "How are you"):
python deploy-code.py

For checking specific static signs (Namaste, Indian, etc.):
python realtime-detection.py
(Or use RUN-REALTIME-DEMO.bat on Windows)
If you want to add new words to the ISL dictionary:
# Prepare data in 'training-data/' folder
python train-improved-model.py --epochs 100 --augment 5

Developed by Team NAMO NIRVANA for Smart India Hackathon 2025.