Note
Active Repository Notice: The primary development, active issues, and latest updates for this project are maintained at the official organization repository: PxA-Labs/IndiVoice-DeepASR. Please direct all issues, feature requests, and contributions there.
Important
Resilience Update (v1.8): Added high-frequency checkpointing (every 100 steps) and automatic resumption to protect training progress against Colab or Kaggle runtime disconnections. Also resolved compatibility issues with `load_best_model_at_end`.
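The README does not show how resumption is implemented. Because `load_best_model_at_end` is a Hugging Face `TrainingArguments` option, the project presumably uses the `Trainer`, which saves checkpoints as `checkpoint-<step>` subdirectories of the output directory. The following minimal sketch works under that assumption. It uses a hypothetical `find_latest_checkpoint` helper to locate the newest checkpoint (transformers ships a comparable utility, `transformers.trainer_utils.get_last_checkpoint`):

```python
import os
import re
from typing import Optional

# Matches Trainer-style directory names such as "checkpoint-100" or "checkpoint-2300".
_CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def find_latest_checkpoint(output_dir: str) -> Optional[str]:
    """Return the path of the highest-step checkpoint in output_dir, or None.

    Hypothetical helper: assumes checkpoints are "checkpoint-<step>" folders.
    """
    if not os.path.isdir(output_dir):
        return None
    best_step, best_path = -1, None
    for name in os.listdir(output_dir):
        match = _CHECKPOINT_RE.match(name)
        path = os.path.join(output_dir, name)
        # Only directories count; a stray file with a matching name is skipped.
        if match and os.path.isdir(path):
            step = int(match.group(1))
            # Compare numerically so checkpoint-1000 outranks checkpoint-900.
            if step > best_step:
                best_step, best_path = step, path
    return best_path
```

Passing the returned path to `trainer.train(resume_from_checkpoint=...)` restores the saved model, optimizer, and scheduler state. With a save every 100 steps, a disconnect loses fewer than 100 steps of work. The step numbers must be compared as integers, because a plain string sort would rank `checkpoint-900` above `checkpoint-1000`.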
Current commercial automatic speech recognition (ASR) systems show a 20-30% performance degradation on Indian English accents. IndiVoice-DeepASR is a research-driven project that fine-tunes OpenAI's Whisper models with Low-Rank Adaptation (LoRA), aiming for state-of-the-art accuracy across diverse Indian linguistic profiles.
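LoRA keeps the pretrained Whisper weights frozen and trains only a pair of low-rank factors per adapted matrix. For a frozen d×k weight, a rank-r adapter adds B (d×r) and A (r×k), which comes to r(d+k) trainable values. This arithmetic sketch shows why the trainable share stays small. The adapted layers, the rank of 32, and the 244M total (the published size of Whisper small) are illustrative assumptions, not this project's configuration:

```python
from typing import Iterable, Tuple


def lora_trainable_params(shapes: Iterable[Tuple[int, int]], rank: int) -> int:
    """Count LoRA parameters: each frozen (d, k) weight gets B (d x r) and A (r x k)."""
    return sum(rank * (d + k) for d, k in shapes)


def trainable_fraction(total_base_params: int, lora_params: int) -> float:
    """Share of all parameters (frozen backbone plus adapters) that are trained."""
    return lora_params / (total_base_params + lora_params)


if __name__ == "__main__":
    # Hypothetical setup: rank-32 adapters on q_proj and v_proj (768 x 768)
    # in 12 encoder self-attention, 12 decoder self-attention, and
    # 12 decoder cross-attention blocks, i.e. 36 blocks x 2 projections.
    shapes = [(768, 768)] * (36 * 2)
    added = lora_trainable_params(shapes, rank=32)
    print(added, f"{trainable_fraction(244_000_000, added):.2%}")
```

For this assumed setup the script prints `3538944 1.43%`. The count grows linearly with the rank and the number of adapted matrices, so both can be raised considerably before the share approaches 2%.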
- Fault-Tolerant Training: Automatic checkpoint detection and seamless resumption to safeguard progress during remote GPU training disconnections.
- Parameter Efficiency: Trains fewer than 2% of the model's total parameters using Parameter-Efficient Fine-Tuning (PEFT) techniques.
- Accent Localization: Optimized to handle Hindi, Tamil, Kannada, Bengali, and Punjabi regional accents.
- Robust Audio Pipeline: Multi-layered `AudioDecoder` logic for stable preprocessing across diverse computing environments.
- Enhanced Accuracy: Achieves significant Word Error Rate (WER) reductions compared to the base Whisper model.
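The pipeline standardizes input audio to 16 kHz mono before it reaches Whisper, whose feature extractor expects 16 kHz input. The two standardization steps can be illustrated with this pure-Python sketch, assuming audio is already decoded into per-channel lists of float samples. The helpers are hypothetical and use plain linear interpolation without an anti-aliasing filter. Production code would typically use a dedicated resampler such as `torchaudio.functional.resample` or `librosa.resample`:

```python
from typing import List, Sequence

TARGET_RATE = 16_000  # Whisper's feature extractor expects 16 kHz audio


def to_mono(channels: Sequence[Sequence[float]]) -> List[float]:
    """Downmix by averaging equal-length channels sample by sample."""
    n = len(channels)
    return [sum(frame) / n for frame in zip(*channels)]


def resample_linear(samples: Sequence[float], src_rate: int,
                    dst_rate: int = TARGET_RATE) -> List[float]:
    """Resample by linear interpolation (illustrative only: no anti-aliasing)."""
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    if src_rate == dst_rate or not samples:
        return list(samples)
    out_len = len(samples) * dst_rate // src_rate
    out = []
    for i in range(out_len):
        pos = i * src_rate / dst_rate        # position in the source signal
        left = int(pos)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        # Blend the two neighbouring source samples.
        out.append(samples[left] * (1 - frac) + samples[right] * frac)
    return out
```

Downmixing comes first so the resampler processes one channel instead of several.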
```mermaid
graph LR
    A[Raw Audio] --> B(Standardization: 16kHz Mono)
    B --> C{IndiVoice Engine}
    C --> D[Whisper Backbone]
    C --> E[LoRA Adapters]
    D & E --> F[Optimized Transcripts]
    F --> G[Metric Analysis: WER/CER]
```
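The final stage scores transcripts with word error rate (WER) and character error rate (CER). Both are the Levenshtein edit distance between reference and hypothesis (substitutions + deletions + insertions), divided by the reference length in words or characters. This self-contained sketch applies no text normalization and counts spaces as characters for CER. Libraries such as `jiwer` or the Hugging Face `evaluate` "wer" metric provide maintained implementations, and the project's own scoring code may normalize text differently:

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance using a rolling row of the DP table."""
    prev = list(range(len(hyp) + 1))          # distance from empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,                  # delete r
                curr[j - 1] + 1,              # insert h
                prev[j - 1] + (r != h),       # substitute (or match for free)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate over whitespace-separated tokens."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate; spaces count as characters."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

WER can exceed 1.0 when the hypothesis contains many insertions. For example, `wer("a b", "c d e")` is 1.5.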
Choose your preferred platform for free GPU access:
- Colab Gateway: Best for initial setup and rapid experimentation.
- Kaggle Runner: Optimized for long-running training. Includes `setup_kaggle.sh` for environment configuration.
```bash
# Clone the repository
git clone https://github.com/PxA-Labs/IndiVoice-DeepASR.git
cd IndiVoice-DeepASR

# Install dependencies
pip install -r requirements.txt

# Preprocess the Svarah dataset
python src/preprocess.py --hf_dataset ai4bharat/Svarah --output_dir data/processed

# Train the model (auto-resumes from the latest checkpoint if present)
python src/train.py --output_dir models/indian-accent-lora
```

```text
IndiVoice-DeepASR/
├── assets/       # Branding and visual elements
├── kaggle/       # Kaggle training utilities and scripts
├── src/          # Core codebase for training, preprocessing, and deployment
├── notebooks/    # Jupyter Notebooks for experimentation
├── data/         # Dataset symlinks and manifests
├── models/       # Checkpoints and serialized weights
└── paper/        # Source files for academic publication
```
If you use this work in your research, please cite:
```bibtex
@misc{indivoice2026,
  author       = {Purvansh Joshi and Archit Mittal},
  title        = {IndiVoice-DeepASR: Efficient Adaptation of Multilingual Speech Models for Indian Accents},
  year         = {2026},
  publisher    = {GitHub},
  howpublished = {\url{https://github.com/PxA-Labs/IndiVoice-DeepASR}}
}
```