purvanshjoshi/IndiVoice-DeepASR

IndiVoice-DeepASR: Indian-Accented Speech Recognition

Bridging the Accent Gap in Modern ASR with Whisper + LoRA

Note

Active Repository Notice: The primary development, active issues, and latest updates for this project are maintained at the official organization repository: PxA-Labs/IndiVoice-DeepASR. Please direct all issues, feature requests, and contributions there.

Important

Resilience Update (v1.8): Implemented high-frequency checkpointing (every 100 steps) and auto-resumption to protect training progress against Colab or Kaggle runtime disconnections. Resolved load_best_model_at_end compatibility issues.
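
As a rough illustration of the auto-resumption idea in this notice, the sketch below picks the newest checkpoint in an output directory. It assumes checkpoints are saved as `checkpoint-<step>` subdirectories (the default naming of the Hugging Face `Trainer`); the function name `find_latest_checkpoint` is hypothetical and not taken from this repository.

```python
import re
from pathlib import Path
from typing import Optional

# Assumption: checkpoints are directories named "checkpoint-<global_step>".
_CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def find_latest_checkpoint(output_dir: str) -> Optional[str]:
    """Return the path of the checkpoint with the highest step, or None."""
    root = Path(output_dir)
    if not root.is_dir():
        return None  # nothing saved yet, so training starts from scratch
    best_step, best_path = -1, None
    for child in root.iterdir():
        match = _CHECKPOINT_RE.match(child.name)
        if child.is_dir() and match:
            step = int(match.group(1))  # compare numerically, not as strings
            if step > best_step:
                best_step, best_path = step, child
    return str(best_path) if best_path is not None else None
```

With the Hugging Face `Trainer`, the returned path could be passed as `trainer.train(resume_from_checkpoint=path)`. The library's own `transformers.trainer_utils.get_last_checkpoint` performs a similar lookup.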


Overview

Current commercial automatic speech recognition (ASR) systems show a 20-30% drop in recognition performance on Indian-accented English. IndiVoice-DeepASR is a research-driven project that fine-tunes OpenAI's Whisper models with Low-Rank Adaptation (LoRA), targeting state-of-the-art accuracy across diverse Indian linguistic profiles.

Key Features

  • Fault-Tolerant Training: Automatic checkpoint detection and seamless resumption protect progress when a remote GPU session disconnects mid-training.
  • Parameter Efficiency: Fine-tune with less than 2% of total parameters using Parameter-Efficient Fine-Tuning (PEFT) techniques.
  • Accent Localization: Optimized to handle Hindi, Tamil, Kannada, Bengali, and Punjabi regional accents.
  • Robust Audio Pipeline: Multi-layered AudioDecoder logic for stable preprocessing across diverse computing environments.
  • Enhanced Accuracy: Achieves significant Word Error Rate (WER) reductions relative to the base Whisper model.
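
To make the parameter-efficiency point concrete: a LoRA adapter of rank r on a weight matrix of shape d_out x d_in adds r * (d_in + d_out) trainable parameters while the base weights stay frozen. The sketch below computes the resulting trainable share. All numbers in it (72 adapted 768x768 matrices, rank 32, about 244M base parameters) are illustrative assumptions, not this project's configuration.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added by one LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


def trainable_fraction(base_params: int, adapted_shapes, rank: int) -> float:
    """Share of trainable (LoRA) parameters among all parameters, base frozen."""
    added = sum(lora_param_count(d_in, d_out, rank) for d_in, d_out in adapted_shapes)
    return added / (base_params + added)


# Illustrative values only (assumptions, not taken from the repository).
shapes = [(768, 768)] * 72
fraction = trainable_fraction(244_000_000, shapes, rank=32)
print(f"trainable share: {fraction:.2%}")
```

Under these assumed values the trainable share stays below 2%, which is the kind of budget the feature list describes.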

Technical Stack and Architecture

  • Model Backbone: HF
  • Optimization: PEFT
  • Audio Engine: Audio
  • Cloud Compute: Kaggle, Colab
  • Deployment: Gradio
  • Infrastructure: CUDA

Pipeline Flow

graph LR
    A[Raw Audio] --> B(Standardization: 16kHz Mono)
    B --> C{IndiVoice Engine}
    C --> D[Whisper Backbone]
    C --> E[LoRA Adapters]
    D & E --> F[Optimized Transcripts]
    F --> G[Metric Analysis: WER/CER]
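
The final stage of the flow above scores transcripts with WER and CER. Below is a minimal sketch of both metrics, assuming plain whitespace tokenization and no text normalization. The repository's actual scoring code is not shown here, and the function names are illustrative.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance (substitutions, insertions, deletions) between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edits divided by the reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character-level edits divided by the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

For example, `wer("the cat sat on the mat", "the cat sit on mat")` counts one substitution and one deletion against six reference words. Real evaluations usually lowercase and strip punctuation before scoring, which this sketch leaves out.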

Getting Started

1. Cloud-Based Training (Recommended)

Choose your preferred platform for free GPU access:

  • Colab Gateway: Best for initial setup and rapid experimentation.
  • Kaggle Runner: Optimized for long-running training. Includes setup_kaggle.sh for environment configuration.

2. Local Setup and Execution

# Clone the repository
git clone https://github.com/PxA-Labs/IndiVoice-DeepASR.git
cd IndiVoice-DeepASR

# Install dependencies
pip install -r requirements.txt

# Preprocess the Svarah dataset
python src/preprocess.py --hf_dataset ai4bharat/Svarah --output_dir data/processed

# Train the model (Auto-resumes from latest checkpoint if present)
python src/train.py --output_dir models/indian-accent-lora
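
The pipeline standardizes audio to 16 kHz mono, the input format Whisper expects. The sketch below shows what that standardization means on raw sample lists. It is not the contents of `src/preprocess.py`: the function names are hypothetical, and linear interpolation is used only to keep the example dependency-free (it applies no anti-aliasing filter).

```python
from typing import List, Sequence

TARGET_RATE = 16_000  # Hz, the rate named in the pipeline diagram


def to_mono(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average the channels of each frame into a single mono sample."""
    return [sum(frame) / len(frame) for frame in frames]


def resample_linear(samples: Sequence[float], src_rate: int,
                    dst_rate: int = TARGET_RATE) -> List[float]:
    """Resample by linear interpolation (illustration only, no low-pass filtering)."""
    if src_rate == dst_rate or len(samples) < 2:
        return list(samples)
    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    step = src_rate / dst_rate  # source samples advanced per output sample
    out = []
    for i in range(n_out):
        pos = i * step
        lo = min(int(pos), len(samples) - 1)
        hi = min(lo + 1, len(samples) - 1)  # clamp at the last sample
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out
```

In practice a library resampler such as `librosa.resample` or `torchaudio.functional.resample` is the better choice, since both filter the signal before changing its rate.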

Repository Structure

IndiVoice-DeepASR/
├── assets/            # Branding and visual elements
├── kaggle/            # Kaggle training utilities and scripts
├── src/               # Core codebase for training, preprocessing, and deployment
├── notebooks/         # Jupyter Notebooks for experimentation
├── data/              # Dataset symlinks and manifests
├── models/            # Checkpoints and serialized weights
└── paper/             # Source files for academic publication

Academic Citation

If you use this work in your research, please cite:

@misc{indivoice2026,
  author = {Purvansh Joshi and Archit Mittal},
  title = {IndiVoice-DeepASR: Efficient Adaptation of Multilingual Speech Models for Indian Accents},
  year = {2026},
  publisher = {GitHub},
  howpublished = {\url{https://github.com/PxA-Labs/IndiVoice-DeepASR}}
}

Built for the Indian Speech Recognition Research Community
