IndiVoice-DeepASR: Indian-Accented Speech Recognition

Bridging the Accent Gap in Modern ASR with Whisper + LoRA

Note

Active Repository Notice: The primary development, active issues, and latest updates for this project are maintained at the official organization repository: PxA-Labs/IndiVoice-DeepASR. Please direct all issues, feature requests, and contributions there.

Explore the Code • Launch Colab • Launch Kaggle

Important

Resilience Update (v1.8): Implemented high-frequency checkpointing (every 100 steps) and auto-resumption to protect training progress against Colab or Kaggle runtime disconnections. Resolved load_best_model_at_end compatibility issues.

Overview

Commercial automatic speech recognition (ASR) systems show a 20-30% drop in performance on Indian-accented English. IndiVoice-DeepASR is a research project that fine-tunes OpenAI's Whisper models with Low-Rank Adaptation (LoRA) to close that gap, targeting state-of-the-art accuracy across diverse Indian accents.
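
To make the LoRA idea concrete, here is a minimal sketch of the parameter arithmetic for a single linear layer. LoRA keeps the original weight matrix frozen and trains two small matrices whose product forms a low-rank update. The layer size (1024 x 1024) and rank (8) below are illustrative assumptions, not this project's configuration.

```python
def lora_param_counts(d_in: int, d_out: int, rank: int) -> tuple:
    """Return (frozen, trainable) parameter counts for one LoRA-wrapped linear layer."""
    frozen = d_in * d_out              # original weight W (d_out x d_in), kept frozen
    trainable = rank * (d_in + d_out)  # A is (rank x d_in), B is (d_out x rank)
    return frozen, trainable


# Hypothetical layer size and rank, chosen only for illustration.
frozen, trainable = lora_param_counts(d_in=1024, d_out=1024, rank=8)
fraction = trainable / (frozen + trainable)
print(f"frozen={frozen}, trainable={trainable}, trainable fraction={fraction:.2%}")
# prints: frozen=1048576, trainable=16384, trainable fraction=1.54%
```

Across a full model the trainable share is usually smaller still, because adapters are typically attached to only a subset of layers.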

Key Features

Fault-Tolerant Training: Automatic checkpoint detection and seamless resumption to safeguard progress when a remote GPU session disconnects.
Parameter Efficiency: Fine-tune with less than 2% of total parameters using Parameter-Efficient Fine-Tuning (PEFT) techniques.
Accent Localization: Optimized to handle Hindi, Tamil, Kannada, Bengali, and Punjabi regional accents.
Robust Audio Pipeline: Multi-layered AudioDecoder logic for stable preprocessing across diverse computing environments.
Enhanced Accuracy: Achieves significant Word Error Rate (WER) reductions compared to the base Whisper model.
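
For readers unfamiliar with the metric, WER is the word-level edit distance between a reference transcript and the model's hypothesis, divided by the number of reference words: (substitutions + deletions + insertions) / N. The sketch below is a plain-Python illustration under that definition. It applies no text normalization (casing, punctuation), and it is not necessarily how this project computes its scores. The example sentences are made up.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1] / len(ref)


# One deleted word ("a") and one substitution ("chennai" -> "channel"): 2 errors / 7 words.
wer = word_error_rate(
    "please book a train ticket to chennai",
    "please book train ticket to channel",
)
print(f"WER = {wer:.3f}")  # prints: WER = 0.286
```

Character Error Rate (CER) follows the same recurrence with characters in place of words.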

Technical Stack and Architecture

Model Backbone	Optimization	Audio Engine
Cloud Compute	Deployment	Infrastructure

Pipeline Flow

graph LR
    A[Raw Audio] --> B(Standardization: 16kHz Mono)
    B --> C{IndiVoice Engine}
    C --> D[Whisper Backbone]
    C --> E[LoRA Adapters]
    D & E --> F[Optimized Transcripts]
    F --> G[Metric Analysis: WER/CER]
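
The standardization step at the start of this flow (16 kHz mono) can be sketched in plain Python as below. This is an illustration only: the function names are hypothetical, and the linear interpolation used here has no anti-aliasing filter. A real pipeline would normally rely on an audio library's resampler, and the project's own preprocessing may differ.

```python
def to_mono(frames):
    """Average the channels of each frame: [(left, right), ...] -> [sample, ...]."""
    return [sum(frame) / len(frame) for frame in frames]


def resample_linear(samples, src_rate, dst_rate=16_000):
    """Resample by linear interpolation (no anti-aliasing; illustration only)."""
    if src_rate == dst_rate or len(samples) < 2:
        return list(samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    step = src_rate / dst_rate  # source samples advanced per output sample
    last = len(samples) - 1
    out = []
    for k in range(n_out):
        pos = k * step
        i = min(int(pos), last)       # index of the sample at or before pos
        frac = pos - i                # fractional distance to the next sample
        nxt = samples[min(i + 1, last)]
        out.append(samples[i] * (1 - frac) + nxt * frac)
    return out


# Six stereo frames at 48 kHz become two mono samples at 16 kHz.
stereo_48k = [(0, 0), (2, 4), (4, 8), (6, 12), (4, 8), (2, 4)]
mono_48k = to_mono(stereo_48k)
mono_16k = resample_linear(mono_48k, src_rate=48_000)
print(mono_48k)  # prints: [0.0, 3.0, 6.0, 9.0, 6.0, 3.0]
print(mono_16k)  # prints: [0.0, 9.0]
```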

Getting Started

1. Cloud-Based Training (Recommended)

Choose your preferred platform for free GPU access:

Colab Gateway: Best for initial setup and rapid experimentation.
Kaggle Runner: Optimized for long-running training. Includes setup_kaggle.sh for environment configuration.

2. Local Setup and Execution

# Clone the repository
git clone https://github.com/PxA-Labs/IndiVoice-DeepASR.git
cd IndiVoice-DeepASR

# Install dependencies
pip install -r requirements.txt

# Preprocess the Svarah dataset
python src/preprocess.py --hf_dataset ai4bharat/Svarah --output_dir data/processed

# Train the model (Auto-resumes from latest checkpoint if present)
python src/train.py --output_dir models/indian-accent-lora
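
The auto-resume behaviour mentioned in the training comment can be pictured with the sketch below. It assumes checkpoints are saved as `checkpoint-<step>` directories inside the output directory, which is the Hugging Face `Trainer` convention. Whether `src/train.py` uses exactly this logic is an assumption, and `find_latest_checkpoint` is a hypothetical helper name.

```python
import re
from pathlib import Path
from typing import Optional

# Matches directory names such as "checkpoint-100" and captures the step number.
_CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def find_latest_checkpoint(output_dir: str) -> Optional[Path]:
    """Return the checkpoint directory with the highest step, or None if there is none."""
    root = Path(output_dir)
    if not root.is_dir():
        return None
    best_step, best_path = -1, None
    for child in root.iterdir():
        match = _CHECKPOINT_RE.match(child.name)
        if child.is_dir() and match:
            step = int(match.group(1))  # compare numerically, not as strings
            if step > best_step:
                best_step, best_path = step, child
    return best_path


latest = find_latest_checkpoint("models/indian-accent-lora")
if latest is None:
    print("No checkpoint found; training would start from scratch.")
else:
    print(f"Training would resume from {latest}.")
```

Comparing step numbers as integers matters here: a plain string sort would rank `checkpoint-200` above `checkpoint-1000`. With the Hugging Face `Trainer`, the returned path could then be passed as `trainer.train(resume_from_checkpoint=...)`.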

Repository Structure

IndiVoice-DeepASR/
├── assets/            # Branding and visual elements
├── kaggle/            # Kaggle training utilities and scripts
├── src/               # Core codebase for training, preprocessing, and deployment
├── notebooks/         # Jupyter Notebooks for experimentation
├── data/              # Dataset symlinks and manifests
├── models/            # Checkpoints and serialized weights
└── paper/             # Source files for academic publication

Academic Citation

If you use this work in your research, please cite:

@misc{indivoice2026,
  author = {Purvansh Joshi and Archit Mittal},
  title = {IndiVoice-DeepASR: Efficient Adaptation of Multilingual Speech Models for Indian Accents},
  year = {2026},
  publisher = {GitHub},
  howpublished = {\url{https://github.com/PxA-Labs/IndiVoice-DeepASR}}
}

Built for the Indian Speech Recognition Research Community
