Welcome to the Briskk AI Speech-to-Text Assignment! 🎤
This challenge will test your AI integration, API development, and problem-solving skills through a structured sequence of tasks.
🚀 Your Goal: Build a real-time, noise-resilient voice-based search assistant that:
✅ Converts voice input (audio file or live mic input) into text.
✅ Suggests smart search autocompletions based on user intent.
✅ Handles noisy background audio and improves speech accuracy.
✅ Supports real-time speech-to-search via WebSockets.
To ensure a smooth progression, complete each task in sequence:
✅ Task 1: Implement a FastAPI service that:
- Accepts an audio file and converts speech to text using OpenAI Whisper or Mozilla DeepSpeech.
- Returns JSON output: { "text": "<transcribed text>" }
- Test Input: sample_data/clean_audio/sample_english.wav
- Expected Output: "Find me a red dress"
📌 API:
POST /api/voice-to-text
Content-Type: multipart/form-data
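A minimal sketch of this endpoint, assuming openai-whisper is installed; the `base` model size and the temp-file handling are illustrative choices, not part of the spec:

```python
# Minimal sketch of /api/voice-to-text using openai-whisper.
# Model size ("base") and temp-file handling are illustrative choices.
import tempfile

import whisper
from fastapi import FastAPI, File, UploadFile

app = FastAPI()
model = whisper.load_model("base")  # load once at startup, not per request

@app.post("/api/voice-to-text")
async def voice_to_text(file: UploadFile = File(...)):
    # Whisper's transcribe() takes a file path, so spool the upload to disk.
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
        tmp.write(await file.read())
        tmp_path = tmp.name
    result = model.transcribe(tmp_path)
    return {"text": result["text"].strip()}
```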
✅ Task 2: Enhance speech recognition by:
- Filtering background noise using RNNoise, DeepFilterNet, or PyDub.
- Comparing accuracy with and without noise removal.
- Test Input: sample_data/noisy_audio/sample_noisy.wav
- Expected Output (after denoising): "Find me a red dress"
📌 Evaluation Criteria:
✔ Speech accuracy before vs after noise removal.
✔ Processing time must remain <1s.
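A crude baseline for this step, using a PyDub high-pass filter (the 200 Hz cutoff is an assumption; RNNoise or DeepFilterNet will do far better on broadband noise):

```python
# Crude denoising baseline with PyDub: a high-pass filter that cuts
# low-frequency hum and rumble. RNNoise or DeepFilterNet are the stronger
# options this task mentions; the 200 Hz cutoff is an arbitrary choice.
import time

from pydub import AudioSegment

def denoise(in_path: str, out_path: str) -> str:
    start = time.perf_counter()
    audio = AudioSegment.from_wav(in_path)
    filtered = audio.high_pass_filter(200)  # attenuate below ~200 Hz
    filtered.export(out_path, format="wav")
    # Keep an eye on the <1s processing budget from the evaluation criteria.
    print(f"denoise took {time.perf_counter() - start:.3f}s")
    return out_path
```

Transcribe both the raw and the filtered file with the Task 1 endpoint and compare the outputs to quantify the accuracy gain.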
✅ Task 3: Implement an autocomplete API that:
- Suggests relevant results based on user intent & previous searches.
- Ranks results dynamically based on popularity & trends.
- Test Input: "find me"
- Expected Output: [ "find me a red dress", "find me a jacket" ]
📌 API:
GET /api/autocomplete?q=find+me

📌 How to Improve?
- Store previous searches in Redis for ranking.
- Use AI embeddings (OpenAI or BERT) for better matching.
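A hypothetical sketch of the Redis-backed ranking: each completed search bumps a sorted-set score, and the endpoint returns prefix matches ordered by popularity. The key name and the brute-force scan are assumptions; a production system would use a proper prefix index or embedding search instead:

```python
# Hypothetical sketch: prefix autocomplete backed by a Redis sorted set.
import redis
from fastapi import FastAPI

app = FastAPI()
r = redis.Redis(decode_responses=True)
POPULARITY_KEY = "search:popularity"  # hypothetical key name

def record_search(query: str) -> None:
    # Bump the phrase's score each time a search completes.
    r.zincrby(POPULARITY_KEY, 1, query.lower())

@app.get("/api/autocomplete")
def autocomplete(q: str, limit: int = 5) -> list[str]:
    prefix = q.lower()
    results = []
    # Walk phrases from most to least popular, keep prefix matches.
    for phrase in r.zrevrange(POPULARITY_KEY, 0, -1):
        if phrase.startswith(prefix):
            results.append(phrase)
            if len(results) >= limit:
                break
    return results
```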
✅ Task 4: Upgrade the system to process live speech queries via WebSockets:
- Accept real-time audio streams.
- Continuously transcribe & autocomplete results dynamically.
- Test: Use a live microphone input.
📌 WebSocket API:
/ws/speech-to-search

✔ Bonus: Deploy the system using Docker & AWS Lambda.
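A minimal sketch of the streaming loop, assuming the client sends raw 16 kHz mono PCM chunks; `transcribe_chunk` and `autocomplete` are hypothetical stand-ins for the Task 1 and Task 3 logic, and the one-second buffer size is illustrative:

```python
# Sketch of /ws/speech-to-search. Assumes binary audio frames from the
# client; transcribe_chunk() and autocomplete() are hypothetical wrappers
# around the Task 1 and Task 3 logic.
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()
CHUNK_BYTES = 32_000  # ~1s of 16 kHz mono 16-bit PCM (illustrative)

@app.websocket("/ws/speech-to-search")
async def speech_to_search(ws: WebSocket):
    await ws.accept()
    buffer = b""
    try:
        while True:
            buffer += await ws.receive_bytes()
            if len(buffer) >= CHUNK_BYTES:
                text = transcribe_chunk(buffer)  # hypothetical Task 1 wrapper
                await ws.send_json({
                    "text": text,
                    "suggestions": autocomplete(text),  # Task 3 ranking
                })
                buffer = b""
    except WebSocketDisconnect:
        pass
```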
📌 Test Cases:

| Test Case | Input | Expected Output |
|---|---|---|
| Speech Recognition | sample_data/clean_audio/sample_english.wav | "Find me a red dress" |
| Noisy Speech | sample_data/noisy_audio/sample_noisy.wav | "Find me a red dress" |
| Autocomplete Query | "find me" | ["find me a red dress", "find me a jacket"] |
| Live Streaming | Microphone | Real-time suggestions |
📂 All sample audio files are provided in sample_data/.
📌 Setup & Run:

```bash
pip install fastapi uvicorn openai-whisper soundfile numpy scipy
uvicorn src.main:app --reload
```

- Open Swagger Docs → http://127.0.0.1:8000/docs
- Upload sample_data/clean_audio/sample_english.wav and check transcription accuracy.
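A quick way to smoke-test Task 1 without Swagger, assuming `pip install requests` and the server running locally:

```python
# Smoke test for /api/voice-to-text against a locally running server.
import requests

with open("sample_data/clean_audio/sample_english.wav", "rb") as f:
    resp = requests.post(
        "http://127.0.0.1:8000/api/voice-to-text",
        files={"file": ("sample_english.wav", f, "audio/wav")},
    )
print(resp.json())  # expected: {"text": "Find me a red dress"}
```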
📌 Fork this repo & create a new branch candidate-<yourname>.
📌 Push your implementation & submit a Pull Request (PR).
📌 Explain your approach in a README, documenting trade-offs (e.g., Whisper vs. DeepSpeech, Redis vs. Pinecone for ranking).
📌 **Good to have**: a deployed version.
📢 Have questions? Drop an email at wizard@briskk.one 🚀