Add Smallest AI Pulse STT (English) #93
Open
harshitajain165 wants to merge 6 commits into
Implements the SBI inference wrapper for Smallest AI's Pulse STT API, targeting English benchmarks (LibriSpeech, TEDLium3, GigaSpeech, etc.).
Replaced ubuntu:20.04 base with continuumio/miniconda3 to install pynini via conda-forge, avoiding OpenFST build dependency issues.
Confirmed via live API call that the transcript is returned under the 'transcription' key, not 'text'.
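As a minimal sketch of what that field-name finding implies for response parsing: the helper below reads the transcript from the 'transcription' key and fails loudly if it is missing. The function name `extract_transcript` and the error handling are illustrative assumptions, not code from this PR.

```python
import json


def extract_transcript(response_body: str) -> str:
    """Return the transcript from a JSON response body (hypothetical helper)."""
    payload = json.loads(response_body)
    # Per the commit note, the transcript lives under 'transcription', not 'text'.
    if "transcription" not in payload:
        raise KeyError(f"no 'transcription' field; got keys {sorted(payload)}")
    return payload["transcription"]
```

Raising on a missing key, rather than silently returning an empty string, keeps a renamed field from being scored as an empty hypothesis.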
- Add 200ms QPS interval between requests to avoid rate limiting on large test sets (LibriSpeech ~2600 clips)
- Increase MAX_RETRIES from 3 to 5
- Add [PROGRESS] n/total logging for monitoring long runs
- Remove stale comment about response field name
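The throttle, retry, and progress changes listed in this commit could be combined roughly as follows. This is a sketch under stated assumptions: `transcribe_all`, the injected `transcribe_one` callable, the linear backoff between retries, and raising after the last failed attempt are all hypothetical; only MAX_RETRIES = 5, the 200 ms interval, and the [PROGRESS] n/total format come from the commit message.

```python
import time

MAX_RETRIES = 5        # raised from 3 to 5 in this commit
MIN_INTERVAL_S = 0.2   # 200 ms pause between requests


def transcribe_all(clips, transcribe_one, sleep=time.sleep, log=print):
    """Transcribe clips one by one with retries, throttling, and progress logs."""
    clips = list(clips)
    results = []
    for n, clip in enumerate(clips, start=1):
        last_error = None
        for attempt in range(1, MAX_RETRIES + 1):
            try:
                results.append(transcribe_one(clip))
                break
            except Exception as exc:
                last_error = exc
                # Assumption: linear backoff; the PR does not state the policy.
                sleep(MIN_INTERVAL_S * attempt)
        else:
            # Assumption: give up on the whole run once every attempt has failed.
            raise RuntimeError(
                f"{clip}: failed after {MAX_RETRIES} attempts"
            ) from last_error
        log(f"[PROGRESS] {n}/{len(clips)}")
        sleep(MIN_INTERVAL_S)  # throttle before the next request
    return results
```

Passing `sleep` and `log` as parameters is only there so the loop can be exercised without real delays or network calls.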
Replaces HTTP batch endpoint with WebSocket streaming endpoint (wss://api.smallest.ai/waves/v1/pulse/get_text) for better accuracy.
- Streams raw PCM frames in 4096-byte binary chunks
- Collects is_final=true transcript segments and concatenates
- Waits for is_last=true to close connection cleanly
- Retains 5-retry logic, QPS throttle, and progress logging
- Adds websockets to Dockerfile
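The client-side logic described in this commit can be sketched without a network connection as two pure helpers: one that splits PCM audio into 4096-byte chunks and one that assembles the transcript from server messages. The helper names, the "transcript" field name for segment text, and joining segments with a single space are assumptions for illustration; the PR only states the chunk size and the is_final / is_last flags.

```python
import json
from typing import Iterable, Iterator

CHUNK_BYTES = 4096  # chunk size stated in the commit message


def pcm_chunks(pcm: bytes, size: int = CHUNK_BYTES) -> Iterator[bytes]:
    """Yield consecutive binary chunks of raw PCM; the last one may be shorter."""
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]


def collect_transcript(messages: Iterable[str], text_key: str = "transcript") -> str:
    """Join is_final segments and stop at the first message with is_last set.

    text_key is a hypothetical field name; check the real message schema.
    """
    segments = []
    for raw in messages:
        msg = json.loads(raw)
        if msg.get("is_final") and msg.get(text_key):
            segments.append(msg[text_key].strip())
        if msg.get("is_last"):
            break  # the server signals that the stream is complete
    return " ".join(segments)
```

In a real runner, each chunk from `pcm_chunks` would be sent as a binary WebSocket frame and the incoming text frames fed to `collect_transcript`; interim (non-final) messages are skipped so that partial hypotheses do not end up duplicated in the output.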
Author
Hey @dophist - just wanted to give a gentle nudge on this PR! I've submitted #93 adding Smallest AI Pulse STT (English) to the leaderboard. It includes the model config, a WebSocket streaming API runner, Dockerfile, and README. Local validation on MINI_EN passed with 0% WER. Happy to provide API credits for the full benchmark run. Please let me know which email you would use for benchmarking so I can add credits.
Author
Hey @dophist, bumping this PR. Would love to get this reviewed and benchmarked when feasible for you.
Model
Smallest AI — Pulse STT (English)
Adds a cloud API model submission for Smallest AI's Pulse speech-to-text model, targeting English benchmarks via real-time WebSocket streaming.
What's included
- model.yaml: metadata (entity, language, sample_rate)
- SBI: entry point with API key validation
- asr_api.py: WSS streaming via wss://api.smallest.ai/waves/v1/pulse/get_text; streams raw PCM in 4096-byte chunks and collects is_final transcript segments, with 5-retry logic, 200ms QPS throttling, and progress logging
- docker/Dockerfile: miniconda3 base with pynini + regex via conda-forge, websockets via pip
- README.md: model details, setup and usage docs
Local Validation (MINI_EN)
Notes
- API key file: assets/API_KEY (git-ignored). Happy to provide API access for the full benchmark run; please reach out at harshitajain@smallest.ai.