Add Smallest AI Pulse STT (English) #93

Open: harshitajain165 wants to merge 6 commits into SpeechColab:master from harshitajain165:add-smallest-ai-stt

@harshitajain165

Model

Smallest AI — Pulse STT (English)

Adds a cloud API model submission for Smallest AI's Pulse speech-to-text model, targeting English benchmarks via real-time WebSocket streaming.

What's included

  • model.yaml — metadata (entity, language, sample_rate)
  • SBI — entry point with API key validation
  • asr_api.py — WSS streaming via wss://api.smallest.ai/waves/v1/pulse/get_text, streams raw PCM in 4096-byte chunks, collects is_final transcript segments, with 5-retry logic, 200ms QPS throttling, and progress logging
  • docker/Dockerfile — miniconda3 base with pynini + regex via conda-forge, websockets via pip
  • README.md — model details, setup and usage docs
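The chunking step mentioned for asr_api.py can be illustrated with a minimal sketch. This is not the PR's actual code: the helper name `chunk_pcm` is hypothetical, and only the 4096-byte chunk size comes from the description above.

```python
from typing import Iterator

CHUNK_BYTES = 4096  # chunk size stated in the PR description


def chunk_pcm(pcm: bytes, chunk_bytes: int = CHUNK_BYTES) -> Iterator[bytes]:
    """Yield consecutive slices of raw PCM; the last slice may be shorter.

    Hypothetical helper for illustration, not taken from asr_api.py.
    """
    if chunk_bytes <= 0:
        raise ValueError("chunk_bytes must be positive")
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start:start + chunk_bytes]


# 10,000 bytes of silence split into chunks of 4096, 4096 and 1808 bytes
chunks = list(chunk_pcm(bytes(10_000)))
print([len(c) for c in chunks])
```

Each yielded slice would be sent as one binary WebSocket frame. Whether the service expects frames aligned to sample boundaries is not stated in the PR, so this sketch makes no attempt at alignment.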

Local Validation (MINI_EN)

%WER 0.00 [ 0 / 14, 0 ins, 0 del, 0 sub ]
%SER 0.00 [ 0 / 2 ]
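For readers unfamiliar with the score lines above: the bracketed WER figures are errors over reference words (here 0 of 14), broken down into insertions, deletions and substitutions, and SER counts sentences with at least one error (0 of 2). A rough sketch of the word-level edit distance behind WER follows. It is an illustration only, not the leaderboard's scoring tool; it skips text normalization, and the function names and sample sentences are made up.

```python
def word_errors(ref: str, hyp: str) -> int:
    """Minimum number of word insertions, deletions and substitutions."""
    r, h = ref.split(), hyp.split()
    prev = list(range(len(h) + 1))  # distances from an empty reference
    for i, ref_word in enumerate(r, start=1):
        cur = [i]  # i deletions turn the first i reference words into nothing
        for j, hyp_word in enumerate(h, start=1):
            cur.append(min(
                prev[j] + 1,                           # deletion
                cur[j - 1] + 1,                        # insertion
                prev[j - 1] + (ref_word != hyp_word),  # match or substitution
            ))
        prev = cur
    return prev[-1]


def wer_percent(pairs) -> float:
    """WER in percent over (reference, hypothesis) pairs."""
    errors = sum(word_errors(ref, hyp) for ref, hyp in pairs)
    words = sum(len(ref.split()) for ref, _ in pairs)
    return 100.0 * errors / words


# Hypothetical example: one substitution and one deletion over 4 reference words
print(wer_percent([("a b c d", "a x c")]))
```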

Notes

  • API credentials go in assets/API_KEY (git-ignored). Happy to provide API access for the full benchmark run — please reach out at harshitajain@smallest.ai.

Implements the SBI inference wrapper for Smallest AI's Pulse STT API,
targeting English benchmarks (LibriSpeech, TEDLium3, GigaSpeech, etc.).

Replaced ubuntu:20.04 base with continuumio/miniconda3 to install
pynini via conda-forge, avoiding OpenFST build dependency issues.

Confirmed via live API call that the transcript is returned under
the 'transcription' key, not 'text'.

- Add 200ms QPS interval between requests to avoid rate limiting on
  large test sets (LibriSpeech ~2600 clips)
- Increase MAX_RETRIES from 3 to 5
- Add [PROGRESS] n/total logging for monitoring long runs
- Remove stale comment about response field name
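The throttling, retry and progress-logging changes listed above could look roughly like the sketch below. It is an assumption-laden illustration, not the code in asr_api.py: the function names are hypothetical, `MAX_RETRIES` is interpreted as the total number of attempts, and the broad `except Exception` stands in for whatever errors the real runner handles.

```python
import time

MAX_RETRIES = 5        # value from the commit message
QPS_INTERVAL_S = 0.2   # 200 ms pause between requests


def transcribe_with_retries(transcribe, audio, max_retries=MAX_RETRIES,
                            sleep=time.sleep, interval_s=QPS_INTERVAL_S):
    """Call transcribe(audio), retrying on failure up to max_retries attempts."""
    last_error = None
    for _ in range(max_retries):
        try:
            return transcribe(audio)
        except Exception as exc:  # assumption: real code would catch narrower errors
            last_error = exc
            sleep(interval_s)     # back off briefly before the next attempt
    raise RuntimeError(f"failed after {max_retries} attempts") from last_error


def run_all(transcribe, clips, sleep=time.sleep):
    """Transcribe every clip in order, throttling and logging progress."""
    results = []
    total = len(clips)
    for n, clip in enumerate(clips, start=1):
        results.append(transcribe_with_retries(transcribe, clip, sleep=sleep))
        print(f"[PROGRESS] {n}/{total}")
        sleep(QPS_INTERVAL_S)     # keep requests at least 200 ms apart
    return results
```

Passing `sleep` as a parameter keeps the sketch easy to exercise without real delays. At 200 ms per request, the throttle alone adds roughly nine minutes to a run of about 2600 clips.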

Replaces HTTP batch endpoint with WebSocket streaming endpoint
(wss://api.smallest.ai/waves/v1/pulse/get_text) for better accuracy.

- Streams raw PCM frames in 4096-byte binary chunks
- Collects is_final=true transcript segments and concatenates
- Waits for is_last=true to close connection cleanly
- Retains 5-retry logic, QPS throttle, and progress logging
- Adds websockets to Dockerfile
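The streaming flow in the list above can be sketched against a stand-in connection object. Everything about the message format here is an assumption: the PR names the `is_final` and `is_last` flags, but the JSON shape, the use of the `transcription` key on the WebSocket endpoint, and the absence of an explicit end-of-stream message are guesses. `FakeSocket` and `stream_and_collect` are hypothetical names, and a real client would pass a connection from the `websockets` package instead.

```python
import asyncio
import json


async def stream_and_collect(ws, pcm: bytes, chunk_bytes: int = 4096) -> str:
    """Send PCM as binary frames, then join is_final segments until is_last."""
    for start in range(0, len(pcm), chunk_bytes):
        await ws.send(pcm[start:start + chunk_bytes])
    segments = []
    while True:
        msg = json.loads(await ws.recv())
        # Keep only finalized segments; interim hypotheses are skipped.
        if msg.get("is_final") and msg.get("transcription"):
            segments.append(msg["transcription"].strip())
        if msg.get("is_last"):
            break  # server signalled the end of the transcript
    return " ".join(segments)


class FakeSocket:
    """Stand-in for a WebSocket connection, used only to exercise the sketch."""

    def __init__(self, replies):
        self.sent = []
        self._replies = list(replies)

    async def send(self, data):
        self.sent.append(data)

    async def recv(self):
        return json.dumps(self._replies.pop(0))


fake = FakeSocket([
    {"transcription": "hello", "is_final": False},
    {"transcription": "hello there", "is_final": True},
    {"transcription": "general", "is_final": True, "is_last": True},
])
print(asyncio.run(stream_and_collect(fake, bytes(5000))))  # hello there general
```

For simplicity the sketch sends all audio before reading any reply. A production client would more likely send and receive concurrently so that long clips do not fill the server's buffers.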

harshitajain165 marked this pull request as ready for review on May 12, 2026, 08:09
@harshitajain165

Author

Hey @dophist - just wanted to give a gentle nudge on this PR! I've submitted #93 adding Smallest AI Pulse STT (English) to the leaderboard.

It includes the model config, a WebSocket streaming API runner, Dockerfile, and README. Local validation on MINI_EN passed with 0% WER.

Happy to provide API credits for the full benchmark run. Please let me know which email address you would use for benchmarking so that I can add credits to it.

@harshitajain165

Author

Hey @dophist

Bumping this PR. I would love to get it reviewed and benchmarked whenever that is feasible for you.
