Testing STT/TTS Services #48

Currently testing Deepgram for both STT and TTS.

- [Flux](https://developers.deepgram.com/docs/flux/quickstart) is an STT model that also handles VAD and Smart Turn-style turn detection, but it is English-only. So far it has been excellent: very fast and very accurate. I'd like to use it, since it made a noticeable difference in quality and responsiveness. Not having to run VAD/Smart Turn locally improved startup times and seemed to reduce total latency.
- Trying out `aura-2-helena-en` for TTS. It sounds better than Google Chirp 3 and is also faster. Most of the Aura-2 voices sounded good; Helena just sounded the best for LawLine.
- We will need to discuss the "[Model Improvement Program](https://developers.deepgram.com/docs/the-deepgram-model-improvement-partnership-program)" (MIP), which lets Deepgram train on our data in exchange for a 50% discount. Their pricing with MIP is very good; without it, it's in line with the other good providers. Also, Pipecat's `DeepgramTTSService` currently has no way to opt out of MIP, but I should be able to subclass it to add that. I might also open a PR for it.
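
As a reference for the subclass/PR, here is a minimal standalone sketch of what opting out of MIP could look like at the HTTP level. It assumes Deepgram's REST TTS endpoint is `https://api.deepgram.com/v1/speak` with `model`, `encoding`, and `sample_rate` query parameters, a JSON `{"text": ...}` body, and `Token` authorization, and that `mip_opt_out=true` is the opt-out parameter described on the MIP page. `build_speak_url` and `build_speak_request` are hypothetical helpers, not Pipecat or Deepgram SDK functions, and nothing here actually sends a request.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Assumed REST endpoint for Deepgram TTS (not taken from Pipecat).
DEEPGRAM_SPEAK_URL = "https://api.deepgram.com/v1/speak"


def build_speak_url(model: str, encoding: str = "linear16",
                    sample_rate: int = 8000, mip_opt_out: bool = True) -> str:
    """Return a TTS request URL, optionally opting out of MIP (hypothetical helper)."""
    params = {"model": model, "encoding": encoding, "sample_rate": sample_rate}
    if mip_opt_out:
        # Assumption: this query parameter excludes the request from the
        # Model Improvement Program, which also forfeits the MIP discount.
        params["mip_opt_out"] = "true"
    return f"{DEEPGRAM_SPEAK_URL}?{urlencode(params)}"


def build_speak_request(api_key: str, text: str, model: str,
                        mip_opt_out: bool = True) -> Request:
    """Prepare (but do not send) a POST request with the text as a JSON body."""
    return Request(
        build_speak_url(model, mip_opt_out=mip_opt_out),
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Prints: https://api.deepgram.com/v1/speak?model=aura-2-helena-en&encoding=linear16&sample_rate=8000&mip_opt_out=true
    print(build_speak_url("aura-2-helena-en"))
```

Passing the prepared request to `urllib.request.urlopen` would send it. The Pipecat fix would presumably add the same parameter wherever `DeepgramTTSService` assembles its request; where exactly that happens depends on the Pipecat version.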

I also tried [Rime](https://www.rime.ai/) for TTS earlier today, since they're supposed to have very realistic voices, but I wasn't impressed. Their newest Arcana models required a new Pipecat module that seemed slightly buggy, and their older models didn't sound good. Neither sounded very good at 8 kHz.
