Testing STT/TTS Services #48
Open
Labels: design (Design planning)
Description
Currently testing Deepgram for both STT and TTS.
- Flux is an STT model that also handles VAD and Smart Turn detection, but it is English-only. So far it has been awesome: very fast and very accurate. I would like to use it, since I think it made a huge difference in quality and responsiveness. Not having to run VAD and Smart Turn locally improved startup times and seemed to reduce total latency.
- Trying out `aura-2-helena-en` for TTS. I think it sounds good, better than Google Chirp3, and it is also faster. Most of the "Aura-2" voices sounded good; I just think Helena sounded the best for LawLine.
- We will need to discuss the "Model Improvement Program" (MIP), which allows Deepgram to train on our data in exchange for a 50% discount. Their pricing with MIP is very good; without it, pricing is in line with the other good providers. Also, Pipecat currently doesn't support disabling MIP from the `DeepgramTTSService`, but I should be able to subclass it and fix that. I might also open a PR for it.
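
For the MIP discussion, here is a rough sketch of what opting out could look like at the request level. It assumes Deepgram's `mip_opt_out=true` query parameter is accepted on the `/v1/speak` endpoint the same way it is documented for their API generally. `build_speak_url` is a hypothetical helper, not part of Pipecat or the Deepgram SDK, and the `linear16` encoding is just a placeholder choice. A subclass of `DeepgramTTSService` would need to get the equivalent option into whatever request Pipecat builds internally, which I have not verified.

```python
from urllib.parse import urlencode

# Deepgram's REST text-to-speech endpoint.
DEEPGRAM_SPEAK_URL = "https://api.deepgram.com/v1/speak"


def build_speak_url(model="aura-2-helena-en", sample_rate=8000, mip_opt_out=True):
    """Hypothetical helper: build a TTS request URL, optionally opting out of MIP."""
    params = {
        "model": model,
        "encoding": "linear16",  # assumption: raw PCM; pick whatever the pipeline needs
        "sample_rate": str(sample_rate),
    }
    if mip_opt_out:
        # Assumption: the opt-out flag is passed as a query parameter on this endpoint.
        params["mip_opt_out"] = "true"
    return f"{DEEPGRAM_SPEAK_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_speak_url())
```

If we decide to join MIP for the discount, the same helper with `mip_opt_out=False` simply omits the flag, so the choice could stay a single config switch.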
I also tried Rime for TTS earlier today, since they're supposed to have very realistic voices. Their newest Arcana models required a new Pipecat module that seemed slightly buggy, and their older models didn't sound good. Neither of them sounded very good at 8 kHz. Overall, I just wasn't that impressed.
Status: In progress