Fails to detect two speakers and can't add them after the fact #28

samkahn-prog · 2026-05-20T23:13:38Z

Before starting transcription, I specify there are two speakers in the clip, and name the interviewer and subject in the popup. After transcription runs, the transcript only shows a single speaker (interviewer) and combines all the dialogue.

Why does it fail to identify the 2 speakers and only list a single one despite my specification?
Is there a way to manually add the 2nd speaker to the transcript and edit the attribution? There's only a single speaker shown, so clicking on their name in the transcript does nothing.

DozaVisuals (Maintainer) · 2026-05-21T00:07:31Z

Hey Sam — thanks for the clear report, this is a real bug, not anything you did wrong.

Quick explanation: on Apple Silicon, Doza Assist tries Parakeet first because it's the fastest engine, but Parakeet doesn't actually do speaker diarization. So even though you specified 2 speakers and named them in the popup, every word ended up labeled with whoever you typed as the first speaker. WhisperX (the engine that does diarize) never got reached because Parakeet was already running.

Just opened #29 to fix the routing — any project with more than 1 speaker now skips Parakeet and goes straight to WhisperX. Will land in the next release.
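
For illustration, the routing rule described above could be sketched roughly as follows. This is not the actual Doza Assist code: `pick_engine`, the engine name strings, and the choice to fall back to WhisperX on non-Apple-Silicon machines are all assumptions made for the example.

```python
import platform
from typing import Optional


def pick_engine(num_speakers: int, machine: Optional[str] = None) -> str:
    """Return the name of the engine to try first (hypothetical helper)."""
    # platform.machine() reports "arm64" on Apple Silicon Macs.
    machine = machine or platform.machine()
    on_apple_silicon = machine == "arm64"

    # Parakeet is the fastest engine but does not diarize, so it is only
    # a safe first choice when the project has a single speaker.
    if on_apple_silicon and num_speakers <= 1:
        return "parakeet"

    # More than one speaker: skip Parakeet and go straight to the engine
    # that can diarize. (Assumption: other machines also use WhisperX.)
    return "whisperx"
```

With this rule, a two-speaker interview on an Apple Silicon Mac would be routed to WhisperX instead of being labeled entirely with the first speaker's name.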

Heads-up though: WhisperX uses pyannote-audio for diarization, which needs a free HuggingFace token to download the model. Once the fix ships, the one-time setup is:

1. Create a free HuggingFace account
2. Accept the model terms at https://huggingface.co/pyannote/speaker-diarization-3.1
3. Generate a read token in your HF settings
4. Set HF_TOKEN=<your-token> in your environment before launching the app

If you skip the token step, you'll still get a working transcript — just back to one speaker, but with a clear warning in the terminal so you know what's missing instead of it being a mystery. I'd love to bundle that setup more smoothly down the road, but for now the env var is the path.
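
The fallback described above might look something like the sketch below. It is only an illustration under stated assumptions: `diarization_token` and the logger name are hypothetical, and the real app may read and report the token differently.

```python
import logging
import os
from typing import Mapping, Optional

log = logging.getLogger("doza.diarization")  # hypothetical logger name


def diarization_token(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the HuggingFace token, or None if diarization must be skipped."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        # No token: the transcript still runs, but with a single speaker,
        # and the user gets an explicit warning rather than a silent skip.
        log.warning(
            "HF_TOKEN is not set; skipping speaker diarization. "
            "The transcript will show one speaker."
        )
        return None
    return token
```

A caller would run diarization only when this returns a token, and otherwise keep the single-speaker transcript.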

On your second question — yeah, being able to fix speaker attributions in the transcript UI after the fact (without having to re-run the whole thing) is on the list. It's queued behind this routing fix but I hear you that it's an obvious gap.

I'll comment here when the release ships.

DozaVisuals (Maintainer) · 2026-05-21T00:18:34Z

Hey Sam — v3.5.4 is out with the routing fix: https://github.com/DozaVisuals/doza-assist/releases/tag/v3.5.4

Grab Doza-Assist-3.5.4.dmg from the release page, drag the app to Applications (replacing the existing copy), and the multi-speaker case should now actually distinguish speakers.

One thing to do before you re-run that interview: set up the HuggingFace token, otherwise diarization will still be skipped and you'll end up with one speaker again. The one-time setup is:

1. Create a free account at https://huggingface.co (if you don't have one)
2. Accept the model terms at https://huggingface.co/pyannote/speaker-diarization-3.1 (click the gate)
3. Generate a read token at https://huggingface.co/settings/tokens
4. Set HF_TOKEN=<your-token> in your environment before launching the app

Easiest persistent way on macOS — add this line to ~/.zshrc:

export HF_TOKEN=hf_yourtokenhere

Then open a fresh terminal and launch Doza Assist from there. Relaunching via Finder after a logout/login may not pick the variable up, since ~/.zshrc is only read by zsh sessions, so launching from the terminal is the safer route.
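
To confirm the variable is actually visible before re-running the interview, a small check like the one below can be run with Python from the same terminal you launch the app from. It is a generic sketch, not part of Doza Assist, and `describe_hf_token` is a hypothetical helper name.

```python
import os
from typing import Mapping, Optional


def describe_hf_token(env: Optional[Mapping[str, str]] = None) -> str:
    """Describe whether HF_TOKEN is visible, without printing the token."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN")
    if token is None:
        return "HF_TOKEN is not set in this process"
    if not token.strip():
        return "HF_TOKEN is set but empty"
    # Report only the length so the secret never lands in a log or screenshot.
    return f"HF_TOKEN is set ({len(token)} characters)"


if __name__ == "__main__":
    print(describe_hf_token())
```

If this reports that the token is not set, the app launched from that same environment will not see it either.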

If you forget the token step, you'll still get a transcript — just back to one speaker, but with a clear warning in server.log explaining what's missing. So no silent failure this time.

Let me know how the re-run goes.

samkahn-prog (Author) · 2026-05-21T23:05:47Z

I followed those steps: set the token in .zshrc, logged out, logged back in. Running the transcription still resulted in a single speaker.

2 replies

samkahn-prog May 25, 2026
Author

@DozaVisuals Just checking in, is there anything else I should try?

DozaVisuals May 25, 2026
Maintainer

Thanks for checking in! Let me try a couple things tomorrow and I’ll get back to you.
