Add 60db TTS provider (Hindi/Indian voices) — live-API-verified #27

Open
ConalMullan wants to merge 2 commits into main from feat/sixtydb-tts-integration

@ConalMullan

Collaborator

Adds 60db (https://60db.ai) as a third TTS provider alongside ElevenLabs and Qwen3. Builds on @manishEMS47's work in #26 (their original commit is preserved here) with fixes to make it work against the live 60db API, plus end-to-end verification.

Why 60db

Fills a real gap in the toolkit: native Hindi + Indian-accented English voices, cheaper than ElevenLabs ($0.00002/char) and faster (RTF ~0.22). Qwen3 is English/Chinese-leaning and ElevenLabs is premium-priced for Indic languages. Verified live that the default voice produces good-quality English and Hindi (Devanagari) speech.

The fix (on top of #26)

The original integration coded to 60db's documented /tts-synthesize contract (single JSON {audio_base64}, mp3). In production the endpoint actually streams newline-delimited JSON of raw 48 kHz PCM (Content-Type: application/x-ndjson) with a trailing {metadata} line — so the default path failed with "Invalid JSON response."
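A minimal sketch of how an NDJSON body like the one described above could be consumed. The helper name `collect_ndjson_audio` and the per-line keys (`audio_base64` for chunk lines, `metadata` for the trailing line) are assumptions for illustration; the actual field names in 60db's stream and in `_synthesize_rest` may differ.

```python
import base64
import json
from typing import Iterable, Tuple


def collect_ndjson_audio(lines: Iterable[bytes]) -> Tuple[bytes, dict]:
    """Concatenate audio chunks from NDJSON lines; return (audio_bytes, metadata)."""
    audio = bytearray()
    metadata: dict = {}
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines between records
        obj = json.loads(raw)
        if "metadata" in obj:
            # Assumed key for the trailing metadata record.
            metadata = obj["metadata"]
        elif "audio_base64" in obj:
            # Assumed key for an audio chunk; also matches the documented
            # single-JSON shape when the body is one line.
            audio.extend(base64.b64decode(obj["audio_base64"]))
    return bytes(audio), metadata


# Usage with a hand-built stream: one chunk, a blank line, then metadata.
chunk = base64.b64encode(b"\x00\x01\x02\x03").decode()
stream = [
    json.dumps({"audio_base64": chunk}).encode(),
    b"",
    json.dumps({"metadata": {"audio_sec": 1.0}}).encode(),
]
audio, meta = collect_ndjson_audio(stream)
```

With an HTTP client that exposes the body line by line, the same function could be fed that iterator directly. A pretty-printed multi-line JSON body would not parse this way and would need a whole-body `json.loads` fallback.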

tools/sixtydb_tts.py:

  • Rewrote _synthesize_rest to consume the NDJSON PCM stream, while still accepting the documented single-JSON shape if 60db ships it (defensive both ways).
  • Added _finalize_audio — sniffs bytes for an audio container (mp3/wav/ogg/flac) and writes/transcodes as-is, else wraps raw PCM as WAV and transcodes to --output-format via ffmpeg.
  • Added _derive_pcm_sample_rate — infers rate from byte-count ÷ metadata.audio_sec instead of hardcoding.
  • Surfaces 60db metadata.warnings; routed _synthesize_stream through the same PCM finalizer and flagged that /tts-stream currently returns HTTP 500 upstream.
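The container-sniffing and PCM-wrapping steps in the list above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code in `_finalize_audio`: the function names `sniff_container` and `pcm_to_wav` are hypothetical, the PCM is assumed to be 16-bit mono, and the ffmpeg transcode to `--output-format` is left out.

```python
import io
import wave
from typing import Optional


def sniff_container(data: bytes) -> Optional[str]:
    """Return 'wav', 'ogg', 'flac' or 'mp3' from magic bytes, else None."""
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "wav"
    if data[:4] == b"OggS":
        return "ogg"
    if data[:4] == b"fLaC":
        return "flac"
    if data[:3] == b"ID3":
        return "mp3"
    # Bare MPEG frame sync (11 set bits). Raw PCM can match this by chance,
    # so a real implementation may want a stricter check.
    if len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0:
        return "mp3"
    return None


def pcm_to_wav(pcm: bytes, sample_rate: int,
               channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw PCM in a WAV header (defaults assume 16-bit mono)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)
        wf.setframerate(sample_rate)
        wf.writeframes(pcm)
    return buf.getvalue()


# Usage: 480 frames of silence at 48 kHz become a sniffable WAV.
silence = b"\x00\x00" * 480
wav_bytes = pcm_to_wav(silence, 48000)
kind_before = sniff_container(silence)
kind_after = sniff_container(wav_bytes)
```

Sniffing before wrapping is what lets one finalizer serve both response shapes: bytes that already carry a container pass through, and only headerless PCM gets a WAV header.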

Verified live ✅

  • sixtydb_tts.py → valid MP3 (ID3v2.4, 48 kHz mono), EN + Hindi
  • voiceover.py --provider 60db --scene-dir → correct per-scene MP3s + the JSON shape sync_timing.py consumes
  • Compiles clean (Python 3.9 compatible), dry-run works

Not verified

  • redub.py --tts-provider 60db — needs an ElevenLabs key (Scribe STT) + a video; delegation logic is straightforward but untested end-to-end.
  • WebSocket transport: implemented to match the docs but not exercised, since batch voiceover does not need it (the minor wss:// vs documented ws:// scheme discrepancy still needs confirming).

Closes #26.

🤖 Generated with Claude Code

manishEMS47 and others added 2 commits June 8, 2026 16:16
The original integration coded to 60db's documented /tts-synthesize contract
(single JSON object with `audio_base64` in the requested container format).
In production the endpoint instead streams newline-delimited JSON of raw
16-bit mono PCM (Content-Type: application/x-ndjson) with a trailing
`{metadata}` line, so the default voiceover path failed with "Invalid JSON
response".

Changes (tools/sixtydb_tts.py):
- Rewrite _synthesize_rest to consume the NDJSON PCM stream, while still
  accepting the documented single-JSON shape if 60db ships it.
- Add _finalize_audio: sniff bytes for an audio container (mp3/wav/ogg/flac)
  and write/transcode as-is, else wrap raw PCM as WAV and transcode to the
  requested --output-format via ffmpeg.
- Add _derive_pcm_sample_rate: infer the rate from byte-count and the
  metadata audio_sec (snap to nearest standard rate) instead of hardcoding.
- Surface 60db metadata warnings; route _synthesize_stream through the same
  PCM-aware finalizer and flag that /tts-stream currently 500s upstream.
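
The sample-rate inference above can be sketched as follows. This is an illustration only: the function name, the list of "standard" rates, and the 48 kHz fallback are assumptions, and 16-bit mono PCM is assumed as described in this commit message.

```python
from typing import Optional

# Assumed candidate rates; the real list in the tool may differ.
STANDARD_RATES = (8000, 16000, 22050, 24000, 32000, 44100, 48000)


def derive_pcm_sample_rate(num_bytes: int, audio_sec: Optional[float],
                           channels: int = 1, sample_width: int = 2,
                           fallback: int = 48000) -> int:
    """Estimate the PCM rate from size and duration, snapped to a standard rate."""
    if not audio_sec or audio_sec <= 0 or num_bytes <= 0:
        return fallback  # no usable metadata, so use the assumed default
    # bytes = seconds * rate * channels * bytes_per_sample
    estimate = num_bytes / (audio_sec * channels * sample_width)
    return min(STANDARD_RATES, key=lambda rate: abs(rate - estimate))


# Usage: 192000 bytes of 16-bit mono over ~2 s is closest to 48 kHz,
# even when the reported duration is slightly off.
rate_exact = derive_pcm_sample_rate(192000, 2.0)
rate_noisy = derive_pcm_sample_rate(192000, 2.01)
rate_cd = derive_pcm_sample_rate(132300, 1.5)
rate_missing = derive_pcm_sample_rate(192000, None)
```

Snapping to the nearest standard rate absorbs rounding in the reported duration, which a raw division would otherwise turn into an odd rate in the WAV header.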

Verified live: sixtydb_tts.py and `voiceover.py --provider 60db --scene-dir`
both produce valid 48kHz MP3s for English and Hindi.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@ConalMullan ConalMullan mentioned this pull request Jun 8, 2026
Comment thread tools/sixtydb_tts.py
    finally:
        try:
            ws.close()
        except Exception:
@ConalMullan

Collaborator Author

@manishEMS47 — heads up: I built directly on your #26 work here (your original commit is preserved with your authorship) and added one fix so it works against the live 60db API. The only issue was that the code followed 60db's documented /tts-synthesize response shape (single JSON audio_base64), but in production the endpoint streams NDJSON of raw 48 kHz PCM — so I made the parsing handle both and transcode to the requested format. Tested end-to-end (English + Hindi → valid MP3, plus voiceover.py --provider 60db).

Would love a quick look if you have a moment — happy to adjust anything. If I don't hear back in a day or two I'll go ahead and merge so it doesn't stall. Thanks again for adding this — the Hindi/Indian-voice support fills a real gap for us. 🙌

@ConalMullan

Collaborator Author

Hi @manishEMS47 — wanted to loop you back in before we take this further.

Quick status: I built on your #26 work to get 60db running against the live API. The main thing was that the production /tts-synthesize endpoint doesn't match the documented contract — instead of a single JSON {audio_base64} mp3, it streams newline-delimited JSON of raw 48 kHz PCM with a trailing metadata line. So the original path failed with "Invalid JSON response." I rewrote the synth path to consume the NDJSON PCM stream (while still accepting the documented single-JSON shape if 60db ships it later), added container-sniffing + PCM→WAV finalization, and infer the sample rate from the metadata. It's verified end-to-end now — clean MP3 output in both English and Hindi, and per-scene voiceover that plugs into the rest of the toolkit.

Once this merges, 60db becomes a first-class TTS provider in the toolkit alongside ElevenLabs and Qwen3. Native Hindi/Indic voices are a genuine gap that it fills, so we're keen to ship it well.

Before we do, there are a few things only your side can confirm, and I'd rather get your input than guess:

  1. /tts-stream returns HTTP 500 upstream right now — is that a known issue, and is a fix coming? Batch voiceover doesn't need it, but I've flagged it in the code.
  2. wss:// vs ws:// — the docs say ws:// but the live endpoint looks like wss://. Which is canonical?
  3. redub.py --tts-provider 60db — the delegation logic is straightforward but I couldn't test it end-to-end (needs an ElevenLabs Scribe key + a video). If you're able to run it once, that'd close the last gap.

No rush, but I'd like to have you back in the loop before merging — it's your integration as much as ours, and a quick confirmation on the above (especially the stream 500) would let us ship it with confidence. Happy to jump on anything if it's easier.

Thanks again for kicking this off!
