feat(tts): add Azure AI Speech provider #86

Open
RayJiang4S wants to merge 2 commits into calesthio:main from RayJiang4S:ray/azure-ai-speech-tts-provider

@RayJiang4S

Summary

Adds an Azure AI Speech text-to-speech provider for OpenMontage.

Highlights:

  • REST synthesis via Azure AI Speech with AZURE_SPEECH_KEY and AZURE_SPEECH_REGION
  • operation=list_voices for voice catalog discovery
  • SSML generation with voice, style, role, rate, pitch, volume, and sentence silence controls
  • Custom endpoint and custom voice deployment support
  • Optional SDK mode for word-boundary timing metadata when azure-cognitiveservices-speech is installed
  • TTS selector compatibility through the existing capability/provider registry
  • Documentation and provider contract coverage
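
As an illustration of the SSML controls listed above, here is a minimal sketch of how such a document might be assembled. The helper name `build_ssml` and its parameters are hypothetical and are not taken from the PR's code; the element names (`mstts:express-as`, `mstts:silence`, `prosody`) follow Azure's documented SSML vocabulary, and the voice name is only an example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text, voice, lang="en-US", style=None, role=None,
               rate=None, pitch=None, volume=None, sentence_silence_ms=None):
    """Hypothetical helper: assemble an Azure-style SSML string."""
    body = escape(text)  # escape &, <, > in the spoken text

    # Wrap in <prosody> only when at least one prosody control is set.
    prosody = {"rate": rate, "pitch": pitch, "volume": volume}
    attrs = "".join(f" {k}={quoteattr(v)}" for k, v in prosody.items() if v)
    if attrs:
        body = f"<prosody{attrs}>{body}</prosody>"

    # Speaking style and role go on <mstts:express-as>.
    if style or role:
        express = ""
        if style:
            express += f" style={quoteattr(style)}"
        if role:
            express += f" role={quoteattr(role)}"
        body = f"<mstts:express-as{express}>{body}</mstts:express-as>"

    # Sentence-boundary silence is a child of <voice>, placed before the text.
    if sentence_silence_ms is not None:
        silence = int(sentence_silence_ms)
        body = (f'<mstts:silence type="Sentenceboundary" value="{silence}ms"/>'
                + body)

    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        'xmlns:mstts="https://www.w3.org/2001/mstts" '
        f"xml:lang={quoteattr(lang)}>"
        f"<voice name={quoteattr(voice)}>{body}</voice></speak>"
    )


example = build_ssml("Tom & Jerry", "zh-CN-XiaoxiaoNeural", lang="zh-CN",
                     style="cheerful", rate="+10%", sentence_silence_ms=200)
ET.fromstring(example)  # raises ParseError if the SSML is not well-formed XML
```

Escaping the text and quoting every attribute keeps user-supplied input from breaking the XML; the actual provider may structure this differently.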

Validation

Passed:

.venv/bin/python -m pytest -q tests/tools/test_azure_tts.py tests/contracts/test_phase3_contracts.py
# 71 passed

Also validated manually with a live Azure Speech resource:

  • provider status: available
  • operation=list_voices for zh-CN: 49 voices returned
  • direct SDK synthesis succeeded
  • SDK word-boundary metadata returned 13 boundaries / 11 words for a Chinese sample
  • tts_selector routed preferred_provider=azure to azure_tts and generated audio
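
For readers who want to reproduce the voice-catalog and REST checks by hand, the following is a rough sketch of the underlying HTTP requests, built with the standard library and not sent. The function names are hypothetical and do not come from the PR; the URL pattern, the `Ocp-Apim-Subscription-Key` header, and the `ShortName`/`Locale` fields follow Azure's public REST documentation, and the default output format is just one of the documented values.

```python
import urllib.request


def tts_base_url(region):
    # Regional endpoint pattern from the Azure Speech REST documentation.
    return f"https://{region}.tts.speech.microsoft.com/cognitiveservices"


def build_list_voices_request(key, region):
    """Hypothetical helper: GET request for the voice catalog."""
    return urllib.request.Request(
        f"{tts_base_url(region)}/voices/list",
        headers={"Ocp-Apim-Subscription-Key": key},
        method="GET",
    )


def build_synthesis_request(key, region, ssml,
                            output_format="audio-24khz-48kbitrate-mono-mp3"):
    """Hypothetical helper: POST request that submits SSML for synthesis."""
    return urllib.request.Request(
        f"{tts_base_url(region)}/v1",
        data=ssml.encode("utf-8"),
        headers={
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/ssml+xml",
            "X-Microsoft-OutputFormat": output_format,
            "User-Agent": "openmontage-sketch",  # assumed value, any name works
        },
        method="POST",
    )


def filter_voices(voices, locale):
    """Return the short names of catalog entries matching a locale."""
    return [v["ShortName"] for v in voices if v.get("Locale") == locale]
```

With `AZURE_SPEECH_KEY` and `AZURE_SPEECH_REGION` read from the environment, passing the first request to `urllib.request.urlopen` and decoding the JSON response would yield the list that `filter_voices` narrows to a locale such as zh-CN.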

Full-suite note:

.venv/bin/python -m pytest -q

currently stops during collection in tests/qa/test_08_end_to_end.py because its fixture does not include the now-required render_runtime property for edit_decisions. That appears unrelated to this Azure provider change.

Add an Azure AI Speech TTS provider with REST synthesis, voice catalog discovery, SSML controls, custom endpoint support, and optional SDK word-boundary metadata. Document setup and include provider contract coverage.
@RayJiang4S requested a review from calesthio as a code owner May 22, 2026 05:20
Add Azure preflight metadata and REST fallback when word-boundary timing is requested but the optional SDK is missing, unless callers explicitly require word boundaries.
@RayJiang4S

Author

Updated this PR with a robustness fix from the multi-provider TTS audition workflow.

What changed:

  • Added operation: "preflight" to report Azure credential/region status, SDK availability, word-boundary intent, and fallback behavior before batch generation.
  • Added require_word_boundaries; when false, Azure now falls back to REST synthesis if enable_word_boundaries requested the optional SDK but the SDK is unavailable.
  • REST fallback records word_boundary_fallback: "rest_without_word_boundaries", empty words/boundaries, and a warning in output metadata so review pages can explain why timing metadata is missing while still producing audio.
  • Documented the fallback and preflight workflow in docs/PROVIDERS.md.
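
The decision logic described in the list above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in tools/audio/azure_tts.py: the function names, the report keys, the `mode` values, and the warning text are all hypothetical. Only `enable_word_boundaries`, `require_word_boundaries`, and the `word_boundary_fallback: "rest_without_word_boundaries"` value come from the description above.

```python
import importlib.util
import os


def sdk_available():
    """Check for the optional SDK without importing it fully."""
    try:
        spec = importlib.util.find_spec("azure.cognitiveservices.speech")
    except ImportError:
        # A parent package (azure, azure.cognitiveservices) is missing.
        return False
    return spec is not None


def preflight(env=None, enable_word_boundaries=False,
              require_word_boundaries=False, has_sdk=None):
    """Hypothetical preflight report; key names are assumptions."""
    env = os.environ if env is None else env
    has_sdk = sdk_available() if has_sdk is None else has_sdk

    if not enable_word_boundaries:
        mode = "rest"            # timing metadata not requested
    elif has_sdk:
        mode = "sdk"             # SDK can supply word boundaries
    elif require_word_boundaries:
        mode = "error"           # caller insisted, so do not fall back
    else:
        mode = "rest_fallback"   # produce audio without timing metadata

    return {
        "credentials_present": bool(env.get("AZURE_SPEECH_KEY")),
        "region_present": bool(env.get("AZURE_SPEECH_REGION")),
        "sdk_available": has_sdk,
        "enable_word_boundaries": enable_word_boundaries,
        "require_word_boundaries": require_word_boundaries,
        "mode": mode,
    }


def fallback_metadata():
    """Metadata recorded when REST is used in place of the SDK."""
    return {
        "word_boundary_fallback": "rest_without_word_boundaries",
        "words": [],
        "boundaries": [],
        "warning": "SDK unavailable; audio generated without word timing.",
    }
```

Running a check like this once before a batch lets a workflow surface missing credentials or a missing SDK up front, so the gap is not discovered partway through generation.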

Validation:

  • .venv/bin/python -m pytest tests/tools/test_azure_tts.py -q passed: 10 tests.
  • .venv/bin/python -m pytest tests/contracts/test_phase3_contracts.py -q passed: 63 tests.
  • .venv/bin/python -m py_compile tools/audio/azure_tts.py tests/tools/test_azure_tts.py passed.
  • git diff --check passed.
