Add TTS processing pipeline#100

Merged: Jorjeous merged 44 commits into main from fqian/tts-sdp on May 8, 2025

@fqian1107 fqian1107 commented Mar 4, 2025

Collaborator

Add a TTS data processing pipeline that:

  1. Creates the initial manifest by resampling audio to 16 kHz mono WAV format
  2. Runs speaker diarization and overlap detection using pyannote
  3. Splits long audio segments
  4. Aligns text and audio using NeMo ASR models
  5. Joins split audio metadata back together
  6. Merges alignment and diarization information
  7. Performs inverse text normalization
  8. Calculates audio quality metrics using TorchSQUIM
  9. Estimates audio bandwidth
  10. Generates TTS-usable segments by merging single-speaker segments into segments of the desired length, bounded by min_duration and max_duration.
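
To illustrate step 10, here is a minimal sketch of how consecutive single-speaker segments might be merged under duration bounds. It is not the PR's implementation: the function name `merge_segments`, the `speaker`/`start`/`end` keys, and the greedy strategy are all assumptions made for illustration; only `min_duration` and `max_duration` come from the description above.

```python
from typing import Dict, List, Optional


def merge_segments(
    segments: List[Dict], min_duration: float, max_duration: float
) -> List[Dict]:
    """Greedily merge consecutive same-speaker segments (hypothetical helper).

    A merged segment is kept only if its duration lies within
    [min_duration, max_duration]; anything outside that range is dropped.
    """
    merged: List[Dict] = []
    current: Optional[Dict] = None

    def flush(seg: Optional[Dict]) -> None:
        # Keep the accumulated segment only if its length is acceptable.
        if seg is not None and min_duration <= seg["end"] - seg["start"] <= max_duration:
            merged.append(seg)

    for seg in sorted(segments, key=lambda s: s["start"]):
        same_speaker = current is not None and seg["speaker"] == current["speaker"]
        if same_speaker and seg["end"] - current["start"] <= max_duration:
            # Extend the running segment up to the end of this one.
            current["end"] = seg["end"]
        else:
            # Speaker changed or the merge would be too long: start a new segment.
            flush(current)
            current = dict(seg)  # copy so the caller's input is not mutated
    flush(current)
    return merged
```

With bounds of 5 and 20 seconds, two adjacent segments from speaker A spanning 0.0 to 4.0 and 4.5 to 9.0 would be merged into a single 9-second segment, while an isolated 1-second segment from speaker B would be dropped as too short.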

@Jorjeous Jorjeous self-requested a review March 4, 2025 12:21
@Jorjeous Jorjeous requested a review from lilithgrigoryan March 5, 2025 14:02
@Jorjeous Jorjeous added the enhancement New feature or request label Mar 5, 2025
@Jorjeous Jorjeous requested a review from karpnv March 10, 2025 11:30
@fqian1107 fqian1107 force-pushed the fqian/tts-sdp branch 5 times, most recently from 7265906 to a0bb194 Compare April 28, 2025 11:28

@karpnv karpnv left a comment

Collaborator

Please hide the extra dependencies inside the processor's class constructor.
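
One common way to follow this kind of request is to import the optional package inside `__init__` rather than at module top level, so that importing the module never requires the extra dependency. The sketch below is an assumption about what the reviewer has in mind, not code from this PR: the class name `LazyDependencyProcessor` is hypothetical, and the standard-library `json` module stands in for a heavy optional package.

```python
import importlib


class LazyDependencyProcessor:
    """Hypothetical processor that defers an optional import to construction time."""

    def __init__(self, module_name: str = "json"):
        # Importing here, not at module top level, means that merely importing
        # this file does not require the optional package to be installed.
        # "json" is only a stand-in for a heavy optional dependency.
        try:
            self._backend = importlib.import_module(module_name)
        except ImportError as exc:
            raise ImportError(
                f"{module_name} is required for {type(self).__name__}; "
                "install it to use this processor."
            ) from exc
```

Users who never instantiate the processor are then unaffected by the missing package, and those who do get an error that names the dependency they need.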

Comment thread sdp/processors/datasets/ytc/create_initial_manifest.py Outdated
Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>
@fqian1107 fqian1107 force-pushed the fqian/tts-sdp branch 8 times, most recently from 00994c4 to c4403b6 Compare May 7, 2025 10:44

@karpnv karpnv left a comment

Collaborator

Please add all your processors to docs/src/sdp/api.rst

@karpnv karpnv left a comment

Collaborator

LGTM

@Jorjeous Jorjeous left a comment

Collaborator

Looks great. Waiting for the tests to finish, then this is ready to merge.

Comment thread sdp/processors/tts/merge_alignment_diarization.py
Comment thread sdp/processors/tts/metrics.py
Comment thread sdp/processors/tts/nemo_asr_align.py
Comment thread sdp/processors/tts/prepare_tts_segments.py
Comment thread sdp/processors/tts/pyannote.py
Comment thread sdp/processors/tts/split.py
Comment thread sdp/processors/tts/text.py
Comment thread .github/workflows/docker_pull.yml
Comment thread .github/workflows/tests.yml
Jorjeous and others added 9 commits May 7, 2025 05:54
Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>
@Jorjeous Jorjeous merged commit bca5d1f into main May 8, 2025
9 of 10 checks passed
Labels

enhancement New feature or request

5 participants