Validate chapter duration in HiFiTTS-2 download #129

Merged
Jorjeous merged 2 commits into main from hifitts2_duration on Jun 17, 2025
@rlangman

Collaborator

There is an issue when downloading HiFiTTS-2: some audiobook chapters have been updated or edited since the dataset was first created, which causes the utterances in those chapters to be misaligned.

To fix this, the full duration of each chapter has been added to the dataset metadata. If a downloaded chapter file has a different duration than it had when the dataset was created, it is marked as an error and removed from the dataset.
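For illustration, the check described above could look roughly like the sketch below. This is not the code from this PR: the function name, the result type, and the `tolerance_sec` parameter are hypothetical, and the small tolerance is an assumption to absorb rounding in the stored duration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChapterCheck:
    """Outcome of validating one downloaded chapter (hypothetical type)."""
    ok: bool
    error: Optional[str] = None


def validate_chapter_duration(
    expected_sec: float,
    actual_sec: float,
    tolerance_sec: float = 0.1,  # assumed tolerance, not taken from the PR
) -> ChapterCheck:
    """Compare the downloaded chapter's duration with the duration in the metadata."""
    diff = abs(actual_sec - expected_sec)
    if diff > tolerance_sec:
        # The chapter was likely re-edited upstream, so utterance offsets no longer line up.
        return ChapterCheck(
            ok=False,
            error=f"duration mismatch: expected {expected_sec:.2f}s, got {actual_sec:.2f}s",
        )
    return ChapterCheck(ok=True)
```

A chapter whose check fails would then be reported as a download error and its utterances dropped, as described above.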

After adding this, I also found a few other random errors that can happen during download (e.g. http.client.RemoteDisconnected), so I removed the "error_code" field from the output, since it is unique to HTTPError.
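As a rough sketch of the idea (again, not the code in this PR), a generic error record that works for any exception type might look like this. The `download_with_status` helper and the output field names are hypothetical.

```python
import http.client
from typing import Callable, Dict


def download_with_status(download_fn: Callable[[str], None], url: str) -> Dict[str, object]:
    """Run a download callable and report success or a generic error record."""
    try:
        download_fn(url)
    except Exception as exc:
        # No "error_code" here: only HTTPError carries a status code, while
        # errors such as http.client.RemoteDisconnected do not.
        return {
            "url": url,
            "download_success": False,
            "error_type": type(exc).__name__,
            "error_msg": str(exc),
        }
    return {"url": url, "download_success": True}


def _always_disconnects(url: str) -> None:
    # Stand-in for a real download that loses its connection.
    raise http.client.RemoteDisconnected("Remote end closed connection without response")


result = download_with_status(_always_disconnects, "https://example.invalid/chapter.mp3")
```

Recording the exception type name and message keeps the output schema identical across error kinds.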

Signed-off-by: Ryan <rlangman@nvidia.com>
@rlangman rlangman requested review from Jorjeous and karpnv June 13, 2025 23:13
Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>
@Jorjeous

Collaborator

LGTM

@Jorjeous Jorjeous left a comment

Collaborator

1) HiFiTTS part looks good
2) Added installation in the Dockerfile to prevent cascading errors

LGTM

@Jorjeous Jorjeous merged commit 22a6bfe into main Jun 17, 2025
10 checks passed