Signed-off-by: Ryan <rlangman@nvidia.com>
Great work! Please add two end-to-end tests for the two new pipelines and mention the new processors in
https://github.com/NVIDIA/NeMo-speech-data-processor/blob/main/docs/src/sdp/api.rst
Since we usually skip the download processor in end-to-end tests, you can just test the RemovedFailedChapters processor in a unit test.
I tried to add an end-to-end test and more documentation. Let me know if there are any issues, or if I need to upload test data. I also added a generic bandwidth estimation script, which was used when creating the dataset (running it during the dataset download is not needed).
"/home/runner/work/NeMo-speech-data-processor/NeMo-speech-data-processor/test_data/english/hifitts2/manifest_filtered_22khz.json"
Ways to solve:
Docs are OK; the errors are expected because the pages do not exist yet.
Also lacking files:
Could you please rewrite the class descriptions in this format (with a code block containing an example at the end)? class CreateInitialManifestYTC(BaseParallelProcessor):
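As a rough illustration of the requested docstring layout (a description, then arguments, then a code block with an example at the end), here is a minimal sketch. Everything in it is an assumption made for illustration: `BaseProcessorStub`, `ExampleDownloadProcessor`, `my_package`, and all parameter names are hypothetical and are not taken from this PR or from the reviewer's original example.

```python
class BaseProcessorStub:
    """Hypothetical stand-in for a real base class so the sketch runs alone."""

    def __init__(self, output_manifest_file):
        self.output_manifest_file = output_manifest_file


class ExampleDownloadProcessor(BaseProcessorStub):
    """Downloads audio for each manifest entry (hypothetical processor).

    Args:
        audio_dir (str): Directory where downloaded audio is written.
        num_retries (int): How many times to retry a failed download.

    Returns:
        A manifest with one entry per successfully downloaded file.

    Example:
        .. code-block:: yaml

            - _target_: my_package.ExampleDownloadProcessor
              audio_dir: ${workspace_dir}/audio
              num_retries: 3
              output_manifest_file: ${workspace_dir}/manifest.json
    """

    def __init__(self, audio_dir, num_retries=3, **kwargs):
        # Remaining keyword arguments are forwarded to the base class.
        super().__init__(**kwargs)
        self.audio_dir = audio_dir
        self.num_retries = num_retries
```

Keeping the example as the last docstring section means the rendered API page ends with a snippet that can be copied into a config.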
Jorjeous left a comment
Once the comments above are addressed, everything is good to go!
Nice!
Added.
I have set up the data on S3 so that the test passes when I run it locally. I will see whether it passes in the automated PR tests.
Adds processors needed to download HiFiTTS-2. The input to the processors will be two files (e.g. manifest_22khz and chapters_22khz) that the user downloads from another location.
Example command:
This PR also contains a generic processor to estimate the bandwidth of audio, which was used in creating HiFiTTS-2. It is not part of the downloading pipeline itself, as the bandwidth is already precomputed and provided in the dataset manifest.
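For readers unfamiliar with the idea, bandwidth estimation can be sketched as follows. This is a minimal illustration and not the processor added in this PR: the function name `estimate_bandwidth`, the frame size, the hop, and the -50 dB threshold are all assumptions chosen for the example. It averages the power spectrum over windowed frames and reports the highest frequency whose power is within the threshold of the strongest bin.

```python
import numpy as np


def estimate_bandwidth(signal, sample_rate, n_fft=512, hop=256, threshold_db=-50.0):
    """Return the highest frequency (Hz) whose mean power lies within
    `threshold_db` of the spectral peak. All defaults are illustrative."""
    signal = np.asarray(signal, dtype=np.float64)
    if len(signal) < n_fft:
        # Zero-pad short inputs so at least one full frame exists.
        signal = np.pad(signal, (0, n_fft - len(signal)))

    # Split into overlapping Hann-windowed frames.
    window = np.hanning(n_fft)
    starts = range(0, len(signal) - n_fft + 1, hop)
    frames = np.stack([signal[s:s + n_fft] * window for s in starts])

    # Power spectrum per frame, averaged over time.
    mean_power = (np.abs(np.fft.rfft(frames, axis=1)) ** 2).mean(axis=0)
    peak = mean_power.max()
    if peak == 0.0:
        return 0.0  # silent input has no measurable bandwidth

    # Power in dB relative to the strongest bin; the floor avoids log10(0).
    power_db = 10.0 * np.log10(np.maximum(mean_power / peak, 1e-20))

    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    above = np.nonzero(power_db > threshold_db)[0]
    return float(freqs[above[-1]])


if __name__ == "__main__":
    sr = 22050
    t = np.arange(sr) / sr
    # A signal with energy only around 1 kHz and 8 kHz.
    audio = np.sin(2 * np.pi * 1000 * t) + np.sin(2 * np.pi * 8000 * t)
    print(estimate_bandwidth(audio, sr))
```

A relative threshold makes the estimate independent of the recording level, which is one reason such a value can be computed once and stored in a manifest rather than recomputed at download time.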
Example command: