[Bug] Incorrect Timestamps in Subtitles for YouTube Video #247
Open
Description
I've encountered an issue with incorrect timestamps in the subtitles generated for a YouTube video using the speech_processor tool.
Steps to Reproduce
- Prepare an input JSON file named xjEFo3a1AnI_input.json with the following content:
[
{
"integration": "youtube",
"id": "xjEFo3a1AnI",
"language_code": "en-US",
"resource_id": 7231,
"recognizer": "google",
"captions": "try"
}
]
Note: The YouTube video in question exists and has manual captions in English (US).
- Run the speech_processor with the following command:
LOG_OUTPUT=standard MAX_THREADS=8 INPUT_FILE='xjEFo3a1AnI_input.json' SPEECH_ENV='development' SUBS_LOCATION='file' python3 .
- After the process completes, inspect the generated subtitles in resources/subtitles/development.
$ cd resources/subtitles/development
$ grep -n "1038" 123145579446272-7231-11-11.18:03:26723463-subs.srt | cut -d: -f1 | xargs -I {} awk 'NR>={}-5 && NR<={}+5' 123145579446272-7231-11-11.18:03:26723463-subs.srt
Observed Behavior
The timestamps in the generated subtitles file 123145579446272-7231-11-11.18:03:26723463-subs.srt are incorrect. For instance, the following excerpt shows an issue with the timestamps:
1037
00:00:02,731 --> 00:00:02,736
Usually, for depression, I
want to see at least greater
1038
00:00:02,736 --> 00:00:02,739
than probably 0.8 minimal.
1039
00:00:02,739 --> 00:00:02,744
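The cues in the excerpt above each last only 3 to 5 ms. As a quick way to find every affected cue in a file, here is a minimal Python sketch that flags cues shorter than a threshold. It is not part of speech_processor; the helper names `to_ms` and `short_cues` and the 200 ms default threshold are assumptions chosen for illustration.

```python
import re

# Matches an SRT timing line such as "00:00:02,731 --> 00:00:02,736".
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


def to_ms(hours, minutes, seconds, millis):
    """Convert the four captured timestamp fields to milliseconds."""
    return ((int(hours) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + int(millis)


def short_cues(srt_text, min_ms=200):
    """Return (start_ms, end_ms) for each cue shorter than min_ms.

    min_ms is a hypothetical threshold, not a value taken from the tool.
    """
    flagged = []
    for match in TIMING.finditer(srt_text):
        fields = match.groups()
        start, end = to_ms(*fields[:4]), to_ms(*fields[4:])
        if end - start < min_ms:
            flagged.append((start, end))
    return flagged


# The excerpt from this report.
excerpt = """1037
00:00:02,731 --> 00:00:02,736
Usually, for depression, I
want to see at least greater
1038
00:00:02,736 --> 00:00:02,739
than probably 0.8 minimal.
1039
00:00:02,739 --> 00:00:02,744
"""

print(short_cues(excerpt))
```

To scan the whole generated file, read it into a string and pass it to `short_cues` in place of `excerpt`.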
Expected Behavior
The timestamps in the subtitles should accurately reflect the timing of the spoken words in the video.
Additional Information
Environment: Development
Tool Version: v3.0.0
Python Version: Python 3.10.0
Operating System: macOS 14.1.1