
Improve LRC output by removing SRT styles #21

Open
v2lmmj04 wants to merge 1 commit into ercanserteli:master from v2lmmj04:strip-subtitle-styles

@v2lmmj04

This addresses #20.

ASS files, and even SRT files, can contain styling or markup that should not end up in the final output when exporting subtitles (especially LRC files).

This updates the ffmpeg command used when converting subtitles so that it outputs a plain-text format with all styling removed. As a result, SRT files are now always "converted" as well, since they may also contain styling that needs to be removed.
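A minimal sketch of what such a conversion command could look like, assuming ffmpeg's built-in `text` subtitle encoder (`-c:s text`), which writes the subtitle text without styling. The PR does not quote its exact command here, so the argument list and the `build_plain_text_sub_command` helper are hypothetical; check `ffmpeg -encoders` on your build before relying on it.

```python
import shlex
from pathlib import Path


def build_plain_text_sub_command(src: Path, dst: Path) -> list[str]:
    """Hypothetical helper: build an ffmpeg argument list that re-encodes a
    subtitle file (ASS, SRT, ...) as SRT using the plain "text" encoder,
    assumed here to drop styling such as <font> tags and {\\an8} overrides."""
    return [
        "ffmpeg",
        "-y",              # overwrite dst if it already exists
        "-i", str(src),    # input subtitle file
        "-c:s", "text",    # plain-text subtitle encoder (assumption: strips styling)
        str(dst),          # the .srt extension makes ffmpeg pick the SRT muxer
    ]


if __name__ == "__main__":
    cmd = build_plain_text_sub_command(Path("episode01.ass"), Path("episode01.srt"))
    print(shlex.join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

Keeping the argument list separate from running it makes the command easy to inspect and test, and passing a list (rather than a shell string) to `subprocess.run` avoids quoting problems with non-ASCII file names like the ones used below.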

It may not cover cases where `convert_sub_if_needed` is not called, such as the `condense_multi` -> `extract_srt` code path, which seems to skip this function.

I tested this functionally by running `.\make_exe.bat`, updating the config as described in the issue, and dragging a folder onto the new executable, for three cases:

  • SRT file with no styling (陰の実力者になりたくて! Season 2, Episode 1)

    1
    00:00:09,225 --> 00:00:15,231
    (走る息遣い)
    
    2
    00:00:19,902 --> 00:00:22,905
    ハァハァ…。
    
    3
    00:00:22,905 --> 00:00:27,209
    (デルタ)アオーン!
    
    4
    00:00:40,256 --> 00:00:43,092
    (ベータ)先日のブシン祭の一件以来➡
    
    ...

    Resulting LRC:

    [00:00.50]ハァハァ…。
    [00:03.50]
    [00:03.50](デルタ)アオーン!
    [00:07.80]
    [00:08.80](ベータ)先日のブシン祭の一件以来➡
    
    ...
    
  • SRT file with styling (陰の実力者になりたくて! Season 1, Episode 1)

    1
    00:00:07,435 --> 00:00:12,482
    <font color="japanese">(目覚まし時計のアラーム)</font>
    
    2
    00:00:12,565 --> 00:00:15,110
    <font color="japanese">(アカネ)んっ んん…</font>
    
    3
    00:00:19,197 --> 00:00:20,323
    <font color="japanese">(アラームを止める音)</font>
    
    4
    00:00:20,407 --> 00:00:22,158
    <font color="japanese">(アカネ)ああ…</font>
    
    5
    00:00:23,326 --> 00:00:25,662
    <font color="japanese">そのまま寝ちゃったのか</font>
    
    ...

    Resulting LRC:

    [00:00.50](アカネ)んっ んん…
    [00:03.04]
    [00:04.04](アカネ)ああ…
    [00:05.79]
    [00:06.79]そのまま寝ちゃったのか
    
    ...
    
  • ASS file with styling (からかい上手の高木さん Season 2, Episode 1):

    [Script Info]
    Title: [Erai-raws] Teasing Master Takagi-San S02E01 sdh-jpn
    ScriptType: v4.00+
    WrapStyle: 0
    PlayResX: 1280
    PlayResY: 720
    Video Zoom Percent: 1
    Scroll Position: 0
    Active Line: 0
    ScaledBorderAndShadow: yes
    
    [V4+ Styles]
    Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding
    Style: Default,Open Sans Semibold,45,&H00FFFFFF,&H000000FF,&H00020713,&H00000000,-1,0,0,0,100,100,0,0,1,1.7,0,2,10,10,15,1
    
    [Events]
    Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
    
    Dialogue: 0,0:00:01.63,0:00:03.22,Default,,0,0,0,,(学校のチャイム)\N(西片(にしかた))うん? うん?
    Dialogue: 0,0:00:03.34,0:00:05.68,Default,,0,0,0,,(西片)うん? う〜ん?
    Dialogue: 0,0:00:06.64,0:00:08.85,Default,,0,0,0,,いやいや まさか…
    Dialogue: 0,0:00:09.52,0:00:11.14,Default,,0,0,0,,(高木)ねえ 西片\N(西片)ウッ…
    Dialogue: 0,0:00:12.81,0:00:14.10,Default,,0,0,0,,(高木)何してんの?
    Dialogue: 0,0:00:14.27,0:00:18.57,Default,,0,0,0,,な… なに? 高木さん 別に何も
    
    ...

    Resulting LRC:

    [00:00.50](学校のチャイム) (西片(にしかた))うん? うん?
    [00:02.09]
    [00:02.21](西片)うん? う〜ん?
    [00:04.55]
    [00:05.51]いやいや まさか…
    [00:07.72]
    [00:08.39](高木)ねえ 西片 (西片)ウッ…
    [00:10.01]
    [00:11.01](高木)何してんの?
    [00:12.30]
    [00:12.47]な… なに? 高木さん 別に何も
    
    ...
    

This was to confirm that the LRC output was as expected and that SRT -> SRT conversion didn't cause any issues. All three were video + external subtitle file cases.
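The LRC lines above follow a simple shape: a `[mm:ss.xx]` timestamp plus the cue text at the cue's start, then an empty timestamped line at the cue's end, with multi-line cues joined by a space (as with the `\N` break in the ASS example). A minimal sketch of that formatting, assuming cue times have already been mapped onto the condensed timeline (the tool's own offset and padding logic is not reproduced); `lrc_timestamp` and `cues_to_lrc` are hypothetical names.

```python
def lrc_timestamp(seconds: float) -> str:
    """Format a time in seconds as an LRC tag [mm:ss.xx] (hundredths)."""
    total = round(seconds * 100)              # whole hundredths of a second
    minutes, rest = divmod(total, 60 * 100)
    secs, hundredths = divmod(rest, 100)
    return f"[{minutes:02d}:{secs:02d}.{hundredths:02d}]"


def cues_to_lrc(cues):
    """Hypothetical helper: turn (start_s, end_s, text) cues into LRC lines.

    Each cue yields its text at the start time and an empty tag at the end
    time. Line breaks are assumed to already be real newlines; the pieces are
    joined with a single space. Times are used as given (no condensing).
    """
    lines = []
    for start, end, text in cues:
        parts = [p.strip() for p in text.splitlines() if p.strip()]
        lines.append(lrc_timestamp(start) + " ".join(parts))
        lines.append(lrc_timestamp(end))
    return lines


if __name__ == "__main__":
    cues = [(2.66, 4.87, "(フリーレン)\n王都が見えてきたね")]
    print("\n".join(cues_to_lrc(cues)))
```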

I also noticed that lines containing only symbols, like these from 葬送のフリーレン Season 1, Episode 1, now seem to get removed:

    1
    00:00:00,522 --> 00:00:05,110
    {\an8}♪~

    2
    00:01:25,648 --> 00:01:29,903
    {\an8}~♪

    3
    00:01:50,006 --> 00:01:52,926
    (馬車の進行音)

    4
    00:01:59,891 --> 00:02:01,059
    (ヒンメル)フリーレン

    5
    00:02:08,233 --> 00:02:10,443
    {\an8}(フリーレン)
    王都が見えてきたね

    Resulting LRC:

    [00:00.50](ヒンメル)フリーレン
    [00:01.66]
    [00:02.66](フリーレン) 王都が見えてきたね
    [00:04.87]

    ...

Previously, the SRT file was used directly, which produced:

    [00:00.50]{\an8}♪~
    [00:05.08]
    [00:06.08]{\an8}~♪
    [00:10.34]
    [00:11.34](ヒンメル)フリーレン
    [00:12.51]
    [00:13.51]{\an8}(フリーレン) 王都が見えてきたね

    ...

This happened even though I didn't modify the `filtered_characters` entry in `config.json`, which should already have been filtering out those lines (but wasn't, possibly because of the `{\an8}` formatting on the same line?):

"filtered_characters": "\u2669\u266a\u266b\u266c\uff5e\u301c",
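One way to read that guess: if the filter drops a line only when every non-whitespace character in it appears in `filtered_characters` (an assumption; the project's actual check is not shown here), then a leftover `{\an8}` tag is enough to keep a symbol-only line alive. `is_filtered_only` is a hypothetical helper, and the sample uses the fullwidth tilde U+FF5E (～), which is in the list; a plain ASCII `~` is not.

```python
import json

# filtered_characters as it appears in config.json (JSON \u escapes).
CONFIG = json.loads('{"filtered_characters": "\\u2669\\u266a\\u266b\\u266c\\uff5e\\u301c"}')
FILTERED = set(CONFIG["filtered_characters"])  # ♩ ♪ ♫ ♬ ～ 〜


def is_filtered_only(line: str, filtered: set) -> bool:
    """Hypothetical check: True when the line has at least one non-space
    character and every non-space character is in `filtered`."""
    chars = [c for c in line if not c.isspace()]
    return bool(chars) and all(c in filtered for c in chars)


styled = "{\\an8}\u266a\uff5e"  # {\an8}♪～ : the tag's characters are not in the set
plain = "\u266a\uff5e"          # ♪～ : the same line once the tag is stripped

if __name__ == "__main__":
    print(is_filtered_only(styled, FILTERED))  # False: the line would be kept
    print(is_filtered_only(plain, FILTERED))   # True: the line would be dropped
```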

Lines that contain `{\an8}`-style tags alongside regular text are still preserved, but with only the regular text (the `{\an8}` tag is removed).
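The PR gets this behavior from the ffmpeg conversion. To illustrate what is stripped and what survives, here is a different technique, a small regex pass that removes HTML-like tags (as in the `<font>` sample) and ASS override blocks such as `{\an8}` while keeping the text. `strip_styles` is hypothetical and only covers those two tag shapes; something like it could act as a fallback for paths such as `extract_srt` that skip conversion, but that is an assumption, not part of this PR.

```python
import re

# HTML-like tags used in SRT files, e.g. <font color="japanese"> ... </font>.
HTML_TAG = re.compile(r"</?[A-Za-z][^>]*>")
# ASS override blocks, e.g. {\an8} (top-center positioning) or {\i1}.
ASS_OVERRIDE = re.compile(r"\{\\[^}]*\}")


def strip_styles(text: str) -> str:
    """Hypothetical helper: drop styling markup but keep the subtitle text."""
    text = ASS_OVERRIDE.sub("", text)
    text = HTML_TAG.sub("", text)
    # Turn ASS hard line breaks (a literal backslash + N) into real newlines.
    return text.replace("\\N", "\n")


if __name__ == "__main__":
    print(strip_styles('<font color="japanese">(アカネ)ああ…</font>'))  # (アカネ)ああ…
    print(strip_styles("{\\an8}(フリーレン)"))                          # (フリーレン)
```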
