LRC files generated from ASS subtitles contain font / style information in the lyrics

When I update config.json like so:

```json
{
 ...,
 "output_condensed_subtitles": true,
 "condensed_subtitles_format": "lrc"
}
```

And then run `condenser.exe` on a folder that contains MP4 videos + ASS subtitles, I get lines that contain font + italic / bold style information instead of only plain text "lyrics".

An example from からかい上手の高木さん Season 2, Episode 1 in ASS form:

```ass
[Script Info]
Title: [Erai-raws] Teasing Master Takagi-San S02E01 sdh-jpn
ScriptType: v4.00+
WrapStyle: 0
PlayResX: 1280
PlayResY: 720
Video Zoom Percent: 1
Scroll Position: 0
Active Line: 0
ScaledBorderAndShadow: yes

[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding
Style: Default,Open Sans Semibold,45,&H00FFFFFF,&H000000FF,&H00020713,&H00000000,-1,0,0,0,100,100,0,0,1,1.7,0,2,10,10,15,1

[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text

Dialogue: 0,0:00:01.63,0:00:03.22,Default,,0,0,0,,（学校のチャイム）\N（西片(にしかた)）うん？　うん？
Dialogue: 0,0:00:03.34,0:00:05.68,Default,,0,0,0,,（西片）うん？　う〜ん？
Dialogue: 0,0:00:06.64,0:00:08.85,Default,,0,0,0,,いやいや　まさか…
...
```

And then the generated LRC file:

```lrc
[00:00.50]（学校のチャイム） （西片(にしかた)）うん？　うん？
[00:02.09]
[00:02.21]（西片）うん？　う〜ん？
[00:04.55]
[00:05.51]いやいや　まさか…
[00:07.72]
...
```

In music players like Musicolet on Android, this extra style information is displayed as literal text, which makes the lyrics difficult to read. Some players, such as mpv, do seem to interpret it:

![Image](https://github.com/user-attachments/assets/8e49a92d-cee6-4603-a399-f3ae9aef3b7c)

However, I think it would be better to output plain text for better LRC compatibility as the core format described here does not mention any styling support:

https://en.wikipedia.org/wiki/LRC_(file_format)#Core_format

For comparison, this is what からかい上手の高木さん Season 1, Episode 1 outputs from a typical SRT file:

```lrc
[00:00.50]（西片(にしかた)）フッフッフ…
[00:02.16]
[00:02.25]（田辺(たなべ)）アイアムハナコ
[00:03.33]
[00:04.29]ユーアーマイフレンド
[00:05.58]
[00:06.58]（西片）フッフッフ…
...
```

I'm not sure which part of the code is the best place to change this behavior. It could apply to all x -> SRT file conversions happening here:

https://github.com/ercanserteli/condenser/blob/f214d999e941a5392cc2ec02d5deb024ab52def5/condenser.py#L304

Or specifically when creating the LRC file:

https://github.com/ercanserteli/condenser/blob/f214d999e941a5392cc2ec02d5deb024ab52def5/condenser.py#L356

Or as an additional step across all subtitle output (strip style -> `condense_subtitles` -> output in whatever format is needed) in this block:

https://github.com/ercanserteli/condenser/blob/f214d999e941a5392cc2ec02d5deb024ab52def5/condenser.py#L323-L328
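If the stripping were done in Python rather than by ffmpeg, a rough sketch of such a step might look like the following. `strip_style_markup` is a hypothetical helper, not something that exists in `condenser.py`, and the tag list is an assumption based on the markup described on the SubRip Wikipedia page plus ASS override blocks such as `{\an8}`:

```python
import re

# Assumption: SubRip-style HTML-like tags (<font ...>, <b>, <i>, <u>, <s>) are
# the markup that ends up in the lyrics. Other tags are left untouched.
_HTML_TAG_RE = re.compile(r"</?(?:font|b|i|u|s)\b[^>]*>", re.IGNORECASE)
# Assumption: ASS override blocks such as {\an8} or {\i1} may also survive.
_ASS_OVERRIDE_RE = re.compile(r"\{\\[^}]*\}")


def strip_style_markup(text: str) -> str:
    """Return text with font/bold/italic tags and ASS override blocks removed."""
    text = _HTML_TAG_RE.sub("", text)
    text = _ASS_OVERRIDE_RE.sub("", text)
    return text


if __name__ == "__main__":
    # Illustrative input only, not copied from an actual generated file.
    sample = '<font face="Open Sans Semibold" size="45"><b>いやいや　まさか…</b></font>'
    print(strip_style_markup(sample))
```

Calling something like this on each line of text before `condense_subtitles` writes its output would cover every output format, whereas calling it only while building the LRC file would leave SRT output unchanged.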

SRT files aren't supposed to support style information, but running `ffmpeg.exe -i input.ass output.srt`, as the code does, seems to inject / maintain that info from an ASS file, and players like mpv do interpret it for display.

Whether that markup is respected seems to be a somewhat unofficial matter that depends on the application:

https://en.wikipedia.org/wiki/SubRip#Markup

But to most easily solve the conversion case at least, instead of:

`ffmpeg.exe -i input.ass output.srt`

```srt
1
00:00:01,630 --> 00:00:03,220
（学校のチャイム）
（西片(にしかた)）うん？　うん？

2
00:00:03,340 --> 00:00:05,680
（西片）うん？　う〜ん？

3
00:00:06,640 --> 00:00:08,850
いやいや　まさか…

...
```

in `convert_sub_if_needed`, it can be done with:

`ffmpeg.exe -i input.ass -c:s text output.srt`

```srt
1
00:00:01,630 --> 00:00:03,220
（学校のチャイム）
（西片(にしかた)）うん？　うん？

2
00:00:03,340 --> 00:00:05,680
（西片）うん？　う〜ん？

3
00:00:06,640 --> 00:00:08,850
いやいや　まさか…

...
```
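As a minimal sketch of how the second command could be invoked from Python: the function names below are hypothetical and this is not the actual body of `convert_sub_if_needed`. It only assumes that `ffmpeg` is reachable on the PATH (or that its path is passed in) and that the output path differs from the input path:

```python
import subprocess
from typing import List


def build_plain_srt_command(input_path: str, output_path: str,
                            ffmpeg: str = "ffmpeg") -> List[str]:
    """Build the ffmpeg argument list that converts subtitles to plain-text SRT."""
    # "-c:s text" selects ffmpeg's plain-text subtitle encoder, as in the
    # command shown above, so no style tags are written to the output.
    return [ffmpeg, "-i", input_path, "-c:s", "text", output_path]


def convert_to_plain_srt(input_path: str, output_path: str,
                         ffmpeg: str = "ffmpeg") -> None:
    """Run ffmpeg and raise CalledProcessError if the conversion fails."""
    subprocess.run(build_plain_srt_command(input_path, output_path, ffmpeg),
                   check=True)
```

Keeping the command construction separate from the `subprocess.run` call makes it easy to check the arguments without running ffmpeg.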

Perhaps regardless of input format, ffmpeg should be run in that way inside of `convert_sub_if_needed` so that it can strip style info from SRT files that contain it as well (instead of only stripping it when converting from non-SRT to SRT)?

    if output_condensed_subtitles:
        condensed_srt_path = op.splitext(output_filename)[0] + ".srt"
        condense_subtitles(periods, srt_path, condensed_srt_path)
        if condensed_subtitles_format == "lrc":
            srt_file_to_lrc(condensed_srt_path)
            os.remove(condensed_srt_path)
