24 changes: 12 additions & 12 deletions examples/tutorials/ctc_forced_alignment_api_tutorial.py
@@ -57,7 +57,7 @@
import torchaudio.functional as F

######################################################################
# First we prepare the speech data and the transcript we area going
# First we prepare the speech data and the transcript we are going
# to use.
#

@@ -71,15 +71,15 @@
# ~~~~~~~~~~~~~~~~~~~~
#
# :py:func:`~torchaudio.functional.forced_align` takes emission and
# token sequences and outputs timestaps of the tokens and their scores.
# token sequences and outputs timestamps of the tokens and their scores.
#
# Emission reperesents the frame-wise probability distribution over
# Emission represents the frame-wise probability distribution over
# tokens, and it can be obtained by passing a waveform to an acoustic
# model.
#
# Tokens are numerical expressions of transcripts. There are many ways to
# tokenize transcripts, but here, we simply map each character to an integer,
# which is how labels were constructed when the acoustice model we are
# which is how labels were constructed when the acoustic model we are
# going to use was trained.
#
# We will use a pre-trained Wav2Vec2 model,
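The character-to-integer tokenization described in this hunk can be sketched as follows. The label set here is hypothetical (index 0 reserved for the blank token, then the lowercase alphabet); the real dictionary comes from the labels the acoustic model was trained with, not from this sketch.

```python
# Hypothetical CTC label set: index 0 is the blank token "-", followed by
# the lowercase alphabet. A real dictionary would come from the model's
# training labels.
LABELS = ["-"] + [chr(c) for c in range(ord("a"), ord("z") + 1)]
DICTIONARY = {char: idx for idx, char in enumerate(LABELS)}

def tokenize(transcript):
    """Map each character of a transcript to its integer label."""
    return [DICTIONARY[char] for char in transcript]

print(tokenize("act"))  # [1, 3, 20]
```

The token list produced this way is what gets passed, along with the emission, to the forced-alignment call.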
@@ -161,7 +161,7 @@ def align(emission, tokens):
#
# .. note::
#
# The alignment is expressed in the frame cordinate of the emission,
# The alignment is expressed in the frame coordinate of the emission,
# which is different from the original waveform.
#
# It contains blank tokens and repeated tokens. The following is the
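Because the alignment lives in the emission's frame coordinate, mapping a frame index back to a time in the waveform needs the ratio of waveform samples to emission frames. A minimal sketch (the numbers below are illustrative, not taken from the tutorial's audio):

```python
def frame_to_seconds(frame, num_frames, num_samples, sample_rate):
    """Convert an emission frame index to a time in the original waveform."""
    samples_per_frame = num_samples / num_frames  # approximate model stride
    return frame * samples_per_frame / sample_rate

# e.g. one second of 16 kHz audio (16000 samples) yielding 50 emission frames
print(frame_to_seconds(25, num_frames=50, num_samples=16000, sample_rate=16000))  # 0.5
```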
@@ -184,7 +184,7 @@ def align(emission, tokens):
#
# .. note::
#
# When same token occured after blank tokens, it is not treated as
# When same token occurred after blank tokens, it is not treated as
# a repeat, but as a new occurrence.
#
# .. code-block::
@@ -200,7 +200,7 @@ def align(emission, tokens):
# Token-level alignments
# ~~~~~~~~~~~~~~~~~~~~~~
#
# Next step is to resolve the repetation, so that each alignment does
# Next step is to resolve the repetition, so that each alignment does
# not depend on previous alignments.
# :py:func:`torchaudio.functional.merge_tokens` computes the
# :py:class:`~torchaudio.functional.TokenSpan` object, which represents
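Conceptually, this merge step collapses consecutive repeats of a token into one span and drops blanks, while the same token reappearing after a blank starts a new span. The pure-Python sketch below illustrates the idea only; `torchaudio.functional.merge_tokens` is the actual API and additionally aggregates per-frame scores.

```python
def merge(aligned_tokens, blank=0):
    """Illustrative collapse of frame-wise tokens into (token, start, end) spans."""
    spans, i = [], 0
    while i < len(aligned_tokens):
        j = i
        while j < len(aligned_tokens) and aligned_tokens[j] == aligned_tokens[i]:
            j += 1  # extend over consecutive repeats of the same token
        if aligned_tokens[i] != blank:
            spans.append((aligned_tokens[i], i, j))  # blanks are dropped
        i = j
    return spans

# token 7 repeats, then reappears after a blank: two separate occurrences
print(merge([0, 7, 7, 0, 7, 5]))  # [(7, 1, 3), (7, 4, 5), (5, 5, 6)]
```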
@@ -352,7 +352,7 @@ def plot_alignments(waveform, token_spans, emission, transcript, sample_rate=bun
#
# When splitting the token-level alignments into words, you will
# notice that some blank tokens are treated differently, and this makes
# the interpretation of the result somehwat ambigious.
# the interpretation of the result somewhat ambiguous.
#
# This is easy to see when we plot the scores. The following figure
# shows word regions and non-word regions, with the frame-level scores
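The word-level split this hunk refers to can be sketched by partitioning the flat list of token spans according to the word lengths of the transcript. The helper below is hypothetical (mirroring the tutorial's grouping step), and the strings stand in for `TokenSpan` objects:

```python
def unflatten(token_spans, word_lengths):
    """Split a flat list of token spans into per-word groups."""
    groups, i = [], 0
    for n in word_lengths:
        groups.append(token_spans[i:i + n])
        i += n
    return groups

transcript = "i had".split()
spans = ["i", "h", "a", "d"]  # stand-ins for TokenSpan objects
print(unflatten(spans, [len(word) for word in transcript]))  # [['i'], ['h', 'a', 'd']]
```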
@@ -387,7 +387,7 @@ def plot_scores(word_spans, scores):
#
# One reason for this is because the model was trained without a
# label for the word boundary. The blank tokens are treated not just
# as repeatation but also as silence between words.
# as repetition but also as silence between words.
#
# But then, a question arises. Should frames immediately after or
# near the end of a word be silent or repeat?
@@ -400,12 +400,12 @@ def plot_scores(word_spans, scores):
#
# Unfortunately, CTC does not provide a comprehensive solution to this.
# Models trained with CTC are known to exhibit "peaky" response,
# that is, they tend to spike for an aoccurance of a label, but the
# that is, they tend to spike for an occurrence of a label, but the
# spike does not last for the duration of the label.
# (Note: Pre-trained Wav2Vec2 models tend to spike at the beginning of
# label occurances, but this not always the case.)
# label occurrences, but this is not always the case.)
#
# :cite:`zeyer2021does` has in-depth alanysis on the peaky behavior of
# :cite:`zeyer2021does` has in-depth analysis on the peaky behavior of
# CTC.
# We encourage those who are interested in understanding more to refer
# to the paper.