From efa63f7af0c8f4d726fbcfbbf7fb97f47f8b29eb Mon Sep 17 00:00:00 2001
From: James Craig /**/).
In this example, an optional WebVTT attributes object is used to define the source language and its label in a subtitle/caption selection menu.
+
+WEBVTT
+
+ATTRIBUTES
+kind: subtitles
+srclang: es-mx
+label: Español
+
+NOTE
+Standard subtitles (unlike CC or SDH captions) typically
+translate spoken dialog or signage, but not audible sounds
+effects like "dogs barking."
+
+1
+00:00:10.123 --> 00:00:15.432
+¡Hola! ¿Qué tál?
+
+
+
In this example, an optional WebVTT attributes object is used to differentiate captions from standard subtitles.
+
+WEBVTT
+
+ATTRIBUTES
+kind: captions
+srclang: es-mx
+label: Español (SDH)
+
+NOTE
+Captions (SDH aka Subtitles for the Deaf and Hard-of-Hearing)
+typically include spoken dialog as well as important audible
+sounds such as "floor boards creak", "dogs barking", or in
+this case, "music".
+
+1
+00:00:10.123 --> 00:00:15.432
+¡Hola! ¿Qué tál?
+
+2
+00:00:47.462 --> 00:01:04.028
+[♫ música ♫]
+
+
+
This section is non-normative.
@@ -658,6 +714,32 @@ CSS comment (e.g./**/).
+
+
+In this example, a WebVTT attributes object is used to indicate the text track cues represent video descriptions for the blind. Unlike subtitles or captions, these are not intended to be rendered visually.
+
+WEBVTT
+
+ATTRIBUTES
+kind: descriptions
+srclang: en-us
+label: English (AD)
+
+NOTE
+VTT-based descriptions are meant to render as text-to-speech audio or braille,
+for blind or deafblind audiences, not usually as visual captions on screen.
+As such, the option/label might be displayed in an audio menu or elsewhere.
+
+1
+00:00:10.123 --> 00:00:15.432
+A young girl tiptoes down a dark hallway.
+
+
+
This section is non-normative.
@@ -671,11 +753,14 @@ signifies the end of the WebVTT cue.In this example, a talk is split into each slide being a chapter.
+In this example, topics mentioned in a talk are provided as URLs for reference.
WEBVTT + ATTRIBUTES + kind: metadata + NOTE Thanks to http://output.jsbin.com/mugibo @@ -704,6 +789,28 @@ signifies the end of the WebVTT cue.
In this example, a sequence of video thumbnails and their text alternative are made available for the playback UI.
+
+WEBVTT
+
+ATTRIBUTES
+kind: metadata
+type: video-thumbnails
+
+00:00:01.959 --> 00:00:02.938
+{
+ "src": "https://cdn.example.com/thumbnails.jpg#xywh=0,0,284,160",
+ "alt": {
+ "en-us": "Miguel crosses the marigold bridge to the land of the dead.",
+ "es-mx": "Miguel cruza el puente marigold hacia la tierra de los muertos."
+ }
+}
+
+
+When interpreted as a number, a WebVTT percentage must be in the range 0..100.
+A WebVTT attributes object consists of the following components, in the given order:
+ATTRIBUTES".:").-->" (U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS, U+003E GREATER-THAN SIGN).A WebVTT comment block consists of the following components, in the given order:
WebVTT metadata text cues are only useful for scripted applications (e.g. using the +
WebVTT metadata text cues were originally intended for scripted applications (e.g. using the
metadata text track kind in a HTML text track).
:").-->" (U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS, U+003E GREATER-THAN SIGN).-->" (U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS, U+003E GREATER-THAN SIGN).text/vtt. [[!RFC3629]]
[a-zA-Z_][0-9a-zA-Z_]*).:").-->" (U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS, U+003E GREATER-THAN SIGN).text/vtt. [[!RFC3629]]
When interpreted as a number, a WebVTT percentage must be in the range 0..100.
-A WebVTT attributes object consists of the following components, in the given order:
+A WebVTT attributes block consists of the following components, in the given order:
ATTRIBUTES".[a-zA-Z_][0-9a-zA-Z_]*).[A-Za-z_][0-9A_Za-z_]*:
+ :").-->" (U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS, U+003E GREATER-THAN SIGN).Process the WebVTT attributes block key/value pairs according to the WebVTT attributes key/value parsing rules.
+A WebVTT comment block consists of the following components, in the given order:
@@ -4266,6 +4287,47 @@ follows:The WebVTT attributes key/value parsing rules consist of the following algorithm.
+ +kind"srclang"label"type" (TODO: For clarity, should this be "subkind" or "kind_subtype" instead?)The WebVTT type attribute parsing rules consist of the following algorithm.
+ +[A-Za-z_][0-9A_Za-z_]*:
- [A-Za-z_][0-9A_Za-z_]*)
+ :").label"label"This section describes in some detail how to visually render WebVTT caption or
From d57f19c0facce4e16c0adf7f8eea43926dbfbfc6 Mon Sep 17 00:00:00 2001
From: James Craig In this example, a WebVTT attributes object is used to indicate the text track cues represent video descriptions for the blind. Unlike subtitles or captions, these are not intended to be rendered visually. In this example, a WebVTT attributes object is used to indicate the text track cues represent descriptions for the blind. Unlike subtitles or captions, these are not intended to be rendered visually. In this example, a WebVTT attributes object is used to indicate the text track cues represent descriptions for the blind. Unlike subtitles or captions, these are not intended to be rendered visually. In this example, a WebVTT attributes object is used to indicate the text track cues represent audible or braille descriptions for the blind. Unlike subtitles or captions, these are not intended to be rendered visually. The Timed Text Working Group is discussing a registry for metadata The WebVTT attributes key/value parsing rules consist of the following algorithm. This section is non-normative. WebVTT supports an Attributes block to provide additional information about the rendered text track, and to allow disambiguation of metadata tracks. In this example, an optional WebVTT attributes object is used to define the source language and its label in a subtitle/caption selection menu. In this example, an optional WebVTT attributes object is used to differentiate captions from standard subtitles. In this example, a WebVTT attributes object is used to indicate the text track cues represent audible or braille descriptions for the blind. Unlike subtitles or captions, these are not intended to be rendered visually. This section is non-normative. WebVTT also supports some less-often used features. In this example, the cues have an identifier: In this example, an optional WebVTT attributes object is used to define the source language and its label in a subtitle/caption selection menu. 
In this example, an optional WebVTT attributes object is used to differentiate captions from standard subtitles. This section is non-normative. In this example, a WebVTT attributes object is used to indicate the text track cues represent audible or braille descriptions for the blind. Unlike subtitles or captions, these are not intended to be rendered visually. This section is non-normative. The Timed Text Working Group is discussing a registry for metadata The Timed Text Working Group is discussing a registry for metadata All diagrams, examples, and notes in this specification are non-normative, as are all sections
@@ -1800,7 +1807,7 @@ SIGN).
-Process the WebVTT attributes block key/value pairs according to the WebVTT attributes key/value parsing rules.
+Process the WebVTT attributes block key/value pairs according to the WebVTT rules for parsing attribute key/value pairs.
A WebVTT comment block consists of the following components, in the given order:
WEBVTT
From 71a249c23482dd0ed8e44cacc2c6b79e90efe4bd Mon Sep 17 00:00:00 2001
From: James Craig
WEBVTT
ATTRIBUTES
kind: descriptions
-srclang: en-us
+lang: en-us
label: English (AD)
NOTE
@@ -798,11 +798,6 @@ WEBVTT
ATTRIBUTES
kind: metadata
-NOTE
-The Timed Text Working Group is discussing a registry for metadata `type`
-values, such as `type: video-thumbnails` or `type: video-flash-avoidance`.
-See webvtt issues #511 and #512 for more info.
-
00:00:01.959 --> 00:00:02.938
{
"src": "https://cdn.example.com/thumbnails.jpg#xywh=0,0,284,160",
@@ -812,6 +807,11 @@ See webvtt issues #511 and #512 for more info.
}
}
+
+ type
+values, such as type: video-thumbnails or type: video-flash-avoidance.
+See WebVTT issues #511 and #512 for more info.kind"srclang"lang"label"WebVTT Attributes key/value Parsing Rules
@@ -4291,13 +4292,13 @@ follows:
How the attribute is processed depends on its key name, as follows:
-
kind"kind" (case-insensitive)lang"lang" (case-insensitive)label"label" (case-insensitive)/**/).
Attributes Block
+
+
+WEBVTT
+
+ATTRIBUTES
+kind: subtitles
+lang: es-mx
+label: Español
+
+NOTE
+Standard subtitles (unlike CC or SDH captions) typically
+translate spoken dialog or signage, but not audible sound
+effects like "dogs barking."
+
+1
+00:00:10.123 --> 00:00:15.432
+¡Hola! ¿Qué tál?
+
+
+
+WEBVTT
+
+ATTRIBUTES
+kind: captions
+lang: es-mx
+label: Español (SDH)
+
+NOTE
+Captions (SDH aka Subtitles for the Deaf and Hard-of-Hearing)
+typically include spoken dialog as well as important audible
+sounds such as "floor boards creak", "dogs barking", or in
+this case, "music".
+
+1
+00:00:10.123 --> 00:00:15.432
+¡Hola! ¿Qué tál?
+
+2
+00:00:47.462 --> 00:01:04.028
+[♫ música ♫]
+
+
+
+WEBVTT
+
+ATTRIBUTES
+kind: descriptions
+lang: en-us
+label: English (AD)
+
+NOTE
+VTT-based descriptions are meant to render as text-to-speech audio or braille,
+for blind or deafblind audiences, not usually as visual captions on screen.
+As such, the option/label might be displayed in an audio menu or elsewhere.
+
+1
+00:00:10.123 --> 00:00:15.432
+A young girl tiptoes down a dark hallway.
+
+
+Other caption and subtitling features
/**/).
-
-WEBVTT
-
-ATTRIBUTES
-kind: subtitles
-lang: es-mx
-label: Español
-
-NOTE
-Standard subtitles (unlike CC or SDH captions) typically
-translate spoken dialog or signage, but not audible sound
-effects like "dogs barking."
-
-1
-00:00:10.123 --> 00:00:15.432
-¡Hola! ¿Qué tál?
-
-
-
-WEBVTT
-
-ATTRIBUTES
-kind: captions
-lang: es-mx
-label: Español (SDH)
-
-NOTE
-Captions (SDH aka Subtitles for the Deaf and Hard-of-Hearing)
-typically include spoken dialog as well as important audible
-sounds such as "floor boards creak", "dogs barking", or in
-this case, "music".
-
-1
-00:00:10.123 --> 00:00:15.432
-¡Hola! ¿Qué tál?
-
-2
-00:00:47.462 --> 00:01:04.028
-[♫ música ♫]
-
-
-Comments in WebVTT
-WEBVTT
-
-ATTRIBUTES
-kind: descriptions
-lang: en-us
-label: English (AD)
-
-NOTE
-VTT-based descriptions are meant to render as text-to-speech audio or braille,
-for blind or deafblind audiences, not usually as visual captions on screen.
-As such, the option/label might be displayed in an audio menu or elsewhere.
-
-1
-00:00:10.123 --> 00:00:15.432
-A young girl tiptoes down a dark hallway.
-
-
-Metadata example
type
+type
values, such as type: video-thumbnails or type: video-flash-avoidance.
See WebVTT issues #511 and #512 for more info.Conformance
The WebVTT attributes key/value parsing rules consist of the following algorithm.
+The WebVTT rules for parsing attribute key/value pairs consist of the following algorithm.
kind" (case-insensitive)kind" (ASCII case-insensitive)lang" (case-insensitive)lang" (ASCII case-insensitive)label" (case-insensitive)label" (ASCII case-insensitive)These keys are case-insensitive to allow compatibility with large video distributors already using this pattern in production.
+ +This section describes in some detail how to visually render WebVTT caption or
From d0e4581e273225b7361c2b4f548c2d971314500e Mon Sep 17 00:00:00 2001
From: James Craig
kind" (ASCII case-insensitive)kind" (ASCII case-insensitive)lang" (ASCII case-insensitive)lang" (ASCII case-insensitive)label" (ASCII case-insensitive)label" (ASCII case-insensitive)A WebVTT attributes block consists of the following components, in the given order:
ATTRIBUTES".A WebVTT attributes body block consists of the following components, in the given order:
+[A-Za-z_][0-9A_Za-z_]*)
[A-Za-z_][0-9A_Za-z_]*)
- :").-->" (U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS, U+003E GREATER-THAN SIGN).:").-->" (U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS, U+003E GREATER-THAN SIGN).Process the WebVTT attributes block key/value pairs according to the WebVTT rules for parsing attribute key/value pairs.
+Process the WebVTT attributes body block key/value pairs according to the WebVTT rules for parsing attribute key/value pairs.
A WebVTT comment block consists of the following components, in the given order:
From b333603f218b264dad5c879b6f29f1f57853828d Mon Sep 17 00:00:00 2001 From: James Craig:").-->" (U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS, U+003E GREATER-THAN SIGN).-->" (U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS, U+003E GREATER-THAN SIGN).WebVTT also supports some less-often used features.
-In this example, the cues have an identifier:
From 6a79f258ee5bcea064cdf37cc0dffc12226dd5bf Mon Sep 17 00:00:00 2001 From: James CraigThis section is non-normative.
From 7d54121805026f2a33f935558872837a5e02f087 Mon Sep 17 00:00:00 2001 From: James CraigThis section is non-normative.
From ae5ad1c84192fd0ec54b5dc2b8eae6d8edd314d1 Mon Sep 17 00:00:00 2001 From: James CraigA WebVTT attributes object represents the optional file-level metadata declared in a +WebVTT attributes block. It consists of:
+ +A string giving the text track kind. If present, must be one of "subtitles",
+ "captions", "descriptions", "chapters", or
+ "metadata". Defaults to the empty string.
The kind key is the only required key. Consumers that do not
+ recognize the kind value should treat the entire WebVTT attributes object
+ as opaque.
A string further differentiating the subtype within a kind (for example,
+ distinguishing varieties of metadata tracks). If present, must be either
+ "custom" or a string beginning with "custom-". All other values
+ are reserved for future standardization. Defaults to the empty string.
The type key identifies the subtype of a track kind, resolving
+ naming conflicts among common key names that different types of metadata often share.
Authors including
+ custom pairs should provide a non-empty
+ type value to identify the application or schema those pairs belong to.
+ A WebVTT attributes block with non-empty
+ custom pairs and an empty
+ type is valid but parsers may generate a warning.
A string giving the BCP 47 language tag of the track content. Defaults to the empty
+ string.
A human-readable string intended for use in a track selection menu. Defaults to the empty
+ string.
An ordered list of key/value string pairs for any unrecognized attribute keys. Defaults to
+ the empty list.
+Custom pairs should be accompanied by a non-empty
+ type value so that consumers can identify the
+ schema to which the pairs belong. If custom pairs are present and type is the
+ empty string, parsing continues normally, but parsers may generate a warning.
The WebVTT attributes object's properties are intended to be used by
+the embedding context (e.g. HTML) to populate the corresponding internal text track concepts.
+How format-provided values interact with values specified in the embedding context (e.g.
+<track> element attributes) is defined by the embedding specification. See
+whatwg/html issue #11665 for the
+ongoing HTML integration work.
A WebVTT chapter cue is a WebVTT cue whose cue text is interpreted as a @@ -1769,49 +1833,61 @@ SIGN).
A WebVTT attributes block consists of the following components, in the given order:
ATTRIBUTES".ATTRIBUTES" (U+0041, U+0054, U+0054, U+0052, U+0049, U+0042, U+0055,
+ U+0054, U+0045, U+0053).A WebVTT attributes body block consists of the following components, in the given order:
The WebVTT attributes block is terminated by a blank line (two consecutive
+WebVTT line terminators), exactly as for
+WebVTT region definition blocks.
+ +The kind key is the only required key in a
+WebVTT attributes block. It must appear in the block to disambiguate the track kind.
+Without it, consumers cannot determine whether other well-known keys such as
+language and label apply to a recognized track kind, and may treat
+them as opaque. See WebVTT rules for parsing
+attribute key/value pairs.
A WebVTT attribute key/value pair consists of the following components, +in the given order:
[A-Za-z_][0-9A_Za-z_]*)
- :").-->" (U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS, U+003E GREATER-THAN SIGN)._) characters, where the first character is either an
+ ASCII alpha character or U+005F LOW
+ LINE (_). In other words, matching the production
+ [A-Za-z_][0-9A-Za-z_]*.:").
 and 
+ respectively.‎).-->" (U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS, U+003E GREATER-THAN SIGN).Process the WebVTT attributes body block key/value pairs according to the WebVTT rules for parsing attribute key/value pairs.
+ +Keys are restricted to ASCII to ensure consistent case-folding and to avoid +ambiguity in key matching. Values may contain any Unicode characters (subject to the constraints +above) to support multilingual labels, language tags, and other internationalized content.
+ +The numeric character reference escape convention above is the same as that used
+in WebVTT for bidi mark characters in cue payloads (e.g."⁨").
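The key and value constraints described above can be sketched as two small checks. This is a minimal illustration in Python, not part of the patch: the function names `is_valid_attribute_key` and `is_valid_attribute_value` are hypothetical, and the value check assumes the only forbidden content is the "-->" substring and raw line terminators, which is what the surrounding text suggests.

```python
import re

# Key production from the text above: [A-Za-z_][0-9A-Za-z_]*
# The character classes are spelled out, so only ASCII matches.
KEY_RE = re.compile(r"[A-Za-z_][0-9A-Za-z_]*")


def is_valid_attribute_key(key: str) -> bool:
    """Return True if the whole string matches the key production."""
    return KEY_RE.fullmatch(key) is not None


def is_valid_attribute_value(value: str) -> bool:
    """Assumed value constraints: no "-->" and no raw LF or CR.

    Line terminators would instead be written as numeric character
    references by the author.
    """
    return "-->" not in value and "\n" not in value and "\r" not in value
```

Keeping the key check ASCII-only means case folding never depends on locale or Unicode rules, which matches the rationale given in the note on keys.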
A WebVTT comment block consists of the following components, in the given order:
@@ -2614,11 +2690,13 @@ chapters, or metadata. Most of the steps will be skipped for chapters or metadatA WebVTT parser, given an input byte stream, a text track list of cues -|output|, and a collection of CSS style sheets |stylesheets|, must decode the byte -stream using the UTF-8 decode algorithm, and then must parse the resulting -string according to the WebVTT parser algorithm below. This results in WebVTT cues -being added to |output|, and CSS style sheets being added to |stylesheets|. -[[!RFC3629]]
+|output|, a collection of CSS style sheets |stylesheets|, and optionally a +slot |attributes| for a WebVTT attributes object, must decode the byte stream using the +UTF-8 decode algorithm, and then must parse the resulting string +according to the WebVTT parser algorithm below. This results in WebVTT cues +being added to |output|, CSS style sheets being added to |stylesheets|, and +if a WebVTT attributes block is present, a WebVTT attributes object being set in +|attributes|. [[!RFC3629]]A WebVTT parser, specifically its conversion and parsing steps, is typically run asynchronously, with the input byte stream being updated incrementally as the resource is @@ -2713,6 +2791,9 @@ stream lacks this WebVTT file signature, then the parser aborts.
Otherwise, if |block| is a WebVTT region object, add |block| to |regions|.
Otherwise, if |block| is a WebVTT attributes object, and |attributes| has been + provided to this invocation of the WebVTT parser, let |attributes| be |block|.
collect a sequence of code points that are U+000A LINE FEED (LF) @@ -2753,6 +2834,8 @@ header set, the user agent must run the following steps:
Let |region| be null.
Let |attributes| be null.
Loop: Run these substeps in a loop:
@@ -2934,7 +3017,40 @@ header set, the user agent must run the following steps:Otherwise, if |seen cue| is false and |buffer| starts with the substring
+ "ATTRIBUTES" (U+0041, U+0054, U+0054, U+0052, U+0049, U+0042, U+0055, U+0054,
+ U+0045, U+0053), and the remaining characters in |buffer| (if any) are all ASCII
+ whitespace, then run these substeps:
Attributes creation: Let |attributes| be a new WebVTT attributes
+ object.
Let |attributes|'s kind be the empty
+ string.
Let |attributes|'s type be the empty
+ string.
Let |attributes|'s language be the
+ empty string.
Let |attributes|'s label be the empty
+ string.
Let |attributes|'s custom pairs
+ be an empty list.
Let |buffer| be the empty string.
Otherwise, if |attributes| is not null, then collect WebVTT attribute settings from + |buffer| using |attributes| for the results, and return |attributes|.
Otherwise, return null.
When the WebVTT parser algorithm says to collect WebVTT attribute settings +from a string |input| for a WebVTT attributes object |attributes|, the user agent must +run the following steps:
+ +Let |lines| be the result of splitting |input| on U+000A LINE FEED (LF) characters.
For each string |line| in |lines|, run the following substeps:
+If |line| does not contain a U+003A COLON character (:), then jump to
+ the step labeled next line.
Let |name| be the leading substring of |line| up to and excluding the first U+003A + COLON character.
If |name| is not a valid WebVTT attribute key, then jump to the step labeled + next line.
Let |value| be the trailing substring of |line| starting from the character + immediately after the first U+003A COLON character.
If |value| is not empty and its first character is a U+0020 SPACE or U+0009 + CHARACTER TABULATION (tab) character, remove that first character from |value|.
Let |value| be the result of + parsing + character references in |value|, with no additional allowed character.
Run the WebVTT rules for parsing attribute key/value pairs for |name| and + |value| against |attributes|.
Next line: Continue.
If |attributes|'s custom pairs is not
+ empty and |attributes|'s type is the empty string,
+ then parsers may generate a warning.
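The line-splitting steps above can be sketched roughly as follows. This is an illustrative Python approximation, not the normative algorithm: `collect_attribute_pairs` is a hypothetical name, it returns raw pairs rather than updating an attributes object, and `html.unescape` is used as a stand-in for "parsing character references", which may differ from the referenced algorithm in edge cases.

```python
import html
import re

# Key production from the patch: [A-Za-z_][0-9A-Za-z_]*
KEY_RE = re.compile(r"[A-Za-z_][0-9A-Za-z_]*")


def collect_attribute_pairs(block_text: str) -> list:
    """Split an attributes block body into (name, value) pairs."""
    pairs = []
    for line in block_text.split("\n"):
        # Lines without a colon are skipped.
        if ":" not in line:
            continue
        # Name is everything before the first colon, value everything after.
        name, value = line.split(":", 1)
        # Lines whose name is not a valid key are skipped.
        if KEY_RE.fullmatch(name) is None:
            continue
        # Remove at most one leading space or tab from the value.
        if value[:1] in (" ", "\t"):
            value = value[1:]
        # Approximation of character reference parsing (assumption).
        value = html.unescape(value)
        pairs.append((name, value))
    return pairs
```

Each returned pair would then be handed to the rules for parsing attribute key/value pairs; the optional warning about custom pairs without a type would be raised after that step.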
+When the algorithm above says to collect WebVTT cue timings and settings from a string @@ -4294,32 +4463,81 @@ follows:
-The WebVTT rules for parsing attribute key/value pairs consist of the following algorithm.
+The WebVTT rules for parsing attribute key/value pairs for a |name|/|value| pair +against a WebVTT attributes object |attributes| are as follows:
-kind" (ASCII case-insensitive)kind"Let |normalized| be the result of + converting |value| to ASCII + lowercase.
+If |normalized| is one of "subtitles", "captions",
+ "descriptions", "chapters", or "metadata", set
+ |attributes|'s kind to |normalized|.
Otherwise, ignore the pair.
+type"Let |normalized| be the result of + converting |value| to ASCII + lowercase.
+If |normalized| is "custom" or starts with the prefix
+ "custom-" (U+0063, U+0075, U+0073, U+0074, U+006F, U+006D, U+002D), set
+ |attributes|'s type to |normalized|.
Otherwise, if |normalized| is not the empty string, ignore the pair. All non-custom + type values are reserved for future standardization.
+lang" (ASCII case-insensitive)language"label" (ASCII case-insensitive)label"These keys are case-insensitive to allow compatibility with large video distributors already using this pattern in production.
+The kind key is the only required key in a
+WebVTT attributes block. It disambiguates the track kind and guards against naming
+conflicts: consumers that do not recognize a given kind value should treat the
+entire WebVTT attributes object as opaque. The type key further
+differentiates subtypes within a kind (for example, distinguishing varieties of
+metadata tracks). All non-custom type values are reserved for future
+standardization; authors needing custom subtypes must use "custom" or a value
+beginning with "custom-".
The kind, type, language, and
+label keys are matched
+ASCII case-insensitively
+to allow compatibility with implementations already using this pattern in production.
+Unrecognized keys are preserved in the
+custom pairs list for use by consuming
+applications.
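The per-key rules described above can be sketched as a small dispatch function. This is a hedged Python illustration rather than the patch's implementation: `WebVTTAttributes`, `ascii_lower`, and `apply_pair` are hypothetical names, and the handling of `language` and `label` (storing the value unchanged) is an assumption, since those steps are not fully legible in this revision.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

KINDS = {"subtitles", "captions", "descriptions", "chapters", "metadata"}


@dataclass
class WebVTTAttributes:
    kind: str = ""
    type: str = ""
    language: str = ""
    label: str = ""
    custom_pairs: List[Tuple[str, str]] = field(default_factory=list)


def ascii_lower(text: str) -> str:
    """Lowercase only A-Z, leaving non-ASCII characters untouched."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in text)


def apply_pair(attributes: WebVTTAttributes, name: str, value: str) -> None:
    """Apply one key/value pair; keys match ASCII case-insensitively."""
    key = ascii_lower(name)
    if key == "kind":
        normalized = ascii_lower(value)
        if normalized in KINDS:
            attributes.kind = normalized
        # Any other kind value: the pair is ignored.
    elif key == "type":
        normalized = ascii_lower(value)
        if normalized == "custom" or normalized.startswith("custom-"):
            attributes.type = normalized
        # Non-custom values are reserved, so the pair is ignored.
    elif key == "language":
        attributes.language = value  # assumption: stored as given
    elif key == "label":
        attributes.label = value  # assumption: stored as given
    else:
        # Unrecognized keys are preserved for consuming applications.
        attributes.custom_pairs.append((name, value))
```

For example, applying `("KIND", "Captions")` would set `kind` to `"captions"`, while `("type", "video-thumbnails")` would leave `type` empty because that value is not in the custom namespace.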
The WebVTT attributes object's properties are consumed by the embedding
+context. How kind,
+language, and
+label relate to the corresponding attributes of an
+HTML <track> element is defined by the HTML specification. See
+whatwg/html issue #11665.
The kind key is the only required key in a
WebVTT attributes block. It must appear in the block to disambiguate the track kind.
-Without it, consumers cannot determine whether other well-known keys such as
+Without it, consumers cannot determine whether other common key names such as
language and label apply to a recognized track kind, and may treat
them as opaque. See WebVTT rules for parsing
attribute key/value pairs.
Otherwise, if |attributes| is not null, then collect WebVTT attribute settings from +
Otherwise, if |attributes| is not null, then collect WebVTT attributes from |buffer| using |attributes| for the results, and return |attributes|.
When the WebVTT parser algorithm says to collect WebVTT attribute settings +
When the WebVTT parser algorithm says to collect WebVTT attributes from a string |input| for a WebVTT attributes object |attributes|, the user agent must run the following steps:
-Let |lines| be the result of splitting |input| on U+000A LINE FEED (LF) characters.
/**/).
-
-In this example, an optional WebVTT attributes object is used to define the source language and its label in a subtitle/caption selection menu.
-
-WEBVTT
-
-ATTRIBUTES
-kind: subtitles
-lang: es-mx
-label: Español
-
-NOTE
-Standard subtitles (unlike CC or SDH captions) typically
-translate spoken dialog or signage, but not audible sound
-effects like "dogs barking."
-
-1
-00:00:10.123 --> 00:00:15.432
-¡Hola! ¿Qué tál?
-
-
-
In this example, an optional WebVTT attributes object is used to differentiate captions from standard subtitles.