From b49ff6d65db537866a09d574c3ccf2140dd33ccd Mon Sep 17 00:00:00 2001
From: James Craig
Date: Thu, 28 May 2026 14:06:01 -0700
Subject: [PATCH 1/3] Add WebVTT header metadata to disambiguate metadata cues (#511)

Replaces the ATTRIBUTES-block approach explored in #523 with the
TTWG-consensus design: file-level key/value pairs are written on the
lines immediately following the WEBVTT file header line and are
terminated by a blank line, rather than living in a separate block.

Key behaviors per the April 23 + May 7 TTWG meeting consensus and
follow-up review comments:

- Reserves four header keys: lang, kind, label, type (all optional;
  lang aligns with HTML <track srclang>; type addresses #511 by
  identifying metadata schemas, with the taxonomy work continuing in
  #512).
- Accepts both ":" and "=" as separators; "=" is permitted for
  HLS-compatible files but is marked non-recommended.
- Matches reserved keys case-sensitively; authors must use lowercase.
- Allows Unicode in keys and values, excluding bidi controls, line
  breaks, and the "-->" substring.
- Requires non-reserved keys to contain a hyphen and not start with
  one, reserving the hyphen-free namespace for future standardization.
- Parses leniently: invalid lines are ignored individually and do not
  invalidate the rest of the header; no parser warnings are required.
- Adds the parser plumbing (a new |header metadata| slot on the parser
  signature, a "collect WebVTT header metadata" algorithm, and a new
  return path from "collect a WebVTT block" when the in-header flag is
  set).
- Updates the WebVTT metadata text prose to recommend declaring kind
  and type for files delivered outside an HTML context, addressing the
  cue-format ambiguity from #511.

Closes #511. Defers HTML-integration details (precedence between VTT
header metadata and <track> attributes, processing model) to
whatwg/html#11665.
---
 index.bs | 358 +++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 346 insertions(+), 12 deletions(-)

diff --git a/index.bs b/index.bs
index 1336e57..9fc7828 100755
--- a/index.bs
+++ b/index.bs
@@ -366,6 +366,63 @@ CSS comment (e.g. /**/).

+

WebVTT header metadata

+ +

This section is non-normative.

+ +

A WebVTT file may include file-level metadata as key/value pairs on the lines immediately +following the WEBVTT file header line. This is most useful for disambiguating the +track kind and identifying the metadata schema in use, particularly for files +delivered outside of an HTML context where a containing <track> element +might not be available.

+ +
+ +

In this example, a captions track is identified with a kind, language, and human-readable + label.

+ +
+ WEBVTT
+ kind: captions
+ lang: es-MX
+ label: Español (SDH)
+
+ NOTE
+ Captions (SDH aka Subtitles for the Deaf and Hard-of-Hearing)
+ typically include spoken dialog as well as important audible
+ sounds such as "floor boards creak", "dogs barking", or in
+ this case, "[♫ música ♫]".
+
+ 1
+ 00:00:10.123 --> 00:00:15.432
+ ¡Hola! ¿Qué tal?
+
+ 2
+ 00:00:47.462 --> 00:01:04.028
+ [♫ música ♫]
+ 
+ +
+ +
+ +

In this example, a descriptions track for blind or low-vision audiences is identified. + Descriptions are typically rendered as text-to-speech or braille rather than as on-screen + text.

+ +
+ WEBVTT
+ kind: descriptions
+ lang: en-US
+ label: English (AD)
+
+ 1
+ 00:00:10.123 --> 00:00:15.432
+ A young girl tiptoes down a dark hallway.
+ 
+ +
+

Other caption and subtitling features

This section is non-normative.

@@ -675,10 +732,13 @@ signifies the end of the WebVTT cue.

-

In this example, a talk is split into each slide being a chapter.

+

In this example, topics mentioned in a talk are provided as URLs for reference. The + kind: metadata WebVTT header metadata pair signals to consumers that the + cue payloads should be processed by a script rather than displayed as text.

  WEBVTT
+ kind: metadata
 
  NOTE
  Thanks to http://output.jsbin.com/mugibo
@@ -1455,6 +1515,65 @@ navigation tree.

time-aligned metadata.

+

WebVTT header metadata

+ +

WebVTT header metadata is an optional set of file-level key/value pairs declared at +the start of a WebVTT file, immediately following the WEBVTT file header +line and terminated by a blank line. It is represented as an ordered list of (key, value) string +pairs.
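As an informal illustration of where the header metadata lines sit in a file, the following Python sketch pulls out the lines between the WEBVTT line and the first blank line. The function name `split_header_lines` is hypothetical, and the sketch makes simplifying assumptions: it checks only that the first line starts with "WEBVTT" and it does not model the other ways a block can end in the full parser.

```python
from typing import List


def split_header_lines(text: str) -> List[str]:
    """Return the raw lines between the WEBVTT line and the first blank line.

    Illustrative only: the signature check is simplified, and only the
    blank-line terminator described in the prose above is handled.
    """
    # Treat CRLF and lone CR as LF so that splitting on LF is sufficient.
    lines = text.replace("\r\n", "\n").replace("\r", "\n").split("\n")
    if not lines[0].startswith("WEBVTT"):
        return []
    header = []
    for line in lines[1:]:
        if line == "":
            break  # a blank line terminates the header metadata
        header.append(line)
    return header
```

Applied to the captions example earlier in this patch, this would return the three lines `kind: captions`, `lang: es-MX`, and `label: Español (SDH)`; splitting each line into a key and a value is a separate step.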

+ +

The keys listed below are reserved by this specification. Their values, when present, give +the named property of the WebVTT header metadata; any pair whose key is not in this list +is an unrecognized header metadata pair and is preserved for use by the consuming +application.

+ +
+ +
lang
+
+

A BCP 47 language tag identifying the + primary language of the cue payloads. Mirrors the srclang attribute of the HTML + <track> element. [[!BCP47]]

+
+ +
kind
+
+

One of "subtitles", "captions", "descriptions", + "chapters", or "metadata", identifying the intended use of the + cues. Mirrors the kind attribute of the HTML + <track> element.

+
+ +
label
+
+

A human-readable label suitable for presentation in a track selection menu. Mirrors the + label attribute of the HTML <track> element.

+
+ +
type
+
+

An identifier for the schema or application that the metadata + cues or unrecognized header metadata pairs + conform to. Disambiguates between different uses of kind: metadata and is the + primary mechanism for addressing the format ambiguity described in + WebVTT issue #511.

+
+ +
+ +

A future revision of this specification, tracked in +WebVTT issue #512, is expected to +register additional type values and to define taxonomies of related keys (for +example, for time-coded flashing-content metadata).

+ +

How WebVTT header metadata values are exposed to, and combined with, the +embedding context is defined by that context. In particular, HTML defines how +lang, kind, and +label interact with the corresponding attributes of an +HTML <track> element; see +whatwg/html issue #11665.

+ +

Syntax

@@ -1475,8 +1594,13 @@ with the MIME type text/vtt. [[!RFC3629]]

followed by any number of characters that are not U+000A LINE FEED (LF) or U+000D CARRIAGE RETURN (CR) characters. -
  • Two or more WebVTT line terminators to terminate the line - with the file magic and separate it from the rest of the body.
  • +
  • A WebVTT line terminator.
  • + +
  • Zero or more WebVTT header metadata pairs, each + followed by a WebVTT line terminator.
  • + +
  • One or more WebVTT line terminators to separate the file + header from the rest of the body.
  • Zero or more WebVTT region definition blocks, WebVTT style blocks and WebVTT comment @@ -1501,6 +1625,83 @@ with the MIME type text/vtt. [[!RFC3629]]

  • A single U+000D CARRIAGE RETURN (CR) character.
  • +

    A WebVTT header metadata pair consists of the following components, in the given +order:

    + +
      +
    1. A WebVTT header metadata key.
    2. +
    3. Zero or more U+0020 SPACE or U+0009 CHARACTER TABULATION (tab) characters.
    4. +
    5. A WebVTT header metadata separator.
    6. +
    7. Zero or more U+0020 SPACE or U+0009 CHARACTER TABULATION (tab) characters.
    8. +
    9. A WebVTT header metadata value.
    10. +
    + +

    A WebVTT header metadata key is any sequence of one or more Unicode characters with +all of the following properties:

    + +
      +
    • It contains no U+000A LINE FEED (LF), U+000D CARRIAGE RETURN (CR), U+0009 CHARACTER + TABULATION (tab), or U+0020 SPACE characters.
    • +
    • It contains no U+003A COLON (:) or U+003D EQUALS SIGN (=) + characters.
    • +
    • It contains none of the following bidirectional formatting characters: U+061C ARABIC LETTER + MARK, U+200E LEFT-TO-RIGHT MARK, U+200F RIGHT-TO-LEFT MARK, U+202A LEFT-TO-RIGHT EMBEDDING, + U+202B RIGHT-TO-LEFT EMBEDDING, U+202C POP DIRECTIONAL FORMATTING, U+202D LEFT-TO-RIGHT + OVERRIDE, U+202E RIGHT-TO-LEFT OVERRIDE, U+2066 LEFT-TO-RIGHT ISOLATE, U+2067 RIGHT-TO-LEFT + ISOLATE, U+2068 FIRST STRONG ISOLATE, or U+2069 POP DIRECTIONAL ISOLATE.
    • +
    • It does not contain the substring "-->" (U+002D HYPHEN-MINUS, U+002D + HYPHEN-MINUS, U+003E GREATER-THAN SIGN).
    • +
    + +

    A WebVTT header metadata key must additionally satisfy one of the following:

    + +
      +
    • It is one of the reserved key names defined for WebVTT header metadata in + [[#header-metadata]]: "lang", "kind", "label", or + "type".
    • +
    • It contains at least one U+002D HYPHEN-MINUS character, and its first character is not a + U+002D HYPHEN-MINUS character.
    • +
    + +

    The hyphen requirement only applies to keys that are not reserved by this +specification. It reserves the hyphen-free key-name space for future standardization (for +example, by the taxonomies anticipated in +WebVTT issue #512). Reserved keys +themselves (such as "lang") have no hyphen and remain valid.

    + +

    Keys are matched in a case-sensitive manner. Authors must use the +lowercase form of reserved keys.

    + +

    A WebVTT header metadata separator is one of the following:

    + +
      +
    • A U+003A COLON character (":").
    • +
    • A U+003D EQUALS SIGN character ("=").
    • +
    + +

    The U+003A COLON form is the recommended separator. The U+003D EQUALS SIGN form +is permitted for compatibility with files originally authored for delivery via HLS +(HTTP Live Streaming) and other transports that use the equals-sign convention; authoring tools +should prefer the colon form.

    + +

    A WebVTT header metadata value is any sequence of zero or more Unicode characters +with all of the following properties:

    + +
      +
    • It contains no U+000A LINE FEED (LF) or U+000D CARRIAGE RETURN (CR) characters.
    • +
    • It contains none of the bidirectional formatting characters listed for + WebVTT header metadata key, above.
    • +
    • It does not contain the substring "-->" (U+002D HYPHEN-MINUS, U+002D + HYPHEN-MINUS, U+003E GREATER-THAN SIGN).
    • +
    • Its first and last characters are not U+0020 SPACE or U+0009 CHARACTER TABULATION (tab) + characters.
    • +
    + +

    A WebVTT header metadata pair whose WebVTT header metadata key is reserved by +this specification must use a WebVTT header metadata value that conforms to the +constraints given for that key in [[#header-metadata]]. A given reserved key must not appear +more than once in a single WebVTT file.
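For authoring tools, the syntax rules above could be checked before a header is written. The following Python sketch is one possible reading of those rules, not a reference implementation; the names `is_conforming_pair` and `serialize_header` are hypothetical. It assumes the four reserved keys listed in this patch, validates only the enumerated `kind` values (it does not check that `lang` is a well-formed BCP 47 tag), and always emits the recommended colon separator.

```python
from typing import Iterable, Tuple

RESERVED_KEYS = frozenset({"lang", "kind", "label", "type"})
KINDS = frozenset({"subtitles", "captions", "descriptions", "chapters", "metadata"})
# The twelve bidirectional formatting characters excluded by the syntax.
BIDI_CONTROLS = frozenset(
    "\u061c\u200e\u200f\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069"
)


def is_conforming_pair(key: str, value: str) -> bool:
    """Check one (key, value) pair against the key and value syntax."""
    shared_bad = BIDI_CONTROLS | {"\n", "\r"}
    # Keys: non-empty, no whitespace, no separators, no bidi controls, no "-->".
    if not key or "-->" in key or any(c in shared_bad or c in " \t:=" for c in key):
        return False
    # Unreserved keys need a hyphen, and must not start with one.
    if key not in RESERVED_KEYS and ("-" not in key or key.startswith("-")):
        return False
    # Values: may be empty, but no line breaks, bidi controls, or "-->".
    if "-->" in value or any(c in shared_bad for c in value):
        return False
    # Values must not begin or end with a space or tab.
    if value != value.strip(" \t"):
        return False
    # Only "kind" has an enumerated value set in this sketch.
    return key != "kind" or value in KINDS


def serialize_header(pairs: Iterable[Tuple[str, str]]) -> str:
    """Emit the WEBVTT line, the pairs, and the terminating blank line."""
    lines = ["WEBVTT"]
    seen_reserved = set()
    for key, value in pairs:
        if not is_conforming_pair(key, value):
            raise ValueError(f"non-conforming header metadata pair: {key!r}")
        if key in RESERVED_KEYS:
            if key in seen_reserved:
                raise ValueError(f"reserved key repeated: {key!r}")
            seen_reserved.add(key)
        lines.append(f"{key}: {value}")
    return "\n".join(lines) + "\n\n"
```

Rejecting a repeated reserved key at authoring time is stricter than the parser, which tolerates such input; that asymmetry is deliberate in this sketch, since the conformance requirement falls on the file rather than on the parser.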

    +

    A WebVTT region definition block consists of the following components, in the given order:

    @@ -1691,8 +1892,14 @@ separated from the next by a WebVTT line terminator. (In other words, any have two consecutive WebVTT line terminators and does not start or end with a WebVTT line terminator.)

    -

    WebVTT metadata text cues are only useful for scripted applications (e.g. using the -metadata text track kind in a HTML text track).

    +

    WebVTT metadata text cues are typically used by scripted applications (e.g. using the +metadata text track kind in a HTML text track). Because the cue +payload is otherwise an opaque string, authors of metadata files delivered outside of an HTML +context should declare a kind of +"metadata" together with a type +identifying the schema in use; see [[#header-metadata]]. Consumers that do not recognise the +type value should treat the cue payloads as opaque and +must not present them as caption or subtitle text.

    WebVTT caption or subtitle cue text

    @@ -2454,11 +2661,14 @@ chapters, or metadata. Most of the steps will be skipped for chapters or metadat

    WebVTT file parsing

    A WebVTT parser, given an input byte stream, a text track list of cues -|output|, and a collection of CSS style sheets |stylesheets|, must decode the byte -stream using the UTF-8 decode algorithm, and then must parse the resulting +|output|, a collection of CSS style sheets |stylesheets|, and optionally a +slot |header metadata| for a WebVTT header metadata object, must decode the byte stream +using the UTF-8 decode algorithm, and then must parse the resulting string according to the WebVTT parser algorithm below. This results in WebVTT cues -being added to |output|, and CSS style sheets being added to |stylesheets|. -[[!RFC3629]]

    +being added to |output|, CSS style sheets being added to |stylesheets|, and, +if the file contains any WebVTT header metadata pairs, +the resulting WebVTT header metadata being assigned to |header metadata| (when +provided). [[!RFC3629]]

    A WebVTT parser, specifically its conversion and parsing steps, is typically run asynchronously, with the input byte stream being updated incrementally as the resource is @@ -2529,9 +2739,25 @@ stream lacks this WebVTT file signature, then the parser aborts.

    processed, but it contains no useful data and so no WebVTT cues were added to |output|.

    -
  • Header: If the character indicated by |position| is not a U+000A LINE FEED (LF) - character, then collect a WebVTT block with the in header flag set. Otherwise, - advance |position| to the next character in |input|.

  • +
  • +

    Header: Run these substeps:

    +
      +
    1. +

      If the character indicated by |position| is a U+000A LINE FEED (LF) character, then + advance |position| to the next character in |input| and skip the remaining substeps of + this step.

      +
    2. +
    3. +

      Let |header block| be the result of running the steps to collect a WebVTT block with the in header flag set.

      +
    4. +
    5. +

      If |header block| is a WebVTT header metadata object and a |header metadata| + slot was provided to this WebVTT parser, then set the |header metadata| slot to + |header block|.

      +
    6. +
    +
  • collect a sequence of code points that are U+000A LINE FEED (LF) characters.

  • @@ -2809,6 +3035,10 @@ header set, the user agent must run the following steps:

    using |region| for the results. Construct a WebVTT Region Object from |region|, and return it.

    +
  • Otherwise, if in header is set, then let |header metadata| be the result of + collecting WebVTT header metadata from |buffer|, + and return |header metadata|.

  • +
  • Otherwise, return null.

  • @@ -2816,6 +3046,110 @@ header set, the user agent must run the following steps:

    +

    WebVTT header metadata parsing

    + +

    When the algorithm in [[#file-parsing]] says to collect WebVTT header metadata from a +string |input|, the user agent must run the following algorithm. The algorithm returns either a +WebVTT header metadata object or null.

    + +
      + +
    1. Let |metadata| be a new WebVTT header metadata object, with each of its reserved + keys ("lang", "kind", "label", "type") + unset and with an empty list of unrecognized header + metadata pairs.

    2. + +
    3. Let |lines| be the result of splitting |input| on U+000A LINE FEED (LF) + characters.

    4. + +
    5. Let |had pair| be false.

    6. + +
    7. +

      For each string |line| in |lines|, run the following substeps:

      +
        + +
      1. If |line| does not contain either a U+003A COLON character (":") or a + U+003D EQUALS SIGN character ("="), then jump to the step labeled next + line.

      2. + +
      3. Let |separator index| be the lowest index in |line| at which either a U+003A COLON + character or a U+003D EQUALS SIGN character appears.

      4. + +
      5. Let |name| be the substring of |line| from the first character up to but not + including the character at |separator index|, with any trailing U+0020 SPACE and U+0009 + CHARACTER TABULATION (tab) characters removed.

      6. + +
      7. If |name| is not a WebVTT header metadata key, then jump to the step labeled + next line. (Invalid lines are ignored individually; they do not invalidate other + pairs.)

      8. + +
      9. Let |value| be the substring of |line| starting from the character immediately after + |separator index| to the end of |line|, with any leading and trailing U+0020 SPACE and + U+0009 CHARACTER TABULATION (tab) characters removed.

      10. + +
      11. If |value| is not a WebVTT header metadata value, then jump to the step + labeled next line.

      12. + +
      13. +

        Process |name| and |value| as follows:

        +
        + +
        If |name| is "lang"
        +
        If lang has already been set on |metadata|, + do nothing. Otherwise, set lang on |metadata| to + |value|.
        + +
        If |name| is "kind"
        +
        If kind has already been set on |metadata|, + do nothing. Otherwise, if |value| is one of "subtitles", + "captions", "descriptions", "chapters", or + "metadata", set kind on |metadata| + to |value|. Otherwise, do nothing.
        + +
        If |name| is "label"
        +
        If label has already been set on |metadata|, + do nothing. Otherwise, set label on |metadata| + to |value|.
        + +
        If |name| is "type"
        +
        If type has already been set on |metadata|, + do nothing. Otherwise, set type on |metadata| + to |value|.
        + +
        Otherwise
        +
        Append the pair (|name|, |value|) to |metadata|'s list of unrecognized header metadata pairs.
        + +
        +
      14. + +
      15. Set |had pair| to true.

      16. + +
      17. Next line: Continue.

      18. + +
      +
    8. + +
    9. If |had pair| is false, return null.

    10. + +
    11. Return |metadata|.

    12. + +
    + +

    String comparisons of key names in this algorithm are exact (i.e. +case-sensitive). The four reserved keys are defined in lowercase; an input that uses any +other case (for example "Lang" or "LANG") matches none of the +reserved-key clauses and is collected as an unrecognized header metadata pair instead. +Such a pair will only be valid if it also satisfies the hyphen requirement for unrecognized +keys; otherwise it is ignored.

    + +

    Parsers are not required to produce warnings for invalid or unrecognized +header metadata pairs. Pairs that fail validation are dropped without further effect; pairs +that pass validation but use a key not understood by the consuming application are surfaced +through the unrecognized header metadata pairs +list for that application to interpret as it sees fit.
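The algorithm above can be sketched in Python as follows. This is an informal illustration under stated assumptions, not the normative algorithm: `HeaderMetadata` and `collect_header_metadata` are hypothetical names, the data model is represented as a dict of reserved keys plus a list of unrecognized pairs, and the input is assumed to be a header block whose line terminators have already been normalised to LF.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

RESERVED_KEYS = ("lang", "kind", "label", "type")
KINDS = {"subtitles", "captions", "descriptions", "chapters", "metadata"}
BIDI_CONTROLS = set(
    "\u061c\u200e\u200f\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069"
)


@dataclass
class HeaderMetadata:
    reserved: Dict[str, str] = field(default_factory=dict)
    unrecognized: List[Tuple[str, str]] = field(default_factory=list)


def _has_forbidden(text: str, extra: str = "") -> bool:
    # Line breaks, bidi controls, and "-->" are excluded from keys and values.
    banned = "\n\r" + extra
    return "-->" in text or any(c in BIDI_CONTROLS or c in banned for c in text)


def _is_key(name: str) -> bool:
    if not name or _has_forbidden(name, " \t:="):
        return False
    return name in RESERVED_KEYS or ("-" in name and not name.startswith("-"))


def collect_header_metadata(block: str) -> Optional[HeaderMetadata]:
    metadata = HeaderMetadata()
    had_pair = False
    for line in block.split("\n"):
        found = [i for i in (line.find(":"), line.find("=")) if i != -1]
        if not found:
            continue  # no separator: not a pair
        sep = min(found)  # lowest index of either separator
        name = line[:sep].rstrip(" \t")  # only trailing whitespace is removed
        if not _is_key(name):
            continue  # invalid lines are ignored individually
        value = line[sep + 1:].strip(" \t")
        if _has_forbidden(value):
            continue
        if name in RESERVED_KEYS:
            # The first occurrence wins; "kind" is set only for a known value.
            if name not in metadata.reserved and (name != "kind" or value in KINDS):
                metadata.reserved[name] = value
        else:
            metadata.unrecognized.append((name, value))
        had_pair = True
    return metadata if had_pair else None
```

Because |had pair| is set after the processing step regardless of its outcome, a block whose only pair is a `kind` line with an unknown value yields, in this sketch, a metadata object with nothing set rather than null. A miscased key such as `Lang` is dropped at the key check, since it is neither reserved nor hyphenated.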

    + +

    WebVTT region settings parsing

    When the WebVTT parser algorithm says to collect WebVTT region settings from a

From 895c1aac0ab768e68103d391f361184023dca90b Mon Sep 17 00:00:00 2001
From: James Craig
Date: Thu, 28 May 2026 15:35:47 -0700
Subject: [PATCH 2/3] Address PR #548 review feedback

- Data model: explicitly state that unrecognized pairs are preserved
  only when the key meets the hyphen requirement; pairs that don't are
  dropped by the parser. Explains the webcompat motivation for
  reserving the unhyphenated key-name space.
- Reframe the `type` reserved key as a name reservation with no values
  defined here; defer the registry (with
  `video.strobing.general-flash` as the expected first entry) to #512.
  Drop the now redundant "future revision" note.
- Replace duplicated reserved-key lists in the syntax section and in
  the "collect WebVTT header metadata" algorithm with references to the
  data-model section.
- Drop the "opaque" framing in the metadata-text paragraph per Nigel's
  earlier objection.
- Remove the parenthetical formatting from the parser's "ignore invalid
  line" step.
---
 index.bs | 62 +++++++++++++++++++++++++++++---------------------------
 1 file changed, 32 insertions(+), 30 deletions(-)

diff --git a/index.bs b/index.bs
index 9fc7828..9898833 100755
--- a/index.bs
+++ b/index.bs
@@ -1522,10 +1522,14 @@ the start of a WebVTT file, immediately following the WEBVTT
 line and terminated by a blank line. It is represented as an ordered list of (key, value) string
 pairs.

    -

    The keys listed below are reserved by this specification. Their values, when present, give -the named property of the WebVTT header metadata; any pair whose key is not in this list -is an unrecognized header metadata pair and is preserved for use by the consuming -application.

    +

    The keys listed below are reserved by this specification. Any pair whose key is not reserved +but which satisfies the syntactic requirements for an unreserved key (see [[#syntax]] — +in particular, the requirement that the key contain a U+002D HYPHEN-MINUS character and not +start with one) is an unrecognized header metadata pair and is preserved for use by +the consuming application. Pairs whose keys do not satisfy those requirements are dropped by +the parser and not exposed to the consuming application; this reserves the unhyphenated +key-name space for future standardization and prevents arbitrary applications from claiming +short, unnamespaced keys.

    @@ -1552,20 +1556,20 @@ application.

    type
    -

    An identifier for the schema or application that the metadata - cues or unrecognized header metadata pairs - conform to. Disambiguates between different uses of kind: metadata and is the - primary mechanism for addressing the format ambiguity described in - WebVTT issue #511.

    +

    Reserved by this specification as the key used to identify the schema or application that + the metadata cues or + unrecognized header metadata pairs conform to.

    +

    This specification does not itself define any values for type; + the name is reserved so that a follow-up specification can define a registry of + type values without colliding with author-defined keys. The first such value is + expected to be defined alongside the time-coded flashing-content metadata work in + WebVTT issue #512 (e.g. + video.strobing.general-flash). Until at least one value is registered, the + type key has no defined behavior beyond reserving the name.

    -

    A future revision of this specification, tracked in -WebVTT issue #512, is expected to -register additional type values and to define taxonomies of related keys (for -example, for time-coded flashing-content metadata).

    -

    How WebVTT header metadata values are exposed to, and combined with, the embedding context is defined by that context. In particular, HTML defines how lang, kind, and @@ -1656,9 +1660,8 @@ all of the following properties:

    A WebVTT header metadata key must additionally satisfy one of the following:

      -
    • It is one of the reserved key names defined for WebVTT header metadata in - [[#header-metadata]]: "lang", "kind", "label", or - "type".
    • +
    • It is one of the keys reserved by this specification for WebVTT header metadata; + see [[#header-metadata]] for the current list of reserved keys.
    • It contains at least one U+002D HYPHEN-MINUS character, and its first character is not a U+002D HYPHEN-MINUS character.
    @@ -1893,13 +1896,13 @@ have two consecutive WebVTT line terminators or end with a WebVTT line terminator.)

    WebVTT metadata text cues are typically used by scripted applications (e.g. using the -metadata text track kind in a HTML text track). Because the cue -payload is otherwise an opaque string, authors of metadata files delivered outside of an HTML -context should declare a kind of -"metadata" together with a type -identifying the schema in use; see [[#header-metadata]]. Consumers that do not recognise the -type value should treat the cue payloads as opaque and -must not present them as caption or subtitle text.

    +metadata text track kind in a HTML text track). Authors of metadata +files delivered outside of an HTML context should declare a +kind of "metadata" together with a +type identifying the schema in use; see +[[#header-metadata]]. Consumers that do not recognise the +type value must not present the cue payloads as +caption or subtitle text.

    WebVTT caption or subtitle cue text

    @@ -3054,10 +3057,9 @@ string |input|, the user agent must run the following algorithm. The algorithm r
      -
    1. Let |metadata| be a new WebVTT header metadata object, with each of its reserved - keys ("lang", "kind", "label", "type") - unset and with an empty list of unrecognized header - metadata pairs.

    2. +
    3. Let |metadata| be a new WebVTT header metadata object, with each of its + reserved keys (see [[#header-metadata]]) unset and with an empty list of + unrecognized header metadata pairs.

    4. Let |lines| be the result of splitting |input| on U+000A LINE FEED (LF) characters.

    5. @@ -3080,8 +3082,8 @@ string |input|, the user agent must run the following algorithm. The algorithm r CHARACTER TABULATION (tab) characters removed.

    6. If |name| is not a WebVTT header metadata key, then jump to the step labeled - next line. (Invalid lines are ignored individually; they do not invalidate other - pairs.)

    7. + next line. Invalid lines are ignored individually; they do not invalidate other + pairs.

    8. Let |value| be the substring of |line| starting from the character immediately after
 |separator index| to the end of |line|, with any leading and trailing U+0020 SPACE and

From faa15d0fb74abd73c70af37ccff6d177b7d7cbbd Mon Sep 17 00:00:00 2001
From: James Craig
Date: Fri, 29 May 2026 10:31:24 -0700
Subject: [PATCH 3/3] Fix stale hardcoded reserved key count and clarify case-sensitivity note

Addresses a missed cleanup from PR #548 feedback. Removes the hardcoded
count of "four" reserved keys in the parser notes, and corrects the
explanation of how incorrectly-cased keys are handled (they are ignored
immediately due to lacking a hyphen, rather than being collected as
unrecognized pairs).
---
 index.bs | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/index.bs b/index.bs
index 9898833..a4a78f3 100755
--- a/index.bs
+++ b/index.bs
@@ -3139,11 +3139,10 @@ string |input|, the user agent must run the following algorithm. The algorithm r

 String comparisons of key names in this algorithm are exact (i.e.
-case-sensitive). The four reserved keys are defined in lowercase; an input that uses any
+case-sensitive). Reserved keys are defined in lowercase; an input that uses any
 other case (for example "Lang" or "LANG") matches none of the
-reserved-key clauses and is collected as an unrecognized header metadata pair instead.
-Such a pair will only be valid if it also satisfies the hyphen requirement for unrecognized
-keys; otherwise it is ignored.
+reserved-key clauses. Furthermore, because such an input lacks a hyphen, it also fails the
+syntactic requirements for an unreserved key and is therefore ignored by the parser.

    Parsers are not required to produce warnings for invalid or unrecognized header metadata pairs. Pairs that fail validation are dropped without further effect; pairs