Captioning of audiovisual content, such as broadcast television programs, cable television programs, and streaming media content, has become widely available. Captioning typically provides a transcript of the verbal audio in the audiovisual content, or textual information related to nonverbal audio, such as music or other nonverbal sounds.
To provide captioning information, broadcast signals for television programming typically include both audiovisual data and caption data. Similarly, streaming media content may be transmitted with caption data or with additional metadata to be used for captioning.
Typically, a receiver is required both to process the audiovisual data and to display captioning along with the audiovisual content. To provide captioning, receivers typically must rely on caption data that is embedded in, or transmitted with, the audiovisual data from a broadcast source. Thus, when audiovisual content is provided without caption data, the receiver is unable to provide captioning for the audiovisual content. This may occur, for example, during live events, such as live broadcasts and live video streams, or with other audiovisual content for which caption data has not been created by a broadcaster or other content provider.
Accordingly, tools and techniques for real-time and/or on-demand captioning and translation of audiovisual content are provided.