Videos can be used to convey a wide variety of audiovisual content. From entertainment video content, such as movies, television programs, music videos, and the like, to informational or instructional content, like newscasts, documentaries, product advertisements, and educational material, video content offers a rich and effective means for communicating information.
Video content is available in digital form and can be recorded or transmitted in one or more electronic formats. For example, traditional cable and satellite television service providers transmit live and prerecorded digital video signals to consumers over corresponding wired and wireless electronic communication media in real time according to a broadcast schedule. That is, conventional television (TV) viewers generally consume TV content linearly; i.e., they generally watch a TV program from beginning to end, with limited interactions such as pausing, rewinding, and fast-forwarding. In addition, many cable and satellite television service providers, and other web-based services, have developed functionality to provide video content to consumers using so-called “video-on-demand” (VOD) systems. VOD systems allow service providers to deliver specific video assets, such as television shows, movies, and the like, to any number of client devices for viewing in response to user requests. Such live video and VOD content is usually transmitted as video data. The video data can include constituent visual data, audio data, and, in some instances, textual data (e.g., closed captioning data). As users experience other video technologies, they expect more functionality and richer experiences from their TV content providers. More specifically, users expect to be able to search for content, watch content in a non-linear manner, or watch only the content that interests them.
In many video formats, the visual data is recorded as a sequence of frames, each of which is a still image formed by an arrangement of pixels. Accordingly, the visual data can include a set of frames in which each frame includes a specific set of pixel data that, when rendered by a computer system, results in the corresponding visual content (e.g., images of people, places, and objects) of the video content.
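The frame-and-pixel organization described above can be illustrated with a minimal sketch. The names `Frame`, `Pixel`, `pixel_at`, and `make_solid_frame`, the RGB tuple encoding, and the row-major layout are assumptions made for illustration only; actual video formats typically store compressed data rather than raw pixel lists.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed encoding: one pixel is a (red, green, blue) tuple, each value 0-255.
Pixel = Tuple[int, int, int]


@dataclass
class Frame:
    """One still image: a fixed-size grid of pixel data (hypothetical type)."""
    width: int
    height: int
    pixels: List[Pixel]  # row-major order, length == width * height

    def pixel_at(self, x: int, y: int) -> Pixel:
        """Return the pixel at column x, row y."""
        if not (0 <= x < self.width and 0 <= y < self.height):
            raise IndexError("pixel coordinates out of range")
        return self.pixels[y * self.width + x]


def make_solid_frame(width: int, height: int, color: Pixel) -> Frame:
    """Build a frame in which every pixel has the same color."""
    return Frame(width, height, [color] * (width * height))


# Visual data modeled as an ordered sequence of frames.
visual_data: List[Frame] = [
    make_solid_frame(4, 2, (0, 0, 0)),        # an all-black frame
    make_solid_frame(4, 2, (255, 255, 255)),  # an all-white frame
]
```

In this sketch, rendering a frame would amount to drawing each entry of `pixels` at its grid position; the visual content exists only as that pixel data.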
In some scenarios, the visual content might include images of text. Images of text may include images of text on objects in a scene (e.g., words or characters on buildings, signs, or written documents, etc.). The visual content may also include rendered text superimposed over the images of a scene of the visual content. For instance, some television stations may embed on-screen text into visual content of a news broadcast to display summary information or captioning, or to introduce individual stories or segments. Similarly, talk shows may use on-screen text to identify people or topics, while programs showing or discussing sporting events may display on-screen text with running statistics about one or more games (e.g., score, period, time, etc.). Both text that appears in the images of a scene and text that is embedded into or superimposed on the image of the scene are referred to herein as “on-screen text.”
On-screen text is distinguishable from text rendered from textual data (e.g., a text string from closed captioning information) in that on-screen text does not correspond to underlying data that includes specifications or other indications of the text. Rather, on-screen text is only recognizable by examining the images that result from rendering the corresponding pixel data of the visual data.
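The distinction above can be sketched in code. In this toy model, text carried in textual data is read directly as strings, whereas on-screen text can only be obtained by examining pixel data. Every name here (`VideoData`, `GLYPHS`, `caption_text`, `recognize_glyph`, `on_screen_text`) is hypothetical, and the exact-match template lookup is a deliberately simplistic stand-in for a real text recognition process.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Toy frame: each string is one row of pixels; '#' is lit, '.' is dark.
Bitmap = Tuple[str, ...]


@dataclass
class VideoData:
    """Hypothetical container for visual data and optional textual data."""
    frames: List[Bitmap]
    captions: List[str] = field(default_factory=list)


# Assumed 3x3 glyph templates, for illustration only.
GLYPHS: Dict[str, Bitmap] = {
    "T": ("###", ".#.", ".#."),
    "L": ("#..", "#..", "###"),
}


def caption_text(video: VideoData) -> List[str]:
    """Textual data already specifies its text; no image analysis is needed."""
    return list(video.captions)


def recognize_glyph(frame: Bitmap) -> Optional[str]:
    """Toy recognizer: compare the frame's pixels against known templates."""
    for character, template in GLYPHS.items():
        if tuple(frame) == template:
            return character
    return None


def on_screen_text(video: VideoData) -> List[Optional[str]]:
    """On-screen text must be recovered from the pixels of each frame."""
    return [recognize_glyph(frame) for frame in video.frames]


video = VideoData(
    frames=[("###", ".#.", ".#."), ("...", "...", "...")],
    captions=["Hello"],
)
print(caption_text(video))    # read straight from the textual data
print(on_screen_text(video))  # derived by inspecting pixel data
```

Note that nothing in `video.captions` indicates that the first frame depicts a "T"; that information is available only through the pixel inspection step.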
Audio data and/or textual data often accompany the visual content to present a complete audiovisual experience. The audio data typically includes sounds, such as voices, scene noises, music, and the like. The textual data can be rendered along with the visual content to give additional context, labels, and titles to the visual content. In some scenarios, the textual data can provide a textual representation of speech and other sounds in the audio content so that hearing-impaired individuals can access it.