Videos can be used to convey a wide variety of audiovisual content. From entertainment video content, such as movies, television programs, music videos, and the like, to informational or instructional content (e.g., news broadcasts, documentaries, product advertisements, educational shows, etc.), video content offers a rich and effective means for communicating information.
Most contemporary video content is available in digital form and can be recorded or transmitted in one or more electronic formats. For example, traditional cable and satellite television service providers transmit live and prerecorded digital video signals to consumers over corresponding wired and wireless electronic communication media in real time according to a broadcast schedule. In addition many cable and satellite television service providers, and other web based services, have developed functionality to provide video content to consumers using so-called “video-on-demand” (VOD) systems. VOD systems allow service providers to provide specific video assets, such as television shows, movies, and the like, in response to user requests to any number of client devices for viewing.
Such live video and VOD content is usually transmitted as video data. The video data can include constituent visual data, audio data, and, in some instances, textual data (e.g., closed captioning data). In many of the video formats, the visual data is recorded as a sequence of frames that include still images resulting from the arrangement of pixels. Accordingly, the visual data can include a set of frames in which each frame includes a specific set of pixel data that, when rendered by a computer system, results in the corresponding visual content (e.g., images of people, places, and objects) of the video content.
In some scenarios, the visual content might include images of text. Images of text may include images of text on objects in a scene (e.g., words or characters on buildings, signs, or written documents, etc.). The visual content may also include rendered text superimposed over the images of a scene of the visual content. For instance, some television stations may embed on-screen text into visual content of a news broadcast to display summary information, captioning, or to introduce individual stories or segments. Similarly, talk shows may use on-screen text to identify people or topics, while programs showing or discussing sporting events may display on-screen text with running statistics about one or more games (e.g., score, period, time, etc.). Text that appears in the images of a scene or text that is embedded into or superimposed on the image of the scene are referred to herein as “on-screen text.”
On-screen text is distinguishable from text rendered from textual data (e.g., a text string) in that on-screen text does not correspond to underlying data that includes specifications or other indications of the text. Rather, on-screen text is only recognizable by examining the images that result from rendering the corresponding pixel data of the visual data.