It is known in the prior art to provide interactive user interfaces for television programs. Such interactive user interfaces include, for example, electronic program guides (EPGs) that may be manipulated to search for broadcast programs or schedule recordings. Interactive user interfaces also include simple video games, menuing systems to access video on demand, and other similar mechanisms.
Interactive user interfaces may be combined with source video, such as video on a broadcast or cable channel. There are two broad ways to combine such interfaces with source video: scale down the source video and fill the rest of the screen with the interactive user interface, or keep the source video full-screen but overlay the user interface onto the screen. As an example of the first combination, modern EPGs often show dynamically-generated channel information with a small preview window that shows video for a current channel. As an example of the second combination, television sets often provide volume controls as elements that overlay an area of the screen, typically near the bottom or along one side, while continuing to display the underlying source video content full-screen.
The latter method to combine user interfaces with source video can itself be broken into two different categories: opaque user interfaces and translucent, or partially transparent, user interfaces. Different techniques can be used for these different categories. For example, if it is known in advance that a user interface will be opaque, then the pixels of the underlying source video content may be discarded at the beginning of the overlay process. This ability to discard pixels simplifies processing of the overlays and permits compositing of the user interface directly into the source image. For certain block-based encoding schemes, compositing can be accomplished at a block level. However, for partially transparent user interfaces, the underlying pixels must be retained and blended with the user interface.
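As an illustration of the opaque case only, the following minimal Python sketch treats a frame as a grid of already-encoded blocks and pastes user interface blocks over it. The function name `overlay_opaque_blocks`, the list-of-lists frame layout, and the string placeholders standing in for encoded blocks are assumptions made for this sketch; they are not taken from any particular encoding scheme, and the sketch assumes the user interface is aligned to block boundaries.

```python
def overlay_opaque_blocks(frame_blocks, ui_blocks, top, left):
    """Return a copy of frame_blocks with ui_blocks pasted at block offset (top, left).

    Each entry stands in for one encoded block (hypothetical representation).
    Because the user interface is opaque, the covered source blocks are
    replaced wholesale and their contents are never read.
    """
    out = [row[:] for row in frame_blocks]  # shallow copy of every row
    for r, ui_row in enumerate(ui_blocks):
        for c, block in enumerate(ui_row):
            out[top + r][left + c] = block  # the underlying source block is discarded
    return out


# A 2x3 grid of source video blocks and a 1x2 opaque user interface strip.
frame = [["v00", "v01", "v02"],
         ["v10", "v11", "v12"]]
ui = [["u0", "u1"]]

result = overlay_opaque_blocks(frame, ui, top=1, left=1)
print(result)
```

A partially transparent user interface could not be handled this way, because each covered block would have to be decoded to pixels before it could be blended.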
It is also known in the art to overlay images using blending. For purposes of the present disclosure, “blending” refers to a process of alpha compositing; that is, the process of combining two colors using a transparency coefficient, α. Using this technique, each pixel of each image may be viewed as being associated with four values: three color values and one alpha value, each between 0.0 and 1.0, either by storing these values per pixel or in a lookup table such as a palette. If the color values are red-green-blue, for example, then these four values are denoted RGBA. Alpha blending takes as input the RGBA values of a foreground pixel and a background pixel, and produces as output a pixel having RGBA values color(output)=α(f)*color(f)+(1−α(f))*color(b) and α(output)=α(f)+α(b)*(1−α(f)), where α(f) and α(b) are the transparency coefficients of the foreground and background pixels, respectively. In other words, the colors and transparency coefficients of the output are a weighted average of the foreground and background pixels, using the foreground “α” as the weight. Thus, if α=0.0 in the foreground pixel, then the colors in the output pixel are the same as those of the background pixel (that is, the foreground pixel is not visible). If α is increased from 0.0 toward 1.0, more of the foreground pixel becomes visible, until when α=1.0 the color of the output pixel is the same as that of the foreground pixel (that is, the background pixel is completely overlaid by the foreground pixel).
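The blending formulas above may be sketched for a single pixel as follows. This is a minimal Python illustration, not a definitive implementation: the function name `alpha_blend` and the representation of a pixel as an (R, G, B, A) tuple of floats between 0.0 and 1.0 are assumptions made for this sketch, and the arithmetic simply follows the formulas as stated in the text.

```python
def alpha_blend(foreground, background):
    """Blend one foreground RGBA pixel over one background RGBA pixel.

    Both arguments are (r, g, b, a) tuples with components in 0.0..1.0
    (a hypothetical representation chosen for this sketch).
    """
    f_r, f_g, f_b, f_a = foreground
    b_r, b_g, b_b, b_a = background

    def mix(f, b):
        # color(output) = alpha(f) * color(f) + (1 - alpha(f)) * color(b)
        return f_a * f + (1.0 - f_a) * b

    # alpha(output) = alpha(f) + alpha(b) * (1 - alpha(f))
    out_a = f_a + b_a * (1.0 - f_a)
    return (mix(f_r, b_r), mix(f_g, b_g), mix(f_b, b_b), out_a)


# A half-transparent red user interface pixel over an opaque blue video pixel.
red_ui = (1.0, 0.0, 0.0, 0.5)
blue_video = (0.0, 0.0, 1.0, 1.0)
blended = alpha_blend(red_ui, blue_video)
print(blended)  # (0.5, 0.0, 0.5, 1.0)
```

With a foreground α of 0.0 the function returns the background colors unchanged, and with a foreground α of 1.0 it returns the foreground colors, matching the two limiting cases described above.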
However, it is generally disadvantageous to blend user interfaces at the server (e.g., at a cable headend), for a number of reasons. First, a typical television provider will have hundreds of thousands or millions of subscribers, a significant portion of whom will, at any given time, require interactive user interfaces. Each subscriber may be watching a different source video, and blending all of these source videos with any number of user interfaces is a problem that does not scale well. Second, blending a user interface with a source video requires access to the pixels of the source video, but the source video that is broadcast is typically ingested from a content provider already encoded according to a transmission encoding, and decoding it to obtain those pixels would require more computational power than is available. Third, a significant latency may be caused by the blending process, creating an unacceptable ‘sluggishness’ in the response of the user interface.