Picture-in-picture (PIP) allows viewers to watch multiple separate video sources simultaneously. For example, some Blu-ray Disc titles include a picture-in-picture track that lets the viewer see the director's commentary on the film they are watching. Traditionally, as in Blu-ray Disc applications, picture-in-picture is implemented by generating a hard-coded PIP video, i.e. by replacing regions of the background video with a foreground video. The hard-coded PIP video is then compressed and transmitted to the receiver. As a result, viewers cannot dynamically adjust the PIP, for example to enable or disable the PIP feature (unless a copy of the background video is sent separately) or to change the position of the foreground video. Another traditional PIP application overlays two independent video streams at the player side, in which case the video transport does not provide any information correlating the PIP video streams.
With the development of interactive media technology, multiple video components can be correlated to form a set of media, i.e. PIP media. The rendering of the PIP media can be dynamic, meaning that the position, scaling and alpha blending of the foreground videos can vary during playback, as determined by either content creation or user interaction. In the previous example, in which a foreground video shows a director commenting on the background video, dynamic PIP allows the director to appear to point to different positions in the background video, e.g. by moving the foreground video.
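As a rough illustration of dynamic PIP rendering, the following minimal sketch composites a foreground frame onto a background frame using a per-frame position, scaling factor and alpha value. The names `PipParams` and `composite`, the grayscale list-of-lists frame representation and the nearest-neighbour scaling are assumptions made for this example only; they are not part of any file format or player discussed here.

```python
from dataclasses import dataclass


@dataclass
class PipParams:
    """Hypothetical per-frame rendering parameters for one foreground video."""
    x: int        # left edge of the foreground, in background pixels
    y: int        # top edge of the foreground, in background pixels
    scale: float  # scaling factor applied to the foreground (must be > 0)
    alpha: float  # 0.0 = fully transparent, 1.0 = fully opaque


def composite(background, foreground, p):
    """Return a new frame with the foreground blended over the background.

    Frames are lists of rows of grayscale values. Scaling uses
    nearest-neighbour sampling; pixels outside the background are clipped.
    """
    out = [row[:] for row in background]  # leave the input frame untouched
    fg_h, fg_w = len(foreground), len(foreground[0])
    dst_h = max(1, round(fg_h * p.scale))
    dst_w = max(1, round(fg_w * p.scale))
    for dy in range(dst_h):
        by = p.y + dy
        if not 0 <= by < len(out):
            continue
        sy = min(fg_h - 1, int(dy / p.scale))  # source row for this output row
        for dx in range(dst_w):
            bx = p.x + dx
            if not 0 <= bx < len(out[0]):
                continue
            sx = min(fg_w - 1, int(dx / p.scale))
            # Standard alpha blend of foreground over background.
            out[by][bx] = p.alpha * foreground[sy][sx] + (1 - p.alpha) * out[by][bx]
    return out


# The same foreground rendered with parameters that change during playback.
background = [[0] * 4 for _ in range(4)]
foreground = [[100] * 2 for _ in range(2)]
frame_a = composite(background, foreground, PipParams(x=1, y=1, scale=1.0, alpha=0.5))
frame_b = composite(background, foreground, PipParams(x=2, y=2, scale=2.0, alpha=1.0))
```

In this sketch, changing the `PipParams` values from one frame to the next moves, resizes or fades the foreground without re-encoding either video, which is the kind of flexibility a hard-coded PIP video cannot offer.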
One deficiency of current media file formats, such as the MPEG-2 transport stream and the ISO media file format, is that they cannot update the position, layer and scaling information for the PIP stream live or dynamically at the system layer (i.e. the transport layer). Without dynamic position and scaling information, it is not possible to reliably fit or overlay a video source on a display region that does not share the same resolution. One possibility is to retrieve the information from the video decoder. However, depending on the codec and the output of a particular encoder, such information may not exist or may not be reliable. It may also be difficult for a system to extract this information from the codec. A system-level approach is therefore better suited to offering a consistent experience regardless of the underlying video codec used.
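To illustrate why position and scaling information matters when resolutions differ, the sketch below assumes, purely for the sake of example, that the placement of a foreground video is signalled as a rectangle expressed in fractions of the display region. The function `to_pixel_rect` and the normalized representation are hypothetical and are not taken from any existing transport or file format.

```python
def to_pixel_rect(region, display_w, display_h):
    """Convert a normalized (x, y, width, height) region to pixel units.

    Each component of `region` is a fraction of the display size in the
    range 0.0 to 1.0 (an assumption made for this example).
    """
    x, y, w, h = region
    left = round(x * display_w)
    top = round(y * display_h)
    width = round(w * display_w)
    height = round(h * display_h)
    return left, top, width, height


# The same signalled placement resolved on two displays of different resolution.
bottom_right_quarter = (0.75, 0.75, 0.25, 0.25)
rect_1080p = to_pixel_rect(bottom_right_quarter, 1920, 1080)
rect_720p = to_pixel_rect(bottom_right_quarter, 1280, 720)
```

Under these assumptions the foreground occupies the same relative area on both displays. A receiver given only two unrelated video streams would have no such information to work from.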
The present invention provides solutions to support picture-in-picture functionality and improve the flexibility of picture-in-picture for multimedia applications such as transmission and rendering.