Video is typically compressed using intra-coded frames (I-frames) and inter-coded, or predicted, frames (P-frames). I-frames are coded without prediction from other frames and thus do not require reference frames. P-frames are coded using prediction from a reference frame. I-frames contain only intra macroblocks, whereas P-frames may contain intra macroblocks, predicted macroblocks, or both. A P-frame can be decoded only after its reference frame has been decoded, and P-frames often require fewer bits for encoding than I-frames.
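The decoding dependency described above can be illustrated with a minimal sketch. The names `Frame`, `FrameId`, and `can_decode` are hypothetical and chosen only for illustration; they do not correspond to any particular codec or library, and the model assumes a single reference frame per P-frame.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple

# Hypothetical identifier: (time index i, view j).
FrameId = Tuple[int, int]


@dataclass(frozen=True)
class Frame:
    frame_id: FrameId
    kind: str  # "I" for intra-coded, "P" for predicted
    reference: Optional[FrameId] = None  # used by P-frames only


def can_decode(frame: Frame, decoded: Set[FrameId]) -> bool:
    """Return True if the frame can be decoded given the frames decoded so far."""
    if frame.kind == "I":
        # An I-frame needs no reference frame.
        return True
    # A P-frame needs its reference frame to have been decoded first.
    return frame.reference is not None and frame.reference in decoded


i_frame = Frame(frame_id=(0, 1), kind="I")
p_frame = Frame(frame_id=(1, 1), kind="P", reference=(0, 1))
```

In this sketch, `can_decode(p_frame, set())` is false, and it becomes true once `(0, 1)` is in the decoded set, whereas `can_decode(i_frame, set())` is always true.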
There has been increased development of video applications where multiple views (multiviews) of a scene are captured simultaneously, encoded using I-frames and/or P-frames, and delivered to users. In these types of video applications, users are given the ability to switch among the multiple views in real-time and are thus afforded greater levels of interactivity than with conventional video applications. Among applications for multiview video coding (MVC) tools are those where users are allowed to select for playback only a subset of those views or, potentially, virtual views generated from the actual captured video data.
A schematic diagram of a conventional coding tree 100 based on I-frames (I(i,j)), wherein i is the time index and j is the view, encoded at a server and communicated to a client is depicted in FIG. 1. As shown therein, the coding tree 100 is encoded based solely on I-frames, and thus a frame of one view may follow a frame of a different view without requiring reference frames for differential coding. However, the coding tree 100 depicted in FIG. 1 incurs large transmission costs because, as discussed above, I-frames generally require more bits than P-frames, and are typically several times larger.
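The transmission-cost concern can be made concrete with a rough sketch. The frame sizes below are hypothetical values chosen only for illustration, as are the names `i_only_cost` and `i_then_p_cost`; actual sizes depend on the content and the encoder.

```python
from typing import List, Tuple

# Hypothetical frame sizes in bits, for illustration only.
I_FRAME_BITS = 40_000
P_FRAME_BITS = 10_000


def i_only_cost(path: List[Tuple[int, int]]) -> int:
    """Bits sent when every frame I(i, j) on the viewing path is an I-frame."""
    return len(path) * I_FRAME_BITS


def i_then_p_cost(num_frames: int) -> int:
    """Bits sent for one I-frame followed by P-frames (single view, no switching)."""
    if num_frames == 0:
        return 0
    return I_FRAME_BITS + (num_frames - 1) * P_FRAME_BITS


# A viewing path as (time index i, view j) pairs; the view changes over time.
path = [(0, 1), (1, 2), (2, 1), (3, 0)]
all_intra_bits = i_only_cost(path)
single_view_bits = i_then_p_cost(len(path))
```

Under these assumed sizes, the all-I-frame path costs more than twice as much as a single-view stream of the same length, which is the price paid for being able to switch views without reference frames.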
Many of the recent activities in MVC have focused on image capture and compression. For example, the MVC standardization process has concentrated on developing new compression algorithms to encode all of the frames in the multiview sequence in a rate-distortion optimal manner. Consequently, little consideration has been given to affording more efficient multiview control of streaming video to clients.