The provision of a stereoscopic (3D) user experience has been a long-held goal of both content providers and display manufacturers. Recently, the urgency of providing a stereoscopic experience to home users has increased with the production and release of multiple popular 3D movies and other 3D material such as sports events, concerts, and documentaries. A number of methods have been proposed that would enable the delivery of stereoscopic 3D content to home users. One proposed technique is to multiplex the two stereoscopic views into a single frame configuration (frame compatible) using a variety of filtering, sampling, and arrangement methods. Sampling could, for example, be horizontal, vertical, or quincunx, and an offset could also be applied between the two views, allowing better exploitation of redundancies that may exist between them. Similarly, arrangements could include side-by-side, over-under, line-interleaved, and checkerboard packing, among others.
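As an illustration of the frame-compatible multiplexing described above, the following minimal sketch packs two views side by side using horizontal sampling with an optional sampling offset for the second view. The function names (`pack_side_by_side`, `unpack_side_by_side`) and the `right_offset` parameter are hypothetical, and the pre-filtering step that a practical system would apply before sampling is omitted for brevity.

```python
def pack_side_by_side(left, right, right_offset=0):
    """Pack two equal-size views into one frame of the original size.

    Each view is a list of rows, each row a list of pixel values.
    Columns are decimated by 2: the left view keeps columns 0, 2, 4, ...
    and the right view keeps columns starting at right_offset (0 or 1).
    Assumption: no anti-alias filter is applied before sampling.
    """
    if right_offset not in (0, 1):
        raise ValueError("right_offset must be 0 or 1")
    if len(left) != len(right):
        raise ValueError("views must have the same height")
    packed = []
    for left_row, right_row in zip(left, right):
        if len(left_row) != len(right_row) or len(left_row) % 2:
            raise ValueError("rows must have equal, even width")
        # Left half of the packed row holds the sampled left view,
        # right half holds the sampled right view.
        packed.append(left_row[0::2] + right_row[right_offset::2])
    return packed


def unpack_side_by_side(packed):
    """Split a packed frame back into two half-resolution views."""
    half = len(packed[0]) // 2
    left_half = [row[:half] for row in packed]
    right_half = [row[half:] for row in packed]
    return left_half, right_half


left_view = [[1, 2, 3, 4], [5, 6, 7, 8]]
right_view = [[10, 20, 30, 40], [50, 60, 70, 80]]
frame = pack_side_by_side(left_view, right_view, right_offset=1)
left_half, right_half = unpack_side_by_side(frame)
```

The packed frame has the same dimensions as a single original view, so each view retains only half of its horizontal samples. Over-under packing would follow the same pattern with rows decimated instead of columns.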
The above methods, however, require each view to be downsampled to half the original resolution. Therefore, a number of methods have been proposed that would enable the delivery of full resolution 3D. One method is to utilize two separate and independent bitstreams (simulcast), where each bitstream represents a different view (e.g., left and right eye). This method, however, is costly in terms of storage and bandwidth requirements since the redundancies that exist between the two views are not exploited. An extension of this method that tries to exploit some of the redundancies was proposed and adopted as the Multiview Video Coding (MVC) extension of the MPEG-4 AVC/H.264 video coding standard. See Advanced video coding for generic audiovisual services, http://www.itu.int/rec/recommendation.asp?type=folders&lang=e&parent=T-REC-H.264, March 2009, incorporated herein by reference in its entirety. This method is a scalable system that delivers one view as a base layer image and the other view or views as enhancement layers. In this case, redundancies among the views are exploited using only translational motion compensation based methods, and, compared to the original design of MPEG-4 AVC, the system relies on “intelligent” reference buffer management for performing prediction. Unfortunately, even though coding efficiency was somewhat improved (20-30% over simulcast), the reliance on translational-only motion compensation limits the performance of this scheme. Another method, which uses an affine model to generate a prediction of one view from the other view, is proposed in U.S. Pat. No. 6,144,701, also incorporated herein by reference in its entirety.
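To make the distinction between translational and affine prediction concrete, the following minimal sketch generates a prediction of one view from the other using a six-parameter affine coordinate mapping. It is a generic illustration and is not taken from the MVC specification or from the cited patent. The function name `affine_predict`, the parameter ordering, the nearest-neighbor sampling, and the `fill` value used for out-of-frame positions are all assumptions.

```python
def affine_predict(reference, params, fill=0):
    """Predict a view by sampling a reference view through an affine map.

    For each output position (x, y), the source position is
        sx = a*x + b*y + c
        sy = d*x + e*y + f
    with params = (a, b, c, d, e, f). Nearest-neighbor sampling is used
    (an assumption); positions that fall outside the reference get `fill`.
    """
    a, b, c, d, e, f = params
    height = len(reference)
    width = len(reference[0])
    predicted = []
    for y in range(height):
        row = []
        for x in range(width):
            sx = int(round(a * x + b * y + c))
            sy = int(round(d * x + e * y + f))
            if 0 <= sx < width and 0 <= sy < height:
                row.append(reference[sy][sx])
            else:
                row.append(fill)
        predicted.append(row)
    return predicted


base_view = [[1, 2, 3], [4, 5, 6]]

# Pure translation is the special case a = e = 1, b = d = 0:
# here every sample is fetched from one column to the right.
shifted = affine_predict(base_view, (1, 0, 1, 0, 1, 0))

# The identity map reproduces the reference unchanged.
same = affine_predict(base_view, (1, 0, 0, 0, 1, 0))
```

A translational-only scheme can vary just the offsets `c` and `f`, whereas the remaining four parameters allow the prediction to also account for scaling, rotation, and shear between the views.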
Other applications of considerable interest include scalable video delivery applications (e.g., 2D scalable video encoding), where it is desirable that a video signal be encoded using multiple layers, each layer enabling a different quality level or resolution (spatial or temporal) of the video signal.
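The layered structure of such a scalable scheme can be sketched as follows for spatial scalability on a single row of samples. This is a simplified illustration under stated assumptions: the function names are hypothetical, the base layer is formed by plain 2:1 decimation, the inter-layer prediction uses sample repetition, and no quantization or entropy coding is modeled.

```python
def upsample_by_repetition(base, length):
    """Stretch the base layer to `length` samples by repeating each sample."""
    return [base[i // 2] for i in range(length)]


def encode_layers(signal):
    """Split a row of samples into a base layer and an enhancement layer.

    Base layer: every second sample (half resolution).
    Enhancement layer: residual between the original signal and the
    prediction obtained by upsampling the base layer.
    """
    base = signal[0::2]
    prediction = upsample_by_repetition(base, len(signal))
    enhancement = [s - p for s, p in zip(signal, prediction)]
    return base, enhancement


def decode_layers(base, length, enhancement=None):
    """Reconstruct from the base layer alone or from both layers."""
    prediction = upsample_by_repetition(base, length)
    if enhancement is None:
        return prediction  # lower-quality, base-only reconstruction
    return [p + r for p, r in zip(prediction, enhancement)]


row = [10, 12, 14, 20, 22]
base_layer, enhancement_layer = encode_layers(row)
base_only = decode_layers(base_layer, len(row))
full = decode_layers(base_layer, len(row), enhancement_layer)
```

A decoder that receives only the base layer obtains a coarser reconstruction, while a decoder that also receives the enhancement layer recovers the full-resolution row.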