Stereoscopic video, also referred to as three-dimensional (3-D) video, creates the illusion of depth for displayed images. One method for creating depth perception is to display two different two-dimensional (2-D) images, where the two images represent two perspectives of the same object, similar to the perspectives that the two eyes naturally receive in binocular vision.
With the arrival of many high-quality stereoscopic displays in the market, it is desirable to provide a compression solution for stereoscopic video with superior coding efficiency and with reasonable coding complexity.
In recent years, much effort has been put into the design of efficient methods for compressing stereoscopic video. Conventional monoscopic compression methods can be applied independently to the left and right views of a stereo image pair. However, higher compression ratios can be achieved if the high correlation between views is exploited.
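The benefit of exploiting inter-view correlation can be illustrated with a minimal sketch. It assumes block-matching disparity search along a single scanline, which is one common technique and is not taken from the methods described here; the names `sad` and `best_disparity` and the sample pixel values are hypothetical.

```python
def sad(a, b):
    """Sum of absolute differences between two equal-length pixel lists."""
    return sum(abs(x - y) for x, y in zip(a, b))


def best_disparity(left_row, right_block, start, max_disp):
    """Find the horizontal shift d (0..max_disp) at which a block of the
    left view best predicts right_block, which sits at column `start`
    in the right view. Returns (d, residual_cost)."""
    n = len(right_block)
    best_d, best_cost = None, None
    for d in range(max_disp + 1):
        pos = start + d
        if pos + n > len(left_row):
            break  # candidate block would fall outside the left view
        cost = sad(left_row[pos:pos + n], right_block)
        if best_cost is None or cost < best_cost:
            best_d, best_cost = d, cost
    return best_d, best_cost


# Toy scanlines: the same object appears two pixels further left
# in the right view than in the left view.
left_row = [0, 0, 5, 9, 7, 3, 0, 0]
right_row = [5, 9, 7, 3, 0, 0, 0, 0]
block = right_row[0:4]

disparity, residual = best_disparity(left_row, block, start=0, max_disp=3)
independent_cost = sum(abs(p) for p in block)  # magnitude coded without inter-view prediction
```

In this toy case the disparity-compensated residual is zero, whereas coding the block on its own would have to represent its full magnitude. Real content gives smaller but still substantial savings.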
Regarding a prior art approach in which both views of a stereoscopic image pair are encoded, a Multi-View Profile (MVP) was defined in the International Organization for Standardization/International Electrotechnical Commission (ISO/IEC) Moving Picture Experts Group-2 (MPEG-2) standard to transmit a pair of video signals. MVP relies on a multi-layer signal representation approach such that one view (often the left view) is assigned to a base layer, and the other view is assigned to an enhancement layer. Monoscopic coding with the same tools as Main Profile (MP) is applied to the base layer. The enhancement layer is coded using temporal scalability tools and a hybrid prediction of motion and disparity fields.
In prior art methods relating to the International Organization for Standardization/International Electrotechnical Commission (ISO/IEC) Moving Picture Experts Group-4 (MPEG-4) Part 10 Advanced Video Coding (AVC) standard/International Telecommunication Union, Telecommunication Sector (ITU-T) H.264 standard (hereinafter the “MPEG4/H.264 standard” or simply the “H.264 standard”), stereoscopic video coding can be performed in two different ways: (i) as a particular case of interlaced image coding, where all the fields of a particular parity are assigned to the left view and all the fields of the opposite parity are assigned to the right view of the stereo-view content; or alternatively (ii) by alternating frames from the left and right views to create a single monoscopic video sequence. A stereovision supplemental enhancement information (SEI) message indicates to the decoder whether or not the coded video sequence represents stereoscopic content and which method was used to encode the corresponding content.
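The two packing arrangements described above can be sketched as follows. This is an illustration only, not encoder code: pictures are modeled as lists of pixel rows, the function names are hypothetical, and the choice of top field (even rows) for the left view is an assumption, since either parity may be used.

```python
def interleave_fields(left, right):
    """Method (i): build one interlaced frame whose even rows (top field)
    come from the left view and whose odd rows (bottom field) come from
    the right view. Each view supplies one full field."""
    if len(left) != len(right):
        raise ValueError("views must have the same number of rows")
    frame = []
    for row_l, row_r in zip(left, right):
        frame.append(list(row_l))
        frame.append(list(row_r))
    return frame


def split_fields(frame):
    """Inverse of interleave_fields: recover (left, right) from a frame."""
    return frame[0::2], frame[1::2]


def alternate_frames(left_frames, right_frames):
    """Method (ii): build one monoscopic sequence L0, R0, L1, R1, ..."""
    if len(left_frames) != len(right_frames):
        raise ValueError("views must have the same number of frames")
    sequence = []
    for frame_l, frame_r in zip(left_frames, right_frames):
        sequence.extend([frame_l, frame_r])
    return sequence


def split_sequence(sequence):
    """Inverse of alternate_frames: recover (left_frames, right_frames)."""
    return sequence[0::2], sequence[1::2]


left_view = [[1, 2], [3, 4]]
right_view = [[5, 6], [7, 8]]
packed = interleave_fields(left_view, right_view)
```

In both arrangements the decoder needs to know which packing was used before it can separate the views, which is the role the stereovision SEI message plays.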
These previously known methods require minimal modifications of existing monoscopic coding techniques. However, they show a limited ability to reduce the redundancy existing between the two views in a stereoscopic pair. As a result, encoding stereo-view content incurs a large overhead when compared to encoding a single monoscopic view. This problem has prevented the spread of stereovision for consumer applications with limited transmission bandwidth.
Other prior art methods include methods in which encoding is performed for one view plus some “additional 3-D information”. This more general and simpler approach to coding stereoscopic content is to encode one single view plus some additional 3-D information allowing the receiver to render the second view of the stereoscopic pair. Traditionally, the transmitted 3-D information is represented by a depth and/or disparity map. A depth map includes a 2-D image representation of the 3-D scene for which each pixel is assigned a depth value. Differences in pixel values correspond to differences in depth in the 3-D scene. Often, depth data is encoded as a luminance-channel-only video stream.
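How a receiver might render the second view from one view plus a per-pixel map can be sketched as follows. The sketch makes several assumptions not stated in the text: the map holds non-negative integer horizontal disparities in pixels, pixels are shifted left by their disparity, nearer pixels (larger disparity) win when two pixels land on the same target, and holes are filled crudely from the left neighbor. The function names are hypothetical.

```python
def render_second_view(view, disparity, hole=None):
    """Warp `view` horizontally using per-pixel integer disparities.
    Target positions that receive no pixel are left as `hole`."""
    width = len(view[0])
    rendered = []
    for row, disp_row in zip(view, disparity):
        out = [hole] * width
        winner = [-1] * width  # largest disparity written at each target
        for x in range(width):
            d = disp_row[x]
            target = x - d
            # Larger disparity means closer to the camera, so it occludes.
            if 0 <= target < width and d > winner[target]:
                out[target] = row[x]
                winner[target] = d
        rendered.append(out)
    return rendered


def fill_holes(rows, hole=None):
    """Crude hole filling: copy the nearest pixel to the left, if any."""
    filled = []
    for row in rows:
        new_row = list(row)
        for x in range(1, len(new_row)):
            if new_row[x] is hole:
                new_row[x] = new_row[x - 1]
        filled.append(new_row)
    return filled


view = [[10, 20, 30, 40]]     # one scanline of the transmitted view
disparity = [[0, 0, 1, 1]]    # the two right-hand pixels are nearer
warped = render_second_view(view, disparity)
second_view = fill_holes(warped)
```

The hole left at the right edge is a disocclusion: a region visible in the second view that the transmitted view never captured. Such regions are why rendering quality depends on more than the accuracy of the map itself.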
In MPEG-4 Part 2, video object syntax includes so-called multiple auxiliary components (MAC), which are coded as gray-level images using motion-compensated DCT. The motion vectors of a video object are used for the motion compensation of its auxiliary components. One use of auxiliary components is to code depth or disparity map data. However, there is a restriction that auxiliary components must have the same size as the luminance component of the video object. This method shows improved performance compared to MPEG-2 MVP. However, the MPEG-4 Part 2 standard has not been successfully deployed in the industry because of the superior coding gains of MPEG-4 Part 10 and the high complexity of the proposed object-oriented coding methods.