Recently, there has been considerable industry interest in, and traction for, stereoscopic (3D) video delivery. High-grossing movies have brought 3D stereoscopic video into the mainstream, and major sports events are also being produced and broadcast in 3D. Animated movies, in particular, are increasingly being generated and rendered in stereoscopic format.
While there is already a sufficiently large installed base of 3D-capable cinema screens, the same is not true for consumer 3D applications. Efforts in this space are still in their infancy, but several industry parties are investing considerable effort in the development and marketing of consumer 3D-capable displays [Reference 1].
Stereoscopic display technology and stereoscopic content creation are issues that must be properly addressed to ensure a sufficiently high quality of experience. The delivery of 3D content is equally critical. Content delivery comprises several components, including compression. Stereoscopic delivery is challenging because a stereoscopic delivery system handles twice as much information as a 2D delivery system does, and its computational and memory throughput requirements increase considerably as well.
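To put the factor of two in concrete terms, the short Python sketch below computes uncompressed bit rates for a single view and for a stereo pair. It assumes 8-bit 4:2:0 video at 1920×1080 and 24 frames per second; these parameters are illustrative choices, not values taken from this document, and compressed rates are of course far lower.

```python
def raw_bitrate_bps(width: int, height: int, fps: float,
                    bits_per_sample: int = 8, chroma_factor: float = 1.5,
                    views: int = 1) -> float:
    """Uncompressed bit rate in bits per second.

    chroma_factor = 1.5 models 4:2:0 sampling (one luma plane plus two
    quarter-size chroma planes). The defaults are illustrative assumptions.
    """
    samples_per_frame = width * height * chroma_factor
    return samples_per_frame * bits_per_sample * fps * views


mono = raw_bitrate_bps(1920, 1080, 24)             # one view (2D)
stereo = raw_bitrate_bps(1920, 1080, 24, views=2)  # left + right views
print(f"2D:     {mono / 1e6:.1f} Mbit/s")    # 2D:     597.2 Mbit/s
print(f"Stereo: {stereo / 1e6:.1f} Mbit/s")  # Stereo: 1194.4 Mbit/s
```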
In general, there are two main distribution channels through which stereoscopic content can be delivered to the consumer: fixed media, such as Blu-Ray discs, and streaming solutions where the content is delivered primarily to a set-top box and secondarily to a PC.
Most currently deployed Blu-Ray players and set-top boxes support only codecs such as those based on the Annex A profiles of the state-of-the-art ITU-T H.264 | ISO/IEC 14496-10 video coding standard [Reference 2] (also known as MPEG-4 Part 10 AVC) and the SMPTE VC-1 standard [Reference 3].
Each of these codec solutions enables a service provider to deliver a single HD image sequence at 1920×1080-pixel resolution. However, delivering stereoscopic content involves transmitting information for two sequences, a left view and a right view. A straightforward approach is to encode two separate bitstreams, one for each view, an approach also known as simulcast.
First, simulcast and similar approaches have low compression efficiency: because the left and right view sequences are coded independently even though they are highly correlated, these approaches require high bandwidth to maintain an acceptable level of quality.
Second, the two separate bitstreams must be demultiplexed and decoded in parallel by two properly synchronized decoders. Such a decoder pair may be implemented with two existing off-the-shelf decoders, and parallel decoding also maps well onto Graphics Processing Unit (GPU) architectures.
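The pairing step that two synchronized decoders imply can be sketched as follows. This is a minimal illustration, assuming that each (hypothetical) decoder emits `(presentation timestamp, picture)` tuples with timestamps in 90 kHz units; a real player would also handle clock recovery, buffering, and dropped frames. Pictures are passed on for display only as complete left/right pairs.

```python
from typing import Dict, List, Tuple

# (presentation timestamp in 90 kHz ticks, decoded-picture placeholder)
Frame = Tuple[int, str]


def pair_views(left: List[Frame], right: List[Frame]) -> List[Tuple[int, str, str]]:
    """Pair left and right pictures that share the same presentation timestamp."""
    right_by_pts: Dict[int, str] = {pts: pic for pts, pic in right}
    pairs = []
    for pts, left_pic in sorted(left):
        right_pic = right_by_pts.get(pts)
        if right_pic is not None:  # only output complete stereo pairs
            pairs.append((pts, left_pic, right_pic))
    return pairs


# 3600 ticks at 90 kHz = 40 ms, i.e. 25 pictures per second.
left = [(0, "L0"), (3600, "L1"), (7200, "L2")]
right = [(3600, "R1"), (0, "R0")]  # second decoder has not yet produced R2
print(pair_views(left, right))  # [(0, 'L0', 'R0'), (3600, 'L1', 'R1')]
```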
Codecs that support multiple layers may provide high compression efficiency for stereoscopic video while at the same time maintaining backwards compatibility.
Multi-layer or scalable bitstreams are composed of multiple layers that are characterized by pre-defined dependency relationships. One or more of those layers are so-called base layers that are decoded before any other layer and are independently decodable.
Other layers are usually known as enhancement layers, since their function is to improve the content obtained by parsing and decoding the base layer or layers. These enhancement layers are also dependent layers, in that they depend on the base layers. The enhancement layers use some kind of inter-layer prediction, and one or more of the enhancement layers often also depend on the decoding of other, higher-priority enhancement layers. Because each layer depends only on layers above it in this hierarchy, decoding may also be terminated at one of the intermediate layers.
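The dependency structure described above can be sketched as a small graph problem: to reconstruct a given layer, a decoder needs that layer plus everything it transitively depends on, decoded in an order where each layer follows its references. The layer names and dependency map below are hypothetical, and the sketch uses Python's standard `graphlib` module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def layers_to_decode(deps: Dict[str, Set[str]], target: str) -> List[str]:
    """Return the layers needed to reconstruct `target`, in a valid decoding order."""
    # Collect the target and all layers it transitively depends on.
    needed: Set[str] = set()
    stack = [target]
    while stack:
        layer = stack.pop()
        if layer not in needed:
            needed.add(layer)
            stack.extend(deps.get(layer, set()))
    # Restrict the graph to the needed layers; static_order() yields every
    # layer after all of the layers it depends on.
    sub = {layer: deps.get(layer, set()) & needed for layer in needed}
    return list(TopologicalSorter(sub).static_order())


# Hypothetical hierarchy: BL is the base layer, EL2 also depends on EL1.
deps = {"BL": set(), "EL1": {"BL"}, "EL2": {"BL", "EL1"}}
print(layers_to_decode(deps, "EL1"))  # ['BL', 'EL1']  (stop at an intermediate layer)
print(layers_to_decode(deps, "EL2"))  # ['BL', 'EL1', 'EL2']
```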
Multi-layer or scalable bitstreams enable scalability in terms of quality/signal-to-noise ratio (SNR), spatial resolution, and/or temporal resolution, and/or even availability of additional views. For example, using codecs based on Annex A profiles of H.264/MPEG-4 Part 10, VC-1, or VP8, one may produce bitstreams that are temporally scalable.
For instance, a base layer, if decoded, may provide a version of the image sequence at 15 frames per second (fps), while an enhancement layer, decoded together with the already decoded base layer, may provide the same image sequence at 30 fps.
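The 15/30 fps example can be illustrated with a dyadic split of the frame sequence. This is a minimal sketch that only models which frames belong to which layer; it assumes, as a temporally scalable encoder must ensure, that base-layer frames never use enhancement-layer frames as prediction references, so the base layer remains decodable on its own.

```python
from typing import List, Tuple


def split_temporal_layers(frames: List[int]) -> Tuple[List[int], List[int]]:
    """Dyadic split: even-indexed frames form the base layer, odd-indexed the enhancement."""
    return frames[0::2], frames[1::2]


def merge_layers(base: List[int], enh: List[int]) -> List[int]:
    """Interleave base and enhancement frames back into display order."""
    out: List[int] = []
    for i, frame in enumerate(base):
        out.append(frame)
        if i < len(enh):
            out.append(enh[i])
    return out


frames_30fps = list(range(30))  # frame indices for one second of 30 fps video
base, enh = split_temporal_layers(frames_30fps)
print(len(base), len(enh))  # 15 15  (base layer alone gives 15 fps)
print(merge_layers(base, enh) == frames_30fps)  # True  (both layers give 30 fps)
```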
SNR and spatial scalability are also possible. For example, when adopting the Scalable Video Coding (SVC) extension of the H.264/MPEG-4 Part 10 AVC video coding standard (Annex G), the base layer (coded under Annex A) generates a coarse-quality version of the image sequence. The enhancement layer or layers may provide additional increments in terms of visual quality. Similarly, the base layer may provide a low-resolution version of the image sequence, and the resolution may be improved by decoding additional spatial and/or temporal enhancement layers. Scalable or multi-layered bitstreams are also useful for providing multi-view scalability.
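The idea behind SNR scalability can be shown with a toy example: a base layer quantizes the signal coarsely, and an enhancement layer quantizes the remaining residual with a finer step, so that decoding both layers reduces the reconstruction error. This is only a conceptual sketch with made-up sample values and step sizes; it is not the actual SVC coarse-grain or medium-grain quality scalability scheme.

```python
from typing import List


def quantize(values: List[float], step: int) -> List[int]:
    """Uniform scalar quantization to the nearest multiple of `step`."""
    return [step * round(v / step) for v in values]


def mse(a: List[float], b: List[float]) -> float:
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


signal = [3.0, 17.0, -9.0, 42.0, 25.0, -31.0]  # illustrative sample values

base = quantize(signal, 16)                        # coarse base layer
residual = [s - b for s, b in zip(signal, base)]   # what the base layer missed
enh = quantize(residual, 4)                        # finer enhancement layer
refined = [b + e for b, e in zip(base, enh)]       # base + enhancement

print(f"base only:          MSE = {mse(signal, base):.2f}")
print(f"base + enhancement: MSE = {mse(signal, refined):.2f}")
```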
The Stereo High Profile of the Multiview Video Coding (MVC) extension (Annex H) of H.264/AVC was recently finalized and has been adopted as the video codec for the next generation of Blu-Ray discs (Blu-Ray 3D) that feature stereoscopic content. This coding approach attempts to address, to some extent, the high bit rate requirements of a stereoscopic video stream.
The Stereo High Profile uses a base layer that is compliant with the High Profile of Annex A of H.264/AVC and compresses one of the views (usually the left), termed the base view. An enhancement layer then compresses the other view, termed the dependent view. While the base layer is on its own a valid H.264/AVC bitstream that can be decoded independently of the enhancement layer, the same is usually not true for the enhancement layer, because the enhancement layer can use decoded pictures from the base layer as motion-compensated prediction references. As a result, the dependent view (enhancement layer) may benefit from inter-view prediction, and compression may improve considerably for scenes with high inter-view correlation (i.e., low stereo disparity). Hence, the MVC extension approach attempts to tackle the problem of increased bandwidth by exploiting stereoscopic disparity.
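Why inter-view prediction helps can be seen in a one-dimensional toy example: if a row of the right view is approximately a horizontally shifted copy of the same row in the left view, predicting it from the shifted left row leaves a much smaller residual than predicting it from the co-located samples. The sketch below searches for the shift that minimizes the sum of absolute differences (SAD). It is a simplified illustration with hypothetical data, not the block-based prediction actually specified in MVC.

```python
from typing import List, Tuple


def sad(a: List[int], b: List[int]) -> int:
    """Sum of absolute differences between two equal-length rows."""
    return sum(abs(x - y) for x, y in zip(a, b))


def best_disparity(left: List[int], right: List[int], max_d: int) -> Tuple[int, int]:
    """Find the shift d for which left[x + d] best predicts right[x].

    Returns (d, SAD of the prediction residual). Samples shifted in from
    beyond the row edge are filled by repeating the last left-view sample.
    """
    best = (0, sad(left, right))  # d = 0: co-located prediction
    for d in range(1, max_d + 1):
        pred = left[d:] + [left[-1]] * d
        cost = sad(pred, right)
        if cost < best[1]:
            best = (d, cost)
    return best


# Hypothetical rows: a bright object appears two samples further left in the right view.
left = [10, 10, 10, 80, 90, 80, 10, 10, 10, 10]
right = [10, 80, 90, 80, 10, 10, 10, 10, 10, 10]
print(sad(left, right))               # 300  (co-located prediction)
print(best_disparity(left, right, 4))  # (2, 0)  (disparity-compensated prediction)
```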
However, such an approach might not provide compatibility with the existing deployed set-top box and Blu-Ray player infrastructure. Even though an existing H.264 decoder may be able to decode and display the base view, it will simply discard and ignore the dependent (right) view. As a result, existing decoders cannot decode and display 3D content encoded using MVC: while MVC retains 2D compatibility, it does not deliver 3D content to legacy devices. This lack of backwards compatibility is an additional barrier to rapid adoption of consumer 3D stereoscopic video.
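The mechanism behind this behavior can be sketched at the NAL-unit level. In H.264, the MVC dependent view is carried in NAL unit types that Annex A decoders are required to ignore: 14 (prefix NAL unit), 15 (subset sequence parameter set), and 20 (coded slice extension). The sketch below splits an Annex B byte stream on start codes and keeps only what a legacy 2D decoder would process. The example stream consists of placeholder header-plus-one-byte units, not a decodable bitstream, and the helper names are hypothetical.

```python
from typing import List

# NAL unit types introduced for MVC that an Annex A (2D) decoder ignores.
MVC_NAL_TYPES = {14, 15, 20}  # prefix NAL unit, subset SPS, coded slice extension


def split_annex_b(stream: bytes) -> List[bytes]:
    """Split an H.264 Annex B byte stream on 0x000001 start codes.

    Emulation prevention guarantees that 0x000001 does not occur inside a
    NAL unit. Trailing zero bytes (e.g. the extra zero of a 4-byte start
    code) are stripped, which is sufficient for inspecting NAL unit types.
    """
    units = []
    i = stream.find(b"\x00\x00\x01")
    while i != -1:
        start = i + 3
        nxt = stream.find(b"\x00\x00\x01", start)
        end = len(stream) if nxt == -1 else nxt
        units.append(stream[start:end].rstrip(b"\x00"))
        i = nxt
    return units


def legacy_view(units: List[bytes]) -> List[bytes]:
    """Keep only the NAL units a legacy Annex A decoder would process."""
    # nal_unit_type is the low five bits of the first NAL header byte.
    return [u for u in units if u and (u[0] & 0x1F) not in MVC_NAL_TYPES]


# Placeholder units: SPS(7), subset SPS(15), PPS(8), prefix(14), IDR slice(5), slice ext.(20)
headers = (0x67, 0x6F, 0x68, 0x6E, 0x65, 0x74)
stream = b"".join(b"\x00\x00\x00\x01" + bytes([h, 0xAA]) for h in headers)

units = split_annex_b(stream)
print([u[0] & 0x1F for u in units])               # [7, 15, 8, 14, 5, 20]
print([u[0] & 0x1F for u in legacy_view(units)])  # [7, 8, 5]
```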