1. Field of the Invention
The present invention relates to a method for decoding a stereoscopic digital video stream, i.e. a video stream which, when appropriately processed in a visualization device, produces sequences of images which are perceived as being three-dimensional by a viewer.
This method allows a user of a conventional (i.e. non-stereoscopic) decoder and/or television set to view stereoscopic images in 2D mode, as well as to use a stereoscopic (3D) decoder and/or television set for 2D display.
2. Present State of the Art
In recent years, the cinematographic production world has devoted much attention and huge resources to the production of stereoscopic 3D contents, under the stimulus of new production tools made available by digital technologies.
The interest in 3D is now extending to domestic use, i.e. for displaying images on a television set. For example, some pay-TV operators will shortly broadcast 3D programs.
The most common approach to presenting stereoscopic video contents involves displaying two independent video streams intended for the right eye and for the left eye, respectively, which are then reassembled by the human brain into a three-dimensional object.
Stereoscopic contents for domestic use are generally high-definition video contents and may be distributed on a mass memory medium (DVD or Blu-ray discs, magneto-optic or solid-state media, etc.) or via (wired or wireless) broadcasting channels or via a telecommunication network (IP).
In the production environment, however, the existing infrastructure may make it impossible to transfer and handle separately the two streams produced by stereoscopic video cameras shooting a scene from two different points of view.
Moreover, the bandwidth available on the distribution networks reaching the final user is limited, so that it is uneconomical to use two independent high-definition streams for providing a single service. As a consequence, a number of measures must be taken during the production process in order to reduce the bit-rate required for content transfer and fruition.
Studies on the differences in the perception of details in two-dimensional and three-dimensional images indicate that the quality perceived by the user remains acceptable even when the resolution of stereoscopic contents is lowered compared to that of two-dimensional contents. For this reason, different techniques have been developed for packing the two images composing the stereoscopic view into a single frame (frame packing).
For example, in the case of a single high-definition frame C (1920×1080 pixels), the two images composing the left and right channels (hereafter referred to as L and R) are acquired with a horizontal resolution equal to half the resolution of a high-definition frame and are then arranged side by side into a single frame (side-by-side format), as shown in FIG. 1a. 
In this way, it is possible to use a single high-definition stream for transporting the two independent video channels; at decoding time, the two half-frames are then separated and brought back to the 16/9 format by applying suitable interpolation techniques.
Likewise, an alternative process may be used which involves halving the vertical resolution and leaving the horizontal resolution unchanged, and then arranging the two frames L and R one on top of the other (top-bottom format), as shown in FIG. 1b. 
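The side-by-side and top-bottom packing processes described above, together with the decode-side separation, can be sketched as follows. This is a minimal illustration only: the column/row decimation and the nearest-neighbour interpolation used here are simplifying assumptions, whereas real systems typically apply filtered subsampling and more sophisticated interpolation.

```python
import numpy as np

def pack_side_by_side(left, right):
    """Halve the horizontal resolution of each view (here by simple
    column decimation) and arrange the two halves side by side in a
    single composite frame of the original width."""
    return np.hstack([left[:, ::2], right[:, ::2]])

def pack_top_bottom(left, right):
    """Halve the vertical resolution of each view (row decimation)
    and stack the two halves one on top of the other."""
    return np.vstack([left[::2, :], right[::2, :]])

def unpack_side_by_side(frame):
    """Decode-side operation: separate the two half-frames and bring
    them back to full width, here with nearest-neighbour
    interpolation (column duplication)."""
    w = frame.shape[1]
    left_half, right_half = frame[:, :w // 2], frame[:, w // 2:]
    return (np.repeat(left_half, 2, axis=1),
            np.repeat(right_half, 2, axis=1))
```

With a 1920×1080 composite frame C, each half-frame occupies 960×1080 pixels; the same functions work on any even-width arrays.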
The stereoscopic video stream consisting of composite frames is then compressed in order to reduce its transport bit-rate before distributing it on a broadcasting network, an IP network or a mass memory medium.
One of the most important requirements on which the attention of the various service providers (especially public service broadcasters) is focused is the 2D compatibility of the stereoscopic signals.
In fact, in order to allow those users who already own a high-definition decoder to enjoy the broadcast services, it is desirable that 3D programs can also be displayed as 2D programs. Likewise, it is desirable that a 3D content on a DVD, a 3D Blu-ray disc or an Internet site can be displayed by both 2D and 3D television sets and monitors.
This result can be achieved in two ways: either by simultaneously broadcasting both the 2D and 3D versions of one program or by adopting an appropriate technique for coding the stereoscopic stream.
Of course, the first option involves wasting bandwidth, which is one thing that service providers would rather avoid.
As to the second option, several techniques are known in the art for generating 2D compatible stereoscopic streams.
One of these techniques involves the application of so-called “depth maps”, as described, for example, in US patent applications no. US 2002/0048395 and no. US 2004/0101043.
In practice, a signal is associated with the two-dimensional colour video in the form of a supplementary black-and-white video that carries the depth maps. A suitable decoder can rebuild a stereoscopic video starting from the received data. However, this technique suffers from the very same problems as the aforementioned simultaneous transmission of the 2D and 3D versions of the same program: in fact, two video signals must be transferred in parallel, resulting in a high transport bit-rate.
Another 2D-compatible stereoscopic stream coding technique is, for example, the one referred to as “multiview”.
Because the pairs of right and left images making up the stereoscopic video stream are characterized by a high degree of resemblance, the space-time redundancy suppression techniques employed when coding two-dimensional streams can be used in this case as well. In fact, once a certain offset due to the geometric distance between the shooting points (i.e. the interocular distance) has been subtracted, the differences between the right image and the left image are small.
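The idea of subtracting the geometric offset between the two views can be sketched as follows. This is an illustrative assumption, not the actual MVP/MVC prediction scheme: a single global disparity `d` is used here, whereas real encoders estimate disparity per block and combine it with temporal prediction.

```python
import numpy as np

def disparity_residual(left, right, d):
    """Shift the right view horizontally by a (hypothetical) global
    disparity d and compute the residual against the left view.
    After this compensation the residual is small, so only a small
    amount of information needs to be coded for the second view."""
    shifted = np.roll(right, d, axis=1)
    return left.astype(np.int32) - shifted.astype(np.int32)
```

In the ideal case where the right view is an exact horizontal shift of the left one, the compensated residual is zero everywhere (except at the wrap-around border of this toy example).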
The MPEG2 standard has been extended with a supplementary specification called Multi View Profile (MVP); likewise, the subsequent H.264/AVC standard has been extended by including the Multi View Coding (MVC) specification.
A common characteristic of these two specifications is the use of scalable video coding: the stereoscopic video stream is compressed into a base layer (the 2D base stream) plus an enhancement layer, which transports the second view. The syntax of the coded stream ensures that the 2D video can also be decoded by old-generation decoders, so long as they comply with the MPEG2 or H.264/AVC standards.
However, the bit-rate necessary for coding stereoscopic streams into one of the above described formats is still too high to allow it to be used in the broadcasting environment and, as a consequence, frame packing formats remain the only feasible short-term solution for starting up 3D services.