The development of stereoscopic, i.e., three-dimensional, video applications largely depends on the availability of efficient formats for representing and compressing the three-dimensional video signal. Moreover, in television broadcast applications (3D-TV) it is necessary to maintain the highest possible degree of backward compatibility with existing 2D systems.
The most widespread technical solutions today are based on the so-called “frame compatible arrangement”, wherein the two views (the video images to be presented to the left eye and to the right eye, respectively) relating to the same time instant are re-scaled, if necessary or appropriate, and then combined into a single image. The most typical solutions are known as the Top Bottom and Side by Side arrangements, wherein the two views are placed in a single frame one above the other or side by side from left to right, respectively. These solutions allow the entire existing video signal distribution infrastructure to be used (terrestrial, satellite or cable broadcasting, or streaming over IP networks), and do not require new representation and compression standards. In addition, the AVC/H.264 coding standard already includes the possibility of signaling this type of composition of the stereoscopic signal, so that it can be correctly reconstructed and displayed by the receiver.
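By way of illustration only, the two frame compatible arrangements described above can be sketched as follows. The function names `pack_side_by_side` and `pack_top_bottom` are hypothetical, frames are modeled as plain lists of pixel rows, and the re-scaling is done by simply dropping every other column or row, which is an assumption made for brevity; a practical system would typically apply a low-pass filter before subsampling.

```python
# Illustrative sketch: a frame is a list of rows, each row a list of pixel values.
# Subsampling by discarding every other sample is a simplifying assumption.

def _check_same_size(left, right):
    """Raise ValueError unless both views have identical dimensions."""
    if len(left) != len(right) or any(
        len(l_row) != len(r_row) for l_row, r_row in zip(left, right)
    ):
        raise ValueError("left and right views must have identical dimensions")


def pack_side_by_side(left, right):
    """Halve each view horizontally; left view goes in the left half of the frame."""
    _check_same_size(left, right)
    return [l_row[::2] + r_row[::2] for l_row, r_row in zip(left, right)]


def pack_top_bottom(left, right):
    """Halve each view vertically; left view goes in the top half of the frame."""
    _check_same_size(left, right)
    return [list(row) for row in left[::2]] + [list(row) for row in right[::2]]


if __name__ == "__main__":
    left_view = [[1, 2, 3, 4], [5, 6, 7, 8]]
    right_view = [[11, 12, 13, 14], [15, 16, 17, 18]]
    print(pack_side_by_side(left_view, right_view))
    print(pack_top_bottom(left_view, right_view))
```

In both cases the packed frame has the same dimensions as one original view, which is why it can travel through an unmodified 2D distribution chain, at the cost of halving the horizontal or vertical resolution of each view.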
Applications more advanced than stereoscopy use more than two viewpoints, which makes it necessary to represent, code and transmit a larger set of video sequences. In this context, the state of the art is represented by the MVC standard (Annex H of AVC/H.264). MVC utilizes the known transform-based hybrid video coding paradigm, and allows some of the redundancy among the various views to be eliminated. Said standard has been chosen for disk-stored stereoscopic videos and for Blu-ray players.
Finally, another possibility consists of 3D video representations that do not rely on the video signal alone. The best-known example is the approach known as video plus depth map (V+D, i.e., Video+Depth) and variants thereof with more views and depth maps. The depth map enables new methodologies, such as the synthesis of intermediate viewpoints more or less close to the main view, to be used, for example, with an autostereoscopic 3D display. There is currently just one standard, i.e., MPEG-C, for signaling this type of format. The values of the depth map can be estimated or measured by suitable sensors. Such values are generally represented as images with 256 grayscale levels and compressed by using standard techniques (MPEG-x, H.26x).
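As a hedged illustration of how depth values may be represented with 256 grayscale levels, the sketch below maps a metric depth to an 8-bit level and back. The text above does not specify the mapping; the inverse-depth quantization used here (nearer objects get brighter levels and finer depth resolution) is one commonly used convention and is an assumption, as are the names `quantize_depth`, `dequantize_depth`, `z_near` and `z_far`.

```python
# Illustrative sketch: inverse-depth quantization to 8 bits.
# The mapping and all names are assumptions, not taken from the text above.

def quantize_depth(z, z_near, z_far):
    """Map a depth z to an integer level in 0..255 (255 = nearest, 0 = farthest)."""
    if not 0 < z_near < z_far:
        raise ValueError("require 0 < z_near < z_far")
    z = min(max(z, z_near), z_far)  # clamp to the representable depth range
    t = (1.0 / z - 1.0 / z_far) / (1.0 / z_near - 1.0 / z_far)
    return int(round(255 * t))


def dequantize_depth(level, z_near, z_far):
    """Recover an approximate depth from an 8-bit level produced by quantize_depth."""
    if not 0 <= level <= 255:
        raise ValueError("level must be in 0..255")
    t = level / 255.0
    return 1.0 / (t * (1.0 / z_near - 1.0 / z_far) + 1.0 / z_far)


if __name__ == "__main__":
    for z in (1.0, 2.0, 5.0, 10.0):
        level = quantize_depth(z, z_near=1.0, z_far=10.0)
        print(z, level, dequantize_depth(level, 1.0, 10.0))
```

Once quantized in this way, the depth map is an ordinary single-channel grayscale image and can be handed to a standard video encoder like any other picture.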