The area of three-dimensional (3D) video and three-dimensional television (3DTV) is gaining momentum and is considered the next logical step in consumer electronics, mobile devices, computers and the movies. The additional dimension on top of two-dimensional (2D) video offers multiple different directions for displaying the content and improves the potential for interaction between viewers and the content.
A new generation of auto-stereoscopic displays allows the viewer to experience depth perception without glasses. These displays project slightly different pictures in different directions, as shown in FIG. 1, which illustrates an exemplifying display scheme. Therefore, if the viewer is located in a proper position in front of the display, his/her left and right eyes see slightly different pictures of the same scene, which makes it possible to create the perception of depth. In order to achieve smooth parallax and a smooth change of viewpoint when the user moves in front of the screen, a number of views, typically 7-28, are generated. A view is a picture or a video of the scene taken from a certain camera position.
When using the above-mentioned approach, a problem may be that transmission of all the views requires a high bit rate. However, the problem can be overcome by transmitting a lower number, e.g. 1 to 3, of key views and generating the other views from the transmitted key views by the so-called view synthesis process. These synthesized views can be located between the key views (interpolated) or outside the range covered by the key views (extrapolated).
One of the view synthesis techniques is Depth Image Based Rendering (DIBR). In order to facilitate the view synthesis, DIBR uses depth map(s) of the key view(s) (theoretically, depth maps of other views could also be used). A depth map can be represented by a grey-scale image having the same resolution as the view, such as a video frame. Each pixel of the depth map then represents the distance from the camera to the object for the corresponding pixel in the image/video frame.
There are a number of parameters that may be used in view synthesis. These parameters may be referred to as view synthesis related parameters.
In order to facilitate the DIBR view synthesis, a number of parameters need to be signaled to the device or program module that performs the view synthesis. Among those parameters are, first of all, z_near and z_far, which represent the closest and the farthest depth values in the depth maps for the frame under consideration. These values are needed in order to map the quantized depth map samples to the real depth values that they represent, using one of the two formulas below. Formula (1) is used if all the depth values from the origin of the space are positive or all are negative. Otherwise, formula (2) is used.
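$$Z = \frac{1.0}{\dfrac{v}{255.0}\cdot\left(\dfrac{1.0}{Z_{near}} - \dfrac{1.0}{Z_{far}}\right) + \dfrac{1.0}{Z_{far}}} \qquad (1)$$

$$Z = \mathit{Tz} + \frac{1.0}{\dfrac{v}{255.0}\cdot\left(\dfrac{1.0}{Z_{near}} - \dfrac{1.0}{Z_{far}}\right) + \dfrac{1.0}{Z_{far}}} \qquad (2)$$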
Formulas (1) and (2) are used for translating a quantized depth value to a real depth value. The variable v represents the luminance value of each pixel in a grey-scale depth image (for an 8-bit depth map, between 0 and 255). Tz represents the z component (z coordinate) of the translation vector.
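As an illustration only, the mapping of formulas (1) and (2) can be sketched in a few lines of Python. The function name `depth_from_sample` and its parameters are hypothetical and chosen for this sketch; the optional `tz` argument selects formula (2), and `max_sample` assumes an 8-bit depth map.

```python
def depth_from_sample(v, z_near, z_far, tz=None, max_sample=255.0):
    """Map a quantized depth sample v to a real depth value Z.

    Hypothetical helper for illustration. With tz=None this evaluates
    formula (1); with a z translation component tz it evaluates formula (2).
    """
    # Denominator shared by both formulas: the inverse depth is
    # interpolated linearly between 1/z_far (v = 0) and 1/z_near (v = max).
    inv_z = (v / max_sample) * (1.0 / z_near - 1.0 / z_far) + 1.0 / z_far
    z = 1.0 / inv_z
    return z if tz is None else tz + z
```

Under these assumptions, a sample value of 255 maps to z_near and a sample value of 0 maps to z_far, so the quantization is finest for objects close to the camera.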
Another set of parameters that is needed for the view synthesis are camera parameters.
The camera parameters for 3D video are usually split into two parts. The first part, called the intrinsic (internal) camera parameters, represents the optical characteristics of the camera for the image taken, such as the focal length, the coordinates of the image's principal point and the radial distortion. The second part, the extrinsic (external) camera parameters, represents the camera position and the direction of its optical axis in the chosen real-world coordinates (the important aspect here is the position of the cameras relative to each other and to the objects in the scene). It shall be noted here that the extrinsic parameters, or extrinsic camera parameters, may include translation parameters, which may be comprised in a translation vector. Both intrinsic and extrinsic camera parameters are required in a view synthesis process that is based on depth information (such as DIBR).
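To make the roles of the two parameter sets concrete, the following minimal sketch projects a 3D point onto the image plane of an ideal pinhole camera. It is not taken from any standard: the function name `project_point`, the convention that a world point X maps to camera coordinates as R·X + t, and the omission of radial distortion are all assumptions made for this illustration.

```python
def project_point(point, fx, fy, cx, cy, rotation, translation):
    """Project a 3D world point to pixel coordinates (hypothetical helper).

    Intrinsic parameters: focal lengths fx, fy and principal point (cx, cy).
    Extrinsic parameters: 3x3 rotation (list of rows) and translation vector.
    Assumes camera coordinates = rotation * point + translation and no
    radial distortion.
    """
    # Extrinsic step: world coordinates -> camera coordinates.
    cam = [
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    ]
    x, y, z = cam
    if z <= 0:
        raise ValueError("point is not in front of the camera")
    # Intrinsic step: perspective division, then scale and shift to pixels.
    return fx * x / z + cx, fy * y / z + cy


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Two cameras that differ only in their translation see the same point
# at horizontally shifted pixel positions (the disparity).
left = project_point([1.0, 2.0, 10.0], 1000.0, 1000.0, 640.0, 360.0,
                     IDENTITY, [0.0, 0.0, 0.0])
right = project_point([1.0, 2.0, 10.0], 1000.0, 1000.0, 640.0, 360.0,
                      IDENTITY, [-0.5, 0.0, 0.0])
```

In this sketch the horizontal shift between `left` and `right` depends on the depth of the point, which is why a depth-based synthesis process needs both the depth values and the camera parameters.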
As an alternative to the DIBR solution above, a Layered Depth Video (LDV) may be utilized. The LDV solution uses multiple layers for scene representation. These layers can be foreground texture, foreground depth, background texture and background depth.
In order to make different devices compatible with respect to how camera parameters and the like are signaled, ways of sending the camera parameters to the decoder have been standardized.
One of these standardized ways is defined in the Multi-view Video Coding (MVC) standard, which is defined in Annex H of the well-known Advanced Video Coding (AVC) standard, also known as H.264. The scope of MVC covers joint coding of stereo or multiple views representing the scene from several viewpoints. The standard exploits correlation between these views of the same scene in order to achieve better compression efficiency compared to compressing the views independently. The MVC standard also covers sending the camera parameter information to the decoder. The camera parameters are sent as a Supplemental Enhancement Information (SEI) message. The syntax of this SEI message is shown in Table 0.1.
A contribution to the Moving Picture Experts Group (MPEG) standardization has also proposed to signal the z_near and z_far values to the decoder. The proposed syntax for signaling the z_near and z_far parameters is shown in Table 0.2.
One can see from Table 0.1 that the camera parameters are sent in floating point representation. The floating point representation supports a higher dynamic range of the parameters and facilitates sending the camera parameters with higher precision. The higher precision of the camera parameters is important for the view synthesis, as has been shown by Vetro et al.
In many video coding standards/solutions, in order to achieve higher coding efficiency and to support temporal scalability, video pictures may be coded in an order different from their display order. One example of such a coding structure is hierarchical B coding, which makes extensive use of bi-directional picture prediction.
In H.264/AVC, both coding order and display order are signaled in the Network Abstraction Layer (NAL) unit header, represented by Frame Number and Picture Order Count (POC), respectively. A decoder shall decode a sequence in non-decreasing order of Frame Number. A display, on the other hand, shall render the images on the screen in increasing POC order. FIG. 2 illustrates the difference between coding order and display order.
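The relation between the two orders can be sketched as follows. The picture names and the Frame Number and POC values are illustrative only and are not taken from FIG. 2 or from any real bitstream; the helper names are hypothetical, and a real decoder would additionally manage a decoded picture buffer, which this sketch ignores.

```python
from typing import NamedTuple


class Picture(NamedTuple):
    name: str
    frame_num: int  # governs coding (decoding) order
    poc: int        # governs display (output) order


# Illustrative hierarchical-B group of pictures, listed in coding order.
CODED = [
    Picture("I0", 0, 0),
    Picture("P4", 1, 4),
    Picture("B2", 2, 2),
    Picture("b1", 3, 1),
    Picture("b3", 3, 3),
]


def is_valid_coding_order(pictures):
    """Check that Frame Number never decreases along the sequence."""
    return all(a.frame_num <= b.frame_num
               for a, b in zip(pictures, pictures[1:]))


def display_order(pictures):
    """Return the pictures sorted by increasing POC."""
    return sorted(pictures, key=lambda p: p.poc)
```

With these illustrative values, `display_order(CODED)` yields I0, b1, B2, b3, P4, whereas the pictures are decoded in the order I0, P4, B2, b1, b3, so that each B picture has both of its reference pictures available before it is decoded.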
Even though the methods above, such as DIBR and LDV, reduce the bit rate between the encoder and the decoder, it would be desirable to further reduce the required bit rate.