Several methods have been proposed for the representation of stereo and 3D video [1]. One of these methods is the Multi-View plus Depth (MVD) format. The MVD format stores the scene information as two or more texture views depicting the 3D scene from different perspectives. Additionally, the scene geometry is represented by a full dense depth map per texture view. The MVD format supports the generation of additional texture views located between the provided views by depth-image-based rendering (DIBR). For this purpose, the samples of the views' textures are warped using disparities obtained from their depth maps.
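The warping step can be illustrated with a minimal sketch. It is not taken from the text above and rests on several assumptions: rectified, horizontally aligned cameras, an 8-bit depth map that stores inverse depth quantized between hypothetical clipping planes `z_near` and `z_far`, integer-rounded disparities, and a virtual camera placed so that samples shift to the left. The names `depth_to_disparity`, `warp_row`, `focal_px` and `baseline` are illustrative only.

```python
def depth_to_disparity(v, focal_px, baseline, z_near, z_far):
    """Convert an 8-bit depth sample v (0..255) to a disparity in pixels.

    Assumes the depth map stores inverse depth, linearly quantized
    between 1/z_far (v = 0) and 1/z_near (v = 255).
    """
    inv_z = (v / 255.0) * (1.0 / z_near - 1.0 / z_far) + 1.0 / z_far
    return focal_px * baseline * inv_z


def warp_row(texture_row, depth_row, focal_px, baseline, z_near, z_far):
    """Forward-warp one texture row into a virtual view.

    Returns the warped row. Positions that receive no sample
    (disocclusion holes) are left as None.
    """
    width = len(texture_row)
    out = [None] * width
    out_disp = [-1.0] * width  # disparity of the sample stored at each target
    for x in range(width):
        d = depth_to_disparity(depth_row[x], focal_px, baseline, z_near, z_far)
        xt = x - int(round(d))  # assumed shift direction: to the left
        # When two samples land on the same position, keep the one with
        # the larger disparity, i.e. the one closer to the camera.
        if 0 <= xt < width and d > out_disp[xt]:
            out[xt] = texture_row[x]
            out_disp[xt] = d
    return out
```

In this sketch, foreground samples overwrite background samples that map to the same position, and the remaining `None` entries mark regions that a practical renderer would have to fill, for example from a second view.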
Modern autostereoscopic displays provide a high view density, with eight to 28 or even more views. However, recording a 3D scene in a real-life scenario can only be accomplished with a small number of cameras. Thus, the possibility of generating intermediate views, as provided by the MVD format, is a feature that may be used for a 3D video system. Moreover, the use of depth maps and view interpolation provides advantages regarding the transmission of 3D video. Depth maps can be coded at a highly reduced rate compared to a video view and may therefore use less bandwidth.
Compared to multi-view video, the generation and transmission of depth-based video involves additional processing steps at the sender and receiver sides. In particular, depth modifications due to, for example, lossy compression result in distortions of the depth map itself. More important, however, is the distortion of a view synthesized using the modified depth map. Accordingly, to perform rate/distortion optimization correctly, the distortion caused by the modification of the depth map would have to be taken into account in the optimization. Until now, however, this distortion has not been determined exactly because of the overhead involved.
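The difference between depth-map distortion and synthesized-view distortion can be made concrete with a brute-force sketch. It is an illustration under stated assumptions, not a method described in this text: disparities are non-negative integers already derived from the depth map, rendering is a simple forward warp of a single row, and holes are filled from the nearest rendered sample to the left. The function names are hypothetical.

```python
def render_row(texture_row, disparity_row):
    """Forward-warp a texture row by non-negative integer disparities."""
    width = len(texture_row)
    out = [None] * width
    best = [-1] * width  # largest disparity seen at each target position
    for x in range(width):
        d = disparity_row[x]
        xt = x - d
        if 0 <= xt < width and d > best[xt]:
            out[xt] = texture_row[x]
            best[xt] = d
    # Naive hole filling (an assumption): copy the last rendered sample.
    last = 0
    for x in range(width):
        if out[x] is None:
            out[x] = last
        else:
            last = out[x]
    return out


def ssd(a, b):
    """Sum of squared differences between two equally long rows."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def synthesized_view_distortion(texture_row, disp_orig, disp_mod):
    """Distortion of the synthesized row caused by modifying the disparities.

    Both versions are rendered in full and compared, which is exact for
    this toy renderer but requires a complete re-rendering per candidate.
    """
    return ssd(render_row(texture_row, disp_orig),
               render_row(texture_row, disp_mod))
```

For a flat texture row, a single-sample disparity error leaves the synthesized row unchanged, whereas the same error next to a texture edge produces a large synthesized-view distortion, although the distortion measured on the disparities is identical in both cases. Re-rendering for every candidate modification, as done here, also indicates where the overhead of an exact determination comes from.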