Three-dimensional (3D) video capture devices generally include two cameras arranged in a formation that mimics the placement of the human eyes. Each camera captures two-dimensional (2D) video data of the same scene, but from a slightly shifted perspective that mimics how the scene is perceived by the left or right human eye, respectively. This mimicked left and right eye 2D video data is often referred to as the left eye 2D view and the right eye 2D view, respectively. From these two views, depth information can be extracted given the focal length of the cameras and the baseline distance between the camera centers. This depth information may be used to augment the left and/or right eye 2D view to form 3D video data.
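For illustration only, the relationship between focal length, baseline, and depth can be sketched with the standard pinhole relation for a rectified, parallel stereo pair: a scene point that appears shifted by a horizontal disparity d (in pixels) between the left and right views lies at depth Z = f·B/d, where f is the focal length in pixels and B is the baseline. The function name `depth_from_disparity` and its parameters are hypothetical, and the sketch assumes the disparities have already been found by some stereo-matching step that is not shown.

```python
def depth_from_disparity(disparity_px: float, focal_length_px: float, baseline_m: float) -> float:
    """Hypothetical helper: depth (meters) of a point in a rectified stereo pair.

    Assumes parallel, rectified cameras, so Z = f * B / d, with the focal
    length f and the disparity d both expressed in pixels and the baseline B
    in meters.
    """
    if disparity_px <= 0:
        # Zero disparity corresponds to a point at infinity; negative values
        # are not meaningful for this camera arrangement.
        raise ValueError("disparity must be positive for a finite depth")
    return focal_length_px * baseline_m / disparity_px


def depth_map_from_disparity_map(disparities, focal_length_px, baseline_m):
    """Apply the same relation to every pixel of a 2D disparity map (list of rows)."""
    return [
        [depth_from_disparity(d, focal_length_px, baseline_m) for d in row]
        for row in disparities
    ]
```

For example, with a 1000-pixel focal length and a 6.5 cm baseline (roughly the human interocular distance), a 20-pixel disparity corresponds to a depth of about 3.25 m. Note that depth is inversely proportional to disparity, so nearby objects produce large shifts between the two views.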
Typically, the depth information is provided in conjunction with only one of the views, because the other view can be generated from the provided view and the depth information. This technique of rendering the other view from the provided view and the depth information is referred to as depth-image-based rendering (DIBR). DIBR reduces the size of 3D video data because only one view needs to be provided and the depth information may be encoded as a gray-scale image, which consumes considerably less space than full-color 2D video data. The resulting DIBR 3D video data may then be compressed to reduce its size further. Such compression may facilitate wireless delivery of the 3D video data to, for example, a wireless display.
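As a rough illustration of DIBR, the following minimal forward-warping sketch renders a second view from one provided view and a per-pixel depth map. It assumes a rectified camera pair, depth given directly in meters (real systems usually map 8-bit gray levels to depth), and that the right-eye view is rendered from the left-eye view, so each pixel shifts left by its disparity f·B/Z. The name `render_virtual_view` is hypothetical. When two source pixels land on the same target pixel, a z-buffer keeps the nearer one; target pixels that receive nothing (disocclusions) are left as `None`, since hole filling is outside the scope of this sketch.

```python
import math


def render_virtual_view(view, depth, focal_length_px, baseline_m):
    """Hypothetical minimal DIBR: forward-warp `view` into a second viewpoint.

    view  -- 2D list of pixel values (rows of equal length)
    depth -- 2D list of positive depths in meters, same shape as `view`
    Returns a 2D list in which unfilled (disoccluded) pixels are None.
    """
    height, width = len(view), len(view[0])
    out = [[None] * width for _ in range(height)]
    zbuf = [[math.inf] * width for _ in range(height)]  # nearest depth seen per target pixel

    for y in range(height):
        for x in range(width):
            z = depth[y][x]
            disparity = focal_length_px * baseline_m / z  # pixels; larger for nearer points
            xt = int(round(x - disparity))                # assumed: shift left for the right-eye view
            # Keep the sample only if it lands inside the image and is nearer
            # than anything already warped to that position (occlusion handling).
            if 0 <= xt < width and z < zbuf[y][xt]:
                zbuf[y][xt] = z
                out[y][xt] = view[y][x]
    return out
```

For a single row `[1, 2, 3, 4]` whose third pixel is twice as close as the others, `render_virtual_view([[1, 2, 3, 4]], [[1.0, 1.0, 0.5, 1.0]], 1.0, 1.0)` yields `[[3, None, 4, None]]`: the near pixel shifts farther and occludes pixel `2`, and the gaps show where a practical renderer would need to fill holes.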
A 3D video encoder may implement a depth map estimation module to produce, from the two captured views, 3D video data that includes a single view and depth information. A 3D video decoder may implement DIBR to render the additional view from the provided view and the depth information for presentation by a 3D display device. The 3D video encoder and the 3D video decoder may each additionally analyze the 3D video data to evaluate the quality of the views. Commonly, the encoder and decoder apply existing 2D quality metrics (2DQM) to assess the quality of each view individually and then combine these per-view metrics in a manner that speculatively reflects the quality of the captured 3D video data and the rendered 3D video data, respectively. Some of these 2D quality metrics have been augmented with depth map metrics to further refine the resulting quality metric for the 3D video data. In response to this pseudo-3D quality metric, the 3D video encoder may revise how it generates the depth map from the two captured views, and the 3D video decoder may revise how it generates the additional view from the provided view and the depth information.
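To make the idea of combining per-view 2D metrics concrete, the sketch below uses peak signal-to-noise ratio (PSNR) as the 2D metric and a plain average of the left and right view scores as the combination rule. Both choices are assumptions made for illustration; the text above does not specify which 2D metrics are used or how they are combined. The function names are hypothetical, and pixels are passed as flat sequences of 8-bit values.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], test: Sequence[float], max_value: float = 255.0) -> float:
    """PSNR in dB between two equally sized flat pixel sequences."""
    if len(reference) != len(test) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical views: no measurable distortion
    return 10.0 * math.log10(max_value ** 2 / mse)


def pseudo_3d_quality(ref_left, left, ref_right, right) -> float:
    """Hypothetical pseudo-3D score: the mean of the two per-view PSNR values.

    This ignores depth entirely, which is the weakness noted in the text:
    the combined 2D score only speculatively reflects perceived 3D quality.
    """
    return (psnr(ref_left, left) + psnr(ref_right, right)) / 2.0
```

Because the score is built purely from per-view pixel errors, two renderings with very different depth artifacts can receive the same value, which is why some approaches add depth map terms to refine it.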