A 3D display presents an image of a different view of a 3D scene for each eye. In conventional stereo systems, images for left and right views are acquired, encoded, and either stored or transmitted, before being decoded and displayed. In more advanced systems, a virtual image with a different viewpoint than the existing input views can be synthesized to enable enhanced 3D features, e.g., adjustment of perceived depth for the 3D stereo display, and generation of a large number of virtual images for novel virtual views of the scene to support multiview autostereoscopic displays.
Depth image based rendering (DIBR) is a method for synthesizing the virtual images, which typically requires depth images of the scene. Depth images are likely to include noise, which can produce artifacts in the rendered images. In addition, pixel-level depth images cannot always represent the depth discontinuities that typically occur at object boundaries, which is another source of artifacts in the rendered images.
As shown in FIG. 1, prior art view synthesis includes a warping step 110, in which pixels of reference input images 101-102, i.e., the texture and depth images of the reference views, are warped to their corresponding positions in the virtual view, based on a geometry of the scene, to produce warped images. In the texture images, each pixel (sample) has a 2D location and an intensity, which can be a color if three (RGB) channels are used. In the depth images, each pixel at a 2D location stores a depth from the camera to the nearest point in the scene.
During blending 120, the warped images, for each input viewpoint, are combined into a single image. Hole filling 130 fills any remaining holes in the blended images to produce a synthesized virtual image 103. The blending is only performed when there are multiple input viewpoints from which the synthesized virtual image is generated.
The warping step can include forward warping and backward warping. With forward warping, the pixels in the reference image are mapped to a virtual image via a 3D projection. With backward warping, the pixels in the reference images are not directly mapped to the virtual image. Instead, the depths are mapped to the virtual image, and the warped depth image is then used to determine a corresponding pixel in the reference image for each pixel location in the virtual image.
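The difference between the two warping modes can be illustrated with a minimal sketch. It assumes rectified (parallel) cameras, so that the 3D projection reduces to a horizontal shift by a disparity d = f * b / z, and it represents images as lists of rows with holes marked as None. The function names, the parameters focal and baseline, the sign of the shift, and the rounding to integer pixel positions are assumptions made for illustration only and are not taken from any particular prior art system.

```python
def forward_warp(texture, depth, focal, baseline):
    """Map each reference pixel directly into the virtual view.

    texture, depth: lists of rows of equal size; depth values must be > 0.
    Returns (warped_texture, warped_depth); unmapped pixels stay None.
    """
    height, width = len(texture), len(texture[0])
    warped_tex = [[None] * width for _ in range(height)]
    warped_depth = [[None] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            z = depth[y][x]
            # Assumed rectified geometry: disparity d = f * b / z.
            d = int(round(focal * baseline / z))
            xv = x - d  # assumed shift direction for the virtual camera
            if 0 <= xv < width:
                # Depth test: keep the sample nearest to the camera.
                if warped_depth[y][xv] is None or z < warped_depth[y][xv]:
                    warped_depth[y][xv] = z
                    warped_tex[y][xv] = texture[y][x]
    return warped_tex, warped_depth


def backward_warp(texture, depth, focal, baseline):
    """Warp only the depths, then fetch texture from the reference image."""
    height, width = len(texture), len(texture[0])
    # Step 1: map the depths to the virtual view.
    _, warped_depth = forward_warp(depth, depth, focal, baseline)
    out = [[None] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            z = warped_depth[y][x]
            if z is None:
                continue  # no mapped depth: this pixel is a hole
            # Step 2: use the warped depth to locate the reference pixel.
            xr = x + int(round(focal * baseline / z))
            if 0 <= xr < width:
                out[y][x] = texture[y][xr]
    return out
```

For a single row with texture [10, 20, 30, 40, 50, 60], depths [4, 4, 1, 1, 4, 4], and focal = baseline = 1, both functions return [10, 30, 40, None, 50, 60]: the near samples shift by one pixel, overwrite a far sample, and leave one location without a mapped depth.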
Most of the pixels in the virtual image are mapped after the warping process. However, some pixels do not have any corresponding mapped depths because of disocclusion, i.e., regions that are hidden in the reference viewpoint become visible in the virtual viewpoint. The pixels without mapped depths are known as holes in the virtual image.
When there are multiple input reference images, the blending is used to merge the warping results into a single image. Some holes can be filled in a complementary way during this step. That is, a hole in the image warped from the left reference image can have a mapped value from the right reference image. In addition, the blending can also resolve mapping conflicts, which arise when there are different mapped values from different reference images. For example, a weighted average can be applied, or one of the mapped values can be selected, depending on the proximity of the virtual viewpoint location relative to the reference images.
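The blending of two warped images can be sketched as follows. The sketch assumes that the warped images are lists of rows of single-channel intensities with holes marked as None, and that a hypothetical parameter alpha gives the position of the virtual viewpoint between the left (0.0) and right (1.0) reference viewpoints. The function name, the linear weights, and the select_nearest flag are illustrative assumptions rather than a description of a specific prior art implementation.

```python
def blend(warped_left, warped_right, alpha, select_nearest=False):
    """Merge two warped images into one.

    alpha: assumed position of the virtual view, 0.0 = left, 1.0 = right.
    select_nearest: if True, resolve conflicts by taking the value from the
    closer reference view instead of a weighted average.
    """
    blended = []
    for row_l, row_r in zip(warped_left, warped_right):
        row = []
        for left, right in zip(row_l, row_r):
            if left is None and right is None:
                row.append(None)   # hole in both views: remains a hole
            elif left is None:
                row.append(right)  # complementary fill from the right view
            elif right is None:
                row.append(left)   # complementary fill from the left view
            elif select_nearest:
                row.append(left if alpha <= 0.5 else right)
            else:
                # Weight each view by its proximity to the virtual viewpoint.
                row.append((1.0 - alpha) * left + alpha * right)
        blended.append(row)
    return blended
```

For example, blending the rows [10, None, None, 40] and [20, 30, None, None] with alpha = 0.25 yields [12.5, 30, None, 40]. The first pixel is a conflict resolved by the weighted average, the second and fourth are filled in a complementary way, and the third remains a hole for the subsequent hole filling.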
Following the blending process, some holes remain. Hence, final hole filling is required. For example, in-painting can be used to propagate surrounding pixel values into the remaining holes. One implementation propagates the background pixels into small holes.
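Propagation of background pixels into holes can be sketched as follows. This is a deliberately simple, row-wise stand-in for in-painting, not a full in-painting method. It assumes that the blended texture and its depth image are lists of rows, that holes are marked as None in the texture, and that every non-hole texture pixel has a valid depth. The function name and the rule of choosing the horizontal neighbor with the larger depth as the background are illustrative assumptions.

```python
def fill_holes(texture, depth):
    """Fill each hole from the nearest valid pixel on the same row,
    preferring the neighbor that is farther from the camera (background).
    """
    filled = [row[:] for row in texture]
    for y, row in enumerate(texture):
        width = len(row)
        for x in range(width):
            if row[x] is not None:
                continue
            # Nearest non-hole neighbors to the left and right of the hole,
            # searched in the original (unfilled) row.
            left = next((i for i in range(x - 1, -1, -1)
                         if row[i] is not None), None)
            right = next((i for i in range(x + 1, width)
                          if row[i] is not None), None)
            candidates = [i for i in (left, right) if i is not None]
            if not candidates:
                continue  # the whole row is empty; nothing to propagate
            # Larger depth is assumed to indicate the background.
            source = max(candidates, key=lambda i: depth[y][i])
            filled[y][x] = row[source]
    return filled
```

For the row [10, 30, 40, None, 50, 60] with depths [4, 1, 1, None, 4, 4], the hole lies between a near pixel (depth 1) and a far pixel (depth 4), so the far value 50 is propagated and the result is [10, 30, 40, 50, 50, 60]. Filling from the background side avoids stretching the foreground object into the disoccluded region.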
Prior art methods cannot deal with errors in the depth map images. Therefore, there is a need for a more accurate view synthesis to improve a quality of the synthesized image so that the synthesized image is free of boundary artifacts, and is geometrically consistent with the image characteristics that are present in the input images.