Three dimensional (3D) displays add a third dimension to the viewing experience by providing a viewer's two eyes with different views of the scene being watched. This can be achieved by having the user wear glasses to separate the two displayed views. However, as this may be considered inconvenient to the user, it is in many scenarios preferred to use autostereoscopic displays that use means at the display (such as lenticular lenses or barriers) to separate the views and to send them in different directions such that they individually reach the user's eyes. Stereo displays require two views, whereas autostereoscopic displays typically require more views (e.g. nine views).
However, the quality of the presented three dimensional image depends on the quality of the received image data, and specifically the three dimensional perception depends on the quality of the received depth information.
Three dimensional image information is often provided by a plurality of images corresponding to different view directions for the scene. Specifically, video content, such as films or television programs, is increasingly generated to include some 3D information. Such information can be captured using dedicated 3D cameras that capture two simultaneous images from slightly offset camera positions.
However, in many applications, the provided images may not directly correspond to the desired directions, or more images may be required. For example, for autostereoscopic displays, more than two images are required, and indeed often 9-26 view images are used.
In order to generate images corresponding to different view directions, view point shifting processing may be employed. This is typically performed by a view shifting algorithm which uses an image for a single view direction together with associated depth information. However, in order to generate new view images without significant artefacts, the provided depth information must be sufficiently accurate. In particular, dense and accurate depth maps are required when rendering multi-view images for autostereoscopic displays.
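The view shifting described above can be illustrated by a minimal sketch. The idea is that each pixel's horizontal displacement (disparity) between views is inversely proportional to its depth, so a new view image can be formed by shifting pixels accordingly, painting from far to near so that nearer objects correctly occlude farther ones. The camera parameters (`baseline`, `focal_px`) and the specific warping scheme are hypothetical simplifications, not a definitive implementation:

```python
import numpy as np

def shift_view(image, depth, baseline=0.06, focal_px=1000.0):
    """Warp a single-view image to a nearby viewpoint using its depth map.

    image: (H, W, 3) array; depth: (H, W) array of depths in metres.
    baseline (metres) and focal_px (pixels) are hypothetical parameters.
    Returns the shifted image and a mask of filled pixels; unfilled
    pixels are disocclusions that would need hole filling.
    """
    h, w = depth.shape
    # Disparity in pixels is inversely proportional to depth.
    disparity = np.round(baseline * focal_px / np.maximum(depth, 1e-6)).astype(int)
    shifted = np.zeros_like(image)
    filled = np.zeros((h, w), dtype=bool)
    # Process pixels from farthest to nearest so that with last-write-wins
    # assignment, nearer pixels overwrite occluded farther ones.
    order = np.argsort(-depth, axis=None)
    ys, xs = np.unravel_index(order, depth.shape)
    xt = xs - disparity[ys, xs]
    valid = (xt >= 0) & (xt < w)
    shifted[ys[valid], xt[valid]] = image[ys[valid], xs[valid]]
    filled[ys[valid], xt[valid]] = True
    return shifted, filled
```

The unfilled regions of the mask correspond directly to the occlusion problem discussed below: image content revealed at the new viewpoint simply does not exist in the source view.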
Unfortunately, the depth information generated at sources tends to be suboptimal, and in many applications it is not as accurate as desired.
One way of capturing depth information when capturing a scene is to use multiple cameras at different spatial positions representing different view ranges. In such examples, depth information is generated by estimating and extracting depth values by comparing view images for different view directions.
In many applications, three dimensional scenes are captured as stereo images using two cameras at slightly different positions. Specific depth values may then be generated by estimating disparities between corresponding image objects in the two images. However, such depth extraction and estimation is problematic and tends to result in non-ideal depth values. This may again result in artefacts and a degraded three dimensional image quality.
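A minimal sketch of such disparity estimation is block matching: for each pixel in one rectified view, search along the same row of the other view for the horizontal offset that minimizes a matching cost, such as the sum of absolute differences (SAD). This deliberately simple version, with assumed parameters, omits the sub-pixel refinement, consistency checks, and smoothness constraints that practical systems require, and it illustrates why homogeneous regions give unreliable estimates (many offsets yield nearly equal cost):

```python
import numpy as np

def disparity_map(left, right, max_disp=16, block=5):
    """Estimate per-pixel disparity between two rectified grayscale
    stereo images by minimising the SAD cost over a square block.
    Convention: a left pixel at column x matches the right image at x - d."""
    h, w = left.shape
    r = block // 2
    pad_l = np.pad(left, r, mode='edge')
    pad_r = np.pad(right, r, mode='edge')
    disp = np.zeros((h, w), dtype=int)
    for y in range(h):
        for x in range(w):
            patch = pad_l[y:y + block, x:x + block]
            best_cost, best_d = np.inf, 0
            for d in range(min(max_disp, x) + 1):
                cand = pad_r[y:y + block, x - d:x - d + block]
                cost = np.abs(patch - cand).sum()
                if cost < best_cost:
                    best_cost, best_d = cost, d
            disp[y, x] = best_d
    return disp
```

Depth then follows from disparity via the camera geometry (depth is proportional to baseline times focal length divided by disparity); occluded pixels, visible in only one view, have no correct match and produce the unreliable estimates discussed below.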
Another approach for capturing depth information is to directly use depth cameras or range imaging cameras. Such cameras may directly estimate the depth to objects in the scene based on time-of-flight measurements for emitted (typically infrared) signals. However, such cameras are also associated with imperfections and typically provide suboptimal depth information.
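The underlying principle of a pulsed time-of-flight measurement is simple: the emitted infrared signal travels to the object and back, so the depth is half the round-trip path length. A minimal sketch (phase-based ToF cameras use a related but different computation):

```python
C = 299_792_458.0  # speed of light in vacuum, m/s

def tof_depth(round_trip_s):
    """Depth from a pulsed time-of-flight measurement.

    round_trip_s: measured round-trip time of the emitted pulse, in seconds.
    The pulse covers the camera-to-object distance twice, hence the factor 2.
    """
    return C * round_trip_s / 2.0
```

The timing resolution required is extreme (1 cm of depth corresponds to roughly 67 picoseconds of round-trip time), which is one reason low-reflectance surfaces with poor signal-to-noise ratio yield noisy depth estimates.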
Indeed, for both disparity estimation from a stereo camera setup and an infrared-based depth camera, certain areas are inherently hard to estimate. For example, for disparity estimation, occlusion areas exist that are visible in one camera view but not in the other, and this prevents accurate depth determination in such areas. Also, homogeneous areas that have the same or very similar visual properties in the different input images do not provide a suitable basis for disparity estimation. In such areas, disparity estimates based on matching will be very uncertain. For infrared depth cameras, distant objects will result in a low infrared reflectance, and thus a low signal-to-noise ratio of the depth estimates. Also, certain types of objects, such as hair, have a particular infrared scattering behavior that results in a low back-scatter and thus in poor depth estimates from a depth camera.
For both stereo camera systems and depth sensors, there are ways to detect which disparity or depth estimates are reliable and which are not. Areas for which reliable depth estimates cannot be generated are typically filled using a weighted average of surrounding depth values, where the color image is used as guidance in the interpolation/diffusion. However, such an approach may in many scenarios result in suboptimal depth estimates, which may further degrade image quality and depth perception for a three dimensional image generated using such depth information.
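The color-guided filling described above can be sketched as a joint-bilateral-style interpolation: each unreliable depth value is replaced by a weighted average of reliable neighbours, where the weights combine spatial proximity and color similarity so that depth does not diffuse across color edges. The parameter values and search radius here are hypothetical:

```python
import numpy as np

def fill_unreliable(depth, reliable, color, sigma_s=3.0, sigma_c=0.1, radius=4):
    """Fill unreliable depth values from reliable neighbours.

    depth: (H, W) depth map; reliable: (H, W) boolean mask;
    color: (H, W, C) guidance image with values in [0, 1].
    Weights fall off with spatial distance (sigma_s) and with colour
    difference (sigma_c), so filling respects object boundaries.
    """
    h, w = depth.shape
    out = depth.copy()
    for y, x in zip(*np.where(~reliable)):
        y0, y1 = max(0, y - radius), min(h, y + radius + 1)
        x0, x1 = max(0, x - radius), min(w, x + radius + 1)
        acc = wsum = 0.0
        for ny in range(y0, y1):
            for nx in range(x0, x1):
                if not reliable[ny, nx]:
                    continue
                ds2 = (ny - y) ** 2 + (nx - x) ** 2
                dc2 = float(np.sum((color[ny, nx] - color[y, x]) ** 2))
                wgt = np.exp(-ds2 / (2 * sigma_s ** 2) - dc2 / (2 * sigma_c ** 2))
                acc += wgt * depth[ny, nx]
                wsum += wgt
        if wsum > 0:
            out[y, x] = acc / wsum
    return out
```

The weakness noted above is visible in this sketch: where no nearby reliable neighbour shares the pixel's color, or where the color guidance itself is misleading, the weighted average produces a depth value with no real basis in the scene geometry.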
Hence, an improved approach for processing depth information would be advantageous and in particular an approach allowing increased flexibility, facilitated implementation, reduced complexity, improved depth information, an improved three dimensional experience and/or improved perceived image quality would be advantageous.