Three-dimensional (3D) displays add a third dimension to the viewing experience by providing a viewer's two eyes with different views of the scene being watched. This can be achieved by having the user wear glasses to separate two views that are displayed. However, as this may be considered inconvenient to the user, it is in many scenarios preferred to use autostereoscopic displays that use means at the display (such as lenticular lenses or barriers) to separate views and to send them in different directions where they individually may reach the user's eyes. For stereo displays, two views are required, whereas autostereoscopic displays typically require more views (e.g. nine views).
The quality of the presented three dimensional image depends on the quality of the received image data, and specifically the three dimensional perception depends on the quality of the received depth information. However, in many practical applications and scenarios the provided depth information tends to be suboptimal.
For example, in many embodiments it may be desirable to generate view images for new viewing directions. Whereas various algorithms are known for generating such new view images based on an image and depth information, they tend to be highly dependent on the accuracy of the provided (or derived) depth information.
Indeed, three dimensional image information is often provided by a plurality of images corresponding to different view directions for a scene. Specifically, video content, such as films or television programs, is increasingly generated to include some 3D information. Such information can be captured using dedicated 3D cameras that capture two simultaneous images from slightly offset camera positions.
However, in many applications, the provided images may not directly correspond to the desired directions, or more images may be required. For example, for autostereoscopic displays, more than two images are required, and indeed often 9-26 view images are used.
In order to generate images corresponding to different view directions, view point shifting processing may be employed. This is typically performed by a view shifting algorithm which uses an image for a single view direction together with associated depth information. However, in order to generate new view images without significant artefacts, the provided depth information must be sufficiently accurate.
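The dependence of view shifting on depth accuracy can be illustrated with a minimal sketch. The function below is not taken from any particular view shifting algorithm; it is a simplified, hypothetical one-dimensional forward warp that assumes a larger disparity means a closer object, uses an assumed `gain` parameter to select the new view position, and fills de-occluded holes by naively copying the nearest pixel to the left.

```python
def shift_view_row(row, disparity, gain=1.0):
    """Forward-warp one image row to a new viewpoint (illustrative only).

    row:       list of pixel values for a single view direction
    disparity: per-pixel disparities (assumed: larger = closer to camera)
    gain:      hypothetical scale factor selecting the new view position
    """
    width = len(row)
    out = [None] * width
    out_disp = [float("-inf")] * width

    # Move every pixel horizontally by its scaled disparity. When two
    # pixels land on the same target, keep the closer one (occlusion).
    for x in range(width):
        target = x + int(round(gain * disparity[x]))
        if 0 <= target < width and disparity[x] > out_disp[target]:
            out[target] = row[x]
            out_disp[target] = disparity[x]

    # Naive hole filling: copy the nearest filled pixel to the left ...
    last = None
    for x in range(width):
        if out[x] is None:
            out[x] = last
        else:
            last = out[x]
    # ... and, for holes at the start of the row, the nearest to the right.
    nxt = None
    for x in range(width - 1, -1, -1):
        if out[x] is None:
            out[x] = nxt
        else:
            nxt = out[x]
    return out


row = [10, 10, 10, 50, 50, 10, 10, 10]   # bright foreground object on dark background
disparity = [0, 0, 0, 2, 2, 0, 0, 0]     # foreground shifted by two pixels
print(shift_view_row(row, disparity))    # [10, 10, 10, 10, 10, 50, 50, 10]
```

In this sketch every output pixel position is determined directly by a disparity value, so any error in the depth information displaces image content in the generated view.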
Unfortunately, in many applications and use scenarios, the depth information may not be as accurate as desired. Indeed, in many scenarios depth information is generated by estimating and extracting depth values by comparing view images for different view directions.
In many applications, three dimensional scenes are captured as stereo images using two cameras at slightly different positions. Specific depth values may then be generated by estimating disparities between corresponding image objects in the two images. However, such depth extraction and estimation is problematic and tends to result in non-ideal depth values. This may again result in artefacts and a degraded three dimensional image quality.
Three dimensional image degradation and artefacts tend to be particularly significant for transitions between different image objects. Further, determination of depth information based on disparity estimation for associated images is also typically related to consideration of characteristics of image objects. Typically, disparity estimation algorithms search for correspondences between a left and a right image by locally comparing color differences between a point in the left image and its candidate corresponding point in the right image.
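A local correspondence search of this kind may be sketched as follows. This is only an assumed, simplified variant: it works on a single row of grey values, uses a sum of absolute differences (SAD) over a small window as the matching cost, and assumes that a point at position x in the left image appears at position x - d in the right image. The names `match_row`, `max_disp` and `radius` are hypothetical.

```python
def match_row(left, right, max_disp=4, radius=1):
    """Estimate a disparity per pixel for one row by local SAD matching."""
    width = len(left)
    disp = [0] * width
    for x in range(width):
        best_cost, best_d = None, 0
        for d in range(max_disp + 1):
            # Compare a small window around x in the left row with the
            # window around x - d in the right row (indices clamped).
            cost = 0
            for k in range(-radius, radius + 1):
                xl = min(max(x + k, 0), width - 1)
                xr = min(max(x + k - d, 0), width - 1)
                cost += abs(left[xl] - right[xr])
            # Keep the first candidate with the lowest cost.
            if best_cost is None or cost < best_cost:
                best_cost, best_d = cost, d
        disp[x] = best_d
    return disp


left = [0, 0, 0, 0, 9, 5, 7, 0, 0, 0]
right = [0, 0, 9, 5, 7, 0, 0, 0, 0, 0]   # same pattern, two pixels to the left
print(match_row(left, right)[4:7])       # [2, 2, 2]
```

In the uniform regions of this example all candidate disparities have the same cost, so the sketch simply returns the first candidate; such ambiguity is one reason why locally estimated disparities tend to be unreliable.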
However, the depth map resulting from such disparity estimation is typically relatively inaccurate, and post-filtering is therefore applied to improve it. The post-filtering may specifically be a bilateral color and/or luminance adaptive filter wherein the filtering kernel is adapted to reflect the visual properties of the image. Such a bilateral filter may result in the depth map being adapted to more closely follow the characteristics of the image, and it may result in improved consistency and temporal stability of the estimated disparities, or may e.g. provide a sharper depth transition between different image objects.
FIG. 1 illustrates an example of a typical processing flow that may be used to produce a disparity map. A left-eye and a right-eye image are input to a disparity estimation block 101 which outputs a disparity map that is typically at block resolution (e.g. 4×4, 8×8 or 16×16 pixels). One of the original images is then used in a bilateral filter 103 to filter this disparity map and produce a bilaterally filtered depth map. After filtering, the disparity map is modified at pixel resolution. The filter may typically force pixels that are spatially close-by and which have the same color to have the same disparity.
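The filtering step can be illustrated by a minimal one-dimensional sketch of an image-guided (joint) bilateral filter. It is not the filter 103 itself but an assumed textbook form: each output disparity is a weighted average of neighbouring disparities, where the weight falls off with spatial distance and with the intensity difference in the guiding image. The parameter names `radius`, `sigma_s` and `sigma_r` are hypothetical.

```python
import math


def joint_bilateral_row(disparity, guide, radius=2, sigma_s=2.0, sigma_r=10.0):
    """Filter one row of disparities, guided by one row of image intensities."""
    width = len(disparity)
    out = []
    for x in range(width):
        num = 0.0
        den = 0.0
        for k in range(max(0, x - radius), min(width, x + radius + 1)):
            # Spatial weight: nearby pixels count more.
            w_s = math.exp(-((k - x) ** 2) / (2 * sigma_s ** 2))
            # Range weight: pixels with a similar intensity count more.
            w_r = math.exp(-((guide[k] - guide[x]) ** 2) / (2 * sigma_r ** 2))
            num += w_s * w_r * disparity[k]
            den += w_s * w_r
        out.append(num / den)
    return out


guide = [10, 10, 10, 10, 200, 200, 200, 200]   # intensity edge between index 3 and 4
coarse = [0, 0, 0, 0, 0, 0, 4, 4]              # disparity edge misaligned with it
print([round(v, 2) for v in joint_bilateral_row(coarse, guide)])
```

In this example the dark pixels keep a disparity of zero, while the bright pixels at indices 4 and 5 are pulled toward the disparity of the other bright pixels, so the disparity transition moves toward the intensity edge. A single pass with this small kernel moves it only part of the way.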
Due to color/luminance similarity of objects on both sides of long object boundaries, the bilateral color/luminance-adaptive filters may cause disparity errors or artefacts close to such boundaries. As a consequence, close to an object boundary, the distance to objects in the background may be underestimated whereas the distance to objects in the foreground may be overestimated. When using the obtained disparity map for view generation, e.g. for auto-stereoscopic viewing, the boundaries may become distorted. Human observers tend to be very sensitive to such distortions and artefacts.
The bilateral filter 103 of FIG. 1 works very well for boundaries between two objects which each have fairly constant visual properties that differ strongly between the two objects. For example, it may work extremely well for two objects both having a uniform color (with no large intensity fluctuations within the object) but with a large difference in intensity.
However, for other objects, such as specifically objects that have a high degree of texture, a bilateral filter will be much less effective and indeed may introduce artefacts. Specifically, for textured objects the disparity of the foreground may often leak into the background and vice versa. This artefact is most visible on object boundaries, since human observers know that the shape of an object's boundary as projected in the image plane will only exhibit small changes in geometry with a small shift in camera position. This effect is illustrated in FIG. 2. As illustrated, a smooth object boundary of the perspective projection of a 3D object is unlikely to show high-frequency changes in geometry from a left-eye image IL to a right-eye image IR, i.e. the scenario of FIG. 2a is likely to reflect a real scene whereas the irregularities of FIG. 2b (which may be due to textured objects) are much less likely to reflect the scene.
Hence, an improved approach for determining suitable depth information would be advantageous, in particular an approach allowing increased flexibility, facilitated implementation, reduced complexity, improved depth information, reduced sensitivity to visual variations such as texture, an improved 3D experience and/or improved perceived image quality.