Three dimensional image rendering and processing has become increasingly prevalent in recent years. This is to a large extent due to the development of three dimensional (3D) displays, which add a third dimension to the viewing experience by providing a viewer's two eyes with different views of the scene being watched. This can be achieved by having the user wear glasses to separate two displayed views. However, as this may be considered inconvenient to the user, it is in many scenarios preferred to use autostereoscopic displays that use means at the display (such as lenticular lenses or barriers) to separate the views and send them in different directions, where they may individually reach the user's eyes. Stereo displays require two views, whereas autostereoscopic displays typically require more views (e.g. nine views).
The quality of the presented three dimensional image depends on the quality of the received image data, and specifically the three dimensional perception depends on the quality of the received depth information.
Three dimensional image information is often provided by a plurality of images corresponding to different view directions for the scene. Specifically, video content, such as films or television programs, is increasingly generated to include some 3D information. Such information can be captured using dedicated 3D cameras that capture two simultaneous images from slightly offset camera positions.
However, in many applications, the provided images may not directly correspond to the desired directions, or more images may be required. For example, for autostereoscopic displays, more than two images are required and indeed often 9-26 view images are used.
In order to generate images corresponding to different view directions, viewpoint shifting processing may be employed. This is typically performed by a view shifting algorithm that uses an image for a single view direction together with associated depth information. However, in order to generate new view images without significant artefacts, the provided depth information must be sufficiently accurate.
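As an illustration of why depth accuracy matters here, the following is a minimal sketch of a simple forward-warping view shift, not the specific algorithm of this document. It assumes the depth map has already been converted to integer per-pixel horizontal disparities (in practice disparity is proportional to inverse depth) and assumes positive disparity shifts a pixel to the right; the function name `shift_view` is hypothetical. An erroneous depth value moves a pixel to the wrong position, which is exactly the kind of artefact described above.

```python
def shift_view(image, disparity, hole=None):
    """Forward-warp an image to a horizontally shifted viewpoint.

    image and disparity are lists of equal-length rows; disparity holds
    integer pixel shifts (assumed already derived from the depth map).
    When two source pixels land on the same target, the one with the
    larger disparity (closer to the camera) is kept. Target pixels that
    receive nothing are set to `hole` (de-occluded areas).
    """
    height = len(image)
    width = len(image[0]) if height else 0
    out = [[hole] * width for _ in range(height)]
    # Disparity of the pixel written to each target so far (a simple z-buffer).
    winner = [[None] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            d = disparity[y][x]
            tx = x + d  # assumed convention: positive disparity moves right
            if 0 <= tx < width and (winner[y][tx] is None or d > winner[y][tx]):
                out[y][tx] = image[y][x]
                winner[y][tx] = d
    return out
```

For example, `shift_view([[1, 2, 3, 4]], [[0, 0, 1, 1]])` returns `[[1, 2, None, 3]]`: the two "near" pixels move right, one falls outside the image, and a hole opens at index 2. A practical view shifting algorithm would additionally fill such holes, which this sketch does not attempt.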
Unfortunately, in many applications and use scenarios, the depth information may not be as accurate as desired.
Whereas three dimensional imaging based on conventional two dimensional images may be possible using various depth estimation techniques, these tend to be complex and inaccurate and often require substantial human input. However, depth information is increasingly being captured along with the content itself. For example, when filming or video recording a scene, depth may also be recorded in order to generate a combined output reflecting both the visual image and the depth.
Such capturing of depth is typically performed using depth sensors arranged to estimate the depth characteristics of the scene. Various depth sensors are known.
A commonly used approach is to use a passive depth sensor in the form of a stereo camera. Such a stereo camera may simply record two images corresponding to two slightly different view directions. In this way, a three dimensional scene may be captured as stereo images using two cameras at slightly different positions. Specific depth values may then be generated by estimating disparities between corresponding image objects in the two images.
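The disparity-to-depth step can be sketched as follows, assuming a rectified stereo pair (corresponding points lie on the same scanline, and a point at column x in the left image appears at column x - d in the right image). Disparity is found here by sum-of-absolute-differences (SAD) block matching along one scanline, one common technique among many, and depth then follows from the standard pinhole relation Z = f·B/d, with focal length f in pixels and baseline B in metres. The function names and parameters are hypothetical and chosen only for illustration.

```python
def sad_disparity(left, right, x, half_window, max_disparity):
    """Disparity at column x of two rectified scanlines via SAD block matching.

    Compares a window around left[x] with windows around right[x - d] for
    d = 0..max_disparity and returns the d with the lowest cost.
    """
    best_d, best_cost = 0, float("inf")
    for d in range(max_disparity + 1):
        cost, valid = 0, True
        for k in range(-half_window, half_window + 1):
            xl, xr = x + k, x + k - d
            if not (0 <= xl < len(left) and 0 <= xr < len(right)):
                valid = False  # window falls outside an image; skip this d
                break
            cost += abs(left[xl] - right[xr])
        if valid and cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth Z = f * B / d for a rectified pair; zero disparity means infinitely far."""
    if disparity_px <= 0:
        return float("inf")
    return focal_px * baseline_m / disparity_px
```

With `left = [0, 0, 0, 10, 20, 10, 0, 0, 0, 0]` and `right = left[2:] + [0, 0]`, `sad_disparity(left, right, 4, 1, 4)` returns 2, and with an assumed 1000-pixel focal length and 0.12 m baseline that disparity corresponds to a depth of 60 m. The inverse relation also shows why stereo depth degrades for distant objects: a one-pixel disparity error there changes the depth estimate substantially.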
Another approach is to use an active depth sensor. Specifically, active depth sensors are known which include infrared light emitters that project an infrared light pattern onto the scene being recorded. An infrared camera may then capture an infrared image and detect distortions in the expected pattern. Based on these distortions, the depth sensor may generate depth information.
In yet another example, an active depth sensor may comprise a light emitter emitting infrared light in different directions. The time of arrival for reflected light in the different directions may be detected and used to derive depth information.
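For the time-of-arrival approach, the underlying relation is that the light travels to the surface and back, so the distance is half the round-trip time multiplied by the speed of light. The sketch below assumes the round-trip time has already been measured directly for each direction; the function names are hypothetical, and many practical sensors instead infer the delay from the phase shift of modulated light.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def tof_depth(round_trip_s):
    """Distance to the reflecting surface from a measured round-trip time."""
    # The light covers the sensor-to-surface distance twice.
    return SPEED_OF_LIGHT_M_PER_S * round_trip_s / 2.0


def tof_depth_map(round_trip_times):
    """Apply tof_depth to a grid of per-direction round-trip times (list of rows)."""
    return [[tof_depth(t) for t in row] for row in round_trip_times]
```

A round trip of 20 ns corresponds to roughly 3 m, which also indicates the timing precision involved: a 1 ns timing error already shifts the depth by about 15 cm.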
However, although such depth sensors may often improve the quality of the generated depth information compared to estimation based on a single two-dimensional image, they also tend to have suboptimal performance. Typically, the generated depth information is not optimal in all scenarios, and e.g. generated depth maps may comprise some inaccurate or erroneous depth values. This may again result in artefacts and degraded three dimensional image quality when image processing or rendering is performed based on this depth information.
Further, improving the depth estimates may often necessitate the use of dedicated, specifically modified depth sensors. However, this is inflexible and increases cost in comparison to the use of standard or off-the-shelf depth sensors.
Accordingly, an improved approach for determining suitable depth information would be advantageous, and in particular an approach allowing increased flexibility, facilitated implementation, reduced complexity, an improved 3D experience, improved perceived image quality, improved suitability for the use of standard functionality, and/or the generation of improved depth information would be advantageous.