Three-dimensional (3D) displays add a third dimension to the viewing experience by providing a viewer's two eyes with different views of the scene being watched. This can be achieved by having the user wear glasses to separate two views that are displayed. However, as this may be considered inconvenient to the user, it is in many scenarios preferred to use autostereoscopic displays that use means at the display (such as lenticular lenses or barriers) to separate views and to send them in different directions where they individually may reach the user's eyes. Stereo displays require two views, whereas autostereoscopic displays typically require more views (e.g. nine views).
However, practical displays tend not to have ideal performance and are typically unable to present perfect three-dimensional images.
For example, lenticular-based autostereoscopic 3D displays tend to suffer from out-of-screen blur. This effect is similar to what is known as depth-of-field blur in camera systems.
Also, the quality of the presented three-dimensional image depends on the quality of the received image data, and specifically the three-dimensional perception depends on the quality of the received depth information.
Three-dimensional image information is often provided by a plurality of images corresponding to different view directions for the scene. Specifically, video content, such as films or television programs, is increasingly generated to include some 3D information. Such information can be captured using dedicated 3D cameras that capture two simultaneous images from slightly offset camera positions.
However, in many applications, the provided images may not directly correspond to the desired directions, or more images may be required. For example, for autostereoscopic displays, more than two images are required and indeed often 9-26 view images are used.
In order to generate images corresponding to different view directions, viewpoint shifting may be employed. This is typically performed by a view shifting algorithm which uses an image for a single view direction together with associated depth information. However, in order to generate new view images without significant artefacts, the provided depth information must be sufficiently accurate.
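As an illustration only, the principle of view shifting from a single image with associated depth information can be sketched as a simple forward warp of one image row. The function name `shift_row`, the `view_offset` scale factor and the convention that a larger disparity means a nearer object are assumptions made for this sketch and do not describe any particular view shifting algorithm.

```python
def shift_row(pixels, disparities, view_offset):
    """Forward-warp one image row to a new view position (illustrative sketch).

    pixels      : list of pixel values for the source view
    disparities : per-pixel disparity in pixels (assumed: larger = nearer)
    view_offset : hypothetical scale factor; 0.0 reproduces the source view
    Returns a row of the same width; None marks holes (de-occluded areas).
    """
    width = len(pixels)
    out = [None] * width
    # Disparity of the pixel currently written at each target position,
    # used so that nearer pixels overwrite farther ones.
    out_disp = [float("-inf")] * width
    for x, (value, d) in enumerate(zip(pixels, disparities)):
        target = x + int(round(view_offset * d))
        if 0 <= target < width and d > out_disp[target]:
            out[target] = value
            out_disp[target] = d
    return out


row = [10, 20, 30, 40, 50]
disp = [0, 0, 2, 2, 0]
shifted = shift_row(row, disp, 1.0)
```

In this sketch, an error in a disparity value moves the corresponding pixel to the wrong position in the new view, and positions that receive no pixel remain as holes. This is one way in which inaccurate depth information can become visible as artefacts.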
Unfortunately, in many applications and use scenarios, the depth information may not be as accurate as desired. Indeed, in many scenarios depth information is generated by estimating and extracting depth values by comparing view images for different view directions.
In many applications, three-dimensional scenes are captured as stereo images using two cameras at slightly different positions. Specific depth values may then be generated by estimating disparities between corresponding image objects in the two images. However, such depth extraction and estimation is problematic and tends to result in non-ideal depth values. This may again result in artefacts and a degraded three-dimensional image quality.
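Disparity estimation between two views may, purely by way of example, be sketched as one-dimensional block matching using a sum of absolute differences. The function name `estimate_disparity_row`, the window size, the search range and the assumption that an object at position x in the left image appears at position x - d in the right image are all choices made for this sketch rather than features of any specific estimator.

```python
def estimate_disparity_row(left, right, max_disparity, half_window=1):
    """Estimate per-pixel disparity for one row by block matching (sketch).

    left, right   : lists of intensity values for the same row of two views
    max_disparity : largest candidate disparity to test (hypothetical limit)
    half_window   : half-width of the matching window in pixels
    Assumes a point at x in the left row appears at x - d in the right row.
    """
    width = len(left)
    result = []
    for x in range(width):
        best_d, best_cost = 0, float("inf")
        for d in range(max_disparity + 1):
            cost = 0
            for k in range(-half_window, half_window + 1):
                # Clamp indices so the window stays inside the row.
                xl = min(max(x + k, 0), width - 1)
                xr = min(max(x + k - d, 0), width - 1)
                cost += abs(left[xl] - right[xr])
            if cost < best_cost:
                best_d, best_cost = d, cost
        result.append(best_d)
    return result


# The left row is the right row displaced by two pixels (first two values arbitrary).
right_row = [3, 7, 1, 9, 4, 8, 2, 6, 5, 0]
left_row = [11, 13, 3, 7, 1, 9, 4, 8, 2, 6]
disparities = estimate_disparity_row(left_row, right_row, max_disparity=3)
```

Such matching relies on local image texture, so estimates near the row borders, in uniform areas or around occlusions may be unreliable, which is consistent with the non-ideal depth values noted above.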
Three-dimensional image degradation and artefacts tend to be particularly significant for text image objects, such as subtitle blocks. Rather than being part of the scene, text image objects tend to be isolated objects that are not perceived as being integrated or embedded in the scene. Further, depth variations for text image objects tend to be more perceptible to the viewer. Also, in a typical application, text (especially subtitles) is expected to be sharp and in focus with well-defined edges. Accordingly, it is particularly important to present text image objects, such as subtitle blocks, with a high image quality.
Hence, an improved approach for determining suitable depth information for text image objects would be advantageous, in particular an approach allowing increased flexibility, facilitated implementation, reduced complexity, an improved 3D experience and/or improved perceived image quality.