With the development of autostereoscopic displays, rendering three-dimensional imagery without the need for special headgear or glasses has become a reality. Autostereoscopic displays can already be built using different technologies, including for example flat-panel displays fitted with so-called lenticular lens foils and flat-panel displays fitted with so-called parallax barriers.
Autostereoscopic displays can generate different views for different viewing angles. Typically an autostereoscopic display is arranged to generate a left image for the left eye of a viewer and a right image for the right eye of the viewer. By displaying appropriate images, i.e. images appropriate to the viewpoints of the left and right eye respectively, it is possible to convey an impression of a three-dimensional representation to the viewer.
A variety of techniques can be used to create and/or generate images for such autostereoscopic displays. For example, multiple images can be recorded using multiple cameras, each positioned at the viewpoint of a respective view of the autostereoscopic display. Alternatively, images may be computer-generated for each of the respective views using a three-dimensional computer model.
However, in order to maintain backwards compatibility and improve bandwidth usage, it is preferred to use an input format for an autostereoscopic display that comprises an image sequence, similar to a conventional image sequence, and an accompanying depth map sequence. The autostereoscopic display then generates the required views using images from the image sequence and the corresponding depth maps.
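The view generation step can be illustrated with a minimal sketch of forward warping for a single image row. It assumes, hypothetically, that the depth map has already been converted into integer horizontal pixel shifts for the target view, with larger shifts belonging to nearer objects. The function name `warp_row` is illustrative only, and a practical renderer would additionally fill the holes left where background becomes visible.

```python
from typing import List, Optional


def warp_row(row: List[int], shift: List[int]) -> List[Optional[int]]:
    """Forward-warp one image row into a new viewpoint.

    shift[x] is an assumed horizontal displacement (in pixels) of source pixel x
    in the target view; larger shifts are taken to belong to nearer objects.
    Target positions that receive no source pixel stay None (disocclusion holes).
    """
    width = len(row)
    out: List[Optional[int]] = [None] * width
    # Shift of the pixel currently written at each target position.
    written_shift: List[Optional[int]] = [None] * width
    for x, (value, s) in enumerate(zip(row, shift)):
        tx = x + s
        if 0 <= tx < width:
            # If two source pixels land on the same target, keep the nearer one.
            if written_shift[tx] is None or s > written_shift[tx]:
                out[tx] = value
                written_shift[tx] = s
    return out


# Two pixels move one position to the right, leaving a hole at index 2.
print(warp_row([10, 20, 30, 40, 50], [0, 0, 1, 1, 0]))
```

Here the pixel with value 50 (shift 0) is occluded by the nearer pixel with value 40 (shift 1), which lands on the same target position.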
A depth map typically comprises multiple values indicative of the distance between the object(s) depicted in the image and the (possibly virtual) camera position. A depth map can be pixel-based, i.e. values in the depth map indicate the depth of individual pixels in the image. Alternatively, depth information can be object-based, i.e. the depth values are indicative of the distance of groups of pixels to the camera.
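The difference between pixel-based and object-based depth can be illustrated with a small sketch that converts between the two representations. It assumes a hypothetical per-pixel label map assigning each pixel to a group (object); the helper names are illustrative, and averaging is only one possible way of assigning a single depth to a group of pixels.

```python
from typing import Dict, List

LabelMap = List[List[int]]
DepthMap = List[List[float]]


def expand_object_depth(labels: LabelMap, object_depth: Dict[int, float]) -> DepthMap:
    """Object-based -> pixel-based: every pixel takes the depth of its group."""
    return [[object_depth[label] for label in row] for row in labels]


def object_depth_from_pixels(labels: LabelMap, depth: DepthMap) -> Dict[int, float]:
    """Pixel-based -> object-based: one depth per group, here the mean depth."""
    sums: Dict[int, float] = {}
    counts: Dict[int, int] = {}
    for label_row, depth_row in zip(labels, depth):
        for label, d in zip(label_row, depth_row):
            sums[label] = sums.get(label, 0.0) + d
            counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


labels = [[0, 0, 1], [0, 1, 1]]
print(expand_object_depth(labels, {0: 5.0, 1: 2.0}))
```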
It is known to derive depth maps from stereo images obtained using e.g. a stereoscopic camera. Here the depth of a point generally refers to the distance between the object represented in a (group of) pixel(s) of the image and the plane through the camera position perpendicular to the optical axis of the camera. Differences in viewpoint between the views of a stereoscopic camera can be used to establish depth information: the depth value of a (group of) pixel(s) in one image can be determined from the amount of translation of the position of a corresponding (group of) pixel(s) in the other image.
In fact, when the image is obtained by point projection, the translation is proportional to the displacement of the camera and inversely proportional to the depth of the (group of) pixel(s). Using both views of a stereoscopic camera, a so-called disparity map can be generated that indicates the translation or displacement of a (group of) pixel(s) between the two images. Because of the aforementioned relationship, such translation, displacement or disparity data is in fact depth-related information. Throughout the text, depth-related information is to be understood to comprise both depth information and disparity information.
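For the common special case of two parallel, rectified cameras (an assumption not stated above), the relationship takes the form d = f * B / Z, where d is the disparity in pixels, f the focal length expressed in pixels, B the baseline (the displacement between the cameras) and Z the depth. A minimal sketch of the conversion in both directions, with illustrative function names:

```python
def depth_to_disparity(depth: float, focal_px: float, baseline: float) -> float:
    """Disparity (pixels) for a point at the given depth: d = f * B / Z.

    Assumes a parallel, rectified camera pair; depth and baseline share a unit.
    """
    if depth <= 0:
        raise ValueError("depth must be positive")
    return focal_px * baseline / depth


def disparity_to_depth(disparity: float, focal_px: float, baseline: float) -> float:
    """Inverse relation: Z = f * B / d (zero disparity means infinitely far)."""
    if disparity <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline / disparity


# f = 1000 px, B = 0.1 m: a point 2 m away shifts by about 50 px between views.
print(depth_to_disparity(2.0, 1000.0, 0.1))
```

Doubling the baseline doubles the disparity, while doubling the depth halves it, which is exactly the proportionality described above.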
To convert stereo images into disparity data, a window-based matching approach can be applied to establish a measure of translation. Accordingly, the pixel values in a window around a pixel in a first image, taken with the camera in a first orientation, are compared to the pixel values in a window around a pixel in a second image, taken with the camera in a second orientation. Matching here typically involves determining an aggregate of differences, for example a sum of absolute differences, between the pixel values of the pixels in the matching windows.
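A minimal sketch of window-based matching, assuming a rectified grey-scale image pair stored as nested lists, so that corresponding pixels lie on the same row and the search only needs to run horizontally. The sum of absolute differences (SAD) is used as the aggregate of differences; other aggregates, such as the sum of squared differences, work the same way. Function names and parameters are illustrative, and the caller is assumed to keep the window inside the left image.

```python
from typing import List

Image = List[List[int]]


def sad(left: Image, right: Image, y: int, xl: int, xr: int, r: int) -> int:
    """Sum of absolute differences between (2r+1) x (2r+1) windows centred at
    (xl, y) in the left image and (xr, y) in the right image."""
    total = 0
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            total += abs(left[y + dy][xl + dx] - right[y + dy][xr + dx])
    return total


def match_disparity(left: Image, right: Image, y: int, x: int, r: int, max_disp: int) -> int:
    """Return the disparity d in [0, max_disp] with the lowest SAD cost.

    The window around (x, y) in the left image is compared with the window
    around (x - d, y) in the right image (rectified pair assumed).
    """
    best_d, best_cost = 0, None
    for d in range(max_disp + 1):
        if x - d - r < 0:
            break  # the candidate window would leave the right image
        cost = sad(left, right, y, x, x - d, r)
        if best_cost is None or cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


# Synthetic pair: the left image is the right image shifted 2 pixels to the right.
right = [[(x * 7 + y * 3) % 11 for x in range(10)] for y in range(5)]
left = [[right[y][x - 2] if x >= 2 else 0 for x in range(10)] for y in range(5)]
print(match_disparity(left, right, y=2, x=6, r=1, max_disp=4))
```

Repeating `match_disparity` for every pixel yields a disparity map; the window radius trades robustness to noise against blurring at depth discontinuities.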
A method for determining a depth-map using a stereoscopic image pair is disclosed in “Depth Estimation from Stereoscopic Image Pairs Assuming Piecewise Continuous Surfaces”, by L. Falkenhagen, published in Proc. of European Workshop on combined Real and Synthetic Image Processing for Broadcast and Video Production, Hamburg, November 1994.
When an image sequence and associated first depth-related information are used as input for an autostereoscopic display, multiple views have to be generated either by the autostereoscopic display or by a device providing input to the autostereoscopic display. Three-dimensional display technologies, however, tend to have technology-specific requirements. For example, the maximum allowable translation, i.e. disparity, of pixels on an autostereoscopic display is substantially more limited than that for shutter glasses. This is because the amount of crosstalk between the respective views of an autostereoscopic display is substantially higher than between the respective views of shutter glasses. As a result, there is a need to provide depth-related information in a manner that can accommodate such technology-specific requirements.