Three dimensional (3D) displays are receiving increasing interest, and significant research is being undertaken into how to provide three-dimensional perception to a viewer. 3D displays add a third dimension to the viewing experience by providing a viewer's two eyes with different views of the scene being watched. This can be achieved by having the user wear glasses to separate the two displayed views. However, as this is relatively inconvenient to the user, it is in many scenarios desirable to use autostereoscopic displays that directly generate different views and project them to the eyes of the user. Indeed, for some time, various companies have actively been developing autostereoscopic displays suitable for rendering three-dimensional imagery. Autostereoscopic devices can present viewers with a 3D impression without the need for special headgear and/or glasses.
Autostereoscopic displays generally provide different views for different viewing angles. In this manner, a first image can be generated for the left eye and a second image for the right eye of a viewer. By displaying appropriate images, i.e. appropriate from the viewpoint of the left and right eye respectively, it is possible to convey a 3D impression to the viewer.
Autostereoscopic displays tend to use means, such as lenticular lenses or parallax barriers/barrier masks, to separate views and to send them in different directions such that they individually reach the user's eyes. For stereo displays, two views are required, but most autostereoscopic displays typically utilize more views (e.g. nine views).
In order to fulfill the desire for 3D image effects, content is created to include data that describes 3D aspects of the captured scene. For example, for computer generated graphics, a three dimensional model can be developed and used to calculate the image from a given viewing position. Such an approach is for example frequently used for computer games that provide a 3D effect.
As another example, video content, such as films or television programs, is increasingly generated to include some 3D information. Such information can be captured using dedicated 3D cameras that capture two simultaneous images from slightly offset camera positions, thereby directly generating stereo images, or may e.g. be captured by cameras that are also capable of capturing depth.
Typically, autostereoscopic displays produce “cones” of views where each cone contains multiple views that correspond to different viewing angles of a scene. The viewing angle difference between adjacent (or in some cases further displaced) views is generated to correspond to the viewing angle difference between a user's right and left eye. Accordingly, a viewer whose left and right eye see two appropriate views will perceive a 3D effect. An example of such a system wherein nine different views are generated in a viewing cone is illustrated in FIG. 1.
Many autostereoscopic displays are capable of producing a large number of views. For example, autostereoscopic displays which produce nine views are not uncommon. Such displays are e.g. suitable for multi-viewer scenarios where several viewers can watch the display at the same time and all experience the 3D effect. Displays with an even higher number of views have also been developed, including for example displays that can provide 28 different views. Such displays may often use relatively narrow view cones, resulting in the viewer's eyes receiving light from a plurality of views simultaneously. Also, the left and right eyes will typically be positioned in views that are not adjacent (as in the example of FIG. 1).
FIG. 2 illustrates an example of the formation of a 3D pixel (with three color channels) from multiple sub-pixels. In the example, w is the horizontal sub-pixel pitch, h is the vertical sub-pixel pitch, N is the average number of sub-pixels per single-colored patch. The lenticular lens is slanted by s=tan θ, and the pitch measured in horizontal direction is p in units of sub-pixel pitch. Within the 3D pixel, thick lines indicate separation between patches of different colors and thin lines indicate separation between sub-pixels. Another useful quantity is the sub-pixel aspect ratio: a=w/h. Then N=a/s. For the common slant ⅙ lens on RGB-striped pattern, a=⅓ and s=⅙, so N=2.
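As a numerical check of the relations above, a short sketch (variable names follow the text; the concrete pitch values are only the RGB-striped example) evaluating N for the common slant-⅙ lens:

```python
# Sub-pixel geometry relations described above, evaluated for the
# RGB-striped example with a slant-1/6 lenticular lens.

w = 1.0        # horizontal sub-pixel pitch (arbitrary units)
h = 3.0        # vertical sub-pixel pitch; an RGB stripe gives w/h = 1/3

a = w / h      # sub-pixel aspect ratio, a = w/h
s = 1.0 / 6.0  # lens slant, s = tan(theta)

N = a / s      # average number of sub-pixels per single-colored patch

print(N)       # prints 2.0, matching N = 2 in the text
```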
As for conventional 2D displays, image quality is of the utmost importance for a 3D display in most applications, and is especially important for the consumer market, such as e.g. for 3D televisions or monitors. However, the representation of different views provides additional complications and potential image degradations.
Specifically, in order to keep the amount of data to distribute and process manageable, 3D content is typically provided in a format based on a very low number of 2D images. For example, 3D image data may be provided by a single 2D image corresponding to one viewing angle, supported by a depth map indicating a depth for each pixel. Another common representation provides two 2D images, with one being intended for the viewer's left eye and the other for the viewer's right eye.
Thus, the three dimensional image information is typically provided in a compressed format, and typically is represented by a relatively low number of different view point images. In order to provide view images for each individual view direction of the autostereoscopic display, it is accordingly necessary to apply substantial processing to the received data. Specifically, in order to generate the view images for the autostereoscopic display, it is typically necessary to perform rendering and also (3D) image processing. For example, view point shifting based on depth information is often necessary to generate additional views.
As another example, in some applications, 3D image data is generated directly by evaluating a 3D model of the scene. The model may for example conform to the OpenGL graphics standard and may contain triangles and/or meshes in combination with textures. Thus, in some applications, an image for a specific viewing angle may be generated by evaluating a 3D graphical model of a three-dimensional scene.
When using an autostereoscopic display for 3D presentation, a relatively large number of individual views corresponding to different viewing angles are projected, such as typically 9, 15 or 29 views. Accordingly, a large number of images corresponding to different viewing angles must be generated. This can be achieved by performing a 3D processing of the input 3D image data for each view. E.g., for each view, an input image corresponding to a default viewing angle is processed to generate the corresponding view for the desired viewing angle. This 3D processing in particular includes disparity shifting of pixels depending on their depth, filling in de-occluded areas etc.
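The disparity shifting and de-occlusion filling mentioned above can be sketched as follows. This is a simplified illustration only: the function name, the linear depth-to-disparity mapping (`shift_scale`) and the left-neighbour fill strategy are assumptions for the sketch, not the method of any particular system.

```python
import numpy as np

def render_view(image, depth, shift_scale):
    """Sketch of depth-dependent view shifting: each pixel is displaced
    horizontally by a disparity proportional to its depth, and
    de-occluded holes are filled from the nearest left neighbour.
    Real renderers use sub-pixel accuracy, depth-ordered occlusion
    handling and more elaborate hole filling."""
    h, w = depth.shape
    out = np.zeros_like(image)
    filled = np.zeros((h, w), dtype=bool)
    for y in range(h):
        for x in range(w):
            disparity = int(round(shift_scale * depth[y, x]))
            xt = x + disparity
            if 0 <= xt < w:
                out[y, xt] = image[y, x]  # later writes overwrite earlier ones
                filled[y, xt] = True
        # crude de-occlusion fill: repeat the nearest pixel to the left
        for x in range(1, w):
            if not filled[y, x]:
                out[y, x] = out[y, x - 1]
    return out
```

For example, a zero-depth map produces no shift, so the output equals the input image, while a uniform depth shifts the whole image sideways and fills the exposed column.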
Similarly, in some systems based on evaluating a 3D model, the model may be evaluated for each view to generate the image corresponding to that viewing angle.
Thus, in some systems, 3D processing is performed to generate an image for each view of the autostereoscopic display. However, a disadvantage of such an approach is that it is very computationally demanding and requires substantial computational resources. This may be particularly critical for 3D images that are part of e.g. a video sequence or game requiring real time processing.
In order to reduce the computational complexity, it has been proposed to only render a subset of the images that are required for the autostereoscopic display. For example, the 3D processing may generate only 8 images for a 15-view display. In such systems, the image for each view may be generated by selecting the rendered image that corresponds to the closest viewing angle to that of the view. In some systems, a given view image may be generated by a simple linear interpolation between e.g. the two rendered images surrounding the current view image (i.e. corresponding to the rendered images having the closest viewing angle in each direction).
Indeed, currently, when content is rendered for an autostereoscopic 3D display, the typical approach is to render a fixed number of images corresponding to fixed viewing angles. Subsequently, for each pixel of the autostereoscopic display the required output viewing angle is determined, and the pixel is then generated by selecting the corresponding pixel in the rendered image for the nearest viewing angle, or by a weighted summation of the pixels of the images with the nearest viewing angles.
However, although such an approach may reduce the overall computational resource usage, it also tends to introduce a number of disadvantages. In particular, the approach tends to reduce the perceived image quality and introduces a number of artefacts.
For example, if a relatively low number of images are rendered by the 3D processing, edges in the perceived 3D image will tend to exhibit a ghosting effect (e.g. multiple slightly displaced copies of the edge of an object will often be experienced). Also, if a user moves relative to the display such that the eyes of the viewer move through multiple views of the view cone, a relatively uneven experience will result where image objects may appear to jump or jitter in position as the viewer's eyes move between the views.
Therefore, in order to produce a high image quality, it is desirable that a large number of images are rendered. However, this increases complexity and resource use, and thus there is an inherent trade-off between quality and complexity which tends to be suboptimal in prior art systems.
Hence, an improved approach for generating view images would be advantageous, and, in particular, an approach allowing increased flexibility, improved image quality, reduced complexity, reduced resource demand, an improved trade-off between complexity and perceived image quality, and/or improved performance would be advantageous.