In recent years, growing interest in three-dimensional (3D) perception of images and video content has led to the introduction of 3D displays that create a 3D effect by presenting different views to the two eyes of a viewer. Such displays include time sequential stereoscopic displays, which project images to the right and left eyes in a time sequential fashion. The viewer wears glasses comprising LCD elements that alternately block the light to the left and right eye, thereby ensuring that each eye sees only the image intended for that eye. Another type of display is an autostereoscopic display, which does not require the viewer to wear glasses. Such a display typically renders a relatively large number of images in different view cones. For example, autostereoscopic displays may typically implement nine different view cones, each of which corresponds to a different set of viewpoints. Such displays thus present nine different images simultaneously.
As another example, a 3D effect may be achieved from a conventional two-dimensional display implementing a motion parallax function. Such displays track the movement of the user and adapt the presented image accordingly. In a 3D environment, a movement of the viewer's head shifts the apparent position of close objects by a relatively large amount, whereas objects further back move progressively less, and objects at an infinite depth do not move at all. Therefore, by moving the different image objects on the two-dimensional display relative to each other based on the viewer's head movement, a perceptible 3D effect can be achieved.
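The depth dependence of motion parallax described above can be illustrated with a minimal sketch. The function name `parallax_shift`, the `gain` parameter, the inverse-proportional model, and the sign convention (objects shift opposite to the head movement) are assumptions made for illustration only; they are not taken from any particular display system.

```python
import math


def parallax_shift(head_dx, depth, gain=1.0):
    """Return a hypothetical on-screen horizontal shift for one image object.

    Assumed model: the shift is proportional to the head displacement and
    inversely proportional to the object's depth, so close objects move a
    lot, distant objects move little, and objects at infinite depth do not
    move. The negative sign (shift opposite to the head) is a convention
    chosen for this sketch.
    """
    if depth <= 0:
        raise ValueError("depth must be positive")
    return -gain * head_dx / depth


# The same head movement applied to objects at increasing depth.
for depth in (1.0, 4.0, math.inf):
    print(depth, parallax_shift(2.0, depth))
```

In this simplified model the magnitude of the shift falls off with depth, which is the behaviour the paragraph above relies on; a real system would also account for the screen distance and the tracking geometry.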
In order to fulfill the desire for 3D image effects, content is created to include data that describes 3D aspects of the captured scene. For example, for computer generated graphics, a three dimensional model can be developed and used to calculate the image from a given viewing position. Such an approach is for example frequently used for computer games which provide a three dimensional effect.
As another example, video content, such as films or television programs, is increasingly generated to include some 3D information. Such information can be captured using dedicated 3D cameras that capture two simultaneous images from slightly offset camera positions. In some cases, more simultaneous images may be captured from further offset positions. For example, nine cameras offset relative to each other could be used to generate images corresponding to the nine viewpoints of a nine view cone autostereoscopic display.
However, a significant problem is that the additional information results in substantially increased amounts of data, which is impractical for the distribution, communication, processing and storage of the video data. Accordingly, the efficient encoding of 3D information is critical. Therefore, efficient 3D image and video encoding formats have been developed which may reduce the required data rate substantially.
One such encoding format encodes a left eye image and a right eye image for a given viewer position. The coding efficiency may be increased by encoding the two images relative to each other. For example, inter-image prediction may be used, or one image may simply be encoded as the difference from the other image.
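The simplest variant mentioned above, encoding one image as the difference from the other, can be sketched as follows. The helper names `encode_pair` and `decode_pair` and the representation of an image as a list of rows of integer sample values are assumptions for illustration; practical codecs would typically use inter-image prediction and entropy coding rather than a raw difference.

```python
def encode_pair(left, right):
    """Keep the left image as-is and store the right image as a residual.

    Images are assumed to be equally sized lists of rows of integers.
    """
    residual = [
        [r - l for l, r in zip(left_row, right_row)]
        for left_row, right_row in zip(left, right)
    ]
    return left, residual


def decode_pair(left, residual):
    """Reconstruct the right image by adding the residual to the left image."""
    right = [
        [l + d for l, d in zip(left_row, res_row)]
        for left_row, res_row in zip(left, residual)
    ]
    return left, right


# Two tiny 2x4 "images" that differ only where an object is displaced.
left_image = [[10, 10, 50, 50], [10, 10, 50, 50]]
right_image = [[10, 50, 50, 10], [10, 50, 50, 10]]

base, diff = encode_pair(left_image, right_image)
_, restored = decode_pair(base, diff)
print(diff)
print(restored == right_image)
```

Where the two views agree the residual is zero, which is what makes the difference image cheaper to compress than a second full image.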
Another encoding format provides one or two images together with depth information that indicates a depth of the relevant image objects. This encoding may further be supplemented by occlusion information that provides information on image objects which are occluded by other image elements further in the foreground.
The encoding formats allow a high quality rendering of the directly encoded images, i.e. they allow high quality rendering of images corresponding to the viewpoint for which the image data is encoded. The encoding formats furthermore allow an image processing unit to generate images for viewpoints that are displaced relative to the viewpoint of the captured images. To this end, image objects may be shifted in the image (or images) based on depth information provided with the image data. Further, areas not represented by the image may be filled in using occlusion information if such information is available.
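The depth-based shifting and occlusion filling described above can be sketched for a single scanline. The function names `shift_scanline` and `fill_holes`, the disparity model `gain * view_offset / depth`, and the use of `None` to mark unrepresented pixels are assumptions for illustration, not a description of any specific image processing unit.

```python
def shift_scanline(colors, depths, view_offset, gain=1.0):
    """Shift each pixel of one scanline according to its depth.

    Assumed model: disparity = round(gain * view_offset / depth), so near
    pixels move further than distant ones. Where two pixels land on the
    same position the nearer one wins. Positions that receive no pixel
    stay None; these are the de-occluded areas.
    """
    width = len(colors)
    out = [None] * width
    out_depth = [None] * width
    for x, (color, depth) in enumerate(zip(colors, depths)):
        if color is None:
            continue  # nothing encoded at this position
        target = x + round(gain * view_offset / depth)
        if 0 <= target < width:
            if out[target] is None or depth < out_depth[target]:
                out[target] = color
                out_depth[target] = depth
    return out


def fill_holes(primary, occlusion):
    """Fill unrepresented pixels from an (already shifted) occlusion layer."""
    return [p if p is not None else o for p, o in zip(primary, occlusion)]


# A foreground object "F" (depth 1) in front of a background "B" (depth 10).
colors = ["B", "B", "F", "F", "B", "B"]
depths = [10, 10, 1, 1, 10, 10]
# Occlusion layer: background content "O" hidden behind the foreground.
occ_colors = [None, None, "O", "O", None, None]
occ_depths = [10] * 6

shifted = shift_scanline(colors, depths, view_offset=2)
shifted_occ = shift_scanline(occ_colors, occ_depths, view_offset=2)
print(shifted)
print(fill_holes(shifted, shifted_occ))
```

In this example the foreground object moves by two pixels, the background stays in place, and the two positions uncovered by the shift remain empty unless occlusion data is available to fill them.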
Thus, based on the received data, an image processing unit may generate images for other viewpoints. For example, an image processing unit may generate views to represent motion parallax when a user moves his head, or may generate views for all nine viewpoints of a nine view cone autostereoscopic display. Such processing allows images to be generated which may enable the viewer to e.g. “look around” objects.
However, a problem is that images for viewpoints other than the viewpoint of the originally encoded images typically have degraded quality relative to the originally encoded images, i.e. relative to the images that were generated for the original camera position. For example, the relative offset of image objects may only be approximately correct, or occlusion information may simply not be available for image objects that are de-occluded as a consequence of the change of viewpoint. In fact, it has been found that the perceived quality degradation increases non-linearly with the displacement of the viewpoint. Thus, a doubling of the viewpoint offset is typically perceived to result in substantially more than a doubling of the quality degradation.
Hence, an improved approach would be advantageous and in particular an approach allowing increased flexibility, increased perceived image quality, an improved spatial experience, improved viewpoint adaptation, and/or improved performance would be advantageous.