Recently, technologies for creating 3D display content have been attracting increasing attention, in step with advances in 3D image display apparatuses. When a 3D image display apparatus receives an image signal through a transmission line and performs 3D display, an image signal corresponding to each viewpoint must be transmitted to the apparatus. In particular, a multi-viewpoint stereoscopic image requires several times as much data at an image display terminal as a normal two-dimensional image. Therefore, reducing the amount of image data to be transmitted is one of the research topics in the field of creating 3D display content.
Against this background, a technology is being studied that uses the signal of one of the two-dimensional images for realizing 3D display, together with the signal of a depth image representing depth information of a three-dimensional object for each pixel of that two-dimensional image, to generate the other of the two-dimensional images. The other two-dimensional image is generated through image processing using parameters of a virtual camera, and is therefore referred to as a virtual-viewpoint image hereafter. Generating a virtual-viewpoint image from the signal of a two-dimensional image and the signal of the corresponding depth image through an image conversion process allows an image display terminal to perform 3D display based on a reduced amount of transmitted image data.
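As a rough illustration of the image conversion process described above, the following minimal sketch shifts each pixel of one image row horizontally by an amount derived from its depth. It is not the method of any cited document. The function name `warp_row`, the parameter `shift_scale` (standing in for the virtual-camera parameters), the shift direction, and the rule "shift = shift_scale / depth" are all assumptions made for illustration.

```python
from typing import List, Optional, Sequence


def warp_row(
    colors: Sequence[int],
    depths: Sequence[float],
    shift_scale: float,
) -> List[Optional[int]]:
    """Generate one row of a virtual-viewpoint image (hypothetical sketch).

    colors      : pixel values of the two-dimensional image row
    depths      : per-pixel depth values (assumed strictly positive)
    shift_scale : assumed stand-in for the virtual-camera parameters
    Pixels that receive no source pixel stay None (holes).
    """
    width = len(colors)
    out: List[Optional[int]] = [None] * width
    out_depth = [float("inf")] * width  # depth buffer for the target row

    for x, (color, z) in enumerate(zip(colors, depths)):
        shift = int(round(shift_scale / z))  # nearer pixels shift further
        tx = x - shift                       # assumed shift direction
        # Keep the nearest source pixel when several land on the same target.
        if 0 <= tx < width and z < out_depth[tx]:
            out[tx] = color
            out_depth[tx] = z
    return out


# Two near pixels (depth 1) in front of a far background (depth 8).
row = warp_row([10, 20, 30, 40, 50, 60], [8.0, 8.0, 1.0, 1.0, 8.0, 8.0], 2.0)
print(row)  # [30, 40, None, None, 50, 60]
```

In this example the near pixels move two positions and cover the background, leaving two unfilled positions where the background was previously hidden. Because every target position depends directly on the depth value, any error in the depth image displaces pixels in the generated image.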
However, it is not easy to prepare a depth image accurate enough to precisely reflect the per-pixel depth information of the signal of a two-dimensional image. When a depth image is captured by using an actual stereo camera, the resolution of the depth image is limited by inherent properties of the stereo camera and is lower than that of the signal of the two-dimensional image. For example, a depth image captured with a TOF (Time of Flight) camera developed by Mesa Imaging AG (Swiss Ranger SR3100, where “Swiss Ranger” is a trademark) has a resolution of 176×144 pixels, which is far lower than the resolution of the signal of a two-dimensional image. Further, many 3D-CG tools sacrifice the resolution of a depth image in favor of rendering speed. When a depth image is estimated from the signals of plural two-dimensional images corresponding to various viewpoints, the resulting depth-image signal becomes less accurate because of misdetection of corresponding pixels between the two-dimensional images. The deteriorated accuracy of the depth-image signal causes the problem that noise in the generated virtual-viewpoint image becomes conspicuous and the image quality deteriorates.
To solve this problem of deteriorated image quality, for example, Japanese Unexamined Patent Application Publication (JP-A) No. 2001-175863 discloses the following apparatus: the apparatus includes a distance-information-detecting section which detects distance information from information of plural images, a smoothing section which smooths the detected distance information with a weighting process so as to minimize a weight function defined by a curved surface given based on the distance information, a weight-information-detecting section which detects weight information required to perform weighting in the smoothing process by using an index representing the likelihood of values of the detected distance information, and an image-representing-and-interpolating section which shifts each pixel of an inputted viewpoint image based on the smoothed distance information and obtains an interpolated image at an arbitrary viewpoint.
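The idea of weighting the smoothing by the likelihood of each distance value can be sketched as follows. This is only a simplified stand-in: it uses a local weighted average rather than the minimization of a weight function defined by a curved surface that the publication describes. The function name `weighted_smooth`, the `radius` parameter, and the sample values are hypothetical.

```python
from typing import List, Sequence


def weighted_smooth(
    distances: Sequence[float],
    weights: Sequence[float],
    radius: int = 1,
) -> List[float]:
    """Smooth a row of distance values with a likelihood-weighted average.

    weights[i] is an assumed likelihood index for distances[i]:
    0 means "not trusted", larger means "more trusted".
    """
    n = len(distances)
    out: List[float] = []
    for i in range(n):
        lo = max(0, i - radius)
        hi = min(n, i + radius + 1)
        weight_sum = sum(weights[lo:hi])
        if weight_sum == 0:
            # No trusted neighbor in the window: keep the detected value.
            out.append(distances[i])
        else:
            weighted = sum(d * w for d, w in zip(distances[lo:hi], weights[lo:hi]))
            out.append(weighted / weight_sum)
    return out


# The middle value 9.0 is a misdetection and is given likelihood 0.
print(weighted_smooth([2.0, 2.0, 9.0, 2.0, 2.0], [1, 1, 0, 1, 1]))
# [2.0, 2.0, 2.0, 2.0, 2.0]
```

With the likelihood of the outlier set to zero, the outlier is replaced by the average of its trusted neighbors and does not contaminate them.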
Further, JP-A No. 2005-228134 discloses a method of extracting corresponding feature points from both planar-image information of three-dimensional-image data and planar-image information of two-dimensional-image data, correcting a displacement of an object in the three-dimensional-image data by modifying a closed geometry enclosed by plural feature points based on the correspondence of the extracted feature points, and generating virtual-viewpoint-image data by using depth information of the corrected three-dimensional-image data and the planar-image information of the two-dimensional-image data.
Further, Ilsoon Lim, Hocheon Wey, Dusik Park, “Depth Super-resolution for Enhanced Free-viewpoint TV”, SID Symposium Digest of Technical Papers May 2010 Volume 41, Issue 1, pp. 1268-1271, discloses a method of converting a low-resolution depth-image signal into a high-resolution image signal by using contour-line information of the depth-image signal.
As described above, when a virtual-viewpoint image is generated by using the signal of one of the two-dimensional images for realizing 3D display and the signal of a depth image representing per-pixel depth information of an object corresponding to the signal of the two-dimensional image, deteriorated accuracy of the signal of the depth image makes noise in the generated virtual-viewpoint image conspicuous, which deteriorates the image quality.
To solve this problem, JP-A No. 2001-175863 proposes a method of reducing errors in the signal of the depth image, which has been generated from the signals of plural two-dimensional images, by using a weighting process and a smoothing process, in order to improve the deteriorated image quality of an image at an interpolated viewpoint. However, this method can hardly correct the errors in the signal of the depth image completely. In particular, in the signal of a depth image corrected by a smoothing process, the area around the contour line of each object is blurred and does not clearly represent the difference in depth between objects. Therefore, in the signal of a virtual-viewpoint image generated based on this signal, pixels corresponding to the contour area of each object are scattered, which makes noise around the contour area conspicuous.
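The blurring of contours by smoothing can be seen in a small numerical sketch. A plain box filter is used here as an assumed stand-in for the smoothing process of the publication, and the helper names `box_smooth` and `blurred_contour_indices` are hypothetical.

```python
from typing import List, Sequence


def box_smooth(values: Sequence[float], radius: int) -> List[float]:
    """Average each value with its neighbors within `radius` (box filter)."""
    n = len(values)
    out: List[float] = []
    for i in range(n):
        window = values[max(0, i - radius): min(n, i + radius + 1)]
        out.append(sum(window) / len(window))
    return out


def blurred_contour_indices(
    original: Sequence[float], smoothed: Sequence[float]
) -> List[int]:
    """Return the positions whose depth was changed by the smoothing."""
    return [
        i for i, (a, b) in enumerate(zip(original, smoothed))
        if abs(a - b) > 1e-9
    ]


# A far background (depth 8) next to a near object (depth 1).
depth = [8.0, 8.0, 8.0, 8.0, 1.0, 1.0, 1.0, 1.0]
smoothed = box_smooth(depth, radius=1)
print(blurred_contour_indices(depth, smoothed))  # [3, 4]
```

Only the two positions on either side of the contour change: they take intermediate depth values that belong to neither the object nor the background. When pixels are shifted according to such values, the contour pixels land between the positions of the object and the background, which corresponds to the scattering described above.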
Further, JP-A No. 2005-228134 proposes a method of extracting feature points from both planar-image information of three-dimensional-image data and planar-image information of two-dimensional-image data. However, when depth information (the signal of a depth image) and two-dimensional-image data are inputted in place of the three-dimensional-image data and the two-dimensional-image data, this method extracts feature points from the depth information with low accuracy, because the depth information contains no color information or texture information. Therefore, an accurate correspondence of the feature points is hardly obtained.
Further, Ilsoon Lim, Hocheon Wey, Dusik Park, “Depth Super-resolution for Enhanced Free-viewpoint TV”, SID Symposium Digest of Technical Papers May 2010 Volume 41, Issue 1, pp. 1268-1271, proposes a method that uses only the contour information of the signal of a depth image in order to obtain a virtual-viewpoint image. However, when the accuracy of the depth image is deteriorated, the contour information of the depth image itself already includes errors, and this method may yield a corrected depth image with even lower accuracy.