1. Field of the Invention
The present invention relates to an image processing technique, and more particularly, to a technique for achieving image conversion, such as image enlargement, illumination conversion, viewpoint conversion, and the like.
2. Description of the Related Art
With the advent of digital image apparatuses and digital networks, different kinds of image apparatuses can be easily connected to each other, and the degree of freedom of image exchange has increased. For example, an image captured by a digital still camera may be output to a printer, published on a network, or viewed on a home television. In other words, an environment has been developed in which a user can freely handle an image without being limited by differences between systems.
On the other hand, in order to achieve such an environment, each system needs to support various image formats and perform sophisticated image format conversion. For example, an up-converter (a conversion apparatus for increasing the number of pixels and the number of lines) and a down-converter (a conversion apparatus for decreasing the number of pixels and the number of lines) are required to perform image size conversion, which occurs frequently. For example, when printing is performed with a resolution of 600 dpi on A4 paper (297 mm×210 mm), data of approximately 7016 pixels×4961 lines is required (taking 1 inch as 25.4 mm). However, since most digital still cameras have a resolution lower than this, an up-converter is required. Also, an image published on a network needs to be converted into an image size corresponding to an output device every time the output device is determined. Regarding home televisions, since digital terrestrial broadcasting services have started, conventional standard televisions and High Definition (HD) televisions coexist, so that image size conversion is frequently performed.
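The pixel count needed for printing follows directly from the paper dimensions and the print resolution. The following is a minimal sketch of that arithmetic; the function name `pixels_for_print` and its parameters are hypothetical and chosen only for illustration. With exactly 25.4 mm per inch, A4 at 600 dpi comes to about 7016×4961, whereas approximating one inch as 25 mm yields 7128×5040.

```python
MM_PER_INCH = 25.4  # exact definition of the inch in millimetres


def pixels_for_print(length_mm: float, dpi: int, mm_per_inch: float = MM_PER_INCH) -> int:
    """Number of pixels needed to cover length_mm at the given resolution."""
    return round(length_mm / mm_per_inch * dpi)


# A4 paper: 297 mm x 210 mm, printed at 600 dpi.
exact = (pixels_for_print(297.0, 600), pixels_for_print(210.0, 600))

# Same calculation with one inch approximated as 25 mm.
approx = (pixels_for_print(297.0, 600, 25.0), pixels_for_print(210.0, 600, 25.0))

print(exact)   # (7016, 4961)
print(approx)  # (7128, 5040)
```

Either way, the required size is far larger than what a camera of a few megapixels delivers, which is why an up-converter is needed.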
In order to enlarge an image, image data which did not exist when the image was captured needs to be newly created. To this end, various techniques have been proposed. For example, interpolation techniques, such as bilinear interpolation, bicubic interpolation, and the like, are generally used (Non-patent Document 1). However, interpolation can generate only intermediate values of the sampled data, so that the sharpness of an edge or the like deteriorates, which is likely to result in a blurred image. Therefore, a technique has been disclosed in which an interpolated image is used as an initially enlarged image, and thereafter, an edge portion is extracted and only the edge portion is emphasized (Patent Document 1, Non-patent Document 2). However, it is difficult to separate an edge portion from noise, so that noise is likely to be emphasized along with the edge portion, resulting in a deterioration in image quality.
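The blurring caused by interpolation can be illustrated on a single scanline. The sketch below uses one-dimensional linear interpolation, which is the building block of bilinear interpolation; the function name `upsample_linear` and the sample values are hypothetical and serve only as an illustration.

```python
def upsample_linear(samples, factor):
    """Enlarge a scanline by `factor` using linear interpolation between neighbours."""
    out = []
    for i in range(len(samples) - 1):
        a, b = samples[i], samples[i + 1]
        for k in range(factor):
            t = k / factor  # fractional position between sample i and sample i + 1
            out.append((1 - t) * a + t * b)
    out.append(samples[-1])  # keep the final sample
    return out


# A sharp step edge between a dark region (0) and a bright region (100).
edge = [0, 0, 100, 100]
enlarged = upsample_linear(edge, 4)
print(enlarged)
# [0.0, 0.0, 0.0, 0.0, 0.0, 25.0, 50.0, 75.0, 100.0, 100.0, 100.0, 100.0, 100]
```

Every output value lies between its two neighbouring input samples, so the step becomes a ramp (25, 50, 75) in the enlarged scanline instead of remaining a sharp transition.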
To address this, a learning technique has been proposed which performs image enlargement while suppressing a deterioration in image quality. Specifically, a high-resolution image corresponding to an enlarged image is captured in advance using a high-definition camera or the like, and a low-resolution image is created from the high-resolution image. The low-resolution image is typically created by applying a low-pass filter and then performing sub-sampling. A large number of such sets of a low-resolution image and a high-resolution image are prepared, and the relationship therebetween is learned as an image enlargement technique. Since the learning technique does not involve the above-described emphasis process, it is possible to achieve image enlargement with relatively little deterioration in image quality.
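The creation of one learning pair can be sketched as follows. This is a minimal illustration under an assumption: a simple box filter (block averaging) stands in for the low-pass filter, since the text does not specify which filter is used. The function name `make_low_res` and the pixel values are hypothetical.

```python
def make_low_res(high, factor):
    """Average each factor x factor block (box low-pass filter) and keep one value per block."""
    rows, cols = len(high), len(high[0])
    low = []
    for r in range(0, rows - rows % factor, factor):
        row = []
        for c in range(0, cols - cols % factor, factor):
            block = [high[r + dr][c + dc] for dr in range(factor) for dc in range(factor)]
            row.append(sum(block) / len(block))
        low.append(row)
    return low


# Toy 4 x 4 "high-resolution" image (luminance values).
high = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [10, 50, 200, 120],
    [50, 10, 120, 200],
]
low = make_low_res(high, 2)
training_pair = (low, high)  # one (low-resolution, high-resolution) set for learning
print(low)  # [[10.0, 200.0], [30.0, 160.0]]
```

The fine detail in the lower half of the high-resolution image is averaged away in the low-resolution image; the learning step then has to associate the low-resolution values with the lost detail.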
As an example of the learning technique, a technique of performing learning statistically, based on the assumption that the relationship in luminance value between adjacent pixels follows a Markov process, has been disclosed (Non-patent Document 3). Also, a technique of calculating a feature vector for each pixel in a conversion pair from a low resolution to a high resolution, and generating an enlarged image based on the degree of matching with the feature vector of an input pixel and the consistency with surrounding pixels, has been disclosed (Non-patent Document 4).
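The matching step of such a feature-vector technique can be sketched as a nearest-neighbour search over learned pairs. This is a simplified illustration and not the method of Non-patent Document 4: the consistency term with surrounding pixels is omitted, and the names `squared_distance`, `best_match`, and the toy data are hypothetical.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def best_match(feature, pairs):
    """Return the high-resolution entry whose low-resolution feature is closest."""
    _, high = min(pairs, key=lambda pair: squared_distance(feature, pair[0]))
    return high


# Toy learned pairs: (low-resolution feature vector, label standing in for a high-resolution patch).
pairs = [
    ((0, 0, 0), "flat dark patch"),
    ((0, 50, 100), "rising edge patch"),
    ((100, 50, 0), "falling edge patch"),
]

print(best_match((10, 60, 90), pairs))    # rising edge patch
print(best_match((100, 100, 90), pairs))  # falling edge patch
```

The second query is a nearly flat bright region, but because the toy learning data contains no bright flat patch, the closest available entry is an edge patch. The output can only be as good as the pairs that were learned.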
In addition to image enlargement, the learning technique is also utilized for conversion of an illumination direction, and the like (Non-patent Document 5). Non-patent Document 5 discloses a technique of illuminating a plurality of objects having different textures (unevenness, a pattern, or the like on an object surface) from a plurality of directions to create learning data, and converting an illumination direction while preserving the sense of texture.
Patent Document 1: U.S. Pat. No. 5,717,789 (FIG. 5)
Non-patent Document 1: Shinji Araya, “Clear Commentary on 3D Computer Graphics”, Kyoritsu Shuppan, Sep. 25, 2003, pp. 144-145
Non-patent Document 2: Makoto Nakashizuka, et al., “Image Resolution Enhancement on Multiscale Luminance Gradient Planes”, The Journal of The Institute of Electronics, Information and Communication Engineers, D-II, Vol. J81-D-II, No. 10, pp. 2249-2258, October 1998
Non-patent Document 3: Freeman, et al., “Learning Low-Level Vision”, International Journal of Computer Vision, 40(1), pp. 25-47, 2000
Non-patent Document 4: Hertzmann, et al., “Image Analogies”, SIGGRAPH 2001, Proceedings, pp. 327-340, 2001
Non-patent Document 5: Malik, et al., “Representing and Recognizing the Visual Appearance of Materials using Three-dimensional Textons”, International Journal of Computer Vision, 43(1), pp. 29-44, 2001
However, the conventional techniques have the following problems.
In the above-described learning techniques, since an enlarged image is selected from among the images used for learning, the enlargement method depends on the learning data. A similar problem arises not only in image enlargement, but also in other image conversions, such as conversion of an illumination direction and the like.
Also, since a large number of sets of a low-resolution image and a high-resolution image need to be prepared, the preprocess of performing learning requires a large number of steps. In addition, since the image data for learning needs to be created from actually captured images, the image data may be inherently biased, which is not preferable for image conversion with a high degree of freedom.