1. Field of the Invention
The present invention relates to an image processing method, apparatus, program, and recording medium for the same, and more particularly, to interpolation, enlargement and coding processes for generating (restoring) high-image-quality information which does not exist before the processes are performed, from low-image-quality information.
2. Description of the Related Art
As a method of generating an output image having a high resolution (high image quality) from an input image having a low resolution (low image quality), various techniques have been proposed in which pairs of low-resolution images and high-resolution images are learned in advance for the content of many images, a conversion relationship from low-resolution information to high-resolution information is obtained, and this conversion relationship is used to generate (restore) an image including the high-resolution information from the input image having the low resolution. For example, a Bicubic method known as a cubic interpolation method provides a natural image with less information loss in comparison with a nearest neighbor method, a bilinear method and the like, but the Bicubic method causes image blurring.
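For illustration only, cubic interpolation along one axis can be sketched as follows. The Keys cubic convolution kernel with a = −0.5 and the clamping of indices at the signal borders are assumptions made for this sketch; the text above does not specify which cubic kernel is meant.

```python
def keys_kernel(x, a=-0.5):
    """Keys cubic convolution kernel; a = -0.5 is an assumed, commonly used value."""
    x = abs(x)
    if x <= 1.0:
        return (a + 2.0) * x**3 - (a + 3.0) * x**2 + 1.0
    if x < 2.0:
        return a * x**3 - 5.0 * a * x**2 + 8.0 * a * x - 4.0 * a
    return 0.0


def cubic_interpolate(samples, position):
    """Interpolate a 1-D signal at a fractional position from four neighbors."""
    base = int(position // 1)  # index of the sample at or left of the position
    total = 0.0
    for offset in range(-1, 3):
        # Clamp indices at the borders (an assumption of this sketch).
        index = min(max(base + offset, 0), len(samples) - 1)
        total += samples[index] * keys_kernel(position - (base + offset))
    return total


# Hypothetical step edge sampled at four positions.
edge = [0.0, 0.0, 10.0, 10.0]
midpoint = cubic_interpolate(edge, 1.5)
```

A two-dimensional bicubic enlargement would apply the same kernel along rows and then along columns.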
On the other hand, a method has been known in which a correspondence relationship between the low-resolution image and the high-resolution image is learned in advance and stored in a database as a pair of the low-resolution image and the high-resolution image for each patch (a divided region of a predetermined number of pixels), and the low-resolution image is converted into the high-resolution image with reference to this database of patch pairs. This approach requires a processing time proportional to (the number of databases) × (the filter size) × (the number of pixels). If a high-resolution image of about 640×480 pixels is to be obtained, the number of databases becomes about a hundred thousand. Thus, the storage capacity required for storing the databases increases, and the conversion process requires a long processing time.
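The proportionality stated above can be made concrete with a small calculation. The filter size of 25 (a 5×5 window) and the comparison against about 100 entries are assumptions chosen for illustration; the result is a relative operation count, not a measured processing time.

```python
def relative_cost(num_entries, filter_size, num_pixels):
    """Relative cost: database entries x filter size x pixels (unitless)."""
    return num_entries * filter_size * num_pixels


pixels = 640 * 480        # about 640x480 pixels, as in the text
filter_size = 5 * 5       # assumed 5x5 filter, for illustration only

patch_db_cost = relative_cost(100_000, filter_size, pixels)
reduced_cost = relative_cost(100, filter_size, pixels)  # hypothetical 100 entries

print(patch_db_cost)                  # 768000000000
print(patch_db_cost // reduced_cost)  # 1000
```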
As an approach which solves the above described problem, the following approach has been known. FIGS. 22A and 22B are diagrams illustrating a schematic configuration of an image processing method for solving the above described problem. In the learning step illustrated in FIG. 22A, this image processing method applies a reduction (thinning) process to a high-resolution image x to obtain a low-resolution image z which makes a pair with the high-resolution image x. Then, a cluster vector Y is generated as a difference between a noted pixel and a vector filter in the low-resolution image z.
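As a minimal sketch of the reduction (thinning) process, a low-resolution image z can be obtained by keeping every second pixel of the high-resolution image x. The thinning factor of 2 and the absence of any low-pass prefilter are assumptions of this sketch; the text above does not specify them.

```python
def thin(image, factor=2):
    """Keep every `factor`-th pixel in both directions (no prefilter applied)."""
    return [row[::factor] for row in image[::factor]]


# Hypothetical 4x4 high-resolution image x with pixel value = row * 4 + column.
x = [[r * 4 + c for c in range(4)] for r in range(4)]
z = thin(x)  # 2x2 low-resolution image paired with x
print(z)     # [[0, 2], [8, 10]]
```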
Of the nine pixels of 3×3 size illustrated in FIG. 23A, the one pixel with a data value of “14” positioned in the center is the noted pixel, and the eight pixels surrounding the noted pixel are the vector filter. The difference between the noted pixel and the vector filter illustrated in FIG. 23A is calculated (a differential process and a normalization process), and thereby, an eight-dimensional cluster vector Y illustrated in FIG. 23B is generated. Similarly, the difference between the noted pixel and the vector filter illustrated in FIG. 23C is calculated, and thereby, a cluster vector Y illustrated in FIG. 23D is generated. This cluster vector Y is generated for each pixel in the low-resolution image (that is, for each pair of a low-resolution pixel and a high-resolution pixel).
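The differential process and the normalization process can be sketched as follows. The sign convention (neighbor minus noted pixel), the normalization by the Euclidean norm, and all patch values other than the center value of 14 are assumptions of this sketch; the values of FIGS. 23A to 23D are not reproduced here.

```python
import math


def cluster_vector(patch):
    """Build an 8-dimensional cluster vector from a 3x3 patch.

    The noted (center) pixel is subtracted from its eight neighbors, and the
    result is scaled to unit Euclidean length (an assumed normalization).
    """
    center = patch[1][1]
    diffs = [patch[r][c] - center
             for r in range(3) for c in range(3) if (r, c) != (1, 1)]
    norm = math.sqrt(sum(d * d for d in diffs))
    if norm == 0.0:
        return [0.0] * 8  # flat patch: no local structure to describe
    return [d / norm for d in diffs]


# Hypothetical patch; only the center value 14 is taken from the text.
patch = [[10, 12, 16],
         [11, 14, 18],
         [13, 15, 20]]
y = cluster_vector(patch)
```

Because only differences from the noted pixel are kept, adding a constant brightness offset to the whole patch leaves the cluster vector unchanged.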
When as many cluster vectors Y as the number of pixels in the low-resolution image have been generated, representative vectors μi (i is a class number) are generated as representatives of the cluster vectors Y. In other words, a smaller number of representative vectors μi than the number of pixels in the low-resolution image are generated from as many cluster vectors Y as the number of pixels in the low-resolution image. For example, about 100 representative vectors μi are generated for a hundred thousand cluster vectors. Each pixel in the low-resolution image is classified into one of the classes i by means of these representative vectors μi.
Approaches for generating the representative vector μi include, for example, an approach in which an EM algorithm is applied to a GMM (Gaussian Mixture Model). In this way, significant reduction in the number of databases is realized by obtaining the representative vectors μi, in which the cluster vectors Y are converted into representative values. In addition, interpolation filter coefficients Ai, Bi and μi (i is the class number) are generated from the high-resolution image x, the low-resolution image z, the cluster vector Y and the representative vector μi.
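One way to obtain representative vectors with an EM algorithm applied to a GMM is sketched below. Isotropic (spherical) covariances, caller-supplied initial means, and a fixed iteration count are assumptions made to keep the sketch short; the text above does not specify the covariance model or the initialization. The function name em_gmm is hypothetical.

```python
import math


def em_gmm(vectors, means, iterations=20, min_var=1e-6):
    """Fit a mixture of isotropic Gaussians; returns (means, variances, weights)."""
    k, dim, n = len(means), len(vectors[0]), len(vectors)
    means = [list(m) for m in means]
    variances = [1.0] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: responsibility of each class i for each cluster vector.
        resp = []
        for v in vectors:
            logs = []
            for i in range(k):
                dist2 = sum((a - b) ** 2 for a, b in zip(v, means[i]))
                logs.append(math.log(weights[i])
                            - 0.5 * dim * math.log(2.0 * math.pi * variances[i])
                            - 0.5 * dist2 / variances[i])
            top = max(logs)  # subtract the maximum for numerical stability
            probs = [math.exp(value - top) for value in logs]
            total = sum(probs)
            resp.append([p / total for p in probs])
        # M-step: re-estimate mean, variance and mixing weight of each class.
        for i in range(k):
            mass = sum(r[i] for r in resp)
            if mass < 1e-12:
                continue  # leave an empty class unchanged
            means[i] = [sum(r[i] * v[d] for r, v in zip(resp, vectors)) / mass
                        for d in range(dim)]
            spread = sum(r[i] * sum((a - b) ** 2 for a, b in zip(v, means[i]))
                         for r, v in zip(resp, vectors))
            variances[i] = max(spread / (mass * dim), min_var)
            weights[i] = mass / n
    return means, variances, weights


# Hypothetical 2-D data (real cluster vectors would be 8-dimensional).
data = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
        [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
mu, var, pi = em_gmm(data, [[0.0, 0.0], [5.0, 5.0]])
```

In this sketch the returned means play the role of the representative vectors μi, and the mixing weights sum to 1 across the classes.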
The interpolation filter coefficients Ai, Bi and μi generated in the learning step are used in a restoration step illustrated in FIG. 22B. Here, the interpolation filter coefficient Ai requires a size larger than the cluster vector Y in order to restore frequency components from a low-frequency component to a medium-frequency component in the low-resolution image. For example, as the interpolation filter coefficient Ai, a size of about 5×5 (a vector of about 25 dimensions) with the noted pixel in the center is required.
On the other hand, in the restoration step illustrated in FIG. 22B, the cluster vector Y is generated for each pixel, from a high-frequency component of the inputted low-resolution image z, and also, interpolation filter coefficients Ai, Bi and μi as well as the representative vector μi are set depending on the inputted low-resolution image z and the cluster vector Y. In other words, an interpolation calculation ((Ai×z)+Bi) using the interpolation filter coefficients Ai and Bi is weighted by a weight (wi((μi−Y),πi)) determined using the cluster vector Y, the representative vector μi and the interpolation filter coefficient πi, and all calculation results obtained for the respective classes are added to generate a high-frequency image as an output.
In other words, the inputted low-resolution image z is multiplied by a compound matrix (filter coefficient) Ai, supplemented with a bias Bi, and also multiplied by the weight depending on the difference (μi−Y) between the representative vector μi and the cluster vector Y for each input pixel. This calculation is repeated for i, and a weighted sum of all the classes is calculated and outputted as the high-frequency image. The above described interpolation filtering process can be represented as Σ((Ai·z)+Bi)·wi((μi−Y),πi). It should be noted that πi means a contribution rate (a contribution rate for each class) in the normalization process, and Σπi=1 (a sum of π of all the classes is 1).
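The weighted sum Σ((Ai·z)+Bi)·wi((μi−Y),πi) can be sketched as follows. The functional form of the weight wi is not given in the text; this sketch assumes a Gaussian of the distance between μi and Y, scaled by πi and normalized so that the weights of all classes sum to 1. The parameter sigma and the dictionary layout of each class are likewise hypothetical.

```python
import math


def class_weights(y, classes, sigma=1.0):
    """Assumed weight: pi_i * exp(-|mu_i - y|^2 / (2 sigma^2)), normalized."""
    raw = []
    for cls in classes:
        dist2 = sum((m - v) ** 2 for m, v in zip(cls["mu"], y))
        raw.append(cls["pi"] * math.exp(-dist2 / (2.0 * sigma ** 2)))
    total = sum(raw)
    if total == 0.0:
        return [1.0 / len(classes)] * len(classes)  # fall back to uniform weights
    return [r / total for r in raw]


def restore(z, y, classes, sigma=1.0):
    """Compute sum_i w_i * (A_i z + B_i) for one input pixel."""
    weights = class_weights(y, classes, sigma)
    out_len = len(classes[0]["B"])
    output = [0.0] * out_len
    for w, cls in zip(weights, classes):
        for row in range(out_len):
            value = sum(a * zi for a, zi in zip(cls["A"][row], z)) + cls["B"][row]
            output[row] += w * value
    return output


# Two hypothetical classes with 1x2 filters, for illustration only.
classes = [
    {"A": [[1.0, 0.0]], "B": [0.0], "mu": [0.0, 0.0], "pi": 0.5},
    {"A": [[0.0, 1.0]], "B": [1.0], "mu": [2.0, 0.0], "pi": 0.5},
]
result = restore([4.0, 6.0], [1.0, 0.0], classes)
print(result)  # [5.5]
```

In the example, the cluster vector is equidistant from both representative vectors, so each class contributes with a weight of 0.5.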
“OPTIMAL IMAGE SCALING USING PIXEL CLASSIFICATION” (https://engineering.purdue.edu/~bouman/publications/pdf/icip01atkins.pdf, C. Brian Atkins, Charles A. Bouman, Jan P. Allebach) discloses an image processing method similar to the above described approach. Here, an outline of the image processing method according to this document will be described. In this image processing method, an interpolation calculation is performed with a combination of representative values of the pair (patch pair) of the high-resolution image x (High-resolution pixels x) and the low-resolution image z (Low-resolution image), information on the high-resolution image x which does not exist in the low-resolution image z is interpolated, and thus, the low-resolution image z is converted into the high-resolution image x.
An input pixel of the low-resolution image illustrated in FIG. 24 is placed in the center of a window of 5×5 size, and the pixels in this window, that is, the input pixel and the peripheral pixels around it, are vectorized to generate an observation vector z having 25 elements (1×25). This observation vector z is used for the calculation for the entire frequency domain from the low-frequency component to the high-frequency component.
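The vectorization of the 5×5 window can be sketched as follows. Row-major ordering of the elements and clamping of coordinates at the image border are assumptions of this sketch; the text above does not state how border pixels are handled.

```python
def observation_vector(image, row, col, size=5):
    """Flatten the size x size window centered on (row, col) in row-major order."""
    half = size // 2
    height, width = len(image), len(image[0])
    vector = []
    for dr in range(-half, half + 1):
        for dc in range(-half, half + 1):
            # Clamp coordinates so that border pixels are repeated (assumed).
            r = min(max(row + dr, 0), height - 1)
            c = min(max(col + dc, 0), width - 1)
            vector.append(image[r][c])
    return vector


# Hypothetical 6x6 image with pixel value = row * 10 + column.
image = [[r * 10 + c for c in range(6)] for r in range(6)]
z = observation_vector(image, 2, 2)
print(len(z), z[12])  # 25 22
```

With this ordering, the input pixel itself is element 12 of the 25-element vector.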
On the other hand, a projection operator f is applied to the observation vector z to generate a cluster vector y. The cluster vector y is generated for each input pixel, and converts the input pixel into a feature so that the input pixel can be associated with (classified into) a context class. As described above, the cluster vector y is generated as an eight-dimensional vector from a window of 3×3 size with the noted pixel of the inputted low-resolution image in the center (see FIGS. 23A to 23D). Furthermore, in a training step, weighting wi for each context class is determined based on a previously obtained distribution parameter θ.
On the other hand, according to an offline training approach, the above described distribution parameter θ and an interpolation filter coefficient Ψ are previously obtained. The high-resolution image is generated by filtering (linear filter) in which this interpolation filter coefficient Ψ is applied. The filtering with this interpolation filter coefficient Ψ can be represented as x=Σ(wi×(A×z+β)). It should be noted that A is an interpolation function, β is a bias vector, and A and β are equivalent to Ai and Bi as described above, respectively. Also in an approach disclosed in “OPTIMAL IMAGE SCALING USING PIXEL CLASSIFICATION” (https://engineering.purdue.edu/~bouman/publications/pdf/icip01atkins.pdf, C. Brian Atkins, Charles A. Bouman, Jan P. Allebach), the significant reduction in the number of databases is realized by converting the cluster vectors y obtained for the respective input pixels, into representative values.
Japanese Patent No. 3724008 discloses an apparatus which converts normal-resolution image information into high-resolution image information, and outputs the high-resolution image information. The apparatus according to Japanese Patent No. 3724008 is configured so that coefficient data used for estimating a video signal of a high-definition system (HD image) corresponding to a video signal of an NTSC system (SD data) is previously obtained by learning for each class, and then, an SD image is interpolated based on this coefficient data, and thus, data which is more approximate to actual HD data is obtained.
Moreover, Japanese Patent Application Laid-Open No. 2009-524861 discloses an apparatus and a method for improving a spatial resolution of a digital image. The apparatus and the method are configured so that an interpolation filter is used to classify and interpolate input pixels in a low-resolution image, and the classification and updating of the interpolation filter are repeated until a predetermined convergence condition is satisfied. Paragraph [0044] of Japanese Patent Application Laid-Open No. 2009-524861 describes that a corresponding low-resolution image is obtained from a high-resolution image in order to obtain a training image.
Japanese Patent Application Laid-Open No. 2009-253765 discloses an image processing system which models a feature region of an input image, and reduces a calculation time required for generating a high-quality image from a low-quality image. Paragraph [0124] of Japanese Patent Application Laid-Open No. 2009-253765 describes that a high-frequency component is extracted from a sample image, and the high-frequency component of an image of an object is stored.
Japanese Patent Application Laid-Open No. 2009-188891 discloses an image processing apparatus in which, when a still image is extracted from a moving image, it is determined whether effective analysis between images is forward analysis or reverse analysis, and a predetermined super resolution enlargement process is applied to the extracted still image based on a result of the determination. Paragraph [0130] of Japanese Patent Application Laid-Open No. 2009-188891 describes that a high-frequency component is extracted from supplied image information.