A method for image resolution enhancement has been proposed in which a plurality of low resolution images, obtained by capturing the same scene and exhibiting positional shifts relative to one another, are synthesized to generate a high resolution image.
FIG. 21 shows an example of the configuration of a conventional apparatus for image resolution enhancement in which a plurality of low resolution images are synthesized to generate a high resolution image. The conventional apparatus for image resolution enhancement includes a motion estimating means 71 and a high resolution image estimating means 72. The motion estimating means 71 receives a plurality of low resolution images as input, estimates the motion (i.e. positional shift) of each pixel of a basis image with respect to the reference images, and outputs the estimation results. The basis image is the one of the input low resolution images whose resolution is to be enhanced; the remaining input images serve as the reference images. In the apparatus for image resolution enhancement, positional shifts (displacements) between the low resolution images need to be detected with an accuracy finer than one pixel, that is, with sub-pixel accuracy.
The techniques for motion estimation may be classified into area based techniques and feature based techniques. A common area based technique is the block matching method. The block matching method first enlarges the low resolution images to a resolution at which the required accuracy can be achieved. A movement vector is then found by block matching processing. The block matching processing finds, with pixel accuracy on the enlarged images, the movement vector (ux, uy) that minimizes a value ε of a difference evaluation function. This function compares the pixel values of a template block of a given size, centered on a pixel of interest (i,j) in an image I1, with the pixel values of the block displaced by (ux, uy) in a reference image I2. Examples of ε are given in equations (1) and (2), in which I1(i,j) denotes the pixel value at a coordinate (i,j) of the image I1, I2(i,j) denotes the pixel value at a coordinate (i,j) of the image I2, and BL denotes the block size.
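$$\varepsilon(\mathit{ux}, \mathit{uy}) = \sum_{m=-\mathit{BL}/2}^{\mathit{BL}/2} \; \sum_{n=-\mathit{BL}/2}^{\mathit{BL}/2} \left| I_1(i+m,\, j+n) - I_2(i+m+\mathit{ux},\, j+n+\mathit{uy}) \right| \tag{1}$$

$$\varepsilon(\mathit{ux}, \mathit{uy}) = \sum_{m=-\mathit{BL}/2}^{\mathit{BL}/2} \; \sum_{n=-\mathit{BL}/2}^{\mathit{BL}/2} \left( I_1(i+m,\, j+n) - I_2(i+m+\mathit{ux},\, j+n+\mathit{uy}) \right)^2 \tag{2}$$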
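The block matching search described above can be sketched as follows. This is a minimal illustration rather than the implementation of any particular apparatus: the function names, the exhaustive search over a square window, and the `search` parameter are assumptions made for the example. The first array index is taken as i and the second as j, so that ux offsets the first index exactly as in equations (1) and (2).

```python
import numpy as np

def block_cost(i1, i2, i, j, ux, uy, bl, squared=False):
    """Evaluate equation (1), or equation (2) if squared=True, for one candidate.

    The sums run over m, n = -bl//2 .. bl//2 around the pixel of interest (i, j).
    """
    h = bl // 2
    template = i1[i - h:i + h + 1, j - h:j + h + 1].astype(np.float64)
    candidate = i2[i + ux - h:i + ux + h + 1,
                   j + uy - h:j + uy + h + 1].astype(np.float64)
    diff = template - candidate
    return float(np.sum(diff * diff)) if squared else float(np.sum(np.abs(diff)))

def block_match(i1, i2, i, j, bl=4, search=3, squared=False):
    """Return the (ux, uy) that minimizes the cost, together with that cost.

    Assumes the template block around (i, j) lies fully inside i1.
    `search` (hypothetical parameter) bounds |ux| and |uy|.
    """
    h = bl // 2
    best_vec, best_cost = (0, 0), np.inf
    for ux in range(-search, search + 1):
        for uy in range(-search, search + 1):
            # Skip candidates whose block would leave the reference image.
            if (i + ux - h < 0 or j + uy - h < 0 or
                    i + ux + h >= i2.shape[0] or j + uy + h >= i2.shape[1]):
                continue
            cost = block_cost(i1, i2, i, j, ux, uy, bl, squared)
            if cost < best_cost:
                best_vec, best_cost = (ux, uy), cost
    return best_vec, best_cost
```

If both images are first enlarged by a factor s, as the block matching method prescribes, the integer vector found on the enlarged grid corresponds to a displacement of (ux/s, uy/s) on the original grid, which is how sub-pixel accuracy is obtained. The interpolation used for the enlargement is not specified here and is left out of the sketch.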
The high resolution image estimating means 72 receives the low resolution images and the results of motion estimation as inputs, estimates a high resolution image from them, and outputs the estimated high resolution image. One example of the high resolution image estimating means 72 is a technique that outputs, as the estimation result, the x that minimizes the evaluation function of maximum likelihood (ML) estimation represented by equation (3) or of maximum a-posteriori (MAP) estimation represented by equation (4) (see Non-Patent Document 1, for example). In these equations, x denotes the high resolution image and y denotes a low resolution image. A denotes an image transformation matrix that includes the motion between the images, down-sampling and so forth, C denotes a high-pass filter, and λ denotes a constant. The motion between the images included in the image transformation matrix A reflects the movement vector calculated using the aforementioned equations (1) and (2).
$$g_1(x) = \sum_{\forall n} \left\| y_n - A_n x \right\|^2 \tag{3}$$

$$g_2(x) = \sum_{\forall n} \left\| y_n - A_n x \right\|^2 + \lambda \left\| C(x) \right\|^2 \tag{4}$$
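As an illustration of how an x minimizing equation (4) might be computed, the sketch below works on a one-dimensional signal and makes several assumptions that are not taken from Non-Patent Document 1: the high-pass filter C is treated as a linear operator (a circular discrete Laplacian), each A_n is an integer circular shift followed by averaging down-sampling, and the minimizer is obtained by solving the normal equations directly. All function names are hypothetical.

```python
import numpy as np

def shift_matrix(n, s):
    """Circular shift: (S @ x)[r] == x[(r + s) % n]."""
    return np.roll(np.eye(n), s, axis=1)

def downsample_matrix(n, factor):
    """Average each group of `factor` neighbouring samples, then decimate."""
    d = np.zeros((n // factor, n))
    for r in range(n // factor):
        d[r, r * factor:(r + 1) * factor] = 1.0 / factor
    return d

def highpass_matrix(n):
    """Circular discrete Laplacian, standing in for the high-pass filter C."""
    eye = np.eye(n)
    return 2.0 * eye - np.roll(eye, 1, axis=1) - np.roll(eye, -1, axis=1)

def g2(x, ys, As, C, lam):
    """Evaluation function of equation (4); lam = 0 gives equation (3)."""
    data = sum(np.sum((y - A @ x) ** 2) for y, A in zip(ys, As))
    return data + lam * np.sum((C @ x) ** 2)

def map_estimate(ys, As, C, lam):
    """Minimize g2 by solving (sum A_n^T A_n + lam C^T C) x = sum A_n^T y_n.

    Setting the gradient of g2 to zero yields this linear system when C is
    linear (an assumption of this sketch).
    """
    lhs = sum(A.T @ A for A in As) + lam * (C.T @ C)
    rhs = sum(A.T @ y for A, y in zip(As, ys))
    return np.linalg.solve(lhs, rhs)
```

A possible use is to build `As = [downsample_matrix(n, 2) @ shift_matrix(n, s) for s in (0, 1)]`, generate `ys = [A @ x_true for A in As]`, and call `map_estimate(ys, As, highpass_matrix(n), lam)`. Note that the shifts inside each A_n are exactly where the movement vectors from equations (1) and (2) enter, so any error in those vectors propagates directly into the estimate.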
The technique for image resolution enhancement that synthesizes multiple low resolution images to yield a high resolution image is generally termed super-resolution processing. Super-resolution processing that includes motion estimation requires the motion to be estimated with sub-pixel accuracy, as described above. It is, however, difficult to estimate the motion with sub-pixel accuracy from low resolution images that are sampled in units of pixels, and hence the estimation result unavoidably contains errors. In most cases, these errors give rise to noise (or artifacts) in the generated high resolution image.
However, these errors in the motion estimation are not taken into account in the conventional technique that generates a high resolution image using the aforementioned equations (3) and (4), as described in Non-Patent Document 1. Because the conventional technique disclosed in Non-Patent Document 1 is based on the premise that the results of motion estimation are free of errors, it suffers from the problem that noise is produced in the generated high resolution image.
Meanwhile, the technique described below has been proposed as a technique that may improve the image quality of the high resolution image generated by super-resolution processing (see Patent Document 1).
In Patent Document 1, a weight for each pixel of each low resolution image is determined based on three quantities: the motion estimation vector between a basis image and each reference image; the temporal distance between the basis image and each reference image, such as the difference in frame numbers; and the distance between a pixel in the high resolution image being generated (a pixel being generated) and the pixel in each low resolution image nearest to the pixel being generated (pixel-to-pixel distance). The respective pixels of the respective low resolution images are then synthesized, with this weighting taken into account, to generate a high resolution image.
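The weighted synthesis can be pictured with the following sketch. It is purely illustrative: the formula actually used in Patent Document 1 is not reproduced here, and the exponential fall-off, the coefficients `a`, `b`, `c`, and both function names are assumptions chosen only so that the weight decreases as the motion magnitude, the temporal distance, or the pixel-to-pixel distance grows.

```python
import numpy as np

def pixel_weight(motion_vec, frame_gap, pixel_dist, a=1.0, b=1.0, c=1.0):
    """Hypothetical weight for one low resolution pixel.

    ASSUMPTION: exponential fall-off in the three quantities; this is not
    the formula of Patent Document 1, only a monotonically decreasing stand-in.
    """
    motion_mag = float(np.hypot(motion_vec[0], motion_vec[1]))
    return float(np.exp(-(a * motion_mag + b * abs(frame_gap) + c * pixel_dist)))

def synthesize_pixel(values, weights):
    """Weighted average of the candidate low resolution pixel values."""
    values = np.asarray(values, dtype=np.float64)
    weights = np.asarray(weights, dtype=np.float64)
    return float(np.sum(weights * values) / np.sum(weights))
```

With this form, a pixel from a reference image that is far away in time or that required a large motion vector contributes less to the pixel being generated than a pixel from a nearby, nearly static frame.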
FIG. 22 is a block diagram showing an example of the configuration of the apparatus for image resolution enhancement described in Patent Document 1. This apparatus for image resolution enhancement includes a motion estimating means 91, a motion distance evaluating means 92, a motion estimation evaluating means 93, a weight generating means 94 and a high resolution image estimating means 95.
The motion estimating means 91 receives a plurality of low resolution images as input and outputs the results of estimating the motion between the low resolution images (motion estimation vectors). The motion distance evaluating means 92 receives the motion estimation vectors as inputs and evaluates the magnitudes of the motion estimation vectors, the temporal distance between images and the pixel-to-pixel distances. More specifically, the evaluation by the motion distance evaluating means 92 is such that the larger the magnitudes of the motion estimation vectors, the temporal distance between images and the pixel-to-pixel distances, the more likely it is that the image is deteriorated in quality.
The motion estimation evaluating means 93 integrates the above three evaluations from the motion distance evaluating means 92. The weight generating means 94 determines the weights for the respective pixels of the respective low resolution images. The high resolution image estimating means 95 synthesizes the respective low resolution images, using the weighting determined by the weight generating means 94, and outputs a high resolution image.
Patent Document 1: JP Patent Kokai JP-A-2006-033062
Non-Patent Document 1: S. C. Park, M. K. Park and M. G. Kang, “Super-Resolution Image Reconstruction: A Technical Overview”, IEEE Signal Processing Magazine, vol. 20, no. 3, pp. 21-36, May 2003