One example of a method of generating a high-resolution image from images of a plurality of frames is the multiple-frame degradation inverse transformation method (see, for example, Patent Literature (PTL) 1). Generally, when a camera captures an image of a subject over a plurality of frames, the position or posture of the camera changes slightly from frame to frame. This causes a subpixel-level displacement of the sampling positions of the subject between the images. A subpixel-level displacement here means, for instance, a displacement expressed with an accuracy finer than one pixel. Because of this slight displacement, pixels that capture the same part of the subject have different pixel values in different images. In the multiple-frame degradation inverse transformation method, the positional displacement of the subject is estimated with an accuracy finer than the pixel spacing, and a high-resolution image is generated from the pixel values of the plurality of images captured for the same part of the subject.
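The idea that subpixel displacements carry extra information can be illustrated with a one-dimensional toy model. The sketch below is not the method of PTL 1. It assumes an idealised camera (point sampling, no blur, no noise) and a hypothetical scene function, and it assumes the displacement of each frame is already known exactly. Two frames offset by half a pixel give different values at the same pixel index, and placing every sample at its true position yields a grid twice as dense as either frame.

```python
import math


def scene(x: float) -> float:
    """Hypothetical continuous scene brightness at position x (in low-res pixel units)."""
    return math.sin(2.0 * math.pi * x / 3.0)


def capture(shift: float, n: int) -> list[float]:
    """Point-sample the scene at n pixel positions displaced by a subpixel shift.

    Idealised camera: no blur, no noise, no quantisation.
    """
    return [scene(k + shift) for k in range(n)]


def merge(frames: list[list[float]], shifts: list[float]) -> list[tuple[float, float]]:
    """Place each sample at its true scene position, giving a denser sampling grid.

    Returns (position, value) pairs sorted by position.
    """
    samples = [(k + s, v) for frame, s in zip(frames, shifts) for k, v in enumerate(frame)]
    return sorted(samples)
```

For example, `capture(0.0, 4)` and `capture(0.5, 4)` differ at every index, and merging them with shifts `[0.0, 0.5]` gives samples at positions 0.0, 0.5, 1.0, ..., 3.5. In practice the shifts are unknown and must be estimated, which is why accurate displacement estimation is central to the method.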
This method includes two processes: a positional displacement amount estimation process, which estimates the positional displacement of the subject with high accuracy, and a high-quality image generation process, which generates a high-quality image based on the estimated positional displacement amount. These processes are described in more detail below. FIG. 20 shows examples of images that are subject to the positional displacement amount estimation. An image 101 shown in FIG. 20(a) is a reference image that serves as a reference among a plurality of input images, whereas an image 102 shown in FIG. 20(b) is one of the input images other than the reference image. A building 103 and a house 104 in the reference image 101 are the same subjects as a building 105 and a house 106 in the image 102, respectively. Because the position or posture of the camera when capturing the image 102 differs from that when capturing the reference image 101, the pixels representing the same part of the subject are located at different positions in the images 101 and 102. To estimate such a positional displacement of pixels between a plurality of images, a geometric deformation model is assumed beforehand, and a positional displacement amount is calculated for each pixel based on the deformation model. After the positional displacement amount of each pixel between the plurality of input images has been estimated, the pixel values of a high-resolution image are obtained from the plurality of input images based on the estimated positional displacement amounts. Known techniques for this include, for example, the ML (Maximum Likelihood) method and the MAP (Maximum A Posteriori) method (see Non Patent Literature (NPL) 1).
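The two steps described above can be sketched as follows, under several stated assumptions. The geometric deformation model is assumed to be affine (the text does not say which model is used), and its parameters are assumed to have been estimated already for each input image. The names `to_reference`, `displacement` and `ml_fuse` are hypothetical. The fusion step is a simplified maximum-likelihood estimate: with no blur and independent, identically distributed Gaussian noise, the ML value of each high-resolution sample is the mean of the observations that land on it. A MAP method would add a prior term, which could also fill samples that receive no observation; that is not shown here.

```python
import math
from typing import Optional

# Hypothetical affine deformation model (a, b, tx, c, d, ty): a pixel at (x, y) of an
# input image corresponds to (a*x + b*y + tx, c*x + d*y + ty) in the reference image.
Affine = tuple[float, float, float, float, float, float]


def to_reference(x: float, y: float, m: Affine) -> tuple[float, float]:
    """Map input-image coordinates into reference-image coordinates."""
    a, b, tx, c, d, ty = m
    return a * x + b * y + tx, c * x + d * y + ty


def displacement(x: float, y: float, m: Affine) -> tuple[float, float]:
    """Per-pixel positional displacement amount implied by the deformation model."""
    xr, yr = to_reference(x, y, m)
    return xr - x, yr - y


def ml_fuse(frames: list[list[list[float]]], models: list[Affine],
            scale: int) -> list[list[Optional[float]]]:
    """Average all observations that land on each high-resolution sample.

    Assumes no blur and i.i.d. Gaussian noise, in which case the per-sample mean
    is the maximum-likelihood estimate. Samples with no observation stay None.
    """
    h, w = len(frames[0]), len(frames[0][0])
    sums = [[0.0] * (w * scale) for _ in range(h * scale)]
    counts = [[0] * (w * scale) for _ in range(h * scale)]
    for frame, m in zip(frames, models):
        for y, row in enumerate(frame):
            for x, value in enumerate(row):
                xr, yr = to_reference(x, y, m)
                # Nearest high-resolution sample (sample spacing is 1/scale).
                i = math.floor(yr * scale + 0.5)
                j = math.floor(xr * scale + 0.5)
                if 0 <= i < h * scale and 0 <= j < w * scale:
                    sums[i][j] += value
                    counts[i][j] += 1
    return [[s / n if n else None for s, n in zip(s_row, n_row)]
            for s_row, n_row in zip(sums, counts)]
```

With an identity model for the reference image and a model shifted by half a pixel in x for a second image, the second image fills the odd columns of a 2x-upsampled grid, while the odd rows remain unobserved because no image is displaced vertically.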
When the subject includes a moving object, the moving object moves differently from the change described by the assumed deformation model. As a result, the positional displacement amount estimated for the pixels of the moving object is incorrect. To generate a high-resolution image from images including such a subject, the pixels with incorrectly estimated positional displacement amounts are detected, and the ML method or the MAP method is applied using only the pixels other than the detected pixels. A method of detecting pixels with incorrectly estimated positional displacement amounts is described in NPL 2, in which such pixels are detected based on the pixel difference between images.
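The detect-then-exclude approach can be sketched with a simple absolute-difference threshold. This is a simplified stand-in and not necessarily the criterion of NPL 2. It assumes each input image has already been warped onto the reference grid using its estimated displacement, and the threshold value is a hypothetical parameter. For brevity the final step is a per-pixel mean on the reference grid, standing in for the ML or MAP reconstruction; the point is only that flagged pixels are left out.

```python
def reliable_mask(reference: list[list[float]], aligned: list[list[float]],
                  threshold: float) -> list[list[bool]]:
    """True where |aligned - reference| <= threshold, i.e. the displacement is trusted.

    A large difference suggests the estimated displacement is wrong there,
    for example because the pixel belongs to a moving object.
    """
    return [[abs(a - r) <= threshold for r, a in zip(r_row, a_row)]
            for r_row, a_row in zip(reference, aligned)]


def fuse_reliable(reference: list[list[float]], aligned_frames: list[list[list[float]]],
                  threshold: float) -> list[list[float]]:
    """Per-pixel mean of the reference and every aligned frame, skipping flagged pixels."""
    masks = [reliable_mask(reference, f, threshold) for f in aligned_frames]
    out = []
    for y, r_row in enumerate(reference):
        out_row = []
        for x, r in enumerate(r_row):
            # The reference pixel is always used; other frames only where trusted.
            values = [r] + [f[y][x] for f, m in zip(aligned_frames, masks) if m[y][x]]
            out_row.append(sum(values) / len(values))
        out.append(out_row)
    return out
```

If a moving object makes one aligned pixel read 90 where the reference reads 10, a threshold of 5 excludes it, and that output pixel falls back to the reference value alone.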
Moreover, PTL 2 describes a method of generating a high-resolution image that takes a moving part of an image into consideration. In the method described in PTL 2, for an object pixel that is subject to motion determination in a target image other than the reference frame, the maximum value and the minimum value of the luminance values of the pixels in the reference frame that surround the object pixel are calculated. The maximum value is denoted by Vmax, and the minimum value is denoted by Vmin. The luminance value of the object pixel is denoted by Vtest, and a threshold is denoted by ΔVth. When both of the following expressions are satisfied, the object pixel is determined to have no motion; otherwise, the object pixel is determined to have motion.
$$V_{test} > V_{min} - \Delta V_{th}$$
$$V_{test} < V_{max} + \Delta V_{th}$$
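The motion determination described above can be sketched directly from the two inequalities. Two details are assumptions not given in the text: the surrounding pixels are taken to be a square window (3x3 by default, clipped at the image border) centred on the same coordinates in the reference frame, and the target image is assumed to be already aligned to the reference frame. The function names are hypothetical.

```python
def neighbourhood(reference: list[list[float]], y: int, x: int, radius: int = 1) -> list[float]:
    """Reference-frame luminances in a (2*radius+1)^2 window around (y, x), clipped at borders.

    The window shape and size are assumptions; the text only says "surround".
    """
    h, w = len(reference), len(reference[0])
    return [reference[i][j]
            for i in range(max(0, y - radius), min(h, y + radius + 1))
            for j in range(max(0, x - radius), min(w, x + radius + 1))]


def has_motion(v_test: float, surrounding: list[float], dv_th: float) -> bool:
    """No motion iff Vtest > Vmin - dVth and Vtest < Vmax + dVth; otherwise motion."""
    v_min, v_max = min(surrounding), max(surrounding)
    no_motion = v_test > v_min - dv_th and v_test < v_max + dv_th
    return not no_motion


def motion_map(reference: list[list[float]], target: list[list[float]],
               dv_th: float, radius: int = 1) -> list[list[bool]]:
    """Apply the test to every pixel of a target image aligned to the reference frame."""
    return [[has_motion(v, neighbourhood(reference, y, x, radius), dv_th)
             for x, v in enumerate(row)]
            for y, row in enumerate(target)]
```

With surrounding luminances ranging from Vmin = 99 to Vmax = 106 and ΔVth = 3, a value of 108 is judged to have no motion, whereas 110 and 96 are judged to have motion (the inequalities are strict, so 96 = Vmin - ΔVth fails).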