One known technique in the related art for aligning a plurality of images models the misalignment amount at each pixel position by using a projection conversion matrix. In this technique, a standard image and a reference image are arbitrarily selected from among a plurality of acquired images, and the misalignment amounts of characteristic regions of the images are determined by performing optical flow or feature-point matching on the two images. The misalignment amount at each pixel position is then determined by estimating the geometric change of the entire screen from these misalignment amounts.
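As an illustration of how a single matrix yields a misalignment amount at every pixel position, the following minimal sketch assumes that the projection conversion matrix is a 3×3 projective transformation (homography) applied in homogeneous coordinates, and that it has already been estimated from the matched characteristic regions. The function names and the example matrix `H_shift` are hypothetical and do not come from the related art.

```python
def apply_homography(H, x, y):
    """Map pixel (x, y) through the 3x3 matrix H using homogeneous coordinates."""
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if w == 0:
        raise ValueError("pixel maps to a point at infinity")
    return xh / w, yh / w


def misalignment(H, x, y):
    """Misalignment amount (dx, dy) that H assigns to pixel (x, y)."""
    xr, yr = apply_homography(H, x, y)
    return xr - x, yr - y


# Hypothetical example matrix: a pure translation of +3 pixels in x, -2 in y.
H_shift = [
    [1.0, 0.0, 3.0],
    [0.0, 1.0, -2.0],
    [0.0, 0.0, 1.0],
]

dx, dy = misalignment(H_shift, 10, 20)
```

For a pure translation such as `H_shift`, every pixel receives the same misalignment amount; a general matrix with a non-trivial third row produces amounts that vary with pixel position.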
When the misalignment amount at each pixel position is determined by using a projection conversion matrix, the direction and amount of movement of the entire image can be calculated properly as long as the scene does not include an object having depth. However, if an object having depth is included, such as when the subject contains a mixture of a plurality of planes, the misalignment amounts vary greatly from plane to plane, so misalignment amounts that vary from region to region must be taken into account. If the appropriate misalignment amount is not applied to each region, the alignment cannot be performed properly, and an artifact (data error) occurs in the combined image.
A known image processing apparatus disclosed in Patent Literature 1 generates a combined image while addressing the aforementioned problem. In order to suppress artifacts, the apparatus first determines a plurality of projection conversion matrices that express misalignment amounts. For a subject having a mixture of a plurality of planes, each projection conversion matrix is the optimal projection conversion matrix for one of the planes. While switching between the projection conversion matrices, the apparatus generates a plurality of images to which the respective misalignment amounts have been applied (referred to as “alignment images” hereinafter) and determines a difference value between each alignment image and the standard image. A region with a large difference value is determined to belong to another planar region with a different misalignment amount, and, for each pixel, the projection conversion matrix used to generate the alignment image with the smallest difference value is selected as the projection conversion matrix to be applied to that pixel. Then, by using the selection result for all the pixels (the plane map), the appropriate misalignment amount is applied to each region, thereby suppressing artifacts in the combined image.
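The per-pixel selection described above can be sketched as follows. This is only an illustrative reading of the procedure, not the implementation of Patent Literature 1: it assumes grayscale images stored as nested lists, nearest-neighbour sampling, an absolute difference as the difference value, and a label of -1 for pixels that no candidate matrix can sample. The names `warp` and `plane_map` and the example data are hypothetical.

```python
def warp(reference, H):
    """Generate an alignment image: each standard-image pixel (x, y) takes the
    reference-image value at the position H maps it to (nearest neighbour)."""
    h, w = len(reference), len(reference[0])
    out = [[None] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            d = H[2][0] * x + H[2][1] * y + H[2][2]
            if d == 0:
                continue  # pixel maps to infinity; leave it unsampled
            u = round((H[0][0] * x + H[0][1] * y + H[0][2]) / d)
            v = round((H[1][0] * x + H[1][1] * y + H[1][2]) / d)
            if 0 <= u < w and 0 <= v < h:
                out[y][x] = reference[v][u]
    return out


def plane_map(standard, reference, matrices):
    """For each pixel, return the index of the matrix whose alignment image
    differs least from the standard image (-1 if no matrix is usable)."""
    aligned = [warp(reference, H) for H in matrices]
    h, w = len(standard), len(standard[0])
    result = [[-1] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            best_diff = float("inf")
            for i, img in enumerate(aligned):
                if img[y][x] is None:
                    continue
                diff = abs(img[y][x] - standard[y][x])
                if diff < best_diff:
                    best_diff = diff
                    result[y][x] = i
    return result


# Hypothetical two-plane scene: the top row has moved by 1 pixel in x,
# the bottom row by 2 pixels in x.
standard = [[10, 20, 40, 80, 160],
            [5, 15, 45, 135, 405]]
reference = [[0, 10, 20, 40, 80],
             [0, 0, 5, 15, 45]]
matrices = [
    [[1, 0, 1], [0, 1, 0], [0, 0, 1]],  # candidate 0: shift of 1 pixel
    [[1, 0, 2], [0, 1, 0], [0, 0, 1]],  # candidate 1: shift of 2 pixels
]
labels = plane_map(standard, reference, matrices)
```

In this toy scene the interior pixels of the top row select candidate 0 and those of the bottom row select candidate 1. Pixels near the right border, where one or both candidates sample outside the reference image, receive a fallback label, which shows that a selection based purely on difference values can be unreliable wherever a candidate has no valid sample.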