Image processing can be used to correct shake (displacement between pictures) in an image captured by an ultra-wide optical system, such as a fisheye optical system. Based on information on an object captured in common between two pictures obtained by temporally-continuous capturing, the technique detects a motion vector of the kind used in MPEG coding, estimates the camera shake between the frames (pictures), and corrects the shake. The motion-vector-based technique, however, inevitably faces limitations in accuracy and calculation cost, since its algorithm characteristically detects the motion vector within a local area of the pictures. These limitations require the motion-vector-based technique to set, in advance, an upper limit on the magnitude of detectable camera shake. Thus, the technique cannot detect shake as great as that included in, for example, an image captured while walking or an image captured with a finder-less camera. In other words, some camera shake is too great to be corrected by the motion-vector-based technique.
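The block-matching search underlying such motion vector detection can be sketched as follows. This is a minimal illustration (the function name and parameters are hypothetical, not taken from any literature); note that the bounded search range is exactly the preset upper limit on detectable shake described above.

```python
import numpy as np

def motion_vector(prev, curr, block=8, search=4):
    """Estimate a motion vector between two frames by block matching:
    for a block of `prev`, find the shift within +/- `search` pixels that
    minimizes the sum of absolute differences (SAD) in `curr`.
    Shake larger than `search` pixels cannot be detected."""
    h, w = prev.shape
    r0, c0 = (h - block) // 2, (w - block) // 2   # central reference block
    ref = prev[r0:r0 + block, c0:c0 + block]
    best_sad, best_dv = np.inf, (0, 0)
    for dr in range(-search, search + 1):
        for dc in range(-search, search + 1):
            r, c = r0 + dr, c0 + dc
            if 0 <= r and r + block <= h and 0 <= c and c + block <= w:
                sad = np.abs(curr[r:r + block, c:c + block] - ref).sum()
                if sad < best_sad:
                    best_sad, best_dv = sad, (dr, dc)
    return best_dv
```

For a frame that is a pure translation of the previous one by less than `search` pixels, the minimum-SAD shift recovers the displacement exactly; a larger displacement falls outside the search window and is missed.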
In contrast, a feature-point-based matching technique is capable of correcting shake that the motion-vector-based technique cannot correct. The matching technique uses feature points on an object found in common between two pictures obtained by temporally-continuous capturing.
Specifically described here is a matching technique using feature points (also referred to as feature point matching).
FIGS. 1A to 1D illustrate a matching technique using feature points. Hereinafter, of the two pictures, the picture captured earlier is referred to as Picture t−1, and the picture captured later is referred to as Picture t.
FIG. 1A illustrates Picture t−1 and Picture t which is captured after Picture t−1. FIG. 1B shows feature points extracted from Picture t−1 and Picture t illustrated in FIG. 1A. FIG. 1C shows characteristic types of the feature points extracted from Picture t−1 and Picture t in FIG. 1B. FIG. 1D shows matching of the feature points extracted from Picture t−1 and Picture t in FIG. 1B. Here, the feature points are characteristic points on a picture that can be detected by image processing.
Pixels having relatively high contrast in Picture t−1 and Picture t of FIG. 1A are selected as the feature points in FIG. 1B. As FIG. 1B shows, feature points found on corners, where the contrast is significantly high, are easily extracted in common from both of the pictures (Picture t−1 and Picture t). Meanwhile, feature points whose contrast is not so high are not easily extracted from both of the pictures.
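The selection of high-contrast corner pixels can be illustrated with a Harris-style corner response; this is only a sketch under the assumption that a structure-tensor corner detector is used (the actual extraction method is not specified here), and the function names and parameters are hypothetical.

```python
import numpy as np

def box_sum3(a):
    """Sum of each 3x3 neighborhood (zero-padded at the borders)."""
    p = np.pad(a, 1)
    h, w = a.shape
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3))

def detect_feature_points(img, k=0.05, rel_threshold=0.1):
    """Select high-contrast pixels, favoring corners: a Harris-style
    corner response computed from image gradients. Corner pixels, where
    contrast is high in two directions, score highest; plain edges,
    where contrast is high in only one direction, score low."""
    gy, gx = np.gradient(img.astype(float))
    sxx = box_sum3(gx * gx)           # windowed structure tensor entries
    syy = box_sum3(gy * gy)
    sxy = box_sum3(gx * gy)
    response = (sxx * syy - sxy ** 2) - k * (sxx + syy) ** 2
    return np.argwhere(response > rel_threshold * response.max())
```

On a synthetic image containing a bright square, the detected points cluster at the four corners, while the straight edges, despite their contrast, are rejected, matching the behavior described above.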
The feature points in FIG. 1B include (i) feature points (indicated by O in FIG. 1C) obtained from an area of view common to the pictures (Picture t−1 and Picture t), and (ii) feature points (indicated by Δ in FIG. 1C) obtained from the common area of view but whose positions have shifted between the pictures. Moreover, some of the feature points in FIG. 1B (indicated by x in FIG. 1C) are obtained from areas not common to the pictures. The feature points to be matched among those in FIG. 1B are the ones (indicated by O in FIG. 1C) obtained from the common area of view between the pictures.
Before the matching, however, neither the positions nor the proportion of the feature points obtained from the common area of view between the pictures (Picture t−1 and Picture t) can be known. Hence, it is also impossible to know which feature points are obtained from the common area of view. Thus, a technique such as Random Sample Consensus (RANSAC) is used to select pairs of feature points from the feature points extracted from Picture t−1 and those extracted from Picture t, and to calculate an evaluation value for each pair of feature points based on a preset evaluation function (FIG. 1D). The evaluation function is designed so that the evaluation value is likely to increase when the obtained pair (hereinafter referred to as an inlier) of feature points comes from the common area of view between the pictures.
Specifically, a rotation matrix is calculated from a combination of two pairs of feature points selected from among the feature points extracted from Picture t−1 and those extracted from Picture t. To verify whether or not the calculated rotation matrix is correct, the matrix is applied to the feature points in Picture t−1 other than those of the selected pairs. The rotated feature points of Picture t−1 are then checked against the feature points in Picture t. If the rotated feature points in Picture t−1 match the feature points in Picture t, the calculated rotation matrix is likely to represent the correct shake amount (degree of displacement) between the pictures. Hence, an evaluation value is computed with the evaluation function based on the degree of the matching. Searches based on the evaluation function are conducted a predetermined number of times; once that number of searches has been conducted, the search is terminated, and a rotation matrix is estimated based on the inlier having the largest evaluation value at the moment of termination. It is noted that an inlier is a feature point found in common between the pictures, such as the feature points indicated by O in FIG. 1C. Such feature points are obtained mainly from a distant-view area of a captured picture. The shake in the pictures is then corrected using the rotation matrix estimated from the inliers; that is, from feature points in a distant view.
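The search procedure above can be sketched as follows, simplified to a two-dimensional rotation about the origin hypothesized from a single pair (the technique described above estimates a full rotation matrix from two pairs); the function name and parameters are illustrative assumptions.

```python
import math
import random

def estimate_rotation_ransac(pts_prev, pts_curr, iters=200, tol=2.0, seed=0):
    """RANSAC sketch of the procedure above, simplified to a 2D rotation
    about the origin. pts_prev[i] and pts_curr[i] are candidate pairs;
    mismatched pairs act as outliers and should not support any hypothesis."""
    rng = random.Random(seed)
    best_theta, best_score = 0.0, -1
    n = len(pts_prev)
    for _ in range(iters):                      # predetermined number of searches
        i = rng.randrange(n)                    # hypothesize a rotation from one pair
        (x0, y0), (x1, y1) = pts_prev[i], pts_curr[i]
        theta = math.atan2(y1, x1) - math.atan2(y0, x0)
        c, s = math.cos(theta), math.sin(theta)
        # Evaluation value: how many pairs the hypothesized rotation explains.
        score = 0
        for (px, py), (qx, qy) in zip(pts_prev, pts_curr):
            rx, ry = c * px - s * py, s * px + c * py   # rotate Picture t-1 point
            if math.hypot(rx - qx, ry - qy) <= tol:
                score += 1                              # this pair is an inlier
        if score > best_score:
            best_score, best_theta = score, theta
    return best_theta, best_score
```

Hypotheses drawn from mismatched pairs explain almost no other pairs and receive low evaluation values, so after the predetermined number of searches the retained rotation is the one supported by the inliers.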
That is how typical matching is conducted using feature points. In other words, the feature point matching involves the operations below. First, the displacement, namely the shake, developed between the pictures (Picture t−1 and Picture t) is searched repetitively so that the distribution of the feature points in Picture t−1 and that in Picture t match each other as closely as possible. Here, the matching feature points in Picture t−1 and Picture t are those found in the area common to the two pictures. The shake amount between the pictures is then estimated as the motion amount calculated when the distributions of the feature points obtained from the common area match each other to the greatest degree. The feature point matching is carried out to continuously estimate, for each picture, the shake amount developed between the pictures (between the frames), and the shake in the image (every picture) is corrected based on the estimated shake amount.
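As an illustration of the final correction step, the sketch below undoes an estimated rotational shake by inverse mapping; restricting the shake to an in-plane rotation, the nearest-neighbor sampling, and the function names are all simplifying assumptions.

```python
import numpy as np

def rotate_image(img, alpha):
    """Rotate a grayscale image by `alpha` radians about its center using
    inverse mapping with nearest-neighbor sampling: each output pixel is
    taken from the input position obtained by rotating back by -alpha."""
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    c, s = np.cos(alpha), np.sin(alpha)
    ys, xs = np.mgrid[0:h, 0:w]
    dx, dy = xs - cx, ys - cy
    sx = np.rint(c * dx + s * dy + cx).astype(int)    # source x (rotated back)
    sy = np.rint(-s * dx + c * dy + cy).astype(int)   # source y (rotated back)
    out = np.zeros_like(img)
    ok = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)  # discard out-of-frame sources
    out[ok] = img[sy[ok], sx[ok]]
    return out

def correct_shake(img, estimated_theta):
    """Undo an estimated rotational shake by applying the inverse rotation."""
    return rotate_image(img, -estimated_theta)
```

Applying `correct_shake` with the angle estimated per frame realizes the last step described above: every picture is warped back by its estimated shake amount.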
Moreover, as a characteristic of its typical algorithm, the feature point matching is based on the similarity in feature point distribution between the pictures (frames). Thus, the feature point matching has an advantage in calculation cost over the motion-vector-based technique, which is carried out using partial area information of a picture. Furthermore, the feature point matching can match feature points found throughout a picture, and can consequently estimate a rather great amount of shake. Hence, the use of the feature point matching makes it possible to estimate the great shake included in an image captured while walking or an image captured with a finder-less camera. In other words, the feature point matching can correct camera shake which is too great to be corrected by the motion-vector-based technique.
It is noted that when shake to be corrected is in an image captured by, for example, a fisheye optical system, the traveling route of light incident on the lens from outside differs depending on the projection technique adopted for the fisheye optical system. Such a difference requires a transformation of coordinates that depends on the projection technique adopted for the fisheye optical system. This is because, when the shake amount of the camera between pictures (frames) is estimated by image processing, it is necessary to know how the camera has moved with respect to the world coordinate system. In other words, in order to obtain a correct camera shake amount, it is necessary to know from which position in the world coordinate system each pixel is obtained. Thus, whichever of the motion-vector-based technique and the feature point matching is used for estimating the shake amount, the coordinate transformation should be taken into consideration before the estimation.
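The coordinate transformation can be illustrated for one common fisheye projection, the equidistant projection (r = f·θ); that a given fisheye optical system uses this projection is an assumption (others use, e.g., r = 2f·sin(θ/2)), and the function name is hypothetical.

```python
import math

def fisheye_pixel_to_ray(u, v, cx, cy, f):
    """Map a pixel of an equidistant-projection fisheye image (r = f*theta)
    to a unit direction vector of the incident ray in camera coordinates.
    Shake estimation can then operate on these ray directions, i.e. on the
    positions the pixels come from, rather than on raw pixel coordinates."""
    dx, dy = u - cx, v - cy
    r = math.hypot(dx, dy)            # radial distance from the image center
    if r == 0.0:
        return (0.0, 0.0, 1.0)        # ray along the optical axis
    theta = r / f                     # equidistant: angle off the optical axis
    s = math.sin(theta) / r           # scale for the lateral components
    return (dx * s, dy * s, math.cos(theta))
```

A different projection technique changes only the `theta = ...` line (the r-to-θ relation), which is exactly why the transformation must match the projection adopted by the optical system.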
For example, Patent Literature 1 discloses a technique to estimate a shake amount of an image captured by a fisheye optical system, based on a motion vector for image processing.