Image processing can be used to correct camera shake in an image (moving picture). For example, based on information on an object captured in common between two pictures obtained by temporally continuous capturing, one technique detects a motion vector of the kind used in MPEG techniques and estimates the camera shake between frames (pictures) in order to correct the shake. A technique that uses the motion vector inevitably faces limitations in accuracy and calculation cost, since its algorithm characteristically detects the motion vector within a partial area of the pictures. Because of these limitations, the motion-vector-based technique requires an upper limit on the magnitude of the camera shake to be set in advance. Thus, the technique cannot detect great shake such as that included in, for example, an image captured while walking or an image captured with a finder-less camera. In other words, some camera shake is too great to be corrected by the motion-vector-based technique.
In contrast, a feature-point-based matching technique is capable of correcting the shake which the motion-vector-based technique cannot correct. The matching technique uses some of the feature points on an object found in common between two pictures obtained by temporally-continuous capturing.
Specifically described here is a matching technique using feature points (also referred to as feature point matching).
FIGS. 1A to 1D illustrate a matching technique using feature points. Hereinafter, of the two pictures, the picture captured earlier is referred to as Picture t−1, and the picture captured later is referred to as Picture t.
FIG. 1A illustrates Picture t−1 and Picture t which is captured after Picture t−1. FIG. 1B shows feature points extracted from Picture t−1 and Picture t illustrated in FIG. 1A. FIG. 1C shows characteristic types of the feature points extracted from Picture t−1 and Picture t in FIG. 1B. FIG. 1D shows matching of the feature points extracted from Picture t−1 and Picture t in FIG. 1B. Here, the feature points are characteristic points to be detected by image processing and found on the picture.
Pixels having greater contrast on Picture t−1 and Picture t in FIG. 1A are selected as the feature points in FIG. 1B. As FIG. 1B shows, some feature points, found on corners and having significantly great contrast, are easily extracted in common from both of the pictures (Picture t−1 and Picture t). Meanwhile, some feature points whose contrast is not so great are not easily extracted from both of the pictures (Picture t−1 and Picture t).
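The selection of high-contrast pixels as feature points can be illustrated with a minimal sketch. The contrast measure used here (the sum of absolute differences between a pixel and its eight neighbours), the function names, and the `max_points` and `min_contrast` parameters are assumptions made for illustration only; the description above does not specify a particular measure, and practical detectors typically rely on more elaborate corner measures.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale picture stored as rows of pixel intensities


def local_contrast(img: Image, x: int, y: int) -> int:
    """Sum of absolute differences between pixel (x, y) and its 8 neighbours.

    This measure is an assumption chosen for simplicity.
    """
    center = img[y][x]
    total = 0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dx == 0 and dy == 0:
                continue
            total += abs(center - img[y + dy][x + dx])
    return total


def extract_feature_points(
    img: Image, max_points: int, min_contrast: int
) -> List[Tuple[int, int]]:
    """Return up to max_points positions (x, y), highest contrast first."""
    height, width = len(img), len(img[0])
    scored = []
    # Border pixels are skipped because they lack a full 3x3 neighbourhood.
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            contrast = local_contrast(img, x, y)
            if contrast >= min_contrast:
                scored.append((contrast, x, y))
    # The sort is stable, so ties keep their scan order.
    scored.sort(key=lambda item: -item[0])
    return [(x, y) for _, x, y in scored[:max_points]]
```

With such a measure, an isolated bright pixel or a corner scores far higher than its surroundings and is therefore likely to be extracted from both Picture t−1 and Picture t, whereas a low-contrast pixel may fall below `min_contrast` in one of the pictures.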
The feature points in FIG. 1B include (i) feature points (indicated by O in FIG. 1C) obtained from an area of view common to the pictures (Picture t−1 and Picture t) and (ii) feature points (indicated by Δ in FIG. 1C) that are also obtained from the common area of view but whose positions have shifted between the pictures (Picture t−1 and Picture t). Moreover, some of the feature points in FIG. 1B (indicated by x in FIG. 1C) are obtained from areas not common to the pictures (Picture t−1 and Picture t). Among the feature points in FIG. 1B, the ones to be matched are those (indicated by O in FIG. 1C) obtained from the common area of view between the pictures (Picture t−1 and Picture t).
Before the matching, however, it is impossible to know the positions or the proportion of the feature points obtained from the common area of view between the pictures (Picture t−1 and Picture t). Hence, it is also impossible to know which feature points are obtained from the common area of view between the pictures (Picture t−1 and Picture t). Thus, a technique such as Random Sample Consensus (RANSAC) is used to select pairs of feature points from the feature points extracted from Picture t−1 and the feature points extracted from Picture t, and to calculate an evaluation value for each pair of feature points based on a preset evaluation function (FIG. 1D). The evaluation value is designed so that it is likely to increase when the obtained pair of feature points (hereinafter referred to as an inlier) is from the common area of view between the pictures (Picture t−1 and Picture t).
Specifically, a rotation matrix is calculated from a combination of two pairs of feature points selected from among the feature points extracted from Picture t−1 and the feature points extracted from Picture t. To verify whether or not the calculated rotation matrix is correct, the rotation matrix is applied to the feature points in Picture t−1 other than those of the selected pairs. The rotated feature points in Picture t−1 are then checked to determine whether or not they match the feature points in Picture t. Such searches are conducted a predetermined number of times based on the evaluation function. Once the searches have been conducted the predetermined number of times, they are terminated, and a rotation matrix is estimated based on the inliers having the largest evaluation value at the time of termination. It is noted that an inlier is a feature point found in common between the pictures, such as the feature points indicated by O in FIG. 1C. Such feature points are obtained mainly from a distant view area in a captured picture. The shake in the pictures is then corrected using the rotation matrix estimated based on the inliers; that is, the feature points in the distant view. The distant view area is the background of a captured picture; that is, the part of the picture showing what lies at a long distance.
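The search described above can be sketched as follows. This is a simplified illustration, not the procedure of any particular implementation: the rotation is reduced to a single in-plane angle about the origin (feature point coordinates are assumed to be given relative to the picture center), the evaluation value is assumed to be the number of rotated feature points that land within a tolerance `tol` of some feature point in Picture t, and all function and parameter names are hypothetical.

```python
import math
import random
from typing import List, Set, Tuple

Point = Tuple[float, float]


def rotate(p: Point, angle: float) -> Point:
    """Rotate point p about the origin by angle (radians)."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1])


def angle_from_pairs(pairs: List[Tuple[Point, Point]]) -> float:
    """Least-squares rotation angle that maps each p onto its paired q."""
    cross = sum(p[0] * q[1] - p[1] * q[0] for p, q in pairs)
    dot = sum(p[0] * q[0] + p[1] * q[1] for p, q in pairs)
    return math.atan2(cross, dot)


def evaluate(prev_pts: List[Point], cur_pts: List[Point],
             skip: Set[int], angle: float, tol: float) -> int:
    """Assumed evaluation value: how many Picture t-1 points, other than the
    selected ones, land within tol of some Picture t point after rotation."""
    score = 0
    for index, p in enumerate(prev_pts):
        if index in skip:
            continue
        rotated = rotate(p, angle)
        if any(math.dist(rotated, q) <= tol for q in cur_pts):
            score += 1
    return score


def ransac_rotation(prev_pts: List[Point], cur_pts: List[Point],
                    iterations: int = 2000, tol: float = 0.05,
                    seed: int = 0) -> Tuple[float, int]:
    """Search a fixed number of times; keep the best-scoring rotation."""
    rng = random.Random(seed)
    best_angle, best_score = 0.0, -1
    for _ in range(iterations):
        # Select two candidate pairs: two points from each picture.
        i, j = rng.sample(range(len(prev_pts)), 2)
        k, m = rng.sample(range(len(cur_pts)), 2)
        angle = angle_from_pairs([(prev_pts[i], cur_pts[k]),
                                  (prev_pts[j], cur_pts[m])])
        score = evaluate(prev_pts, cur_pts, {i, j}, angle, tol)
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle, best_score
```

Because the candidate pairs are drawn at random without knowing which feature points truly correspond, most hypotheses score poorly; only a rotation computed from genuinely corresponding pairs brings the remaining feature points of Picture t−1 onto those of Picture t, which is why the evaluation value tends to increase for inliers.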
That is how typical matching is conducted based on the feature points. In other words, the feature point matching involves the operations below. First, the shake developed between the pictures (Picture t−1 and Picture t) is searched for repeatedly so that the distribution of feature points in Picture t−1 and the distribution of feature points in Picture t match each other as closely as possible. Here, the matching feature points in Picture t−1 and Picture t appear in an area common to Picture t−1 and Picture t. Then, the shake amount between the pictures (Picture t−1 and Picture t) is estimated as the motion amount calculated when the distributions of the feature points obtained in the common area between Picture t−1 and Picture t match each other to the greatest degree. The feature point matching is carried out for each picture to continuously estimate the shake amount developed between the pictures (frames), so that the shake in the image (every picture) is corrected based on the estimated shake amount.
Moreover, as a characteristic of its typical algorithm, the feature point matching is based on the similarity in feature point distribution between pictures (frames). Thus, the feature point matching is lower in calculation cost than the motion-vector-based technique, which is carried out using partial area information of a picture. Furthermore, the feature point matching is capable of matching using feature points throughout a picture. Consequently, it can estimate a rather great amount of shake. Hence, the use of the feature point matching makes it possible to estimate the great shake included in an image captured while walking or an image captured with a finder-less camera. In other words, the feature point matching can correct camera shake which is too great to be corrected by the motion-vector-based technique.
The feature point matching, however, has a problem in that the estimation accuracy of the shake amount is not high enough. Specifically, the feature point matching estimates the shake amount (the displacement amount between pictures) based on feature point positions on the pictures. Consequently, the estimated shake amount between frames (between pictures) is not accurate enough once the position of a feature point used for the estimation shifts between the frames (between pictures).
The shift in the feature point position between the frames (between pictures) can develop when some kind of change appears between the frames (between pictures), such as a change in lighting conditions, a change in the short-distance view, or a change in view due to the motion of the camera. Such a shift is inevitable in capturing pictures. The feature point matching can correct most of the shake between the frames (between pictures); however, the technique leaves a little shake in the frames when the estimation accuracy decreases as a result of the shift in the feature point position. Unfortunately, the user is acutely aware of even such a little shake and perceives the picture as shaking.
Hence, it is essential to introduce techniques to improve the estimation accuracy of the feature point matching. One such technique employs extra post-processing after the feature point matching in order to improve the accuracy. Two approaches are candidates for the post-processing: one is to use a sensor, and the other is to utilize image processing. Unfortunately, some images, such as an image captured while walking, suffer from the impact of walking. When a sensor is used, the impact affects the sensor, which results in a decrease in estimation accuracy. Taking such a situation into consideration, it is desirable to utilize image processing for the post-processing. In other words, it is desirable to estimate the shake amount of the image with high accuracy by providing extra image processing after the feature point matching, so that the image processing can compensate for the decrease in the estimation accuracy of the feature point matching. Image processing techniques for the post-processing may include, for example, a technique to detect a motion vector and a technique to trace a specific object. These techniques rely on the fact that, even though the originally developed shake is large, the shake remaining after the correction based on the feature point matching is as small as an error. Thus, it is realistic to apply the motion vector detecting technique.
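Why a small residual shake makes motion vector detection realistic can be illustrated with a minimal block-matching sketch. It assumes that the pictures have already been corrected by the feature point matching, so that only a narrow search range (`search`) around the original block position needs to be examined. The sum of absolute differences (SAD) cost, the function names, and the parameters are assumptions for illustration; the description above does not prescribe a specific motion vector detection method.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # grayscale picture stored as rows of pixel intensities


def block_sad(prev: Image, cur: Image, px: int, py: int,
              cx: int, cy: int, size: int) -> int:
    """Sum of absolute differences between a size x size block whose top-left
    corner is (px, py) in prev and the block at (cx, cy) in cur."""
    return sum(
        abs(prev[py + j][px + i] - cur[cy + j][cx + i])
        for j in range(size)
        for i in range(size)
    )


def residual_motion_vector(prev: Image, cur: Image, x: int, y: int,
                           size: int, search: int) -> Tuple[int, int]:
    """Return the displacement (dx, dy), within +/- search pixels, that best
    aligns the block at (x, y) in prev with the corrected picture cur."""
    height, width = len(cur), len(cur[0])
    best_cost: Optional[int] = None
    best_vec = (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            cx, cy = x + dx, y + dy
            # Skip candidate blocks that would fall outside the picture.
            if cx < 0 or cy < 0 or cx + size > width or cy + size > height:
                continue
            cost = block_sad(prev, cur, x, y, cx, cy, size)
            if best_cost is None or cost < best_cost:
                best_cost, best_vec = cost, (dx, dy)
    return best_vec
```

The cost of this search grows with the square of `search`, which is consistent with the limitation noted earlier for the motion-vector-based technique: it is practical only when the shake to be detected is small, as is the case after the correction based on the feature point matching.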
When post-processing based on image processing is utilized to compensate for the decrease in the estimation accuracy of the feature point matching, it is essential to decide which area in the image is used for the post-processing. The choice of area is essential because various objects are captured in a picture. In estimating the shake amount through the post-processing based on image processing, an inappropriately set area could even decrease the estimation accuracy.
Hence, selecting an appropriate area is essential when the image processing is employed to estimate the shake amount. Some techniques have been proposed to select such an appropriate area. For example, a technique disclosed in Patent Literature 1 is used when there is a specific object previously found in a picture. The technique sets, as a shake-amount estimating area, the area in which a picture feature unique to the specific object is extracted. Techniques disclosed in Patent Literatures 2 and 3, for example, involve detecting a vanishing point in a picture and setting, as a shake-amount estimating area, the area around the vanishing point.