The present invention relates to an image processing apparatus and an image processing method for acquiring information on the geometric deformation between frame images that is caused by relative motion between an image-pickup apparatus and an object while a moving image is captured.
For image-pickup apparatuses such as video cameras, various methods have been proposed for correcting image shake caused by camera shake, such as hand jiggling. In particular, so-called electronic image stabilization, which detects motion information from the image and corrects the image shake electronically, is essential as a means of realizing image stabilization at low cost.
There are also a variety of methods for detecting motion from images. As one such method, Japanese Patent Laid-Open No. 2005-269419 proposes an image processing method that detects a plurality of motion vectors between the frame images constituting a moving image and acquires, from these motion vectors, motion information (a representative motion) representing the whole image.
The term ‘motion vector’ as used herein is a vector quantity representing the magnitude and direction of the displacement of a local feature point (also referred to as an attention point, which usually corresponds to an attention pixel) in the image, either between adjacent frame images or between frame images separated by one or more intervening frames. The motion vector is also referred to as a local motion.
Further, the term ‘motion’ as used herein is a quantity representing the geometric deformation between frame images caused by relative displacement and the like between the image-pickup apparatus and the object. In other words, it represents a change in the overall appearance of the image, and is also referred to as a ‘global motion’.
The motions of geometric deformation are classified according to the nature of the relative displacement: translation (horizontal and vertical), scaling, rotation, shear and foreshortening (horizontal and vertical). When the object is a single rigid body, every change between the images caused by a change in the relative position of the image-pickup apparatus and the object corresponds to one of these motions. Accordingly, the local motion vectors, which represent the local displacement at each portion of the image, can be canceled entirely by correcting the image so as to cancel the motion.
For a given motion, the local motion vector takes a value that varies with position in the image. Conversely, the motion can be regarded as the value obtained by normalizing the local motion vectors with respect to position.
The image correction means disclosed in Japanese Patent Laid-Open No. 2005-269419 will now be described. It consists of roughly four steps.
In the first step, as shown in FIG. 2, a plurality of pairs of motion detection points (feature points) 32 that can be used to detect local motions are arranged concentrically so that the two points of each pair are point-symmetric with respect to the center of an image 31. Black circles denote the feature points 32.
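A minimal Python sketch of the first-step layout follows. The number of rings, their radii and the number of points per ring are not given in the text, so the values below and the name `symmetric_feature_points` are hypothetical; the only property carried over from FIG. 2 is that every point has a partner reflected through the image center.

```python
import math


def symmetric_feature_points(center, radii, points_per_ring):
    """Place feature points on concentric rings around `center` and return
    them as (p, q) pairs, where q is p reflected through the center.

    `points_per_ring` is assumed to be even so that every point has a partner.
    """
    cx, cy = center
    pairs = []
    for r in radii:
        # Half a turn is enough: each point's partner covers the other half.
        for k in range(points_per_ring // 2):
            theta = 2.0 * math.pi * k / points_per_ring
            dx, dy = r * math.cos(theta), r * math.sin(theta)
            pairs.append(((cx + dx, cy + dy), (cx - dx, cy - dy)))
    return pairs


# Illustrative values only: a 640 x 480 image with three rings of 8 points.
pairs = symmetric_feature_points((320.0, 240.0), (60.0, 120.0, 180.0), 8)
print(len(pairs))  # 12
```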
In the second step, a local motion vector is calculated for each of the feature points 32.
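The text does not say how each local motion vector is measured. Exhaustive block matching with a sum-of-absolute-differences (SAD) cost is one common choice and is swapped in here purely as an assumption; `block_match`, the block size and the search range are hypothetical.

```python
import numpy as np


def block_match(prev, curr, point, block=8, search=4):
    """Estimate the local motion vector (dx, dy) at `point` = (x, y).

    Assumed technique (not specified in the source): exhaustive SAD block
    matching of a block x block template from `prev` inside `curr`.
    """
    x, y = point
    h = block // 2
    template = prev[y - h:y + h, x - h:x + h].astype(np.float64)
    best_sad, best_vec = np.inf, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            cand = curr[y + dy - h:y + dy + h, x + dx - h:x + dx + h].astype(np.float64)
            sad = np.abs(cand - template).sum()
            if sad < best_sad:
                best_sad, best_vec = sad, (dx, dy)
    return best_vec


# Synthetic check: shift a random frame down by 2 rows and left by 3 columns.
rng = np.random.default_rng(0)
prev = rng.integers(0, 256, size=(64, 64))
curr = np.roll(prev, shift=(2, -3), axis=(0, 1))
print(block_match(prev, curr, (32, 32)))  # (-3, 2)
```

In practice the search would be restricted or accelerated, since the cost grows with the square of the search range.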
In the third step, as shown in FIG. 3, the local motion vector of a feature point 82 in an image 81 is combined with that of a feature point 83 that is point-symmetric to the feature point 82. More specifically, the local motion vectors of the point-symmetric feature points 82 and 83 are first divided into a component oriented in the same direction and a component oriented in opposite directions. The opposite-direction component is then divided into a radial component and a tangential component.
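A minimal Python sketch of the third step follows, assuming positions are measured from the image center in an x, y frame. Reading "same direction" as the half-sum of the two vectors and "opposite directions" as their half-difference is an interpretation, and `decompose_pair` is a hypothetical name.

```python
import math


def decompose_pair(p, v_p, v_q):
    """Split the local motion vectors of one point-symmetric pair.

    p:   position of the first feature point relative to the image center
    v_p: local motion vector at p
    v_q: local motion vector at the partner point -p
    Returns (same_direction_component, radial_component, tangential_component).
    """
    # Same-direction part: the half-sum, i.e. what both vectors share.
    same = ((v_p[0] + v_q[0]) / 2.0, (v_p[1] + v_q[1]) / 2.0)
    # Opposite-direction part, expressed at the point p: the half-difference.
    opp = ((v_p[0] - v_q[0]) / 2.0, (v_p[1] - v_q[1]) / 2.0)
    r = math.hypot(p[0], p[1])
    ur = (p[0] / r, p[1] / r)   # unit radial direction
    ut = (-ur[1], ur[0])        # unit tangential direction (radial turned by +90 degrees)
    radial = opp[0] * ur[0] + opp[1] * ur[1]
    tangential = opp[0] * ut[0] + opp[1] * ut[1]
    return same, radial, tangential


# Pure scaling about the center plus a translation of (1, 2): no tangential part.
print(decompose_pair((3.0, 4.0), (1.3, 2.4), (0.7, 1.6)))
```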
Here, the coordinate change produced by translation, scaling and rotation is expressed by the following expression (1).
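$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} a & -b \\ b & a \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} c \\ d \end{bmatrix} \qquad (1)$$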
In this case, the same-direction component, that is, the translational motion component (horizontal and vertical), corresponds to ‘c’ and ‘d’. The radial component of the opposite-direction component corresponds to ‘a’, and its tangential component corresponds to ‘b’; the opposite-direction component is thus the motion component in which rotation and scaling are mixed. In this way, the parameters ‘a’, ‘b’, ‘c’ and ‘d’ relating to translation, scaling and rotation can be acquired from each pair of point-symmetric feature points.
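Substituting expression (1) into the definition of the local motion vector shows why this decomposition recovers the parameters. With p = (x, y) measured from the image center and r = |p|, the local motion vectors at p and at its point-symmetric partner -p satisfy:

$$\mathbf{v}(\mathbf{p}) = \begin{bmatrix} x' - x \\ y' - y \end{bmatrix} = \begin{bmatrix} a-1 & -b \\ b & a-1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} c \\ d \end{bmatrix}$$

$$\tfrac{1}{2}\left[\mathbf{v}(\mathbf{p}) + \mathbf{v}(-\mathbf{p})\right] = \begin{bmatrix} c \\ d \end{bmatrix}, \qquad \tfrac{1}{2}\left[\mathbf{v}(\mathbf{p}) - \mathbf{v}(-\mathbf{p})\right] = (a-1)\begin{bmatrix} x \\ y \end{bmatrix} + b\begin{bmatrix} -y \\ x \end{bmatrix}$$

Projecting the opposite-direction component onto the unit radial vector p/r and the unit tangential vector (-y, x)/r gives a radial component (a - 1)r and a tangential component br, so

$$a = 1 + \frac{\text{radial component}}{r}, \qquad b = \frac{\text{tangential component}}{r}.$$

Dividing by r is the normalization by position mentioned earlier: the components grow with distance from the center, while a and b do not.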
Parameters ‘a’, ‘b’, ‘c’ and ‘d’ can be converted into pure translation, scaling and rotation parameters by simple processing. Because these motions are calculated from the local motion vectors, they are referred to as local motions (local motion information).
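The "simple processing" is not spelled out. For the matrix in expression (1), writing a = s·cos θ and b = s·sin θ gives the scale factor s and rotation angle θ directly, and the sketch below assumes that reading. It also applies a = 1 + (radial component)/r and b = (tangential component)/r, which follow from expression (1) when r is the distance of the feature point from the image center. Both function names are hypothetical.

```python
import math


def params_from_components(same, radial, tangential, r):
    """Map the components of one point-symmetric pair to (a, b, c, d) of
    expression (1); r is the feature point's distance from the image center."""
    c, d = same
    a = 1.0 + radial / r
    b = tangential / r
    return a, b, c, d


def to_pure_motion(a, b, c, d):
    """Convert (a, b, c, d) into scale factor, rotation angle in radians and
    translation, using a = s*cos(theta) and b = s*sin(theta)."""
    scale = math.hypot(a, b)
    angle = math.atan2(b, a)
    return scale, angle, (c, d)


# A pair showing 10 % enlargement and a translation of (1, 2), no rotation.
print(to_pure_motion(*params_from_components((1.0, 2.0), 0.5, 0.0, 5.0)))
```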
However, the sets of parameters obtained in this way for the plurality of feature-point pairs vary owing to errors, discretization and limited matching accuracy.
In the fourth step, one set of representative parameters ‘A’, ‘B’, ‘C’ and ‘D’ is acquired from the many values obtained for each of the parameters ‘a’, ‘b’, ‘c’ and ‘d’ relating to the local motions of translation, scaling and rotation; each representative parameter corresponds to the centroid of the distribution of the respective parameter. In this step, as shown in FIGS. 4A and 4B, a processing part convolves the frequency distribution (histogram) 51 of each parameter with a Gaussian function 53 and selects, as the representative value, the parameter value with the largest integrated value in the convolved frequency distribution 52. Through this processing, the parameters ‘A’, ‘B’, ‘C’ and ‘D’ of the representative motion relating to translation, scaling and rotation are acquired.
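The fourth step can be sketched in Python as below. The bin width, the Gaussian width (in bins) and the name `representative_value` are hypothetical, since the text gives none of them, and the sketch simply returns the center of the highest smoothed bin.

```python
import numpy as np


def representative_value(samples, bin_width, sigma_bins=2.0):
    """Pick a representative value from noisy per-pair estimates: histogram
    them, smooth the histogram with a Gaussian kernel, take the highest bin."""
    samples = np.asarray(samples, dtype=float)
    lo, hi = samples.min(), samples.max()
    n_bins = int(np.ceil((hi - lo) / bin_width)) + 1
    edges = lo + bin_width * np.arange(n_bins + 1)
    hist, _ = np.histogram(samples, bins=edges)
    # Discrete Gaussian kernel truncated at +/- 3 sigma.
    half = int(np.ceil(3 * sigma_bins))
    k = np.arange(-half, half + 1)
    kernel = np.exp(-0.5 * (k / sigma_bins) ** 2)
    # Full convolution, then crop so that index i still refers to bin i.
    smoothed = np.convolve(hist, kernel)[half:half + n_bins]
    peak = int(np.argmax(smoothed))
    return 0.5 * (edges[peak] + edges[peak + 1])  # center of the winning bin


# 50 estimates clustered near 1.02 plus 10 outliers at 1.5.
samples = [1.02 + 0.001 * ((i % 5) - 2) for i in range(50)] + [1.5] * 10
print(representative_value(samples, bin_width=0.005), np.mean(samples))
```

Because the outliers only add a separate, lower peak, the selected value stays near the bulk of the estimates, whereas the plain mean is pulled toward the outliers.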
With this method, the horizontal and vertical translation, scaling and rotation between the frame images of the moving image can be calculated readily, by simple arithmetic on each pair of local motion vectors.
Moreover, because many motion estimates are acquired, the motion representing the whole image can be calculated, free from the influence of error vectors (also referred to as outliers), by simple histogram-based processing even when such vectors are included. Acquiring the motion information in this way therefore realizes robust processing with a small burden on the apparatus.
Meanwhile, in addition to translation, scaling and rotation, information on other geometric deformations between the images, such as foreshortening, may be required. Among the geometric deformations caused by relative motion between the image-pickup apparatus and the object, foreshortening is the change that dominates when the visual axis is inclined. In other words, foreshortening is what remains of a general motion once translation, scaling, rotation and shear have been removed.
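One way to make "what remains after translation, scaling, rotation and shear are removed" concrete is the chain decomposition of a homography into similarity, affine and projective factors, H = H_S H_A H_P, described in Multiple View Geometry. The sketch below uses that decomposition as an illustrative choice, not necessarily the parameterization used by the invention; it assumes an orientation-preserving H with a nonzero bottom-right entry, and `decompose_homography` is a hypothetical name. The bottom-row vector v returned last carries the foreshortening.

```python
import numpy as np


def decompose_homography(H):
    """Split a 3x3 homography (assumed orientation-preserving, H[2, 2] != 0)
    into H = H_S @ H_A @ H_P: similarity (scale, rotation, translation),
    affine (aspect ratio, shear) and projective (foreshortening) factors."""
    H = np.asarray(H, dtype=float)
    H = H / H[2, 2]                      # remove the arbitrary overall scale
    A, t, v = H[:2, :2], H[:2, 2], H[2, :2]
    M = A - np.outer(t, v)               # strip the projective factor: M = s R K
    Q, U = np.linalg.qr(M)               # orthogonal Q times upper-triangular U
    D = np.diag(np.sign(np.diag(U)))     # flip signs so U has a positive diagonal
    Q, U = Q @ D, D @ U
    scale = float(np.sqrt(np.linalg.det(U)))
    K = U / scale                        # upper-triangular with unit determinant
    angle = float(np.arctan2(Q[1, 0], Q[0, 0]))
    return scale, angle, t, K[0, 0], K[0, 1], v


# Build H from known factors, scale it arbitrarily, and decompose it again.
s, th = 1.1, 0.2
H_S = np.array([[s * np.cos(th), -s * np.sin(th), 5.0],
                [s * np.sin(th), s * np.cos(th), -3.0],
                [0.0, 0.0, 1.0]])
H_A = np.array([[1.05, 0.1, 0.0], [0.0, 1 / 1.05, 0.0], [0.0, 0.0, 1.0]])
H_P = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1e-3, -2e-3, 1.0]])
scale, angle, t, aspect, shear, v = decompose_homography(2.0 * (H_S @ H_A @ H_P))
```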
FIG. 5B shows the geometric deformation (the change in the appearance of the object) and the local motion vectors that result when the visual axis of the image-pickup apparatus is inclined relative to the object toward one horizontal direction.
When the visual axis is inclined toward the other horizontal direction, a line-symmetric geometric deformation and correspondingly line-symmetric local motion vectors are generated. When the visual axis is tilted in the vertical direction, vertical foreshortening is dominantly generated.
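FIG. 5B is not reproduced here. As an idealized stand-in (an assumption, not the figure's data), horizontal foreshortening can be modeled as a homography whose only non-identity entry is the bottom-left term g. The sketch computes the resulting local motion vectors on a small grid centered on the image center and checks that reversing the tilt (g to -g) mirrors the vector field about the vertical axis. All function names are hypothetical.

```python
import numpy as np


def apply_homography(H, pts):
    """Map N x 2 points through a 3x3 homography."""
    pts = np.asarray(pts, dtype=float)
    hom = np.hstack([pts, np.ones((len(pts), 1))]) @ np.asarray(H, dtype=float).T
    return hom[:, :2] / hom[:, 2:3]


def local_vectors(H, pts):
    """Local motion vectors p' - p produced by H at each point."""
    return apply_homography(H, pts) - np.asarray(pts, dtype=float)


def horizontal_foreshortening(g):
    """Idealized horizontal foreshortening: only the bottom-left term is nonzero."""
    return np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [g, 0.0, 1.0]])


# Grid points relative to the image center.
grid = np.array([[x, y] for x in (-100.0, 0.0, 100.0) for y in (-80.0, 80.0)])
v_pos = local_vectors(horizontal_foreshortening(1e-3), grid)
mirrored = grid * np.array([-1.0, 1.0])
v_neg = local_vectors(horizontal_foreshortening(-1e-3), mirrored)
# Tilting the other way mirrors the vector field about the vertical axis.
print(np.allclose(v_neg, v_pos * np.array([-1.0, 1.0])))  # True
```

With g > 0, the vertical vectors point outward on the left (x < 0) and inward on the right, so a rectangle becomes a trapezoid, which is the characteristic appearance of foreshortening.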
As another method for acquiring the motion between images, Multiple View Geometry, R. Hartley, A. Zisserman, Cambridge University Press (2000) discloses a method that takes as input a plurality of motion vectors for points corresponding to each other between frame images, solves a linear system by the least-squares method, and thereby acquires the motion information between the frame images.
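The least-squares approach can be illustrated with the direct linear transformation (DLT) for estimating a homography, which Multiple View Geometry presents. This sketch omits the coordinate normalization the book recommends for numerical conditioning, and `estimate_homography_dlt` is a hypothetical name. The usage lines also show that a single corrupted correspondence shifts the whole estimate.

```python
import numpy as np


def estimate_homography_dlt(src, dst):
    """Linear least-squares (DLT) estimate of the 3x3 homography mapping
    src -> dst from N >= 4 correspondences given as N x 2 arrays."""
    rows = []
    for (x, y), (u, v) in zip(np.asarray(src, float), np.asarray(dst, float)):
        # Each correspondence gives two linear equations in the 9 entries of H.
        rows.append([-x, -y, -1.0, 0.0, 0.0, 0.0, u * x, u * y, u])
        rows.append([0.0, 0.0, 0.0, -x, -y, -1.0, v * x, v * y, v])
    # Minimize ||A h|| subject to ||h|| = 1: the right singular vector
    # belonging to the smallest singular value.
    _, _, vt = np.linalg.svd(np.array(rows))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]


# Exact correspondences from a known homography, then one corrupted match.
H_true = np.array([[1.05, 0.02, 3.0], [-0.01, 0.98, -2.0], [1e-4, -2e-4, 1.0]])
src = np.array([[x, y] for x in (-20.0, -5.0, 10.0, 20.0) for y in (-15.0, 0.0, 15.0)])
hom = np.hstack([src, np.ones((len(src), 1))]) @ H_true.T
dst = hom[:, :2] / hom[:, 2:3]
H_clean = estimate_homography_dlt(src, dst)
dst_bad = dst.copy()
dst_bad[0] += 30.0                       # a single error vector
H_bad = estimate_homography_dlt(src, dst_bad)
```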
With this method, the horizontal and vertical translation, scaling, rotation, shear and even foreshortening between the frame images can be calculated. However, when error vectors are included, motion information that represents the whole image free from their influence cannot be calculated without introducing a robust estimation method, typified by RANSAC or LMedS, which imposes a large computational burden on the apparatus.
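To show where the burden of a robust method comes from, here is a generic textbook RANSAC loop around the linear homography fit: it repeatedly fits minimal 4-point samples, scores every correspondence against each candidate, and finally refits on the largest inlier set. This is a sketch of the standard technique, not code from either cited source; the iteration count, the pixel threshold and all function names are hypothetical.

```python
import numpy as np


def _dlt(src, dst):
    """Homogeneous least-squares homography from N >= 4 correspondences."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1.0, 0.0, 0.0, 0.0, u * x, u * y, u])
        rows.append([0.0, 0.0, 0.0, -x, -y, -1.0, v * x, v * y, v])
    H = np.linalg.svd(np.array(rows))[2][-1].reshape(3, 3)
    return H / H[2, 2]


def _project(H, pts):
    """Map N x 2 points through H."""
    hom = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return hom[:, :2] / hom[:, 2:3]


def ransac_homography(src, dst, iters=500, thresh=1.0, seed=0):
    """Fit random 4-point samples, keep the candidate with the most inliers
    (reprojection error < thresh pixels), then refit on those inliers."""
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    # Degenerate or outlier-contaminated samples can yield wild models.
    with np.errstate(divide="ignore", invalid="ignore", over="ignore"):
        for _ in range(iters):
            idx = rng.choice(len(src), size=4, replace=False)
            H = _dlt(src[idx], dst[idx])
            err = np.linalg.norm(_project(H, src) - dst, axis=1)
            inliers = err < thresh
            if inliers.sum() > best.sum():
                best = inliers
    return _dlt(src[best], dst[best]), best


# 30 correspondences, the first 6 of which are error vectors.
rng = np.random.default_rng(1)
H_true = np.array([[1.05, 0.02, 3.0], [-0.01, 0.98, -2.0], [1e-4, -2e-4, 1.0]])
src = rng.uniform(-20.0, 20.0, size=(30, 2))
dst = _project(H_true, src)
dst[:6] += 40.0
H_est, inliers = ransac_homography(src, dst)
```

Every iteration costs a singular value decomposition plus a pass over all correspondences, which is the kind of load the histogram-based approach avoids.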
The method disclosed in Japanese Patent Laid-Open No. 2005-269419 can model only translation, scaling and rotation among the possible motions between the image-pickup apparatus and the object. It therefore cannot handle cases that include motion outside this model, such as the foreshortening that occurs when the motion is large.
On the other hand, the method disclosed in Multiple View Geometry, R. Hartley, A. Zisserman, Cambridge University Press (2000) is suited to laboratory-like conditions, such as when a planar checkerboard is used as the object, under which errors in the local motion vectors (error vectors) hardly occur. Under such conditions, the method can model every geometric deformation between the frame images that any motion between the image-pickup apparatus and the object can cause. Under real-world conditions, however, where the motion vectors detected between the frame images include error vectors, a robust estimation method must be introduced to eliminate their influence.