The present invention relates to an image stabilizing apparatus and an image stabilizing method for performing image-stabilization processing in moving images. The present invention also relates to an image-pickup apparatus on which the image stabilizing apparatus is mounted.
Image stabilizing techniques involving image processing for reducing shakes of video (moving image) due to camera shakes are widely used in image-pickup apparatuses for moving images such as video cameras. Especially when an image-pickup optical system of a long focal length is used to pick up images, a slight camera shake leads to a violent shake of video and thus an image stabilizing function is essential for the camera. Even for an image-pickup optical system of a short focal length, effective operation of the image stabilizing function is desirable when a user attempts to pick up an object image while the user is moving.
When picking up images while the user is moving, an advanced image stabilizing function is necessary in which an unintended camera shake is discriminated from an intended camera work and only the image shake due to the camera shake is suppressed. Image stabilizing techniques already proposed for supporting such movement include the use of inertial motion filtering (see Z. Zhu, et al. “Camera stabilization based on 2.5D motion estimation and inertial filtering,” ICIV, 1998) and the use of low-order model fitting (see A. Litvin, J. Konrad, W. C. Karl, “Probabilistic video stabilization using Kalman filtering and mosaicing,” Proceedings of SPIE, January 2003, pp. 20-24).
In these image stabilizing techniques, an approximation model such as a translation model or a Helmert model (similarity model) is used in motion estimation from images (estimation of a global motion or a camera work). A motion estimate value is given as a one-dimensional time-series data set for each camera work component, such as horizontal translation, vertical translation, in-plane rotation, scaling, and shear. Thus, a filtering mechanism used in signal processing, such as an inertial filter or a Kalman filter, which receives a one-dimensional time-series data set as input, can be used without any change.
Since motions in the image are in one-to-one correspondence with camera works, intended image stabilization is realized simply by causing the result of the filtering of the abovementioned motion amount determined from the image or the difference between the original motion amount and the filtering result to act on the image.
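The filtering described above can be illustrated with a minimal sketch. It assumes a single camera work component given as a per-frame motion amount (for example, horizontal translation in pixels), accumulates it into a path, and uses a plain moving average in place of the inertial or Kalman filter named in the cited techniques. The function names, the accumulation step, and the window length are assumptions made for illustration only.

```python
import numpy as np

def smooth_path(motion, window=15):
    """Low-pass filter a 1-D camera path with a moving average.

    motion: per-frame motion amounts for one camera work component.
    window: odd filter length in frames (illustrative value).
    Returns the accumulated path and its smoothed version.
    """
    path = np.cumsum(np.asarray(motion, dtype=float))  # accumulated motion
    pad = window // 2
    padded = np.pad(path, pad, mode="edge")            # repeat end values
    kernel = np.ones(window) / window
    smooth = np.convolve(padded, kernel, mode="valid") # same length as path
    return path, smooth

def shake_correction(motion, window=15):
    """Difference between the filtered and the original path.

    Applying this offset to each frame moves it onto the smoothed path,
    so only the high-frequency (shake) part of the motion is removed.
    """
    path, smooth = smooth_path(motion, window)
    return smooth - path

# A steady pan (constant motion per frame) needs no correction away
# from the sequence ends, where the edge padding distorts the average.
steady_pan = np.full(50, 2.0)
pan_correction = shake_correction(steady_pan)
```

Because the moving average of a linear path equals the path itself, an intended constant-speed pan is left untouched in this sketch, while frame-to-frame jitter is pulled toward the smoothed path.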
An image-pickup apparatus of a short focal length may be mounted on a walking robot, a helicopter, or a wearable camera which can violently shake. As the focal length is further reduced, motions appearing in a picked-up image are changed.
Specifically, the degree of the camera work allowable for a motion in the picked-up image is inversely proportional to the focal length, so that a motion referred to as “foreshortening” occurs which is not seen in video at an intermediate focal length. This makes it impossible to achieve image stabilization by estimating an image variation amount with the conventional approximation model and performing the image-stabilization processing on that basis. To address this, a proposal has been made in which the estimation of an image variation amount is performed with a projective model instead of the abovementioned approximation model and geometric correction in the image-stabilization processing is performed with projective transformation.
The abovementioned uses are based on the premise that the motion estimation is performed from the image and the image-stabilization processing is performed from a combination of image geometric transformation. In this case, it is necessary to accurately detect an image variation in response to a large and complicated camera work and to correct a large motion based on the movement of the user. However, it is difficult for only a motion sensor often used in the conventional image-stabilization processing to sense a multi-axis variation with high accuracy and at low cost. In addition, optical image-stabilization processing cannot correct violent shakes.
When image-stabilization processing of video which is picked up by a moving user is performed with the projective model, the following problems arise.
One of the problems is that the filtering method based on the conventional signal processing technique does not function appropriately as it is in the discrimination between an intended camera work and an unintended shake (motion). This is because a projective homography representing the image variation amount is a multi-dimensional amount represented by a 3×3 matrix.
One component of the projective homography is affected by a plurality of camera works. Thus, especially when a large forward camera work occurs, appropriate image stabilization cannot be achieved even when the camera work corresponds to a linear motion at a constant speed. This is because variation of each term component in terms of the homography is the linear sum of a non-linear image variation by the forward camera work and a linear image variation by a camera work such as translation and rotation that are perpendicular to an optical axis. As a result, even when filtering premised on a linear change is applied to each term of the projective homography, appropriate image-stabilization effects cannot be provided.
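The coupling described above can be checked numerically with a small sketch. It assumes the plane-induced homography form H = R + t nᵀ / d, normalized image coordinates (no camera intrinsics), no rotation, a fronto-parallel reference plane, and homographies taken relative to the first frame. All names and numeric values are illustrative assumptions.

```python
import numpy as np

def plane_homography(R, t, n, d):
    """Homography induced by a plane, scaled so that H[2, 2] == 1."""
    H = R + np.outer(t, n) / d
    return H / H[2, 2]

def translation_term_series(vx, vz, frames=10, d=10.0):
    """H[0, 2] over time for constant sideways speed vx and forward speed vz."""
    n = np.array([0.0, 0.0, 1.0])  # fronto-parallel reference plane (assumed)
    R = np.eye(3)                  # no rotation in this sketch
    values = []
    for k in range(frames):
        t = np.array([vx * k, 0.0, vz * k])  # constant-speed linear motion
        values.append(plane_homography(R, t, n, d)[0, 2])
    return np.array(values)

# Sideways motion only: the term grows linearly with the frame index.
sideways_only = translation_term_series(vx=0.1, vz=0.0)
# Same sideways motion plus forward motion: the same term is now non-linear.
with_forward = translation_term_series(vx=0.1, vz=0.5)

# Second differences vanish for a linear series and do not otherwise.
curvature_sideways = np.diff(sideways_only, n=2)
curvature_forward = np.diff(with_forward, n=2)
```

In this setup the horizontal translation term equals (vx·k/d) / (1 + vz·k/d), so a single homography term mixes the sideways and forward camera works, and a filter that assumes linear change in that term sees curvature even though the camera moves at constant speed.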
The second problem results from the extension of the estimation of the image variation amount and the image-stabilization processing to the projective model. The extension to the projective model allows detection of the image variation due to a large rotational camera work. However, if the projective homography determined from motion vectors between frame images constituting video is inversely transformed, directly or through the motion determination, and is then used as a shake correction amount, appropriate image stabilization cannot be achieved, even though this method is widely used in image stabilization with the approximation model.
The reason is that, in the projective model, the influence of a translation camera work upon the image variation amount depends on the orientation of a reference plane, which is determined by the spatial distribution of the motion vector extraction points used in calculating a new projective homography.
The relationship between a projective homography representing an image variation amount between frame images, a camera work, and a reference plane is expressed as follows:
$$H = R + \frac{1}{d}\,\vec{t}\,\vec{n}^{T}$$
where $H$ represents the projective homography, $R$ and $\vec{t}$ represent rotation and translation of the camera, respectively, and $d$ and $\vec{n}$ represent the distance between the reference plane determined by the spatial positions of corresponding points and one camera, and the orientation of the normal to the reference plane, respectively.
Since the reference plane provided by the spatial positions of corresponding points for which a motion vector is extracted is often different from the position of a plane in space for which an observer wishes image stabilization, a problem arises. As seen from the abovementioned expression, the problem occurs only when the translation camera work is performed. For example, the problem involves distortion of the image in which an image plane is inclined in an advancing scene or an image plane is collapsed in a panning scene.
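The dependence on the reference plane can be made concrete with a short sketch of the expression above. It assumes two hypothetical reference planes at the same distance from the camera, one fronto-parallel and one ground-like, and compares the resulting homographies for a pure rotation and for a rotation combined with a small vertical translation such as a walking shake. The function names and numeric values are illustrative assumptions.

```python
import numpy as np

def plane_homography(R, t, n, d):
    """H = R + (1/d) * t * n^T for rotation R, translation t, plane (n, d)."""
    return R + np.outer(t, n) / d

def rot_y(angle):
    """Rotation about the vertical (y) axis by `angle` radians (a pan)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])

def plane_disagreement(R, t, d=2.0):
    """Largest element-wise difference between the homographies obtained
    for two different reference planes under the same camera work."""
    n_front = np.array([0.0, 0.0, 1.0])   # plane facing the camera
    n_ground = np.array([0.0, 1.0, 0.0])  # ground-like plane
    H_front = plane_homography(R, t, n_front, d)
    H_ground = plane_homography(R, t, n_ground, d)
    return np.max(np.abs(H_front - H_ground))

R = rot_y(0.1)
# Pure rotation: the plane term vanishes, so both planes give the same H.
rotation_only = plane_disagreement(R, np.zeros(3))
# Rotation plus vertical translation: H now depends on the chosen plane.
with_translation = plane_disagreement(R, np.array([0.0, 0.1, 0.0]))
```

With zero translation the two homographies coincide, which matches the observation that the problem occurs only when a translation camera work is present. With translation, the homography estimated from the extraction points reflects their plane rather than the plane the observer cares about.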
It is possible to adopt a compromise in which image stabilization is performed by using only triaxial rotation information of a camera work determined from a motion vector between frame images as proposed in Michal Irani, et al. “Recovery of Ego-Motion Using Image Stabilization,” CVPR ('94), Seattle, June 1994. However, a camera work of a translation component from an up-and-down motion caused by a walking shake is not ignorable as a motion for which image stabilization should be performed.