Image sequences, captured with devices such as digital still or video cameras, often contain undesirable motion, referred to as jitter, between images. As an example, video captured with a hand-held video camera often exhibits some shake or jitter from frame to frame, despite the user's best efforts to hold the camera steady.
Several different approaches have been proposed for jitter removal from digital image sequences. Optical stabilizers act on light images prior to capture. For example, U.S. Pat. No. 5,581,404 describes an oscillating gyroscope and rotating prism lens used as part of a mechanism to detect and correct for angular velocity in the camera. Optical stabilization is effective, but at the cost of additional camera weight, extra components, and required power. These requirements go against the general trend of miniaturization in cameras.
Stabilization can also be accomplished by determining camera motion either electronically or digitally, and compensating for this motion by selecting an appropriately offset image region from an oversized electronic imager or image sensor, such as a CCD or CMOS imager. An electronic imager is “oversized” when the imager captures a greater field of view than is presented in output images. Reduced size images can be provided for archival storage and for display in an electronic viewfinder or other camera display during capture.
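As an illustration only, the selection of an offset output window from an oversized imager might be sketched as follows. The function name `crop_stabilized`, the list-of-rows image representation, and the sign convention (the window follows the estimated jitter displacement of the scene content) are assumptions made for this sketch and are not taken from any of the cited references.

```python
def crop_stabilized(frame, out_w, out_h, jitter_dx, jitter_dy):
    """Return an out_h x out_w window from an oversized frame.

    frame is a list of rows of pixel values. The window starts centred
    and is moved by the estimated jitter (in whole pixels) so that the
    scene content stays in place in the output. The offset is clamped
    to the margin that the oversized imager provides.
    """
    in_h, in_w = len(frame), len(frame[0])
    margin_x = (in_w - out_w) // 2
    margin_y = (in_h - out_h) // 2
    # Clamp so the window never leaves the captured field of view.
    x0 = margin_x + max(-margin_x, min(margin_x, jitter_dx))
    y0 = margin_y + max(-margin_y, min(margin_y, jitter_dy))
    return [row[x0:x0 + out_w] for row in frame[y0:y0 + out_h]]
```

The clamp reflects the practical limit of this approach: jitter larger than the margin between the captured and presented fields of view cannot be fully compensated.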
Electronic stabilization systems use motion sensing transducers to detect actual camera motion, which is then used to locate an output window relative to images produced by an oversized imager. Electronic stabilization is typically performed in-camera and has the shortcomings of the weight and cost of the transducers.
With digital stabilization, the actual camera motion must be estimated from the captured image sequence. This approach has low cost because no external motion sensors are required. Digital stabilization has had performance challenges relative to optical or electronic stabilization, because digital stabilization presents a large computational burden and image content can confound digital motion estimation required for stabilization.
Digital stabilization can be performed in-camera or off-line. Each approach has its own advantages and disadvantages. In-camera digital stabilization is constrained in terms of available processing capability. Off-line digital stabilization has the advantage that computational resources are less likely to be constrained.
In-camera digital stabilization is readily automated, since stabilization steps are performed in the camera and can provide output to the photographer during image sequence capture. Off-line stabilization is less convenient. Typically, the user is required to perform an additional procedure with appropriate software to achieve stabilized video. A further problem is that image sequences received for stabilization are likely to have been through one or more compression-decompression cycles. This presents an increased risk of artifacts in the stabilization process.
With off-line digital stabilization, output images are transferred and, during stabilization, are reduced in field of view. This raises an issue of user expectations, since the user is likely to have viewed the output images before the reduction in field of view. An additional issue, particularly in terms of user expectations, is that resolution is also reduced, unless an additional interpolation procedure is provided following stabilization.
In-camera stabilization can provide stabilized images to a viewfinder during capture of an image sequence, but, in doing so, is limited to algorithms that do not use future frames. This leads to poor performance in identifying intentional motion such as camera pans. With off-line digital stabilization, the entire image sequence is available at the time of stabilization. This allows use of algorithms that exploit data from both future frames and previous frames when stabilizing a given frame.
With digital stabilization, the actual camera motion must be estimated from the captured video stream. This can be difficult, as it is necessary to distinguish object movement from camera movement. The first step is to estimate the motion between frames. This is followed by trajectory estimation, which computes an estimate of the desired camera motion (usually by assuming that hand shake is higher frequency than the desired motions). Jitter is estimated based on the overall motion and desired camera motion estimates, and is then compensated for through an image shift or warp function.
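The trajectory estimation and jitter estimation steps described above can be sketched, for a single axis, roughly as follows. This is a minimal sketch under stated assumptions: the first-order recursive low-pass filter, the name `estimate_jitter`, and the smoothing parameter `alpha` are hypothetical choices for illustration and do not represent the filter of any particular reference.

```python
def estimate_jitter(frame_motions, alpha=0.9):
    """Split accumulated camera motion into intended motion and jitter.

    frame_motions: per-frame global displacement along one axis, in pixels.
    alpha: smoothing factor of a causal first-order low-pass filter
           (assumed here; closer to 1.0 means heavier smoothing).
    Returns the per-frame jitter estimate to be compensated.
    """
    position = 0.0  # accumulated (integrated) camera position
    smooth = 0.0    # low-frequency estimate of the desired trajectory
    jitter = []
    for motion in frame_motions:
        position += motion
        # Hand shake is assumed to be higher frequency than the desired
        # motion, so the low-pass output approximates the desired path.
        smooth = alpha * smooth + (1.0 - alpha) * position
        jitter.append(position - smooth)
    return jitter
```

Because the filter uses only current and past frames, it is suitable for in-camera use, but a simple filter of this kind lags behind a steady pan and attributes part of the pan to jitter.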
Many or most digital stabilization techniques use some form of block-matching for motion estimation. Block-matching divides an image into a collection of blocks, and for each block finds the best matching region in the previous image. Once a motion estimate has been obtained for each block, a set of rules must be applied to convert these local estimates into a single global estimate of the motion. Because block-based motion estimation obtains local motion estimates from different regions throughout the image, it can be very robust to independently moving objects within a scene. One technique used to eliminate incorrect motion estimates is to form a histogram of all of the local motion estimates, and eliminate all values that occur infrequently. Other local estimates may be eliminated if they are considered unreliable due to causes such as the block containing repeating patterns or very few edges. Once the local estimates have been pruned such that only reliable estimates remain, typically the median or mean is chosen as the global motion estimate.
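A simplified sketch of block-matching followed by histogram pruning and a median is given below. It assumes an exhaustive search with a sum-of-absolute-differences cost, whole-pixel displacements, and images stored as lists of rows; the names `sad`, `block_motion`, and `global_motion` and the parameters `size`, `radius`, and `min_count` are hypothetical. Each motion vector gives the offset in the previous image at which the content of a current block is found.

```python
from collections import Counter
from statistics import median

def sad(cur, prev, bx, by, dx, dy, size):
    """Sum of absolute differences between a block and a displaced region."""
    total = 0
    for y in range(by, by + size):
        for x in range(bx, bx + size):
            total += abs(cur[y][x] - prev[y + dy][x + dx])
    return total

def block_motion(cur, prev, bx, by, size, radius):
    """Exhaustive search for the best match of one block in the previous image."""
    h, w = len(prev), len(prev[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Skip candidates that fall outside the previous image.
            if by + dy < 0 or by + dy + size > h:
                continue
            if bx + dx < 0 or bx + dx + size > w:
                continue
            cost = sad(cur, prev, bx, by, dx, dy, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best[1], best[2]

def global_motion(cur, prev, size=4, radius=2, min_count=2):
    """Combine local block estimates into one global (dx, dy) estimate."""
    h, w = len(cur), len(cur[0])
    vectors = [block_motion(cur, prev, bx, by, size, radius)
               for by in range(0, h - size + 1, size)
               for bx in range(0, w - size + 1, size)]
    # Histogram of local estimates; drop vectors that occur infrequently.
    counts = Counter(vectors)
    kept = [v for v in vectors if counts[v] >= min_count] or vectors
    return median(v[0] for v in kept), median(v[1] for v in kept)
```

The nested loops make the cost apparent: every block is compared against every candidate displacement, pixel by pixel, which is the source of the computational burden attributed to block-based techniques.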
Uomori et al., “Automatic Image Stabilizing System by Full-Digital Signal Processing”, IEEE Transactions on Consumer Electronics, 36(3), August 1990, pages 510-519, discloses digital stabilization and use of specific temporal filters in jitter removal. Other disclosures of digital filtering techniques include U.S. Pat. Nos. 5,172,226; 5,748,231; 5,648,815; 5,510,834; and 5,289,274. U.S. Pat. No. 5,748,231 discloses techniques relating to failure conditions of motion vectors. These techniques all employ a digital motion estimation unit to estimate the global motion between images in the sequence, and a jitter calculation algorithm to determine what component of the estimated motion is jitter rather than an intended pan.
The above block-based digital stabilization techniques and related techniques are satisfactory in many respects. The approach has low cost because the algorithm is entirely software based. Block-based techniques have the advantage of capturing a relatively large amount of local information and being relatively robust to factors such as independently moving objects in the scene. On the other hand, block-based techniques are computationally complex. This presents performance challenges relative to optical or electronic stabilization and limits usefulness in applications with limited computational resources, such as currently available, moderately priced digital cameras.
Some of the above techniques consider rotations, warping and general affine transformations in estimating motion. These approaches result in requirements of extensive computational resources, both in the computation of the motion, and in the subsequent interpolation required to offset the image based on the computed jitter motion.
U.S. Pat. Nos. 6,130,912 and 6,128,047 disclose the use of integral projection for motion estimation. A block-based motion estimate is disclosed in “Efficient Block Motion Estimation Using Integral Projections”, K. Sauer and B. Schwartz, IEEE Trans. On Circuits and Systems for Video Technology, 6(5), 1996, pages 513-518. The integral projections are within a block-matching framework and are subject to the limitations of block-based techniques.
The use of full frame integral projections in computing a global expansion of a block-based motion estimate is disclosed in “Real-time Digital Video Stabilization for Multi-media Applications”, K. Ratakonda, IEEE Int'l Symposium on Circuits and Systems, 1998, vol. 4, pages 69-72. Full frame integral projections operate by projecting a two-dimensional image onto two one-dimensional vectors, one horizontal and one vertical. This can be achieved by summing the elements in each column to form the horizontal projection, and summing the elements in each row to form the vertical projection. Full frame integral projections reduce the problem of two-dimensional global motion estimation into two independent one-dimensional motion estimation problems, significantly reducing computational complexity. Ratakonda's paper also discloses computational improvements in the use of full-frame integral projections through sub-sampling and interpolation-based one-half pixel accuracy. These techniques have the shortcoming that a non-causal filter is used.
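The projection step described above, together with a one-dimensional search on each projection, might be sketched as follows. This is an illustrative sketch only and is not the method of the cited paper: the mean-absolute-difference cost, the whole-pixel search, and the names `projections`, `best_shift`, and `global_motion_projection` are assumptions, and the sub-sampling and half-pixel interpolation mentioned above are omitted.

```python
def projections(img):
    """Project a 2-D image (list of rows) onto two 1-D vectors."""
    horizontal = [sum(col) for col in zip(*img)]  # one sum per column
    vertical = [sum(row) for row in img]          # one sum per row
    return horizontal, vertical

def best_shift(cur, prev, radius):
    """Find the shift s that best aligns cur[i] with prev[i + s].

    The cost is the mean absolute difference over the overlapping
    samples, so shifts with less overlap are not unfairly favoured.
    """
    n = len(cur)
    best_s, best_cost = 0, float("inf")
    for s in range(-radius, radius + 1):
        lo, hi = max(0, -s), min(n, n - s)
        cost = sum(abs(cur[i] - prev[i + s]) for i in range(lo, hi)) / (hi - lo)
        if cost < best_cost:
            best_s, best_cost = s, cost
    return best_s

def global_motion_projection(cur_img, prev_img, radius=4):
    """Estimate global (dx, dy) from two independent 1-D searches."""
    cur_h, cur_v = projections(cur_img)
    prev_h, prev_v = projections(prev_img)
    return best_shift(cur_h, prev_h, radius), best_shift(cur_v, prev_v, radius)
```

For an image of width W and height H and a search radius R, each one-dimensional search costs on the order of R times W or R times H operations, rather than a two-dimensional search over roughly R squared candidate displacements each evaluated across the full image.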
Though satisfactory in many respects, the known digital and electronic image sequence stabilization algorithms are associated with numerous drawbacks, including inadequate performance and excessive computational complexity.
It would thus be desirable to provide cameras and methods that provide improved digital stabilization of image sequences with limited computational resources.