The present invention generally relates to digital image processing and, more particularly, to a system and method for removing unwanted motion in a digital image sequence.
Digital image sequences, such as those obtained from digital video cameras or from the scanning of motion picture film, often contain unwanted motion between successive frames in the sequence. There are many potential causes of this unwanted motion, including camera shake at the time of image capture, or frame-to-frame positioning errors (also known as jitter or hop and weave) when a film sequence is scanned. The process of removing this unwanted motion is termed image stabilization.
Some systems use optical, mechanical, or other physical means to correct for the unwanted motion at the time of capture or scanning. However, these systems are often complex and expensive, and they cannot correct for unwanted motion in a digital image sequence that was produced by an unknown and uncharacterized device. To provide stabilization for a generic digital image sequence, several digital image processing methods have been developed and described in the prior art.
A number of digital image processing methods use a specific camera motion model to estimate one or more parameters, such as zoom, translation, and rotation, between successive frames in the sequence. These parameters are computed from a motion vector field that describes the correspondence between image points in two successive frames. The resulting parameters can then be filtered over a number of frames to provide smooth motion. An example of such a system can be found in U.S. Pat. No. 5,629,988 to Burt et al. A fundamental assumption in these systems is that a global transformation dominates the motion between adjacent frames. In the presence of significant local motion, such as multiple objects moving with independent motion trajectories, these methods may fail due to the computation of erroneous global motion parameters.
Other image processing methods for digital image stabilization are designed primarily for digital video camera applications where system constraints include minimal buffering requirements and near real-time processing. As a result, these methods are limited to applying unwanted motion correction between only two frames at a given time, which prohibits filtering of the motion parameters over multiple frames. An example of such a system is described in U.S. Pat. No. 5,748,231 to Park et al. In this method, a weighted average motion vector is computed from the motion vector field corresponding to two successive frames. The weightings are determined from various statistical measures that indicate the reliability of a given motion vector. The weighted average motion vector is then applied to remove the motion between two successive frames. This type of processing results in all motion (including desired camera motion such as pans) being removed from the sequence, not just unwanted motion. Again, these methods assume the image sequences contain a dominant global motion, and they may fail in the presence of significant local motion.
Still other digital image processing methods for removing unwanted motion make use of a technique known as phase correlation for precisely aligning successive frames. An example of such a method has been reported by Eroglu et al. (“A fast algorithm for subpixel accuracy image stabilization for Digital Film and Video,” in Proc. SPIE Visual Communications and Image Processing, Vol. 3309, pp. 786-797, 1998). However, these methods require that the sequence has no local motion, or alternatively, a user must select a region in consecutive frames that has no local motion. The dependence upon areas with no local motion and the necessity for user intervention are major drawbacks of these methods.
The invention solves the problem of removing unwanted motion from a digital image sequence without removing desired motion (e.g., pan, zoom, etc.). It does so without excessive computational requirements, and it is a fully automated process. Furthermore, it is robust in the presence of significant local motion in the image sequence.
The present invention overcomes the limitations of conventional systems by using a simple model that is based on the observation that the cumulative motion vectors corresponding to the desired motion will generally vary smoothly from frame to frame. Further, the invention uses a motion vector histogram with a simple threshold, and does not rely on a specific camera transformation model (or the absence of local motion). It is, therefore, an object of the present invention to provide a structure and method that uses a motion vector histogram in determining the unwanted motion components in a digital image sequence.
One embodiment of the invention is a method for stabilizing a digital image sequence consisting of a number of successive frames. The method includes: calculating a motion vector field between adjacent frames; forming a motion vector histogram from horizontal and vertical components of the motion vector field; applying a threshold to the motion vector histogram to produce a thresholded motion vector histogram; generating average horizontal and vertical motion components from the thresholded motion vector histogram; filtering the average horizontal and vertical motion components over a number of frames to identify unwanted horizontal and vertical motion components for each of the frames; and stabilizing the image sequence by shifting each frame according to the corresponding unwanted horizontal and vertical motion components.
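By way of illustration only, the histogram, thresholding, and averaging steps of this embodiment might be sketched in Python as follows. The function name `thresholded_mean_motion`, the use of integer-valued motion vectors, and the reading of the threshold as a minimum bin count are assumptions made for this sketch; they are not details given in the description above.

```python
from collections import Counter
from typing import Iterable, Tuple


def thresholded_mean_motion(
    vectors: Iterable[Tuple[int, int]], min_count: int
) -> Tuple[float, float]:
    """Average (dx, dy) over histogram bins holding at least min_count vectors."""
    # Two-dimensional motion vector histogram: one bin per (dx, dy) pair.
    histogram = Counter(vectors)
    # Thresholding: drop sparsely populated bins (assumed count threshold).
    kept = {mv: n for mv, n in histogram.items() if n >= min_count}
    total = sum(kept.values())
    if total == 0:
        # Assumption for this sketch: report zero motion if no bin survives.
        return 0.0, 0.0
    mean_dx = sum(dx * n for (dx, _), n in kept.items()) / total
    mean_dy = sum(dy * n for (_, dy), n in kept.items()) / total
    return mean_dx, mean_dy


# Hypothetical field: 6 blocks move by (2, -1), 4 by (3, -1), 1 outlier by (-9, 7).
field = [(2, -1)] * 6 + [(3, -1)] * 4 + [(-9, 7)]
print(thresholded_mean_motion(field, min_count=2))  # (2.4, -1.0)
```

In this hypothetical field the single outlier vector falls below the count threshold, so it does not influence the average motion components.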
The thresholding of the motion vector histogram removes undesirable motion vectors that are likely to be unreliable or correspond to objects that have a small spatial extent. This threshold can be changed for each frame by an adaptive means or can be fixed at a pre-specified level. The unwanted horizontal and vertical components correspond to high temporal frequencies, and they can be computed by applying a highpass filter to the average horizontal and vertical components. The degree of highpass filtering is user-adjustable. The stabilizing includes a displacement of each frame by the corresponding unwanted horizontal and vertical components.
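One simple way to picture the highpass filtering step is to subtract a smoothed version of the motion trajectory from the trajectory itself. The Python sketch below does this with a centered moving average. The choice of a moving average, the name `highpass`, the `radius` parameter standing in for the user-adjustable degree of filtering, the accumulation of per-frame motion into a cumulative trajectory, and the sign convention of the correction are all assumptions of this sketch rather than details specified above.

```python
from itertools import accumulate
from typing import List, Sequence


def highpass(series: Sequence[float], radius: int) -> List[float]:
    """Subtract a centered moving average (window clipped at the sequence ends)."""
    out = []
    for i, value in enumerate(series):
        lo = max(0, i - radius)
        hi = min(len(series), i + radius + 1)
        smooth = sum(series[lo:hi]) / (hi - lo)  # lowpass estimate of desired motion
        out.append(value - smooth)               # residual = high temporal frequencies
    return out


# Hypothetical average horizontal motion per frame: a steady 2-pixel pan with a jolt.
per_frame_dx = [2.0, 2.0, 5.0, -1.0, 2.0, 2.0]
cumulative_dx = list(accumulate(per_frame_dx))   # trajectory of the frame position
unwanted_dx = highpass(cumulative_dx, radius=1)  # larger radius = stronger smoothing
corrections = [-u for u in unwanted_dx]          # assumed sign: shift against the jitter
print(corrections)
```

A smoothly varying trajectory, such as a constant-speed pan, leaves no residual away from the ends of the sequence, so only the jolt produces a correction there. Clipping the window at the sequence ends is a simplification and introduces edge effects in the first and last `radius` frames.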
Another embodiment of the invention is a computerized digital imaging system for stabilizing a digital image sequence formed from a number of successive frames, including: a motion estimation unit for calculating a motion vector field between adjacent frames; a histogram generator unit for forming a motion vector histogram from horizontal and vertical components of the motion vector field; a thresholding unit for applying a threshold to the motion vector histogram to produce a thresholded motion vector histogram; an averaging unit for generating average horizontal and vertical components from the thresholded motion vector histogram; a filtering unit for filtering the average horizontal and vertical components over a number of frames to identify unwanted horizontal and vertical components for each of the frames; and a stabilizing unit for stabilizing the image sequence by translating each frame according to the corresponding unwanted horizontal and vertical components. The thresholding unit removes motion vectors that are likely to be unreliable or correspond to objects that are temporally transient or have a small spatial extent. The system further includes a threshold determination unit for adaptively computing a threshold for each frame, or alternatively, the threshold may be fixed at a pre-specified level. The filtering unit includes a highpass filter for identifying the unwanted horizontal and vertical components. The system further includes a user interface for adjusting the degree of highpass filtering. The stabilizing unit includes a displacement unit for shifting each frame by the corresponding unwanted horizontal and vertical components.
One advantage produced by the invention is that unwanted global motion is removed from a digital image sequence in a fully automated operation. With the invention, no user intervention is required, although the user has access to some system parameters to control the usage and degree of stabilization. Also, the invention performs robustly across different scene content. Thus, the sequence may contain substantial local motion without significantly affecting the removal of the unwanted global motion. Moreover, the desired camera motion (pan, zoom, etc.) is not removed during the inventive stabilization process.
Further, with the invention, there is a minimal computational load. The estimation of the motion vector field is the most time-consuming component. However, this can be done efficiently with block-based motion estimation methods. Additionally, the invention allows stabilization to sub-pixel levels (e.g., below human perceptual thresholds). Finally, the use of the motion vector histogram offers the potential for further improvements, including the tracking of multiple motion vector clusters for improved estimates of unwanted motion.
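For readers unfamiliar with block-based motion estimation, the following Python sketch shows one common form: an exhaustive search that minimizes the sum of absolute differences (SAD) for a single block. The SAD criterion, the full-search strategy, integer-pixel accuracy, and the names `block_motion`, `size`, and `search` are assumptions chosen for illustration; the text above does not specify which block-based method is used.

```python
from typing import List, Tuple

Frame = List[List[int]]  # grayscale frame as rows of pixel values


def block_motion(prev: Frame, curr: Frame, top: int, left: int,
                 size: int, search: int) -> Tuple[int, int]:
    """Return the (dx, dy) that best maps a block of `prev` into `curr`."""
    height, width = len(curr), len(curr[0])
    best, best_cost = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y0, x0 = top + dy, left + dx
            if y0 < 0 or x0 < 0 or y0 + size > height or x0 + size > width:
                continue  # candidate block falls outside the frame
            # Sum of absolute differences between the two blocks.
            cost = sum(
                abs(prev[top + r][left + c] - curr[y0 + r][x0 + c])
                for r in range(size) for c in range(size)
            )
            if best_cost is None or cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best


# Hypothetical 8x8 frames: a 2x2 patch moves 1 pixel right and 2 pixels down.
prev = [[0] * 8 for _ in range(8)]
curr = [[0] * 8 for _ in range(8)]
for (y, x), v in {(2, 2): 9, (2, 3): 5, (3, 2): 7, (3, 3): 1}.items():
    prev[y][x] = v
    curr[y + 2][x + 1] = v
print(block_motion(prev, curr, top=2, left=2, size=2, search=2))  # (1, 2)
```

Repeating this search for every block in a frame yields a motion vector field. This sketch resolves motion only to whole pixels; sub-pixel stabilization would require a finer estimate than is shown here.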