Global motion estimation is used in video coding, video analysis, and vision-based applications. The global motion in an image sequence is usually considered as the relative motion of the camera with respect to the image background.
There are a number of global motion modeling methods, which consider some or all of panning, zooming, rotation, affine motion, and perspective motion. Mathematically, these global operations can be described by different transform matrices. However, the mathematical models are well defined only for continuous space, and in the discrete digital image domain it is usually computationally expensive to solve for the global motion parameters by strictly following them.
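As an illustration only, the transform matrices referred to above can be sketched as 3x3 matrices acting on homogeneous pixel coordinates. The function names below (translation, zoom, rotation, apply_transform) are hypothetical and are not taken from any of the techniques discussed; the sketch assumes the usual homogeneous-coordinate convention in which a point (x, y) is mapped and then divided by the third coordinate.

```python
import math

def translation(tx, ty):
    """Panning: shift every pixel by (tx, ty)."""
    return [[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]]

def zoom(s):
    """Zooming: uniform scaling about the origin by factor s."""
    return [[s, 0.0, 0.0], [0.0, s, 0.0], [0.0, 0.0, 1.0]]

def rotation(theta):
    """Rotation about the origin by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def apply_transform(m, x, y):
    """Map (x, y) through a 3x3 matrix in homogeneous coordinates.

    For affine matrices the bottom row is (0, 0, 1), so w is 1.
    A perspective matrix has a non-trivial bottom row, so w varies.
    """
    xh = m[0][0] * x + m[0][1] * y + m[0][2]
    yh = m[1][0] * x + m[1][1] * y + m[1][2]
    w = m[2][0] * x + m[2][1] * y + m[2][2]
    return xh / w, yh / w

# Example: a pan of (3, -2) applied to the pixel at (10, 10).
panned = apply_transform(translation(3.0, -2.0), 10.0, 10.0)
```

A general affine model fills the upper two rows with six free parameters, and a perspective model additionally frees the first two entries of the bottom row, which is why the number of parameters to be solved grows with the generality of the model.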
Some global motion estimation techniques operate on a motion vector field obtained by a local motion estimation algorithm. Global motion parameters are then derived from that field based on the mathematical models. However, the complexity of local motion estimation is a computational barrier to practical use.
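By way of example, deriving global parameters from a motion vector field can be sketched as a least-squares fit. The sketch below assumes a six-parameter affine model, dx = a0 + a1*x + a2*y and dy = b0 + b1*x + b2*y, and assumes the motion vectors have already been produced by some local estimator; the helper names solve3 and fit_affine are hypothetical, and the techniques referred to above may use different models or solvers.

```python
def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_affine(points, vectors):
    """Least-squares affine fit to a motion vector field.

    points  : list of (x, y) block positions
    vectors : list of (dx, dy) local motion vectors at those positions
    Returns ([a0, a1, a2], [b0, b1, b2]).
    """
    # Both components share the same 3x3 normal-equation matrix.
    a = [[0.0] * 3 for _ in range(3)]
    bx = [0.0] * 3
    by = [0.0] * 3
    for (x, y), (dx, dy) in zip(points, vectors):
        basis = (1.0, x, y)
        for i in range(3):
            for j in range(3):
                a[i][j] += basis[i] * basis[j]
            bx[i] += basis[i] * dx
            by[i] += basis[i] * dy
    return solve3(a, bx), solve3(a, by)

# Synthetic field generated from known parameters, one vector per 16x16 block.
points = [(float(x), float(y)) for x in range(0, 64, 16) for y in range(0, 64, 16)]
vectors = [(2.0 + 0.1 * x - 0.05 * y, -1.0 + 0.05 * x + 0.1 * y) for x, y in points]
params_x, params_y = fit_affine(points, vectors)
```

The fit itself is inexpensive; the cost noted above lies in obtaining the motion vectors, which requires a local search for every block in the frame.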
In another technique, hardware sensors are mounted within a video camera to determine the camera motion. However, this hardware implementation is too costly for regular consumer electronics.
Another difficulty in global motion estimation is the existence of independently moving objects, which introduce bias into the estimated motion parameters. One technique uses video object masks to remove the moving objects in order to obtain higher robustness. However, segmenting the video objects is itself very difficult.
Another global motion estimation technique uses a truncated quadratic function to define the error criterion in order to remove the image pixels of moving objects. This method significantly improves robustness and efficiency. However, the truncation relies on a predetermined threshold, which is not well defined.
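For illustration, a truncated quadratic error criterion can be sketched as follows. The sketch assumes one common form, in which the squared residual is capped at the square of the threshold so that large residuals contribute only a constant; other variants assign such pixels zero cost instead. The function names and the threshold value are hypothetical.

```python
def truncated_quadratic(residual, threshold):
    """Squared error, capped at threshold**2 for large residuals.

    Pixels whose residual exceeds the threshold contribute a constant,
    so they no longer influence the minimization of the total error.
    """
    r2 = residual * residual
    t2 = threshold * threshold
    return r2 if r2 < t2 else t2

def total_error(residuals, threshold):
    """Sum of truncated quadratic costs over all pixel residuals."""
    return sum(truncated_quadratic(r, threshold) for r in residuals)

# Two background pixels with small residuals and one pixel on a moving object.
residuals = [1.0, -2.0, 50.0]
robust = total_error(residuals, threshold=10.0)   # 1 + 4 + 100
plain = sum(r * r for r in residuals)             # 1 + 4 + 2500
```

The result depends directly on the chosen threshold: too small a value discards background pixels, while too large a value lets moving objects bias the estimate.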
One common aspect of the global motion estimation methods mentioned above is that they derive the global motion parameters from a comparison between two temporally consecutive image frames using the full content of the images. As a result, these techniques require large amounts of computational power.
The present invention addresses this and other problems associated with the prior art.