Moving object detection is an important aspect of image sequence analysis. It is necessary for surveillance applications, for guidance of autonomous vehicles, for efficient video compression, for smart tracking of moving objects, and for many other applications. The two-dimensional motion observed in an image sequence is caused by three-dimensional camera motion (referred to as ego-motion) and by three-dimensional motions of independently moving objects. The key step in moving object detection is accounting for (or compensating for) the camera-induced image motion. After compensation for camera-induced image motion, the remaining residual motions must be due to moving objects.
The camera-induced image motion depends both on the ego-motion parameters and on the depth of each point in the scene from the camera. Estimating all of these physical parameters (namely, ego-motion and depth) to account for the camera-induced motion is, in general, an inherently ambiguous problem. When the scene contains large depth variations, these parameters may be recovered; such scenes are referred to as three-dimensional scenes. However, in two-dimensional scenes, namely scenes in which the depth variations are not significant, the recovery of the camera and scene parameters is usually not robust or reliable.
An effective approach to accounting for camera-induced motion in two-dimensional scenes is to model the image motion in terms of a global two-dimensional parametric transformation. This approach is robust and reliable when the scene is flat (planar), when the scene is distant, or when the camera is undergoing only rotations and zooms. However, the two-dimensional approach cannot be applied to three-dimensional scenes, in which depth-dependent parallax cannot be captured by a single global transformation.
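The two-dimensional approach can be illustrated with a minimal sketch. It is not taken from any particular prior-art system; it assumes that point correspondences between two frames are already available, and it uses an affine model as one example of a global two-dimensional parametric transformation. The function names (solve3, fit_affine, residuals, detect_moving), the threshold value, and the synthetic data are all hypothetical. The sketch fits the dominant (camera-induced) motion, refits once on the points that agree with it, and reports points whose residual motion remains large.

```python
def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def fit_affine(src, dst):
    """Least-squares fit of u = a*x + b*y + c and v = d*x + e*y + f."""
    # Both output coordinates share the same 3x3 normal-equation matrix.
    A = [[0.0] * 3 for _ in range(3)]
    bu = [0.0] * 3
    bv = [0.0] * 3
    for (x, y), (u, v) in zip(src, dst):
        row = (x, y, 1.0)
        for i in range(3):
            for j in range(3):
                A[i][j] += row[i] * row[j]
            bu[i] += row[i] * u
            bv[i] += row[i] * v
    return solve3(A, bu), solve3(A, bv)


def residuals(src, dst, model):
    """Magnitude of the motion left over after applying the affine model."""
    (a, b, c), (d, e, f) = model
    out = []
    for (x, y), (u, v) in zip(src, dst):
        du = u - (a * x + b * y + c)
        dv = v - (d * x + e * y + f)
        out.append((du * du + dv * dv) ** 0.5)
    return out


def detect_moving(src, dst, threshold=1.0):
    """Return indices of points whose residual motion exceeds the threshold."""
    model = fit_affine(src, dst)
    res = residuals(src, dst, model)
    inliers = [i for i, r in enumerate(res) if r <= threshold]
    # Refit once on the inliers so outliers do not bias the dominant motion.
    if 3 <= len(inliers) < len(src):
        model = fit_affine([src[i] for i in inliers], [dst[i] for i in inliers])
        res = residuals(src, dst, model)
    return [i for i, r in enumerate(res) if r > threshold]


# Synthetic example: a 4x4 grid of points seen under a slight zoom and shift,
# with one point (index 5) given additional independent motion.
src = [(float(x), float(y)) for y in (0, 10, 20, 30) for x in (0, 10, 20, 30)]
dst = [(1.01 * x + 2.0, 1.01 * y - 1.0) for x, y in src]
u5, v5 = dst[5]
dst[5] = (u5 + 4.0, v5 + 3.0)

moving = detect_moving(src, dst)
print(moving)
```

In this synthetic case the scene behaves as a two-dimensional scene, so the single affine model explains every background point. In a scene with significant depth variation, static points at different depths would also leave residuals, which is the limitation of the two-dimensional approach noted above.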
Therefore, two-dimensional algorithms and three-dimensional algorithms address the moving object detection problem in very different types of scenarios. These are two extremes in a continuum of scenarios: flat two-dimensional scenes (i.e., no three-dimensional parallax) vs. three-dimensional scenes with dense depth variations (i.e., dense three-dimensional parallax). Each class fails on the opposite extreme, and both fail on the intermediate case (when three-dimensional parallax is sparse relative to the amount of independent motion).
In real image sequences, it is not always possible to predict in advance which situation (two-dimensional or three-dimensional) will occur. Moreover, both types of scenarios can occur within the same sequence, with gradual transitions between them. Unfortunately, no single class of techniques (two-dimensional or three-dimensional) can address the general moving object detection problem. It is not practical to constantly switch from one set of techniques to another, especially since neither class handles the intermediate case well.
Therefore, a need exists in the art for a unified approach for detecting moving objects in both two-dimensional and three-dimensional scenes.