The present invention relates generally to the detection of independently moving objects in a sequence of two-dimensional video images representing a three-dimensional (3D) video scene and, in particular, to a method that uses a multi-view camera motion constraint and a shape constancy constraint.
Automatic methods for processing images of a 3D scene to detect motions that are independent of camera motion are used in applications such as aerial video surveillance and monitoring, rapid model building under uncontrolled scenarios and moving object tracking. The 2D image motion in the scenarios under consideration can be attributed to the camera motion, the shape of the 3D scene and objects, and independent object motion. Automatic methods for solving the problem need to deal with the confounding effects of the various causes of image motion. It may be difficult, for example, to detect a moving object in a scene imaged by a moving camera if the object moves in the same direction as the camera motion.
A particularly difficult case is the sparse 3D scene, in which the “3Dness” of the scene is sparsely distributed and the image parallax for the fixed scene and the independent motions may be equally dominant.
Previous attempts to automatically detect independent motion in 3D scenes have either employed only the epipolar constraints or have assumed that frame correspondences and/or image flows are available or can be reliably computed. One such system is described in an article by G. Adiv entitled “Determining 3D Motion and Structure from Optical Flows Generated by Several Moving Objects,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 7, no. 4, pp. 384-401, 1985. The system disclosed in this article assumes that the optical flow for the sequence of images is available and uses this flow to label points that belong to planes. Subsequently, the planar hypotheses are grouped on the basis of a rigidity constraint over two frames. In essence, an epipolar constraint is applied to groups of planes.
Epipolar constraints may produce erroneous results, however, when independent object motion is in the same direction as camera motion. In this instance, the epipolar constraints may erroneously be calculated based on the independent object motion instead of the underlying scene. Image flows are time consuming to calculate and are subject to error, for example, if items in one frame are erroneously classified as matching objects in another frame.
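The ambiguity described above can be illustrated with a small numerical sketch. It assumes a calibrated camera (normalized image coordinates), pure camera translation with no rotation, and a hypothetical helper named `epipolar_residual`; none of these specifics come from the text. Under these assumptions the epipolar residual for a point pair is x2 · (t × x1), which is zero for any displacement along the epipolar line, whether that displacement is caused by scene depth or by an object moving in the same direction as the camera.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def epipolar_residual(x1, x2, t):
    """Residual x2 . (t x x1) for normalized image points x1, x2.

    Assumption: the camera only translates by t (no rotation), so the
    essential matrix reduces to the cross-product matrix of t.
    """
    p1 = (x1[0], x1[1], 1.0)
    p2 = (x2[0], x2[1], 1.0)
    c = cross(t, p1)
    return p2[0] * c[0] + p2[1] * c[1] + p2[2] * c[2]


# Camera translates along the image x axis (illustrative value).
t = (1.0, 0.0, 0.0)

# A static point shifts horizontally because of parallax.
static_point = epipolar_residual((0.2, 0.3), (0.1, 0.3), t)

# An object moving horizontally (same direction as the camera motion)
# also shifts along the epipolar line, so the constraint cannot flag it.
object_along = epipolar_residual((0.2, 0.3), (0.5, 0.3), t)

# An object moving vertically leaves the epipolar line and is detectable.
object_across = epipolar_residual((0.2, 0.3), (0.2, 0.4), t)

print(static_point, object_along, object_across)
```

In this sketch only the vertically moving object produces a nonzero residual, which is why an epipolar test alone may miss objects that move along the camera's direction of motion.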
The subject invention is embodied in a system and method that detects independently moving objects in 3D scenes that are viewed under camera motion. The subject invention first calculates 2D view geometry constraints for a set of images. These constraints are tested to determine if the imaged scene exhibits significant 3D characteristics. If it does, then 3D shape constraints are applied to the set of images. The 3D shape constraints are themselves constrained by the 2D view geometry constraints. The set of images is then tested to identify areas that are inconsistent with the constraints. These areas correspond to the moving objects.
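The decision flow in the preceding paragraph can be sketched as follows. This is only an illustration of the control structure: the function name `detect_moving_regions`, the per-region residual lists, the tolerance `tol`, and the use of a misaligned-fraction threshold as the test for "significant 3D characteristics" are all assumptions made for the example and are not taken from the text.

```python
def detect_moving_regions(residual_2d, residual_3d, tol=1.0,
                          max_misaligned_fraction=0.2):
    """Return indices of regions flagged as independently moving.

    residual_2d: per-region error after applying the 2D view geometry.
    residual_3d: per-region error after also applying the 3D shape
                 constraint (only consulted if the scene looks 3D).
    The fraction test below is a hypothetical stand-in for the test of
    whether the scene exhibits significant 3D characteristics.
    """
    misaligned = [r > tol for r in residual_2d]
    fraction = sum(misaligned) / len(misaligned)

    if fraction <= max_misaligned_fraction:
        # Scene treated as effectively 2D: regions that violate the 2D
        # constraints are reported directly.
        return [i for i, bad in enumerate(misaligned) if bad]

    # Scene treated as 3D: report regions that remain inconsistent after
    # the 3D shape constraint has been applied.
    return [i for i, r in enumerate(residual_3d) if r > tol]


# Mostly planar scene: one region violates the 2D constraints.
flat_scene = detect_moving_regions([0.1, 0.2, 5.0, 0.1, 0.3, 0.2],
                                   [0.1, 0.2, 5.0, 0.1, 0.3, 0.2])

# Scene with parallax: several regions violate the 2D constraints, but
# only one remains inconsistent once the shape constraint is applied.
deep_scene = detect_moving_regions([0.1, 3.0, 4.0, 2.5, 0.2, 0.1],
                                   [0.1, 0.3, 4.0, 0.2, 0.2, 0.1])

print(flat_scene, deep_scene)
```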
According to one aspect of the invention, the 2D view geometry constraints are calculated by computing a dominant image alignment for successive pairs of images and then computing constrained epipolar transformations for the two image pairs.
According to another aspect of the invention, the 2D view geometry is calculated based on a plurality of target point correspondences among the plurality of frames. The geometry corresponding to a minimum median error is selected as the 2D view geometry of the scene.
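Selecting the geometry with the minimum median error can be sketched as a least-median-of-squares search. The example below makes a strong simplifying assumption: it uses a 2D translation as a stand-in for the view geometry, so that each point correspondence yields one candidate model. The function name `lmeds_translation` and the sample data are hypothetical.

```python
from statistics import median


def lmeds_translation(correspondences):
    """Pick the candidate translation with the smallest median squared error.

    correspondences: list of ((x1, y1), (x2, y2)) point pairs.
    Each pair proposes one candidate (dx, dy); every candidate is scored
    by the median of its squared residuals over all pairs.
    """
    best_error = None
    best_model = None
    for (x1, y1), (x2, y2) in correspondences:
        dx, dy = x2 - x1, y2 - y1
        errors = [(u2 - u1 - dx) ** 2 + (v2 - v1 - dy) ** 2
                  for (u1, v1), (u2, v2) in correspondences]
        score = median(errors)
        if best_error is None or score < best_error:
            best_error, best_model = score, (dx, dy)
    return best_model, best_error


# Five correspondences follow the background shift (2, 1); two belong to
# an independently moving object and act as outliers.
pairs = [((0, 0), (2, 1)), ((5, 1), (7, 2)), ((3, 4), (5, 5)),
         ((8, 2), (10, 3)), ((1, 7), (3, 8)),
         ((4, 4), (9, 0)), ((6, 6), (11, 2))]

model, error = lmeds_translation(pairs)
print(model, error)
```

Because the median ignores the worst half of the residuals, the outlying correspondences do not influence which candidate is chosen, provided they make up fewer than half of the points.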
According to yet another aspect of the invention, the 3D shape constraint is a parallax geometry that is calculated by iteratively minimizing errors in a parametric transformation using an estimated parallax geometry, over a plurality of images.