(1) Field of Invention
The present invention relates to an object recognition and tracking system, and more particularly, to an object recognition and tracking system that utilizes the geometric structure between image pairs corresponding to different views of three-dimensional (3D) objects to identify and track objects in 3D world coordinates.
(2) Related Art
Typically, classification of objects in an image is performed using features extracted from an analysis window that is scanned across the image. In the prior art, identification/classification from a single view is combined with motion parallax (stereo geometry) in order to recover geometrical parameters of objects. The result is often used to generate three-dimensional (3D) models of buildings. Essentially, existing methods allow the inference of 3D shape and texture where evidence from a single two-dimensional (2D) image is weak. However, the methods are formulated for 3D objects with highly salient linear geometric features, such as rectangular frames, corners, and square grids. As a result, existing methods cannot be applied directly to deformable 3D objects with ill-defined 2D projections, such as pedestrians.
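For illustration only, the scanned analysis window described above can be sketched as follows. This is a minimal sketch and not the method of any cited system: the function names (sliding_windows, mean_intensity, detect), the stride and threshold parameters, and the use of mean intensity as a stand-in for a trained classifier score are all assumptions introduced for this example.

```python
from typing import Iterator, List, Tuple

Image = List[List[float]]           # grayscale image as rows of pixel values
Window = Tuple[int, int, int, int]  # (top row, left column, height, width)


def sliding_windows(image: Image, win_h: int, win_w: int,
                    stride: int) -> Iterator[Window]:
    """Yield every analysis window that fits entirely inside the image."""
    rows = len(image)
    cols = len(image[0]) if rows else 0
    for r in range(0, rows - win_h + 1, stride):
        for c in range(0, cols - win_w + 1, stride):
            yield (r, c, win_h, win_w)


def mean_intensity(image: Image, window: Window) -> float:
    """Hypothetical feature: average pixel value inside the window.

    A real system would extract richer features and pass them to a
    trained classifier; the mean is used here only to keep the sketch short.
    """
    r, c, h, w = window
    total = sum(sum(image[i][c:c + w]) for i in range(r, r + h))
    return total / (h * w)


def detect(image: Image, win_h: int, win_w: int, stride: int,
           threshold: float) -> List[Window]:
    """Return the windows whose feature score exceeds the threshold."""
    return [win for win in sliding_windows(image, win_h, win_w, stride)
            if mean_intensity(image, win) > threshold]
```

Because every window position is scored independently, the cost grows with the number of positions (and scales) scanned, which is one reason restricting the search space is of interest.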
Other existing systems construct an image-based visual hull from a number of monocular views of pedestrians, capturing both faces and gaits at different viewing configurations. If a frontal view is captured, the face of the pedestrian is made available to a frontal face classifier, which identifies the pedestrian. Alternatively, if a side view is available, the gait information is used to identify the pedestrian. This approach shows improved results, demonstrating how different views of a pedestrian can be combined via two different types of classifiers (face and gait), exploiting the strengths of each classifier at different viewing configurations. However, the method cannot be used with a single-mode classifier, and it does not exploit the constraints of multi-view geometry.
Another existing system uses a number of multi-view geometric constraints for collections of geometric primitives, such as planar shape boundaries. The work is theoretically elegant and derives several view-independent algebraic constraints that are useful for matching and recognizing planar boundaries across multiple views. However, the methods are too low-level to be embedded in a multi-view classifier architecture and cannot be effectively applied to constraining and fusing the output from single-view classifiers.
Another reference describes the performance gain available by combining the results of a single-view object recognition system applied to imagery obtained from multiple fixed cameras. However, the system is focused on classification results for 3D objects with well-defined geometric features (toy cars, planes, cups, etc.) whose appearance differs drastically from one viewing angle to another. The reference describes performance variation in the presence of clutter and changing camera parameters. It concludes by suggesting that there are limits to enhancing the performance of classifiers whose single-view performance is weak to begin with. These results do not carry over to pedestrian classification.
Thus, a continuing need exists for an object recognition system that uses multi-view constraints and is formulated to restrict the search space by incorporating shape priors, thereby reducing false alarms and speeding up the recognition process.