Automatic object detection and classification is a key enabler for applications in robotics, navigation, surveillance, or automated personal assistance. Yet automatic object detection is a difficult task. The main challenge is the amount of variation in visual appearance. An object detector must cope both with the variation within the object category and with the diversity of visual imagery that exists in the world at large. For example, cars vary in size, shape, and color, and in small details such as the headlights, bumpers, and tires. The lighting, the surrounding scenery, and an object's pose also affect its appearance. A car detection algorithm must also distinguish cars from all other visual patterns that may occur in the world, such as similar-looking rectangular objects.
The common approach to automatic object detection is to shift a search window over an input image and categorize the object in the window with a classifier. To speed up the system without losing classification performance, one can exploit two characteristics common to most vision-based detection tasks. First, the vast majority of the analyzed patterns in an image belong to the background class. For example, the ratio of non-face to face patterns is about 50,000 to 1, as reported by P. Viola and M. Jones in “Rapid object detection using a boosted cascade of simple features,” In Proc. CVPR, pages 511-518. Second, many of the background patterns can be easily distinguished from the objects. Based on these two observations, object detection is commonly carried out in a two-stage scheme, as illustrated in the block diagram of the system 100 in FIG. 1. First, all the regions in the image that potentially contain the target objects are identified; this is known as the “focus of attention” mechanism. Second, the selected regions are verified by a classifier.
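The two-stage scheme can be illustrated with a minimal Python sketch, assuming a grayscale image stored as a list of rows. The stage-1 cue (rejecting near-uniform windows by their intensity variance), the names `detect`, `window_stats`, and `toy_classifier`, and all threshold values are hypothetical choices made for illustration; they are not the focus of attention mechanism or the classifier of the system 100.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]
Box = Tuple[int, int, int, int]  # (row, col, height, width)


def window_stats(img: Image, box: Box) -> Tuple[float, float]:
    """Return (mean, variance) of the pixel intensities inside `box`."""
    r, c, h, w = box
    vals = [img[i][j] for i in range(r, r + h) for j in range(c, c + w)]
    mean = sum(vals) / len(vals)
    var = sum((v - mean) ** 2 for v in vals) / len(vals)
    return mean, var


def detect(img: Image, win: int, step: int, min_var: float,
           classify: Callable[[Image, Box], bool]) -> List[Box]:
    """Slide a win x win window over `img` and apply a two-stage test."""
    rows, cols = len(img), len(img[0])
    detections: List[Box] = []
    for r in range(0, rows - win + 1, step):
        for c in range(0, cols - win + 1, step):
            box = (r, c, win, win)
            # Stage 1 (focus of attention): a cheap cue. Near-uniform windows
            # are assumed to be background and are rejected immediately.
            _, var = window_stats(img, box)
            if var < min_var:
                continue
            # Stage 2 (verification): only surviving windows reach the
            # more expensive classifier.
            if classify(img, box):
                detections.append(box)
    return detections


# Toy scene (hypothetical): a 12x12 dark image with a textured 4x4 "object"
# whose top-left corner is at (2, 2).
scene = [[(1.0 if (i + j) % 2 == 0 else 0.5) if 2 <= i < 6 and 2 <= j < 6
          else 0.0 for j in range(12)] for i in range(12)]

calls = []  # records every window handed to the stage-2 classifier


def toy_classifier(img: Image, box: Box) -> bool:
    """Hypothetical stand-in for a trained classifier: bright windows only."""
    calls.append(box)
    mean, _ = window_stats(img, box)
    return mean > 0.6


found = detect(scene, win=4, step=2, min_var=0.01, classify=toy_classifier)
print(found)       # [(2, 2, 4, 4)]
print(len(calls))  # 9
```

Only 9 of the 25 windows reach the classifier; the other 16 lie entirely on the uniform background and are discarded by the cheap first-stage test, which is the saving the two-stage scheme aims for.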
Various focus of attention generation approaches have been proposed in the literature, each of which falls into one of three categories: knowledge-based, stereo-based, and motion-based. Knowledge-based methods employ knowledge about object shape and color as well as general information about the environment. For example, symmetry detection approaches using the intensity or edge map have been exploited based on the observation that vehicles are symmetric about the vertical axis, as disclosed by A. Kuehnle in “Symmetry-based recognition for vehicle rears,” Pattern Recognition Letters, vol. 12, pp. 249-258. Stereo-based approaches usually take advantage of inverse perspective mapping (IPM) to estimate the locations of vehicles, people, and obstacles in images. The IPM approach is described by H. Mallot, H. Bulthoff, J. Little, and S. Bohrer in “Inverse perspective mapping simplifies optical flow computation and obstacle detection,” Biological Cybernetics, vol. 64, no. 3, pp. 177-185. Furthermore, Bertozzi et al., in “GOLD: A parallel real-time stereo vision system for generic obstacle and lane detection,” IEEE Trans. on Image Processing, vol. 7, pp. 62-81, applied IPM to the left and right images and compared the two remapped images. Based on the comparison, one could find objects that were not on the ground plane and, using this information, determine the free space in the scene. Motion-based methods detect vehicles, people, and obstacles using optical flow. Generating a displacement vector for each pixel (the continuous approach), however, is time-consuming and impractical for a real-time system. In contrast to continuous methods, discrete methods reported better results using image features such as color blobs or local intensity minima and maxima. The method using color blobs is disclosed by B. Heisele and W. Ritter in “Obstacle detection based on color blob flow,” IEEE Intelligent Vehicles Symposium, pp. 282-286. The method using local intensity minima and maxima is disclosed by D. Koller, N. Heinze, and H. Nagel in “Algorithm characterization of vehicle trajectories from image sequences by motion verbs,” IEEE Conf. on Computer Vision and Pattern Recognition, pp. 90-95.
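As a rough illustration of the knowledge-based symmetry cue, the sketch below scores each candidate vertical axis of an intensity or edge map by the mean absolute difference between mirrored pixel pairs and picks the most symmetric axis. This simplified cost, the `half_width` parameter, and the toy edge map are assumptions made for illustration; the sketch is not Kuehnle's published formulation.

```python
from typing import List, Tuple

Image = List[List[float]]


def symmetry_cost(img: Image, axis: int, half_width: int) -> float:
    """Mean absolute difference between pixel pairs mirrored about column
    `axis` (0 means perfectly symmetric). Returns inf if the band of
    +/- half_width columns does not fit inside the image."""
    cols = len(img[0])
    if axis - half_width < 0 or axis + half_width >= cols:
        return float("inf")
    diffs = [abs(row[axis - d] - row[axis + d])
             for row in img for d in range(1, half_width + 1)]
    return sum(diffs) / len(diffs)


def best_symmetry_axis(img: Image, half_width: int) -> Tuple[int, float]:
    """Return (column, cost) of the most symmetric vertical axis."""
    candidates = [(symmetry_cost(img, x, half_width), x)
                  for x in range(len(img[0]))]
    cost, axis = min(candidates)  # ties resolved by the smaller column index
    return axis, cost


# Toy edge map (hypothetical) of a "vehicle rear" centred on column 4.
edges = [
    [0, 0, 1, 3, 5, 3, 1, 0, 0],
    [0, 2, 0, 1, 1, 1, 0, 2, 0],
]
print(best_symmetry_axis(edges, half_width=3))  # (4, 0.0)
```

In a detector, windows whose best axis has a low cost would be kept as candidate regions and passed on to the verification stage.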
A number of different approaches to hypothesis verification that use some form of learning have been proposed in the literature. In these approaches, the characteristics of the object class are learned from a set of training images that should capture the intra-class variability. Usually, the variability of the non-object class is also modeled to improve performance. First, each training image is represented by a set of local or global features (e.g., Haar wavelets, SIFT, Shape Context), as described by P. Viola and M. Jones in “Rapid object detection using a boosted cascade of simple features,” In Proc. CVPR, pages 511-518. Then these features are converted into some underlying configuration (e.g., a “bag of features” or a constellation model), as disclosed by M. Weber, M. Welling, and P. Perona in “Unsupervised learning of models for recognition,” In Proc. ECCV, pages 18-32. Finally, the decision boundary between the object and non-object classes is learned either by training a classifier (e.g., AdaBoost, a Support Vector Machine, or a Neural Network (NN)) or by modeling the probability distribution of the features in each class (e.g., using Bayes' rule and assuming Gaussian distributions). These methods differ in the details of the features and decision functions, but more fundamentally they differ in how strictly they constrain the geometry of the configuration of parts constituting an object class. A need therefore exists in the art to build an improved system that stably detects still (non-moving) objects over a wide range of viewpoints.
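The probabilistic variant of hypothesis verification (Bayes' rule with Gaussian class models) can be sketched as follows. This is a minimal example assuming that each candidate window has already been reduced to a short feature vector and that the feature dimensions are independent (a diagonal covariance, i.e. Gaussian naive Bayes); the class `GaussianBayesVerifier` and the feature values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Gaussian = Tuple[List[float], List[float]]  # (per-dimension means, variances)


def fit_gaussian(samples: List[Sequence[float]], eps: float = 1e-6) -> Gaussian:
    """Maximum-likelihood diagonal Gaussian; eps keeps variances positive."""
    n, dim = len(samples), len(samples[0])
    means = [sum(s[k] for s in samples) / n for k in range(dim)]
    variances = [sum((s[k] - means[k]) ** 2 for s in samples) / n + eps
                 for k in range(dim)]
    return means, variances


def log_likelihood(x: Sequence[float], model: Gaussian) -> float:
    """log p(x | class) under independent (diagonal) Gaussian dimensions."""
    means, variances = model
    return sum(-0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
               for xi, m, v in zip(x, means, variances))


class GaussianBayesVerifier:
    """Labels a feature vector 'object' or 'non-object' by Bayes' rule."""

    def __init__(self, obj: List[Sequence[float]],
                 non_obj: List[Sequence[float]]) -> None:
        total = len(obj) + len(non_obj)
        # Each class keeps its fitted Gaussian and its log prior probability.
        self.classes = {
            "object": (fit_gaussian(obj), math.log(len(obj) / total)),
            "non-object": (fit_gaussian(non_obj),
                           math.log(len(non_obj) / total)),
        }

    def posterior_score(self, x: Sequence[float], label: str) -> float:
        # Unnormalized log posterior: log p(x | label) + log P(label).
        model, log_prior = self.classes[label]
        return log_likelihood(x, model) + log_prior

    def predict(self, x: Sequence[float]) -> str:
        return max(self.classes, key=lambda label: self.posterior_score(x, label))


# Hypothetical 2-D feature vectors extracted from training windows.
objects = [(0.9, 1.1), (1.1, 0.9), (1.0, 1.0), (1.0, 1.2)]
background = [(0.1, -0.1), (-0.1, 0.1), (0.0, 0.0), (0.2, 0.0)]
verifier = GaussianBayesVerifier(objects, background)
print(verifier.predict((0.95, 1.05)))  # object
print(verifier.predict((0.05, 0.0)))   # non-object
```

The diagonal covariance keeps training down to per-dimension means and variances; a full covariance matrix would capture correlations between features at the cost of needing more training data.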