Many prior art object classification systems, particularly face recognition systems, use a cascade of classifiers to detect an object in an image. Instead of applying a single classifier to the image, a cascade of increasingly complex classifiers is applied to the image. Portions of the image that do not include the object are rejected early, while portions that pass every classifier in the cascade are marked as including the object. The advantages of such systems are described in U.S. patent application Ser. No. 10/200,464, “System and Method for Detecting Objects in Images,” filed by Viola et al. on Jul. 22, 2002, and in Viola et al., “Rapid Object Detection using a Boosted Cascade of Simple Features,” IEEE Conference on Computer Vision and Pattern Recognition, 2001.
Similar methods are described by Elad et al., “Rejection based classifier for face detection,” Pattern Recognition Letters 23, pp. 1459-1471, 2002, Keren et al., “Antifaces: A novel, fast method for image detection,” IEEE Trans. on Pattern Analysis and Machine Intelligence, 23(7), pp. 747-761, 2001, and Romdhani et al., “Computationally efficient face detection,” Proc. Intl. Conf. Computer Vision, pp. 695-700, 2001.
All of those methods use simple classifiers to reject large portions of the image, leaving more time to apply more complex and time-consuming classifiers to the remaining portions of the image that are more likely to include a face. All of those methods focus on the following issues: the features on which the methods operate, the process used to extract the features from the image, and the process used to select features.
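By way of illustration only, the early-rejection behavior common to these methods can be sketched as follows. The stage functions, their thresholds, and the name `cascade_accepts` are hypothetical and are not taken from any of the cited references; the sketch only shows that a rejected patch never reaches the later, more expensive stages.

```python
from typing import Callable, List, Sequence, Tuple

Patch = Sequence[Sequence[float]]
Stage = Callable[[Patch], bool]


def cascade_accepts(patch: Patch, stages: List[Stage]) -> Tuple[bool, int]:
    """Run the stages in order and stop at the first rejection.

    Returns (accepted, number_of_stages_evaluated).
    """
    evaluated = 0
    for stage in stages:
        evaluated += 1
        if not stage(patch):
            return False, evaluated  # rejected early; later stages are skipped
    return True, evaluated


# Hypothetical stages, ordered from cheapest to most expensive.
def mean_brightness_stage(patch: Patch) -> bool:
    values = [v for row in patch for v in row]
    return sum(values) / len(values) > 0.2


def contrast_stage(patch: Patch) -> bool:
    values = [v for row in patch for v in row]
    return max(values) - min(values) > 0.5


stages = [mean_brightness_stage, contrast_stage]

dark_patch = [[0.0, 0.1], [0.1, 0.0]]      # fails the first stage
flat_patch = [[0.6, 0.6], [0.6, 0.6]]      # passes the first, fails the second
textured_patch = [[0.1, 0.9], [0.8, 0.2]]  # passes both stages

dark_result = cascade_accepts(dark_patch, stages)
flat_result = cascade_accepts(flat_patch, stages)
textured_result = cascade_accepts(textured_patch, stages)
```

In this sketch the dark patch costs a single stage evaluation, whereas the textured patch is the only one that incurs the cost of every stage.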
Elad et al. use features based on pixel values, and select classifiers that maximize the rejection rate. Keren et al. use an anti-face detector that assumes a normal distribution in the background of the image. Romdhani et al. construct a support vector machine (SVM) and then approximate the SVM with a sequence of support vector classifiers that use non-linear optimization. All of the above methods process each pixel in the image at least once before a portion of the image is rejected.
In contrast, Viola et al. construct a feature space that includes a combination of rectangular regions that can be determined from pixel images using an integral image. They use a sequential feature selection process based on AdaBoost; see Freund et al., “A decision-theoretic generalization of on-line learning and an application to boosting,” Computational Learning Theory: Eurocolt 95, Springer-Verlag, pp. 23-37, 1995.
An important advantage of the feature space of Viola et al. is that image patches can be rejected with a small number of operations. Although the Viola rectangular filters are efficient to determine using the integral image, they do form a large feature space, thus placing a heavy computational burden on the feature selection process that follows.
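As a minimal sketch of why such rectangular filters are inexpensive to evaluate, the following computes an integral image and uses it to sum any rectangle with four table lookups. The function names and the particular two-rectangle layout are assumptions chosen for illustration and are not the exact filters of Viola et al.

```python
from typing import List, Sequence


def integral_image(img: Sequence[Sequence[float]]) -> List[List[float]]:
    """Return a (h+1) x (w+1) table where ii[y][x] is the sum of img[:y][:x]."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, top: int, left: int, height: int, width: int):
    """Sum of the pixels in a rectangle, using four lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def two_rect_feature(ii, top: int, left: int, height: int, width: int):
    """Hypothetical two-rectangle filter: right half minus left half."""
    half = width // 2
    left_sum = rect_sum(ii, top, left, height, half)
    right_sum = rect_sum(ii, top, left + half, height, half)
    return right_sum - left_sum


image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]
ii = integral_image(image)
whole = rect_sum(ii, 0, 0, 3, 4)            # sum of every pixel
inner = rect_sum(ii, 1, 1, 2, 2)            # 6 + 7 + 10 + 11
feature = two_rect_feature(ii, 0, 0, 3, 4)  # (3+4+7+8+11+12) - (1+2+5+6+9+10)
```

The integral image is built once per image, after which the cost of each rectangle sum is independent of the rectangle's size. The number of possible rectangle positions and sizes nevertheless grows quickly with the patch dimensions, which is the source of the large feature space noted above.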
Another method replaces the sequential forward searching process of Viola et al. with a floating search process with backtracking capabilities, Li et al., “Statistical Learning of Multi-View Face Detection,” Proceedings of the 7th European Conference on Computer Vision, May 2002.
Some classifiers produce excellent results but take more time than the ‘greedy’ classifiers described above; see Heisele et al., “Feature reduction and hierarchy of classifiers for fast object detection in video images,” Proc. CVPR, Vol. 2, pp. 18-24, 2001; Schneiderman et al., “A statistical model for 3D object detection applied to faces and cars,” IEEE Conference on Computer Vision and Pattern Recognition, IEEE, June 2000; Sung et al., “Example-based Learning for View-Based Human Face Detection,” IEEE Transactions on Pattern Analysis and Machine Intelligence 20(1), pp. 39-51, 1998; and Rowley et al., “Neural network-based face detection,” IEEE Trans. on Pattern Analysis and Machine Intelligence, 20(1), pp. 23-38, 1998.
It is desired to improve the performance of object classifiers.