Automated object detection and/or recognition (ODR) can be used to detect types or classes of physical objects—from simple objects such as geometric shapes to more complex objects such as geographic features and faces—in raw image data (still or video). ODR can also be used to detect audio objects such as songs or voices in raw audio data. A myriad of different techniques have been developed for ODR.
Face detection in particular has attracted much attention due to the potential value of its applications as well as its theoretical challenges. Techniques known by names such as boost cascade and boosting have been somewhat successful for face detection. Still, robust detection remains challenging because of variations in illumination and facial expression.
A boost cascade detector uses a number of “weak” classifiers that are combined to produce a “strong” classifier. A large set of training data can be used to train the weak classifiers to recognize the possible variations in the features of the object to be detected. However, the computational costs and memory demands of training a detector on a large set of training data are unacceptably high. To put this in perspective, training a detector with 4297 features on a training set of 4916 faces has taken weeks. To date, the largest known set of positive samples used for training contains 20,000 face samples.
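By way of illustration only, the combination of weak classifiers into a strong classifier can be sketched with a generic AdaBoost-style procedure. The sketch below is not taken from any particular detector: it assumes each weak classifier is a single-feature threshold rule (a "decision stump") operating on small numeric feature vectors, and the function names (stump_predict, train_stump, adaboost, strong_predict) are hypothetical. It omits the cascade structure and image features that a face detector would use.

```python
import math

def stump_predict(stump, x):
    """Weak classifier: vote +/-1 by thresholding one feature."""
    j, thr, polarity = stump
    return polarity if x[j] >= thr else -polarity

def train_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted error."""
    best_err, best_stump = None, None
    for j in range(len(X[0])):
        for thr in sorted({x[j] for x in X}):
            for polarity in (1, -1):
                stump = (j, thr, polarity)
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict(stump, xi) != yi)
                if best_err is None or err < best_err:
                    best_err, best_stump = err, stump
    return best_err, best_stump

def adaboost(X, y, rounds):
    """Train `rounds` weak classifiers; labels in y must be +1 or -1."""
    n = len(X)
    w = [1.0 / n] * n                      # start with uniform sample weights
    ensemble = []
    for _ in range(rounds):
        err, stump = train_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)   # avoid log(0) / division by zero
        alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this stump
        ensemble.append((alpha, stump))
        # Increase the weight of misclassified samples, decrease the rest.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def strong_predict(ensemble, x):
    """Strong classifier: sign of the weighted vote of all weak classifiers."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1
```

For example, with the toy one-feature samples X = [[1], [2], [3], [4], [5], [6]] and labels y = [1, 1, -1, -1, 1, 1], no single threshold rule classifies every sample correctly, whereas the weighted vote returned by adaboost(X, y, 3) does. Note that train_stump examines every feature and every candidate threshold against every sample in each round, which suggests why training cost grows quickly with the number of features and training samples.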