Computerized object detection, including face detection, has made significant progress in recent years. For example, detectors for frontal faces have shown satisfactory performance.
Object detection becomes substantially more difficult and less accurate when objects are observed from multiple viewpoints (multi-view). When training detectors (classifiers), if example objects are labeled as positive examples without regard to their viewpoints, detectors learned by a straightforward learning algorithm do not perform accurately.
As a result, a common practice in multi-view object detection has been “divide and conquer,” in which the general class of objects to be detected is first divided into subcategories and a separate classifier is then trained for each subcategory. In face detection, for instance, faces can be categorized by out-of-plane pose (frontal, left/right half profile, left/right profile) and by in-plane rotation (zero degrees, plus or minus thirty degrees, and so forth). A trained pose estimator may then first assign a face to one of these subcategories, so that the classifier for that subcategory is applied to it.
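The divide-and-conquer routing described above can be sketched as follows. This is a toy illustration, not the method of any particular system: the three pose labels, the yaw thresholds in the hypothetical `estimate_pose`, and the hypothetical `make_classifier` (which simply checks a stored "true pose" in place of a learned detector) are all assumptions chosen for brevity, and left/right and in-plane subcategories are omitted. The usage lines show the routing working when the pose estimate is right, and a real face being rejected when the estimator sends it to the wrong subcategory classifier.

```python
from typing import Callable, Dict

# Hypothetical out-of-plane pose subcategories (left/right and in-plane omitted).
FRONTAL, HALF_PROFILE, PROFILE = "frontal", "half_profile", "profile"

Classifier = Callable[[dict], bool]


def estimate_pose(window: dict) -> str:
    """Toy pose estimator: buckets an estimated yaw angle in degrees.

    'estimated_yaw' is a hypothetical stand-in for a trained pose estimator's
    output; it can disagree with the window's true pose.
    """
    yaw = abs(window["estimated_yaw"])
    if yaw < 20:
        return FRONTAL
    if yaw < 60:
        return HALF_PROFILE
    return PROFILE


def make_classifier(expected_pose: str) -> Classifier:
    """Toy subcategory classifier: accepts only faces whose true pose matches
    the subcategory it was 'trained' on (stand-in for a learned detector)."""
    def classify(window: dict) -> bool:
        return window["true_pose"] == expected_pose and window["is_face"]
    return classify


# One classifier per subcategory.
CLASSIFIERS: Dict[str, Classifier] = {
    pose: make_classifier(pose) for pose in (FRONTAL, HALF_PROFILE, PROFILE)
}


def detect(window: dict) -> bool:
    """Route the window to a single subcategory classifier chosen by the pose estimator."""
    subcategory = estimate_pose(window)
    return CLASSIFIERS[subcategory](window)


if __name__ == "__main__":
    routed_correctly = {"is_face": True, "true_pose": PROFILE, "estimated_yaw": 80}
    misrouted = {"is_face": True, "true_pose": PROFILE, "estimated_yaw": 10}
    print(detect(routed_correctly))  # True
    print(detect(misrouted))         # False: a profile face sent to the frontal classifier
```

Because each window reaches exactly one subcategory classifier, an error by the pose estimator cannot be recovered by a later stage in this design.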
In training, each subcategory's classifier requires manually labeled data for that subcategory. However, manual labeling is very labor-intensive and is difficult for certain tasks, such as pedestrian detection or car detection. Clustering (e.g., conventional k-means clustering) helps to an extent but has its own problems. Labeling is also error-prone; for example, the boundary between frontal and half-profile faces can be very subtle and is often judged differently by different labelers.
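As a rough illustration of the clustering alternative to manual labeling, the sketch below groups unlabeled positive examples into subcategories with plain Lloyd's k-means. It is an assumption-laden toy: the hypothetical `kmeans` helper is written from scratch here rather than taken from a library, and the 2-D `toy_features` are made-up stand-ins for real appearance descriptors. Each resulting cluster would then serve as the training set for one subcategory classifier.

```python
import numpy as np


def kmeans(features, k: int, iters: int = 20, seed: int = 0):
    """Plain Lloyd's k-means. Returns (labels, centroids)."""
    features = np.asarray(features, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize centroids with k distinct, randomly chosen examples.
    centroids = features[rng.choice(len(features), size=k, replace=False)].copy()
    labels = np.zeros(len(features), dtype=int)
    for _ in range(iters):
        # Assign each example to its nearest centroid (squared Euclidean distance).
        dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its members; leave empty clusters in place.
        for j in range(k):
            members = features[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return labels, centroids


# Hypothetical descriptors: two well-separated groups of positive examples.
toy_features = np.array([
    [0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
    [5.0, 5.1], [5.2, 4.9], [4.9, 5.0],
])

if __name__ == "__main__":
    labels, centroids = kmeans(toy_features, k=2)
    # Each cluster index defines one subcategory's training set.
    for j in range(2):
        print(j, toy_features[labels == j].tolist())
```

Note that the clusters are defined purely by distance in feature space, so they need not correspond to the subcategories that would be best for learning the overall detector.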
Thus, while pose estimation and clustering help with multi-view object detection, misclassification by the pose estimator causes problems. For example, if a profile face is misclassified as frontal, it may never be detected in later classification. Misclassification also occurs during training, for example as a result of mislabeling. Moreover, manual labels or clustering results are not necessarily optimal for learning an overall detector.