1. Field of the Invention
The present invention relates to a technique that is effective when applied to an apparatus and a method for detecting, from a picked-up image, a particular subject (such as a human, an animal, or an object) or a portion thereof contained in the image.
2. Description of the Related Art
One conventional technique detects, from a picked-up image, a particular subject (such as a human, an animal, or an object) or a portion thereof contained in the image. An example of such conventional art is the face detection technique, which detects human faces in a picked-up image.
Face detection means searching a given image by computer processing to determine whether a face is contained therein.
The difficulties of face detection lie in two aspects: one is the intrinsic variation of faces, such as differences in face shape; the other is the extrinsic variation of faces, such as rotation in plane.
Early works on face detection include, for instance, Rowley's ANN method and Schneiderman's method based on the Bayesian decision rule. Schneiderman's method partitions faces into three views, namely the left profile, the frontal view and the right profile, and trains one detector per view by using the Bayesian method and wavelet transformation. The final result is obtained by combining the results from the three detectors. Schneiderman's method has contributed greatly to the solution of multi-view face detection.
Many related works have proposed cascaded classifiers for improving speed, such as Xiao et al.'s Boosting chain algorithm and Liu et al.'s Kullback-Leibler Boosting algorithm (KL Boosting). Each of these Boosting algorithms focuses on particular parts of the basic framework and adopts new methods to improve them.
In recent years, the cascade classifier for face detection has proven very successful and efficient. For multi-view face detection (MVFD), however, the most straightforward way of extending the cascade framework is to train a separate cascade classifier for each view and then use the classifiers together as a whole, as in FIG. 1A. While Bo Wo, et al., discuss the possibilities of extending even such a simple framework to the problem of multiple views, two different approaches have also been proposed:
1. Pyramid Structure
In “Statistical Learning of Multi-View Face Detection”, ECCV 2002, Li, et al., proposed a pyramid-structured multi-view face detector to detect faces with various poses. As shown in FIG. 1B, the pyramid structure has only one node (main node) in the top layer, which covers ±90° of rotation out-of-plane (ROP), and three nodes (child nodes) in the second layer, which divide that space into three parts. The space is then subdivided layer by layer. The pyramid structure adopts a coarse-to-fine strategy to handle ROP pose variations. Because of the similarities that exist among faces in different poses, the pyramid method treats them as one ensemble positive class so as to improve the efficiency of facial feature extraction.
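By way of illustration only, the coarse-to-fine traversal of such a pyramid may be sketched as follows. This is a minimal sketch under stated assumptions and is not the implementation of Li, et al.: the names `PyramidNode`, `toy_classifier`, `build_pyramid` and `detect` are hypothetical, a sub-window is represented by a plain dictionary, and each node's trained detector is replaced by a toy rule that accepts a face whose ROP angle lies within the node's range.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Window = dict  # hypothetical stand-in for an image sub-window


@dataclass
class PyramidNode:
    lo: float  # lower bound of the covered ROP range, in degrees
    hi: float  # upper bound of the covered ROP range, in degrees
    accepts: Callable[[Window], bool]  # stand-in for a trained detector
    children: List["PyramidNode"] = field(default_factory=list)


def toy_classifier(lo: float, hi: float) -> Callable[[Window], bool]:
    # Assumption: a real node would hold a learned classifier. This toy rule
    # simply accepts a face whose ROP angle lies inside [lo, hi].
    return lambda w: w["is_face"] and lo <= w["rop"] <= hi


def build_pyramid(lo: float = -90.0, hi: float = 90.0,
                  depth: int = 2, branching: int = 3) -> PyramidNode:
    # The top node covers the whole range; each lower layer splits the
    # parent's range into `branching` equal parts.
    node = PyramidNode(lo, hi, toy_classifier(lo, hi))
    if depth > 1:
        step = (hi - lo) / branching
        node.children = [
            build_pyramid(lo + i * step, lo + (i + 1) * step,
                          depth - 1, branching)
            for i in range(branching)
        ]
    return node


def detect(node: PyramidNode, window: Window) -> List[Tuple[float, float]]:
    # Coarse-to-fine: a window rejected by a node is never shown to that
    # node's children. The result lists the finest ranges that accept the
    # window; an empty list means the window was rejected.
    if not node.accepts(window):
        return []
    if not node.children:
        return [(node.lo, node.hi)]
    hits: List[Tuple[float, float]] = []
    for child in node.children:
        hits.extend(detect(child, window))
    return hits


pyramid = build_pyramid()  # one main node and three child nodes
```

In this sketch a non-face window is rejected at the main node after a single evaluation, whereas a face window is passed on to every child node of the next layer.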
2. Decision Tree Structure
The decision tree structure is shown in FIG. 1C. In contrast to the above pyramid method, a decision tree method has been proposed in order to detect faces with various poses, and it teaches one solution to the issue of rotation-in-plane (RIP). The decision tree method puts emphasis upon the diversities among different poses, and the decision tree works as a pose estimator of RIP. Because the decision tree makes an imperative judgment on the pose, the time spent on pose estimation is reduced significantly.
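By way of illustration only, the difference between running one cascade per view in parallel (as in FIG. 1A) and first committing to a single view with a pose estimator (as in FIG. 1C) may be sketched as follows. This is a minimal sketch under stated assumptions: the four RIP bins in `VIEWS`, the toy `estimate_pose` rule, and the helper names are all hypothetical and are not taken from the cited works. A real pose estimator would be a trained decision tree, and its own cost is not counted here.

```python
VIEWS = [0, 90, 180, 270]  # hypothetical RIP bins in degrees (assumption)


def circular_distance(a, b):
    # Smallest angular distance between two angles, in degrees.
    d = abs(a - b) % 360
    return min(d, 360 - d)


def estimate_pose(window):
    # Toy stand-in for the decision tree: commit to exactly one RIP bin.
    return min(VIEWS, key=lambda v: circular_distance(window["rip"], v))


def make_cascade(view, stats, n_stages=3):
    # Each stage is a stand-in for a trained stage classifier of one view.
    # Every call is counted so the two strategies can be compared.
    def stage(window):
        stats["evals"] += 1
        return window["is_face"] and estimate_pose(window) == view
    return [stage] * n_stages


def run_cascade(stages, window):
    # all() stops at the first rejecting stage, as a cascade does.
    return all(stage(window) for stage in stages)


def detect_parallel(cascades, window):
    # One cascade per view, all of them applied to the window.
    results = {v: run_cascade(s, window) for v, s in cascades.items()}
    return [v for v, ok in results.items() if ok]


def detect_with_tree(cascades, window):
    # The pose estimator picks one view; only that view's cascade runs.
    view = estimate_pose(window)
    return [view] if run_cascade(cascades[view], window) else []


stats = {"evals": 0}
cascades = {v: make_cascade(v, stats) for v in VIEWS}
face = {"is_face": True, "rip": 100}
```

Calling `detect_parallel(cascades, face)` and `detect_with_tree(cascades, face)` in turn, with `stats["evals"]` reset to zero before each call, shows that both report the same view while the tree-guided variant evaluates fewer stages. The saving comes at a price: a wrong judgment by the pose estimator cannot be recovered, since the other views are never examined.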