1. Field of the Invention
The present invention relates to an image processing apparatus for processing image data using weak discriminators allocated in a tree structure.
2. Description of the Related Art
In recent years, with the progress of statistical learning methods, various practical image processing methods for detecting a target object in image data have been proposed. Face detection, in which the target object is a human face, has been the subject of particularly active research and development because its detection results have a wide range of applications.
For example, P. Viola and M. Jones, “Rapid Object Detection using a Boosted Cascade of Simple Features” (Proc. IEEE Conf. on Computer Vision and Pattern Recognition, Vol. 1, pp. 511-518, December 2001) (to be referred to as reference 1 hereinafter) proposes a method that implements front-view face detection by using ensemble learning such as AdaBoost to select very simple feature amounts, called rectangular features, and combining the selected features. Reference 1 achieves high-speed face detection by aborting the detection processing partway for regions that are obviously not faces.
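The early-abort idea can be illustrated with a minimal Python sketch. It assumes each stage is a list of weak discriminators whose scores are summed and compared with a stage threshold; the names `Stage`, `stump`, and `evaluate_cascade` are hypothetical, and plain feature vectors stand in for the rectangular features that reference 1 computes over an image sub-window.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical: a weak discriminator maps a window's feature vector to a score.
WeakDiscriminator = Callable[[Sequence[float]], float]


def stump(index: int, theta: float, alpha: float = 1.0) -> WeakDiscriminator:
    """Toy decision stump: votes +alpha if feature[index] > theta, else -alpha."""
    def h(window: Sequence[float]) -> float:
        return alpha if window[index] > theta else -alpha
    return h


@dataclass
class Stage:
    weak: List[WeakDiscriminator]
    threshold: float  # the window is rejected if the stage score is below this


def evaluate_cascade(stages: List[Stage], window: Sequence[float]) -> Tuple[bool, int]:
    """Return (is_face, number_of_stages_evaluated)."""
    for i, stage in enumerate(stages):
        score = sum(h(window) for h in stage.weak)
        if score < stage.threshold:
            return False, i + 1  # abort early: the remaining stages never run
    return True, len(stages)


stages = [
    Stage([stump(0, 0.5)], threshold=0.0),
    Stage([stump(1, 0.5), stump(2, 0.5)], threshold=0.0),
]
print(evaluate_cascade(stages, [0.0, 0.9, 0.9]))  # (False, 1): rejected at stage 1
print(evaluate_cascade(stages, [0.9, 0.9, 0.9]))  # (True, 2): passes every stage
```

Because a window that is clearly not a face can be rejected after only a few stages, the average cost per window stays well below the cost of evaluating every weak discriminator.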
Various proposals have also been made for detecting faces viewed from arbitrary directions, not only front views (that is, face detection of a plurality of types). For example, B. Wu, H. Ai, C. Huang, and S. Lao, “Fast Rotation Invariant Multi-View Face Detection Based on Real AdaBoost,” (Proc. Sixth Int'l Conf. Automatic Face and Gesture Recognition, pp. 79-84, 2004) (to be referred to as reference 2 hereinafter) discloses a method of configuring multi-view face detectors.
Moreover, M. Jones and P. Viola, “Fast Multi-View Face Detection,” (Mitsubishi Electric Research Laboratories TR2003-96, July 2003) (to be referred to as reference 3 hereinafter) discloses an arrangement comprising an identifier that identifies the face view and detectors that each target a specific face view. In this arrangement, the identifier first determines the face view, the detectors targeting the determined view are selected, and face detection is executed with the selected detectors, thereby implementing face detection for views in arbitrary directions.
In addition, C. Huang, H. Z. Ai, Y. Li, and S. H. Lao, “Vector Boosting for Rotation Invariant Multi-View Face Detection,” (Proc. 10th IEEE Int'l Conf. Computer Vision, 2005) (to be referred to as reference 4 hereinafter) and Japanese Patent Laid-Open No. 2005-284487 implement face detection for views in arbitrary directions by adopting a tree structure: detectors capable of detecting faces of views in all directions are placed before a branch point, processing starts from these detectors, and it then branches to detectors dedicated to specific face views.
However, Japanese Patent Laid-Open No. 2005-284487 and references 2 to 4, which implement face detection for views in arbitrary directions (face detection of a plurality of types), each suffer from the following problems with respect to the processing speed of face detection.
In Japanese Patent Laid-Open No. 2005-284487, for example, the branch destinations at a branch node of the tree structure are tried one at a time in sequence. More specifically, when a leaf node (terminal node) of a given branch destination is reached, the image data being processed is determined to be a “face of a view in the direction handled by that branch destination”, and the detection processing for that image data ends. When processing is instead aborted partway through one branch destination, control returns to the branch node and the next branch destination is selected.
In this manner, the branch node does not select a branch destination based on any criterion; it simply processes the branch destinations in their labeling order.
Consequently, to detect a face whose view is handled by a branch destination late in the labeling order, branch selection fails repeatedly (that is, processing is aborted partway through many other branch destinations) before that branch destination is reached. As a result, a great deal of processing time is spent before the leaf node of that branch destination is reached.
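The labeling-order branch selection described above behaves like a depth-first search with backtracking. The following minimal Python sketch assumes each branch destination is reduced to a single leaf node and uses toy threshold tests; the helpers `Node`, `above`, and `sequential_tree_search` and the view labels are hypothetical, not taken from Japanese Patent Laid-Open No. 2005-284487.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical node test: True means the window survives this node's stage.
StageTest = Callable[[Sequence[float]], bool]


def above(index: int, theta: float) -> StageTest:
    """Toy stage test: accept if feature[index] exceeds theta."""
    return lambda window: window[index] > theta


@dataclass
class Node:
    accept: StageTest
    children: List["Node"] = field(default_factory=list)
    label: Optional[str] = None  # face-view class, set on leaf nodes only


def sequential_tree_search(node: Node, window: Sequence[float],
                           stats: Dict[str, int]) -> Optional[str]:
    """Depth-first search that tries branch destinations in labeling order."""
    stats["evaluated"] += 1
    if not node.accept(window):
        stats["aborts"] += 1
        return None  # aborted: the caller falls back to its next branch
    if not node.children:
        return node.label  # leaf (terminal) node reached: detection ends
    for child in node.children:  # no selection criterion, just list order
        result = sequential_tree_search(child, window, stats)
        if result is not None:
            return result
    return None


root = Node(accept=lambda window: True, children=[
    Node(above(0, 0.5), label="left profile"),
    Node(above(1, 0.5), label="front"),
    Node(above(2, 0.5), label="right profile"),
])
stats = {"evaluated": 0, "aborts": 0}
print(sequential_tree_search(root, [0.0, 0.0, 0.9], stats), stats)
```

A window whose view belongs to the last branch pays for an aborted evaluation in every earlier branch. With realistic branches made of many stages, each of those aborts may occur only after several stages have run, which is where the processing time accumulates.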
Reference 2, on the other hand, adopts an arrangement in which face detection of views in arbitrary directions is executed using a plurality of detectors prepared for the respective face views. In its detection sequence, all the detectors first execute a small part of their processing to estimate their certainty factors, and detectors are selected according to the magnitudes of those estimated certainty factors. The selected detectors then execute the remaining processing to make the face/non-face discrimination.
Because reference 2 runs only the selected detectors to completion rather than always operating all of them, the processing time can be shortened.
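The estimate-then-complete sequence of reference 2 can be sketched as below. This minimal Python illustration assumes the certainty factor is simply the partial sum of each detector's first `k` weak-discriminator outputs and that exactly one detector is selected; reference 2 itself is based on Real AdaBoost, and the helpers `stump` and `estimate_then_complete` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Weak = Callable[[Sequence[float]], float]


def stump(index: int, theta: float, alpha: float = 1.0) -> Weak:
    """Toy weak discriminator: +alpha if feature[index] > theta, else -alpha."""
    return lambda window: alpha if window[index] > theta else -alpha


def estimate_then_complete(detectors: List[List[Weak]], window: Sequence[float],
                           k: int, threshold: float) -> Tuple[int, bool, int]:
    """Return (selected detector, is_face, number of weak evaluations)."""
    evaluations = 0
    partial: List[float] = []
    # Step 1: every detector runs only its first k weak discriminators.
    for weak_list in detectors:
        partial.append(sum(h(window) for h in weak_list[:k]))
        evaluations += len(weak_list[:k])
    # Step 2: pick the detector with the largest estimated certainty factor.
    best = max(range(len(detectors)), key=lambda i: partial[i])
    # Step 3: only the selected detector runs its remaining weak discriminators.
    total = partial[best] + sum(h(window) for h in detectors[best][k:])
    evaluations += len(detectors[best][k:])
    return best, total >= threshold, evaluations


# Three hypothetical view-specific detectors, each with four weak discriminators.
detectors = [[stump(view, 0.5) for _ in range(4)] for view in range(3)]
print(estimate_then_complete(detectors, [0.1, 0.9, 0.2], k=1, threshold=0.0))
```

Here 6 weak evaluations replace the 12 needed to run all three detectors to completion, but the 3 estimation evaluations are still paid for every window examined.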
However, in reference 2, the certainty factors of all the detectors must be estimated every time the detection sub-window is scanned to a new position in the image data to be processed. Since most regions of typical image data contain no face, estimating the certainty factors of all the detectors even for these regions is inefficient.
In reference 3, the identifier that identifies the face view must be run before the detectors targeting a specific face view are processed, so processing time must be allotted to running that identifier. As with reference 2, the identifier must be run every time the detection sub-window is scanned to a new position in the image data, resulting in poor efficiency.
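The identify-then-detect flow of reference 3, and the fact that the identifier runs at every scan position, can be sketched as follows. The sketch assumes each window is already reduced to a feature vector; the view labels, the toy identifier `identify`, the toy detectors, and the helper `scan_identify_then_detect` are hypothetical stand-ins for trained classifiers.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Window = Sequence[float]


def scan_identify_then_detect(
    windows: List[Window],
    identify_view: Callable[[Window], str],
    detectors: Dict[str, Callable[[Window], bool]],
) -> Tuple[List[Tuple[int, str]], int]:
    """Return ((position, view) detections, number of identifier calls)."""
    detections: List[Tuple[int, str]] = []
    identifier_calls = 0
    for pos, window in enumerate(windows):
        view = identify_view(window)  # paid at every position, face or not
        identifier_calls += 1
        if detectors[view](window):  # only the detector for that view runs
            detections.append((pos, view))
    return detections, identifier_calls


def identify(w: Window) -> str:
    """Toy identifier: the larger of two features decides the face view."""
    return "front" if w[0] >= w[1] else "profile"


windows = [[0.0, 0.0], [0.9, 0.1], [0.1, 0.9], [0.0, 0.1]]
detectors = {"front": lambda w: w[0] > 0.5, "profile": lambda w: w[1] > 0.5}
print(scan_identify_then_detect(windows, identify, detectors))
```

Two of the four positions contain no face, yet the identifier runs at all four.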
Furthermore, reference 4 formulates a framework called Vector Boosting as the procedure for selecting a branch destination at a branch node of the tree structure. In this framework, the degree of confidence, which is normally calculated as a scalar value, is instead calculated as a vector whose elements correspond to the branch destinations, and the branch destination is determined from that vector (a direction whose vector element exceeds a threshold is determined to be a branch direction).
However, this method requires far more computation at detection time than detectors using ordinary AdaBoost. Because a value that is normally calculated as a scalar is calculated as a vector, the computation volume increases in proportion to the number of dimensions of that vector.
In reference 4 as well, the Vector Boosting selection processing must be executed every time the detection sub-window is scanned to a new position in the image data. As in references 2 and 3, the selection processing is therefore executed even for regions in which no face is detected, resulting in poor efficiency.
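The vector-valued confidence of reference 4, and why its cost grows with the number of branch destinations, can be sketched as below. This is a simplified Python illustration, not the Vector Boosting training procedure: it assumes each weak discriminator outputs one vote per branch destination and that a branch is selected when its accumulated element exceeds a per-branch threshold. The helpers `vector_stump`, `vector_confidence`, and `select_branches` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

VectorWeak = Callable[[Sequence[float]], List[float]]


def vector_stump(index: int, theta: float, votes: List[float]) -> VectorWeak:
    """Toy weak discriminator returning one vote per branch destination."""
    def h(window: Sequence[float]) -> List[float]:
        return list(votes) if window[index] > theta else [-v for v in votes]
    return h


def vector_confidence(weak: List[VectorWeak], window: Sequence[float],
                      dims: int) -> Tuple[List[float], int]:
    """Accumulate the confidence vector; also count the additions performed."""
    conf = [0.0] * dims
    additions = 0
    for h in weak:
        out = h(window)
        for d in range(dims):  # dims additions per weak discriminator, not 1
            conf[d] += out[d]
            additions += 1
    return conf, additions


def select_branches(conf: Sequence[float], thresholds: Sequence[float]) -> List[int]:
    """Branch destinations whose confidence element exceeds its threshold."""
    return [d for d, (c, t) in enumerate(zip(conf, thresholds)) if c > t]


weak = [vector_stump(0, 0.5, [1.0, -1.0, 0.0]), vector_stump(1, 0.5, [0.0, 1.0, -1.0])]
conf, additions = vector_confidence(weak, [0.9, 0.1], dims=3)
print(conf, additions, select_branches(conf, [0.5, 0.5, 0.5]))
```

With three branch destinations, the two weak discriminators cost six additions, whereas a scalar-valued ensemble of the same size would need two.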
As described above, Japanese Patent Laid-Open No. 2005-284487 and references 2 to 4, which can perform face detection of views in arbitrary directions, suffer from the following problems.
First, the overhead of the processing required before the detectors corresponding to a face view are selected is large.
Second, the processing for selecting the detectors corresponding to a face view is executed a large number of times.
There is therefore a demand to overcome these problems and to improve the processing speed when detecting target objects of a plurality of types in image data.