1. Field of the Invention
The present invention relates to an image processing apparatus, image data processing method, and storage medium. Particularly, the present invention relates to a technique of detecting a specific object captured in an image.
2. Description of the Related Art
An image processing method of automatically detecting a specific object pattern from an image is useful and can be applied to, for example, determination of a human face. This method can be used in many fields including teleconferencing, man-machine interfaces, security systems, monitoring systems for tracking a human face, and image compression. As a technique of detecting a face from an image, various methods are described in non-patent literature 1 (M. H. Yang, D. J. Kriegman and N. Ahuja, “Detecting Faces in Images: A Survey,” IEEE Trans. on PAMI, vol. 24, no. 1, pp. 34-58, January, 2002).
In particular, an AdaBoost-based method by Viola et al., which is described in non-patent literature 2 (P. Viola and M. Jones, “Robust Real-time Object Detection,” in Proc. of IEEE Workshop SCTV, July, 2001), is widely used in face detection research because of its high execution speed and high detection ratio. The AdaBoost-based method increases the classification process speed by series-connecting different simple classification process filters and, when the target region is determined to be a non-face region during the classification process, aborting the subsequent classification process. To further increase the classification process speed, classification processes may be executed in parallel or pipelined. However, because the process is aborted in accordance with the contents of the image data or the like, the classification process is difficult to schedule. Especially when the classification process is implemented by hardware, limited resources need to be utilized at high efficiency, so the difficulty of scheduling becomes a serious challenge.
This problem will be explained in more detail by exemplifying a face detection process using a Boosting algorithm. FIG. 2 is a view showing a face detection process algorithm using the Boosting algorithm. In FIG. 2, a window image 201 is a partial image which forms part of the input image data. The window image 201 contains an image region to be referred to by weak classifiers 210 to 250 (to be described later).
The weak classifier 210 refers to some or all pixel values of the window image 201, performs a predetermined calculation process for the pixel values, and determines whether the window image 201 contains a detection target object (for example, a human face region). Parameters such as coefficients used in the predetermined calculation are determined by machine learning prior to an actual detection process.
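The behavior of a single weak classifier can be sketched as follows. This is only an illustrative sketch: the text does not specify the predetermined calculation, so the weighted sum over a rectangular reference region compared with a threshold, as well as the names WeakClassifier, evaluate, coefficient, and threshold, are assumptions introduced here. The parameter values are arbitrary and stand in for values that would be obtained by machine learning.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class WeakClassifier:
    # Hypothetical parameters; in practice they are fixed by machine learning
    # before the detection process starts.
    top: int            # reference position (row) inside the window image
    left: int           # reference position (column) inside the window image
    height: int         # reference range (rows)
    width: int          # reference range (columns)
    coefficient: float  # weight applied to the region sum (assumed calculation)
    threshold: float    # decision threshold (assumed calculation)

    def evaluate(self, window: Sequence[Sequence[float]]) -> bool:
        """Return True (PASS) if the weighted region sum reaches the threshold."""
        total = 0.0
        for row in window[self.top:self.top + self.height]:
            total += sum(row[self.left:self.left + self.width])
        return self.coefficient * total >= self.threshold


# A 4x4 window image with arbitrary pixel values.
window = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]

# The classifier refers only to the 2x2 region at the top-left corner.
classifier = WeakClassifier(top=0, left=0, height=2, width=2,
                            coefficient=1.0, threshold=10.0)
result = classifier.evaluate(window)  # region sum is 1 + 2 + 5 + 6 = 14
```

In this sketch, another weak classifier is obtained simply by constructing the same class with a different reference position, reference range, coefficient, and threshold.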
The weak classifiers 211 to 250 can execute almost the same calculation process as that by the weak classifier 210, except that the reference position and reference range within the window image 201, and parameters such as coefficients used in the calculation, preferably differ from those of the weak classifier 210. In FIG. 2, N weak classifiers (weak classifiers 210 to 250) are series-connected. Each of the weak classifiers 210 to 250 determines PASS or NG in accordance with the calculation result. The process starts from the 0th weak classifier 210, and if the determination result is PASS, the next weak classifier 211 performs the determination process. In this way, the process proceeds sequentially. If the determination result of the weak classifier 250 serving as the final weak classifier is also PASS, it is determined that the window image 201 contains the detection target object (face). If NG is determined in the determination process by one of the N weak classifiers, the subsequent process is aborted, and it is determined that the window image 201 does not contain the detection target object (face).
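The series connection with abort on NG described for FIG. 2 can be sketched as follows. The function run_cascade and the helper make_stage are hypothetical names, and the per-stage test (one pixel compared with a threshold) is an arbitrary stand-in for the learned calculation of each weak classifier.

```python
from typing import Callable, List, Sequence, Tuple

Window = Sequence[Sequence[float]]
Stage = Callable[[Window], bool]  # True means PASS, False means NG


def run_cascade(window: Window, stages: List[Stage]) -> Tuple[bool, int]:
    """Run the stages in order and abort at the first NG.

    Returns (detected, number of weak classifiers actually evaluated).
    """
    evaluated = 0
    for stage in stages:
        evaluated += 1
        if not stage(window):
            # NG: the subsequent process is aborted.
            return False, evaluated
    # Every stage, including the final one, returned PASS.
    return True, evaluated


def make_stage(row: int, col: int, threshold: float) -> Stage:
    # Hypothetical weak classifier: compares a single pixel with a threshold.
    return lambda window: window[row][col] >= threshold


stages = [make_stage(0, 0, 1), make_stage(0, 1, 2), make_stage(1, 1, 3)]

face_like = [[5, 5], [5, 5]]
background = [[5, 0], [5, 5]]

face_result = run_cascade(face_like, stages)         # all three stages run
background_result = run_cascade(background, stages)  # aborted at the second stage
```

The second element of the returned tuple shows that the amount of processing depends on the contents of the window image, which is the property that later complicates scheduling.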
FIG. 3 exemplifies a case in which an output from each weak classifier is not a binary value of PASS or NG, but a ternary value of OK (partial image contains a detection target object (face)), PASS, or NG. In FIG. 3, the same reference numerals as those in FIG. 2 denote the same parts, and a description thereof will not be repeated. Weak classifiers 310 to 350 are different from the weak classifiers 210 to 250 in FIG. 2 in that the output is a ternary value, as described above. If it is determined in the determination process by one of the N weak classifiers 310 to 350 that the partial image contains a detection target object (face) (determination result “OK”), the determination process is aborted because no subsequent determination process need be performed. This is different from the weak classifiers explained with reference to FIG. 2. The remaining process by the N weak classifiers 310 to 350 is the same as that by the weak classifiers described with reference to FIG. 2.
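The ternary variant described for FIG. 3 can be sketched in the same manner. The names Verdict, run_ternary_cascade, and make_stage are hypothetical, and the scoring rule (sum of all pixels compared with a lower and an upper threshold) is an assumption made only to produce the three outputs. A final PASS is treated as a detection, on the assumption that the remaining process matches the one described for FIG. 2.

```python
from enum import Enum
from typing import Callable, List, Sequence, Tuple


class Verdict(Enum):
    OK = "OK"      # window contains the detection target; stop immediately
    PASS = "PASS"  # undecided; hand over to the next weak classifier
    NG = "NG"      # window does not contain the target; stop immediately


Window = Sequence[Sequence[float]]
Stage = Callable[[Window], Verdict]


def run_ternary_cascade(window: Window, stages: List[Stage]) -> Tuple[bool, int]:
    """Return (detected, number of weak classifiers actually evaluated)."""
    evaluated = 0
    for stage in stages:
        evaluated += 1
        verdict = stage(window)
        if verdict is Verdict.OK:
            return True, evaluated
        if verdict is Verdict.NG:
            return False, evaluated
    # The final weak classifier returned PASS (treated as a detection).
    return True, evaluated


def make_stage(low: float, high: float) -> Stage:
    # Hypothetical weak classifier based on the sum of all pixel values.
    def stage(window: Window) -> Verdict:
        score = sum(sum(row) for row in window)
        if score >= high:
            return Verdict.OK
        if score < low:
            return Verdict.NG
        return Verdict.PASS
    return stage


stages = [make_stage(2, 100), make_stage(5, 20), make_stage(8, 50)]

early_ok = run_ternary_cascade([[10, 5], [5, 5]], stages)  # sum 25: OK at stage 2
early_ng = run_ternary_cascade([[1, 0], [0, 0]], stages)   # sum 1: NG at stage 1
full_run = run_ternary_cascade([[4, 2], [2, 2]], stages)   # sum 10: PASS throughout
```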
The face detection process is implemented by repetitively executing the determination process by the weak classifiers. At this time, if the weak classifiers are configured so that as many determination results as possible are finalized at the beginning of the series of weak classifiers, the expected value of the total process count of the weak classifiers can be kept small. A reduction in the calculation amount of the weak classifiers and an increase in process speed can also be expected as a whole.
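The effect on the expected process count can be illustrated with a small calculation for the binary (PASS or NG) case. The sketch assumes that the i-th weak classifier returns PASS with a known probability p_i for a window that reaches it, so that the expected number of evaluated weak classifiers per window is 1 + p_0 + p_0 p_1 + ... The function name expected_evaluations and the probabilities below are illustrative assumptions, not measured values.

```python
from typing import Sequence


def expected_evaluations(pass_rates: Sequence[float]) -> float:
    """Expected number of weak classifiers evaluated per window.

    pass_rates[i] is the assumed probability that stage i returns PASS
    for a window that has reached stage i.
    """
    expected = 0.0
    reach_probability = 1.0  # probability that the current stage is reached
    for rate in pass_rates:
        expected += reach_probability
        reach_probability *= rate
    return expected


# Same three stages, ordered so that most windows are rejected early...
reject_early = expected_evaluations([0.1, 0.5, 0.9])  # 1 + 0.1 + 0.05
# ...or so that most windows are rejected late.
reject_late = expected_evaluations([0.9, 0.5, 0.1])   # 1 + 0.9 + 0.45
```

Under these assumed probabilities, the ordering that finalizes most results at the first stage requires roughly 1.15 evaluations per window on average, against roughly 2.35 for the reverse ordering.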
In terms of the algorithm, the average calculation amount can be reduced in the above way. However, when processes are pipelined to further increase the speed, it becomes difficult to schedule the processes because the process amounts for the respective windows are not uniform.
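The consequence of non-uniform process amounts can be illustrated with a deliberately simplified model. The sketch below does not model any particular pipeline. It assumes a hypothetical arrangement in which a fixed number of processing lanes each take one window, and the next group of windows cannot start until the slowest window in the current group has finished. The function name lockstep_utilization and the per-window process counts are assumptions chosen for illustration.

```python
from typing import Sequence


def lockstep_utilization(process_counts: Sequence[int], lanes: int) -> float:
    """Fraction of lane time slots doing useful work under the lock-step model.

    process_counts[i] is the number of weak classifiers evaluated for window i.
    """
    if not process_counts or lanes <= 0:
        raise ValueError("need at least one window and one lane")
    busy_slots = sum(process_counts)
    total_slots = 0
    for start in range(0, len(process_counts), lanes):
        group = process_counts[start:start + lanes]
        # Every lane is occupied until the slowest window of the group ends.
        total_slots += max(group) * lanes
    return busy_slots / total_slots


uniform = lockstep_utilization([3, 3, 3, 3], lanes=2)      # equal process amounts
non_uniform = lockstep_utilization([1, 1, 8, 2], lanes=2)  # same total, uneven
```

Both lists contain the same total of 12 evaluations, yet in this model the uneven case leaves one third of the time slots idle, because the lane that finishes after 2 evaluations waits for the lane that needs 8.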