In recent years, various methods for extracting specific objects, such as human faces, from image data have been proposed, and are being put to actual use.
Among such methods, the method disclosed in Viola, P. & Jones, M.'s “Rapid Object Detection using a Boosted Cascade of Simple Features” (Proc. of Computer Vision and Pattern Recognition, December 2001, IEEE Computer Society, pp. 511-518) is garnering attention due to its high speed (this document shall be referred to as Document 1 hereinafter). This method cascade-connects classifiers, each made up of a group of multiple weak classifiers generated using a learning algorithm with boosting, and carries out processing while judging at each classifier whether to terminate the processing. Note that the details of learning algorithms with boosting are disclosed in, for example, Yoav Freund and Robert E. Schapire's “A decision-theoretic generalization of on-line learning and an application to boosting” (in Eurocolt '95, Computational Learning Theory; Springer-Verlag, 1995, pp. 23-37).
FIG. 24 illustrates the overall structure used for this technique. Reference numerals 2401 to 240n (where n is a natural number) denote classifiers (also called “stages”) generated through learning, and each classifier is made up of, for example, multiple rectangular filters whose processing load is low. Each of the rectangular filters used here is generally called a “weak classifier” because its classification capability is not very high.
FIG. 25 is a diagram illustrating a rectangular filter for extracting a specific object. 2501a to 2501c are examples of image blocks from which extraction is to be performed using the rectangular filters, and are partial images of a predetermined size cut out from the overall image data.
FIG. 26, meanwhile, is a diagram illustrating the image data to be processed. In this diagram, 2601 indicates a single frame's worth of the image data to be processed. Meanwhile, 2602 indicates a processing block, which is the unit in which that single frame's worth of image data is actually processed; the processing block is a partial image whose size corresponds to the size of the rectangular filters 2502a to 2502c. Characteristics of a local region within the partial image are extracted by using the rectangular filters 2502a to 2502c to calculate the difference between the sum of the region data in the white regions and the sum of the region data in the black regions.
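The rectangular-filter calculation described above can be illustrated with a minimal Python sketch. It is not the implementation of Document 1: the function names, the (top, left, height, width) rectangle layout, the sample pixel values, and the sign convention (white sum minus black sum) are all assumptions made for illustration.

```python
def region_sum(block, top, left, height, width):
    """Sum the pixel values inside one rectangle of a 2-D block (list of rows)."""
    return sum(
        block[y][x]
        for y in range(top, top + height)
        for x in range(left, left + width)
    )


def rect_filter_response(block, white_rects, black_rects):
    """Return the white-region sum minus the black-region sum (assumed sign)."""
    white = sum(region_sum(block, *rect) for rect in white_rects)
    black = sum(region_sum(block, *rect) for rect in black_rects)
    return white - black


# Hypothetical 4x4 partial image: bright left half, dark right half.
block = [
    [9, 9, 1, 1],
    [9, 9, 1, 1],
    [9, 9, 1, 1],
    [9, 9, 1, 1],
]

# Hypothetical two-rectangle filter: white covers the left half,
# black covers the right half. Each rectangle is (top, left, height, width).
white_rects = [(0, 0, 4, 2)]
black_rects = [(0, 2, 4, 2)]

response = rect_filter_response(block, white_rects, black_rects)
print(response)
```

A large absolute response suggests that the local region contains the light-to-dark pattern the filter encodes, while a uniform region yields a response of zero.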
A specific object within the image data 2601 is extracted by sequentially scanning the entirety of the image data 2601 in units of the partial image 2602, using a predetermined step size, and processing the image data one unit of processing at a time.
Each of the classifiers 2401 to 240n accumulates evaluation values, which are the outputs of the determination results of the individual rectangular filters, and determines whether or not the specific object is present by performing a thresholding process using an identification threshold. As described above, the classifiers 2401 to 240n are cascade-connected, and each classifier advances the processing to the following classifier only when it has determined that the specific object is present in the partial image 2602 of the unit of processing.
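As a rough illustration of this scan, the following Python sketch enumerates the top-left corner of each processing block in raster order. The function name, the square block, and the sample dimensions are hypothetical, and the sketch assumes the image is at least as large as the block.

```python
def scan_positions(image_width, image_height, block_size, step):
    """Yield the (x, y) top-left corner of every processing block, row by row.

    Assumes a square block and an image at least as large as the block.
    """
    for y in range(0, image_height - block_size + 1, step):
        for x in range(0, image_width - block_size + 1, step):
            yield x, y


# Hypothetical values: an 8x6 image, a 4x4 processing block, a step of 2 pixels.
positions = list(scan_positions(8, 6, 4, 2))
print(positions)
```

Each yielded position identifies one partial image that would then be passed to the classifiers.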
In this manner, the stated method determines whether the partial image of the unit of processing is the specific object or is not the specific object at each classifier, also called a stage; in the case where it is determined that the partial image is not the specific object, the computations are ended immediately. With actual image data, the partial image is often determined to not be the specific object in the initial stages, and therefore a high-speed extraction process can be implemented.
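The stage-by-stage processing with early termination can be sketched as follows. This is a simplified illustration under stated assumptions, not the method of Document 1 as published: weak classifiers are modeled as plain Python functions that return an evaluation value, and the toy thresholds and blocks are hypothetical.

```python
def run_stage(weak_classifiers, identification_threshold, block):
    """Accumulate the weak classifiers' evaluation values and threshold the total."""
    accumulated = sum(weak(block) for weak in weak_classifiers)
    return accumulated >= identification_threshold


def run_cascade(stages, block):
    """Return (is_object, stages_evaluated) for one partial image.

    Processing ends at the first stage that rejects the block.
    """
    for index, (weak_classifiers, threshold) in enumerate(stages, start=1):
        if not run_stage(weak_classifiers, threshold, block):
            return False, index
    return True, len(stages)


# Hypothetical two-stage cascade over a flat list of pixel values.
stages = [
    ([lambda b: 1.0 if max(b) > 5 else 0.0], 1.0),
    (
        [
            lambda b: 1.0 if sum(b) > 20 else 0.0,
            lambda b: 1.0 if min(b) > 0 else 0.0,
        ],
        2.0,
    ),
]

rejected_early = run_cascade(stages, [1, 2, 3])
rejected_late = run_cascade(stages, [9, 0, 1])
accepted = run_cascade(stages, [9, 9, 9])
print(rejected_early, rejected_late, accepted)
```

The second element of each result shows how many stages were evaluated, which makes the saving from early rejection visible: a block rejected at the first stage never reaches the weak classifiers of later stages.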
Here, consider a case where a function for extracting a specific object using a method such as that described in Document 1 is implemented in an embedded device or the like. Such a case requires the tradeoff between the extraction accuracy and the extraction process speed to be adjusted based on the purpose of the extraction and the computational performance of the embedded device.
A case where a unit implemented through hardware (or an integrated circuit) is installed, as a common device, in multiple embedded devices whose specifications differ from one another can be given as one example. In such a case, it is desirable to adjust the tradeoff between the extraction accuracy and the extraction process speed based on the operational clocks, use conditions, and so on of the embedded devices in which the unit is installed.
Furthermore, even when the unit is installed in identical embedded devices, there are cases where the required extraction accuracy and extraction process time differ depending on the type of applications installed in the embedded devices. The performance of the embedded devices can also be optimized in such cases if the tradeoff between the extraction accuracy and the extraction process speed can be adjusted.
Such situations have conventionally been addressed by reducing the resolution of the image data to be processed, by increasing the step size used when scanning the image data in units of the partial image, and so on. For example, Document 1 discloses a method for changing the step size.
However, these approaches have the problem that the tradeoff between the extraction accuracy and the extraction process speed cannot be adjusted flexibly. For example, when attempting to control the extraction process speed by changing the resolution, it is necessary to convert the inputted image data to the corresponding resolution. It is further necessary to prepare a separate set of classifiers for each resolution.
Moreover, even when the step size is changed, the available tradeoff points (that is, appropriate combinations of extraction accuracy and extraction process speed) are limited in type (for example, the step size can only be set in units of n pixels), resulting in the problem that control cannot be implemented in a flexible manner.
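The coarseness of step-size control can be seen from a simple count of processing blocks. The sketch below is illustrative only: the 640x480 frame, the 24-pixel block, and the function name are hypothetical values chosen for the example.

```python
def window_count(image_width, image_height, block_size, step):
    """Count the processing blocks visited for a given integer step size.

    Assumes a square block and an image at least as large as the block.
    """
    per_row = (image_width - block_size) // step + 1
    per_column = (image_height - block_size) // step + 1
    return per_row * per_column


# Hypothetical frame and block sizes.
counts = {step: window_count(640, 480, 24, step) for step in (1, 2, 3, 4)}
baseline = counts[1]
for step, count in counts.items():
    print(step, count, round(count / baseline, 3))
```

Under these assumptions, going from a step of 1 pixel to a step of 2 pixels cuts the number of blocks to roughly a quarter, and no integer step size provides a workload between those two points.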
Such a situation can conceivably be addressed as disclosed in Japanese Patent Laid-Open No. 2005-100121, which takes into consideration the amount of computation that can be performed by the embedded device to which the unit is to be applied. Here, learning is performed in advance using multiple classifiers, and a group of classifiers suited to the embedded device is selected from multiple groups of classifiers when performing the extraction process. However, in this case, it is necessary to prepare a different classifier group for each tradeoff point in advance, resulting in the problem that more resources, such as memory, will be required. This is particularly problematic in cases where the amount of parameter information is large, such as when a large number of classifiers are to be configured.