1. Field of Application
The present invention relates to an image recognition apparatus which recognizes an object image within an input image by processing that applies a plurality of weak classifiers.
2. Description of Related Art
Driver support systems are known which utilize images (expressed as arrays of digitized luminance values) obtained from a camera installed in a vehicle to monitor the driver's eyes while the vehicle is being driven. Such a system can, for example, generate a warning to the driver when the driver's eyes enter a condition indicating a lapse of attention, such as closing of the eyes.
With such a driver support system, it is important that the apparatus can extract a face image (i.e., a region containing the eyes, nose and mouth of a person) from within an input image supplied from the camera, and use the face image to rapidly and reliably detect the position of the face within the input image, and thereby enable the condition of the eyes to be monitored. One type of apparatus known for performing such face position detection uses a cascade of mathematical operators known as weak classifiers, for example as described in U.S. Pat. No. 7,099,510 B2, referred to in the following as reference 1. Each weak classifier is trained beforehand by a learning procedure using a number of training images, e.g., using the AdaBoost technique, for detecting a pattern of main features of the recognition-subject image (eyes, nose, mouth, etc., in the case of a face image). With the method described in reference 1 (generally known as the Viola and Jones algorithm), recognition is based on simple rectangular features referred to as “Haar-like” features. A cascade of such weak classifiers is applied to an object image in sequence, to constitute a “strong classifier”.
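As a rough illustration of the Haar-like features and weak classifiers mentioned above, the following Python sketch computes a summed-area table (integral image), as used in the Viola and Jones algorithm, and evaluates one two-rectangle feature against a threshold to produce the 1 or 0 decision value of a weak classifier. The function names, the particular feature layout and the `threshold`/`polarity` parameters are illustrative assumptions; in an actual detector the feature position, threshold and polarity of each weak classifier would be selected by the boosting procedure.

```python
from typing import List

Image = List[List[int]]  # rows of digitized luminance values


def integral_image(img: Image) -> List[List[int]]:
    """Summed-area table padded with a leading zero row and column.

    ii[y][x] holds the sum of img[0..y-1][0..x-1].
    """
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii: List[List[int]], x: int, y: int, w: int, h: int) -> int:
    """Luminance sum of the w x h rectangle with top-left corner (x, y), in O(1)."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii: List[List[int]], x: int, y: int, w: int, h: int) -> int:
    """Hypothetical two-rectangle Haar-like feature: upper half minus lower half."""
    half = h // 2
    return rect_sum(ii, x, y, w, half) - rect_sum(ii, x, y + half, w, half)


def weak_classifier(ii, x, y, w, h, threshold: int, polarity: int = 1) -> int:
    """Decision stump on one feature: 1 if polarity * f < polarity * threshold, else 0."""
    f = two_rect_feature(ii, x, y, w, h)
    return 1 if polarity * f < polarity * threshold else 0


# Dark upper half (e.g. an eye region) above a brighter lower half (cheeks).
img = [[50] * 4, [50] * 4, [200] * 4, [200] * 4]
ii = integral_image(img)
print(two_rect_feature(ii, 0, 0, 4, 4))  # -1200
print(weak_classifier(ii, 0, 0, 4, 4, threshold=0))  # 1
```

Because every rectangle sum costs four table look-ups regardless of its size, each weak classifier is cheap to evaluate, which is what makes applying many of them per sub-image practical.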
When used to recognize a face image, such a cascade of weak classifiers is successively applied to a sub-image (the object image) extracted from an input image by a scanning window. Each weak classifier responds to the object image by producing a decision value of 1 or 0, indicating respectively whether or not a face image has been recognized.
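A single-scale scanning window of the kind mentioned above might be sketched as follows; each yielded sub-image would then be passed to the cascade of weak classifiers as the object image. The function name `scan_windows`, the square window and the fixed step are illustrative assumptions. Practical detectors such as that of Viola and Jones also scan at several window scales, which is omitted here.

```python
from typing import Iterator, List, Tuple

Image = List[List[int]]  # rows of digitized luminance values


def scan_windows(img: Image, win: int, step: int) -> Iterator[Tuple[int, int, Image]]:
    """Yield (x, y, sub_image) for each win x win window, moved `step` pixels at a time.

    Windows are visited row by row (y outer, x inner) and never extend
    past the image border.
    """
    h, w = len(img), len(img[0])
    for y in range(0, h - win + 1, step):
        for x in range(0, w - win + 1, step):
            sub = [row[x:x + win] for row in img[y:y + win]]
            yield x, y, sub


# A 6 x 6 test image scanned with a 4 x 4 window and a step of 2 gives 4 windows.
img = [[x + 10 * y for x in range(6)] for y in range(6)]
for x, y, sub in scan_windows(img, win=4, step=2):
    print((x, y), sub[0])
```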
Each decision value is multiplied by a weight that has been established (as a result of the boosting procedure) for the corresponding weak classifier. Basically, the assigned weights reflect the recognition accuracies of the respective weak classifiers, so that weak classifiers having a low recognition accuracy (but a high processing speed) are assigned relatively low weights. Each time the object image has been newly evaluated by a weak classifier of the cascade, a judgement is made as to the likelihood that the recognition-subject image has been recognized. This judgement is based on the weighted decision values obtained up to that point (e.g., by comparing their total with a threshold value). If the likelihood is insufficient, processing of that object image is immediately halted, the results obtained are discarded, and the next sub-image is extracted and similarly processed as an object image.
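The weighted evaluation with early rejection described above can be sketched as follows. This is a minimal illustration, not the method of reference 1 or of the present invention: the hypothetical function `evaluate_cascade` assumes that each weak classifier is a callable returning 0 or 1 and that a separate rejection threshold is checked after every classifier. The weights and thresholds in the example are made-up values.

```python
from typing import Callable, Sequence, Tuple

# A weak classifier maps an object image (sub-image) to a decision value of 0 or 1.
WeakClassifier = Callable[[object], int]


def evaluate_cascade(
    sub_image: object,
    classifiers: Sequence[WeakClassifier],
    weights: Sequence[float],
    thresholds: Sequence[float],
) -> Tuple[bool, float, int]:
    """Apply weak classifiers in sequence, rejecting as soon as the score is too low.

    After classifier k, the running total of weighted decision values is
    compared with thresholds[k]; if it falls below, evaluation stops and the
    remaining (later-stage) classifiers are never applied.
    Returns (accepted, score, number_of_classifiers_applied).
    """
    score = 0.0
    for k, (clf, alpha, theta) in enumerate(zip(classifiers, weights, thresholds)):
        score += alpha * clf(sub_image)
        if score < theta:
            return False, score, k + 1  # rejected; result discarded
    return True, score, len(classifiers)


W = [0.25, 0.25, 0.75, 1.0]   # made-up weights: later stages weigh more
T = [0.125, 0.25, 0.75, 1.5]  # made-up per-stage rejection thresholds

# Every weak classifier votes 1: accepted after all four stages.
print(evaluate_cascade(None, [lambda _: 1] * 4, W, T))  # (True, 2.25, 4)

# Only the first weak classifier votes 0: rejected after one stage.
stubs = [lambda _, d=d: d for d in (0, 1, 1, 1)]
print(evaluate_cascade(None, stubs, W, T))  # (False, 0.0, 1)
```

In the second call the first weak classifier returns 0 (as might happen if, for instance, the eyes are hidden by sunglasses), so the object image is rejected after a single stage, even though the weighted total over all four classifiers would have been 2.0, above the final threshold of 1.5.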
However, with this type of image recognition apparatus, it may be difficult to perform recognition reliably when there are large variations between instances of a recognition-subject image. In the case of face recognition, for example, all face images share a set of main features, i.e., eyes, nose, mouth, etc. However, in some instances, such as when the person is wearing a head scarf which partially covers the forehead, or sunglasses covering the eyes, accurate recognition may be difficult. Similarly, recognition may be unreliable if there are large intrinsic differences in features between instances, e.g., variations in mouth shape, facial expression, etc. This unreliability arises because:
(1) such instances of a face image are likely to be rejected in the early stages of the sequence of applying the weak classifiers, and
(2) erroneous recognition may occur because, in applying the weak classifiers, no consideration is given to differences in the way the respective weak classifiers tend to respond to specific accessories such as a head scarf, or to large differences in facial features; that is, the operation does not take into account the different sensitivities of the individual weak classifiers to such accessories or to such differences in facial features.
This is a basic problem of the prior art.
In many cases of incorrect rejection, if the processing of the object image had been continued to the later stages of the sequence (where the main features of a face are detected with higher accuracy), recognition would probably have been achieved.
Hence, such a prior art image recognition apparatus has two disadvantages: evaluation of an object image by the weak classifiers in sequence is immediately halted if it is judged (based only on the results obtained up to that point in the sequence) that the object image is unlikely to correspond to the recognition-subject image, and the differing responses of the respective weak classifiers to specific accessories such as a head scarf, or to large differences in facial features, are not taken into consideration. As a result of the first disadvantage, the object image cannot be evaluated by the later-stage weak classifiers, even though these may have a higher recognition accuracy than the early-stage weak classifiers. As a result of the second, an incorrect sub-image may be assigned a high probability of being a recognition-subject image by the complete cascade of weak classifiers, because one or more of the weak classifiers of the cascade has an excessively high sensitivity to some specific feature appearing in that sub-image. For example, such a weak classifier may have an excessive tendency to misidentify the boundary between a head scarf and the forehead as the position of the eyes in a face image. Consequently, a sub-image which should be rejected at an intermediate stage of the cascade may instead reach the final stage, and thus be erroneously assigned a high probability of being a recognition-subject image.
For these reasons, it is difficult to achieve sufficient generality of recognition and, in particular, to achieve reliable detection of the face position within an input image.
In the case of face images that include a specific secondary feature such as a head scarf or sunglasses in addition to the main features (referred to in the following as “accessory-disposed face images”), it might be envisaged that the problem could be overcome by retraining any weak classifier which has excessive sensitivity to such a secondary feature, using training images which include instances of accessory-disposed faces. However, with such a prior art apparatus, in which the weak classifiers are applied in a fixed, predetermined cascade, retraining any one of the weak classifiers affects the weak classifiers located downstream of it in the cascade sequence. It is therefore not possible simply to retrain a specific weak classifier; instead, all of the weak classifiers of the cascade would need to be retrained.
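Why retraining one weak classifier disturbs those downstream of it can be seen from the re-weighting step of boosting. The sketch below follows the discrete AdaBoost update used by Viola and Jones, in which training samples correctly classified by weak classifier t have their weights multiplied by β = ε / (1 - ε), ε being the weighted error, before weak classifier t + 1 is trained. The function name `next_sample_weights` and the four-sample example are hypothetical.

```python
from typing import List, Sequence


def next_sample_weights(
    weights: Sequence[float], predictions: Sequence[int], labels: Sequence[int]
) -> List[float]:
    """One Viola-Jones-style AdaBoost re-weighting step.

    Correctly classified samples are multiplied by beta = eps / (1 - eps),
    then all weights are renormalized. The result is the sample distribution
    on which the *next* weak classifier in the sequence would be trained.
    """
    total = sum(weights)
    w = [wi / total for wi in weights]
    # Weighted error of the current weak classifier.
    eps = sum(wi for wi, p, y in zip(w, predictions, labels) if p != y)
    if not 0.0 < eps < 0.5:
        raise ValueError("weak classifier error must lie in (0, 0.5)")
    beta = eps / (1.0 - eps)
    updated = [wi * (beta if p == y else 1.0) for wi, p, y in zip(w, predictions, labels)]
    norm = sum(updated)
    return [u / norm for u in updated]


labels = [1, 1, 0, 0]  # 1 = face, 0 = non-face
uniform = [0.25] * 4
# Original classifier t wrongly accepts non-face sample 3.
before = next_sample_weights(uniform, [1, 1, 0, 1], labels)
# "Retrained" classifier t rejects sample 3 but now wrongly rejects face sample 1.
after = next_sample_weights(uniform, [1, 0, 0, 0], labels)
print(before)
print(after)
```

The first call yields weights of about [1/6, 1/6, 1/6, 1/2] and the second about [1/6, 1/2, 1/6, 1/6], so weak classifier t + 1 would be trained on a different emphasis in each case, and the change propagates to every classifier after it.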
Thus, with the prior art, it would be necessary to repeat the training process for all of the weak classifiers using a large number of training images (for example, approximately 20,000 training images are used in an embodiment described in reference 1). This has the disadvantage of requiring a substantial amount of time and effort.
Moreover, if such retraining of all of the weak classifiers were performed without special emphasis on recognition of accessory-disposed face images (i.e., without using a large proportion of training images that are instances of accessory-disposed faces), then such images might still not be recognized with sufficient reliability.
On the other hand, if the weak classifiers were retrained with particular emphasis on accessory-disposed face images, the system might thereby become incapable of reliably recognizing ordinary face images (without such accessories). There would thus be a danger of decreased generality.