There are known driver assistance systems that monitor the driver's eyes using images picked up by an in-vehicle camera to detect the driver's inattentive driving and oversights, and to alert the driver to them. Specifically, these driver assistance systems are designed to extract face images, each containing a facial region (facial pattern), from sequentially inputted images captured by an in-vehicle camera; the facial region contains predetermined facial features, such as the right and left eyes, nose, and mouth of the driver. These driver assistance systems are also designed to detect the location of the face within the face image. In order to meet demands for improved vehicle safety, it is important for these driver assistance systems to detect the position of the face within the face image immediately and with high accuracy.
Boosting, which trains a plurality of weak classifiers, is well known as a machine-learning algorithm. Using the combination of the boosted weak classifiers to recognize a target image, such as a face image, in a plurality of images ensures the accuracy and robustness of target-image recognition.
Let us describe the boosting algorithm assuming that an input image (expressed as an array of pixels) is given as x, the output of the n-th trained weak classifier among a plurality of weak classifiers is given as fn(x), the weight or importance given to the n-th trained weak classifier is given as wn, and the number of weak classifiers is given as Nf.
In the boosting algorithm, a score S1:Nf(x) as the summation of the outputs fn(x) (n=1, 2, . . . , Nf) of the trained weak classifiers respectively weighted by the weights wn (n=1, 2, . . . , Nf) is expressed as the following equation [1]:
$$S_{1:N_f}(x) = \sum_{n=1}^{N_f} w_n f_n(x) \qquad [1]$$
where the weights wn are normalized to meet the following equation [2]:
$$\sum_{n=1}^{N_f} w_n = 1 \qquad [2]$$
Then, the output F1:Nf(x) of the combination of the trained weak classifiers is determined based on the score S1:Nf(x) in accordance with the following equation [3]:
$$F_{1:N_f}(x) = \begin{cases} 1 : & S_{1:N_f}(x) \ge 0.5 \\ 0 : & S_{1:N_f}(x) < 0.5 \end{cases} \qquad [3]$$
Specifically, when the score S1:Nf(x) of the outputs fn(x) (n=1, 2, . . . , Nf) of the trained weak classifiers respectively weighted by the weights wn (n=1, 2, . . . , Nf) is equal to or greater than a threshold value of 0.5, the final output of the combination of the trained weak classifiers is a value “1” of TRUE; this value “1” indicates that the input image x is likely to be a face image. That is, the boosting algorithm identifies that the input image x is a target image that is likely to contain a facial region.
Otherwise, when the score S1:Nf(x) of the outputs fn(x) (n=1, 2, . . . , Nf) of the trained weak classifiers respectively weighted by the weights wn (n=1, 2, . . . , Nf) is less than the threshold value of 0.5, the final output of the combination of the trained weak classifiers is a value “0” of FALSE; this value “0” indicates that the input image x is not likely to be a face image. That is, the boosting algorithm identifies that the input image x is not a target image that is likely to contain a facial region.
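The weighted score of equation [1] and the threshold decision of equation [3] can be illustrated with a minimal sketch. The function names, the toy weak classifiers, and the weights below are hypothetical and chosen only for illustration; the sketch also assumes that each weak classifier outputs 0 or 1, which the description above does not state explicitly.

```python
from typing import Callable, Sequence

Image = Sequence[float]                  # assumption: image as a flat array of pixel values
WeakClassifier = Callable[[Image], int]  # assumption: each weak classifier returns 0 or 1


def boosted_score(x: Image,
                  classifiers: Sequence[WeakClassifier],
                  weights: Sequence[float]) -> float:
    """Score of equation [1]: sum of weak-classifier outputs weighted by wn."""
    if len(classifiers) != len(weights):
        raise ValueError("one weight is required per weak classifier")
    # Equation [2]: the weights must be normalized to sum to 1.
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * f(x) for f, w in zip(classifiers, weights))


def boosted_decision(x: Image,
                     classifiers: Sequence[WeakClassifier],
                     weights: Sequence[float],
                     threshold: float = 0.5) -> int:
    """Output of equation [3]: 1 (TRUE) if the score is at least 0.5, else 0 (FALSE)."""
    return 1 if boosted_score(x, classifiers, weights) >= threshold else 0


# Hypothetical toy weak classifiers that threshold individual pixel values.
toy_classifiers = [
    lambda x: int(x[0] > 0.5),
    lambda x: int(x[1] > 0.5),
    lambda x: int(sum(x) / len(x) > 0.5),
]
toy_weights = [0.5, 0.3, 0.2]  # illustrative values that sum to 1

if __name__ == "__main__":
    print(boosted_decision([0.9, 0.9, 0.9], toy_classifiers, toy_weights))
    print(boosted_decision([0.1, 0.1, 0.1], toy_classifiers, toy_weights))
```

In this sketch every weak classifier is evaluated for every input image, so the cost of one decision grows with the number Nf of weak classifiers.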
As described above, the boosting algorithm requires a large number of weak classifiers in order to improve the accuracy and/or robustness of object-image recognition. However, the greater the number of weak classifiers used, the longer it takes to identify whether the input image is an object image. In other words, there is a trade-off between the robustness of object-image recognition and its speed.
U.S. Pat. No. 7,099,510 discloses an algorithm, referred to as the Viola and Jones algorithm, designed based on the boosting algorithm; this patent publication will be referred to as reference 1.
Specifically, the Viola and Jones algorithm uses a cascade of a plurality of classifiers based on the boosting algorithm. The algorithm applies the series of classifiers to every input image in order from the initial stage to the last stage. Each of the classifiers calculates, for each input image, the score as the summation of the outputs of the classifiers applied so far. Each of the classifiers discards those input images whose scores are less than a threshold value predetermined based on the number of stages applied, thereby eliminating them early as negative images to which no subsequent stages are applied. This algorithm can speed up object-image recognition.
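The early-rejection behavior described above can be sketched as follows. This is a simplified illustration of the description in this section, not the implementation of reference 1; the function name, the toy classifiers, the weights, and the per-stage threshold values are all hypothetical, and each classifier is again assumed to output 0 or 1.

```python
from typing import Callable, Sequence, Tuple

Image = Sequence[float]              # assumption: image as a flat array of pixel values
Classifier = Callable[[Image], int]  # assumption: each classifier returns 0 or 1


def cascade_classify(x: Image,
                     classifiers: Sequence[Classifier],
                     weights: Sequence[float],
                     stage_thresholds: Sequence[float]) -> Tuple[bool, int]:
    """Apply the stages in order and reject as soon as the running score is too low.

    Returns (accepted, number_of_stages_applied).
    """
    if not (len(classifiers) == len(weights) == len(stage_thresholds)):
        raise ValueError("classifiers, weights and thresholds must have equal length")
    score = 0.0
    for stage, (f, w, t) in enumerate(
            zip(classifiers, weights, stage_thresholds), start=1):
        score += w * f(x)            # running sum over the stages applied so far
        if score < t:                # threshold depends on the number of applied stages
            return False, stage      # negative image: later stages are skipped
    return True, len(classifiers)


# Hypothetical toy stages and illustrative per-stage thresholds.
toy_classifiers = [
    lambda x: int(x[0] > 0.5),
    lambda x: int(x[1] > 0.5),
    lambda x: int(sum(x) / len(x) > 0.5),
]
toy_weights = [0.5, 0.3, 0.2]
toy_thresholds = [0.1, 0.3, 0.5]

if __name__ == "__main__":
    print(cascade_classify([0.1, 0.1, 0.1], toy_classifiers, toy_weights, toy_thresholds))
    print(cascade_classify([0.9, 0.9, 0.9], toy_classifiers, toy_weights, toy_thresholds))
```

With these toy values, an input that fails the first stage is discarded after a single classifier evaluation, whereas an accepted input passes through all three stages; this is the source of the speed-up.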