1. Field of the Invention
This invention relates to a device and a method for detecting an object, such as an image of a face, on a real-time basis, and also to a device and a method for group learning that are adapted to practice the device and the method for detecting an object according to the invention in a group.
This application claims priority of Japanese Patent Application No. 2003-394556, filed on Nov. 25, 2003, the entirety of which is incorporated by reference herein.
2. Related Background Art
Many techniques have been proposed to date to detect a face in a complex visual scene, using only the gradation pattern of the image signal of the scene without relying on any motion. For example, a face detector described in Patent Document 1 (Specification of Published U.S. Patent Application No. 2002/0102024) listed below employs AdaBoost with filters resembling a Haar basis as weak discriminators (weak learners). It can compute a weak hypothesis at high speed by using an image referred to as an integral image and a rectangle feature, as will be described in greater detail hereinafter.
FIG. 1 of the accompanying drawings schematically illustrates a rectangle feature described in Patent Document 1. Referring to FIG. 1, which shows input images 142A through 142D, the technique described in Patent Document 1 prepares a plurality of filters (weak hypotheses) that are adapted to determine the total sum of the luminance values of adjacently located rectangular areas of the same size and output the difference between the total sum of the luminance values of one of the rectangular areas and the total sum of the luminance values of the other rectangular area. For example, input image 142A in FIG. 1 shows a filter 154A that subtracts the total sum of the luminance values of shaded rectangular box 154A-2 from the total sum of the luminance values of rectangular box 154A-1. Such a filter comprising two rectangular boxes is referred to as a 2 rectangle feature. On the other hand, input image 142C in FIG. 1 has three rectangular boxes 154C-1 through 154C-3 formed by dividing a single rectangular box and shows a filter 154C that subtracts the total sum of the luminance values of the shaded rectangular box 154C-2 from the total sum of the luminance values of the rectangular boxes 154C-1 and 154C-3. Such a filter comprising three rectangular boxes is referred to as a 3 rectangle feature. Furthermore, input image 142D in FIG. 1 has four rectangular boxes 154D-1 through 154D-4 formed by vertically and horizontally dividing a single rectangular box and shows a filter 154D that subtracts the total sum of the luminance values of the shaded rectangular boxes 154D-2 and 154D-4 from the total sum of the luminance values of the rectangular boxes 154D-1 and 154D-3. Such a filter comprising four rectangular boxes is referred to as a 4 rectangle feature.
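The three kinds of rectangle feature described above can be sketched in Python as follows. This is only an illustration computed by direct summation, not the implementation of Patent Document 1. The function names, the side-by-side layout of the boxes, and the assumption that the unshaded boxes of the 4 rectangle feature lie on one diagonal are all hypothetical, since FIG. 1 is not reproduced here.

```python
def region_sum(image, left, top, width, height):
    """Total luminance of the box whose upper-left pixel is (left, top)."""
    return sum(
        image[y][x]
        for y in range(top, top + height)
        for x in range(left, left + width)
    )


def two_rectangle_feature(image, left, top, width, height):
    """Two adjacent boxes of equal size: unshaded (left) minus shaded (right)."""
    unshaded = region_sum(image, left, top, width, height)
    shaded = region_sum(image, left + width, top, width, height)
    return unshaded - shaded


def three_rectangle_feature(image, left, top, width, height):
    """Three adjacent boxes: the two outer boxes minus the shaded middle box."""
    first = region_sum(image, left, top, width, height)
    middle = region_sum(image, left + width, top, width, height)
    third = region_sum(image, left + 2 * width, top, width, height)
    return first + third - middle


def four_rectangle_feature(image, left, top, width, height):
    """2x2 grid of boxes: one diagonal pair minus the other (assumed layout)."""
    upper_left = region_sum(image, left, top, width, height)
    upper_right = region_sum(image, left + width, top, width, height)
    lower_left = region_sum(image, left, top + height, width, height)
    lower_right = region_sum(image, left + width, top + height, width, height)
    return upper_left + lower_right - upper_right - lower_left
```

Each function takes the position of the upper-left box and the common box size, so the same feature can be evaluated anywhere in the image.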
Now, a case where an image of a face as shown in FIG. 2 is judged to be a face by means of a rectangle feature 154B as shown in FIG. 1 will be described below. The 2 rectangle feature 154B comprises two rectangular boxes 154B-1 and 154B-2 produced by vertically dividing a single rectangular box and is adapted to subtract the total sum of the luminance values of the shaded rectangular box 154B-1 from the total sum of the luminance values of the rectangular box 154B-2. It is possible to estimate with a certain probability whether the input image is a face or not a face (correct interpretation or incorrect interpretation) by utilizing the fact that the luminance value of an eye area is lower than that of a cheek area in a human face (object) 138. This arrangement is utilized as one of the weak discriminators of AdaBoost.
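A weak discriminator of this kind might be sketched as below. The sketch assumes that box 154B-1 is the upper half of the window (the eye band) and box 154B-2 the lower half (the cheek band), and that the decision is made by comparing the feature value with a threshold. The function names and the threshold are hypothetical; in practice the threshold would be obtained by learning.

```python
def band_sum(window, top, bottom):
    """Total luminance of rows top..bottom-1 across the full window width."""
    return sum(sum(row) for row in window[top:bottom])


def eye_cheek_feature(window):
    """Lower (assumed cheek) band minus upper (assumed eye) band."""
    half = len(window) // 2
    eye_band = band_sum(window, 0, half)           # shaded box, assumed darker
    cheek_band = band_sum(window, half, 2 * half)  # unshaded box
    return cheek_band - eye_band


def weak_discriminator(window, threshold):
    """Return +1 (face) if the cheek band is sufficiently brighter, else -1."""
    return 1 if eye_cheek_feature(window) > threshold else -1
```

A single discriminator of this kind is only slightly better than chance, which is why AdaBoost combines many of them.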
For detecting a face, it is necessary to cut out areas of various sizes (to be referred to as search windows) from an input image in order to detect face areas of various different sizes contained in the input image and to judge whether each area is a face or not. However, an input image formed by 320×240 pixels, for instance, includes about 50,000 search windows of different sizes, and it is extremely time consuming to carry out computational operations for all the windows. Thus, the technique of Patent Document 1 utilizes an image that is referred to as an integral image. Referring to FIG. 3, an integral image is an image in which the (x, y)-th pixel 162 of the input image 144 represents a value that is equal to the total sum of the luminance values of the pixels located upper left relative to the pixel 162, as expressed by formula (1) below. In other words, the value of the pixel 162 is equal to the total sum of the luminance values of the pixels contained in rectangular box 160 that is located upper left relative to the pixel 162. In the following description, an image in which each pixel has a value expressed by formula (1) below is referred to as an integral image.
$$I(x, y) = \sum_{x' < x,\; y' < y} S(x', y') \qquad (1)$$
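Formula (1) can be evaluated for every pixel in a single pass over the image. The following Python sketch is one way to do so, assuming that S is the luminance of the input image stored as a list of rows; the function name is hypothetical. Because the sum in formula (1) uses strict inequalities, the result carries one extra row and one extra column of zeros.

```python
def integral_image(s):
    """Return I with I[y][x] = sum of s[y'][x'] for all x' < x and y' < y."""
    height = len(s)
    width = len(s[0]) if height else 0
    # Row 0 and column 0 stay zero: they correspond to empty sums.
    integral = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0  # running sum of s[y][0..x]
        for x in range(width):
            row_sum += s[y][x]
            # Everything above this row up to column x, plus this row so far.
            integral[y + 1][x + 1] = integral[y][x + 1] + row_sum
    return integral
```

Each input pixel is visited once, so the cost is proportional to the number of pixels.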
It is possible to carry out computational operations at high speed for a rectangular box of any size by using such an integral image. FIG. 4 shows four rectangular boxes including an upper left rectangular box 170, a rectangular box 172 located to the right of the rectangular box 170, a rectangular box 174 located under the rectangular box 170 and a rectangular box 176 located lower right relative to the rectangular box 170. The four corners of the rectangular box 176 are denoted by P1, P2, P3 and P4 that are arranged clockwise. Then, P1 has a value that is equal to the total sum A of the luminance values of the rectangular box 170 (P1=A) and P2 has a value that is equal to A+the total sum B of the luminance values of the rectangular box 172 (P2=A+B), whereas P3 has a value that is equal to A+the total sum C of the luminance values of the rectangular box 174 (P3=A+C) and P4 has a value that is equal to A+B+C+the total sum D of the luminance values of the rectangular box 176 (P4=A+B+C+D). The total sum D of the luminance values of the rectangular box 176 can be determined by using the formula P4−(P2+P3)+P1, because (A+B+C+D)−(A+B)−(A+C)+A=D. Thus, the total sum of the luminance values of any rectangular box can be determined at high speed by arithmetic operations using the integral image values at the four corners of that rectangular box. Normally, the input image is subjected to scale conversions and a window (search window) having the same size as the learning samples used for learning is cut out from each image obtained as a result of the scale conversions, so as to make it possible to search with search windows of different sizes. However, a vast amount of computational operations has to be carried out for scale conversions of an input image for the purpose of cutting out search windows of all different sizes as described above.
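Since P1=A, P2=A+B, P3=A+C and P4=A+B+C+D, the sum D follows as P4−P2−P3+P1. The Python sketch below illustrates this four-corner lookup. The function names and the half-open box convention (left ≤ x < right, top ≤ y < bottom) are assumptions made for the illustration, and a small integral image builder is included only to keep the example self-contained.

```python
def integral_image(s):
    """I[y][x] = sum of s[y'][x'] for all x' < x and y' < y."""
    height = len(s)
    width = len(s[0]) if height else 0
    integral = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += s[y][x]
            integral[y + 1][x + 1] = integral[y][x + 1] + row_sum
    return integral


def box_sum(integral, left, top, right, bottom):
    """Total luminance of pixels with left <= x < right and top <= y < bottom."""
    p1 = integral[top][left]      # A
    p2 = integral[top][right]     # A + B
    p3 = integral[bottom][left]   # A + C
    p4 = integral[bottom][right]  # A + B + C + D
    return p4 - (p2 + p3) + p1    # D


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
table = integral_image(image)
lower_right_2x2 = box_sum(table, 1, 1, 3, 3)  # 5 + 6 + 8 + 9
```

The lookup costs four array reads regardless of the box size, which is the source of the speed advantage described above.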
Thus, with the technique described in Patent Document 1, integral images, which allow the total sum of the luminance values of rectangular boxes to be determined at high speed, are used together with rectangle features in order to reduce the amount of computational operations.
However, the face detector described in the above cited Patent Document 1 can detect only an object whose size is an integer multiple of the size of the learning samples used for learning. This is because the above cited Patent Document 1 proposes not to change the sizes of search windows by scale conversions of an input image but to transform an input image into integral images and detect face areas of different search windows by utilizing the integral images. More specifically, integral images are made discrete by a unit of pixel so that, when a window size of 20×20 is used, it is not possible to define a window size of 30×30 and hence it is not possible to detect a face of this window size.
Additionally, only the difference of the luminance values of adjacently located rectangular boxes is used for the above rectangle feature for the purpose of raising the speed of computational operations. In other words, it is not possible to detect the difference of the luminance values of rectangular boxes that are separated from each other, which consequently limits the capability of detecting an object.
While it is possible to search for windows of any size by scale conversions of the integral images, and hence it is possible to utilize the difference of the luminance values of rectangular boxes that are separated from each other, a vast amount of computational operations will be required for scale conversions of integral images so that the advantage of the high speed processing operation using integral images will be offset. Additionally, the number of different types of filters needed to accommodate the differences of the luminance values of rectangular boxes that are separated from each other will be enormous, and consequently a vast amount of computational operations will be required.