1. Field of the Invention
The present invention relates to information processing apparatuses and methods, recording media, and programs. More specifically, the present invention relates to an information processing apparatus and method, a recording medium, and a program that allow quick detection of an object of interest, such as a face image, with a small amount of computation.
2. Description of the Related Art
Hitherto, various techniques have been proposed for detecting faces from complex video scenes based only on grayscale patterns of image signals, without considering motion. For example, a face detector described in United States Unexamined Patent Application Publication No. 2002/0102024 employs AdaBoost, which uses filters, such as Haar-basis filters, as weak classifiers (weak learners). The face detector is capable of quickly calculating weak hypotheses using integral images and rectangle features, both of which are described later.
FIGS. 1A to 1D are schematic diagrams showing rectangle features described in United States Unexamined Patent Application Publication No. 2002/0102024. As shown in FIGS. 1A to 1D, according to the techniques described in the document, a plurality of filters (weak hypotheses) are applied to input images 142A to 142D. Each filter calculates the respective sums of luminance values in adjacent rectangular boxes of the same size and outputs the difference between the sums. For example, regarding the input image 142A, a filter 154A that subtracts the sum of luminance values in a rectangular box 154A-2, shown as shaded, from the sum of luminance values in a rectangular box 154A-1 is constructed. Such a filter based on two rectangular boxes is referred to as a 2-rectangle feature.
Regarding the input image 142C, a rectangular box is divided into three rectangular boxes 154C-1 to 154C-3, and a filter 154C that subtracts the sum of luminance values in the middle rectangular box 154C-2, shown as shaded, from the sum of luminance values in the rectangular boxes 154C-1 and 154C-3 is constructed. Such a filter based on three rectangular boxes is referred to as a 3-rectangle feature. Regarding the input image 142D, a rectangular box is divided vertically and horizontally into four rectangular boxes 154D-1 to 154D-4, and a filter 154D that subtracts the sum of luminance values in the rectangular boxes 154D-2 and 154D-4, shown as shaded, from the sum of luminance values in the rectangular boxes 154D-1 and 154D-3 is constructed. Such a filter based on four rectangular boxes is referred to as a 4-rectangle feature.
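The 2-, 3-, and 4-rectangle features described above can be illustrated with a minimal Python sketch. The function names, the side-by-side arrangement of the boxes, and the choice of which diagonal pair is shaded in the 4-rectangle case are assumptions made for illustration; the sketch sums pixels directly and does not use an integral image.

```python
# Illustrative sketch only. Names and box layouts are assumptions,
# not taken from the cited publication.

def box_sum(img, top, left, height, width):
    """Sum of luminance values in one rectangular box (direct summation)."""
    return sum(img[y][x]
               for y in range(top, top + height)
               for x in range(left, left + width))

def two_rect_feature(img, top, left, height, width):
    """Two side-by-side boxes: unshaded (left) minus shaded (right)."""
    unshaded = box_sum(img, top, left, height, width)
    shaded = box_sum(img, top, left + width, height, width)
    return unshaded - shaded

def three_rect_feature(img, top, left, height, width):
    """Three side-by-side boxes: two outer boxes minus the shaded middle box."""
    outer = (box_sum(img, top, left, height, width)
             + box_sum(img, top, left + 2 * width, height, width))
    middle = box_sum(img, top, left + width, height, width)
    return outer - middle

def four_rect_feature(img, top, left, height, width):
    """2x2 grid of boxes: one diagonal pair minus the other (shaded) pair."""
    unshaded = (box_sum(img, top, left, height, width)
                + box_sum(img, top + height, left + width, height, width))
    shaded = (box_sum(img, top, left + width, height, width)
              + box_sum(img, top + height, left, height, width))
    return unshaded - shaded

# Small hypothetical luminance image (4 rows x 6 columns).
img = [
    [9, 9, 1, 1, 9, 9],
    [9, 9, 1, 1, 9, 9],
    [1, 1, 9, 9, 1, 1],
    [1, 1, 9, 9, 1, 1],
]

f2 = two_rect_feature(img, 0, 0, 2, 2)
f3 = three_rect_feature(img, 0, 0, 2, 2)
f4 = four_rect_feature(img, 0, 0, 2, 2)
```

Each feature takes the top left corner and the size of a single box, so all boxes in one feature have the same size, as the description requires.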
Now, an example of classifying a face image shown in FIG. 2 as a face using a rectangle feature 154B shown in FIG. 1B will be described. The 2-rectangle feature 154B is vertically divided into two rectangular boxes 154B-1 and 154B-2, and the sum of luminance values in the rectangular box 154B-1, shown as shaded, is subtracted from the sum of luminance values in the rectangular box 154B-2. Because a region of an eye has a lower luminance value than a region of the cheek, it is possible to estimate with a certain probability whether the input image of a human face (object of interest) 138 corresponds to a face or not (positive or negative) based on an output value of the rectangle feature 154B. This is used as a weak classifier in AdaBoost.
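As a rough illustration of how such a feature output could serve as a weak classifier, the following Python sketch compares the feature value against a threshold. The threshold comparison, the stacking of the shaded box above the unshaded box, and all names are assumptions made for illustration; the cited publication may define the decision rule differently.

```python
# Illustrative sketch only. The threshold rule and box layout are assumptions.

def box_sum(img, top, left, height, width):
    """Sum of luminance values in one rectangular box (direct summation)."""
    return sum(img[y][x]
               for y in range(top, top + height)
               for x in range(left, left + width))

def eye_cheek_feature(img, top, left, height, width):
    """Lower (cheek) box sum minus upper, shaded (eye) box sum."""
    upper = box_sum(img, top, left, height, width)
    lower = box_sum(img, top + height, left, height, width)
    return lower - upper

def weak_classify(img, top, left, height, width, threshold):
    """Return +1 (positive) if the feature output exceeds the threshold, else -1."""
    value = eye_cheek_feature(img, top, left, height, width)
    return 1 if value > threshold else -1

# Hypothetical windows: a dark band above a bright band, and a flat patch.
dark_over_bright = [
    [10, 10],
    [10, 10],
    [200, 200],
    [200, 200],
]
flat = [[100, 100] for _ in range(4)]

label_a = weak_classify(dark_over_bright, 0, 0, 2, 2, threshold=100)
label_b = weak_classify(flat, 0, 0, 2, 2, threshold=100)
```

A single such classifier is only slightly better than chance, which is why AdaBoost combines many of them.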
In order to allow detection of face regions of various sizes included in input images, regions of various sizes (hereinafter referred to as search windows) must be cut out to determine whether images correspond to faces. However, for example, in the case of an input image consisting of 320×240 pixels, face regions (search windows) of approximately 50,000 sizes are included, and it takes an extremely long time to perform calculation for all the window sizes. Thus, according to United States Unexamined Patent Application Publication No. 2002/0102024, images referred to as integral images are used. As shown in FIG. 3, an integral image is an image in which a pixel (x, y) 162 in an input image 144 has a value corresponding to the sum of luminance values of the pixels located above and to the left of the pixel 162, as expressed in expression (1) below, where s(x′, y′) denotes the luminance value of the input image at a pixel (x′, y′). That is, the value of the pixel 162 is the sum of luminance values of pixels in a region 160 that is above and to the left of the pixel 162. Hereinafter, an image having pixel values according to expression (1) below will be referred to as an integral image.
$$I(x, y) = \sum_{x' < x,\; y' < y} s(x', y') \tag{1}$$
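Expression (1) can be evaluated for every pixel in a single pass over the input image. The following Python sketch is one possible way to do so, assuming the image is given as a list of rows; the function name and the extra leading row and column of zeros, which accommodate the strict inequalities in expression (1), are choices made for this illustration.

```python
# Illustrative sketch only. The (h + 1) x (w + 1) layout is an assumption
# chosen so that I[y][x] covers exactly the pixels with x' < x and y' < y.

def integral_image(s):
    """Return I with I[y][x] = sum of s[y'][x'] over x' < x and y' < y."""
    h, w = len(s), len(s[0])
    I = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(1, h + 1):
        row_sum = 0  # running sum of row y - 1 up to column x - 1
        for x in range(1, w + 1):
            row_sum += s[y - 1][x - 1]
            I[y][x] = I[y - 1][x] + row_sum
    return I

# Hypothetical 2 x 3 luminance image.
s = [
    [1, 2, 3],
    [4, 5, 6],
]
I = integral_image(s)
```

The running row sum means each output value costs one addition per pixel, so building the integral image takes time proportional to the number of pixels.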
By using an integral image, it is possible to quickly calculate the sum of luminance values in a rectangular box of an arbitrary size. For example, as shown in FIG. 4, consider an upper left rectangular box 170, a rectangular box 172 to the right of the rectangular box 170, a rectangular box 174 below the rectangular box 170, and a rectangular box 176 diagonally below and to the right of the rectangular box 170. Let the four corners of the rectangular box 176 be denoted by p1 (top left), p2 (top right), p3 (bottom left), and p4 (bottom right), and the values of the integral image at these corners by P1, P2, P3, and P4. P1 corresponds to a sum A of luminance values in the rectangular box 170 (P1=A). P2 corresponds to the sum of the sum A and a sum B of luminance values in the rectangular box 172 (P2=A+B). P3 corresponds to the sum of the sum A and a sum C of luminance values in the rectangular box 174 (P3=A+C). P4 corresponds to the sum of the sums A, B, C and a sum D of luminance values in the rectangular box 176 (P4=A+B+C+D). The sum D of luminance values in the rectangular box 176 can therefore be calculated by P4−(P2+P3)+P1, since (A+B+C+D)−(A+B)−(A+C)+A=D. That is, the sum of luminance values in a rectangular box can be calculated quickly by adding or subtracting the integral image values at the four corners of the rectangular box.
Usually, an input image is converted into different scales, and search windows having the same size as learning samples used for learning are cut out from the scaled images, allowing detection by search windows of different sizes. However, as described earlier, the amount of computation becomes huge when input images are scaled so that search windows of all sizes can be set. Thus, according to the techniques described in United States Unexamined Patent Application Publication No. 2002/0102024, integral images that allow quick calculation of the sums of luminance values in respective rectangular boxes are used, and the amount of computation is reduced by using rectangle features.
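The four-corner calculation can be sketched in Python as follows. Since P1=A, P2=A+B, P3=A+C, and P4=A+B+C+D, the sum D equals P4−P2−P3+P1. The function names and the integral image layout with a leading row and column of zeros are assumptions made for this illustration.

```python
# Illustrative sketch only. Names and the zero-padded layout are assumptions.

def integral_image(s):
    """Return I with I[y][x] = sum of s[y'][x'] over x' < x and y' < y."""
    h, w = len(s), len(s[0])
    I = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(1, h + 1):
        row_sum = 0
        for x in range(1, w + 1):
            row_sum += s[y - 1][x - 1]
            I[y][x] = I[y - 1][x] + row_sum
    return I

def box_sum(I, top, left, height, width):
    """Sum of luminance values in a box from four integral image lookups."""
    p1 = I[top][left]                    # A
    p2 = I[top][left + width]            # A + B
    p3 = I[top + height][left]           # A + C
    p4 = I[top + height][left + width]   # A + B + C + D
    return p4 - (p2 + p3) + p1           # D

# Hypothetical 4 x 4 luminance image with values 1 to 16.
s = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
I = integral_image(s)

# Lower right 2 x 2 box, playing the role of rectangular box 176.
d = box_sum(I, 2, 2, 2, 2)
```

The cost of `box_sum` is three additions or subtractions regardless of the box size, which is the property the description relies on.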
However, the face detector described in United States Unexamined Patent Application Publication No. 2002/0102024 is capable of detecting only objects of sizes that are integer multiples of the sizes of learning samples used for learning. This is because, according to the techniques described in the document, instead of changing the size of the search window by scaling an input image, an input image is converted into an integral image and face regions in different search windows are detected using the integral image. That is, since the integral image is discrete on a pixel-by-pixel basis, when a window size of 20×20 is used, for example, it is not possible to set a size of 30×30 for a search window, so that it is not possible to detect a face of this window size.
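The limitation can be made concrete with a short Python sketch that lists the square window sizes reachable when a 20×20 learning sample can be enlarged only by integer factors. The function name and the upper limit of 240 pixels, taken from the shorter side of the 320×240 example image, are assumptions made for illustration.

```python
# Illustrative sketch only. The function name and size limit are assumptions.

def reachable_window_sizes(base_size, max_size):
    """Window sizes that are integer multiples of base_size, up to max_size."""
    return [base_size * k for k in range(1, max_size // base_size + 1)]

sizes = reachable_window_sizes(20, 240)

# A 30 x 30 window would need a factor of 1.5, which is not an integer.
can_detect_30 = 30 in sizes
```

Under these assumptions only twelve window sizes are available, and every size between consecutive multiples of 20 is skipped.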
Furthermore, since only differences between luminance values of adjacent rectangular boxes are used as rectangle features in order to increase computation speed, it is not possible to recognize changes in luminance between remote rectangular boxes. Thus, the ability to detect objects is limited.
Although it is possible, for example, by scaling an integral image, to perform searching by a window of an arbitrary size and to use differences between luminance values in remote rectangular boxes, scaling an integral image increases the amount of computation, canceling the gain in computation speed obtained by using the integral image. Furthermore, in order to consider differences between luminance values in remote rectangular boxes, the number of filters needed becomes huge, which also increases the amount of computation.