1. Field of the Invention
The present invention relates to an information processing method and apparatus for image processing, image recognition, image composition, information analysis, and the like.
2. Description of the Related Art
The information processing field frequently handles multidimensional array information. In this field, processes associated with image processing, image recognition, image composition, and statistical processing often calculate and use the sum total of the elements within a specific area. For example, a spreadsheet application such as Excel™, available from Microsoft, has a function for calculating the sum of the elements within a designated rectangle in a two-dimensional table. Likewise, a programming language for numerical calculations such as MATLAB™, available from MathWorks, has a function for calculating the sum of the elements in a matrix.
In the computer graphics field, F. C. Crow has proposed a concept of accumulated image information, called a rectangular summed-area table, with respect to source input image information (F. C. Crow, “Summed-Area Tables For Texture Mapping”, Computer Graphics, 1984. (to be referred to as Reference 1 hereinafter)). In Reference 1, a two-dimensional array having the same size (the same number of elements) as an input image is defined as a summed-area table, I(x, y) is defined as the pixel value at a coordinate position (x, y) of the input image, and the component C(x, y) at the same position (x, y) of the summed-area table is defined by:
$$C(x, y) = \sum_{\substack{x' \le x \\ y' \le y}} I(x', y') \tag{1}$$

That is, as shown in FIG. 4A, a sum total value of pixels in a rectangle, which has pixels at an origin position (0, 0) and the position (x, y) in the original input image as diagonal points, assumes the value C(x, y) at the position (x, y) in the summed-area table shown in FIG. 4B. Note that the original summed-area table of Reference 1 defines the lower left position of an image as an origin position. However, this specification uses the upper left as the origin of an image in order to maintain consistency with the following description.
According to this definition, the sum of I(x, y) over an arbitrary rectangular area whose sides run horizontally and vertically on the input image can be calculated by referring to only four points on the summed-area table. For example, as shown in FIG. 4C, the sum total C(x0, y0; x1, y1) of the pixel values in a rectangular area having (x0, y0) and (x1, y1) as diagonal points can be calculated by:

$$C(x_0, y_0; x_1, y_1) = C(x_0 - 1, y_0 - 1) - C(x_0 - 1, y_1) - C(x_1, y_0 - 1) + C(x_1, y_1) \tag{2}$$

In this manner, the sum total of the values in an arbitrary rectangular area on an image can be calculated quickly.
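Equations (1) and (2) can be illustrated with a minimal Python sketch. The function names (`summed_area_table`, `lookup`, `rect_sum`) and the small test image are chosen here for illustration only and do not come from Reference 1. The sketch also assumes that C is taken to be 0 whenever x or y is negative, which is what equation (2) needs when the rectangle touches the top or left edge of the image.

```python
def summed_area_table(image):
    """Build C per equation (1): C[y][x] is the sum of image[y'][x'] for x' <= x, y' <= y."""
    height = len(image)
    width = len(image[0]) if height else 0
    table = [[0] * width for _ in range(height)]
    for y in range(height):
        row_sum = 0  # running sum of the current row up to column x
        for x in range(width):
            row_sum += image[y][x]
            above = table[y - 1][x] if y > 0 else 0
            table[y][x] = above + row_sum
    return table


def lookup(table, x, y):
    """Return C(x, y), treating positions with x < 0 or y < 0 as 0 (assumed convention)."""
    if x < 0 or y < 0:
        return 0
    return table[y][x]


def rect_sum(table, x0, y0, x1, y1):
    """Equation (2): sum over the rectangle with diagonal points (x0, y0) and (x1, y1), inclusive."""
    return (lookup(table, x0 - 1, y0 - 1)
            - lookup(table, x0 - 1, y1)
            - lookup(table, x1, y0 - 1)
            + lookup(table, x1, y1))


# Illustrative 4x3 image with the origin at the upper left.
image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]
table = summed_area_table(image)
total = rect_sum(table, 1, 1, 2, 2)  # 6 + 7 + 10 + 11 = 34
```

Once the table is built, each rectangle sum costs four lookups regardless of the rectangle size, which is the property the text relies on.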
In the image recognition field, Viola and Jones use the term “Integral Image” to refer to accumulated image information equivalent to the summed-area table. According to Viola and Jones, by cascading a large number of weak discriminators each including a plurality of rectangular filters using this “Integral Image”, high-speed face detection processing is implemented (P. Viola, M. Jones, “Rapid Object Detection using a Boosted Cascade of Simple Features”, Proc. IEEE Conf. on Computer Vision and Pattern Recognition, Vol. 1, pp. 511-518, December 2001. (to be referred to as Reference 2 hereinafter)). Also, Japanese Patent Laid-Open Nos. 2004-185611, 2005-044330, and 2005-293061 describe an object detection method based on the idea of Reference 2.
In the pattern identification method described in Reference 2, as shown in FIG. 7A, a processing window 801, which is a rectangular area having a specific size, is shifted within an image 800 as a processing target. It is then determined whether or not the processing window 801 at each shift destination includes a human face. Face detection processing in the processing window is executed in a plurality of stages, and different combinations of weak discriminators are assigned to the respective stages. Each weak discriminator detects a so-called Haar-like feature, and is configured by a combination of rectangular filters. With this configuration, Reference 2 implements high-speed pattern identification, of which face detection is a representative example.
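The window shifting and staged evaluation described above can be sketched in Python as follows. This is a deliberately simplified illustration, not the implementation of Reference 2: the cascade layout (a list of stages, each holding weak discriminators and a minimum vote count), the single two-rectangle feature, the thresholds, and the tiny test image are all assumptions made here, and the weak discriminators vote with equal weight rather than with trained weights.

```python
def integral_image(image):
    """Accumulated image information equivalent to a summed-area table."""
    height, width = len(image), len(image[0])
    ii = [[0] * width for _ in range(height)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += image[y][x]
            ii[y][x] = row_sum + (ii[y - 1][x] if y > 0 else 0)
    return ii


def rect_sum(ii, x0, y0, x1, y1):
    """Four-point rectangle sum; positions outside the image count as 0."""
    def c(x, y):
        return ii[y][x] if x >= 0 and y >= 0 else 0
    return c(x0 - 1, y0 - 1) - c(x0 - 1, y1) - c(x1, y0 - 1) + c(x1, y1)


def two_rect_feature(ii, x, y, w, h):
    """Haar-like feature: sum of the left half minus sum of the right half."""
    half = w // 2
    left = rect_sum(ii, x, y, x + half - 1, y + h - 1)
    right = rect_sum(ii, x + half, y, x + w - 1, y + h - 1)
    return left - right


def window_passes(ii, wx, wy, cascade):
    """Run the stages in order; a window is rejected at the first failed stage."""
    for weak_discriminators, min_votes in cascade:
        votes = sum(
            1 for dx, dy, w, h, threshold in weak_discriminators
            if two_rect_feature(ii, wx + dx, wy + dy, w, h) >= threshold
        )
        if votes < min_votes:
            return False  # later stages are skipped for this window
    return True


def detect(image, win_w, win_h, cascade, step=1):
    """Shift the processing window over the image and collect accepted positions."""
    ii = integral_image(image)
    img_h, img_w = len(image), len(image[0])
    return [
        (x, y)
        for y in range(0, img_h - win_h + 1, step)
        for x in range(0, img_w - win_w + 1, step)
        if window_passes(ii, x, y, cascade)
    ]


# Hypothetical data: a bright-then-dark pattern starting at column 1.
image = [[5, 9, 9, 1, 1, 5] for _ in range(4)]
cascade = [
    ([(0, 0, 4, 4, 10)], 1),  # stage 1: loose threshold
    ([(0, 0, 4, 4, 50)], 1),  # stage 2: strict threshold
]
detections = detect(image, 4, 4, cascade)  # [(1, 0)]
```

Because most windows fail an early stage, only a few rectangle sums are evaluated per window, and each of those sums needs just four lookups in the accumulated image information.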
When the aforementioned accumulated image information (the summed-area table or integral image) is generated from input image information, the bit precision of the buffer used to store the generated information is normally specified based on the worst-case value that may be calculated. The buffer size (the size of a temporary holding area) is then determined based on that bit precision. Specifically, let Ximg be the width (the number of pixels in the horizontal direction) of the input image information, Yimg be its height (the number of pixels in the vertical direction), and Nimg be the bit precision of each pixel in bits (Nimg is a positive integer). The worst-case value Cmax then corresponds to the sum total of all pixels when every pixel value assumes the maximum value Imax. That is, Cmax is given by:
$$C_{\max} = \sum_{\substack{0 \le x < X_{img} \\ 0 \le y < Y_{img}}} I(x, y) = I_{\max} X_{img} Y_{img} \tag{3}$$
Therefore, the per-element bit precision Nbuf of the buffer used to store the accumulated image information must be a bit precision Nbuf_max that can store Cmax, and it assumes a value considerably larger than Nimg, although the exact value depends on the image size. For example, when an 8-bit grayscale image having a VGA size is used as an input image, Nimg=8 (so Imax=255), Ximg=640, and Yimg=480. Therefore, Cmax=78336000=4AB5000h, that is, a buffer having a precision Nbuf=Nbuf_max=27 bits must be provided. When the accumulated image information for the entire area of the input image information must be temporarily held, a memory area such as a RAM as large as Nbuf_max×Ximg×Yimg=8294400 bits must be provided, thus limiting processing resources. Hence, the bit precision Nbuf of the buffer must be reduced by some method. In particular, when processing based on such accumulated information is implemented in hardware, a considerable problem is posed since work memory size is directly related to circuit scale. Even in the case of software processing, however, if Nbuf can be reduced, a smaller buffer can be used, thus reducing resource consumption.
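The arithmetic of the VGA example can be reproduced with a short Python sketch. The function name `worst_case_buffer` is hypothetical, and the sketch assumes that Imax is the largest unsigned value representable in Nimg bits.

```python
def worst_case_buffer(width, height, bits_per_pixel):
    """Return (Cmax, Nbuf_max, total buffer bits) for a full accumulated image."""
    max_pixel = (1 << bits_per_pixel) - 1    # Imax, assuming unsigned pixels
    c_max = max_pixel * width * height       # equation (3)
    n_buf_max = c_max.bit_length()           # smallest bit width that can hold Cmax
    total_bits = n_buf_max * width * height  # one Nbuf_max-bit element per pixel
    return c_max, n_buf_max, total_bits


c_max, n_buf_max, total_bits = worst_case_buffer(640, 480, 8)
print(c_max, hex(c_max), n_buf_max, total_bits)
# prints: 78336000 0x4ab5000 27 8294400
```

The result is more than three times the 2457600 bits occupied by the 8-bit input image itself, which is why the text treats a reduction of Nbuf as necessary.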
Reference 1 describes one method of reducing the bit precision Nbuf of the buffer. In this method, the input information is divided into blocks of, for example, 16×16 pixels, and a summed-area table is calculated independently for each block. If the input information has a bit precision Nimg=8 bits, the bit precision required of the buffer in this case is 16 bits, since the largest possible sum within a block is 255×16×16=65280. In addition, for each block, a 32-bit value of the original summed-area table is held, namely the value at the pixel position diagonally adjacent to the upper left corner of the block on its upper left side. To restore the value corresponding to a desired position, the 32-bit value held by the block including that position need only be added to the 16-bit value at that position.
However, these calculations do not suffice to actually restore the value of the original summed-area table. That is, whereas the sum total of a desired area can conventionally be calculated by the simple additions and subtractions of equation (2) with reference to four points, this method adds a calculation to restore the value of each point, so the calculation load increases considerably. When this method is implemented in hardware, the circuit scale required for the calculations increases. Even when it is implemented in software, processing speed is reduced.
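Why the block-local value plus the held corner value falls short can be seen from a small Python sketch. It is an illustrative decomposition written for this description, not the procedure of Reference 1: the function names, the 16-pixel block size taken from the example above, and the synthetic test image are all assumptions. The sketch splits C(x, y) into the block-local sum, the held corner value, and the two strips of pixels lying above and to the left of the block.

```python
def summed_area_table(image):
    """Full summed-area table per equation (1), with the origin at the upper left."""
    height, width = len(image), len(image[0])
    table = [[0] * width for _ in range(height)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += image[y][x]
            table[y][x] = row_sum + (table[y - 1][x] if y > 0 else 0)
    return table


def region_sum(image, x0, y0, x1, y1):
    """Direct sum of pixels with x0 <= x <= x1 and y0 <= y <= y1 (empty ranges give 0)."""
    return sum(image[y][x] for y in range(y0, y1 + 1) for x in range(x0, x1 + 1))


def restore_terms(image, x, y, block=16):
    """Split C(x, y) into the four parts that together restore it."""
    bx, by = (x // block) * block, (y // block) * block  # upper-left pixel of the block
    local = region_sum(image, bx, by, x, y)              # block-local table value
    corner = region_sum(image, 0, 0, bx - 1, by - 1)     # C(bx - 1, by - 1), the held value
    above = region_sum(image, bx, 0, x, by - 1)          # strip above the block
    left = region_sum(image, 0, by, bx - 1, y)           # strip to the left of the block
    return local, corner, above, left


# Synthetic 8-bit test image (hypothetical data), 40 pixels wide and 36 high.
image = [[(7 * x + 13 * y) % 256 for x in range(40)] for y in range(36)]
full = summed_area_table(image)

local, corner, above, left = restore_terms(image, 20, 20)
shortfall = full[20][20] - (local + corner)  # equals above + left, nonzero here
```

In this sketch the two strip terms must be obtained for every point that is restored, which is consistent with the increase in calculation load noted above.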