1. Field of the Invention
The present invention relates to an image processing apparatus which generates a resized image and performs processing such as image recognition, a control method thereof, and a storage medium.
2. Description of the Related Art
To detect target objects of various sizes when performing image recognition processing such as face detection, it is common practice to generate pyramid images by resizing an original image, and perform detection processing for each pyramid image.
Japanese Patent Laid-Open No. 2008-102611 discloses a method of sequentially resizing a read image at a predetermined ratio to generate pyramid images, and then performing face discrimination processing in order to detect faces of a plurality of sizes. FIG. 1 exemplifies pyramid images for detecting a target object at various sizes. An input image 101 of 320 pixels×240 pixels is repeatedly resized at a ratio of 1/1.2 in each of the horizontal and vertical directions, thereby generating resized images 102 to 109 (resized images A to H) at eight levels. Target object detection processing is performed for the input image and the resized images, that is, for images of nine resolutions. As a result, the target object can be detected at different sizes.
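The resolutions of the pyramid images in FIG. 1 can be reproduced with the following minimal sketch. It assumes that each level is derived from the preceding level and that fractional pixel dimensions are truncated. The rounding rule is not stated in the text, but this assumption yields the pixel counts quoted in connection with equations (1) (for example, 53,200 for the resized image 102 and 3,888 for the resized image 109). The function name `pyramid_sizes` and its parameters are hypothetical.

```python
def pyramid_sizes(width, height, levels):
    """Return (width, height) for the input image and each resized level.

    Assumption: each level scales the previous one by 1/1.2 (= 5/6) and
    truncates fractional pixels; the text does not state the rounding rule.
    """
    sizes = [(width, height)]
    for _ in range(levels):
        # Integer form of division by 1.2, which avoids floating-point error.
        width, height = width * 5 // 6, height * 5 // 6
        sizes.append((width, height))
    return sizes


# Input image 101 followed by resized images 102 to 109 (eight levels).
sizes = pyramid_sizes(320, 240, levels=8)
for number, (w, h) in zip(range(101, 110), sizes):
    print(f"image {number}: {w} x {h} = {w * h} pixels")
```

Integer arithmetic is used here only so that the truncation is exact; an actual resizer may round differently.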
Japanese Patent Laid-Open No. 2008-210009 discloses an image discrimination apparatus which performs discrimination processing by an arrangement in which resized image data generated by a multi-resolution processor are sequentially supplied to a normalization processor, a feature amount derivation unit, and a recognition unit via a pipeline connection without passing through a bus.
Access to an image memory in the method described in Japanese Patent Laid-Open No. 2008-102611, in which a read image is sequentially resized to generate pyramid images and detection processing is performed after pyramid image generation, will be explained with reference to FIGS. 2A and 2B. FIG. 2A is a view for explaining image access in resize processing of generating pyramid images. When generating pyramid images at nine resolutions, as shown in FIG. 1, first, the input image 101 is read out from the image memory and undergoes resize processing, and the resized image 102 is written in the image memory. Then, the resized image 102 is read out from the image memory and undergoes resize processing, and the resized image 103 is written in the image memory. For the resized images 103 to 109, readout processing from the image memory, resize processing, and write processing in the image memory are repeated in the same way until the smallest resized image 109 is written in the image memory.
FIG. 2B is a view for explaining image access when performing detection processing after pyramid image generation. First, the input image 101 is read out from the image memory and undergoes detection processing at the highest resolution. An output from detection processing is information about a detected target object, and write processing of an image in the image memory is not executed, unlike FIG. 2A. The resized images 102 to 109 are also read out from the image memory and undergo detection processing without performing write processing of an image in the image memory.
The memory access count can be represented by a pixel count when the pyramid images described with reference to FIG. 1 are processed as shown in FIGS. 2A and 2B. A readout count Ra from the image memory and a write count Wa in the image memory in FIG. 2A, and a readout count Rb from the image memory in FIG. 2B are calculated in accordance with equations (1):
\[
\left.
\begin{aligned}
Ra &= 76800 + 53200 + 36686 + \cdots + 5655 = 235688 \\
Wa &= 53200 + 36686 + 25392 + \cdots + 3888 = 162776 \\
Rb &= 76800 + 53200 + 36686 + \cdots + 5655 + 3888 = 239576
\end{aligned}
\right\} \quad (1)
\]
As shown in FIG. 1, the pixel count of the input image 101 is 76,800, that of the resized image 102 is 53,200, that of the resized image 103 is 36,686, . . . , that of the resized image 108 is 5,655, and that of the resized image 109 is 3,888. The readout count Ra is the sum of the pixel counts of the input image 101 and resized images 102 to 108. The write count Wa is the sum of the pixel counts of the resized images 102 to 109. The readout count Rb is the sum of the pixel counts of the input image 101 and resized images 102 to 109.
An access count N (pixel count) to the image memory in the processing of FIGS. 2A and 2B is calculated by adding the readout count Ra, the write count Wa, and the readout count Rb in accordance with equation (2):
\[
N = Ra + Wa + Rb = 235688 + 162776 + 239576 = 638040 \quad (2)
\]
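Equations (1) and (2) can be checked numerically with the following sketch. The intermediate pixel counts elided in equations (1) are not listed in the text; here they are derived under the assumption that each level scales the preceding level by 1/1.2 with truncation of fractional pixels, which reproduces the stated totals. The function and variable names are hypothetical.

```python
def pixel_counts(width=320, height=240, levels=8):
    """Pixel counts of the input image and each resized image.

    Assumption: each level is the previous one scaled by 5/6 (= 1/1.2)
    with fractional pixels truncated.
    """
    counts = [width * height]
    for _ in range(levels):
        width, height = width * 5 // 6, height * 5 // 6
        counts.append(width * height)
    return counts


counts = pixel_counts()   # images 101 to 109
ra = sum(counts[:-1])     # FIG. 2A: read images 101 to 108 for resizing
wa = sum(counts[1:])      # FIG. 2A: write resized images 102 to 109
rb = sum(counts)          # FIG. 2B: read images 101 to 109 for detection
n = ra + wa + rb          # equation (2)
print(ra, wa, rb, n)      # 235688 162776 239576 638040
```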
As is apparent from equations (1), the input image 101 and resized images 102 to 108 are read out twice from the image memory. For this reason, the method disclosed in Japanese Patent Laid-Open No. 2008-102611 increases the access count to the image memory and takes time for processing.
Next, access to an image memory in the method described in Japanese Patent Laid-Open No. 2008-210009, in which resize processing of a read image and detection processing of the resized images are performed using a pipeline arrangement, will be explained with reference to FIG. 3.
FIG. 3 shows access to the image memory when resize processing from a resolution corresponding to the input image 101 to a resolution corresponding to the resized image 109, and detection processing are executed with the pipeline arrangement. At all detection resolutions, the input image 101 is read out from the image memory. The input image 101 is resized, as needed, and is directly transferred to the detection processing unit. An output from the detection processing unit is information about a detected target object, and write processing of an image in the image memory is not executed.
The memory access count can be represented by a pixel count when the pyramid images described with reference to FIG. 1 are processed as shown in FIG. 3. A readout count R from the image memory is calculated in accordance with equation (3):
\[
R = 76800 + 76800 + 76800 + \cdots + 76800 = 76800 \times 9 = 691200 \quad (3)
\]
In processing as described in FIG. 3, only readout from the image memory is executed. Thus, the access count N (pixel count) to the image memory is N=R=691,200 pixels.
Although the method disclosed in Japanese Patent Laid-Open No. 2008-210009 does not perform write processing in the image memory, it increases the access count to the image memory because the high-resolution input image 101 is read out from the image memory for every detection resolution.
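As a rough illustration, the access counts of the two conventional methods can be compared using only the figures given in equations (2) and (3). The sketch below is an arithmetic check, not part of either disclosed method, and the constant and variable names are hypothetical.

```python
INPUT_PIXELS = 320 * 240       # pixel count of the input image 101
RESOLUTIONS = 9                # input image plus eight resized levels

# Pipeline arrangement (FIG. 3): the input image is read once per resolution.
pipeline_reads = INPUT_PIXELS * RESOLUTIONS   # equation (3)

# Sequential arrangement (FIGS. 2A and 2B): N taken from equation (2).
SEQUENTIAL_ACCESSES = 638040

extra = pipeline_reads - SEQUENTIAL_ACCESSES
print(pipeline_reads, extra)   # 691200 53160
```

Under these figures, the pipeline arrangement accesses 53,160 more pixels than the sequential arrangement, even though it performs no write processing.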