1. Field of the Invention
The present invention relates to an image processing apparatus and a method thereof, and more particularly to an image processing apparatus and a method thereof for detecting a predetermined target object (subject) from an image inputted from an image input device such as a digital camera or the like.
2. Description of the Related Art
Conventionally, digital cameras and digital video cameras have been proposed that detect a specific subject, such as an individual or a face, in an input image and perform processing suited to the detected subject.
In Japanese Patent No. 3164692, a camera has been proposed which includes an individual recognition unit for recognizing that a subject is an individual and a distance detection unit for detecting the distance to the subject. This camera also includes a unit that adjusts the focal length, the focal position and the aperture based on the detected distance to the subject, so that the entire face of the individual falls approximately within the depth of field.
In Japanese Patent Application Laid-Open No. 2001-309225, a camera has been proposed which detects the faces of a plurality of persons included in an image in order to improve the quality of the image.
In Japanese Patent Application Laid-Open No. 2003-107555, an image sensing apparatus has been proposed which has a face detection unit for detecting the face of an individual from image data and controls the exposure based on the result of the detection. This image sensing apparatus includes a photometry unit for performing photometry on a photometry area that is set on the individual's face detected by the face detection unit, and an exposure control unit for calculating an exposure amount based on the result of the photometry of the individual's face and controlling the exposure based on the calculated exposure amount.
Various face detection processing methods for detecting a face in an image have been proposed.
For example, a high-speed face detection method has been proposed in “Rapid Object Detection using a Boosted Cascade of Simple Features”, P. Viola, M. Jones, Proc. of IEEE Conf. CVPR, 1, pp. 511-518, 2001. In addition, neural networks for performing face detection have been proposed in “Convolutional Spiking Neural Network Model for Robust Face Detection”, M. Matsugu, K. Mori, et al., 2002 International Conference on Neural Information Processing (ICONIP02), and in “Neural Network-Based Face Detection”, H. A. Rowley, S. Baluja, T. Kanade, 1996, Computer Vision and Pattern Recognition (CVPR '96).
When an image is sensed with a digital camera, the position, size and number of faces in the image vary substantially depending on the sensing conditions. The face detection processing method mounted on a digital camera is therefore required to work independently of the position, size and number of faces in the image.
The basic concept of detecting a specific pattern such as a face (hereinafter referred to as a “detection pattern”) from an image is as follows. First, an area of a specific size is clipped from the image, and a feature of the area is compared with a feature of the detection pattern. If the features are similar, the clipped area is determined to be the detection pattern, that is, the specific pattern such as a face is determined to exist in the area.
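The comparison step described above can be illustrated with a minimal sketch. It is not the method of any cited reference: it assumes that the “feature” of an area is simply its raw pixel values, uses normalized cross-correlation as the similarity measure, and treats the threshold `SIMILARITY_THRESHOLD` and the helpers `clip`, `ncc` and `is_detection_pattern` as hypothetical names chosen for illustration.

```python
import math

# Hypothetical threshold: the text only requires the features to be "similar".
SIMILARITY_THRESHOLD = 0.8

def clip(image, top, left, size):
    """Clip a size x size area from a 2-D list of pixel values."""
    return [row[left:left + size] for row in image[top:top + size]]

def ncc(a, b):
    """Normalized cross-correlation of two equally sized 2-D areas (-1..1).

    Raw pixel values stand in for the 'feature' (an assumption); a flat
    area has no variance, so it is treated as dissimilar (0.0)."""
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return num / den if den > 0 else 0.0

def is_detection_pattern(image, top, left, pattern):
    """Clip an area the size of the pattern and test whether it matches."""
    area = clip(image, top, left, len(pattern))
    return ncc(area, pattern) >= SIMILARITY_THRESHOLD
```

Any other feature (edge responses, Haar-like features, neural-network outputs) could replace the raw pixels; only the clip, compare and threshold structure reflects the text.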
Thus, as shown in FIG. 14, by sequentially clipping areas of a specific size from an image 1401 and investigating each clipped area, detection can be performed independently of the position and number of faces in the input image. Moreover, to perform detection independently of the size of a face, as shown in FIG. 15, a plurality of images, called pyramid images, are prepared by converting the resolution of the input image in discrete steps, and areas are clipped from and investigated in the image at each resolution.
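The scan over clipping positions and pyramid images can be sketched as follows. This sketch assumes nearest-neighbour resolution conversion and a caller-supplied `matches` function standing in for whatever detector is used; `resize_nearest`, `pyramid` and `scan` are hypothetical names. Hits found at a reduced resolution are mapped back to coordinates in the input image, so a fixed-size window detects larger faces at the coarser resolutions.

```python
def resize_nearest(image, scale):
    """Nearest-neighbour resolution conversion of a 2-D list (assumed method)."""
    h, w = len(image), len(image[0])
    nh, nw = max(1, int(h * scale)), max(1, int(w * scale))
    return [[image[min(h - 1, int(y / scale))][min(w - 1, int(x / scale))]
             for x in range(nw)] for y in range(nh)]

def pyramid(image, scales):
    """Yield (scale, image converted to that resolution) for each scale."""
    for s in scales:
        yield s, resize_nearest(image, s)

def scan(image, window, step, scales, matches):
    """Clip window x window areas every `step` pixels at every resolution.

    `matches(area)` is a hypothetical stand-in for the detector. Each hit
    is reported as (top, left, size) in input-image coordinates."""
    hits = []
    for s, level in pyramid(image, scales):
        h, w = len(level), len(level[0])
        for top in range(0, h - window + 1, step):
            for left in range(0, w - window + 1, step):
                area = [row[left:left + window] for row in level[top:top + window]]
                if matches(area):
                    hits.append((round(top / s), round(left / s), round(window / s)))
    return hits
```

The same loop also covers the alternative of keeping the image at one resolution and preparing detection patterns of several sizes; in that case the outer loop runs over pattern sizes instead of scales.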
Alternatively, the detection can be performed without converting the resolution of the image, by preparing a plurality of detection patterns of different specific sizes.
The displacement of the clipping position (that is, the positional interval at which areas are clipped) and the interval between the resolutions used for the resolution conversion depend on the positional robustness and the size robustness of each detection method. For example, if a detection method applied to one clipped area does not detect the detection pattern unless the pattern lies at the center of the area, the clipping position must be moved pixel by pixel. If, on the other hand, the method has a positional robustness of ±2 pixels, one clipped area covers five positions, so the clipping position can be moved by 5 pixels at a time, which reduces the amount of calculation. Similarly, in terms of size, a detection method that tolerates a twofold size variation within one area can accept a fourfold size variation if images at two resolutions, 1/1 and 1/2, are prepared. In contrast, a detection method with a size robustness of only about 1.4 times (approximately √2) requires images at four resolutions, 1/1, 1/√2, 1/2 and 1/(2√2), to cover the same fourfold range, and the amount of calculation increases accordingly.
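The interval arithmetic in the preceding paragraph can be reproduced with two small helpers, `clipping_step` and `pyramid_scales` (hypothetical names). The sketch assumes that positional robustness means a symmetric ±r pixel tolerance and that each resolution covers exactly a `size_robustness`-fold size range.

```python
import math

def clipping_step(positional_robustness):
    """Clipping interval in pixels for a detector tolerating +/-r pixels.

    One clipped area covers the 2r + 1 positions from -r to +r around its
    centre, so the clipping position can advance by 2r + 1 without gaps."""
    return 2 * positional_robustness + 1

def pyramid_scales(size_robustness, size_range):
    """Resolutions needed to accept a `size_range`-fold size variation.

    Each resolution covers a `size_robustness`-fold size range, so the
    resolutions are spaced by that factor: 1, 1/k, 1/k**2, ..."""
    # The small epsilon absorbs rounding, e.g. log(4)/log(sqrt(2)) ~= 4.
    levels = math.ceil(math.log(size_range) / math.log(size_robustness) - 1e-9)
    return [size_robustness ** -i for i in range(levels)]

print(clipping_step(0), clipping_step(2))    # 1 5
print(pyramid_scales(2, 4))                  # [1, 0.5]
print(len(pyramid_scales(math.sqrt(2), 4)))  # 4
```

Up to floating-point rounding, `pyramid_scales(math.sqrt(2), 4)` yields 1/1, 1/√2, 1/2 and 1/(2√2), matching the four resolutions listed in the text.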
In general, however, raising the robustness tends to lower the detection precision. That is, even when a detection pattern such as a face exists in the image, it may not be detected, or a pattern completely different from the detection pattern, such as the background, may be incorrectly detected as the detection pattern. If such a detection error occurs in an apparatus that controls the exposure in accordance with the face in the image, as proposed in Japanese Patent Application Laid-Open No. 2003-107555, the exposure may be controlled in accordance with an incorrectly detected area other than the face, such as the background.
Conversely, if the robustness is lowered, the clipping positional interval and the interval between the resolutions must be made finer, which increases the amount of calculation. As the amount of calculation increases, the detection process for an image may take longer.
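The cost side of this trade-off can be estimated by counting how many areas are clipped and evaluated. The sketch below is an illustration under stated assumptions: a hypothetical 640×480 input, a hypothetical 20×20 clipping window, and the step and resolution settings from the robustness examples above; `window_count` is a hypothetical helper.

```python
import math

def window_count(width, height, window, step, scales):
    """Count the areas clipped and evaluated over all resolutions.

    Each resolution is int(width*s) x int(height*s) pixels; areas of
    window x window pixels are clipped every `step` pixels in both axes."""
    total = 0
    for s in scales:
        w, h = int(width * s), int(height * s)
        if w >= window and h >= window:
            total += ((w - window) // step + 1) * ((h - window) // step + 1)
    return total

# Hypothetical settings: 640x480 input, 20x20 clipping window.
# Robust detector: +/-2 px (step 5), twofold size robustness (1/1, 1/2).
robust = window_count(640, 480, 20, 5, [1.0, 0.5])
# Less robust detector: centre-only (step 1), about 1.4x size robustness.
r2 = math.sqrt(2)
precise = window_count(640, 480, 20, 1, [1.0, 1 / r2, 0.5, 1 / (2 * r2)])
print(robust, precise)  # 14370 522412
```

Under these assumptions the less robust detector evaluates roughly 36 times as many areas to cover the same positions and the same fourfold size range.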