1. Field of the Invention
The present invention relates to a technique for estimating an attribute of an object included in an image.
2. Description of the Related Art
In recent years, techniques for estimating an attribute of an object in an image (to be referred to as an object attribute hereinafter) have been extensively developed. One such technique is face direction estimation, which estimates the direction of a human face in an image; here the object is a human face and the attribute is its direction. Because this technique allows face direction information to be acquired automatically from an image, it has a broad range of uses, such as semantic understanding of an image, composition estimation, and device control according to a human face direction.
As a typical method of face direction estimation, a method of preparing a plurality of face detectors, each dedicated to faces in a specific direction, and estimating a face direction by integrating outputs from the respective face detectors is known (Japanese Patent Laid-Open No. 2007-226512). Each face detector dedicated to a face in a specific direction is implemented by preparing in advance a large number of images including human faces whose face direction angles fall within a specific range (to be referred to as a face direction range hereinafter), and learning those images by a machine learning method. This method has the merit of estimating a face direction simultaneously with face detection. However, the number of face detectors has to be increased to enhance the estimation resolution, and since each face detector requires its own dictionary file, the total size of these dictionary files becomes huge. Moreover, the individual face detectors are independent of one another, and their output values bear no relation to each other. Hence, even when their output values are integrated, estimation cannot always be made with high accuracy.
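The integration step described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each detector is characterized by the center angle of its face direction range and outputs a non-negative confidence score, and that the outputs are integrated as a score-weighted mean. The function name, the angles, and the scores are hypothetical and are not taken from the cited publication.

```python
def integrate_detector_outputs(center_angles_deg, scores):
    """Estimate a face direction as the score-weighted mean of detector center angles.

    center_angles_deg: center of the face direction range of each detector (assumed).
    scores: confidence output of each detector (assumed non-negative when a face is found).
    """
    if len(center_angles_deg) != len(scores):
        raise ValueError("one score is required per detector")
    # Negative scores are treated as "no response" and contribute nothing.
    weights = [max(score, 0.0) for score in scores]
    total = sum(weights)
    if total == 0.0:
        return None  # no detector responded, so no estimate is produced
    weighted = sum(angle * w for angle, w in zip(center_angles_deg, weights))
    return weighted / total


# Five hypothetical detectors spaced 30 degrees apart.
centers = [-60.0, -30.0, 0.0, 30.0, 60.0]
scores = [0.0, 1.0, 6.0, 3.0, 0.0]
estimate = integrate_detector_outputs(centers, scores)
print(estimate)
```

In this sketch the achievable resolution depends on how densely the center angles are spaced, which is why a finer estimate requires more detectors and therefore more dictionary files.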
On the other hand, a method of estimating a face direction angle value by extracting a feature amount from a face image, and inputting the feature amount to a regression function (estimation model) has been proposed. This method allows angle estimation at a high resolution by learning the regression function using learning data prepared by associating face images and their face direction angle values with each other. As an example of face direction estimation using the regression function, a technique disclosed in non-patent literature 1 (Y. Li, S. Gong, J. Sherrah, and H. Liddell, “Support vector machine based multi-view face direction and recognition,” Image and Vision Computing, vol. 22, no. 5, p. 2004, 2004.) is available. The technique disclosed in non-patent literature 1 estimates a face direction by inputting projected features, which are obtained by projecting a feature amount extracted from a face image onto an eigenspace basis prepared in advance, to a Support Vector Regression (SVR) model. By projecting a feature amount onto the eigenspace, not only a reduction in the dimension of the feature amount but also an effect of eliminating the influence of noise derived from changes in the illumination condition of a face image can be expected. The eigenspace is obtained in advance by learning a large number of face images covering the face direction range to be estimated. By using nonlinear kernels in the SVR, a regression function can be configured which expresses in detail a feature space having a complicated, nonlinear structure and maps it onto a face direction angle.
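The flow of projecting a feature amount onto an eigenspace and mapping the projected feature to an angle can be sketched as follows. This is only an illustration under stated assumptions: a single eigenspace axis is found by power iteration, and ordinary least squares is used in place of the SVR with nonlinear kernels described in non-patent literature 1. All function names and the toy learning data are hypothetical.

```python
import math


def mean_vector(rows):
    """Per-dimension mean of a list of equal-length feature vectors."""
    return [sum(column) / len(rows) for column in zip(*rows)]


def leading_axis(rows, mean, iterations=100):
    """Leading eigenspace axis of the mean-centered data, found by power iteration."""
    centered = [[x - m for x, m in zip(row, mean)] for row in rows]
    dim = len(mean)
    axis = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iterations):
        # Multiply by the unnormalized covariance matrix: X^T (X v).
        proj = [sum(a * b for a, b in zip(row, axis)) for row in centered]
        new = [sum(p * row[j] for p, row in zip(proj, centered)) for j in range(dim)]
        norm = math.sqrt(sum(v * v for v in new))
        if norm == 0.0:
            break
        axis = [v / norm for v in new]
    return axis


def project(feature, mean, axis):
    """Projected feature: coordinate of the centered feature along the axis."""
    return sum((x - m) * a for x, m, a in zip(feature, mean, axis))


def fit_line(xs, ys):
    """Least-squares line y = slope * x + intercept (stand-in for the SVR)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


# Toy learning data: hypothetical 3-dimensional feature amounts paired with angles.
angles = [-60.0, -30.0, 0.0, 30.0, 60.0]
features = [[a / 60.0, 0.5 * a / 60.0, 1.0] for a in angles]

mean = mean_vector(features)
axis = leading_axis(features, mean)
slope, intercept = fit_line([project(f, mean, axis) for f in features], angles)


def estimate_angle(feature):
    """Project a feature amount onto the eigenspace axis and map it to an angle."""
    return slope * project(feature, mean, axis) + intercept


estimated = estimate_angle([0.25, 0.125, 1.0])
print(round(estimated, 3))
```

The third feature dimension is constant across the toy data and is discarded by the projection, which loosely mirrors the dimension reduction effect mentioned above. A linear fit cannot capture the complicated, nonlinear feature space that the nonlinear kernels are meant to handle.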
In a technique disclosed in non-patent literature 2 (Erik Murphy-Chutorian, “Head pose estimation for driver assistance systems: A robust algorithm and experimental evaluation,” in Proc. IEEE Conf. Intelligent Transportation Systems, 2007, pp. 709-714.), an HOG (Histogram of Oriented Gradient) is extracted as a feature amount, and a face direction is estimated using the SVR. The HOG corresponds to feature amounts obtained by converting pieces of luminance gradient information of an image into a histogram for respective regions of the image, and is known as a feature amount robust against local noise and variations in image density. By selecting a feature amount robust against variations which are not related to a face direction, stable face direction estimation can be implemented even in an actual environment.
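The conversion of luminance gradient information into per-region histograms can be sketched as follows. This is a simplified illustration, not the procedure of non-patent literature 2: it assumes a grayscale image given as a list of rows, uses central-difference gradients, unsigned orientations from 0 to 180 degrees, and per-cell L2 normalization, and it omits the block normalization and bin interpolation found in common HOG implementations. The function name and parameters are hypothetical.

```python
import math


def hog_cell_histograms(image, cell=4, bins=9):
    """Simplified HOG: one magnitude-weighted orientation histogram per cell."""
    height, width = len(image), len(image[0])
    cells_y, cells_x = height // cell, width // cell
    hists = [[[0.0] * bins for _ in range(cells_x)] for _ in range(cells_y)]
    bin_width = 180.0 / bins
    # Border pixels are skipped because central differences need both neighbors.
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = image[y][x + 1] - image[y][x - 1]
            gy = image[y + 1][x] - image[y - 1][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0.0:
                continue
            # Unsigned orientation in [0, 180) degrees.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            b = min(int(angle / bin_width), bins - 1)
            cy, cx = y // cell, x // cell
            if cy < cells_y and cx < cells_x:
                hists[cy][cx][b] += magnitude
    features = []
    for row in hists:
        for hist in row:
            # L2 normalization makes each cell independent of gradient strength.
            norm = math.sqrt(sum(v * v for v in hist))
            features.extend((v / norm if norm > 0.0 else 0.0) for v in hist)
    return features


# An 8x8 horizontal luminance ramp: every gradient points along the x axis.
ramp = [[10 * x for x in range(8)] for _ in range(8)]
features = hog_cell_histograms(ramp)

# The same ramp with a uniform luminance offset added.
brighter = [[value + 50 for value in row] for row in ramp]
brighter_features = hog_cell_histograms(brighter)
print(len(features), features[:3])
```

Because only luminance differences enter the histograms, a uniform luminance offset leaves the feature amount unchanged in this sketch, which illustrates the kind of robustness described above.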
A technique disclosed in Japanese Patent Laid-Open No. 6-333023 estimates the age of a face by inputting a feature amount extracted from a face image into a neural network which is trained in advance. In this manner, an attribute other than the face direction can also be estimated using a machine learning method.
However, these methods suffer from a drop in estimation accuracy when the input is a face image, such as a general photo taken in an actual environment, that spans a broad face direction range and includes noise. In an actual environment, the face direction is not limited, so the face direction range is broad, extending from a full-face direction to a half-face direction, and the appearance of the image changes greatly with the angle. In addition, a large number of factors other than the face direction (noise components) also greatly change the appearance: for example, various illumination conditions such as direct sunlight and indoor illumination, personal differences in head shape, and various facial expressions. In such a case, it is difficult to attain accurate face direction estimation with either the method of estimating a face direction from projected features using a single eigenspace as in non-patent literature 1, or the simple estimation model using only a feature amount robust against noise as in non-patent literature 2. This is because changes in appearance caused by the face direction and those caused by other factors overlap each other, so the range that can be expressed by a single feature amount is exceeded, and the estimator cannot discriminate appearance differences that are based on the face direction.