Field of the Invention
The present invention relates to a feature selection method and apparatus for selecting, from a plurality of features in data, features useful for pattern discrimination. The present invention also relates to a pattern discrimination method and apparatus for classifying input data that includes a plurality of features into a predetermined class using the features selected by such a feature selection method and apparatus.
Description of the Related Art
Pattern discrimination methods have been proposed that classify data into a predetermined class by identifying attributes of the data from partial features of data that includes a plurality of features, and then combining the pieces of identified information. Such pattern discrimination methods have the merit of being robust against data loss, variations, and the like. For this reason, various methods have been proposed, such as those of Japanese Patent Laid-Open No. 2009-43280 (to be referred to as patent literature 1 hereinafter) and Vincent Lepetit and Pascal Fua, “Keypoint Recognition Using Randomized Trees”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 28, No. 9, pp. 1465-1479, September 2006 (to be referred to as non-patent literature 1 hereinafter).
Since the discrimination performance of such a pattern discrimination method generally depends on the partial features used in discrimination, the partial features to be used must be selected appropriately. For example, in the method proposed by patent literature 1, the discrimination precision of the pattern discrimination method is evaluated every time the partial features to be used are changed, and partial features that enhance the discrimination precision are selected based on the evaluation results. In non-patent literature 1, a large number of predetermined keypoints are detected from image data, and keypoints that are highly likely to be detected stably even when various variations are applied are selected as partial features.
The partial feature selection method described in patent literature 1 is called a Wrapper method in the technical field of feature selection. The Wrapper method is known to be more likely to select appropriate features when enough test data exist to evaluate the discrimination precision. However, the Wrapper method suffers from the very high processing cost required for feature selection. Moreover, when sufficient test data do not exist, so-called over-learning (overfitting) occurs.
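The cost of a Wrapper method can be illustrated by the following minimal sketch, which is not the method of patent literature 1. It assumes greedy forward selection and uses a nearest-centroid classifier as a stand-in discriminator; the names `nearest_centroid_accuracy` and `wrapper_select` are hypothetical. The point to note is that the discriminator is rebuilt and re-evaluated on test data for every candidate feature in every round.

```python
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (feature vector, class label)


def nearest_centroid_accuracy(
    train: List[Sample], test: List[Sample], features: List[int]
) -> float:
    """Accuracy on `test` of a nearest-centroid classifier (a stand-in
    discriminator chosen for illustration) restricted to `features`."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for x, y in train:
        acc = sums.setdefault(y, [0.0] * len(features))
        for i, f in enumerate(features):
            acc[i] += x[f]
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [s / counts[y] for s in sums[y]] for y in sums}

    correct = 0
    for x, y in test:
        predicted = min(
            centroids,
            key=lambda c: sum(
                (x[f] - centroids[c][i]) ** 2 for i, f in enumerate(features)
            ),
        )
        if predicted == y:
            correct += 1
    return correct / len(test)


def wrapper_select(
    train: List[Sample], test: List[Sample], n_features: int, k: int
) -> Tuple[List[int], int]:
    """Greedy forward selection: in each round, add the feature whose
    inclusion gives the best test accuracy. Also returns the number of
    discriminator evaluations that were needed."""
    selected: List[int] = []
    evaluations = 0
    for _ in range(k):
        best_feature, best_accuracy = -1, -1.0
        for f in range(n_features):
            if f in selected:
                continue
            accuracy = nearest_centroid_accuracy(train, test, selected + [f])
            evaluations += 1
            if accuracy > best_accuracy:
                best_feature, best_accuracy = f, accuracy
        selected.append(best_feature)
    return selected, evaluations
```

In this sketch, selecting k of n features requires on the order of n × k discriminator evaluations, and the selection is only as reliable as the test data used to measure accuracy, which corresponds to the two drawbacks noted above.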
In contrast, in the partial feature selection method described in non-patent literature 1, features are evaluated using a separately defined evaluation scale in place of the final discrimination precision of the pattern discrimination method, and features are selected based on the resulting evaluation values. Such a method is called a Filter method in the technical field of feature selection. The Filter method generally has the merit of lower processing cost than the Wrapper method. However, the Filter method requires the evaluation scale used to evaluate features to be defined separately, and an inappropriate definition hinders appropriate feature selection. For example, as described above, the method of non-patent literature 1 uses the likelihood of stable detection as the evaluation scale of features, but does not consider whether these features are discriminable from other features. In the pattern discrimination method of non-patent literature 1, however, the final discrimination precision depends on whether each feature is correctly classified as its corresponding feature. For this reason, when the target data include many features that can be stably detected but are hard to discriminate from other features, it is highly likely that feature selection that enhances the final discrimination precision cannot be realized.
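A Filter method of this kind can be sketched as follows. This is a simplified illustration and not the procedure of non-patent literature 1: it assumes that each candidate keypoint has a record of whether it was re-detected under each applied variation, and the names `stability_score` and `filter_select` are hypothetical.

```python
from typing import Dict, List, Sequence


def stability_score(detected: Sequence[bool]) -> float:
    """Evaluation scale (assumed for illustration): the fraction of
    variations under which the keypoint was detected again."""
    return sum(detected) / len(detected)


def filter_select(detections: Dict[str, Sequence[bool]], k: int) -> List[str]:
    """Rank keypoints by the evaluation scale alone and keep the top k.
    No discriminator is trained or evaluated at any point."""
    ranked = sorted(
        detections,
        key=lambda name: stability_score(detections[name]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical detection records: one entry per applied variation.
detections = {
    "corner_a": [True, True, True, True],
    "corner_b": [True, True, True, False],
    "blob_c": [True, False, False, False],
}
print(filter_select(detections, 2))
```

Each feature is scored once, so the cost is low. However, the score says nothing about whether "corner_a" and "corner_b" can be told apart from each other, which is the limitation described above.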
The framework using AdaBoost described in Paul Viola and Michael Jones, “Rapid Object Detection using a Boosted Cascade of Simple Features”, Proceedings IEEE Conference on Computer Vision and Pattern Recognition, Vol. 1, pp. 511-518, December 2001 (to be referred to as non-patent literature 2 hereinafter) can also be regarded as a partial feature selection method. This framework is close to the Wrapper method because its evaluation uses the discrimination precision of the element discriminators (so-called weak learners) that make up the discriminator used in the final pattern discrimination method. For this reason, although its processing cost is lower than that of a normal Wrapper method, the required processing cost is still high. In addition, as in the Wrapper method, over-learning occurs when sufficient test data for evaluating the element discriminators do not exist.
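How boosting acts as feature selection can be sketched as follows. This is a simplified discrete AdaBoost over single-feature threshold classifiers (decision stumps), written for illustration only; it is not the cascade of non-patent literature 2, and the names `best_stump` and `adaboost_select` are hypothetical. Labels are assumed to be +1 or -1.

```python
import math
from typing import List, Sequence, Tuple


def best_stump(
    X: Sequence[Sequence[float]], y: Sequence[int], w: Sequence[float], feature: int
) -> Tuple[float, float, int]:
    """Return (weighted error, threshold, polarity) of the best threshold
    classifier that looks at a single feature."""
    best = (float("inf"), 0.0, 1)
    for threshold in sorted({x[feature] for x in X}):
        for polarity in (1, -1):
            error = sum(
                wi
                for xi, yi, wi in zip(X, y, w)
                if (polarity if xi[feature] >= threshold else -polarity) != yi
            )
            if error < best[0]:
                best = (error, threshold, polarity)
    return best


def adaboost_select(
    X: Sequence[Sequence[float]], y: Sequence[int], rounds: int
) -> List[int]:
    """Return the feature chosen in each boosting round."""
    n = len(X)
    w = [1.0 / n] * n
    chosen: List[int] = []
    for _ in range(rounds):
        # Evaluate one weak learner per feature and keep the most precise.
        stumps = [(best_stump(X, y, w, f), f) for f in range(len(X[0]))]
        (error, threshold, polarity), f = min(stumps)
        chosen.append(f)
        # Clamp the error so the logarithm below stays finite.
        error = min(max(error, 1e-10), 1.0 - 1e-10)
        alpha = 0.5 * math.log((1.0 - error) / error)
        # Increase the weight of misclassified samples, then renormalize.
        w = [
            wi * math.exp(-alpha * yi * (polarity if xi[f] >= threshold else -polarity))
            for xi, yi, wi in zip(X, y, w)
        ]
        total = sum(w)
        w = [wi / total for wi in w]
    return chosen
```

Every round evaluates a weak learner for every feature against the labeled data, so the cost grows with the number of features, candidate thresholds, and rounds. This is lighter than retraining a full discriminator per candidate but still substantial, and the choice of features depends on the labeled data being sufficient.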
Accordingly, there is a demand for a method that can select features suited to pattern discrimination from the features in data at a lower processing cost, even when sufficient test data do not exist.