Methods based on discriminant analysis, methods based on principal component analysis, and the like are known and widely used for deciding the features used to identify a pattern from a learning-pattern set (see, for example, Japanese Unexamined Patent Application Publication Nos. 9-258492, 4-256087, and 1-321591). However, since every feature decided by these methods is linear, their performance is limited.
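The limitation of linear features can be illustrated with a small sketch. The example below is not taken from the cited publications; it assumes a hypothetical four-pattern, two-class set of the exclusive-OR type and hypothetical helper names (`linear_feature`, `separates`). It searches a coarse grid of weight vectors and finds no single linear feature whose threshold separates the two classes, whereas one nonlinear feature (the product of the two inputs) does.

```python
from itertools import product

# Hypothetical learning-pattern set of the exclusive-OR type:
# each entry is (input pattern, class label).
patterns = [((-1, -1), 0), ((-1, 1), 1), ((1, -1), 1), ((1, 1), 0)]
labels = [y for _, y in patterns]


def linear_feature(w, x):
    """A linear feature: weighted sum of the input components."""
    return w[0] * x[0] + w[1] * x[1]


def separates(values, labels):
    """True if some threshold on the scalar feature splits the two classes."""
    class0 = [v for v, y in zip(values, labels) if y == 0]
    class1 = [v for v, y in zip(values, labels) if y == 1]
    return max(class0) < min(class1) or max(class1) < min(class0)


# Try every weight vector on a coarse grid (an assumption made for brevity;
# the same result holds for any real-valued weights on this pattern set).
grid = [i / 4 for i in range(-8, 9)]
linear_ok = any(
    separates([linear_feature(w, x) for x, _ in patterns], labels)
    for w in product(grid, repeat=2)
)

# One nonlinear feature: the product of the two input components.
nonlinear_ok = separates([x[0] * x[1] for x, _ in patterns], labels)

print("some linear feature separates the classes:", linear_ok)
print("the product feature separates the classes:", nonlinear_ok)
```

For this pattern set, each class yields a pair of linear-feature values of opposite sign, so no threshold can place one class entirely above the other.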
On the other hand, methods using a neural network, and especially methods combining a neural network with an error back-propagation learning method, are known as methods using nonlinear features (see, for example, “Neurocomputer” edited by Kaoru NAKAMURA, Gijutsu-Hyohron Co., Ltd. 1989, or D. E. Rumelhart, “Parallel Distributed Processing”, MIT Press, 1986). These methods make it possible for the neurons of an intermediate layer to learn nonlinear features suitable for pattern identification. However, they have the following problems. Learning takes an enormous amount of time when the problem is difficult, and the performance of the neural network is significantly affected by the number of intermediate-layer neurons. Further, there is no general method for determining the most appropriate number of intermediate-layer neurons in advance.
Further, methods such as ID3 and C4.5, which determine a classification rule at each node of a decision tree according to a mutual-information maximization criterion, are known as methods for performing pattern identification (classification) by using a decision tree (see Japanese Unexamined Patent Application Publication Nos. 2000-122690 and 61-75486, and J. R. Quinlan, “C4.5: Programs for Machine Learning”, 1993). The learning time required by these methods is shorter than that required by the above-described methods using the neural network. On the other hand, the performance of these methods is generally perceived to be inferior to that of the neural-network methods. For example, their ability to identify patterns other than the learning patterns (generalization ability) is not adequate.
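The mutual-information maximization criterion applied at a decision-tree node can be sketched as follows. This is a minimal illustration in the style of ID3's information gain, not the procedure of any cited publication; C4.5 uses a normalized variant (the gain ratio), which is omitted here. The toy learning patterns, the attribute names, and the function names (`entropy`, `information_gain`, `best_attribute`) are assumptions introduced for the example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Mutual information between attribute `attr` and the class label."""
    n = len(labels)
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[attr], []).append(y)
    # Expected entropy of the class label after splitting on `attr`.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels):
    """Attribute whose split maximizes the information gain at this node."""
    return max(rows[0].keys(), key=lambda a: information_gain(rows, labels, a))


# Hypothetical learning patterns: "shape" determines the class, "color" does not.
rows = [
    {"shape": "round", "color": "red"},
    {"shape": "round", "color": "green"},
    {"shape": "square", "color": "red"},
    {"shape": "square", "color": "green"},
]
labels = ["A", "A", "B", "B"]

for attr in rows[0]:
    print(attr, information_gain(rows, labels, attr))
print("split on:", best_attribute(rows, labels))
```

In this toy set, splitting on "shape" removes all uncertainty about the class (a gain of 1 bit), while splitting on "color" removes none, so the node would test "shape".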