The present invention relates to a classification model generating method for pattern recognition or situation classification, which is used for recognition of, e.g., speech or image patterns or classification of situations, and a recording medium on which a program for making a computer execute the classification model generating method is recorded.
Systems used in fields such as process control are required to perform situation classification, i.e., to discriminate whether the current situation is abnormal or demands a predetermined operation. Situation classification for abnormality determination or operation decision can be regarded as a problem of classifying situations, e.g., into abnormal and normal situations or into operations A and B, in a feature space defined by the feature amounts (to be referred to as variates hereinafter) used for situation classification.
As a conventional method of implementing situation classification, a discriminant analysis method is known. According to the discriminant analysis method, when there are classes characterized by a plurality of types of variates, a specific class to which the situation to be classified belongs is discriminated on the basis of data belonging to the respective classes. This method is generally based on statistical techniques.
Assume that a class that has achieved a given object is defined as class A, and a class that has not achieved the object is defined as class B, and that a plurality of data characterized by variates x1, x2, . . . , xn (e.g., the number of customer visits, telephone charges, and numerical values obtained by quantifying enthusiasm) have been obtained for each class. In this case, the discriminant analysis method uses a discrimination function Y that assigns weights to the respective variates to clarify the difference between classes A and B.
Y=a1×x1+a2×x2+ . . . +an×xn  (1)
where a1, a2, . . . , an are the weights for the respective variates. Note that equation (1) is written, as an example of discrimination functions, for a case in which the discrimination function Y is linear (the variance-covariance matrices of the respective classes are equal). FIG. 21 shows how the discrimination function Y is determined when the space of class A as a set of data Da and the space of class B as a set of data Db are present in the two-dimensional feature space defined by the variates x1 and x2. With this function, when a situation in which Y≧0 occurs, it can be determined that the situation belongs to class A. If a situation in which Y<0 occurs, it can be determined that the situation belongs to class B.
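For concreteness, the sign-based classification rule built on equation (1) can be sketched as follows; the weights and data below are illustrative assumptions, not values taken from the invention.

```python
# Sketch of the discrimination rule of equation (1):
# Y = a1*x1 + a2*x2 + ... + an*xn, with class A when Y >= 0
# and class B when Y < 0. Weights are illustrative assumptions.

def discriminant(weights, variates):
    """Compute the discrimination function Y for one situation."""
    return sum(a * x for a, x in zip(weights, variates))

def classify(weights, variates):
    """Assign class A when Y >= 0, class B otherwise."""
    return "A" if discriminant(weights, variates) >= 0 else "B"

# Illustrative two-variate example (x1, x2) with assumed weights.
weights = [1.0, -0.5]
print(classify(weights, [2.0, 1.0]))   # Y = 1.5  -> "A"
print(classify(weights, [0.0, 3.0]))   # Y = -1.5 -> "B"
```

In practice the weights a1, . . . , an are estimated from the data of classes A and B; the sketch only shows how a fitted function is applied.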
As another method of implementing situation classification, a pattern recognition method of recognizing an object on the basis of a form, mode, or pattern which characterizes the object is known. As one such pattern recognition method, a method using a neural network has been proposed (Gail A. Carpenter and Stephen Grossberg, "PATTERN RECOGNITION BY SELF-ORGANIZING NEURAL NETWORKS", A Bradford Book, 1991). As another pattern recognition method, a method using an RCE (Restricted Coulomb Energy) network has been proposed (D. L. Reilly, L. N. Cooper and C. Elbaum, "Self Organizing Pattern Class Separator and Identifier", U.S. Pat. No. 4,326,259, awarded Apr. 20, 1982).
A neural network is an engineering attempt to implement a parallel information processing mechanism based on neurons, as found in the brain of a living organism. When a neural network is to be used for situation classification, the variates included in several typical situations, together with the discrimination results the network should output for those variates, must be supplied to the neural network so that it learns to produce the desired discrimination results. As a method of making the neural network learn, a back propagation method is generally used.
An RCE network is used to classify a feature space by approximating classes occupying a linearly inseparable multi-dimensional space with a plurality of basic graphic patterns (e.g., multi-dimensional hyperspheres). In the case shown in FIG. 22, the spaces of linearly inseparable classes A and B are respectively approximated with basic graphic patterns Ca and Cb to classify the two-dimensional feature space defined by variates x1 and x2.
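The hypersphere approximation described above can be illustrated with a minimal sketch. The radius rule used here (shrink each sphere so it excludes data of the other classes) is a simplified reading of the RCE idea, not the exact algorithm of the cited patent; the data and the cap on the radius are illustrative assumptions.

```python
import math

# Sketch of RCE-style class approximation: each training point becomes
# the center of a hypersphere whose radius is limited so the sphere
# does not reach data of any other class. Illustrative only.

def dist(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def build_spheres(data, max_radius=1.0):
    """data: list of (point, label). Returns list of (center, radius, label)."""
    spheres = []
    for p, label in data:
        others = [dist(p, q) for q, other in data if other != label]
        # Keep the radius strictly below the nearest foreign point.
        r = min([max_radius] + [d * 0.99 for d in others])
        spheres.append((p, r, label))
    return spheres

def classify(spheres, point):
    """Return the label of the first sphere containing the point, or None."""
    for center, r, label in spheres:
        if dist(center, point) <= r:
            return label
    return None

data = [((0.0, 0.0), "A"), ((0.2, 0.1), "A"), ((2.0, 2.0), "B")]
spheres = build_spheres(data)
print(classify(spheres, (0.1, 0.0)))  # inside an A sphere -> "A"
print(classify(spheres, (3.0, 3.0)))  # outside all spheres -> None
```

The sketch makes the weakness noted below visible: a sphere centered on an outlying data point covers whatever happens to surround that point, whether or not it truly belongs to the class.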
[Problem to be Solved by the Invention]
According to the discriminant analysis method, however, when the spaces of the respective classes cannot be linearly separated, the discrimination function must be approximated with a higher-order polynomial. If, therefore, many types of variates are required and the space of each class is complicated, a discrimination function is difficult to derive.
In the method using the neural network, the learning speed of the neural network is low (in general, about 100 to 1,000 learning processes are required; it takes about one week in some cases). In addition, it is difficult to determine an optimal network configuration for classification. Furthermore, since it takes much time to perform classification processing, i.e., classifying situations on the basis of variates characterizing the situations, an expensive semiconductor chip is required to increase the processing speed.
In the method using the RCE network, the basic graphic patterns Ca and Cb respectively centered on the data Da and Db belonging to classes A and B are generated to have sizes that do not interfere with the remaining classes. However, the data Da and Db serving as the centers of the basic graphic patterns do not always exist at positions where the spaces of classes A and B can be properly approximated. For this reason, a situation that should not be included in a given class may be determined as a situation belonging to the class. That is, a recognition error may occur. For example, in the case shown in FIG. 22, the basic graphic patterns Cb properly approximate the space of class B, whereas some basic graphic patterns Ca protrude from the space of class A. In this case, therefore, a situation that should not be included in class A may be determined as a situation belonging to class A. In addition, according to the method using the RCE network, when a few data points are remote from the data groups of the respective classes, classification is affected by those outlying data.
The present invention has been made to solve the above problems, and has as its object to provide a classification model generating method in which the learning speed and the classification processing speed are high, and the spaces of classes can be properly approximated even if the spaces of the respective classes cannot be linearly separated, and a recording medium on which a program for making a computer execute the classification model generating method is recorded.
[Means of Solution to the Problem]
As described in claim 1, according to the present invention, a classification model generating method of the present invention comprises the steps of, when n-dimensional data which belongs to one class in an n-dimensional feature space defined by n types of variates and whose position is specified by the variates is input, dividing the feature space into mⁿ divided areas by performing m-part division for each of the variates, and determining a division number m on the basis of a statistical significance level in the division by regarding a degree of generation of a divided area containing one data as a degree following a probability distribution with respect to the division number m, setting a divided area containing n-dimensional data as a learning area belonging to the class, and associating each input data with a corresponding divided area, adding divided areas around the learning area as learning areas to expand a learning area group, and removing a learning area located on a boundary between the learning area and a divided area which is not a learning area from the learning area group to contract the learning area group.
As described above, in the present invention, the division number m is determined on the basis of a statistical significance level (data density) to divide a feature space, and the generated divided areas are classified depending on whether they contain n-dimensional data or not, thereby generating a classification model. With this operation, even if the spaces of the respective classes cannot be linearly separated, a classification model that properly approximates the space of each class can be generated. In addition, a divided area that should belong to a class can be added to a learning area group by performing the step of expanding the learning group and the step of contracting the learning area group.
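The division and cell-marking steps can be sketched as follows, assuming a given division number m and known variate ranges; the statistically based choice of m itself is not modeled here, and the data and ranges are illustrative.

```python
# Sketch of dividing an n-dimensional feature space into m^n cells and
# marking as "learning areas" those cells that contain data. The
# division number m is assumed given; the invention derives it from a
# statistical significance level, which this sketch does not model.

def cell_index(point, lo, hi, m):
    """Map a point to the tuple index of its cell in an m-part grid."""
    idx = []
    for x, a, b in zip(point, lo, hi):
        k = int((x - a) / (b - a) * m)
        idx.append(min(max(k, 0), m - 1))  # clamp boundary values
    return tuple(idx)

def learning_areas(data, lo, hi, m):
    """Return the set of cells (learning areas) occupied by the data."""
    return {cell_index(p, lo, hi, m) for p in data}

# Two variates, m = 4 -> 16 cells over [0,1] x [0,1].
data = [(0.1, 0.1), (0.15, 0.2), (0.9, 0.9)]
areas = learning_areas(data, lo=(0.0, 0.0), hi=(1.0, 1.0), m=4)
print(sorted(areas))  # [(0, 0), (3, 3)]
```

Because classification then reduces to computing a cell index and a set lookup, the cost of classifying one situation is small, which is the basis of the processing-speed claims made in the effect section.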
As described in claim 2, the classification model generating method of the present invention further comprises determining the division number m based on the statistical significance level from an average and a variance for the division number m with which at least one divided area containing one data is generated.
As described in claim 3, the classification model generating method of the present invention further comprises removing a learning area having no learning area among adjacent areas from the learning area group before the learning area group is expanded.
As described above, an area containing data regarded as noise can be removed from learning areas by removing a learning area having no learning area among adjacent areas from a learning area group.
As described in claim 4, the step of expanding the learning area group comprises setting an arbitrary divided area as an area of interest, and setting the area of interest as a learning area if at least one of divided areas adjacent to the area of interest is a learning area, and the step of contracting the learning area group comprises setting an arbitrary learning area as an area of interest, and removing the area of interest from the learning area group if at least one of divided areas adjacent to the area of interest is a non-learning area.
As described in claim 5, when the class includes a plurality of classes, a division number for each class is obtained on the basis of the statistical significance level, a division number common to all the classes is determined from the division numbers obtained for the respective classes, and the step of associating data with each divided area, the step of expanding the learning area group, and the step of contracting the learning area group are performed for each class. With this operation, even if a plurality of classes are present, a classification model that approximates the space of each class can be easily generated.
As described in claim 6, when e learning areas, of a total number N of learning areas recognized as areas belonging to a given class, are recognized as areas belonging to another class as well, e/N is set as an identification error ratio indicating separability of classes from each other.
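The identification error ratio e/N can be computed directly from the sets of learning areas of two classes, with the overlap cells being those recognized as belonging to both; the cell sets below are illustrative.

```python
# Sketch of the identification error ratio e/N of claim 6: of the N
# learning areas of a given class, e are also learning areas of
# another class. The cell sets are illustrative assumptions.

def identification_error_ratio(own_areas, other_areas):
    """e/N, where e = areas shared with another class, N = own areas."""
    e = len(own_areas & other_areas)
    return e / len(own_areas)

class_a = {(0, 0), (0, 1), (1, 0), (1, 1)}   # N = 4 learning areas
class_b = {(1, 1), (2, 2)}                   # shares one cell with A
print(identification_error_ratio(class_a, class_b))  # 0.25
```

A ratio near zero indicates that the chosen variates separate the classes well; a large ratio suggests the variates defining the feature space should be reconsidered.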
As described in claim 7, according to the present invention, there is provided a recording medium on which a program is recorded, the program making a computer execute the step of, when n-dimensional data which belongs to one class in an n-dimensional feature space defined by n types of variates and whose position is specified by the variates is input, dividing the feature space into mⁿ divided areas by performing m-part division for each of the variates, and determining a division number m on the basis of a statistical significance level in the division by regarding a degree of generation of a divided area containing one data as a degree following a probability distribution with respect to the division number m, the step of setting a divided area containing n-dimensional data as a learning area belonging to the class, and associating each input data with a corresponding divided area, the step of adding divided areas around the learning area as learning areas to expand a learning area group, and the step of removing a learning area located on a boundary between the learning area and a divided area which is not a learning area from the learning area group to contract the learning area group.
[Effect]
(1) According to the present invention, as described in claim 1, the division number m is determined on the basis of a statistical significance level to divide the feature space, and each of the input n-dimensional data is associated with a corresponding divided area. With this operation, even if the spaces of the respective classes cannot be linearly separated, a classification model that can nonlinearly separate the spaces of the respective classes can be generated. As a result, since the space of each class can be accurately approximated, classification processing with a low classification error ratio can be performed. Since divided areas each having a statistically significant size are generated by determining the division number m on the basis of a statistical significance level, the influence of an unbalanced distribution of data can be reduced. In addition, since divided areas are not generated more than necessary, the memory capacity used in the computer can be reduced as compared with the conventional pattern recognition method, and the learning speed of generating a classification model and the classification processing speed can be increased. Furthermore, when the user of the system refers to the learning state of data around data corresponding to the situation to be classified, and determines that the data for the generation of a classification model is insufficient, the user can withhold classification processing. This therefore decreases the possibility of classifying the situation as a situation belonging to a wrong class.
(2) As described in claim 3, an area containing data regarded as noise can be removed from learning areas by removing a learning area having no learning area among adjacent areas from a learning area group. The influence of noise can therefore be reduced.
(3) As described in claim 5, after a division number is obtained for each class on the basis of a statistical significance level, a division number common to all the classes is determined, and the step of associating data with each divided area, the step of expanding a learning area group, and the step of contracting the learning area group are performed for each class. With this operation, even if a plurality of classes are present, a classification model that approximates the space of each class can be easily generated.
(4) As described in claim 6, when e learning areas, of a total number N of learning areas recognized as areas belonging to a given class, are recognized as areas belonging to another class as well, e/N is set as an identification error ratio. With this operation, whether the variates defining the feature space are properly selected can be checked by using this identification error ratio. In addition, since the identification error ratio associated with the generated classification model can be obtained, the classification performance of the classification model can be known in advance.
(5) As described in claim 7, by recording the program on the recording medium, the computer can be made to execute the step of determining the division number m, the step of associating data with each divided area, the step of expanding the learning area group, and the step of contracting the learning area group.