1. Field of the Invention
The present invention relates to techniques for automatically classifying data. More specifically, the present invention relates to a method and apparatus that automatically separates different classes of data patterns in a data set by constructing data classifiers using R-functions.
2. Related Art
Many data-classification applications, such as system fault-identification applications and network attack detection applications, operate by dividing input data into more readily processable subsets. More specifically, such data processing applications typically employ classification-type pattern recognition mechanisms to divide the available data into two (and sometimes more) subsets.
One technique for input data classification utilizes “nearest neighbor classifiers,” which require storing all training patterns and their class labels in a dictionary. New patterns being processed are given the class labels of the most-similar stored patterns. This technique requires few computational resources during the training phase of the data classification. However, it requires significantly more computational resources during the retrieval phase, because every stored pattern must be compared against each new pattern. Furthermore, nearest-neighbor classifiers provide no analytical representation of the class-separation boundary. Note that such analytical representations are extremely desirable in many applications.
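The retrieval-phase cost described above can be seen in a minimal one-nearest-neighbor sketch (the distance metric and the toy two-class data below are illustrative choices, not part of any particular prior-art system):

```python
import math

def nn_classify(train_patterns, train_labels, pattern):
    """Return the label of the stored training pattern closest to `pattern`.

    The entire training set acts as the "dictionary": training is trivial
    (just store the patterns), but retrieval scans every stored pattern,
    which is the retrieval-phase cost noted in the text.
    """
    best_label, best_dist = None, float("inf")
    for x, label in zip(train_patterns, train_labels):
        dist = math.dist(x, pattern)  # Euclidean distance between points
        if dist < best_dist:
            best_dist, best_label = dist, label
    return best_label

# Two toy classes of points in the plane
train = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9)]
labels = ["A", "A", "B", "B"]
print(nn_classify(train, labels, (0.3, 0.1)))  # → A
print(nn_classify(train, labels, (4.8, 5.1)))  # → B
```

Note that the classifier outputs only a label: there is no formula for the boundary between classes A and B, which is the missing analytical representation referred to above.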
The “Bayes classifier” technique uses the Bayes likelihood ratio test to decide whether a given pattern belongs to a given class in two-class problems. However, this technique typically requires knowledge of the conditional probability density functions for each class, which must be estimated using a finite number of training patterns. Unfortunately, these estimation procedures can be extremely complex and may require a large number of training patterns to obtain accurate results, thereby consuming a significant amount of computational time and storage resources.
Another pattern-classification mechanism, known as an “artificial neural network” (ANN), requires a significant amount of training (on input training data) to allow the trained network to perform classification on actual data within a predetermined error tolerance. Unfortunately, ANNs are known to have a number of problems, including: (1) long training times; (2) a tendency toward data overfitting; and (3) inconsistent results, caused in part by stochastic optimization of the weights and by difficulties associated with implementing regularization procedures.
The “support vector machine” (SVM) pattern-classification technique uses a set of patterns, represented by vectors in n-dimensional space, as input data, and classifies these input patterns with nonlinear decision surfaces. During pattern classification, the input patterns first undergo a nonlinear transformation into a new, so-called “feature space,” using a convolution of dot products, so that the classes can be linearly separated by a set of hyperplanes (surfaces separating classes of data) in the transformed space. Next, the SVM determines an optimal hyperplane from the set of hyperplanes that separate the classes. Unfortunately, the training phase of the SVM requires significant computational resources, owing to the complexity of numerically solving the quadratic-programming optimization problem associated with finding the weighting parameters. This limits the applicability of the SVM technique to large tasks. Furthermore, SVM-based classifiers have no pattern-rejection option, which is desirable in many applications where patterns not similar to either class need to be rejected.
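At retrieval time an SVM classifies a pattern via a kernel expansion over its support vectors; the sketch below illustrates only this decision step. The support vectors, weights, and bias shown are illustrative placeholders: in a real SVM they would come from solving the quadratic-programming problem whose cost is the training-phase bottleneck noted above.

```python
import math

def rbf_kernel(x, z, gamma=1.0):
    """Gaussian (RBF) kernel. The nonlinear map into "feature space" is never
    computed explicitly; only kernel (generalized dot-product) values are."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def svm_decision(x, support_vectors, labels, alphas, b, gamma=1.0):
    """Sign of f(x) = sum_i alpha_i * y_i * K(x_i, x) + b.

    The alphas and b are the weighting parameters that solving the
    quadratic program would produce; placeholder values are used here."""
    f = sum(a * y * rbf_kernel(sv, x, gamma)
            for sv, y, a in zip(support_vectors, labels, alphas)) + b
    return 1 if f >= 0 else -1

# Illustrative (untrained) support vectors, labels, and weights
svs = [(0.0, 0.0), (4.0, 4.0)]
ys = [1, -1]
alphas = [1.0, 1.0]
print(svm_decision((0.5, 0.5), svs, ys, alphas, 0.0))  # → 1
print(svm_decision((3.5, 3.5), svs, ys, alphas, 0.0))  # → -1
```

Note that the sign function always assigns a pattern to one of the two classes, even when f(x) is near zero; this is the absence of a rejection option mentioned above.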
In summary, the above techniques and other ad-hoc techniques all suffer from one or more of the following deficiencies: (1) prohibitively long training and retrieval phases; (2) lack of an analytical representation of the class-separation boundary; (3) inconsistencies in results due to either numerical or stochastic optimization of weights; (4) overwhelming complexity of parameter-estimation procedures, which require a large number of patterns to achieve accurate results; and (5) the inability to implement weak and strong rejection options for rejecting patterns not similar to either class.
Hence, what is needed is a method and an apparatus for automatically classifying input data patterns into separate classes without the above-described problems.