1. Field of the Invention
This disclosure relates to automated data classifiers. In particular, this disclosure relates to an apparatus and method for performing two-group classification of input data in automated data processing applications.
2. Description of the Related Art
Automated systems for data classifying applications such as pattern identification and optical character recognition process sets of input data by dividing the input data into more readily processible subsets. Such data processing employs at least two-group classifications; i.e., the classifying of input data into two subsets.
As known in the art, some learning systems such as artificial neural networks (ANN) require training from input training data to allow the trained learning systems to perform on empirical data within a predetermined error tolerance. In one example, as described in Y. Le Cun et al., "Handwritten Digit Recognition with a Back-propagation Network" (D. Touretzky, Ed.), ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS, Volume 2, Morgan Kaufmann, 1990, a five-layer back-propagation neural network is applied to handwritten digit recognition on a U.S. Postal Service database of 16×16 pixel bit-mapped digits containing 7300 training patterns and 2000 test patterns recorded from actual mail.
One classification method known in the art is the Optimal Margin Classifier (OMC) procedure, described in B. E. Boser, I. Guyon, and V. N. Vapnik, A Training Algorithm for Optimal Margin Classifiers, PROCEEDINGS OF THE FOURTH WORKSHOP OF COMPUTATIONAL LEARNING THEORY, Vol. 4, Morgan Kaufmann, San Mateo, Calif. 1992. An application of the OMC method is described in commonly assigned U.S. patent application No. 08/097,785, filed Jul. 27, 1993, and entitled AN OPTIMAL MARGIN MEMORY-BASED DECISION SYSTEM, which is incorporated herein by reference.
Generally, using a set of vectors in n-dimensional space as input data, the OMC classifies the input data with non-linear decision surfaces, where the input patterns undergo a non-linear transformation to a new space using convolution of dot products for linear separation by optimal hyperplanes in the transformed space, such as shown in FIG. 1 for two-dimensional vectors in classes indicated by X's and O's. In this disclosure the term "hyperplane" means an n-dimensional surface and includes 1-dimensional and 2-dimensional surfaces; i.e. points and lines, respectively, separating classes of data in higher dimensions. In FIG. 1, the classes of data vectors may be separated by a number of hyperplanes 2, 4. The OMC determines an optimal hyperplane 6 separating the classes.
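The idea that several hyperplanes may separate the same two classes, with one preferred for its margin, can be illustrated with a minimal sketch. It is not the OMC training procedure itself: it only scores a few hand-picked candidate lines by their smallest distance to any training vector and keeps the best of those candidates. The toy data, the candidate lines, and the helper name `margin` are all assumptions made for illustration.

```python
import math


def margin(w, b, points, labels):
    """Smallest signed distance from any point to the line w.x + b = 0.

    A negative result means at least one point lies on the wrong side,
    i.e. the line does not separate the two classes.
    """
    norm = math.hypot(w[0], w[1])
    return min(
        y * (w[0] * x[0] + w[1] * x[1] + b) / norm
        for x, y in zip(points, labels)
    )


# Hypothetical toy data: class +1 (the X's) and class -1 (the O's).
points = [(2.0, 2.0), (3.0, 3.0), (0.0, 0.0), (1.0, 0.0)]
labels = [1, 1, -1, -1]

# Hypothetical candidate separating lines, each given as (w, b).
candidates = [
    ((1.0, 0.0), -1.5),  # vertical line x = 1.5
    ((1.0, 1.0), -2.5),  # diagonal line x + y = 2.5
    ((0.0, 1.0), -1.0),  # horizontal line y = 1
]

# Keep the candidate whose closest training vector is farthest away.
best = max(candidates, key=lambda c: margin(c[0], c[1], points, labels))
print(best)
```

All three candidate lines separate the toy classes, but they leave different margins; a true optimal margin procedure would search over all hyperplanes rather than a fixed list.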
In situations having original training patterns or dot-product-transformed training patterns which are not linearly separable, learning systems trained therefrom may address the inseparability by increasing the number of free parameters, which introduces potential over-fitting of data. Alternatively, inseparability may be addressed by pruning from consideration the training patterns obstructing separability, as described in V. N. Vapnik, ESTIMATION OF DEPENDENCIES BASED ON EMPIRICAL DATA, New York: Springer-Verlag, pp. 355-369, 1982, followed by a restart of the training process on the pruned set of training patterns. The pruning involves a local decision with respect to a decision surface to locate and remove the obstructing data such as erroneous, outlying, or difficult training patterns.
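The prune-and-restart cycle described above can be sketched on a deliberately simple learner. The example below is an assumption-laden toy, not the procedure of the cited reference: it uses a one-dimensional threshold rule, treats every misclassified training pattern as "obstructing", removes those patterns, and restarts training on the pruned set. The function names `train_threshold` and `prune_and_restart` and the data are hypothetical.

```python
def train_threshold(xs, ys):
    """Fit the rule 'predict +1 if x > t, else -1' by exhaustive search.

    Returns the threshold t with the fewest training errors and the
    indices of the patterns it misclassifies.
    """
    values = sorted(set(xs))
    # Candidate thresholds: outside the data and midway between neighbours.
    cands = (
        [values[0] - 1.0]
        + [(a + b) / 2 for a, b in zip(values, values[1:])]
        + [values[-1] + 1.0]
    )

    def errors(t):
        return [
            i for i, (x, y) in enumerate(zip(xs, ys))
            if (1 if x > t else -1) != y
        ]

    best = min(cands, key=lambda t: len(errors(t)))
    return best, errors(best)


def prune_and_restart(xs, ys):
    """Train, prune misclassified patterns, and restart until separable."""
    xs, ys = list(xs), list(ys)
    restarts = 0
    while True:
        t, wrong = train_threshold(xs, ys)
        if not wrong:
            return t, xs, ys, restarts
        # Local decision: drop the patterns on the wrong side of the
        # current decision surface, then restart training from scratch.
        keep = [i for i in range(len(xs)) if i not in wrong]
        xs = [xs[i] for i in keep]
        ys = [ys[i] for i in keep]
        restarts += 1


# Hypothetical data: one +1 pattern (x = 1.5) sits among the -1 patterns.
xs = [0.0, 1.0, 2.0, 3.0, 5.0, 6.0, 7.0, 1.5]
ys = [-1, -1, -1, -1, 1, 1, 1, 1]

threshold, kept_xs, kept_ys, restarts = prune_and_restart(xs, ys)
print(threshold, restarts, kept_xs)
```

Each pruning step costs a full retraining pass, which is the restart overhead the passage refers to.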
It is preferable to absorb such separation-obstructing training patterns within soft margins between classes and to classify training patterns that may not be linearly separable using a global approach in locating such separation-obstructing training patterns. It is also advantageous to implement a training method which avoids restarting the training process whenever difficult training patterns are detected for pruning.