Pattern recognition systems have been designed and built for applications ranging from character recognition, target detection, medical diagnosis, analysis of biomedical signals and images, remote sensing, and identification of human faces and fingerprints, to reliability, socio-economics, archaeology, speech recognition, machine part recognition, and industrial applications such as automatic inspection of products, for example semiconductor chips, for defects.
As is well known in the art, pattern recognition often begins with some kind of preprocessing to remove noise and redundancy in measurements taken from physical or mental processes, to ensure an effective and efficient pattern description. Next, a set of characteristic measurements, numerical and/or non-numerical, and the relations among these measurements are extracted to represent patterns. The patterns are then analyzed (classified and/or described) on the basis of the representation.
The process of pattern recognition involves analyzing pattern characteristics as well as designing recognition systems. Many mathematical methods have been offered for solving pattern recognition problems, but all are primarily either decision-theoretic (statistical) or syntactic (structural). In the decision-theoretic approach, N features or an N-dimensional feature vector represents a pattern, and decision making (and structural analysis) is based on a similarity measure that, in turn, is expressed in terms of a distance measure, a likelihood function or a discriminant function. In the syntactic approach, a pattern is represented as a string, a tree or a graph of pattern primitives and their relations. The syntactic approach draws an analogy between the structure of patterns and the syntax of a language, and the decision making (and/or structural analysis) is in general a parsing procedure.
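By way of illustration only, the decision-theoretic approach described above can be sketched with a minimum-distance classifier, in which each pattern is a feature vector and the similarity measure is the Euclidean distance to a class mean. The function names, the two classes "A" and "B", and the sample values are hypothetical and are not taken from any particular prior art system.

```python
import math

def class_means(samples):
    """samples: dict mapping a class label to a list of N-dimensional feature vectors."""
    means = {}
    for label, vectors in samples.items():
        n = len(vectors)
        dim = len(vectors[0])
        means[label] = [sum(v[k] for v in vectors) / n for k in range(dim)]
    return means

def classify(x, means):
    """Assign x to the class whose mean is nearest in Euclidean distance."""
    return min(means, key=lambda label: math.dist(x, means[label]))

# Hypothetical two-feature training samples for two classes.
training = {
    "A": [[1.0, 1.0], [1.5, 0.5], [0.5, 1.5]],
    "B": [[4.0, 4.0], [4.5, 3.5], [3.5, 4.5]],
}
means = class_means(training)
label = classify([1.2, 0.9], means)  # nearest mean is that of class "A"
```

A likelihood function or another discriminant function could be substituted for the distance measure without changing the overall structure of the sketch.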
Typically, pattern recognition systems utilize a “training” phase and a “recognition” phase. During the training phase, information representative of distinctions among pattern (e.g. character) types is gathered from a set of training samples of patterns whose proper classification is known (i.e. training set) and stored in a computer's memory in the form of numerical parameters and data structures. During this phase, the selected pattern recognition system is tuned to the training set.
It is a characteristic of all pattern recognition systems that a decision rule must be determined. When the rule is then applied to the members of the training set, it should allow perfect classification of all members. In the prior art, a single pattern recognition (classifier) component is normally trained to perform the entire classification for the entire universe of patterns to be recognized. To improve classification accuracy, one system utilizes a hierarchy of recognition components trained to perform the entire classification of all patterns. This system is disclosed in U.S. Pat. No. 4,975,975 issued on Dec. 4, 1990. While this system improved accuracy, it still requires repeating the entire training procedure for each of the recognition components until all of the components required to properly classify all members determined by the first recognition component as being in the class have been trained. Moreover, the system is not designed to perform error-free decision making.
Typically applying pattern recognition to real problems involves these steps:
1. taking data from samples or objects of the various classes, interferants, etc. and forming features therefrom, the data may be of virtually any parameter,
2. selecting one or more discriminants, where a discriminant is a function of the data and/or features,
3. training the discriminants by plotting the measured data in discriminant space, and
4. applying a decision rule by selecting decision boundaries from step 3 and using the discriminants and decision boundaries for classifying a new sample from new data taken therefrom.
The combination of those four steps determines how well the pattern recognizer works, so improvements in one can be used to either improve overall performance or achieve similar performance with one or more of the unimproved steps actually degraded.
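The four steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the features (mean and range of a short signal), the unoptimized weights, and the midpoint threshold are all hypothetical choices made for the example, not a description of any particular system.

```python
# Step 1: form features from raw measurements (hypothetical: mean and range).
def features(signal):
    return [sum(signal) / len(signal), max(signal) - min(signal)]

# Step 2: select a discriminant, here a linear function of the features.
def discriminant(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

# Step 3: "train" by evaluating the discriminant on labeled samples and
# choosing a decision boundary midway between the two classes.
def train_threshold(w, class_a, class_b):
    da = [discriminant(w, features(s)) for s in class_a]
    db = [discriminant(w, features(s)) for s in class_b]
    if max(da) >= min(db):
        raise ValueError("classes overlap along this discriminant")
    return (max(da) + min(db)) / 2.0

# Step 4: apply the decision rule to a new sample.
def decide(w, threshold, signal):
    return "B" if discriminant(w, features(signal)) > threshold else "A"

# Hypothetical labeled measurements.
class_a = [[1, 2, 1], [2, 2, 3], [1, 1, 2]]
class_b = [[5, 9, 6], [7, 8, 9], [6, 9, 5]]
w = [1.0, 1.0]  # illustrative weights, not optimized
threshold = train_threshold(w, class_a, class_b)
new_label = decide(w, threshold, [8, 9, 6])
```

Because the steps are chained, a better feature set in step 1 could, for example, tolerate a cruder threshold in step 3 while giving the same classification of the samples.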
One well-known method is Principal Component Analysis (PCA), which combines steps 1, 2 and 3 above. In this review, attention is drawn to linear discriminants d1, d2. Using PCA, an optimum linear discriminant for the data may be computed from the data vector x using a weight vector w1, according to

$$d_1 = w_1^{T} x. \qquad (1)$$
A new data point is subjected to that same inner product with w1 to give a number that is called the “first principal component.”
Next, a second linear discriminant weight vector w2 is selected, subject to the constraint that

$$w_1^{T} w_2 = 0. \qquad (2)$$
Usual embodiments make

$$w_i^{T} w_i = 1 \quad \text{for all } i. \qquad (3)$$
The general relationship becomes

$$w_i^{T} w_j = \delta_{ij},$$

where δij equals 1 when i = j and 0 otherwise, and the superscript T denotes the transposition of rows and columns.
This orthonormality condition assures that the information used in the second discriminant is orthogonal to that in the first, etc. A logical limitation of the PCA approach is that once a discriminant has correctly identified some of the items, it is unnecessary to apply a second discriminant to those items.
The data are plotted in d space and good decision boundaries are found. Because each discriminant uses all of the data optimally, subject to the constraint that each must add totally new information, PCA is widely considered to be as good as can possibly be done using linear discriminants. However, this method is complex in practice and does not guarantee that the training set is classified error free.
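As a minimal sketch of equations (1) to (3), the following computes two weight vectors for two-dimensional data and forms the discriminants d1 and d2. It assumes a closed-form solution that is valid only for two features (the principal axis angle of the 2x2 covariance matrix); the helper names and data values are hypothetical. Equation (1) is applied literally, without subtracting the mean from x, although centering the data first is common practice.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def variance(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def principal_axes_2d(points):
    """Return unit weight vectors (w1, w2) for two-feature data.

    Assumption: exactly two features, so the principal axis angle can be
    taken in closed form from the covariance terms.
    """
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    w1 = [math.cos(theta), math.sin(theta)]   # direction of largest variance
    w2 = [-math.sin(theta), math.cos(theta)]  # perpendicular to w1
    return w1, w2

# Hypothetical data lying roughly along a diagonal.
data = [[2.0, 1.9], [0.5, 0.7], [3.1, 3.0], [1.2, 1.0], [4.0, 4.2]]
w1, w2 = principal_axes_2d(data)
d1 = [dot(w1, x) for x in data]  # equation (1): first principal component
d2 = [dot(w2, x) for x in data]  # second discriminant
```

In this sketch the products dot(w1, w2), dot(w1, w1) and dot(w2, w2) correspond to conditions (2) and (3) respectively, and most of the spread of the data appears in d1.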
Other prior art classifiers use a nearest neighbor method, well known in the art, where it is possible to classify all members of a training set error free. But as training sets become large this approach becomes complex and cumbersome. This approach has another limitation in that the discriminants used are not the minimum necessary to classify all the members of the training set.
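A nearest neighbor classifier of the kind referred to above can be sketched as follows. The training pairs are hypothetical, and the sketch assumes that no two training samples with different labels share the same feature vector; under that assumption each training sample is its own nearest neighbor, so the training set is classified without error.

```python
import math

def nearest_neighbor(x, training):
    """training: list of (feature_vector, label) pairs.

    Every query scans all stored samples, which is why the cost grows
    with the size of the training set.
    """
    _, label = min(training, key=lambda item: math.dist(x, item[0]))
    return label

# Hypothetical labeled training set.
training = [
    ([0.0, 0.0], "A"),
    ([1.0, 0.2], "A"),
    ([0.4, 0.9], "A"),
    ([0.9, 1.1], "B"),
    ([2.0, 2.0], "B"),
]

# Count training samples that the rule misclassifies.
errors = sum(1 for x, y in training if nearest_neighbor(x, training) != y)
```

Note that the rule stores and consults every training sample rather than a small set of discriminants, which illustrates the limitation stated above.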
Fourier filtering is of interest both within optics and within electronics, because it allows targets to be recognized and located in parallel. This is referred to as shift invariance. However, a limitation of this filter is that it is applicable only to problems solvable with a linear discriminant and is therefore inadequate for most real applications. So, if a distribution of objects in different classes is not linearly discriminable, Fourier filters are not used. But with the advances in and availability of fast electronic chips and fast optical methods for Fourier transforms, this technique becomes very attractive if the above limitation is overcome.
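The shift invariance mentioned above can be illustrated with a one-dimensional correlation filter computed in the Fourier domain. This is a sketch only: it assumes a direct (slow) discrete Fourier transform written out for clarity, circular correlation, and hypothetical template and scene values. A practical system would use a fast Fourier transform or an optical correlator.

```python
import cmath

def dft(x, inverse=False):
    """Direct discrete Fourier transform, O(n^2); for illustration only."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        out.append(s / n if inverse else s)
    return out

def correlate(signal, template):
    """Circular cross-correlation via the Fourier domain."""
    padded = list(template) + [0.0] * (len(signal) - len(template))
    spectrum_s = dft(signal)
    spectrum_t = dft(padded)
    product = [s * t.conjugate() for s, t in zip(spectrum_s, spectrum_t)]
    return [c.real for c in dft(product, inverse=True)]

def locate(signal, template):
    """Return the shift at which the correlation output peaks."""
    corr = correlate(signal, template)
    return max(range(len(corr)), key=corr.__getitem__)

# Hypothetical target placed at two different positions in a scene.
template = [1.0, 3.0, 2.0]
scene_a = [0.0] * 16
scene_a[4:7] = template
scene_b = [0.0] * 16
scene_b[9:12] = template

position_a = locate(scene_a, template)
position_b = locate(scene_b, template)
```

The same filter is applied to both scenes without modification; only the location of the correlation peak changes with the position of the target.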
For virtually all the known pattern recognition systems and techniques described above, it should be noted that not all choices of the discriminant and threshold are of equal usefulness even in the linearly separable cases. If the separation in the discriminant space of the items in the one class from the remaining items is small, clearly, the sensitivity to small perturbations is far greater than in a case when the separation is large. Stated more conventionally, one choice leads to greater robustness or generalizability than the other.
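The point about separation can be made concrete with a small sketch that compares two linear discriminants, each of which separates the same two classes. The weight vectors and sample values are hypothetical; the weights are normalized to unit length so that the two separations are measured on the same scale.

```python
import math

def unit(w):
    norm = math.sqrt(sum(c * c for c in w))
    return [c / norm for c in w]

def separation(w, class_a, class_b):
    """Gap in discriminant space between class A (low side) and class B (high side).

    A positive value means the classes are separable along w; a larger
    value leaves more room for perturbed or new samples.
    """
    w = unit(w)
    da = [sum(wi * xi for wi, xi in zip(w, x)) for x in class_a]
    db = [sum(wi * xi for wi, xi in zip(w, x)) for x in class_b]
    return min(db) - max(da)

# Hypothetical linearly separable classes.
class_a = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
class_b = [[3.0, 3.0], [4.0, 3.0], [3.0, 4.0]]

wide = separation([1.0, 1.0], class_a, class_b)     # large separation
narrow = separation([1.0, -0.4], class_a, class_b)  # separable, but barely
```

Both choices classify these samples correctly with a suitable threshold, but a small perturbation of a sample is far more likely to cross the boundary under the second choice.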
As referenced above, there are many installed pattern recognition systems in a wide variety of applications. Most of these applications have compiled large amounts of data, derived many useful features and applied many different algorithms that produce some level of satisfaction. But these systems produce errors when analyzing new data, and improved performance is desirable. However, improvements that require significant investments in time, people and money are often not available to the users. There is a continuing general need in the field to develop techniques that supplement these installed applications, making use of the developed data, features, algorithms and techniques while improving the performance thereof.
It is an object of the present invention to provide a pattern classifier that can be used to supplement other pattern recognition systems thereby improving performance.
It is a further object of the present invention to provide a training method for improving performance of existing pattern recognition systems.
It is an object of the present invention to classify error free all members of a training set.
Another object of the present invention is to provide a minimum number of discriminants for error free classification of the training set.
It is still yet another object of the present invention to provide fuzzy (as well as crisp) pattern classifiers.
It is an object of the present invention to design and apply Fourier filters to linearly and nonlinearly discriminable problems.
It is yet another object of the present invention to make the system as robust as possible relative to new samples not in the training set by providing significant margins for use on new data items.