1. Field of the Invention
The present invention relates to a system and method for self-training a neural network classifier with automated outlier detection. More particularly, the present invention relates to a chemical sensor pattern recognition system and method for detecting and identifying the presence of chemical agents using a self-training neural network classifier employing a probabilistic neural network with a built-in outlier rejection algorithm and a mechanism to reduce the size of the probabilistic neural network.
2. Description of the Related Art
In industrial and military environments, a need has existed for a mechanism to identify a wide variety of chemical substances on a real-time basis. These substances often include compounds which are extremely dangerous. In the industrial environment these substances may include known carcinogens and other toxins. In the military environment these substances would include blistering agents such as mustard gas and neurotoxins such as nerve gas. Therefore, it is critical for the safety of personnel to quickly and accurately detect such substances and alert employees and troops when they are present. Just as critical a function is the avoidance of issuing false alarms by any chemical detection apparatus.
FIG. 1 is a diagram showing a configuration of a chemical detection apparatus known in the prior art which includes a sensor 10 and a pattern recognition unit 20. The pattern recognition unit 20 would include a computer system and software to analyze data received from the sensor 10 in order to identify the substance detected.
Referring to FIG. 1, traditional chemical detection methods have relied on the inherent selectivity of the sensor 10 to provide the pattern recognition unit 20 with the necessary information required to determine the presence or absence of target analytes. Advancements in chemical sensor technology have allowed the chemical detection apparatus shown in FIG. 1 to move from the laboratory to the field.
However, field measurements offer additional challenges not seen in laboratory or controlled environments. The detection of target analytes may be required in the presence of large concentrations of interfering species. The ideal chemical sensor 10 responds only to the targeted analyte(s). However, many sensor technologies, such as polymer-coated surface acoustic wave (SAW) chemical sensors, cannot achieve this measure of selectivity. Researchers have been able to overcome this potential drawback by utilizing arrays of partially selective sensors for sensor 10. Pattern recognition algorithms, in the pattern recognition unit 20, are then employed to interpret the sensor signals and provide an automated decision concerning the presence or absence of the targeted analyte(s). This approach has been employed successfully for semiconducting metal oxide gas sensors, Taguchi gas sensors, MOSFET sensors, electrochemical sensors, and polymer-coated SAWs for the analysis of both liquid and gas phase species.
The underlying foundation for applying pattern recognition methods to chemical sensor arrays 10 is that the sensor signals numerically encode chemical information (i.e., a chemical signature) about the target analytes and the interfering species. In addition, pattern recognition methods assume that sufficient differences in the chemical signatures for the target analyte(s) and the interfering species exist for the methods to exploit, and that the differences remain consistent over time. For chemical sensor array pattern recognition, the responses of the number of sensors (represented by m) in the array form an m-dimensional vector ("vector pattern") in the data space. Recognition of the signature of the target compound(s) (analyte(s)) is based on the clustering of the patterns in the m-dimensional space. Analytes that have similar chemical features will cluster near each other in the data space, which allows them to be distinguished from other compounds mathematically.
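By way of illustration, the clustering concept above may be sketched as follows. The sketch is illustrative only: the sensor responses, analyte labels, and nearest-centroid decision rule are hypothetical and are not part of the claimed invention.

```python
import math

# Hypothetical three-sensor responses: each analyte's training exposures
# form a cluster of 3-dimensional vector patterns in the data space.
training = {
    "A": [(0.9, 0.1, 0.2), (1.0, 0.2, 0.1), (0.8, 0.1, 0.1)],
    "B": [(0.1, 0.9, 0.8), (0.2, 1.0, 0.9), (0.1, 0.8, 0.9)],
    "C": [(0.5, 0.5, 0.1), (0.6, 0.4, 0.2), (0.5, 0.6, 0.1)],
}

def centroid(patterns):
    # Mean of the cluster's patterns along each of the m dimensions.
    m = len(patterns[0])
    return tuple(sum(p[i] for p in patterns) / len(patterns) for i in range(m))

def classify_nearest(pattern):
    # Assign a new pattern to the analyte whose cluster centroid is closest
    # (Euclidean distance in the m-dimensional pattern space).
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(training, key=lambda c: dist(pattern, centroid(training[c])))

label = classify_nearest((0.95, 0.15, 0.15))  # falls within cluster A
```

A nearest-centroid rule exploits exactly the clustering assumption stated above: it fails when an interfering species overlaps the target cluster, which motivates the probabilistic methods discussed later.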
FIG. 2 is a diagram showing a pattern space comprising a sensor array with three sensors (1, 2, 3) and three chemical analytes (A, B, C). Since three sensors are used, the data space is a three dimensional data space. The three chemical analytes (A, B, C) form three different and easily distinguishable clusters of patterns (chemical signatures) in the three dimensional space. However, when attempting to detect chemicals in an environment outside the laboratory, chemicals that closely match the chemical to be identified are frequently present. Such a closely related chemical is referred to as an interfering species and creates a pattern which partly overlaps the cluster of the chemical to be detected.
In supervised pattern recognition methods, training patterns (i.e., chemical signatures) from known analytes and potential interfering species representative of the environment in which the sensors are being deployed are used to develop classification rules in the pattern recognition unit 20. These classification rules are used to predict the classification of future sensor array data. The training patterns are obtained by exposing the sensor array to both the target analyte(s) and potential interfering analytes under a wide variety of conditions (e.g., varying concentrations and environments). The potential outcomes of the measurement (e.g., the presence or absence of the target analyte(s)) are considered the data classes. The number of data classes is application specific.
Supervised pattern recognition algorithms used in pattern recognition unit 20 are known in the art and used to analyze chemical sensor 10 array data. The two most popular pattern recognition approaches are linear discriminant analysis (LDA) and artificial neural networks (ANN). LDA is computationally simpler and easier to train than an ANN, but has trouble with multi-modal and overlapping class distributions. ANNs have become the de facto standard for chemical sensor pattern recognition due to the increasing power of personal computers and their inherent advantages in modeling complex data spaces.
The typical ANN for chemical sensor array pattern recognition uses the back-propagation (BP) method for learning the classification rules. The conventional ANN comprises an input layer, one or two hidden layers, and an output layer of neurons. A neuron is simply a processing unit that outputs a linear or nonlinear transformation of a weighted sum of its inputs. For chemical sensor arrays, the neurons, as a group, serve to map the input pattern vectors to the desired outputs (data class). Using BP, the weights and biases associated with the neurons are modified to minimize the mapping error (i.e., the training set classification error). Upon repeated presentation of the training patterns to the ANN, the weights and biases of the neurons become stable and the ANN is said to be trained. The weights and biases for the neurons are then downloaded to the chemical sensor system for use in predicting the data classification of new sensor signals.
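The forward pass of such a conventional ANN may be sketched as follows. The weights, biases, and sigmoid transfer function below are illustrative assumptions; in practice the BP algorithm would adjust the weights and biases iteratively, as described above.

```python
import math

def neuron(inputs, weights, bias):
    # A neuron outputs a nonlinear transformation (here, a sigmoid)
    # of the weighted sum of its inputs plus a bias.
    s = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-s))

def forward(pattern, hidden, output):
    # Propagate a 3-sensor pattern through one hidden layer,
    # then through a single output neuron.
    h = [neuron(pattern, w, b) for (w, b) in hidden]
    return neuron(h, output[0], output[1])

# Illustrative (untrained) weights and biases for two hidden neurons
# and one output neuron.
hidden = [([1.0, -1.0, 0.5], 0.0), ([-0.5, 1.0, 1.0], 0.1)]
output = ([1.5, -1.5], 0.0)
p = forward([0.9, 0.1, 0.2], hidden, output)  # a value between 0 and 1
```

Note that this raw output is not a calibrated probability, which is one of the shortcomings of the BP-ANN discussed below.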
Despite its popularity, the BP-ANN methodology has at least five major disadvantages when applied to chemical sensor arrays.
First, no known method exists for determining the optimal number of hidden layers and hidden layer neurons (i.e., the neural topology). This results in having to train many ANNs before finding one that is best for the application at hand.
Second, the iterative BP training algorithm is extremely slow, sometimes requiring several thousand presentations of the training patterns before convergence occurs. Other ANN training methods, such as the Levenberg-Marquardt and QuickProp methods, claim to achieve faster convergence, but their use is not widespread. Also, any learning algorithm based on incremental modifications to the weights and biases of the neurons runs the risk of falling prey to local minima, thereby requiring multiple training runs which further slow the process.
Third, the theoretical operation of how the ANN is able to map the inputs to the outputs is not clearly understood. There is no simple method of interrogating an ANN to discover why it classifies patterns correctly, or more importantly, why it fails to classify some patterns correctly.
Fourth, the outputs from a conventional ANN do not feature a statistical measure of certainty. For critical applications using chemical sensor arrays, the pattern recognition algorithm needs to produce some measure of confidence that it has correctly identified a particular classification. It is possible to obtain a confidence measurement by defining a probability density function comprising all possible outcomes of the ANN, but this method requires a large number of training patterns to be statistically valid and this further slows the training process.
Fifth, existing ANNs are unable to reject ambiguous or unfamiliar patterns (e.g., a compound that the ANN has not been trained to recognize), and thus misclassify them as members of a data class with which the network is familiar. This often limits the applications of ANNs to controlled environments where all possible data classes are known in advance. Methods have been developed to overcome this problem by employing an ad hoc threshold to decide whether to accept or reject a new pattern. Another approach used to overcome this problem employs a dual ANN system where the first ANN decides whether to accept or reject the pattern and the second performs the actual classification. However, these solutions have not proven practical for application to chemical sensor arrays.
One variety of ANN which has been studied for application to chemical sensor array pattern recognition is the probabilistic neural network (PNN). For application to sensor arrays, PNNs overcome many of the disadvantages found with their more conventional counterparts discussed above. The PNN operates by defining a probability density function (PDF) for each data class based on the training data set and the optimized kernel width (σ) parameter. Each PDF is estimated by placing a Gaussian shaped kernel at the location of each pattern in the training set. A multivariate estimate of the underlying PDF for each class can be expressed as the sum of the individual Gaussian kernels. The PDF defines the boundaries (i.e., the classification rules) for each data class. The optimized kernel width (σ) parameter determines the amount of interpolation that occurs between data classes that lie near each other in the data space. For classifying new patterns, the PDF is used to estimate the probability that the new pattern belongs to each data class.
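The Gaussian-kernel PDF estimate described above may be sketched as follows. This is a minimal sketch of the standard Parzen-window estimate underlying the PNN; the training patterns, kernel width, and normalization constant are illustrative assumptions rather than the invention's actual values.

```python
import math

def class_pdf(x, training_patterns, sigma):
    # Estimate a data class's PDF at point x as the (normalized) sum of
    # Gaussian kernels, one centered on each training pattern of that class.
    m = len(x)  # dimensionality = number of sensors in the array
    norm = 1.0 / ((2 * math.pi) ** (m / 2) * sigma ** m * len(training_patterns))
    total = 0.0
    for p in training_patterns:
        d2 = sum((a - b) ** 2 for a, b in zip(x, p))
        total += math.exp(-d2 / (2 * sigma ** 2))
    return norm * total

# Hypothetical two-sensor training patterns for one class:
patterns_a = [(0.9, 0.1), (1.0, 0.2)]
near = class_pdf((0.9, 0.1), patterns_a, sigma=0.3)  # high density
far = class_pdf((0.0, 0.9), patterns_a, sigma=0.3)   # low density
```

The kernel width σ controls how far each training pattern's influence extends, i.e., the interpolation between nearby classes noted above.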
FIG. 3 is a diagram showing the topology of a probabilistic neural network (PNN) in which the inputs are the responses from the three-sensor array of FIG. 2, and the PNN outputs are the predicted classification for the given pattern in the prior art.
Referring to FIG. 3, PNN training is accomplished by simply copying each pattern in a training set (input layer) 30 to hidden layer neurons 40 and optimizing σ for each neuron. Cross-validation and univariate optimization are most commonly employed to choose the best σ for each neuron. The hidden layer neurons 40 and σ values can then be downloaded to the chemical sensor array system for use in the field. The classification of new patterns is performed by propagating the pattern vector through the PNN. The input layer 30 is used to store the new pattern while it is serially passed through the hidden layer neurons 40. At each neuron in the hidden layer 40, the distance (either dot product or Euclidean distance) is computed between the new pattern and the input layer 30 pattern stored in that particular hidden neuron 40. The distance, d, is processed through a nonlinear transfer function, as shown in equation (1):

output = exp(−d²/(2σ²))   (1)
A summation layer 50 comprises one neuron for each data class 70 and sums the outputs from all hidden neurons of each respective data class 70. The outputs of the summation layer 50 are forwarded to an output layer 60 (one neuron for each data class), where the estimated probability of the new pattern being a member of that data class 70 is computed. In the case of this example, since three chemical analytes (A, B, and C in FIG. 2) were supplied to the chemical detection apparatus (shown in FIG. 1), three different data classes 70 resulted from the PNN of FIG. 3.
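The complete hidden-layer, summation-layer, and output-layer flow may be sketched as follows. The sketch assumes Euclidean distance with the Gaussian transfer function exp(−d²/(2σ²)); the training patterns and σ value are hypothetical.

```python
import math

def pnn_classify(x, training, sigma):
    # training: {class_label: [pattern, ...]}. Each stored training pattern
    # acts as one hidden neuron; its activation for input x is the Gaussian
    # transfer function of the Euclidean distance to the stored pattern.
    sums = {}
    for label, patterns in training.items():
        # Summation layer: one neuron per class averages its hidden neurons.
        s = 0.0
        for p in patterns:
            d2 = sum((a - b) ** 2 for a, b in zip(x, p))
            s += math.exp(-d2 / (2 * sigma ** 2))
        sums[label] = s / len(patterns)
    # Output layer: normalize the class sums into estimated membership
    # probabilities for the new pattern.
    total = sum(sums.values())
    return {label: s / total for label, s in sums.items()}

# Hypothetical two-class training set from a three-sensor array:
training = {
    "A": [(0.9, 0.1, 0.2), (1.0, 0.2, 0.1)],
    "B": [(0.1, 0.9, 0.8), (0.2, 1.0, 0.9)],
}
probs = pnn_classify((0.95, 0.15, 0.15), training, sigma=0.3)
```

Note that "training" here is merely storing the patterns, which is why PNN training is so fast; the cost is that every stored pattern must be evaluated at classification time, the drawback discussed below.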
Compared to conventional ANNs, the PNN offers extremely fast training times and provides mathematically sound confidence levels for each classification decision. However, the application of PNNs for chemical sensor array pattern recognition has been hindered by at least four major problems.
First, to predict the classification of new patterns, the distance between the new pattern and each pattern in the training set must be computed. For stand-alone sensor systems, the entire training set of patterns (i.e., the hidden layer) must be stored on board the microprocessor embedded in the sensor system. Since each and every training pattern must be stored, for many applications this would require more memory than is available.
Second, as discussed above, the distance between the new pattern and each pattern in the training set (i.e., hidden layer) must be computed. For large training sets and sensor arrays, the number of calculations becomes prohibitive, requiring more processing time than a real-time application can permit. For certain applications, an embedded microprocessor could not process the sensor signals fast enough to operate in real-time without a significant reduction in the size of the hidden layer.
Third, a method is known in the art for detecting ambiguous patterns by setting a rejection threshold. If the outputs from the summation neurons for all classes are less than the rejection threshold, the new pattern is considered an outlier and no classification is performed. However, no generally accepted criterion for determining the best rejection threshold has ever been established. Setting an appropriate rejection threshold using the method described by Bartal et al. in "Nuclear Power Plant Transient Diagnostics Using Artificial Neural Networks that Allow `Don't-Know` Classifications" would require extensive experimentation or knowledge of the pattern space for each application.
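The threshold-based rejection scheme described above may be sketched as follows; the threshold value and summation outputs are illustrative assumptions, and choosing the threshold is precisely the unresolved problem noted above.

```python
def classify_with_rejection(summation_outputs, threshold):
    # summation_outputs: {class_label: summation-layer output}.
    # If no class's output exceeds the (ad hoc, application-chosen)
    # rejection threshold, flag the pattern as an outlier instead of
    # forcing a classification.
    best = max(summation_outputs, key=summation_outputs.get)
    if summation_outputs[best] < threshold:
        return None  # "don't know": ambiguous or unfamiliar pattern
    return best

assert classify_with_rejection({"A": 0.8, "B": 0.1}, 0.5) == "A"
assert classify_with_rejection({"A": 0.02, "B": 0.01}, 0.5) is None
```

The two assertions illustrate the trade-off: a threshold of 0.5 accepts a confident classification but rejects a pattern that activates no class strongly.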
Fourth, although PNN training is much faster than BP-ANN, the cross-validation and univariate optimization procedure (i.e., σ optimization) can be prone to local minima. Thus, several training runs must be performed to determine σ at the global-minimum training classification error.
Therefore, a need exists in the prior art for a pattern recognition system and method which is highly accurate, executes fast enough for real-time applications, is simple to train, has low memory requirements, is robust to outliers and thereby reduces the potential for false alarms, and provides a statistical confidence level for sensor patterns recognized.