1. Field of the Invention
The present invention relates generally to the field of training a neural network and, more particularly, to a system and method that provides a supervised learning environment for a neural network.
2. Related Art
The artificial neural network is a simulation of the biological neural network of the human brain. The artificial neural network accepts several inputs, performs a series of operations on the inputs, and produces one or more outputs. It has been studied in the hope of achieving human-like performance in solving problems with complex, incomplete, or seemingly unrelated data which cannot be solved by conventional programming techniques. The power of artificial neural networks lies in their computational speed and ability to provide a high degree of robustness or fault-tolerance.
A typical artificial neural network consists of a number of connected neurons or processing nodes, and a learning algorithm. A neuron is, in turn, composed of three elements: weighted connections, an integration function, and an activation function. Through the weighted connections, the neuron receives inputs from those connected to it in a previous layer (of neurons), and transfers output to those connected to it in the next layer (of neurons). The integration function simply sums up the received inputs. The activation function, which usually is in the form of a non-linear sigmoid function, converts the integrated input into an output. Mathematically, an integration function is shown as follows:
$$i_{pj} = \sum_{i} W_{ji} O_{pi}$$
where $i_{pj}$ is the integrated input of neuron $j$ corresponding to input pattern $p$,
$O_{pi}$ is the output from neuron $i$, and
$W_{ji}$ is the connection weight between neurons $i$ and $j$.
An activation function usually takes the following form:
$$O_{pj} = \frac{1}{1 + e^{-(i_{pj} + \beta)}}$$
where $\beta$ is the bias of neuron $j$.
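The integration and activation functions described above can be sketched in a few lines of Python. This is a minimal illustration only: it assumes the common logistic form of the sigmoid with the bias added to the integrated input, and the function names and sample values are hypothetical, not taken from the original disclosure.

```python
import math

def integrate(weights, inputs):
    """Integration function: weighted sum of the outputs received from the previous layer."""
    return sum(w * o for w, o in zip(weights, inputs))

def activate(net_input, bias=0.0):
    """Sigmoid activation (assumed logistic form) applied to the integrated input plus bias."""
    return 1.0 / (1.0 + math.exp(-(net_input + bias)))

def neuron_output(weights, inputs, bias=0.0):
    """Output of one neuron for one input pattern."""
    return activate(integrate(weights, inputs), bias)

# Hypothetical sample values: three upstream outputs and their connection weights.
# The negative weight plays the role of an inhibitory connection.
previous_outputs = [1.0, 0.5, 0.0]
connection_weights = [0.8, -0.4, 0.3]
print(neuron_output(connection_weights, previous_outputs))
```

Because the sigmoid is bounded, the output always lies strictly between 0 and 1 regardless of the magnitude of the integrated input.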
There are two primary connection types, inhibitory and excitatory. Inhibitory connections decrease the activation of processing elements to which they are connected, while excitatory connections increase the activation. Therefore, a portion of the connections to a particular neuron may have negative weights (inhibitory) and the remainder have positive weights (excitatory).
Because artificial neural networks are formed and operated in a parallel fashion, they can perform complex computations simultaneously and rapidly. In addition, because a network consists of a number of neurons (processing elements), the network can still maintain its regular performance even when a few neurons or their interconnections are damaged (see generally, Hopfield, J. J., "Neural Networks and Physical Systems with Emergent Collective Computational Abilities," Proceedings of the National Academy of Sciences, Vol. 79, 1982, pp. 2554-2558; Hopfield, J. J., "Neurons with Graded Response Have Collective Computational Properties Like Those of Two-State Neurons," Proceedings of the National Academy of Sciences, Vol. 81, 1984, pp. 3088-3092; and Lippmann, R. P., "An Introduction to Computing with Neural Nets," IEEE ASSP Magazine, April 1987, pp. 4-22).
The most important property of neural networks is their learning ability. A learning algorithm is used to train the network to learn a (sometimes arbitrary) mapping between the input space and the output space by adjusting the interconnection weights. Two types of learning algorithms are generally used in training the network: supervised learning and unsupervised learning. Supervised learning requires an external feedback of error signals after each mapping. Examples are back propagation (BP) (see Rumelhart and McClelland, Parallel Distributed Processing, Vol. 1, MIT Press, Cambridge, Mass., 1986); Cerebellar Model Arithmetic Computer (CMAC) (see Miller et al., "CMAC: An Associative Neural Network Alternative to Backpropagation," Proceedings of the IEEE, Vol. 78, No. 10, October 1990, pp. 1561-1567); and Brain-State-in-a-Box (BSB) (see Anderson, J. A., "Neural Models with Cognitive Implications," Basic Processes in Reading: Perception and Comprehension, edited by D. LaBerge and S. J. Samuels, Erlbaum, N.J., 1977, pp. 27-90). Unsupervised learning does not require any external feedback during the training process. Examples are Adaptive Resonance Theory (ART) (see Carpenter, G. A. and Grossberg, S., "A Massively Parallel Architecture for a Self-Organizing Neural Pattern Recognition Machine," Computer Vision, Graphics, and Image Processing, Vol. 37, 1987, pp. 54-115 and Carpenter, G. A. and Grossberg, S., "ART 2: Self Organization of Stable Category Recognition Codes for Analog Input Patterns," Applied Optics, Vol. 26, No. 23, 1987, pp. 4919-4930) and the Hopfield network (see Hopfield, J. J., "Neural Networks and Physical Systems with Emergent Collective Computational Abilities," cited above).
The identification of a machine or component fault is essentially a pattern recognition problem. In the past, a number of pattern recognition techniques, such as linear discriminant functions and fuzzy sets, have been applied to solve this type of problem. Normally, these techniques classify the machine or component condition into one of two states, i.e., normal or abnormal. Examples can be found in Li, P. G. and Wu, S. M., "Monitoring Drilling Wear States by a Fuzzy Pattern Recognition Technique," Journal of Engineering for Industry, Vol. 110, August 1988, pp. 297-300; and Emel E. and Kannatey-Asibu E., Jr., "Tool Failure Monitoring in Turning by Pattern Recognition Analysis of AE signals," Journal of Engineering for Industry, Vol. 110, May 1988, pp. 137-145. Today, artificial neural networks are the most popular approach to solving pattern recognition problems.
There are a number of different types of neural networks suitable for pattern classification. Among them, multi-layer feedforward networks are the most popular paradigm because they can solve non-linear problems that cannot be solved by linear single-layer networks, known as perceptrons. The multi-layer feedforward network has one or more layers of processing elements between the input layer and the output layer. These layers are called hidden layers.
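The exclusive-OR (XOR) function is a standard illustration of a non-linear problem that a single-layer perceptron cannot solve but a network with one hidden layer can. The sketch below is illustrative only: it uses hard-threshold units and hand-chosen weights rather than trained sigmoid units, and all names and values are assumptions introduced for the example.

```python
def step(value, threshold):
    """Hard-threshold unit: fires (1) when the summed input reaches the threshold."""
    return 1 if value >= threshold else 0

def xor_network(x1, x2):
    """Two-input network with one hidden layer of two units and one output unit."""
    hidden_or = step(x1 + x2, 0.5)    # fires when at least one input is on
    hidden_and = step(x1 + x2, 1.5)   # fires only when both inputs are on
    # Excitatory connection from the OR unit, inhibitory connection from the AND unit.
    return step(hidden_or - hidden_and, 0.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_network(a, b))
```

The hidden units re-encode the inputs so that the output unit only has to separate patterns that are linearly separable in the hidden-layer space.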
One of the most powerful and popular multi-layer feedforward networks is trained with back propagation. Back propagation was proposed by Rumelhart and McClelland as an algorithm for finding the optimal assignment of weights of network connections. It employs an iterative gradient descent algorithm to minimize the error measure between the actual output of the network and the desired output.
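As a rough illustration of the iterative gradient descent described above, the following sketch trains a small network with two inputs, two hidden neurons, and one output neuron on a single input pattern. It assumes sigmoid activations and a squared-error measure; the network size, initial weights, learning rate, and function names are hypothetical choices made for the example.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, w_hidden, w_out):
    """Forward pass; the last entry of each weight list is that neuron's bias."""
    hidden = [sigmoid(sum(w * v for w, v in zip(row[:-1], x)) + row[-1])
              for row in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out[:-1], hidden)) + w_out[-1])
    return hidden, output

def train_step(x, target, w_hidden, w_out, lr=0.5):
    """One gradient descent update on E = 0.5 * (target - output)**2.

    Updates the weights in place and returns the error measured before the update.
    """
    hidden, output = forward(x, w_hidden, w_out)
    # Error signal at the output neuron (derivative of E w.r.t. its integrated input).
    delta_out = (output - target) * output * (1.0 - output)
    # Error signals propagated back to the hidden neurons through the output weights.
    delta_hidden = [delta_out * w_out[j] * h * (1.0 - h)
                    for j, h in enumerate(hidden)]
    for j, h in enumerate(hidden):
        w_out[j] -= lr * delta_out * h
    w_out[-1] -= lr * delta_out
    for j, row in enumerate(w_hidden):
        for i, v in enumerate(x):
            row[i] -= lr * delta_hidden[j] * v
        row[-1] -= lr * delta_hidden[j]
    return 0.5 * (target - output) ** 2

# Hypothetical initial weights (two hidden neurons, each with two weights and a bias).
w_hidden = [[0.1, -0.2, 0.0], [0.3, 0.4, 0.0]]
w_out = [0.2, -0.1, 0.0]
errors = [train_step([1.0, 0.0], 1.0, w_hidden, w_out) for _ in range(200)]
print(f"error before training: {errors[0]:.4f}, after: {errors[-1]:.4f}")
```

Each call moves every weight a small distance against the gradient of the error, so the error on the training pattern shrinks over the iterations.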
FIG. 1 shows a neural network architecture 100 called Predictive Adaptive Resonance Theory (ART) or ARTMAP. The ARTMAP architecture 100 autonomously learns to classify arbitrarily ordered vectors into recognition categories based on predictive success. See Carpenter, G. A., Grossberg, S., and Reynolds, J., "ARTMAP: Supervised Real-time Learning and Classification of Nonstationary Data by a Self-Organizing Neural Network," Neural Networks, Vol. 4, 1991, pp. 569-588. This supervised learning system 100 is built from a pair of ART modules ($\mathrm{ART}_a$ 110 and $\mathrm{ART}_b$ 120) that are capable of self-organizing stable recognition categories in response to arbitrary sequences of input patterns.
Two classes of ART modules have been developed by Carpenter and Grossberg (Carpenter, G. A. and Grossberg, S., "A Massively Parallel Architecture for a Self-Organizing Neural Pattern Recognition Machine," and "ART 2: Self Organization of Stable Category Recognition Codes for Analog Input Patterns," both of which were cited above); ART 1 is capable of processing arbitrary sequences of binary input patterns, while ART 2 is capable of handling either binary or analog input patterns. These ART modules are linked by a Map Field 130 and an internal controller (not shown) that ensures autonomous system operation in real time. The Map Field 130 controls the learning of an associative map from $\mathrm{ART}_a$ recognition categories to $\mathrm{ART}_b$ recognition categories, as well as match tracking of the $\mathrm{ART}_a$ vigilance parameter 140 ($\rho'$). The vigilance parameter 140 determines the closeness between the $\mathrm{ART}_a$ recognition category and the $\mathrm{ART}_b$ recognition category.
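The role of a vigilance parameter and of match tracking can be illustrated with the match criterion commonly given for ART 1 with binary patterns, in which a category is accepted when the fraction of the input's active bits shared with the category prototype reaches the vigilance value. The sketch below is an assumption-laden illustration of that published criterion, not a description of the architecture 100; the function names, the small increment `epsilon`, and the sample patterns are hypothetical.

```python
def match_ratio(input_pattern, prototype):
    """Fraction of the input's active bits that are also active in the prototype.

    Assumes binary patterns of equal length and at least one active input bit.
    """
    overlap = sum(i & w for i, w in zip(input_pattern, prototype))
    return overlap / sum(input_pattern)

def passes_vigilance(input_pattern, prototype, rho):
    """Vigilance test: accept the category only if the match ratio reaches rho."""
    return match_ratio(input_pattern, prototype) >= rho

def match_track(input_pattern, prototype, epsilon=0.001):
    """Raise vigilance just above the current match so this category is rejected."""
    return match_ratio(input_pattern, prototype) + epsilon

pattern = [1, 1, 0, 1, 0]
prototype = [1, 1, 0, 0, 0]
print(passes_vigilance(pattern, prototype, rho=0.5))   # loose vigilance
print(passes_vigilance(pattern, prototype, rho=0.8))   # strict vigilance
raised_rho = match_track(pattern, prototype)
print(passes_vigilance(pattern, prototype, rho=raised_rho))
```

A higher vigilance value forces finer categories, since fewer prototypes are judged close enough to the input; match tracking exploits this by raising vigilance only as far as needed to reject a category whose prediction was wrong.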
The ARTMAP architecture 100, however, applies an unsupervised learning algorithm for training. Often the pattern to be recognized is known beforehand, in which case reliance on an unsupervised learning algorithm is a disadvantage. As such, what is desired is a neural network architecture that provides the benefits of the ARTMAP architecture, but can be trained in a supervised manner.