In general, a neural network consists of two major parts: the individual neural processing units and the network of interconnections among the processors. A neural network may have many processing units. Each unit receives different inputs from many other units and transmits an output to other units. The system has distributed control. This means there is no executive; each processor executes its operation autonomously without explicit knowledge of the system as a whole.
Neural processing elements accept incoming signals from other units, perform simple operations on those signals, and compute single-valued outputs. These three functions correspond to three different parts of a general neural processing unit: input, activation, and output. First, for a system of n processing units, there may be as many as n-1 inputs into a single arbitrary unit u.sub.i. These inputs come from the outputs o.sub.j of other units. The output values are modified by a weighting factor w.sub.ij representing the relative connection strength of unit j to unit i. A net input function determines how the inputs should be combined to get a net input value net.sub.i. For the simplest case, this function takes the weighted sum of all impinging signals. That is, net.sub.i =.SIGMA..sub.j w.sub.ij o.sub.j. More complex combinations occur if there are different types of connections. For example, each connection type might have to be summed separately and then combined. In this case, let net.sub.ij be the net input of connection type i to u.sub.j. The most common case of multiple input types is for differentiating excitatory vs. inhibitory inputs. Often these types can be combined by simply using positive and negative weights.
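The simplest net input function described above, the weighted sum of the incoming signals, can be sketched in Python as follows. The function name net_input and the numeric values are assumptions chosen only for illustration; they do not come from the disclosure.

```python
def net_input(weights_i, outputs):
    """Weighted sum of incoming signals for one receiving unit i.

    weights_i[j] plays the role of w_ij and outputs[j] the role of o_j.
    """
    if len(weights_i) != len(outputs):
        raise ValueError("one weight is needed per incoming output")
    return sum(w * o for w, o in zip(weights_i, outputs))


# Hypothetical example: two excitatory inputs (positive weights)
# and one inhibitory input (negative weight).
weights_i = [0.5, 0.25, -1.0]
outputs = [1.0, 2.0, 0.5]
net_i = net_input(weights_i, outputs)
print(net_i)
```

Using signed weights in a single sum is the shortcut mentioned in the text for combining excitatory and inhibitory inputs without a separate net input per connection type.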
Second, this net input value is used in calculating the activation value. The activation value for an arbitrary unit i at time t, a.sub.i (t), represents the current state of the unit. The set of possible activation values may be continuous or discrete. Continuous values are bounded within some interval, for example, 0 to 1. Discrete values are either binary, {0,1} or {-1,1}, or range over a small set of values such as {1,2, . . . , 9}. The choice of activation values often has an important impact on the characteristics of computation. The state of the system is simply the vector a(t), representing the state of activation for each unit at time t. This state vector is also called the pattern of activation. A function F can be defined that takes a.sub.i (t) and net.sub.ij and calculates a new state of activation. The activation function may take on a number of different forms. The simplest case is where the activation value is simply the net input net.sub.ij. More likely, F is some kind of threshold function with no activation until the net input exceeds a certain value. Stochastic, decay, and sigmoid functions have also been used to modulate the activation values.
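Three of the activation function forms named above (identity, threshold, and sigmoid) can be sketched as follows. The function names and the default threshold value theta are assumptions for illustration; for simplicity each function here depends only on the net input and ignores the previous activation a.sub.i (t).

```python
import math


def identity(net):
    """Simplest case: the activation equals the net input."""
    return net


def threshold(net, theta=0.0):
    """Binary activation in {0, 1}: active only when net exceeds theta."""
    return 1 if net > theta else 0


def sigmoid(net):
    """Continuous activation bounded between 0 and 1."""
    # Two branches avoid overflow in math.exp for large |net|.
    if net >= 0.0:
        return 1.0 / (1.0 + math.exp(-net))
    e = math.exp(net)
    return e / (1.0 + e)


for net in (-2.0, 0.0, 2.0):
    print(net, identity(net), threshold(net), round(sigmoid(net), 3))
```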
Third, activation values are used to determine the final output value o.sub.i (t). An output function f(a.sub.i (t)) maps the current state of activation a.sub.i (t) to an output value o.sub.i (t). This function may be an identity function, in which case o.sub.i (t)=a.sub.i (t). Other common alternatives include threshold and stochastic functions.
In a neural network, interprocessor connections are much more than simple communication links. The connections actually encode what patterns the network can identify. In addition to connectivity, the interconnections have an associated weight or strength. A convenient way to represent this property is by using a weight matrix W. An element of W, w.sub.ij, denotes the strength of the connection from unit j to unit i. For a simple case, positive weights represent excitatory connections while negative weights represent inhibitory connections. A value of 0 means there is no connection between the units.
More complex patterns may be needed for different types of connections. In this case, there will be a weight matrix W.sub.i for each connection type. As mentioned above, the distinction between excitatory and inhibitory connections is only needed if they have different net input functions. Also, W could be more than two-dimensional. For instance, instead of the normal "biconnection", described above, there could be a "triconnection" between three units. In this case, each element in W would have three indices w.sub.ijk. This idea can be extended to an arbitrary number of connections represented by higher dimensional weight matrices.
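A three-index weight w.sub.ijk can be illustrated with the short sketch below. The text does not state how a "triconnection" combines its signals; the multiplicative rule used here, in which the outputs of units j and k are multiplied before weighting, is an assumption, as are the function name and the numeric values.

```python
def net_input_third_order(w_i, outputs):
    """Net input to unit i from a three-index weight array.

    w_i[j][k] plays the role of w_ijk. The product o_j * o_k is an
    assumed combination rule; the text does not specify one.
    """
    n = len(outputs)
    return sum(
        w_i[j][k] * outputs[j] * outputs[k]
        for j in range(n)
        for k in range(n)
    )


# Hypothetical weights for one receiving unit and two sending units:
# only the (j=0, k=1) pair is connected.
w_i = [[0.0, 1.0],
       [0.0, 0.0]]
outputs = [2.0, 3.0]
net_i = net_input_third_order(w_i, outputs)
print(net_i)
```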
The topology of neural network interconnections can be very diverse. In the simplest case, there may be a single layer of fully interconnected processors. More often, the network is divided into multiple layers, with full interconnection between layers but not between every pair of neurons. One layer of neurons may be used to input values into the network, while another layer may output the final results. Between the input and output layers, there may be intermediate or hidden layers. The hidden layers are necessary to allow solutions of non-linear problems. When the weighting matrices and the various parameters associated with the activation functions have been set to correct levels, a complex stimulus pattern at the input layer propagates successively through the hidden layers to produce an often simpler output pattern, such as only one output layer unit having a significantly strong output. The network is "taught" by feeding it a succession of input patterns and corresponding expected output patterns; the network "learns" by measuring, directly or indirectly, the difference or error (at each output unit) between the desired output pattern and the pattern that it just produced. Having done this, the internal weights and activation parameters of the hidden layer(s) are modified by a learning algorithm to provide an output pattern which more closely approximates the desired output pattern, while minimizing the error over the spectrum of input patterns. Neural network learning is an iterative process, involving multiple "lessons". Neural networks have the ability to process information in the presence of noisy or incomplete data and yet still generalize to the correct solution.
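The propagation of an input pattern through a layered network, and the per-unit error measured at the output, can be sketched as follows. The layer representation (a list of weight rows plus a bias per unit), the use of sigmoid units, and the hand-set weights are all assumptions for illustration. The weights are chosen so that one hidden layer computes exclusive-OR, a standard example of a problem that a single layer of weights cannot solve.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def layer_forward(weights, biases, inputs):
    """Outputs of one layer; weights[i][j] connects input j to unit i."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(layers, inputs):
    """Propagate an input pattern through successive layers."""
    signal = inputs
    for weights, biases in layers:
        signal = layer_forward(weights, biases, signal)
    return signal


def output_error(desired, actual):
    """Per-unit difference between desired and produced outputs."""
    return [d - a for d, a in zip(desired, actual)]


# Hand-set (not learned) weights, assumed for illustration.
# Hidden unit 0 acts like OR, hidden unit 1 like AND;
# the output unit fires for "OR but not AND".
layers = [
    ([[20.0, 20.0], [20.0, 20.0]], [-10.0, -30.0]),
    ([[20.0, -20.0]], [-10.0]),
]

for a in (0, 1):
    for b in (0, 1):
        out = forward(layers, [float(a), float(b)])
        err = output_error([float(a ^ b)], out)
        print(a, b, round(out[0], 3), round(err[0], 3))
```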
In order to adjust the weights and activation parameters of the neural network, a learning algorithm must be applied. One of the more widely used learning algorithms is the Back Propagation method, which is described in U.S. Pat. No. 4,893,255, issued to M. S. Tomlinson, Jr. on Jan. 9, 1990, which is incorporated herein by reference. Back Propagation is essentially the backward propagation of error through the network, with the weight changes being proportional to error signals whose calculation begins at the output of the network. The error is first calculated for the output layer, and this error value is then used to calculate weight changes for the units that feed into the output layer. The same procedure is repeated for each preceding layer in turn until the process terminates back at the input layer.
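The backward flow of error described above can be sketched for a network with one hidden layer. This is a generic textbook form of back propagation for sigmoid units with a squared-error measure, not the specific method of the referenced patent; the function names, the learning rate, the starting weights, and the convention of storing each unit's bias as the last weight in its row are assumptions for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(w_hidden, w_out, x):
    """Return (hidden, output) activations; last weight in each row is a bias."""
    xb = x + [1.0]
    hidden = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w_hidden]
    hb = hidden + [1.0]
    out = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in w_out]
    return hidden, out


def backprop_step(w_hidden, w_out, x, target, rate=0.5):
    """One in-place weight update for a single training pattern."""
    hidden, out = forward(w_hidden, w_out, x)
    # Error signal is calculated first at the output layer.
    delta_out = [(t - o) * o * (1.0 - o) for t, o in zip(target, out)]
    # Error signal for each hidden unit, propagated back through w_out
    # (computed before w_out is changed).
    delta_hid = [
        h * (1.0 - h) * sum(delta_out[k] * w_out[k][j] for k in range(len(w_out)))
        for j, h in enumerate(hidden)
    ]
    # Weight changes are proportional to the error signals.
    hb = hidden + [1.0]
    for k, row in enumerate(w_out):
        for j in range(len(row)):
            row[j] += rate * delta_out[k] * hb[j]
    xb = x + [1.0]
    for j, row in enumerate(w_hidden):
        for i in range(len(row)):
            row[i] += rate * delta_hid[j] * xb[i]


def total_error(w_hidden, w_out, patterns):
    """Sum of squared errors over all patterns and output units."""
    return sum(
        0.5 * (t - o) ** 2
        for x, target in patterns
        for t, o in zip(target, forward(w_hidden, w_out, x)[1])
    )


# Hypothetical training set (exclusive-OR) and arbitrary starting weights.
patterns = [([0.0, 0.0], [0.0]), ([0.0, 1.0], [1.0]),
            ([1.0, 0.0], [1.0]), ([1.0, 1.0], [0.0])]
w_hidden = [[0.1, -0.2, 0.05], [0.3, 0.1, -0.1]]
w_out = [[0.2, -0.3, 0.1]]

print("error before:", total_error(w_hidden, w_out, patterns))
for _ in range(1000):  # each pass over the patterns is one "lesson"
    for x, target in patterns:
        backprop_step(w_hidden, w_out, x, target)
print("error after:", total_error(w_hidden, w_out, patterns))
```

Whether training reaches a low error depends on the starting weights, the learning rate, and the number of passes, so no particular final value is claimed here.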
To minimize the number of hidden units necessary to map the input space, one technique that has been developed is to generate localized receptive fields. This is disclosed in John Moody and Christian Darken, "Learning with Localized Receptive Fields", Proceedings of the 1988 Connectionist Models Summer School. Moody discloses a multi-layered system having hidden units, with each hidden unit defining a localized receptive field. The parameters of each localized receptive field are varied such that the field is located over a specific area of the input space. Learning is accomplished by placing the centers of the receptive fields in only those regions of the input space where data is present. This is accomplished by means of a clustering algorithm, which is sometimes referred to as a "competitive learning" algorithm. This method utilizes only the input data to cluster the receptive fields. Thereafter, only the output weights (receptive field amplitudes) need be calculated utilizing an error signal, which can be accomplished with back propagation.
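The two-stage procedure described above, placing the centers from the input data alone and then fitting only the output weights from an error signal, can be sketched for one-dimensional inputs. This is a simplified illustration and not the Moody and Darken implementation: batch k-means stands in for the clustering step, the receptive fields are assumed to be Gaussians of one fixed width, the output weights are fitted with a simple delta rule, and all names and data values are hypothetical.

```python
import math


def kmeans_1d(data, centers, iterations=20):
    """Place receptive-field centers using only the input data."""
    centers = list(centers)
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for x in data:
            nearest = min(range(len(centers)), key=lambda c: abs(x - centers[c]))
            groups[nearest].append(x)
        centers = [
            sum(g) / len(g) if g else c  # keep a center that attracted no data
            for g, c in zip(groups, centers)
        ]
    return centers


def responses(x, centers, width):
    """Gaussian receptive-field response of each hidden unit to input x."""
    return [math.exp(-((x - c) ** 2) / (2.0 * width ** 2)) for c in centers]


def train_output_weights(data, targets, centers, width, rate=0.1, epochs=200):
    """Fit only the output weights (field amplitudes) from the error signal."""
    weights = [0.0] * len(centers)
    for _ in range(epochs):
        for x, t in zip(data, targets):
            r = responses(x, centers, width)
            error = t - sum(w * v for w, v in zip(weights, r))
            weights = [w + rate * error * v for w, v in zip(weights, r)]
    return weights


def predict(x, centers, width, weights):
    """Network output: weighted sum of the receptive-field responses."""
    return sum(w * v for w, v in zip(weights, responses(x, centers, width)))


# Hypothetical data lying in two separate regions of the input space.
data = [0.0, 0.1, 0.2, 0.9, 1.0, 1.1]
targets = [1.0, 1.0, 1.0, 0.0, 0.0, 0.0]
width = 0.2

centers = kmeans_1d(data, centers=[0.0, 1.0])
weights = train_output_weights(data, targets, centers, width)
print(centers, weights)
print(predict(0.1, centers, width, weights), predict(1.0, centers, width, weights))
```

Because each Gaussian responds appreciably only near its own center, each output weight is shaped almost entirely by the data in that region, which is the localized behavior the text describes.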
The Moody system has the disadvantage that a large number of hidden units is required in order to sufficiently map an input space in a non-linear manner. Therefore, there exists a need for an improved system that allows more efficient mapping of the input space while maintaining some localized nature to the mapping procedure.