1. Field of the Invention
This invention relates to neural-network computer architectures and specifically to a teaching method for recurrent neural networks.
2. Background Art
Neural networks are computing devices inspired by biological models and distinguished from other computing devices by an architecture which employs a number of highly interconnected elemental "neurons". Each neuron comprises a summing junction that receives signals from other neurons, weights each signal by a weighting value, and sums the weighted signals. The summing junction is ordinarily followed by a compressor or "squashing" function (typically a logistic curve) that compresses the output from the summing junction into a predetermined range, ordinarily from zero to one. The neuron's inputs are the inputs to the summing junction, and the neuron's output, termed an "activation", is the output from the compressor.
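By way of illustration only, a single neuron of the kind described above may be sketched in software as follows. The function names and the sample signal and weighting values are hypothetical and are chosen solely for this example.

```python
import math

def logistic(x):
    """Squashing function: compress any real value into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def neuron_activation(inputs, weights):
    """Weight each incoming signal, sum the results, then squash the sum."""
    total = sum(w * s for w, s in zip(weights, inputs))
    return logistic(total)

# Hypothetical signals from three other neurons and their weighting values.
activation = neuron_activation([0.5, -1.0, 2.0], [0.8, 0.2, -0.5])
```

Here the summing junction yields -0.8, which the logistic curve compresses to an activation of approximately 0.31.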
The inputs of each neuron may be connected to the outputs of many other neurons and the neuron's activation may be connected, in turn, to the inputs of still other neurons. In a "feedforward" neural net architecture, inputs to the network are received by a first layer of neurons whose activations feed the inputs of a second layer of neurons and so on for as many layers as desired. The final layer provides the output of the network.
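The layered arrangement of a feedforward network may be sketched, under stated assumptions, as follows. Each layer is represented here as a list of weight rows, one row per neuron; this representation and all weighting values are hypothetical.

```python
import math

def logistic(x):
    """Squashing function: compress any real value into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def layer_forward(inputs, weight_rows):
    """Compute the activations of one layer; each weight row belongs to one neuron."""
    return [logistic(sum(w * s for w, s in zip(row, inputs))) for row in weight_rows]

def feedforward(inputs, layers):
    """Pass the signal through each layer in turn; the final layer gives the output."""
    signal = inputs
    for weight_rows in layers:
        signal = layer_forward(signal, weight_rows)
    return signal

# Hypothetical network: two inputs, three first-layer neurons, one final-layer neuron.
layers = [
    [[0.5, -0.4], [0.3, 0.8], [-0.6, 0.1]],
    [[1.0, -1.0, 0.5]],
]
outputs = feedforward([1.0, 0.0], layers)
```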
In a "recurrent" neural network architecture, inputs are received by a single layer of neurons and the activations of those neurons are fed back as inputs to that single layer to produce new activations during a "propagation".
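A single propagation of a recurrent network may be sketched as follows. The sketch assumes, for illustration only, that external inputs reach each neuron through a separate set of weighting values alongside the fed-back activations; the description above does not prescribe this arrangement, and all names and values are hypothetical.

```python
import math

def logistic(x):
    """Squashing function: compress any real value into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def propagate(activations, external_inputs, feedback_weights, input_weights):
    """One propagation: feed the current activations back into the same layer."""
    new_activations = []
    for fb_row, in_row in zip(feedback_weights, input_weights):
        total = sum(w * a for w, a in zip(fb_row, activations))
        total += sum(w * x for w, x in zip(in_row, external_inputs))
        new_activations.append(logistic(total))
    return new_activations

# Hypothetical two-neuron layer driven by one external input.
feedback_weights = [[0.9, -0.3], [0.2, 0.7]]
input_weights = [[1.0], [-1.0]]

state0 = [0.0, 0.0]
state1 = propagate(state0, [1.0], feedback_weights, input_weights)
state2 = propagate(state1, [1.0], feedback_weights, input_weights)
```

Although the same external input is presented in both propagations, `state2` differs from `state1` because the new activations depend on the activations fed back from the preceding propagation.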
Both types of neural network architectures may be realized through programs running on conventional von Neumann architecture digital computers. Alternatively, neural networks may be constructed with dedicated analog or digital circuitry, for example, using analog summing junctions and function generators to construct each neuron as is generally understood in the art.
In operation, the neural network receives an input or a set of inputs and produces an output or set of outputs dependent on the inputs and on the weighting values assigned to each neuron's inputs. With the appropriate selection of the weighting values, a variety of computational processes may be performed.
The relationship between the weighting values and the computational process is extremely complex, and the weighting values are ordinarily determined by a teaching procedure. In the teaching procedure, a teaching set of corresponding inputs and target outputs is presented to the neural network, and error values are generated which are used to modify an initial set of weighting values. This process is repeated until the generated error values are acceptably low, at which point the weighting values may be fixed.
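The general shape of such a teaching procedure may be sketched for a single neuron as follows. The description above does not specify how the error values modify the weighting values; this sketch substitutes the well-known delta rule (gradient descent on squared error) purely as an example. The constant bias signal, the teaching rate, the error tolerance, and the teaching set (a logical OR) are likewise assumptions.

```python
import math

def logistic(x):
    """Squashing function: compress any real value into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def respond(weights, inputs):
    """Output of one neuron; a constant 1.0 bias signal is appended (assumption)."""
    signals = list(inputs) + [1.0]
    return logistic(sum(w * s for w, s in zip(weights, signals)))

def teach(teaching_set, weights, rate=1.0, tolerance=0.05, max_passes=10000):
    """Repeat passes over the teaching set until the summed squared error is low."""
    weights = list(weights)
    error = float("inf")
    for _ in range(max_passes):
        error = 0.0
        for inputs, target in teaching_set:
            signals = list(inputs) + [1.0]
            output = respond(weights, inputs)
            diff = target - output
            error += diff * diff
            # Delta rule (assumed): adjust each weight so as to reduce the squared error.
            step = rate * diff * output * (1.0 - output)
            weights = [w + step * s for w, s in zip(weights, signals)]
        if error < tolerance:
            break  # error acceptably low; the weighting values may now be fixed
    return weights, error

# Hypothetical teaching set: inputs paired with target outputs (logical OR).
teaching_set = [
    ([0.0, 0.0], 0.0),
    ([0.0, 1.0], 1.0),
    ([1.0, 0.0], 1.0),
    ([1.0, 1.0], 1.0),
]
weights, final_error = teach(teaching_set, [0.0, 0.0, 0.0])
```

After teaching, `respond(weights, inputs)` may be evaluated with the fixed weighting values for any input pair.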
Although the teaching method of programming neural networks appears cumbersome when compared with the programming of a conventional von Neumann computer, because many inputs and outputs must be presented in teaching the neural network, the advantage of the teaching method of programming is that the mechanics of the computational process need not be understood. This makes neural network computers ideal for use in modeling applications where inputs and outputs are available but the underlying mathematical process is not known.
One particular modeling application for which neural networks may be useful is that of modeling industrial processes. Specifically, a manufacturing process under automatic control may have a number of inputs and outputs. The inputs control actuators such as valves and the like and the outputs may be from process sensors such as temperature or pressure gauges. During normal operation of the manufacturing process, the inputs and outputs are related to each other in a complex but stable manner dictated by the physics of the process. A neural network may be taught this relationship by using actual input and output values of the manufacturing process for teaching.
After the neural network has been taught, it may be presented with inputs from the manufacturing process and the outputs from the manufacturing process may be compared with the outputs from the neural network. If the outputs from the neural network differ greatly from the outputs of the manufacturing process, a malfunction may have occurred, such as may be caused by a disabled sensor or process failure. The programmed neural network thus may provide a benchmark of proper process operation and trigger an alarm if the manufacturing process deviates from a recognized pattern.
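The comparison described above may be sketched as follows. The function name, the sample sensor values, and the deviation threshold are hypothetical; what counts as differing "greatly" would depend on the particular process.

```python
def deviation_alarm(process_outputs, model_outputs, threshold):
    """Return True if any process output departs from the model by more than threshold."""
    return any(abs(p - m) > threshold
               for p, m in zip(process_outputs, model_outputs))

# Hypothetical readings: two sensor outputs and the taught network's predictions.
normal = deviation_alarm([72.1, 14.9], [72.0, 15.0], threshold=1.0)
faulty = deviation_alarm([72.1, 3.2], [72.0, 15.0], threshold=1.0)
```

In this example `normal` is `False`, while `faulty` is `True` because the second sensor reading departs from the network's output by well over the threshold.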
This modeling technique is particularly attractive for many real-world industrial processes where the outputs are complex and non-linear functions of the inputs and cannot readily be modeled by more conventional techniques.
For manufacturing processes with intrinsic storage or time delays, the outputs of the process will depend not only on the current inputs but also on the inputs for previous times as dictated by the system "memory" of the process. If a feedforward-type neural network architecture is used to model this type of process, the system memory of the process is accommodated by means of a tapped delay line on the input of the neural network. The delay line stores a time series of process inputs for the length of the system memory and presents samples of the process inputs at discrete time intervals to the network as separate inputs. The neural network generates a current output by processing the present inputs and previous sampled values of those inputs.
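A tapped delay line of the kind described above may be sketched as follows. The class name, the number of taps, and the initial fill value are hypothetical.

```python
from collections import deque

class TappedDelayLine:
    """Hold the most recent samples of one process input, newest first."""

    def __init__(self, taps, initial=0.0):
        # maxlen makes the deque discard the oldest sample automatically.
        self.samples = deque([initial] * taps, maxlen=taps)

    def push(self, sample):
        """Store a new sample and return all taps as separate network inputs."""
        self.samples.appendleft(sample)
        return list(self.samples)

# Hypothetical three-tap line sampled at four successive time intervals.
line = TappedDelayLine(taps=3)
for value in [1.0, 2.0, 3.0, 4.0]:
    network_inputs = line.push(value)
```

After the fourth sample, `network_inputs` holds `[4.0, 3.0, 2.0]`: the present input and the two preceding samples. The first sample has passed beyond the length of the line and is no longer available to the network.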
The use of a tapped delay line works well for processes with short system memories but is less successful for processes with long system memories. A simple example of a process with a long system memory is one which involves the filling of a storage tank. The level of material in the tank will depend not only on the current inputs to the tank but on the inputs from all previous times.
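The storage tank example may be illustrated with a simplified sketch in which the level is taken to be the running sum of net inflows. This idealized tank model, the flow histories, and the window length are assumptions made solely for illustration.

```python
def tank_level(net_inflows, initial_level=0.0):
    """Level after a history of net inflows: every past input contributes."""
    level = initial_level
    for flow in net_inflows:
        level += flow
    return level

# Two hypothetical flow histories that agree over the last three samples.
history_a = [5, 5, 5, 1, 1, 1]
history_b = [0, 0, 0, 1, 1, 1]

window = 3  # length of a hypothetical tapped delay line
same_window = history_a[-window:] == history_b[-window:]
level_a = tank_level(history_a)
level_b = tank_level(history_b)
```

A three-tap delay line would present identical samples to the network in both cases (`same_window` is `True`), yet the tank levels are 18.0 and 3.0 respectively, so the correct output cannot be recovered from the delayed samples alone.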
In contrast to the feedforward neural network architecture, the recurrent network architecture requires no tapped delay line on its input. Rather, the activations of the neurons themselves serve as memory to provide the necessary storage of previous inputs. Accordingly, recurrent neural networks are preferable for modeling many real physical processes. Unfortunately, whereas the teaching process for feedforward neural networks is relatively straightforward and well understood, the teaching process presently available for recurrent networks is difficult. It requires exhaustive analysis of all inputs and outputs and high levels of mathematical precision, the latter requirement necessitating the use of complex and expensive computer equipment.