Artificial neural networks have utility in a wide variety of computing environments, such as speech recognition, process control, optical character recognition, signal processing, and image processing. Processing engines for many of the foregoing may be implemented through neural networks comprising a plurality of elemental logic elements called neuron circuits.
A neuron circuit (or processing element) is the fundamental building block of a neural network. A neuron circuit has multiple inputs and one output.
As described in the Related Invention identified above, the structure of a conventional neuron circuit often includes a multiplier circuit, a summing circuit, a circuit for performing a non-linear function (such as a binary threshold or sigmoid function), and circuitry functioning as synapses or weighted input connections. Because a typical conventional neuron circuit requires all of the above-described circuitry, the number of neuron circuits which can be manufactured on a semiconductor chip is severely limited.
There are more than twenty known types of neural network architectures, of which the "back-propagation", "perceptron", and "Hopfield network" are the best known.
FIG. 1 shows a prior art back-propagation neural network. As shown in FIG. 1, the back-propagation network typically comprises at least three layers: an "input layer", a "hidden layer", and an "output layer". However, as is well known, many more than three layers may be required to solve medium-sized problems.
With reference to the specific back-propagation neural network shown in FIG. 1, each of a plurality of inputs x_1 through x_n is coupled to a respective input node in the input layer (of which only input nodes 1, 2, and 4 are shown). For example, input x_1 is coupled to input node 1.
The output of each input node 1, 2, and 4 in the input layer is coupled to each neuron circuit of the hidden layer (of which only neuron circuits 5, 6, and 8 are shown). For example, the output of input node 1 is coupled to each of neuron circuits 5, 6, 8, and to all other neuron circuits (not shown) in the hidden layer. The same connections are made regarding the outputs of input nodes 2, 4, and all other input nodes (not shown) in the input layer.
Each neuron circuit in the hidden layer multiplies each of its inputs, as received from the input nodes, by a corresponding weight to produce a product. For example, neuron circuit 5 multiplies input x_1 by weight w_11, input x_2 by weight w_21, and so on.
Each neuron circuit then sums these products to produce a "net", which is transformed by a non-linear function to produce that neuron circuit's output.
The operation of neuron circuit 10 in the output layer is similar to that of the neuron circuits of the hidden layer. The inputs to neuron circuit 10 are the outputs of the hidden-layer neuron circuits, and the weights are k_1, k_2, ..., k_N.
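The forward computation described above can be sketched in a few lines of software. This is a minimal sketch under stated assumptions, not the circuit itself: it assumes a logistic sigmoid as the non-linear function, stores the hidden-layer weights as a nested list `w[i][j]` (0-based, so `w[0][0]` plays the role of w_11), and uses the hypothetical helper names `hidden_layer` and `output_neuron`.

```python
import math


def sigmoid(net: float) -> float:
    """Logistic sigmoid, assumed here as the non-linear function."""
    return 1.0 / (1.0 + math.exp(-net))


def hidden_layer(x: list[float], w: list[list[float]]) -> list[float]:
    """Outputs of the hidden-layer neuron circuits.

    w[i][j] is the (hypothetical) weight from input i to hidden neuron j,
    i.e. w_(i+1)(j+1) in the text's 1-based notation.
    """
    n_hidden = len(w[0])
    outputs = []
    for j in range(n_hidden):
        # "net": sum of the input-weight products for hidden neuron j
        net = sum(x[i] * w[i][j] for i in range(len(x)))
        outputs.append(sigmoid(net))
    return outputs


def output_neuron(h: list[float], k: list[float]) -> float:
    """Output-layer neuron: same operation, applied to hidden outputs with weights k_1..k_N."""
    return sigmoid(sum(hj * kj for hj, kj in zip(h, k)))
```

For example, with inputs (1.0, 2.0) and weights 1.0 and -0.5 feeding a single hidden neuron, the net is 0 and the sigmoid output is 0.5.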
For each cycle (epoch), the back-propagation algorithm first adjusts the weights k_1, k_2, ..., k_N of the output layer. It then adjusts the weights w_11, w_21, ..., w_nN of the hidden layer, working backward from the output error.
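One epoch of the weight adjustment described above can be sketched as follows. The sketch assumes a single-output network like that of FIG. 1 with sigmoid neurons, a squared-error measure, per-sample updates, and no bias terms; the function name `train_epoch` and the learning rate `lr` are illustrative assumptions. The hidden-layer error terms are computed from the output weights before those weights change, and the output weights k are then adjusted before the hidden weights w, following the order given in the text.

```python
import math


def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))


def train_epoch(samples, w, k, lr=0.5):
    """One epoch of per-sample back-propagation (one hidden layer, one output).

    samples: list of (x, target) pairs
    w[i][j]: weight from input i to hidden neuron j (updated in place)
    k[j]:    weight from hidden neuron j to the output neuron (updated in place)
    Returns the summed squared error, measured before each sample's update.
    """
    total_error = 0.0
    for x, target in samples:
        # Forward pass through the hidden layer and the output neuron
        h = [sigmoid(sum(x[i] * w[i][j] for i in range(len(x)))) for j in range(len(k))]
        y = sigmoid(sum(hj * kj for hj, kj in zip(h, k)))
        total_error += 0.5 * (target - y) ** 2

        # Output error term; the sigmoid derivative is y * (1 - y)
        delta_out = (target - y) * y * (1.0 - y)
        # Hidden error terms, propagated backward through the current k
        delta_hidden = [delta_out * k[j] * h[j] * (1.0 - h[j]) for j in range(len(k))]

        # Adjust the output-layer weights k first ...
        for j in range(len(k)):
            k[j] += lr * delta_out * h[j]
        # ... then the hidden-layer weights w
        for i in range(len(x)):
            for j in range(len(k)):
                w[i][j] += lr * delta_hidden[j] * x[i]
    return total_error
```

Calling `train_epoch` repeatedly on the same weights gives the epoch-by-epoch training loop; each call is one full pass over the examples, which is why complex problems can need very many calls.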
The back-propagation algorithm suffers from several serious drawbacks. First, training the network for a relatively complex problem is time-consuming: it may take weeks, or even months, of computational time, often on a supercomputer. In one known speech-recognition example, it took several weeks on a four-processor minicomputer to train a back-propagation neural network simply to recognize the voiced and unvoiced stops (i.e., the consonants B, D, G, P, T, and K).
Second, when the weights converge, they usually converge to a local minimum of the error, which gives an erroneous solution. To avoid local minima, statistical methods such as Boltzmann training or Cauchy training may be applied. These methods randomly vary the weights of the neuron circuits and then evaluate the error between the desired and actual outputs. In most cases, weights that reduce the error are retained. However, weights that do not reduce the error are also occasionally kept, with a given probability, so that the search can escape a local minimum.
Although a statistical method can reach a global minimum, it is extremely inefficient. For example, its convergence has been reported to be 100 times slower than that of the back-propagation algorithm.
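The accept/reject rule of Boltzmann training can be sketched as a simulated-annealing loop. This is a hedged illustration rather than a reference implementation: the Gaussian perturbation, the logarithmic cooling schedule T0 / ln(1 + t), and the names `boltzmann_train`, `error_fn`, `t0`, and `step_size` are all assumptions. A worse set of weights is kept with probability exp(-delta / T), which lets the search climb out of a local minimum; the slow cooling is one reason such methods converge so slowly.

```python
import math
import random


def boltzmann_train(error_fn, weights, steps=1000, t0=1.0, step_size=0.1, seed=0):
    """Simulated-annealing style weight search (a sketch of Boltzmann training).

    error_fn(weights) -> float: error between desired and actual outputs (assumed
    to be supplied by the caller). Returns the best weights found and their error.
    """
    rng = random.Random(seed)
    current = list(weights)
    current_err = error_fn(current)
    best, best_err = list(current), current_err
    for step in range(1, steps + 1):
        # Logarithmic cooling: the temperature falls slowly as training proceeds
        temperature = t0 / math.log(1 + step)
        # Randomly vary every weight
        candidate = [wi + rng.gauss(0.0, step_size) for wi in current]
        cand_err = error_fn(candidate)
        delta = cand_err - current_err
        # Keep improvements; keep a worse candidate only with probability exp(-delta / T)
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_err = candidate, cand_err
            if current_err < best_err:
                best, best_err = list(current), current_err
    return best, best_err
```

Cauchy training differs mainly in drawing the perturbations from a heavier-tailed Cauchy distribution, which permits a faster cooling schedule.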
FIG. 2 shows a prior art perceptron neural network. Each of a plurality of inputs x_1, x_2, ..., x_n is coupled to a respective input node 11, 12, ..., 14 in the input layer. The output of each input node 11, 12, ..., 14 is distributed to each of a plurality of neuron circuits in the hidden layer, which neuron circuits include summing circuits 15, 16, ..., 18 and circuits 21, 22, ..., 24 for performing a non-linear function. For example, the output of input node 11 is distributed to each of summing circuits 15, 16, ..., 18.
The output of each summing circuit 15, 16, ..., 18 is fed into a respective binary threshold circuit 21, 22, ..., 24. The output of binary threshold circuit 21 is OUT_1; the output of binary threshold circuit 22 is OUT_2; and so forth.
The outputs OUT_1, ..., OUT_N are fed into an output neuron circuit 26. Output neuron circuit 26 comprises a summing circuit (not shown), which may be like summing circuit 15, and a non-linear function circuit (not shown), which may be like binary threshold circuit 21.
Developed in the 1950s, the perceptron neural network uses the "delta rule" training algorithm to compute the weights of the neurons. The delta rule uses the difference between the desired output and the actual output to adjust the neuron weights.
Because a single-layer perceptron network can separate its inputs only with a linear decision boundary, it cannot solve problems that are not linearly separable, such as Exclusive-Or, and its utility is therefore rather limited.
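Both the delta rule and the linear-separability limit can be seen with a single binary-threshold neuron. In this sketch (the names `train_delta_rule` and `predict` are hypothetical), the weights start at zero, a constant input of 1 supplies the bias, and each weight changes by lr * (desired - actual) * input whenever the neuron answers wrongly. With zero initial weights the learning rate only rescales the weights, so lr = 1 is assumed to keep the arithmetic exact. The rule learns AND, which is linearly separable, but no choice of weights reproduces Exclusive-Or.

```python
def step(net):
    """Binary threshold: 1 if the net is non-negative, else 0."""
    return 1 if net >= 0 else 0


def train_delta_rule(samples, n_inputs, lr=1.0, max_epochs=100):
    """Train one binary-threshold neuron with the delta rule.

    samples: list of (inputs, desired_output) pairs with outputs in {0, 1}.
    The last weight is a bias applied to a constant input of 1.
    """
    w = [0.0] * (n_inputs + 1)
    for _ in range(max_epochs):
        errors = 0
        for x, desired in samples:
            xb = list(x) + [1.0]
            actual = step(sum(wi * xi for wi, xi in zip(w, xb)))
            diff = desired - actual  # desired output minus actual output
            if diff:
                errors += 1
                w = [wi + lr * diff * xi for wi, xi in zip(w, xb)]
        if errors == 0:  # every example is classified correctly
            break
    return w


def predict(w, x):
    return step(sum(wi * xi for wi, xi in zip(w, list(x) + [1.0])))


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
```

Training on `AND` stops once all four cases are correct; training on `XOR` runs until `max_epochs` and still misclassifies at least one case.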
FIG. 3 shows a prior art Hopfield neural network. Each of a plurality of inputs x_1 through x_n is coupled to a respective neuron circuit appearing in what is identified in FIG. 3 as the "hidden layer". Each neuron circuit includes a summing circuit 35, 36, ..., 38, and the output of each summing circuit 35, 36, ..., 38 is input to a respective binary threshold circuit 41, 42, ..., 44. The output y_1, y_2, ..., y_n of each binary threshold circuit 41, 42, ..., 44 is fed back to the input of a respective input node 31, 32, ..., 34 in what is identified in FIG. 3 as the "input layer".
In all other respects, the operation of the Hopfield network is identical to that of the back-propagation neural network. The Hopfield network is characterized as a "recurrent" network because its output signals are fed back to its input layer. A recurrent network must address the problem of stability. Generally, the network can be stabilized by not feeding a neuron circuit's output signal back to that same neuron circuit.
The Hopfield network is especially effective in solving so-called "non-deterministic polynomial" (NP) problems, such as printed circuit board routing or the familiar traveling-salesman problem. However, the Hopfield network yields only a local minimum solution. Moreover, finding the specific energy function (e.g., a Liapunov energy function) that a Hopfield network requires for a given problem is not a trivial task.
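The stabilizing effect of omitting self-feedback, and the role of an energy function, can be illustrated with a small discrete Hopfield network. This sketch assumes bipolar (+1/-1) neuron states, zero thresholds, asynchronous updates, and Hebbian (outer-product) storage of patterns; the function names are hypothetical. With symmetric weights and a zero diagonal, each asynchronous update can only lower or keep the energy E = -1/2 * sum over i, j of w_ij * y_i * y_j, so the state settles at a fixed point, which may be only a local minimum of E.

```python
def hebbian_weights(patterns):
    """Symmetric weights from bipolar patterns; the diagonal stays zero (no self-feedback)."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j] / n
    return w


def energy(w, y):
    """Lyapunov-style energy of state y: -1/2 * sum_ij w_ij * y_i * y_j."""
    n = len(y)
    return -0.5 * sum(w[i][j] * y[i] * y[j] for i in range(n) for j in range(n))


def recall(w, y, max_sweeps=10):
    """Update one neuron at a time until no neuron changes (a stable state)."""
    y = list(y)
    n = len(y)
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            net = sum(w[i][j] * y[j] for j in range(n))
            new = 1 if net >= 0 else -1  # binary threshold
            if new != y[i]:
                y[i] = new
                changed = True
        if not changed:
            break
    return y
```

Starting from a stored pattern with one element flipped, `recall` restores the pattern and ends at a lower energy than the corrupted starting state.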
With respect to neural networks in general, a network's training algorithm is usually dictated by the structure of the neural network. With a conventional neural network architecture, the network is very difficult to train, and training usually requires many repetitions. For example, implementing Exclusive-Or logic often requires more than thirty iterations when a back-propagation algorithm is used.
Also, the training algorithm often converges to a local minimum rather than to the optimum solution, which would be a "best fit" or "global minimum" for a given set of examples.
In addition to the problems of inefficient, slow, and ineffective training algorithms discussed above, existing neural networks make it substantially difficult for prospective users to define a proper architecture for solving a given problem, because the number of layers, the number of neuron circuits per layer, and the interconnections between neuron circuits are usually determined by trial and error or by rule of thumb.
For instance, there is no clear way for determining how many hidden units (layers or neuron circuits) are required to tackle a problem. One way of determining this is to increase the number of hidden units gradually and to observe the network performance. This practice is continued until no more significant performance improvement is found. Needless to say, this is an extremely time-consuming process.
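The trial-and-error procedure just described amounts to a simple search loop. In this sketch, `evaluate(n)` is a hypothetical caller-supplied function that trains a network with n hidden units and returns a performance score (higher is better), and `min_gain` is an assumed stand-in for the otherwise unspecified notion of a "significant" improvement. Every call to `evaluate` hides a complete training run, which is why the process is so time-consuming.

```python
def grow_hidden_units(evaluate, start=1, max_units=64, min_gain=0.01):
    """Add hidden units one at a time until performance stops improving significantly.

    evaluate(n) -> float: score of a network trained with n hidden units
    (hypothetical; each call implies a full training run).
    Returns the chosen number of hidden units and its score.
    """
    best_n = start
    best_score = evaluate(start)
    for n in range(start + 1, max_units + 1):
        score = evaluate(n)
        if score - best_score < min_gain:  # no significant improvement: stop
            break
        best_n, best_score = n, score
    return best_n, best_score
```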
In summary, the drawbacks of existing neural networks (e.g., deficiencies of their training algorithms, ill-defined architecture, and local-minimum solutions) severely limit the acceptance and proliferation of neural networks in many potential areas of utility, such as manufacturing (statistical process control, routing), process control (adaptive control), CAD/CAM (optimization), robotics (coordinate transformation, adaptive control), image processing (smoothing, feature extraction), and signal processing (noise cancellation, echo suppression).
In addition, the complex circuitry of known neural networks severely limits their implementation in the form of semiconductor chips or computer software.
Thus, there is a significant need for a neural network that does not require repetitive training, that yields a global minimum for each given set of input vectors, and that has a straightforward architecture which is easy and inexpensive to implement.