The present invention relates to learning methods for multi-level neural networks which are applied to large scale logic circuits, pattern recognizers, associative memories, code converters and image processors, providing a desired output signal stably and quickly with an extremely high generalization ability by an iterative learning procedure consisting of a training process and a test process.
One such neural network is a multi-layered neural network, in which a back-propagation algorithm has been widely used as a supervised training method. In a prior training process, as shown in FIG. 1, a set of input signals related to teacher signals is divided into a set of training input signals, stored in a training signal memory 10 with a set of corresponding teacher signals, and a set of test input signals, stored in a test signal memory 11 with a set of corresponding teacher signals.
In the training process, the set of training input signals and the set of corresponding teacher signals are provided to a terminal 2 and a terminal 3 through an input selector 12. The multi-layered neural network 1 starts to train after setting initial conditions under the control of a mode controller 9, and outputs output unit signals from the output units of an output layer.
In an error signal generator 4, a difference signal as an error signal is derived by subtracting the output unit signal replying to the training input signal from a teacher signal T (teacher signal element: T.sub.1, T.sub.2, . . . , T.sub.M) through a subtractor. The difference signal is then fed into a weighting factor controller 5. The weighting factors between adjacent layers are then updated by using the output unit signal and the difference signal to minimize the power of the difference signal in the weighting factor controller 5, and are set again in the multi-layered neural network 1.
A binary output unit signal is obtained from the output unit signal through a binary threshold means 6 as an output signal at 7. By detecting the coincidence between the teacher signal and the binary output unit signal in an error pattern detector 8, it is determined whether the multi-layered neural network 1 has achieved convergence or not. These procedures are repeated in the training process until convergence in binary space is achieved.
In the training process, updating of weighting factors is repeatedly performed for the set of training input signals to achieve convergence of the multi-layer neural network 1. A minimum power of the difference signal can provide complete convergence of the neural network in binary space, resulting in coincidence between the set of binary output unit signals replying to the set of training input signals and the set of teacher signals.
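The training loop above can be sketched as follows. This is a minimal illustration only, using a hypothetical single-layer network with a sigmoidal transfer function and made-up toy data (the network 1 of FIG. 1 is multi-layered); it shows the cycle of computing output unit signals, forming the difference signal, checking coincidence of the binary output unit signals with the teacher signals, and updating the weighting factors to minimize the power of the difference signal.

```python
import numpy as np

# Toy, hypothetical setup: 8 training input signals of 3 elements,
# 2 output units, binary teacher signals generated so that the
# problem is solvable by a single layer.
rng = np.random.default_rng(0)
X = rng.standard_normal((8, 3))              # set of training input signals
true_W = rng.standard_normal((3, 2))
T = (X @ true_W >= 0.0) * 1.0                # set of binary teacher signals
W = np.zeros((3, 2))                         # initial weighting factors
lr = 0.5                                     # learning rate (illustrative)

power0 = None
for cycle in range(20000):
    Y = 1.0 / (1.0 + np.exp(-(X @ W)))       # output unit signals
    E = T - Y                                # difference (error) signal
    if power0 is None:
        power0 = np.sum(E ** 2)              # initial power of error signal
    if np.array_equal((Y >= 0.5) * 1.0, T):  # binary threshold + coincidence
        break                                # convergence in binary space
    # gradient step reducing the power of the difference signal
    W += lr * X.T @ (E * Y * (1.0 - Y))
power = np.sum(E ** 2)
```

The update `X.T @ (E * Y * (1.0 - Y))` is the gradient of half the squared error through the sigmoid, i.e. the single-layer case of back-propagation.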
The updating speed of the weighting factors becomes gradually slower due to the small gradients of the sigmoidal transfer function for output values very close to 0 and 1 in the weighting factor controller 5. Once a state minimizing the power of the difference signal at a local minimum has been captured, the global minimum cannot be obtained even if the number of training cycles is increased, or a significantly large number of training cycles is necessary to obtain it. Namely, the multi-layered neural network 1 is frequently captured in a very tenacious state trapped in local minima. Dependency on the initial conditions of the weighting factors is also one of the problems in achieving quick convergence.
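The gradient stagnation described above is easy to verify numerically: expressed in terms of the sigmoid output y, the derivative is y(1-y), which vanishes as y approaches 0 or 1. A small sketch:

```python
def sigmoid(x):
    # Sigmoidal transfer function.
    import math
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(y):
    # Derivative of the sigmoid expressed via its output y = f(x):
    # f'(x) = y * (1 - y); it shrinks toward zero as y nears 0 or 1,
    # which slows the weighting-factor updates accordingly.
    return y * (1.0 - y)

for y in (0.5, 0.9, 0.99, 0.999):
    print(y, sigmoid_grad(y))
```

The gradient at y = 0.5 is 0.25, but at y = 0.999 it is below 0.001, so the effective update step collapses by more than two orders of magnitude near saturation.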
A second prior training process updates the weighting factors by using an error signal generated by an error perturbation method in the error signal generator 4. In the error perturbation method, for a correct binary output unit signal, when the absolute value of the difference signal derived by subtracting the output unit signal from the corresponding teacher signal is equal to or smaller than a given threshold D.sub.1, the error signal has a polarity opposite to that of the difference signal and an amplitude equal to the difference between D.sub.1 and the absolute value of the difference signal; when the absolute value is larger than D.sub.1, the error signal has the same polarity as the difference signal and an amplitude smaller by D.sub.1 than that of the difference signal.
On the other hand, for an erroneous binary output unit signal, the weighting factor is also updated by using an error signal which has the same polarity as that of the difference signal and an amplitude smaller by a threshold D.sub.2 than that of the difference signal.
The difference signal on the m-th output unit is given by T.sub.m -Y.sub.m, where Y.sub.m is the output unit signal on the m-th output unit, T.sub.m is the binary teacher signal (0 or 1) on the m-th output unit, 1.ltoreq.m.ltoreq.M, and M is the number of output units. The error signal E.sub.m on the m-th output unit in the error perturbation is given by the following equations.
If the binary output unit signal on the m-th output unit is correct because it coincides with the binary teacher signal T.sub.m, E.sub.m is given by (1), EQU E.sub.m =T.sub.m -Y.sub.m -D.sub.1 *sgn(T.sub.m -Y.sub.m), (1)
and if the binary output unit signal on the m-th output unit is wrong because it differs from the binary teacher signal T.sub.m, then EQU E.sub.m =T.sub.m -Y.sub.m -D.sub.2 *sgn(T.sub.m -Y.sub.m), (2)
where sgn(x)=1 for x.gtoreq.0, sgn(x)=-1 for x<0, and 0.5.gtoreq.D.sub.1, D.sub.2 .gtoreq.0.
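Equations (1) and (2) can be sketched directly in code. The D.sub.1 and D.sub.2 default values below are illustrative only; the text requires merely that both lie between 0 and 0.5.

```python
import numpy as np

def sgn(x):
    # Sign function used in the error perturbation: +1 for x >= 0, else -1.
    return np.where(x >= 0.0, 1.0, -1.0)

def perturbed_error(t, y, d1=0.1, d2=0.05):
    """Error signal E_m of equations (1) and (2).

    t  : binary teacher signals (0 or 1), shape (M,)
    y  : output unit signals in [0, 1], shape (M,)
    d1 : threshold applied when the binary output unit signal is correct
    d2 : threshold applied when it is wrong
    """
    diff = t - y                              # difference signal T_m - Y_m
    correct = (y >= 0.5) == (t >= 0.5)        # binary output matches teacher
    d = np.where(correct, d1, d2)             # pick D_1 or D_2 per unit
    return diff - d * sgn(diff)
```

For example, with T.sub.m =1 and Y.sub.m =0.95 (correct, difference 0.05 below D.sub.1 =0.1), the error signal is -0.05: opposite polarity to the difference signal, as stated above.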
In the training process, updating of weighting factors is repeatedly conducted to achieve convergence for the set of training input signals. By detecting the coincidence between the set of teacher signals and the set of binary output unit signals in the error pattern detector 8, it is determined whether the multi-layered neural network 1 has achieved convergence in binary space or not. These procedures are repeated in the training process until the coincidence is achieved. Once the coincidence has been detected, the coincidence signal is fed into the mode controller 9 to terminate the training process.
The first prior training process, which is equivalent to the second prior training process with D.sub.1 =D.sub.2 =0, has the defect that complete convergence in binary space is very difficult to achieve quickly, because the network is easily captured in tenacious states trapped in local minima and can hardly slip away from them, as aforementioned.
Only the second prior training process can converge with the correct set of binary output unit signals for the set of training input signals without being trapped in the tenacious state with local minima. However, a large number of training cycles is sometimes required to achieve convergence for a large number of training input signals. Once the number of binary errors in the binary output unit signals has quickly become small, further reduction of the number of binary errors becomes very slow, and the convergence speed eventually stagnates even if the weighting factors are updated by the error perturbation method to evade the tenacious state trapped in local minima. This is because error perturbation with large values of D.sub.1 and D.sub.2 disturbs the optimized weighting factors, since each perturbation reduces by D.sub.1 the margin that provides the correct binary output unit signal.
In particular, a general design method for achieving reliable and very quick convergence has not yet been established for either a three-layered or multi-layered neural network having a large number of input nodes and a small number of hidden nodes.
In the test process, the mode controller 9 controls the input selector 12 to switch from the training process to the test process to evaluate the generalization ability. The set of test input signals is fed to the terminal 2 and the corresponding teacher signals are fed to the terminal 3 through the input selector 12. In the error pattern detector 8, the binary output unit signal is compared with the corresponding teacher signal to detect an erroneous binary output unit signal.
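The test-process evaluation above amounts to counting test patterns whose binary output unit signals all coincide with the corresponding teacher signals. A minimal sketch, with a hypothetical helper name:

```python
import numpy as np

def generalization_rate(Y_test, T_test):
    # Fraction of test patterns whose binary output unit signals all
    # coincide with the corresponding teacher signals, i.e. the check
    # performed by the error pattern detector 8 on the test set.
    binary = (np.asarray(Y_test) >= 0.5) * 1.0   # binary threshold means 6
    pattern_ok = np.all(binary == np.asarray(T_test), axis=1)
    return pattern_ok.mean()
```

For instance, output unit signals [[0.9, 0.2], [0.4, 0.8]] against teacher signals [[1, 0], [1, 1]] give a rate of 0.5, since only the first pattern is entirely correct; 100% generalization corresponds to a rate of 1.0.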
If the multi-layered neural network 1 does not converge in binary space due to trapping in the local minima, a very low generalization property is obtained, because a large number of erroneous binary output unit signals is produced for the set of test input signals. Even if the neural network with the prior error perturbation method converges in binary space, an extremely high generalization property with 100% correctness, in which the set of binary output unit signals is entirely correct for the set of test input signals, is not necessarily achieved. A large number of erroneous binary output signals is produced for a large set of test input signals due to the reduction by D.sub.1 of the margin providing correct binary output unit signals. Another reason is the non-optimum allocation of the set of input signals to the sets of training and test input signals.
A general learning method of the binary multi-layered neural network, which can achieve both complete convergence in binary space for the set of training input signals and also achieve 100% generalization for the set of test input signals, has not been established.
As aforementioned, earlier learning procedures for a multi-layered neural network have the disadvantages that either a huge number of training cycles for updating the weighting factors is required to achieve convergence, due to a stagnation of convergence speed caused by the error perturbation, or convergence cannot be obtained even by continuing the training process, due to trapping in tenacious states with local minima; in either case, an extremely high generalization ability cannot necessarily be achieved.
Particularly, no easy and reliable learning method has been established for a multi-level neural network with a large number of input units, a small number of output units and a teacher pattern having a distributed representation, for which it is difficult to achieve both complete convergence in binary space with a very small number of training cycles and very high generalization. The generalization ability generally degrades due to a surplus number of hidden units and over-learning, whereas the use of a large number of hidden units can make the neural network converge stably. These approaches also inevitably require a huge amount of computation and very large hardware complexity.
These shortcomings make it very difficult to realize multi-level neural networks for real-time applications, which require a very high generalization ability for unknown input signals and retraining of the neural network on newly obtained input signals that produce erroneous output signals.