The present invention relates to a neural network learning apparatus and a learning method, and in particular, relates to a neural network learning apparatus and learning method which can be applied to voice recognition, voice synthesis, character recognition, robot control and stock prediction and the like.
In recent years, there has been much active research into the use of neural networks for voice and image recognition, time-series prediction, and the like, and various products applying neural networks have appeared. One example is disclosed in WATANABE and YONEYAMA, "Method of Ultrasonic Three Dimensional Object Cognition", treatise US90-29 (1990). Favorable results were obtained for the example of a neural network disclosed in this literature.
The method of learning used in such neural networks is known as the back-propagation error learning method (hereinafter termed the BP method), and the following discussion concerns a neural network learning method using the BP method.
First, the neural network learning problem is formulated as equations. The learning problem for a neural network of the three-layer perceptron type is the problem of approximating a given function f(x), called the teaching pattern, by the linear sum

    f(x) = Σ_{i=1}^{N} a_i σ(w_i x + θ_i) + ε(x)    (1)

where

    a_i: synapse weight input to the output layer from the hidden layer (i = 1, 2, . . . , N, where N is the number of neurons in the hidden layer)

    w_i: synapse weight input to the hidden layer from the input layer

    θ_i: bias of the hidden layer

    ε(x): error

and σ(x) is a monotone increasing function which is determined beforehand.
More specifically, the learning problem is to determine the synapse weights a_i and w_i and the biases θ_i, with the error function ε(x) as the required reference, so that the error with respect to the teaching pattern is minimized. In equation (1), the term σ(w_i x + θ_i) indicates the output value of the i-th unit of the hidden layer. The function σ(x) may, for example, be the sigmoid function 1/(1 + exp(−x)).
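As a concrete illustration (not part of the disclosure itself), the network output of equation (1) with the sigmoid activation can be sketched in Python; the parameter values below are arbitrary examples:

```python
import math

def sigmoid(x):
    # Monotone increasing activation: sigma(x) = 1 / (1 + exp(-x))
    return 1.0 / (1.0 + math.exp(-x))

def network_output(x, a, w, theta):
    # Three-layer perceptron approximation of the teaching pattern:
    #   f(x) ~ sum_i a_i * sigma(w_i * x + theta_i)
    return sum(a_i * sigmoid(w_i * x + th_i)
               for a_i, w_i, th_i in zip(a, w, theta))

# Arbitrary example parameters for a hidden layer of N = 3 units
a = [0.5, -1.2, 0.8]       # hidden-to-output synapse weights a_i
w = [1.0, 2.0, -0.5]       # input-to-hidden synapse weights w_i
theta = [0.0, -1.0, 0.5]   # hidden-layer biases theta_i

y = network_output(0.3, a, w, theta)
```

Here the error ε(x) of equation (1) is simply the difference between the teaching pattern f(x) and this computed output.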
The synapse weights a_i and w_i and the bias θ_i are determined by the BP method: the error function E of equation (2) is first defined, and this error function E is then minimized by the gradient descent method.

    E = Σ_p { f(x_p) − Σ_{i=1}^{N} a_i σ(w_i x_p + θ_i) }²    (2)

The parameters are corrected by the gradient descent method using the error function E as follows:

    Δa_i = −η ∂E/∂a_i

    Δw_i = −η ∂E/∂w_i

    Δθ_i = −η ∂E/∂θ_i

where η is the learning rate.
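The gradient descent updates above can be sketched as a toy Python implementation (an illustrative sketch under the squared-error assumption of equation (2), not the apparatus of the invention; the learning rate η and training points are arbitrary assumptions):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def total_error(xs, fs, a, w, theta):
    # E = sum_p ( f(x_p) - sum_i a_i * sigma(w_i x_p + theta_i) )^2
    return sum(
        (f - sum(a_i * sigmoid(w_i * x + th_i)
                 for a_i, w_i, th_i in zip(a, w, theta))) ** 2
        for x, f in zip(xs, fs))

def train_step(xs, fs, a, w, theta, eta=0.05):
    # One gradient descent step: Delta(param) = -eta * dE/d(param)
    da = [0.0] * len(a)
    dw = [0.0] * len(w)
    dth = [0.0] * len(theta)
    for x, f in zip(xs, fs):
        h = [sigmoid(w_i * x + th_i) for w_i, th_i in zip(w, theta)]
        y = sum(a_i * h_i for a_i, h_i in zip(a, h))
        err = y - f
        for i in range(len(a)):
            s = h[i] * (1.0 - h[i])          # sigma'(z) = sigma(z)(1 - sigma(z))
            da[i] += 2.0 * err * h[i]        # dE/da_i
            dw[i] += 2.0 * err * a[i] * s * x  # dE/dw_i
            dth[i] += 2.0 * err * a[i] * s   # dE/dtheta_i
    a = [a_i - eta * g for a_i, g in zip(a, da)]
    w = [w_i - eta * g for w_i, g in zip(w, dw)]
    theta = [t_i - eta * g for t_i, g in zip(theta, dth)]
    return a, w, theta
```

Repeated application of `train_step` drives the error E downward along its gradient, which is exactly the repetition the BP method performs.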
The BP method repeatedly corrects the parameters of each unit so as to reduce the error of the output of the neural network with respect to its input; the correction equations are obtained by applying the gradient descent method to the square error (the evaluation function) of the output from the output layer.
With the BP method described above, learning is possible even in the hidden layers of a multi-layered neural network of three or more layers, for which learning had proved difficult to achieve without the BP method, and so the learning performance of neural networks can be increased.
However, because the BP method learns by the gradient descent method starting from random initial values, learning requires a large amount of time, and learning may fail if it falls into a local minimum. Furthermore, with the BP method it cannot be determined what characteristic of the data is being learned, and so it is not possible to interpret the learned results. Still further, the role of the hidden layer in the neural network is not clear, and so the number of units required in the hidden layer cannot be determined from the results of simulation. Not only this, learning does not proceed when there are too few units, and there is excessive learning when there are too many. Also, predictions cannot be made, since the output for input combinations that were not given as teaching data differs with each learning run.
Because of these problems, there have been many attempts to correct the deficiencies of the BP method. However, none of these methods has been able to clarify the mathematical nature of a neural network, and so no effective improvement has been made. The reason it has not been possible to clarify the mathematical nature of a neural network is the non-linear characteristics of neural networks.