The present invention relates to a learning machine for data processing with multi-input single-output circuits connected in a hierarchical structure.
An example of a conventional learning machine with multi-input single-output circuits connected in a hierarchical structure is disclosed in, e.g., D.E. Rumelhart et al., "Learning representations by back-propagating errors", Nature Vol. 323 No. 9 (1986). In such a conventional learning machine, each multi-input single-output circuit computes a weighted sum of its input signals and subjects the resulting sum to a conversion having a saturation characteristic. The output signal thus provided is expressed by

$$ y[j] = \mathrm{fnc}\Big( \sum_i w[i,j]\, y[i] \Big) \tag{1} $$

where y[j] is the output signal from the j-th multi-input single-output circuit, y[i] is the output signal from the i-th multi-input single-output circuit in the previous layer, and w[i,j] is the weight applied to the output signal of the i-th circuit in the previous layer when it is supplied to the j-th circuit. fnc( ) is a function having a saturation characteristic, for example the sigmoidal function

$$ \mathrm{fnc}(x) = \frac{1}{1 + e^{-x}} $$
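The forward computation of one layer, y[j] = fnc(sum over i of w[i,j] y[i]), can be sketched in a few lines. This is a minimal illustration, assuming fnc is the sigmoid 1/(1 + exp(-x)) and that a layer's weights are stored as a nested list w[i][j]; the helper names `circuit_output` and `layer_output` and the storage layout are hypothetical choices made for the example.

```python
import math
from typing import List, Sequence


def fnc(x: float) -> float:
    """Sigmoidal saturation function fnc(x) = 1 / (1 + exp(-x)).

    Split by sign so math.exp never overflows for large |x|.
    """
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def circuit_output(y_prev: Sequence[float], w: Sequence[Sequence[float]], j: int) -> float:
    """Output y[j] of the j-th multi-input single-output circuit (hypothetical helper).

    y_prev[i] is the output of the i-th circuit in the previous layer and
    w[i][j] is the weight w[i,j] applied to it: y[j] = fnc(sum_i w[i,j] * y[i]).
    """
    weighted_sum = sum(w[i][j] * y_prev[i] for i in range(len(y_prev)))
    return fnc(weighted_sum)


def layer_output(y_prev: Sequence[float], w: Sequence[Sequence[float]]) -> List[float]:
    """Outputs of all circuits in one layer; w has one row per input, one column per circuit."""
    return [circuit_output(y_prev, w, j) for j in range(len(w[0]))]


# Two inputs feeding two circuits: the first entry is fnc(0) = 0.5, the second fnc(1), about 0.731.
print(layer_output([1.0, 0.0], [[0.0, 1.0], [2.0, 3.0]]))
```

Because fnc saturates at 0 and 1, each circuit's output stays bounded no matter how large the weighted sum becomes.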
A learning machine of this kind is usually built from multi-input single-output circuits connected in a hierarchical structure, and it learns to provide a desired output signal (hereinafter referred to as a supervising signal) in response to input signals. During learning, an error E is obtained from the supervising signal and the actual output signal in accordance with Equation (2):

$$ E(w) = \frac{1}{2} \sum_p \sum_j \big( t_p[j] - y_p[j] \big)^2 \tag{2} $$

where y_p[j] is the output signal from the j-th multi-input single-output circuit in the output layer for the p-th input pattern, t_p[j] is the supervising signal for y_p[j], Σ_p denotes a sum over all input patterns, Σ_j denotes a sum over all output signals in the output layer, and w is a vector having the weights w[i,j] as its components (hereinafter referred to as the weight vector).
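The error can be computed directly from these definitions. The sketch below assumes the conventional factor 1/2 in front of the double sum (it scales E but does not move its minimum); the function name `error` and the layout outputs[p][j] for y_p[j] and targets[p][j] for t_p[j] are hypothetical.

```python
from typing import Sequence


def error(outputs: Sequence[Sequence[float]], targets: Sequence[Sequence[float]]) -> float:
    """Error E = 1/2 * sum_p sum_j (t_p[j] - y_p[j])**2 (factor 1/2 assumed).

    outputs[p][j] is the actual output y_p[j] of the j-th output circuit for
    the p-th input pattern; targets[p][j] is the supervising signal t_p[j].
    """
    if len(outputs) != len(targets):
        raise ValueError("need one supervising vector per input pattern")
    total = 0.0
    for y_p, t_p in zip(outputs, targets):
        if len(y_p) != len(t_p):
            raise ValueError("output and supervising vectors differ in length")
        # Squared differences over all output circuits j for this pattern p.
        total += sum((t - y) ** 2 for y, t in zip(y_p, t_p))
    return 0.5 * total


# Two patterns with two output circuits each; only the first pattern contributes.
print(error([[0.5, 0.5], [1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]]))  # 0.25
```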
As Equation (2) shows, the error E is a sum of squared differences between the supervising signals and the actual output signals, and is therefore a function of the weight vector w. The purpose of learning is to change the weight vector so as to minimize this difference, i.e. the error E. The amount by which the weight vector is changed is determined by

$$ \Delta w = -\varepsilon \frac{\partial E}{\partial w} + \alpha\, \Delta w' \tag{3} $$

where ε is a positive constant referred to as the learning rate, α is a positive constant referred to as the accelerating parameter, ∂E/∂w is the vector whose components are the partial derivatives ∂E/∂w[i,j] of the error in Equation (2) with respect to the weights w[i,j] (its negative, -∂E/∂w, is the steepest descent direction), and Δw' is the vector of weight changes made in the previous learning iteration. Such a learning algorithm is generally referred to as the error back-propagation method.
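The weight-change rule above (ε times the steepest descent direction plus α times the previous weight change) can be sketched as batch error back-propagation in pure Python. This is an illustration under assumptions not stated in the text: one hidden layer, the sigmoid as fnc, E with the factor 1/2, gradients summed over all patterns before each update, and no separate bias terms (a constant input of 1 stands in for a bias). The names `forward`, `gradient`, `update` and `train`, and the parameter values, are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def fnc(x: float) -> float:
    """Sigmoid 1 / (1 + exp(-x)), split by sign to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def forward(x: Sequence[float], w1: Matrix, w2: Matrix) -> Tuple[List[float], List[float]]:
    """Hidden outputs h and network outputs y; w1[i][j] and w2[j][k] play the role of w[i,j]."""
    h = [fnc(sum(w1[i][j] * x[i] for i in range(len(x)))) for j in range(len(w1[0]))]
    y = [fnc(sum(w2[j][k] * h[j] for j in range(len(h)))) for k in range(len(w2[0]))]
    return h, y


def error(patterns, targets, w1: Matrix, w2: Matrix) -> float:
    """E(w) = 1/2 * sum_p sum_k (t_p[k] - y_p[k])**2."""
    total = 0.0
    for x, t in zip(patterns, targets):
        _, y = forward(x, w1, w2)
        total += sum((tk - yk) ** 2 for tk, yk in zip(t, y))
    return 0.5 * total


def gradient(patterns, targets, w1: Matrix, w2: Matrix) -> Tuple[Matrix, Matrix]:
    """dE/dw for both weight layers, accumulated over all patterns by back-propagation."""
    g1 = [[0.0] * len(w1[0]) for _ in w1]
    g2 = [[0.0] * len(w2[0]) for _ in w2]
    for x, t in zip(patterns, targets):
        h, y = forward(x, w1, w2)
        # Output deltas dE/ds_k = (y_k - t_k) * fnc'(s_k), using fnc'(s) = y * (1 - y).
        d_out = [(y[k] - t[k]) * y[k] * (1.0 - y[k]) for k in range(len(y))]
        # Hidden deltas: output deltas propagated backwards through w2.
        d_hid = [sum(w2[j][k] * d_out[k] for k in range(len(y))) * h[j] * (1.0 - h[j])
                 for j in range(len(h))]
        for j in range(len(h)):
            for k in range(len(y)):
                g2[j][k] += d_out[k] * h[j]
        for i in range(len(x)):
            for j in range(len(h)):
                g1[i][j] += d_hid[j] * x[i]
    return g1, g2


def update(w: Matrix, g: Matrix, dw_prev: Matrix, eps: float, alpha: float) -> Matrix:
    """dw = -eps * dE/dw + alpha * dw_prev; w is changed in place and dw is returned."""
    dw = [[-eps * g[i][j] + alpha * dw_prev[i][j] for j in range(len(w[0]))] for i in range(len(w))]
    for i in range(len(w)):
        for j in range(len(w[0])):
            w[i][j] += dw[i][j]
    return dw


def train(patterns, targets, n_hidden: int, eps: float = 0.5, alpha: float = 0.9,
          epochs: int = 1000, seed: int = 0) -> Tuple[Matrix, Matrix, List[float]]:
    """Batch training with fixed eps and alpha; returns the weights and E after each update."""
    rng = random.Random(seed)
    n_in, n_out = len(patterns[0]), len(targets[0])
    w1 = [[rng.uniform(-1.0, 1.0) for _ in range(n_hidden)] for _ in range(n_in)]
    w2 = [[rng.uniform(-1.0, 1.0) for _ in range(n_out)] for _ in range(n_hidden)]
    dw1 = [[0.0] * n_hidden for _ in range(n_in)]
    dw2 = [[0.0] * n_out for _ in range(n_hidden)]
    history = []
    for _ in range(epochs):
        g1, g2 = gradient(patterns, targets, w1, w2)  # both gradients use the same weights
        dw1 = update(w1, g1, dw1, eps, alpha)
        dw2 = update(w2, g2, dw2, eps, alpha)
        history.append(error(patterns, targets, w1, w2))
    return w1, w2, history


# AND function; the third input is a constant 1 standing in for a bias.
pats = [[0.0, 0.0, 1.0], [0.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
tgts = [[0.0], [0.0], [0.0], [1.0]]
_, _, hist = train(pats, tgts, n_hidden=2, eps=0.5, alpha=0.9, epochs=1000)
print(hist[0], hist[-1])
```

Note that eps and alpha are passed in once and never adjusted during training, which mirrors the fixed learning rate and accelerating parameter of the conventional machine.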
In the learning of the conventional learning machine with multi-input single-output circuits connected in a hierarchical structure, the learning rate ε and the accelerating parameter α are fixed; they are chosen by experience or by trial and error and are therefore not necessarily optimal. This lengthens the time required for learning. Moreover, the back-propagation method, which is based on the steepest descent method, is not always the optimal way of minimizing the error, which is the purpose of learning. Further, the learning may fall into an ineffective state in which further learning no longer reduces the error.