1. Field of the Invention
The present invention relates to learning machines in which a plurality of multi-input/single-output signal processors of a data processor are connected in a hierarchical structure.
2. Description of the Related Art
A conventional learning machine is disclosed, for example, in D. E. Rumelhart, G. E. Hinton and R. J. Williams, "Learning Representations by Back-Propagating Errors", NATURE, Vol. 323, pp. 533-536, Oct. 9, 1986. As shown in FIG. 9, this machine comprises an output signal computing unit and a weight coefficient updating unit. The output signal computing unit has a hierarchical structure in which a plurality of multi-input/single-output signal processing units 600 are connected to form a network with no mutual coupling between processing units 600 of the same hierarchy, so that signals are propagated only in the direction of a higher hierarchy. Each multi-input/single-output signal processing unit 600 forms its output value by applying a threshold function to the total sum of the products of the outputs from the lower-hierarchy multi-input/single-output signal processing units 600 connected to it and the weight coefficients indicative of the respective degrees of connection, and then transmits the output value to a higher-hierarchy multi-input/single-output signal processing unit 600. In the weight coefficient updating unit, a teacher signal generator 602 responds to signals inputted to input units 601 of the output signal computing unit to generate a teacher signal $t_k$ as a desired output signal for the inputted signals.
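The signal flow described above can be sketched as follows. This is a minimal illustration only: the sigmoid is assumed as the threshold function (the text does not fix a particular one), and the names `unit_output`, `forward`, and the example weights are hypothetical.

```python
import math


def unit_output(lower_outputs, weights):
    """Output of one multi-input/single-output unit.

    Sums the products of lower-hierarchy outputs and weight coefficients,
    then applies a threshold function (a sigmoid is assumed here).
    """
    total = sum(w * o for w, o in zip(weights, lower_outputs))
    return 1.0 / (1.0 + math.exp(-total))


def forward(inputs, layers):
    """Propagate signals toward the higher hierarchy only.

    `layers` is a list of hierarchies; each hierarchy is a list of weight
    vectors, one per unit. Units in the same hierarchy are not coupled.
    """
    signal = list(inputs)
    for layer in layers:
        signal = [unit_output(signal, weights) for weights in layer]
    return signal


# Hypothetical two-hierarchy network: 2 inputs -> 2 units -> 1 unit.
layers = [
    [[0.5, -0.5], [1.0, 1.0]],  # lower hierarchy
    [[1.0, -1.0]],              # highest hierarchy
]
outputs = forward([1.0, 0.0], layers)
```

Each unit reads only the outputs of the hierarchy below it, which mirrors the absence of mutual coupling within a hierarchy.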
An error signal computing unit 603 computes a square error E given by ##EQU1## using the difference between the teacher signal $t_k$ and an actual output signal $o_k$ (the output value of the k-th multi-input/single-output signal processing unit of the highest hierarchy in the output signal computing unit), and evaluates the performance of the network under the current state of connection (represented by the magnitude of the weight coefficients) in accordance with the resulting value of the square error. A weight change quantity computing unit 604 calculates weight change quantities $\Delta w_{ij}$ for the weight coefficients of the output signal computing unit on the basis of the calculated error E by using the following formula:
$$\Delta w_{ij} = -\varepsilon \cdot \frac{\partial E}{\partial w_{ij}}$$
where $\varepsilon$ is a positive constant called a learning rate. By repeating the updating of the weight coefficients in this manner, the error is gradually reduced to a sufficiently small value, at which point learning is ended on the ground that the output signal has become sufficiently close to the desired value.
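The update rule above can be illustrated with a small sketch for a single hierarchy of units. Several things here are assumptions rather than part of the text: the square error is taken as one half of the sum of squared differences, the threshold function is a sigmoid, and all function names and numeric values are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def outputs_of(weights, inputs):
    """Outputs o_k of one hierarchy of sigmoid units (one weight row per unit)."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in weights]


def square_error(targets, outputs):
    # Assumed form: E = 1/2 * sum_k (t_k - o_k)^2
    return 0.5 * sum((t - o) ** 2 for t, o in zip(targets, outputs))


def update_weights(weights, inputs, targets, epsilon):
    """Apply delta_w = -epsilon * dE/dw once to every weight coefficient."""
    outputs = outputs_of(weights, inputs)
    new_weights = []
    for row, t, o in zip(weights, targets, outputs):
        # For a sigmoid unit: dE/dw_kj = -(t_k - o_k) * o_k * (1 - o_k) * x_j
        grads = [-(t - o) * o * (1.0 - o) * x for x in inputs]
        new_weights.append([w - epsilon * g for w, g in zip(row, grads)])
    return new_weights


# Hypothetical training data and initial weights.
inputs = [1.0, 0.5]
targets = [0.9, 0.1]
weights = [[0.1, -0.2], [0.3, 0.2]]

initial_error = square_error(targets, outputs_of(weights, inputs))
for _ in range(1000):
    weights = update_weights(weights, inputs, targets, epsilon=0.5)
final_error = square_error(targets, outputs_of(weights, inputs))
```

With these values the error after the loop is smaller than the initial error. Every weight is updated on every step, whatever the size of the error of the unit it belongs to.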
With such a conventional learning machine, learning is performed such that the total sum of the square errors E is minimized. Consequently, even when a multi-input/single-output signal processing unit 600 involving a large error remains, the weights of all the multi-input/single-output signal processing units are changed, including those of units whose errors are already sufficiently small, and the weights are changed whenever the total sum of the square errors E is thereby reduced, regardless of any other conditions. Minimizing the total sum of the square errors therefore does not necessarily change the weights so as to reduce the error of the processing unit involving a large error, and there are cases in which only the error of a certain multi-input/single-output signal processing unit in the highest hierarchy remains very large without converging. Thus, in the prior art learning machine, learning requires much time.
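The point can be made concrete with a small arithmetic sketch. The output values below are hypothetical and chosen only for illustration; they are not measurements from any machine.

```python
def unit_square_errors(targets, outputs):
    """Square error of each highest-hierarchy unit, kept separate per unit."""
    return [(t - o) ** 2 for t, o in zip(targets, outputs)]


# Hypothetical outputs of three highest-hierarchy units before and after
# a number of weight updates. The third unit does not improve at all.
targets = [1.0, 1.0, 1.0]
outputs_before = [0.70, 0.70, 0.10]
outputs_after = [0.95, 0.95, 0.10]

errors_before = unit_square_errors(targets, outputs_before)
errors_after = unit_square_errors(targets, outputs_after)

total_before = sum(errors_before)  # total sum of the square errors
total_after = sum(errors_after)
worst_before = max(errors_before)  # error of the worst single unit
worst_after = max(errors_after)
```

Here the total sum of the square errors falls, so a criterion based on the total alone reports progress, while the error of the third unit is unchanged and still large.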
A further problem is that, whenever the square error E is reduced, the weight coefficients are updated even for multi-input/single-output signal processing units whose errors have already been sufficiently reduced, so that the efficiency of learning is degraded and the time required for learning becomes long.