This invention relates to a learning machine capable of learning by means of a hierarchical structure without mutual connections within levels thereof, and having plural multiple-input single-output signal processors connected in a network structure in such a manner that signals propagate only to a higher level.
A conventional learning machine has been described by D. E. Rumelhart, G. E. Hinton, and R. J. Williams in their article "Learning Representations by Back-Propagating Errors" in the Oct. 9, 1986 issue of Nature (vol. 323, pp. 533-536).
FIG. 8 shows the structure of a conventional learning machine. As shown in the block diagram of FIG. 8, this conventional learning machine comprises an output signal calculator 1 and a weight coefficient renewer 2 which renews the values of the weight coefficients of the output signal calculator 1 based on the output signal obtained therefrom. The output signal calculator 1 is a multiple-stage circuit network as shown in FIG. 9 comprising plural multiple-input single-output signal processors 3 and plural input means 4. A specific example of the multiple-input single-output signal processor 3 used in this output signal calculator 1 is shown in FIG. 10. As shown in FIG. 10, each multiple-input single-output signal processor 3 comprises plural inputs 5, a memory 6 to store the weight coefficients which weight the plural inputs from the inputs 5, plural multipliers 7 which multiply the weight coefficients from the memory 6 by the inputs from the inputs 5, an adder 8 which adds the outputs from each of the multipliers 7, and a threshold value processor 9 which limits the output from the adder 8 to a value within a predetermined range. The input/output characteristics of the threshold value processor 9 are shown in FIG. 11.
For example, the input/output characteristics of the threshold value processor 9 which limits the output to a value within the range of (0, 1) can be numerically expressed as

$$f(I) = \frac{1}{1 + \exp(-I + \theta)}.$$
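The threshold characteristic described above can be sketched in Python as follows. This is only an illustrative sketch: the function name `threshold` and the default value of `theta` are assumptions introduced here and are not taken from the cited article.

```python
import math


def threshold(I, theta=0.0):
    """Evaluate f(I) = 1 / (1 + exp(-I + theta)).

    The result always lies in the open interval (0, 1).
    theta=0.0 is an assumed default used only for illustration.
    """
    return 1.0 / (1.0 + math.exp(-I + theta))


print(threshold(0.0))  # 0.5, since I equals theta
```

The output rises monotonically with I and equals 0.5 when I equals theta.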
It is to be noted that the weight coefficient renewer 2 shown in FIG. 8 comprises a teacher signal generator 10, an error signal calculator 11, a steepest descent direction determination means 12, an amount of weight change calculator 13, and a weight modifier 14.
A conventional learning machine constructed as described above operates as described below.
When an input signal is input to the input section 4 of the output signal calculator 1, each of the multiple-input single-output signal processors 3 multiplies the output of each of the lower level multiple-input single-output signal processors 3 connected thereto by a weight coefficient, which represents the significance of the connection and is stored in the memory 6, by means of the multipliers 7 and then obtains the sum of the outputs from the multipliers 7 by means of the adder 8; this sum is converted by the threshold value processor 9 and the resulting value is output to the multiple-input single-output signal processors 3 one level higher. In other words, the multiple-input single-output signal processor 3 shown in FIG. 10 processes the equation

$$o_i = f\Bigl(\sum_j w_{ij}\, o_j\Bigr)$$
where $o_j$ is the input value to the inputs 5 (the output of the lower level multiple-input single-output signal processor at position j) and $w_{ij}$ is the weight coefficient stored in the memory 6 (the connection weight between the multiple-input single-output signal processor at position i and the lower level multiple-input single-output signal processor at position j). FIG. 11 is a graph of the input/output characteristics of a function f which expresses the threshold value process of the multiple-input single-output signal processor 3, wherein I in FIG. 11 is the input value to the threshold value processor 9.
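The computation performed by each multiple-input single-output signal processor, and its propagation from level to level, may be sketched in Python as follows. The names `processor_output` and `forward`, the representation of each level as a list of weight rows, and the sample weight values are all assumptions made for illustration.

```python
import math


def threshold(I, theta=0.0):
    """Threshold value processor: f(I) = 1 / (1 + exp(-I + theta))."""
    return 1.0 / (1.0 + math.exp(-I + theta))


def processor_output(weights, inputs):
    """One processor: multiply each input by its weight, add, then threshold."""
    weighted_sum = sum(w * o for w, o in zip(weights, inputs))
    return threshold(weighted_sum)


def forward(levels, input_signal):
    """Propagate a signal upward through the levels only.

    levels[n][i] holds the weight coefficients w_ij of processor i in level n
    (an assumed data layout, not one given in the source).
    """
    outputs = list(input_signal)
    for level in levels:
        outputs = [processor_output(weights, outputs) for weights in level]
    return outputs


# Hypothetical network: two processors in the lower level, one in the highest.
levels = [
    [[0.5, -0.5], [1.0, 1.0]],
    [[1.0, -1.0]],
]
print(forward(levels, [1.0, 1.0]))
```

Because each level consumes only the outputs of the level below it, there are no connections within a level and signals never propagate downward.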
The teacher signal generator 10 in the weight coefficient renewer 2 generates a desirable output signal for the p-th input signal input from the input section 4 of the output signal calculator 1; this output signal is used as the teacher signal $t_{pk}$ (where $t_{pk}$ expresses the teacher signal for the output of the k-th multiple-input single-output signal processor in the highest level of the output signal calculator 1). The error signal calculator 11 then obtains the error from the difference between the teacher signal $t_{pk}$ and the actual output signal $o_{pk}$ output from the output signal calculator 1 for the p-th input signal (where $o_{pk}$ expresses the output of the k-th multiple-input single-output signal processor in the highest level of the output signal calculator 1) using the equation

$$E = \frac{1}{2} \sum_p \sum_k (t_{pk} - o_{pk})^2 = E(W) \qquad (1)$$

and applies this value to evaluate the performance of the network at the present connection state (weight coefficient value). In equation (1), $\sum_p$ is the sum for all input signals, $\sum_k$ is the sum over the outputs of all multiple-input single-output signal processors in the highest level of the output signal calculator 1, and W is a vector which has the weight coefficient $w_{ij}$ as each component, and which hereinafter is called a weight vector. The error E becomes a function of the weight vector W. Furthermore, the teacher signal $t_{pk}$ is a value of either 0 or 1.
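The error evaluation described above may be sketched in Python as follows. The sketch assumes the sum square error in the form used in the cited article by Rumelhart et al., including the factor 0.5; the function name `sum_square_error` and the sample values are assumptions made for illustration.

```python
def sum_square_error(teacher, output):
    """E = 0.5 * sum over p and k of (t_pk - o_pk)^2.

    teacher[p][k] is the teacher signal and output[p][k] the actual output of
    processor k in the highest level for input signal p.
    The factor 0.5 is an assumption following the cited article.
    """
    return 0.5 * sum(
        (t - o) ** 2
        for teacher_row, output_row in zip(teacher, output)
        for t, o in zip(teacher_row, output_row)
    )


# Hypothetical teacher signals (each 0 or 1) and outputs for two input signals.
teacher = [[1.0, 0.0], [0.0, 1.0]]
output = [[0.5, 0.5], [0.0, 1.0]]
print(sum_square_error(teacher, output))  # 0.25
```

Only the first input signal contributes to the error here, since the outputs for the second input signal already match the teacher signals.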
Based on the obtained error E, the steepest descent direction determination means 12 computes

$$g = \frac{\partial E}{\partial W} \qquad (2)$$
to obtain the direction of steepest descent, which is the direction of change of the weight vector of the output signal calculator 1. The direction of steepest descent is a vector whose components are the partial derivatives of the error E with respect to the weight coefficients $w_{ij}$.
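How the components of the vector g may be obtained can be sketched in Python for a single multiple-input single-output signal processor. This sketch estimates each partial derivative by a central difference, which is a technique substituted here for illustration and is not the back-propagation computation of the cited article; the names `error` and `gradient`, the step size `h`, and the factor 0.5 in the error are assumptions.

```python
import math


def error(weights, samples):
    """Sum square error of one sigmoid processor over (inputs, teacher) pairs."""
    total = 0.0
    for inputs, t in samples:
        weighted_sum = sum(w * x for w, x in zip(weights, inputs))
        o = 1.0 / (1.0 + math.exp(-weighted_sum))
        total += 0.5 * (t - o) ** 2  # factor 0.5 is an assumption
    return total


def gradient(weights, samples, h=1e-6):
    """Central-difference estimate of g = dE/dW, one component per weight."""
    g = []
    for i in range(len(weights)):
        up = list(weights)
        down = list(weights)
        up[i] += h
        down[i] -= h
        g.append((error(up, samples) - error(down, samples)) / (2.0 * h))
    return g


# Hypothetical sample: inputs (1, 0) with teacher signal 1.
samples = [([1.0, 0.0], 1.0)]
print(gradient([0.0, 0.0], samples))
```

For this sample the first component is negative, so increasing the first weight reduces the error, while the second component is zero because its input is zero.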
The amount of weight change calculator 13 obtains the amount of change in the weight vector of the output signal calculator from this direction of steepest descent by the equation

$$\Delta W = -\varepsilon \frac{\partial E}{\partial W} + \alpha\, \Delta W' \qquad (3)$$
where $\varepsilon$ is a positive constant called the learning rate, $\alpha$ is a positive constant called the acceleration parameter, and $\Delta W'$ is the amount of change in the weight vector in the previous learning cycle. The weight modifier 14 changes the weight vector of the output signal calculator 1 according to the amount of change in the weight vector. The error is reduced by thus repeatedly renewing the weights, and when the error is sufficiently small, it is concluded that the output signal is sufficiently close to the desired value and learning stops.
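The repeated renewal of the weights according to equation (3) may be sketched in Python as follows. The names `weight_change` and `learn`, the values chosen for the learning rate, acceleration parameter, and stopping tolerance, and the simple quadratic error used in place of the network error are all assumptions made for illustration.

```python
def weight_change(gradient, previous_change, epsilon, alpha):
    """Equation (3): delta_W = -epsilon * dE/dW + alpha * delta_W'."""
    return [-epsilon * g + alpha * d for g, d in zip(gradient, previous_change)]


def learn(weights, error_fn, gradient_fn, epsilon=0.1, alpha=0.5,
          tolerance=1e-6, max_cycles=1000):
    """Renew the weights until the error is sufficiently small."""
    change = [0.0] * len(weights)  # no previous change before the first cycle
    for _ in range(max_cycles):
        if error_fn(weights) < tolerance:  # error sufficiently small: stop
            break
        change = weight_change(gradient_fn(weights), change, epsilon, alpha)
        weights = [w + d for w, d in zip(weights, change)]
    return weights


def toy_error(weights):
    """Hypothetical stand-in for E(W): 0.5 * sum((w - 1)^2)."""
    return 0.5 * sum((w - 1.0) ** 2 for w in weights)


def toy_gradient(weights):
    """Gradient of toy_error with respect to each weight."""
    return [w - 1.0 for w in weights]


learned = learn([0.0, 3.0], toy_error, toy_gradient)
print(learned, toy_error(learned))
```

The second term of equation (3) carries part of the previous change into the current one, so successive changes in a consistent direction accumulate.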
Because learning in the conventional learning machine described above is accomplished by minimizing the sum square error E, the weights are changed whenever the total error, expressed as the total sum square error, decreases. The weights of multiple-input single-output signal processors whose errors are already sufficiently small thus continue to be changed even while another multiple-input single-output signal processor with a large error remains, and learning efficiency deteriorates.
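The situation described above can be illustrated with a small numerical sketch in Python. The per-processor error values below are hypothetical and chosen only for illustration, and the factor 0.5 in the sum square error is an assumption.

```python
def sum_square_error(errors):
    """Total sum square error over per-processor errors (t - o)."""
    return 0.5 * sum(e ** 2 for e in errors)


# Hypothetical errors of four highest-level processors in two learning cycles.
before = [0.9, 0.3, 0.3, 0.3]
after = [0.9, 0.1, 0.1, 0.1]

print(sum_square_error(after) < sum_square_error(before))  # True
print(max(after) == max(before))                           # True
```

The total error decreases between the two cycles although the processor with the largest error has not improved at all.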
Furthermore, minimizing the sum square error does not necessarily change the weights so that the error of the multiple-input single-output signal processor with the greatest error is reduced. If the error of only some of the multiple-input single-output signal processors in the highest level remains extremely high without converging, and the processors causing this high error are predominantly those which should output "1" (teacher signal = 1), their convergence is slower than that of the multiple-input single-output signal processors which should output "0" (teacher signal = 0).
Therefore, the conventional learning machine described above has the problem that the time required for learning becomes long.