1. Field of the Invention
This invention relates to a signal processing apparatus or system that carries out signal processing with the use of a so-called neural network made up of a plurality of units, each taking charge of signal processing corresponding to that of a neuron, and to a learning processing apparatus or system that causes a signal processing section formed by said neural network to undergo learning processing in accordance with the learning rule of back propagation.
2. Prior Art
The learning rule of back propagation, which is a learning algorithm of the neural network, has been tentatively applied to signal processing, including high-speed image processing and pattern recognition, as disclosed in "Parallel Distributed Processing", vol. 1, The MIT Press, 1986, or "Nikkei Electronics", issue of Aug. 10, 1987, No. 427, pp. 115 to 124. The learning rule of back propagation is also applied, as shown in FIG. 1, to a multilayer neural network having an intermediate layer 2 between an input layer 1 and an output layer 3.
Each unit u_j of the neural network shown in FIG. 1 issues an output value obtained by transforming, by a predetermined function f such as a sigmoid function, the total sum net_j of the output values O_i of the units u_i coupled to the unit u_j by coupling coefficients W_ji. That is, when the value of a pattern p is supplied as an input value to each unit of the input layer 1, the output value O_pj of each unit u_j of the intermediate layer 2 and the output layer 3 is expressed by the following formula (1)

O_pj = f_j(net_j),  net_j = Σ_i W_ji·O_pi  (1)
The output value O_pj of the unit u_j of the output layer 3 may be obtained by sequentially computing the output values of the units u_j, each corresponding to a neuron, from the input layer 1 towards the output layer 3.
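The sequential computation of formula (1) from the input layer 1 towards the output layer 3 may be sketched as follows. This is only an illustrative Python sketch: the function names, the choice of a sigmoid for f, and the layer sizes in the usage example are assumptions, not taken from the present specification.

```python
import math

def sigmoid(net):
    # A typical choice for the output function f: f(net) = 1 / (1 + e^(-net)).
    return 1.0 / (1.0 + math.exp(-net))

def layer_output(inputs, weights):
    # For each unit u_j of a layer: net_j = sum_i W_ji * O_i, O_j = f(net_j)
    # (formula (1)); `weights[j][i]` plays the role of W_ji.
    return [sigmoid(sum(w_ji * o_i for w_ji, o_i in zip(row, inputs)))
            for row in weights]

def forward(pattern, w_hidden, w_output):
    # Outputs are computed sequentially from the input layer 1,
    # through the intermediate layer 2, towards the output layer 3.
    hidden = layer_output(pattern, w_hidden)
    return layer_output(hidden, w_output)

# Hypothetical usage: a 2-input, 2-hidden-unit, 1-output network.
out = forward([1.0, 0.0], [[0.5, -0.5], [0.3, 0.8]], [[1.0, -1.0]])
```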
In accordance with the back-propagation learning algorithm, the processing of learning, which consists in modifying the coupling coefficients W_ji so as to minimize the total sum E_p of the square errors between the actual output value O_pj of each unit u_j of the output layer 3 on application of the pattern p and the desirable output value t_pj, that is, the teacher signal,

E_p = (1/2)·Σ_j (t_pj − O_pj)²  (2)

is performed sequentially from the output layer 3 towards the input layer 1. By such processing of learning, the output value O_pj closest to the value t_pj of the teacher signal is output from each unit u_j of the output layer 3.
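The square-error sum of formula (2) may be illustrated by the following minimal Python sketch; the function name is hypothetical.

```python
def squared_error(targets, outputs):
    # E_p = (1/2) * sum_j (t_pj - O_pj)^2  -- formula (2);
    # `targets` holds the teacher signals t_pj, `outputs` the values O_pj.
    return 0.5 * sum((t - o) ** 2 for t, o in zip(targets, outputs))
```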
If the variant ΔW_ji of the coupling coefficient W_ji which minimizes the total sum E_p of the square errors is set so that

ΔW_ji ∝ −∂E_p/∂W_ji  (3)

the formula (3) may be rewritten as

ΔW_ji = η·δ_pj·O_pi  (4)
as explained in detail in the above reference materials.
In the above formula (4), η stands for the rate of learning, which is a constant and which may be empirically determined from the number of units or layers or from the input or output values. δ_pj stands for the error corresponding to the unit u_j.
Therefore, in determining the above variant ΔW_ji, it suffices to compute the error δ_pj in the reverse direction, that is, from the output layer towards the input layer of the network.
The error δ_pj of the unit u_j of the output layer 3 is given by the formula (5)

δ_pj = (t_pj − O_pj)·f'_j(net_j)  (5)
The error δ_pj of the unit u_j of the intermediate layer 2 may be computed by the recurrent formula (6)

δ_pj = f'_j(net_j)·Σ_k δ_pk·W_kj  (6)

using the errors δ_pk and the coupling coefficients W_kj of the units u_k coupled to the unit u_j, here the units of the output layer 3. The process of deriving the above formulas (5) and (6) is explained in detail in the above reference materials.
In the above formulas, f'_j(net_j) stands for the derivative of the output function f_j(net_j).
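Formulas (5) and (6) may be illustrated by the following Python sketch, assuming sigmoid units, for which the derivative can be expressed from the output alone as f'_j(net_j) = O_pj·(1 − O_pj); all function names are hypothetical.

```python
def sigmoid_prime_from_output(o):
    # For a sigmoid unit, f'(net) = O * (1 - O), computable from the output O.
    return o * (1.0 - o)

def output_deltas(targets, outputs):
    # delta_pj = (t_pj - O_pj) * f'_j(net_j)  -- formula (5), output layer.
    return [(t - o) * sigmoid_prime_from_output(o)
            for t, o in zip(targets, outputs)]

def hidden_deltas(hidden_outputs, w_output, deltas_next):
    # delta_pj = f'_j(net_j) * sum_k delta_pk * W_kj  -- formula (6),
    # propagated backwards from the errors delta_pk of the layer above;
    # `w_output[k][j]` plays the role of W_kj.
    return [sigmoid_prime_from_output(o) *
            sum(d_k * w_output[k][j] for k, d_k in enumerate(deltas_next))
            for j, o in enumerate(hidden_outputs)]
```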
Although the variant ΔW_ji may be found from the above formula (4), using the results of the formulas (5) and (6), more stable results may be obtained by finding it from the following formula (7)

ΔW_ji(n+1) = η·δ_pj·O_pi + α·ΔW_ji(n)  (7)

with the use of the results of the preceding learning. In the above formula, α stands for a stabilization factor for reducing the error oscillations and accelerating the convergence thereof.
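The update of formula (7), in which the stabilization factor α is applied to the variant of the preceding learning step, may be sketched as follows; the function name and the values of η and α in the usage example are assumptions.

```python
def update_weights(weights, prev_delta_w, inputs, deltas, eta=0.5, alpha=0.9):
    # Delta_W_ji(n+1) = eta * delta_pj * O_pi + alpha * Delta_W_ji(n)  -- (7);
    # `inputs` holds the O_pi values feeding the layer, `deltas` the delta_pj.
    new_delta_w = [[eta * d_j * o_i + alpha * prev
                    for o_i, prev in zip(inputs, prev_row)]
                   for d_j, prev_row in zip(deltas, prev_delta_w)]
    # Apply the variants to the coupling coefficients W_ji.
    new_weights = [[w + dw for w, dw in zip(w_row, dw_row)]
                   for w_row, dw_row in zip(weights, new_delta_w)]
    return new_weights, new_delta_w

# Hypothetical usage: a single coupling, updated twice; the second step
# reuses the preceding variant through the alpha term.
w, dw = update_weights([[0.0]], [[0.0]], [1.0], [0.2])
w, dw = update_weights(w, dw, [1.0], [0.2])
```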
The above-described learning is repeated until it is terminated at the time point when the total sum E_p of the square errors between the output values O_pj and the teacher signals t_pj becomes sufficiently small.
It is noted that, in the conventional signal processing system in which the aforementioned back-propagation learning rule is applied to the neural network, the learning rate η is empirically determined from the numbers of the layers and of the units corresponding to neurons, or from the input and output values, and the learning is carried out at this constant learning rate using the above formula (7). Thus the number of repetitions n of the learning required until the total sum E_p of the square errors between the output values O_pj and the teacher signals t_pj becomes small enough to terminate the learning may be so large as to render efficient learning unfeasible.
Also, the above-described signal processing system is constructed as a network consisting only of feedforward couplings between the units corresponding to the neurons. Consequently, when the features of the input signal pattern are to be extracted by learning the coupling state of the above-mentioned network from the input signals and the teacher signal, it is difficult to extract a sequential time-series or chronological pattern, such as that of audio signals fluctuating along the time axis.
In addition, while the processing of learning of the above-described multilayer neural network in accordance with the back-propagation learning rule has a promisingly high functional ability, it may frequently occur that the optimum global minimum is not reached in the course of the learning process, but only a local minimum, so that the total sum E_p of the square errors cannot be reduced sufficiently.
Conventionally, when such a local minimum is reached, the initial values or the learning rate η are changed and the processing of learning is repeated until the optimum global minimum is found. This results in considerable fluctuations and protraction of the learning processing time.