The present invention relates generally to so-called neural (or neuronic) networks, that is, electronic networks built and operationally organized by analogy with the animal or human brain.
As is known, such a network, once built, can be regarded as a conventional computer that has not yet been programmed, in the sense that it must first be trained to perform any given function. For instance, if a neural network is to be employed in an apparatus for recognizing characters or, more generally, patterns or objects (one of the typical applications in which a neural network shows its superiority over a conventional computer), it must first be trained to recognize the characters, the patterns or the objects. This stage is called the "learning" stage.
In practice, since the basic components of a neural network are the "neurons" and the "synapses", and each synapse is characterized by a "weight", the learning stage consists in imparting to all the synapses of the network suitable "weights" such that the network recognizes the characters, the patterns or the objects and gives an exact response, i.e. one that coincides with the desired response.
In multilayer perceptron neural networks the learning stage is commonly carried out by means of a "back-propagation" algorithm. This is an iterative procedure, carried out with the aid of a learning computer, during which the network is presented with a number of patterns of the characters, the shapes or the objects that it will have to be able to recognize. The computer is provided not only with the responses that the neural network gives, but also with the exact responses, i.e. those that one desires to obtain. The object of the iterative learning procedure is to reach, by successive approximations, a configuration of the synaptic weights for which the difference between the desired output (target) configurations, i.e. the exact responses, and those that the network progressively provides for the individual configurations making up the learning set is zero (at the theoretical limit) or otherwise at a minimum. This minimization of the differences is known as the "convergence" of the configurations.
As is known, the efficiency of the recognition process depends on the number of models or samples ("patterns") presented to the network during the learning stage. However, the number of iterations needed to achieve convergence increases considerably with the number of samples, becoming so high (sometimes up to 100,000) as to require very long simulation times, even when very powerful computing machines are used.
Another problem resulting from such a multiplication of the number of iterations is that of so-called "endurance", i.e. the maximum number of reprogrammings that an EEPROM memory cell can withstand, which risks being exceeded when such cells are relied upon as a backup to store synaptic weights according to the teaching of U.S. Pat. No. 5,274,743, which issued Dec. 28, 1993 from U.S. application Ser. No. 828,062, filed Jan. 30, 1992.
These problems are dealt with by speeding up the learning process, that is, by reducing the number of iterations necessary to achieve convergence of the back-propagation algorithm. Several techniques have been suggested to this end.
Before proceeding to examine such techniques, a brief outline of the back-propagation algorithm is in order.
In the back-propagation algorithm, the weights are generally initialized to random values. Given an input vector $I_p$, an output vector $O_p$ is obtained. This vector is compared with the vector corresponding to the desired response, $T_p$, and the corrections to be applied to the weights of the synapses of the neurons of the output (out) layer are computed.
To this end, the quantities
$$\delta_{p,\mathrm{out},j} = (T_{p,j} - O_{p,\mathrm{out},j})\, O_{p,\mathrm{out},j}\, (1 - O_{p,\mathrm{out},j}) \qquad (1)$$
are computed, $O_{p,\mathrm{out},j}\,(1 - O_{p,\mathrm{out},j})$ being the derivative of the activation (sigmoid) function.
In Formula (1), the first index refers to the sample or "pattern" (p), the second to the layer (out) and the third to the output of the neuron (j).
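As an illustration of Formula (1), the following minimal Python sketch computes the output-layer quantities for a single pattern. The function names `sigmoid` and `output_deltas` and the numerical values are assumptions made for this example only; they do not appear in the original text.

```python
import math


def sigmoid(x):
    """Logistic activation function, whose derivative is O * (1 - O)."""
    return 1.0 / (1.0 + math.exp(-x))


def output_deltas(targets, outputs):
    """Formula (1) for one pattern p: delta_j = (T_j - O_j) * O_j * (1 - O_j)."""
    return [(t - o) * o * (1.0 - o) for t, o in zip(targets, outputs)]


# Hypothetical output layer with two neurons, both with desired response 1.0.
outputs = [sigmoid(0.0), sigmoid(2.0)]
targets = [1.0, 1.0]
deltas = output_deltas(targets, outputs)
print(deltas)
```

In this sketch the second neuron is already closer to its target, so its quantity is smaller than that of the first neuron and its weights would receive a smaller correction.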
For the other layers one uses the relations: ##EQU1## $S_k$ being the number of neurons of the $k$-th layer and $W_{(k+1),ij}$ being the weight with which the $i$-th neuron of the $(k+1)$-th layer filters the $j$-th input.
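The relation marked EQU1 is not reproduced in this text. Assuming it takes the standard back-propagation form, which is consistent with the symbols defined above and with Formula (1), the quantities for a layer $k$ other than the output layer could be written as follows. This is a reconstruction from the textbook algorithm, not a quotation of the original relation.

```latex
\delta_{p,k,j} = O_{p,k,j}\,\bigl(1 - O_{p,k,j}\bigr)
                 \sum_{i=1}^{S_{k+1}} \delta_{p,k+1,i}\, W_{(k+1),ij}
```

Here each neuron $j$ of layer $k$ collects the quantities of the $S_{k+1}$ neurons of the following layer, each scaled by the weight with which that neuron filters input $j$, and multiplies the sum by the sigmoid derivative $O_{p,k,j}(1 - O_{p,k,j})$.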
Once all the $\delta$ are known, one is able to compute the corrections to be applied to the weights ##EQU2## $\eta$ and $\alpha$ being constants and $\Delta W_{k,ij}^{*}$ being the correction obtained in the preceding iteration.
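The relation marked EQU2 is not reproduced in this text. The Python sketch below assumes the usual form with a learning-rate term and a momentum term, i.e. a correction equal to the learning rate times the neuron's delta times the input, plus the momentum constant times the correction from the preceding iteration. The function name, the default constants and the numbers are hypothetical and serve only to illustrate the roles of the two constants.

```python
def weight_corrections(deltas, inputs, prev_corrections, eta=0.5, alpha=0.9):
    """Assumed form: dW[i][j] = eta * delta[i] * input[j] + alpha * prev_dW[i][j].

    deltas[i]              -- delta of neuron i of the layer being updated
    inputs[j]              -- j-th input to that layer (output of the previous layer)
    prev_corrections[i][j] -- correction applied in the preceding iteration
    """
    return [
        [eta * d * x + alpha * prev for x, prev in zip(inputs, prev_row)]
        for d, prev_row in zip(deltas, prev_corrections)
    ]


# Hypothetical layer with two neurons and three inputs.
deltas = [0.125, -0.05]
inputs = [1.0, 0.5, 0.0]
prev = [[0.0, 0.0, 0.0], [0.01, 0.0, -0.02]]
dW = weight_corrections(deltas, inputs, prev)
print(dW)
```

Under this assumption, the learning-rate constant scales the contribution of the current error, while the momentum constant carries over part of the previous correction, so a weight whose input is zero is still moved by the momentum term alone.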
The convergence condition is, for all the outputs and all the samples or "patterns",
$$\max_{p,j} \left| T_{p,j} - O_{p,\mathrm{out},j} \right| \le \varepsilon$$
where $\varepsilon$ is a suitably selected value.
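The convergence condition can be checked as in the following minimal Python sketch, which takes the largest absolute difference between desired and actual outputs over all patterns and all output neurons. The function name `has_converged` and the sample values are assumptions introduced for illustration.

```python
def has_converged(targets, outputs, epsilon):
    """True if max over patterns p and outputs j of |T[p][j] - O[p][j]| <= epsilon."""
    worst = max(
        abs(t - o)
        for target_row, output_row in zip(targets, outputs)
        for t, o in zip(target_row, output_row)
    )
    return worst <= epsilon


# Hypothetical learning set of two patterns with two outputs each.
targets = [[1.0, 0.0], [0.0, 1.0]]
outputs = [[0.95, 0.08], [0.03, 0.91]]
print(has_converged(targets, outputs, epsilon=0.1))
print(has_converged(targets, outputs, epsilon=0.05))
```

Because the condition is a maximum rather than an average, a single badly recognized pattern is enough to keep the procedure iterating.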
In order to minimize the number of necessary iterations, it has been suggested that the update of the synaptic weights in each iteration be carried out after the presentation of the whole learning set ("learning by epoch", i.e. "set learning") instead of after the presentation of each sample ("learning by pattern", i.e. "sample learning"). It has also been shown that the convergence properties of the algorithm improve considerably if the parameters $\eta$ and $\alpha$ are updated at each iteration, taking into account the result of the preceding iteration.
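The difference between the two update schedules can be sketched in Python for a single sigmoid neuron, as below. The function `train_one_pass`, the tiny learning set and the learning-rate value are hypothetical, and the momentum term is omitted for brevity; the sketch only shows when the weights are rewritten, not any particular published variant.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train_one_pass(weights, samples, eta, by_epoch):
    """One pass over the learning set for a single sigmoid neuron.

    by_epoch=False: the weights change after every sample ("learning by pattern").
    by_epoch=True:  corrections are accumulated and applied once, after the
                    whole set has been presented ("learning by epoch").
    Returns (new_weights, number_of_weight_updates).
    """
    w = list(weights)
    accumulated = [0.0] * len(w)
    updates = 0
    for inputs, target in samples:
        out = sigmoid(sum(wi * xi for wi, xi in zip(w, inputs)))
        delta = (target - out) * out * (1.0 - out)
        correction = [eta * delta * xi for xi in inputs]
        if by_epoch:
            accumulated = [a + c for a, c in zip(accumulated, correction)]
        else:
            w = [wi + c for wi, c in zip(w, correction)]
            updates += 1
    if by_epoch:
        w = [wi + a for wi, a in zip(w, accumulated)]
        updates += 1
    return w, updates


# Hypothetical learning set: logical OR, with a constant bias input of 1.0.
samples = [
    ([1.0, 0.0, 0.0], 0.0),
    ([1.0, 0.0, 1.0], 1.0),
    ([1.0, 1.0, 0.0], 1.0),
    ([1.0, 1.0, 1.0], 1.0),
]
start = [0.0, 0.0, 0.0]
w_pattern, n_pattern = train_one_pass(start, samples, eta=1.0, by_epoch=False)
w_epoch, n_epoch = train_one_pass(start, samples, eta=1.0, by_epoch=True)
print(n_pattern, n_epoch)
```

In this sketch, learning by pattern rewrites the weights once per sample, whereas learning by epoch rewrites them once per presentation of the whole set.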
More generally, attention has been given to the problem of eliminating the local minima of the function to be minimized, in order to limit the risk of a possible "saturation" of the learning process.
It is an object of the present invention to provide a learning process that avoids the difficulties mentioned above and makes it possible to achieve the convergence condition of the back-propagation algorithm rapidly, even with a very large set of samples.
The learning process in accordance with the present invention is not a further variant of the back-propagation algorithm; rather, it establishes original criteria for applying that algorithm, based upon iterative techniques for presenting the learning samples or patterns to the neural network.