The present invention relates to a learning system for a data processing apparatus for determining an internal state value which defines a pattern conversion in order to realize a data processing function by a learning process and more particularly to a learning system for a data processing apparatus capable of quickly learning an internal state value.
In a conventional sequential processing computer (von Neumann type), the data processing function cannot be adjusted in accordance with changes in the usage method and environment. Therefore, an adaptive data processing apparatus, represented by a parallel distributed processing system with a network structure, is required, mainly in the fields of pattern recognition and adaptive filter technology. Such an adaptive data processing apparatus needs to obtain, by a learning process, an internal state value defining the data processing function.
Of the systems for learning the internal state values in data processing apparatuses, the method called the back propagation method (D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning Internal Representations by Error Propagation," PARALLEL DISTRIBUTED PROCESSING, Vol. 1, pp. 318-364, The MIT Press, 1986) has received attention because of its high practicality.
In a data processing apparatus with a layered network structure, the layered network comprises nodes called basic units and internal connections, each having a weight value corresponding to an internal state value.
FIG. 1 shows the structure of a basic unit 1, which is a system with multiple inputs and a single output. It comprises a multiplication unit 2 for multiplying each of a plurality of inputs by the weight of the corresponding internal connection, an accumulating unit 3 for adding all the multiplied results, and a threshold value processing unit 4 for performing a non-linear threshold value process on the accumulated value and outputting a final value. A plurality of such basic units 1 are connected to form a layered network, as shown in FIG. 2. The data processing function converts an input signal (input pattern) to a corresponding output signal (output pattern). According to the back propagation method applied to the layered network, a teacher signal (teacher pattern) designates the signal value that the output signal should take for each input signal provided for learning. The weight value of each internal connection of the layered network is then determined in accordance with a predetermined learning algorithm so that the output signal approaches the teacher signal. Once the weight values are determined by this process, the layered network outputs a plausible output signal even if an unexpected input signal is applied thereto. Thus, a flexible data processing function is provided. However, the learning process must be completed within a short time period. That is, it is necessary to shorten the time required to learn different patterns, including the interference between them, so that the patterns are separated from each other. To realize a complex data processing function, the layered network structure must itself become complicated, which makes it difficult to shorten the learning process.
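By way of illustration only, the multiply, accumulate and threshold stages of basic unit 1 can be sketched as follows. The sigmoid used for the non-linear threshold value process, the sign with which the threshold is applied, and the function names `basic_unit` and `layer_forward` are assumptions made for this sketch; the text above does not fix them.

```python
import math


def basic_unit(inputs, weights, threshold=0.0):
    """Sketch of basic unit 1: multiply, accumulate, then threshold."""
    # Multiplication unit 2: each input times the weight of its connection.
    products = [y * w for y, w in zip(inputs, weights)]
    # Accumulating unit 3: add all the multiplied results.
    x = sum(products)
    # Threshold value processing unit 4: a sigmoid is ASSUMED here as the
    # non-linear threshold value process, applied as sigmoid(x - threshold).
    return 1.0 / (1.0 + math.exp(-(x - threshold)))


def layer_forward(inputs, weight_rows, thresholds):
    """One layer of the network: every unit receives the same inputs."""
    return [basic_unit(inputs, w, t) for w, t in zip(weight_rows, thresholds)]
```

Each row of `weight_rows` holds the weights of the internal connections entering one unit of the post-stage layer, so a layered network is obtained by feeding the output list of one call to `layer_forward` into the next.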
A conventional back propagation system for use in an adaptive data processing apparatus with a layered network structure is now explained in detail. Suppose that an h layer is a pre-stage layer and an i layer is a post-stage layer. The arithmetic operation conducted in accumulating unit 3 within a basic unit 1 is expressed by equation (1). The operation performed by threshold value processing unit 4 is expressed by equation (2). ##EQU1## where,
h: the unit number of the h layer,
i: the unit number of the i layer,
p: the pattern number of the input signal,
θ_i: the threshold value of the ith unit of the i layer,
W_ih: the weight value of the internal connection between the h-i layers,
x_pi: the sum of the products for the inputs from the respective units of the h layer to the ith unit of the i layer,
y_ph: the output from the hth unit of the h layer for the input signal of the pth pattern,
y_pi: the output from the ith unit of the i layer for the input signal of the pth pattern.
The following symbols are used in the subsequent equations:
W_ji: the weight value of the internal connection between the i-j layers,
x_pj: the sum of the products for the inputs from the respective units of the i layer to the jth unit of the j layer,
y_pj: the output from the jth unit of the j layer for the input signal of the pth pattern,
E: the sum of the error vectors for the input signals with regard to all the patterns,
d_pj: the teacher signal to the jth unit of the j layer for the input signal of the pth pattern.
The back propagation system adaptively and automatically adjusts the weight value W_ih and the threshold value θ_i by feedback. As is clear from equations (1) and (2), it is necessary to change the weight value W_ih and the threshold value θ_i simultaneously. However, this is difficult because of mutual interference. Therefore, the present applicant has proposed that a basic unit 1 which always has "1" as its input signal be provided in the layer on the input side, and that the threshold value θ_i be combined into the weight value W_ih, thereby preventing the threshold value θ_i from appearing in equation (2). Thus, equations (1) and (2) are expressed as follows. ##EQU2##
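The effect of combining the threshold value into the weight value can be checked numerically with a short sketch. It assumes a sigmoid threshold function applied as sigmoid(x - θ), so that the extra connection from the constant "1" input carries the weight -θ; both the sign convention and the function names are assumptions for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def unit_with_threshold(inputs, weights, theta):
    """Unit with an explicit threshold value (ASSUMED form: sigmoid(x - theta))."""
    x = sum(y * w for y, w in zip(inputs, weights))
    return sigmoid(x - theta)


def unit_with_folded_threshold(inputs, weights, theta):
    """Same unit, with the threshold folded into an extra weight.

    An additional input fixed at 1 is appended, and its weight is -theta,
    so the threshold no longer appears in the threshold function itself.
    """
    aug_inputs = list(inputs) + [1.0]
    aug_weights = list(weights) + [-theta]
    x = sum(y * w for y, w in zip(aug_inputs, aug_weights))
    return sigmoid(x)


a = unit_with_threshold([0.2, 0.7], [1.5, -0.4], 0.3)
b = unit_with_folded_threshold([0.2, 0.7], [1.5, -0.4], 0.3)
```

Under these assumptions `a` and `b` agree to within rounding error, so only the weight values need to be learned.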
A conventional process for learning the weight value is explained in accordance with equations (3) and (4). The explanation uses a layered network having a structure comprising an h layer, an i layer and a j layer, as shown in FIG. 3.
The following equations can be obtained from equations (3) and (4). ##EQU3## where j is the unit number of the j layer.
According to the process for learning the weight value, the error vector E_p is first calculated in accordance with the following equations. It is the sum of the squares of the errors between the teacher signal and the output signal from the output layer, and is treated as the error produced by the layered network. The teacher signal is the signal which the output signal should become. ##EQU4## where E_p is the error vector for the input signal of the pth pattern.
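The error calculation described above can be sketched as follows. The factor 1/2 in front of the sum of squares and the summation of the per-pattern errors into a total E are assumptions taken from the usual back propagation formulation; the function names are hypothetical.

```python
def pattern_error(outputs, teacher):
    """Error for one pattern: half the sum of squared output errors.

    The factor 0.5 is an ASSUMPTION (it only simplifies the derivative).
    """
    return 0.5 * sum((y - d) ** 2 for y, d in zip(outputs, teacher))


def total_error(all_outputs, all_teachers):
    """Sum of the per-pattern errors over all patterns."""
    return sum(pattern_error(y, d) for y, d in zip(all_outputs, all_teachers))
```

For example, `pattern_error([1.0, 0.0], [0.0, 0.0])` gives 0.5, since only the first output unit differs from its teacher signal.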
To obtain the relation between the error vector and the output signal, equation (7) is partially differentiated with respect to y_pj. ##EQU5##
Further, to obtain the relation between the error vector E_p and the input to the j layer, the error vector E_p is partially differentiated with respect to x_pj in accordance with the following equation. ##EQU6##
Further, to obtain the relation between the error vector E_p and the weight value between the i-j layers, the error vector E_p is partially differentiated with respect to W_ji. ##EQU7##
Thus, the solution represented by the above sum of the products is obtained.
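The equations for these three steps are not reproduced in this text. As a sketch only, assuming a sigmoid threshold function y_pj = 1/(1 + exp(-x_pj)) and an error of the form E_p = (1/2) Σ_j (y_pj - d_pj)^2 (both assumptions drawn from the standard back propagation formulation), the partial derivatives for the i-j layers would take the following form:

```latex
\begin{align*}
\frac{\partial E_p}{\partial y_{pj}} &= y_{pj} - d_{pj} \\
\frac{\partial E_p}{\partial x_{pj}}
  &= \frac{\partial E_p}{\partial y_{pj}}\,\frac{\partial y_{pj}}{\partial x_{pj}}
   = (y_{pj} - d_{pj})\, y_{pj}\,(1 - y_{pj}) \\
\frac{\partial E_p}{\partial W_{ji}}
  &= \frac{\partial E_p}{\partial x_{pj}}\,\frac{\partial x_{pj}}{\partial W_{ji}}
   = (y_{pj} - d_{pj})\, y_{pj}\,(1 - y_{pj})\, y_{pi}
\end{align*}
```

The second line uses the sigmoid identity dy/dx = y(1 - y), and the third uses x_pj = Σ_i y_pi W_ji.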
Next, the variation of the error vector E_p with respect to the output y_pi of the i layer is obtained as follows. ##EQU8##
Further, the variation of the error vector with respect to a variation of the total input x_pi to a unit of the i layer is calculated as follows. ##EQU9##
The solution represented by the above sum of the products can be obtained. Further, the relation between the variation of the error vector and the variation of the weight value between the h-i layers is obtained as follows. ##EQU10##
The solution represented by the above sum of the products can thus be obtained. The relation between the error vector for all the input patterns and the weight value between the i-j layers is obtained, and is expressed by the following equation. ##EQU11##
The relation between the error vectors for all the input patterns and the weight value between the h-i layers can be obtained as follows. ##EQU12##
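The equations for the h-i layers and for the sums over all patterns are likewise not reproduced in this text. Under the same assumptions as the usual formulation (sigmoid threshold function, x_pj = Σ_i y_pi W_ji, x_pi = Σ_h y_ph W_ih, and E = Σ_p E_p), a sketch of the corresponding relations is:

```latex
\begin{align*}
\frac{\partial E_p}{\partial y_{pi}}
  &= \sum_j \frac{\partial E_p}{\partial x_{pj}}\, W_{ji} \\
\frac{\partial E_p}{\partial x_{pi}}
  &= y_{pi}\,(1 - y_{pi}) \sum_j \frac{\partial E_p}{\partial x_{pj}}\, W_{ji} \\
\frac{\partial E_p}{\partial W_{ih}}
  &= \frac{\partial E_p}{\partial x_{pi}}\, y_{ph} \\
\frac{\partial E}{\partial W_{ji}}
  &= \sum_p \frac{\partial E_p}{\partial W_{ji}},
\qquad
\frac{\partial E}{\partial W_{ih}}
   = \sum_p \frac{\partial E_p}{\partial W_{ih}}
\end{align*}
```

The first line is where the error is propagated backward: each unit of the i layer collects the error terms of all units of the j layer to which it is connected, weighted by W_ji.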
Equations (15) and (16) represent the rate of change of the error vector with respect to a variation in the weight value between the layers. If the weight value is changed so that this rate of change is always made negative, according to the so-called gradient method, the sum of the error vectors can be gradually converged toward 0. In the conventional back propagation system, the update quantities ΔW_ji and ΔW_ih per updating cycle are determined as follows, and the updating of these weight values is repeated, thereby converging the sum E of the error vectors to a minimum value. ##EQU13## where ε (>0) represents the learning constant, which is a learning parameter.
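One updating cycle of the gradient method can be sketched for a small h-i-j network as follows. The sketch assumes a sigmoid threshold function, thresholds already combined into the weights, an error of half the sum of squares, and an update of the form ΔW = -ε ∂E/∂W; the network size, the numerical values and all function names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(y_h, W_ih, W_ji):
    """Forward pass h -> i -> j (thresholds ASSUMED folded into the weights)."""
    y_i = [sigmoid(sum(w * y for w, y in zip(row, y_h))) for row in W_ih]
    y_j = [sigmoid(sum(w * y for w, y in zip(row, y_i))) for row in W_ji]
    return y_i, y_j


def total_error(patterns, W_ih, W_ji):
    """Sum over all patterns of half the squared output error (ASSUMED form)."""
    E = 0.0
    for y_h, d in patterns:
        _, y_j = forward(y_h, W_ih, W_ji)
        E += 0.5 * sum((y - t) ** 2 for y, t in zip(y_j, d))
    return E


def gradients(patterns, W_ih, W_ji):
    """dE/dW_ih and dE/dW_ji accumulated over all patterns."""
    g_ih = [[0.0] * len(row) for row in W_ih]
    g_ji = [[0.0] * len(row) for row in W_ji]
    for y_h, d in patterns:
        y_i, y_j = forward(y_h, W_ih, W_ji)
        # dEp/dx_pj for each unit of the j layer.
        delta_j = [(y - t) * y * (1.0 - y) for y, t in zip(y_j, d)]
        # dEp/dx_pi: errors of the j layer propagated back through W_ji.
        delta_i = [
            y_i[i] * (1.0 - y_i[i])
            * sum(delta_j[j] * W_ji[j][i] for j in range(len(W_ji)))
            for i in range(len(W_ih))
        ]
        for j in range(len(W_ji)):
            for i in range(len(W_ji[j])):
                g_ji[j][i] += delta_j[j] * y_i[i]
        for i in range(len(W_ih)):
            for h in range(len(W_ih[i])):
                g_ih[i][h] += delta_i[i] * y_h[h]
    return g_ih, g_ji


def update(W, grad, eps):
    """Gradient-method update: W + dW with dW = -eps * dE/dW."""
    return [[w - eps * g for w, g in zip(w_row, g_row)]
            for w_row, g_row in zip(W, grad)]


# Hypothetical learning signals: (input signal, teacher signal) pairs.
# The last input is fixed at 1 to carry the threshold.
patterns = [([0.0, 1.0, 1.0], [1.0]), ([1.0, 1.0, 1.0], [0.0])]
W_ih = [[0.1, -0.2, 0.05], [0.3, 0.1, -0.1]]
W_ji = [[0.2, -0.3]]
eps = 0.1

E_before = total_error(patterns, W_ih, W_ji)
g_ih, g_ji = gradients(patterns, W_ih, W_ji)
W_ih_new, W_ji_new = update(W_ih, g_ih, eps), update(W_ji, g_ji, eps)
E_after = total_error(patterns, W_ih_new, W_ji_new)
```

Repeating the last four statements corresponds to repeating the updating of the weight values; with a sufficiently small ε each cycle lowers E.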
Further, in the conventional back propagation system, in order to accelerate convergence to the minimum value, a term relating to the update quantity of the weight value determined in the previous updating cycle is added, and ΔW_ji and ΔW_ih are determined as follows. ##EQU14## where ζ (>0) represents the momentum, which is a learning parameter, and t represents the number of updating operations.
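The update with a momentum term can be sketched as follows, assuming the commonly used form ΔW(t) = -ε ∂E/∂W + ζ ΔW(t-1). The function name `momentum_update` and the numerical values are hypothetical, and the gradient is taken as given.

```python
def momentum_update(W, grad, prev_delta, eps, zeta):
    """One update with momentum (ASSUMED form).

    delta(t) = -eps * dE/dW + zeta * delta(t-1)
    W(t)     = W(t-1) + delta(t)
    Returns the new weights and delta(t) for use in the next cycle.
    """
    delta = [[-eps * g + zeta * d for g, d in zip(g_row, d_row)]
             for g_row, d_row in zip(grad, prev_delta)]
    W_new = [[w + dw for w, dw in zip(w_row, d_row)]
             for w_row, d_row in zip(W, delta)]
    return W_new, delta


# Hypothetical values: one weight, gradient 2.0, previous update 0.5.
W_new, delta = momentum_update([[1.0]], [[2.0]], [[0.5]], eps=0.1, zeta=0.9)
```

Setting `zeta` to 0 reduces this to the plain gradient-method update, so the previous update quantity is the only additional state that has to be kept between updating cycles.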
In the process of learning the weight value according to the back propagation method, the method is applied to a learning signal group (pairs each comprising an input signal and a teacher signal) provided for learning, and the sum E of the error vectors is converged to a minimum value. Thus, an output signal corresponding to the teacher signal can be outputted. Accordingly, the learning signals are first determined and then the process of learning the weight value starts. Likewise, in data processing systems with other network structures and other adaptabilities, as well as in the data processing apparatus with the layered network structure, the process of learning the internal state value starts only after the learning signals have first been determined.
However, in such prior art technology, as is clear from the explanation of the above back propagation system, the number of calculation steps in the learning process increases exponentially with the number of learning signals provided for learning the internal state value. Therefore, when the layered network is made complicated to realize an advanced data processing function and many learning signals are required to determine the internal state value, an extremely long time is required for the learning process, which is a problem. For example, if patterns A, B, and C are to be learned, a learning cycle comprising patterns A, B and C is repeated so that the interferences between A and B, B and C, C and A, and among A, B and C are learned in order to separate patterns A, B, and C. If the number of such patterns increases, the interference between them also increases, thus requiring an even longer time.
In the prior art, if an unexpected learning pattern, for example, D, appears, it is added to the learning patterns A, B and C obtained up to that point as a further pattern to be learned. Thus, the learning of the internal state value restarts from the beginning in the sequence A, B, C and D. However, when an adaptive data processing apparatus such as a neural network is used, it is difficult to provide, from the beginning, all the learning signals which can occur so that no new learning pattern arises. Therefore, the prior art has the problem that whenever a new learning pattern is received, a long learning process time is required.
Further, in the above prior art, even when the number of new learning patterns is small, learning has to be repeated for learning signals whose learning has already been completed. Therefore, it takes an extremely long time to perform the process of learning the internal state value. This problem becomes extremely serious when the layered network for realizing an advanced data processing function becomes complicated and when more learning patterns are required to determine the internal state value.