1. Technical Field
The present invention relates to a learning process system for use in a network structure data process apparatus.
2. Background Art
In a conventional sequential processing computer (von Neumann type), it is difficult to adapt a data processing function to variations in the usage method or environment. Therefore, adaptive data processing methods utilizing a parallel distributed system in a layered network have been proposed. Among these, the back propagation method (D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning Internal Representations by Error Propagation", PARALLEL DISTRIBUTED PROCESSING, Vol. 1, pp. 318-364, The MIT Press, 1986) has received particular attention because of its high practicality.
The back propagation method utilizes a layered network comprising nodes called basic units and internal connections to which weights are assigned. FIG. 1 shows the principle structure of a basic unit. Basic unit 1 is a multiple-input single-output system comprising a multiplication unit 2 for multiplying each of the plurality of inputs by the weight of the corresponding internal connection, an accumulating unit 3 for adding all the multiplied results, and a threshold value processing unit 4 for producing the final output by applying a nonlinear threshold value process to the accumulated value. Many basic units 1 thus formed are connected in layers as shown in FIG. 2, thereby forming a layered network which converts a pattern of an input signal to a corresponding output signal and thus performs a data processing function.
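By way of illustration only, the operation of basic unit 1 may be sketched as follows. The sketch assumes a logistic sigmoid for the nonlinear threshold value process, which the description above does not specify, and the names `basic_unit` and `layer` are hypothetical.

```python
import math

def basic_unit(inputs, weights, threshold=0.0):
    """One basic unit: multiply, accumulate, then apply a nonlinear threshold process."""
    # Multiplication unit 2: each input times the weight of its internal connection.
    products = [y * w for y, w in zip(inputs, weights)]
    # Accumulating unit 3: add all the multiplied results.
    x = sum(products)
    # Threshold value processing unit 4: logistic sigmoid (an assumption; the
    # text only calls for a "nonlinear threshold value process").
    return 1.0 / (1.0 + math.exp(-(x - threshold)))

def layer(inputs, weight_rows, thresholds):
    """A layer of basic units that all receive the same inputs."""
    return [basic_unit(inputs, w, t) for w, t in zip(weight_rows, thresholds)]
```

Feeding the outputs of one `layer` call into the next corresponds to the layered connection of FIG. 2.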
The back propagation method determines the weights of the internal connections within a layered network in accordance with a predetermined learning algorithm so that the output signal for a selected input signal matches a teacher signal, which designates the signal value to be attained. Once the weights have been determined by this process, the layered network outputs a plausible output signal even when an unexpected input signal is supplied, thereby realizing a "flexible" parallel distributed data processing function.
In order to put a network structure data process apparatus such as the one recited above to practical use, the weight learning process must be completed within a shorter time period. In solving this problem, it must be borne in mind that a layered network has to be formed of multiple layers in order to realize complex data processing.
If the h layer is taken as the pre-stage layer and the i layer as the post-stage layer, the arithmetic operation performed by accumulating unit 3 of basic unit 1 is given by the following equation (1), and the arithmetic operation performed by threshold value processing unit 4 is given by the following equation (2). ##EQU1## where,
h: a unit number of the h layer,
i: a unit number of the i layer,
p: a pattern number of an input signal,
θ_i: the threshold value of the i-th unit in the i layer,
W_ih: the weight of an internal connection between the h and i layers,
x_pi: the sum of the products of the inputs from the respective units in the h layer to the i-th unit in the i layer,
y_ph: the output of the h layer for the input signal of pattern p,
y_pi: the output of the i layer for the input signal of pattern p,
W_ji: the weight of an internal connection between the i and j layers,
x_pj: the sum of the products of the inputs from the respective units in the i layer to the j-th unit in the j layer,
y_pj: the output of the j layer for the input signal of pattern p,
E: the sum of the error vectors for the input signals of all the patterns,
d_pj: the teacher signal for the j-th unit in the j layer with regard to the input signal of pattern p.
The back propagation method adaptively and automatically adjusts weight W_ih and threshold value θ_i by feeding back the error. As is clear from equations (1) and (2), weight W_ih and threshold value θ_i must be adjusted simultaneously, but this is difficult because the two adjustments interfere with each other. Therefore, the present applicant has proposed, in patent application Sho 62-333484, filed on Dec. 28, 1987 and titled "Network Structure Data Process Apparatus", that a basic unit 1 which always receives "1" as its input signal be provided in the layer on the input side, so that threshold value θ_i is combined with weight W_ih and no longer appears explicitly. Equations (1) and (2) can then be expressed as follows. ##EQU2##
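The effect of combining the threshold value with the weights can be checked numerically. The following is a minimal sketch, assuming a logistic sigmoid and assuming that the threshold enters the unit as a subtraction from the accumulated sum; all function names are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def unit_with_threshold(inputs, weights, theta):
    """Unit with an explicit threshold value theta (assumed form of equations (1), (2))."""
    return sigmoid(sum(y * w for y, w in zip(inputs, weights)) - theta)

def unit_with_constant_input(inputs, weights, theta):
    """Same unit, with theta absorbed as the weight of an extra input fixed at 1."""
    augmented_inputs = list(inputs) + [1.0]      # the unit that always receives "1"
    augmented_weights = list(weights) + [-theta]  # its weight carries the threshold
    return sigmoid(sum(y * w for y, w in zip(augmented_inputs, augmented_weights)))
```

Because the two forms give the same output, learning only has to adjust weights, and the threshold is learned as one more weight.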
The prior art weight learning process is explained below in accordance with equations (3) and (4). The explanation uses a layered network having an h layer-i layer-j layer structure, as shown in FIG. 2.
The following equations can be derived from equations (3) and (4). ##EQU3## where j is a unit number of the j layer.
In the weight learning process, the sum of the squares of the errors between the teacher signal and the output signal from the output layer is taken as the error of the layered network, expressed as error vector E_p, and the calculation proceeds on that basis. The teacher signal is the signal that the output signal should attain. ##EQU4## where E_p is the error vector for the input signal of pattern p.
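For reference, a form of the error consistent with the description above ("sum of the second powers of an error", summed over all patterns to give E) can be written as follows. The factor 1/2 is a common convention and is an assumption here, as is the split into equations (7) and (8); the equations as filed are not reproduced in this text.

```latex
\begin{align}
E_p &= \frac{1}{2} \sum_{j} \left( y_{pj} - d_{pj} \right)^2 \tag{7} \\
E   &= \sum_{p} E_p \tag{8}
\end{align}
```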
In order to obtain the relation between the error vector and the output signal, equation (7) is partially differentiated with respect to y_pj, and ##EQU5## is obtained. Further, in order to obtain the relation between error vector E_p and the input to the j layer, error vector E_p is partially differentiated with respect to x_pj, and ##EQU6## is obtained. Further, in order to obtain the relation between error vector E_p and the weight between the i and j layers, error vector E_p is partially differentiated with respect to W_ji, and the following is obtained. ##EQU7##
Thus, a solution expressed in the product form of the above equation is obtained.
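The three differentiation steps for the output layer can be written out as a sketch. It assumes the squared-error form with a factor 1/2 for E_p and a logistic sigmoid for the output y_pj, whose derivative is y_pj(1 - y_pj); the numbering (9) to (11) is inferred from the surrounding text.

```latex
\begin{align}
\frac{\partial E_p}{\partial y_{pj}} &= y_{pj} - d_{pj} \tag{9} \\
\frac{\partial E_p}{\partial x_{pj}} &= \frac{\partial E_p}{\partial y_{pj}} \frac{\partial y_{pj}}{\partial x_{pj}}
  = \left( y_{pj} - d_{pj} \right) y_{pj} \left( 1 - y_{pj} \right) \tag{10} \\
\frac{\partial E_p}{\partial W_{ji}} &= \frac{\partial E_p}{\partial x_{pj}} \frac{\partial x_{pj}}{\partial W_{ji}}
  = \left( y_{pj} - d_{pj} \right) y_{pj} \left( 1 - y_{pj} \right) y_{pi} \tag{11}
\end{align}
```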
Next, the variation of error vector E_p with respect to the output y_pi of the i layer is as follows. ##EQU8##
Next, the variation of the error vector with respect to the variation of the sum x_pi of the inputs supplied to a unit of the i layer is calculated, and a solution again expressed as a sum of products is obtained. ##EQU9## Further, the variation of the error vector with respect to the variation of the weight between the h and i layers is given by the following equation, whose solution is likewise expressed as a sum of products. ##EQU10##
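Under the same assumptions (squared error with a factor 1/2, logistic sigmoid units), the corresponding relations for the i layer can be sketched as follows. The sum over j arises because each unit of the i layer feeds every unit of the j layer; the numbering (12) to (14) is inferred from the surrounding text.

```latex
\begin{align}
\frac{\partial E_p}{\partial y_{pi}} &= \sum_{j} \frac{\partial E_p}{\partial x_{pj}} \frac{\partial x_{pj}}{\partial y_{pi}}
  = \sum_{j} \left( y_{pj} - d_{pj} \right) y_{pj} \left( 1 - y_{pj} \right) W_{ji} \tag{12} \\
\frac{\partial E_p}{\partial x_{pi}} &= y_{pi} \left( 1 - y_{pi} \right)
  \sum_{j} \left( y_{pj} - d_{pj} \right) y_{pj} \left( 1 - y_{pj} \right) W_{ji} \tag{13} \\
\frac{\partial E_p}{\partial W_{ih}} &= y_{ph} \, y_{pi} \left( 1 - y_{pi} \right)
  \sum_{j} \left( y_{pj} - d_{pj} \right) y_{pj} \left( 1 - y_{pj} \right) W_{ji} \tag{14}
\end{align}
```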
Based on the above solutions, the relation between the error vector and the weight between the i and j layers for all the input patterns is obtained as follows. ##EQU11##
The relation between the error vector and the weight between the h and i layers for all the input patterns is as follows. ##EQU12##
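The accumulation of the per-pattern relations over all input patterns can be sketched in code. The sketch assumes logistic sigmoid units, a squared error with a factor 1/2, and thresholds already absorbed into the weights; the function names and the data layout (`W_ih[i][h]`, `W_ji[j][i]`, patterns as `(input, teacher)` pairs) are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(y_h, W_ih, W_ji):
    """Forward pass through the h-i-j network for one pattern."""
    y_i = [sigmoid(sum(w * y for w, y in zip(row, y_h))) for row in W_ih]
    y_j = [sigmoid(sum(w * y for w, y in zip(row, y_i))) for row in W_ji]
    return y_i, y_j

def total_error(patterns, W_ih, W_ji):
    """E: sum over patterns of 1/2 * sum_j (y_pj - d_pj)^2 (assumed form)."""
    E = 0.0
    for y_h, d in patterns:
        _, y_j = forward(y_h, W_ih, W_ji)
        E += 0.5 * sum((y - t) ** 2 for y, t in zip(y_j, d))
    return E

def gradients(patterns, W_ih, W_ji):
    """Return (dE/dW_ih, dE/dW_ji), each accumulated over all patterns."""
    g_ih = [[0.0] * len(row) for row in W_ih]
    g_ji = [[0.0] * len(row) for row in W_ji]
    for y_h, d in patterns:
        y_i, y_j = forward(y_h, W_ih, W_ji)
        # dE_p/dx_pj for each unit j of the output layer.
        dx_j = [(y - t) * y * (1.0 - y) for y, t in zip(y_j, d)]
        # dE_p/dx_pi: propagate back through W_ji, then through the sigmoid.
        dx_i = [y_i[i] * (1.0 - y_i[i])
                * sum(dx_j[j] * W_ji[j][i] for j in range(len(W_ji)))
                for i in range(len(y_i))]
        for j in range(len(W_ji)):
            for i in range(len(y_i)):
                g_ji[j][i] += dx_j[j] * y_i[i]
        for i in range(len(W_ih)):
            for h in range(len(y_h)):
                g_ih[i][h] += dx_i[i] * y_h[h]
    return g_ih, g_ji
```

A finite-difference comparison against `total_error` is a convenient way to confirm that such a sketch matches the analytic relations.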
Equations (15) and (16) give the rate of variation of the error vector with respect to the variation of the weight between layers. When the weights are varied so that the resulting variation of E is always negative, the sum E of the error vectors can be gradually reduced toward 0 in accordance with the well-known gradient method. The conventional back propagation method determines the updating quantities ΔW_ji and ΔW_ih for one weight updating operation as follows, and repeats the weight updating operation so that the sum E of the error vectors converges to a minimum value. ##EQU13## where ε represents a control parameter for learning.
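One weight updating operation of the gradient method can be sketched as below. It assumes the usual form in which the updating quantity is the negative of ε times the rate of variation of E; the function name `gradient_step` and the nested-list layout are hypothetical.

```python
def gradient_step(weights, grads, epsilon):
    """Apply delta_W = -epsilon * dE/dW to every internal connection.

    `weights` and `grads` are nested lists with the same shape.
    """
    return [[w - epsilon * g for w, g in zip(w_row, g_row)]
            for w_row, g_row in zip(weights, grads)]
```

The same function serves for both the weights between the i and j layers and those between the h and i layers, each with its own gradient.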
The biggest problem of the back propagation method is that the number of learning operations necessary for convergence is large. This is more pronounced with a multi-layer network structure. To accelerate convergence, a term based on the updating quantity of the weight determined in the previous updating cycle is added to equations (17) and (18). ΔW_ih and ΔW_ji are then as follows. ##EQU14## where α also represents a control parameter for learning and t represents the number of updating operations.
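The accelerated update can be sketched as follows, assuming the common momentum form in which α times the previous updating quantity is added to the gradient term. The function name `momentum_step` and the nested-list layout are hypothetical.

```python
def momentum_step(weights, grads, prev_delta, epsilon, alpha):
    """delta_W(t) = -epsilon * dE/dW + alpha * delta_W(t-1).

    Returns (new_weights, delta) so that delta can be passed back in as
    prev_delta on the next updating operation.
    """
    delta = [[-epsilon * g + alpha * d for g, d in zip(g_row, d_row)]
             for g_row, d_row in zip(grads, prev_delta)]
    new_weights = [[w + d for w, d in zip(w_row, d_row)]
                   for w_row, d_row in zip(weights, delta)]
    return new_weights, delta
```

With `prev_delta` set to all zeros on the first updating operation, the step reduces to the plain gradient update.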
When the control parameters ε and α are set to small values, the sum E of the error vectors almost certainly converges, but the number of learning operations required for convergence becomes large. If both parameters are set to large values in order to decrease the number of learning operations, the sum E of the error vectors may oscillate. FIG. 5 shows a learning result for a layered network having 13 units in the input layer, 8 units in the intermediate layer and 7 units in the output layer, obtained by learning with the 62 input pattern signals shown in FIG. 4 and the corresponding teacher pattern signals. In FIG. 5, the abscissa designates the number of learning operations and the ordinate designates the sum of the error vectors. The control parameters are set to ε=0.3 and α=0.2. The 13th basic unit of the input layer always receives "1" so that the threshold value is combined with the weight as described above.
As is clear from FIG. 5, although there are some variations depending on the parameter settings, the prior art requires a large number of learning operations to determine the weights.