1. Field of the Invention
The present invention relates to a neural network and a method for operating the same.
2. Description of the Prior Art
Generally, a "neural network" is a network that technologically emulates the structure of the brain of an organism and its operating principles.
Such a general neural network has a structure comprising multiple layers each having a finite number of neurons, as shown in FIG. 1a. Each neuron of each layer is connected to neurons of neighboring layers. Such a connection characteristic is modeled as a value indicative of a connection strength.
The neural network serves mainly to vary such connection strengths in order to approximate a given function to a desired accuracy.
A method for determining such a connection strength to obtain a specific output for a specific input is called a learning rule.
FIG. 1b is an enlarged view of region k in FIG. 1a, showing each neuron of a layer connected to the neurons of the neighboring layers with different connection strengths.
In FIG. 1b, $x_i$ ($x_1, x_2, \ldots, x_n$) denotes an input value received by the neuron, and $w_i$ ($w_1, w_2, \ldots, w_n$) denotes the connection strength applied to the input value $x_i$. A further symbol in FIG. 1b denotes the threshold of the neuron. Herein, $n$ represents a positive integer.
Consequently, the neural network produces a desired output value for a given input value by adjusting the corresponding connection strengths $w_i$.
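As a minimal sketch of the neuron in FIG. 1b, the following Python computes one neuron's output from its inputs and connection strengths. The logistic sigmoid for the nonlinear function and the convention of subtracting the threshold (named `theta` here, a hypothetical name) from the weighted sum are assumptions, since the threshold symbol and its sign convention are not recoverable from the text.

```python
import math


def neuron_output(x, w, theta):
    """Output of a single neuron (FIG. 1b): f(sum_i w_i * x_i - theta).

    Assumptions: f is the logistic sigmoid, and the threshold theta is
    subtracted from the weighted sum.
    """
    if len(x) != len(w):
        raise ValueError("x and w must both have n elements")
    net = sum(w_i * x_i for w_i, x_i in zip(w, x)) - theta
    return 1.0 / (1.0 + math.exp(-net))


# The same input gives a different output once one connection strength changes.
x = [1.0, 0.5]
print(neuron_output(x, [0.2, 0.4], theta=0.1))
print(neuron_output(x, [0.8, 0.4], theta=0.1))
```

Raising $w_1$ from 0.2 to 0.8 raises the output for the same input, which is the sense in which learning works by varying connection strengths.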
In early neural network research, a basic model called the single layer perceptron was proposed. The single layer perceptron includes one input layer and one output layer, and its connection strengths, namely its weights, form a single layer connecting the input layer to the output layer. These weights are mainly adjusted by a least mean square (LMS) learning process.
A procedure for operating the single layer perceptron will now be described, in conjunction with FIG. 6a.
First, consider the single layer perceptron, whose input data are $n$-dimensional vectors $X_i = [x_1, x_2, \ldots, x_n]$ and whose output data are $p$-dimensional vectors $Y_i = [y_1, y_2, \ldots, y_p]$. The weight connecting the $i$-th neuron of the output layer and the $j$-th neuron of the input layer is denoted $w_{ij}$.
Accordingly, the output obtained when an input $X_i$ is applied to the single layer perceptron is computed by vector-matrix multiplication and addition, as expressed by equation (1):
$$y_i = f\left(\sum_{j=1}^{n} w_{ij}\,x_j\right), \qquad i = 1, 2, \ldots, p \qquad (1)$$
wherein $f$ represents a nonlinear function.
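Equation (1) amounts to one weighted sum per output neuron followed by $f$. A small Python sketch, assuming the logistic sigmoid for $f$ and storing the weights in a hypothetical list `W` with one row per output neuron:

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def slp_forward(W, x, f=sigmoid):
    """Equation (1): y_i = f(sum_j w_ij * x_j) for i = 1..p.

    W is a hypothetical p x n list of rows; row i holds the weights w_i1..w_in
    that connect the n input neurons to output neuron i.
    """
    return [f(sum(w_ij * x_j for w_ij, x_j in zip(row, x))) for row in W]


W = [[0.5, -0.5, 0.0],  # weights into output neuron 1
     [1.0, 1.0, 1.0]]   # weights into output neuron 2
y = slp_forward(W, [1.0, 1.0, 0.0])
print(y)  # first element is sigmoid(0) = 0.5
```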
To learn $m$ pairs of learning data $(X_1, Y_1), (X_2, Y_2), \ldots, (X_m, Y_m)$, consisting of input values $X_i$ and output values $Y_i$, the weights of the single layer perceptron are adjusted according to the least mean square learning process, as expressed by equation (2):
$$w_{ij}(k+1) = w_{ij}(k) + \left[d_i(k) - y_i(k)\right] x_j(k) \qquad (2)$$
wherein $d_i(k)$ represents the desired output and $y_i(k)$ represents the actual output of the $i$-th output neuron at the $k$-th learning step.
Upon receiving one input $X_i$ of the learning data, the single layer perceptron first detects an output error by subtracting the actual output $y_i(k)$ from the desired output $d_i(k)$, in accordance with equation (2). Based on the obtained output error, the weights between the input layer and the output layer are then varied. Thereafter, a check is made to determine whether the output error has been reduced to a desired level. If it has, the learning procedure is completed; if not, the procedure returns to the step of detecting the output error.
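The procedure just described can be sketched as a training loop for a single output neuron. This sketch assumes a linear output (the Widrow-Hoff form of LMS) and adds a learning rate `eta`, which equation (2) as printed fixes at 1; the example data are made up for illustration.

```python
def lms_train(data, n_inputs, eta=0.1, tol=1e-6, max_epochs=10_000):
    """LMS learning of equation (2) for one linear output neuron.

    Assumptions: the output is linear (f is the identity, Widrow-Hoff form)
    and a learning rate eta is added; equation (2) as printed uses eta = 1.
    Returns (weights, epochs_used).
    """
    w = [0.0] * n_inputs
    for epoch in range(1, max_epochs + 1):
        sq_err = 0.0
        for x, d in data:
            y = sum(w_j * x_j for w_j, x_j in zip(w, x))           # actual output
            err = d - y                                             # desired minus actual
            w = [w_j + eta * err * x_j for w_j, x_j in zip(w, x)]   # vary the weights
            sq_err += err * err
        if sq_err / len(data) < tol:                                # desired level reached?
            return w, epoch
    return w, max_epochs


# Illustrative data generated from d = 2*x1 - x2.
data = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0), ([2.0, 1.0], 3.0)]
w, epochs = lms_train(data, n_inputs=2)
print(w, epochs)  # w approaches [2.0, -1.0]
```

A small `eta` keeps each correction from overshooting; with `eta = 1` and inputs such as `[2.0, 1.0]`, the per-sample update would overcorrect.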
As mentioned above, the single layer perceptron has the advantage that learning is accurate and fast, since each neuron of the input layer is directly connected to each neuron of the output layer with independent weights. However, it has the disadvantage that it can solve only linearly separable problems. A structure $(X_i, Y_i)$ that directly connects the input $X_i$ and the output $Y_i$, as in the single layer perceptron, is called a direct association. Therefore, a neural network such as the single layer perceptron can be viewed as a content addressable memory (CAM), which serves as an associative memory.
Generally, upon receiving an input, such an associative memory or CAM derives an output associated with the received input.
Compared with a location addressable memory (LAM), the CAM has the advantage that stored information can be recalled from only a partial representation of it.
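The difference between location-addressed and content-addressed recall can be illustrated with a toy lookup. This is a conceptual sketch only, not a neural network: the hypothetical `cam_recall` picks the stored binary pattern with the fewest mismatches on the known positions of a partial cue, where `None` marks an unknown bit.

```python
def cam_recall(memory, cue):
    """Content-addressed recall: return the stored pattern closest to a partial cue.

    memory: list of equal-length tuples of 0/1 bits.
    cue: tuple of the same length, with None marking unknown bits.
    Hypothetical toy (Hamming matching), not a neural network.
    """
    def mismatches(pattern):
        return sum(1 for p, c in zip(pattern, cue) if c is not None and p != c)
    return min(memory, key=mismatches)


memory = [(1, 0, 1, 1, 0, 0), (0, 1, 1, 0, 1, 0), (1, 1, 0, 0, 0, 1)]

# Location-addressed recall needs the address of the item.
print(memory[1])
# Content-addressed recall needs only part of the item itself.
print(cam_recall(memory, (0, 1, None, None, None, None)))  # (0, 1, 1, 0, 1, 0)
```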
When the patterns of a pattern recognition problem in an $n$-dimensional space can be separated exactly by $(n-1)$-dimensional hyperplanes, the problem is called a linearly separable problem.
FIG. 3a illustrates an example of such a linearly separable problem in a two-dimensional space. As shown in FIG. 3a, classes A and B in the two-dimensional space can be separated exactly by a straight line.
On the other hand, a pattern recognition problem that is not linearly separable is called a nonlinearly separable problem.
Referring to FIG. 3b, there is illustrated an example of such a nonlinearly separable problem in a two-dimensional space. As shown in FIG. 3b, the patterns X and O in the two-dimensional space cannot be separated by a straight line. This nonlinearly separable problem corresponds to XOR, namely exclusive OR logic, as depicted in its truth table.
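Applying equation (2) with a hard-limiting $f$ (the classic perceptron rule) illustrates this limitation. In this sketch, which assumes a bias weight on a constant input of 1 in place of the threshold, training reaches an error-free epoch on AND, which is linearly separable, but never on XOR, for which no separating line exists.

```python
def train_perceptron(samples, eta=1.0, max_epochs=100):
    """Equation (2) with a hard-limiting f (the classic perceptron rule).

    Assumption: a bias weight w0 on a constant input of 1 replaces the threshold.
    Returns (weights, separated) where separated is True once an epoch passes
    with every sample classified correctly.
    """
    w = [0.0, 0.0, 0.0]  # [w0 (bias), w1, w2]
    for _ in range(max_epochs):
        errors = 0
        for (x1, x2), d in samples:
            x = (1.0, x1, x2)
            y = 1 if sum(w_j * x_j for w_j, x_j in zip(w, x)) > 0 else 0
            if y != d:
                errors += 1
                w = [w_j + eta * (d - y) * x_j for w_j, x_j in zip(w, x)]
        if errors == 0:
            return w, True
    return w, False


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
print("AND separated:", train_perceptron(AND)[1])  # True
print("XOR separated:", train_perceptron(XOR)[1])  # False
```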
However, only a small minority of pattern recognition problems are linearly separable, while the majority are nonlinearly separable.
Accordingly, the single layer perceptron, which embodies the concept of direct association, cannot solve a nonlinearly separable problem such as that shown in FIG. 3b, due to the limited representation capability of the network itself, as proved by Minsky and Papert (M. L. Minsky and S. A. Papert, "Perceptrons: An Introduction to Computational Geometry", Cambridge, MA: MIT Press, expanded edition, 1988). As another kind of neural network, the multilayer perceptron, which is a cascade of single layer perceptrons, has been proposed.
The multilayer perceptron is adapted to eliminate the disadvantage of the single layer perceptron, namely its ability to solve only linearly separable problems. As shown in FIG. 2b, the multilayer perceptron is a neural network comprising at least three layers: one input layer, one output layer, and a hidden layer interposed between the input layer and the output layer. The multilayer perceptron realizes the concept of indirect association by associating input states $X_i$ with output states $Y_i$ through intermediate states $Z_i$ of the hidden layer. As shown in FIG. 4, the direct association $(X_i, Y_i)$ is regarded as the logical implicative rule IF $X_i$ THEN $Y_i$ (the IF-THEN rule), whereas the indirect association is regarded as the logical syllogism IF $X_i$ THEN $Z_i$, and IF $Z_i$ THEN $Y_i$.
Consequently, the indirect association produces 3-tuples $(X_i, Z_i, Y_i)$ by adding intermediate states $Z_i$ between the inputs $X_i$ and the outputs $Y_i$, so that each resulting association is easier to realize. In terms of logic, the indirect association can be interpreted as two direct associations.
In other words, the indirect association is the logical syllogism expressed by the rule "if $X_i$, then $Z_i$, and if $Z_i$, then $Y_i$", which can be separated into two direct associations expressed by two rules: "if $X_i$, then $Z_i$" and "if $Z_i$, then $Y_i$". By virtue of this separation, the multilayer perceptron with one hidden layer can be considered a cascade of single layer perceptrons.
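The cascade view can be checked on XOR with hand-chosen (not learned) weights, an illustrative assumption. The first single layer perceptron maps $X_i$ to intermediate states $Z_i$ = (OR, NAND), and the second maps $Z_i$ to $Y_i$ with AND.

```python
def step(a):
    return 1 if a > 0 else 0


def layer(W, b, x):
    """One single layer perceptron with hard-limited outputs."""
    return [step(sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i)
            for row, b_i in zip(W, b)]


# IF X THEN Z: intermediate states Z = (OR(x1, x2), NAND(x1, x2)).
W1, b1 = [[1, 1], [-1, -1]], [-0.5, 1.5]
# IF Z THEN Y: Y = AND(z1, z2).
W2, b2 = [[1, 1]], [-1.5]


def xor_net(x):
    z = layer(W1, b1, x)
    return layer(W2, b2, z)[0], z


for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    y, z = xor_net(x)
    print(x, "-> Z =", z, "-> Y =", y)
```

The intermediate states are `[0, 1]`, `[1, 1]`, `[1, 1]` and `[1, 0]`; only `[1, 1]` maps to Y = 1, so in Z space a single line separates the two classes, even though no line does so in X space.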
An error back propagation learning method has been commonly used as a learning method of the multilayer perceptron.
With the error back propagation learning method, a multilayer perceptron can be trained to approximate any function to any desired accuracy, provided that a sufficient number of hidden neurons is available. That such an approximation exists was proved by Hornik et al. (K. Hornik, M. Stinchcombe, and H. White, "Multilayer feedforward networks are universal approximators", Neural Networks, Vol. 2, No. 5, pp. 359-366, 1989).
In the sense of the indirect association shown in FIG. 4, the back propagation learning method can be interpreted as a method of automatically discovering intermediate states $Z_i$ that are linearly separable with respect to the given input data $X_i$ and output data $Y_i$.
The principle of error back propagation learning is as follows. Upon receiving one input of the learning data, the weight between the hidden layer and the output layer (the second weight) is first varied, using the error between the desired output and the actual output. Then, depending on the second weight, the weight between the input layer and the hidden layer (the first weight) is varied.
Operation of the multilayer perceptron will now be described, in conjunction with FIG. 6b illustrating an error back propagation learning procedure which is a gradient descent process carried out by the multilayer perceptron and FIG. 6c illustrating the order of error back propagation learning.
Consider the multilayer perceptron shown in FIG. 2b. Adding hidden layers to the single layer perceptron makes it possible for the neural network to solve nonlinearly separable problems. However, the least mean square learning process used for the single layer perceptron cannot be applied to the multilayer perceptron, because no desired output is given for the neurons of the added hidden layers. As a result, a new learning rule is required.
The error back propagation learning rule satisfies this requirement and can be expressed by equation (3):
$$w_{ij}(k+1) = w_{ij}(k) + \eta\,\delta_{pj}\,O_{pi} \qquad (3)$$
wherein $w_{ij}$ represents the connection strength, namely the weight, from the $i$-th neuron to the $j$-th neuron, and $\eta$ represents a learning constant.
In equation (3), $\delta_{pj}$ is the error obtained in the $j$-th neuron upon receiving the $p$-th input. With a sigmoid nonlinear function $f$, for which $f' = f(1 - f)$, the error is expressed by equation (4) for the output layer 4 and by equation (5) for the hidden layer:
$$\delta_{pj} = \left(d_{pj} - O_{pj}\right) O_{pj}\left(1 - O_{pj}\right) \qquad (4)$$
$$\delta_{pj} = O_{pj}\left(1 - O_{pj}\right) \sum_{l} \delta_{pl}\, w_{jl} \qquad (5)$$
In equations (3), (4) and (5), $O_{pj}$ represents the actual output value of the $j$-th neuron and $d_{pj}$ represents the desired output value of the $j$-th neuron; the sum in equation (5) runs over the neurons $l$ of the layer following the $j$-th neuron. Also, $k$ represents the number of learning iterations, and $i$, $j$, $l$, $p$ and $k$ represent positive integers.
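One way to confirm equations (3) to (5) is to compare them with a numerical derivative of the squared error $E = \frac{1}{2}(d - O)^2$. The sketch below assumes a tiny 2-2-1 sigmoid network without bias terms and arbitrary illustrative weights; if the equations are consistent, $\delta_{pj} O_{pi}$ matches $-\partial E / \partial w_{ij}$ for every first-layer weight.

```python
import math


def sig(a):
    return 1.0 / (1.0 + math.exp(-a))


def forward(W1, W2, x):
    """2-2-1 network without bias. W1[i][j]: input i -> hidden j; W2[j]: hidden j -> output."""
    h = [sig(sum(x[i] * W1[i][j] for i in range(len(x)))) for j in range(len(W2))]
    o = sig(sum(h[j] * W2[j] for j in range(len(W2))))
    return h, o


def error(W1, W2, x, d):
    return 0.5 * (d - forward(W1, W2, x)[1]) ** 2


def deltas(W1, W2, x, d):
    h, o = forward(W1, W2, x)
    d_out = (d - o) * o * (1 - o)                                        # equation (4)
    d_hid = [h[j] * (1 - h[j]) * d_out * W2[j] for j in range(len(h))]   # equation (5)
    return d_out, d_hid


def numeric_grad_W1(W1, W2, x, d, i, j, eps=1e-6):
    """Central-difference estimate of dE/dW1[i][j]."""
    plus = [row[:] for row in W1]
    minus = [row[:] for row in W1]
    plus[i][j] += eps
    minus[i][j] -= eps
    return (error(plus, W2, x, d) - error(minus, W2, x, d)) / (2 * eps)


W1 = [[0.3, -0.2], [0.4, 0.1]]   # arbitrary illustrative weights
W2 = [0.5, -0.6]
x, d = [1.0, 0.5], 1.0
d_out, d_hid = deltas(W1, W2, x, d)
for i in range(2):
    for j in range(2):
        # Equation (3) moves w_ij along delta_pj * O_pi, where O_pi = x[i] here.
        print(i, j, d_hid[j] * x[i], -numeric_grad_W1(W1, W2, x, d, i, j))
```

The two printed columns agree to many decimal places, which is what makes the update of equation (3) a gradient descent step on $E$.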
FIG. 7a is an energy graph for the case where the weights are controlled in accordance with the least mean square learning process. When the weights converge to the minimum M, the learning is completed.
FIG. 7b is an energy graph for the case where the weights are controlled in accordance with the error back propagation learning method. When the weights converge to the global minimum GM, the learning is completed.
Upon receiving one input of the learning data, the multilayer perceptron subtracts the actual output from the desired output, in accordance with equations (3), (4) and (5), to detect an output error.
Using the detected output error, the second weight between the hidden layer and the output layer is varied. Then, the first weight between the input layer and the hidden layer is varied, using the error propagated back through the second weight. A check is then made to determine whether the output error has been reduced to a desired level. If it has, the learning procedure is completed; if not, the procedure returns to the step of detecting the output error.
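The loop just described can be sketched in pure Python for one hidden layer, trained online on XOR. The sigmoid $f$, the constant bias inputs standing in for thresholds, the uniform random initialisation, and the values of `eta`, `tol` and `n_hidden` are all assumptions; the hidden-layer errors are computed from the second weights before those weights are varied.

```python
import math
import random


def sig(a):
    return 1.0 / (1.0 + math.exp(-a))


def train_bp(data, n_hidden=3, eta=0.5, tol=0.01, max_epochs=10_000, seed=0):
    """Online error back propagation (equations (3)-(5)) with one hidden layer.

    Assumptions: sigmoid f, a constant input of 1 appended to the input and
    hidden layers in place of thresholds, and uniform random initial weights.
    Returns (epochs_used, mean_squared_error_of_last_epoch).
    """
    rng = random.Random(seed)
    n_in = len(data[0][0]) + 1                                   # +1: bias input
    W1 = [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_in)]
    W2 = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]       # +1: hidden bias
    mse = float("inf")
    for epoch in range(1, max_epochs + 1):
        sq_err = 0.0
        for x, d in data:
            xs = list(x) + [1.0]
            h = [sig(sum(xs[i] * W1[i][j] for i in range(n_in))) for j in range(n_hidden)]
            h.append(1.0)                                        # hidden bias unit
            o = sig(sum(h[j] * W2[j] for j in range(n_hidden + 1)))
            sq_err += (d - o) ** 2                               # detect the output error
            d_out = (d - o) * o * (1 - o)                        # equation (4)
            d_hid = [h[j] * (1 - h[j]) * d_out * W2[j]           # equation (5), using the
                     for j in range(n_hidden)]                   # second weights before update
            for j in range(n_hidden + 1):                        # vary the second weights
                W2[j] += eta * d_out * h[j]
            for i in range(n_in):                                # then the first weights
                for j in range(n_hidden):
                    W1[i][j] += eta * d_hid[j] * xs[i]
        mse = sq_err / len(data)
        if mse < tol:                                            # desired level reached?
            return epoch, mse
    return max_epochs, mse


XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
epochs, mse = train_bp(XOR, seed=1)
print(f"stopped after {epochs} epochs, MSE = {mse:.4f}")
```

Because the outcome depends on `seed`, some runs may need many more epochs than others, and some may stop at `max_epochs` without reaching `tol`.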
The multilayer perceptron utilizing the error back propagation learning method has an advantage of solving the nonlinearly separable problem, as mentioned above. However, it encounters the following problems, as proved by Hornik.
First, the weights are likely to converge to a local error minimum LM, as shown in FIG. 7b.
Second, learning is very slow, because only the error of the output layer is used, through the gradient descent process, both to automatically discover the intermediate states $Z_i$ that yield the $m$ most suitable indirect associations $U(X_i, Z_i, Y_i)$ for the $m$ given associations $U(X_i, Y_i)$ and to adjust the weights of each layer.
Third, the learning performance is highly sensitive to the initial weights, as shown in FIG. 7b. In other words, the learning performance varies depending on the selected initial weights, and therefore becomes inconsistent. For example, where the initial weights $W(0)$ are positioned at points A, B, C and D, the corresponding learning performances have the following order:
$$A > B > C > D \qquad (6)$$
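The dependence on $W(0)$ can be seen even on a one-weight toy energy function. The function $E(w) = w^4 - 3w^2 + w$ below is a hypothetical stand-in for the curve of FIG. 7b, with a global minimum near $w \approx -1.30$ and a local minimum near $w \approx 1.13$; plain gradient descent ends in whichever basin contains the initial weight.

```python
def energy(w):
    """Hypothetical one-weight energy with a local and a global minimum."""
    return w**4 - 3 * w**2 + w


def d_energy(w):
    return 4 * w**3 - 6 * w + 1


def descend(w0, eta=0.01, steps=2000):
    """Plain gradient descent on energy(w) starting from the initial weight w0."""
    w = w0
    for _ in range(steps):
        w -= eta * d_energy(w)
    return w


for w0 in (-2.0, 0.0, 0.5, 2.0):
    w = descend(w0)
    print(f"W(0) = {w0:+.1f} -> w = {w:+.4f}, E = {energy(w):+.4f}")
```

Starting points -2.0 and 0.0 reach the global minimum, while 0.5 and 2.0 settle in the local minimum, although all four runs use the same learning rule.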
Fourth, the learning efficiency varies depending on the order in which the learning data are presented.
In terms of the above-mentioned concept of indirect association, these problems mean that the neural network, namely the multilayer perceptron, cannot determine proper intermediate states $Z_i$.