The present invention relates to a neural network learning method particularly suitable for an information processing technology for realizing a pattern identification apparatus.
Pattern identification can be regarded as comprising the steps of expressing a pattern as a multidimensional characteristic value, i.e. a pattern vector, and then classifying the vector given as an input signal. Pattern identification technology is applicable in various fields, such as diagnosing problems from given symptoms, discrimination of letters, classification and discrimination of signals, and analysis of images. In general, it is difficult to express the classification process as an algorithm because of the multiplicity of the input vectors (multidimensional characteristic values for expressing an input matter). The multiplicity of the inputs referred to herein means that there is a non-negligible distribution range, i.e. a lack of uniformity, even among input patterns belonging to the same class, caused by deformation and positional displacement in the case of shapes, and by noise and bias in the case of signals.
It is generally thought that neural networks are effective at solving such pattern classification problems, for the following reasons. First, a neural network has the ability to self-organize, by learning, the mapping (concretely, the formation of cells and of the connections between respective cells) necessary for the classification of vectors, such as "such an input vector should be classified in such a manner". Second, the mapping thus formed is capable of solving the non-linear separation problems encountered in various pattern classification problems. Furthermore, the neural network is able to incorporate, as prior knowledge, a characteristic quantity commonly possessed by the input pattern vectors which belong to the same class, as well as the distribution range of that characteristic quantity.
The neural network may be called a signal processing system simulating a nerve circuit network of a living organism. The neural network is basically composed of cells corresponding to neurons and wiring connections performing signal transfer between the cells. The signal represents the activity of a cell, in other words, the degree of excitation of a neuron, and is expressed as a numeric value. The numeric value is calculated as follows. The signal to be input into a target cell, the degree of activity of which is to be determined, is first obtained from the degree of activity of each cell connected to the target cell and a numerical value representing the binding or connection weight therebetween. Next, an output signal is obtained by applying, to the input signal thus obtained, a transfer function prescribing the output characteristic of the cell. The degree of activity of the target cell can thus be obtained.
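By way of illustration only, the two-step calculation described above may be sketched as follows. The names `sigmoid` and `cell_activity` are hypothetical, and the logistic function is merely assumed as one common transfer function; the description itself does not fix a particular one.

```python
import math

def sigmoid(x):
    """Logistic transfer function (an assumed choice, not specified in the text)."""
    return 1.0 / (1.0 + math.exp(-x))

def cell_activity(source_activities, weights, transfer=sigmoid):
    """Degree of activity of a target cell, given the cells connected to it."""
    # Step 1: input signal = sum over connected cells of activity x connection weight.
    net_input = sum(a * w for a, w in zip(source_activities, weights))
    # Step 2: the transfer function maps the input signal to the output signal.
    return transfer(net_input)

# Three connected cells with activities 1.0, 0.5, 0.0 and illustrative weights.
activity = cell_activity([1.0, 0.5, 0.0], [0.4, -0.8, 0.3])
```

In this example the weighted contributions cancel, so the input signal is zero and the logistic function places the activity at its midpoint.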
Various models of neural networks have been proposed, differing in signal propagating direction, learning algorithm and wiring connection method between respective cells. Neural network research has progressed remarkably with the introduction of a mathematical model called the back propagation technique. In this mathematical model, the neural network is provided with a multilayer structure including input and output layers and a plurality of intermediate layers, and the respective cells are connected in a random manner. This structure is a fixed structure having no specific relation to the learning.
Only the binding intensity, i.e. connection weight, between the cells is changed by the learning, and it is determined as follows. An input vector and a desired output vector, called a teacher signal, are first prepared for the learning. Then, it is determined whether or not the neural network can realize, in the output layer, an activity pattern, i.e. output vector, according to the teacher signal with respect to the input vector.
In a case where the output vector does not conform to the teacher signal, the amount of error, i.e. the deviation from the solution, is transmitted to each output cell having an activity different from the teacher signal. For a cell having an activity larger than that shown in the teacher signal, the following two steps are carried out to lower the activity: one is to lower the binding intensity between it and an intermediate cell transmitting a signal for activation, and the other is to lower the activity of this intermediate cell itself. In order to lower the activity of the intermediate layer, the error in the output layer is back propagated towards the input cells, thereby successively amending the binding intensity to each preceding-stage cell on the input side. On the other hand, in a case where the activity is lower than the teacher signal, the output error may be reduced by substantially the same steps as those described above. Such an error correction method, in which the error correction is made from the output side towards the input side, is called the back propagation method.
The back propagation method or technique reduces to a so-called error minimizing problem, in which the coupling degree is adjusted so as to minimize the square error between the output signal and the teacher signal over various pairs of input and teacher signals. Accordingly, when this back propagation technique is applied to a neural network having a complicated multilayer structure, a considerable amount of computing time is required for the learning to converge. Moreover, since the size of the neural network, and particularly the size or scale of the intermediate layers, is fixed with no relation to the content of the learning, and since there is no theory for determining a suitable size in advance, the size must be determined by repeated trial and error. Furthermore, there is a limited ability for storing a pattern identification in the learning.
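The error minimizing view may be illustrated by the following minimal sketch, which is not taken from the cited technique itself. It assumes linear output cells and shows only one gradient step on the weights into the output layer; the full back propagation method would also propagate the error to the preceding layers. All function names and the learning rate `rate` are hypothetical.

```python
def forward(weights, hidden):
    """Output vector of linear output cells (linearity is an assumption here)."""
    return [sum(w * h for w, h in zip(row, hidden)) for row in weights]

def squared_error(output, teacher):
    """Square error between the output vector and the teacher signal."""
    return sum((o - t) ** 2 for o, t in zip(output, teacher))

def update_output_weights(weights, hidden, output, teacher, rate=0.1):
    """One gradient step; weights[k][j] connects intermediate cell j to output cell k."""
    return [
        # d(error)/d(w_kj) = 2 * (o_k - t_k) * h_j for a linear output cell.
        [w - rate * 2.0 * (o - t) * h for w, h in zip(row, hidden)]
        for row, o, t in zip(weights, output, teacher)
    ]

hidden = [1.0, 0.5]      # activities of two intermediate cells
teacher = [1.0]          # teacher signal for one output cell
weights = [[0.2, 0.4]]

error_before = squared_error(forward(weights, hidden), teacher)
weights = update_output_weights(weights, hidden, forward(weights, hidden), teacher)
error_after = squared_error(forward(weights, hidden), teacher)
```

Here the output activity is lower than the teacher signal, so the step raises the binding intensities and the square error decreases; many such steps over many input and teacher pairs are what makes convergence costly.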
In order to obviate such defects of the back propagation technique as described above, there has been provided a neural model called "RCE Neural Network" (refer to D. L. Reilly, L. N. Cooper and C. Elbaum, "A Neural Model Category Learning", Biol. Cybern., vol. 45, pp. 35-41, 1982; or D. L. Reilly, L. N. Cooper and C. Elbaum, "Self Organizing Pattern Separator and Identifier," U.S. Pat. No. 4,326,259, awarded Apr. 20, 1982). In this model, the neural network is provided with a three-layer structure including input, output and intermediate layers, in which adjacent layers are mutually coupled. A characteristic feature of the RCE neural network, which differs significantly from the back propagation technique, is that the intermediate layer cells are automatically produced by the learning. Namely, each of the intermediate layer cells is activated only by a specific pattern of the input vector given to the cell, and the output layer cell is then activated so as to realize, on the output layer, an output vector indicating the correct attribute of the input as a response. The fact that the intermediate layer can be self-organized by the learning is a remarkable advantage of the RCE neural network.
In the RCE neural network, a signal propagation is performed in the following manner.
First, when an input vector I.sub.i (i=1, M) is given to an input layer composed of M cells (M: natural number), the input vector I.sub.i is transferred to each of the intermediate layer cells. Each of the intermediate layer cells stores an M-dimensional vector W.sub.i called a coupling vector, a threshold .xi., and an attribute vector P.sub.j (j=1, N) (N: number of output layer cells) representing an attribute value in a case where the vector W.sub.i is deemed to be an input vector. Each of the intermediate layer cells is activated in a case where the distance between the input vector I and its own coupling vector W is smaller than the threshold .xi.. In accordance with this result, an activation signal is outputted so as to realize the pattern of the attribute vector P.sub.j (j=1, N) on the output layer. Namely, each intermediate layer cell discriminates whether or not the input vector I has the same attribute value as that of the coupling vector W, within the allowable range .xi. centered on the point indicated by the stored coupling vector W.
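The signal propagation described above may be sketched as follows, for illustration only. The class `IntermediateCell` and the function `respond` are hypothetical names. The Euclidean distance is assumed, since the description speaks only of "a distance", and combining the attribute vectors of several active cells by an element-wise maximum is likewise an assumption.

```python
import math

class IntermediateCell:
    def __init__(self, coupling, threshold, attribute):
        self.coupling = list(coupling)    # coupling vector W (M-dimensional)
        self.threshold = threshold        # threshold xi
        self.attribute = list(attribute)  # attribute vector P (N-dimensional)

    def is_active(self, input_vector):
        # Active when the distance between I and W is smaller than xi.
        return math.dist(input_vector, self.coupling) < self.threshold

def respond(cells, input_vector, n_outputs):
    """Output vector realized on the output layer by the active cells."""
    output = [0] * n_outputs
    for cell in cells:
        if cell.is_active(input_vector):
            # Assumed combination rule: element-wise maximum of attribute vectors.
            output = [max(o, p) for o, p in zip(output, cell.attribute)]
    return output

cells = [
    IntermediateCell([0.0, 0.0], 1.0, [1, 0]),  # first attribute
    IntermediateCell([3.0, 0.0], 1.0, [0, 1]),  # second attribute
]
near_first = respond(cells, [0.5, 0.0], 2)
near_second = respond(cells, [2.5, 0.2], 2)
no_response = respond(cells, [1.5, 0.0], 2)
```

An input lying outside every activation area, such as the last one, produces no response on the output layer.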
The above principle will be described hereunder with reference to FIG. 4, showing an example in which hatched areas 10 and 20 are classified as having attributes .alpha. and .beta., respectively. The areas 10 and 20 may each be covered by a plurality of intermediate layer cells so that the neural network correctly discriminates the classification. In FIG. 4, the areas enclosed by the respective circles represent the activation areas of the respective intermediate layer cells. The attribute of the input signal is discriminated to be .alpha. by the activation of one of the five intermediate layer cells .alpha.1 to .alpha.5, and to be .beta. by the activation of one of the four intermediate layer cells .beta.1 to .beta.4.
The learning is carried out by two methods, one being the production of intermediate layer cells and the other being the change of the threshold. Basically, a pair consisting of a teacher signal and an input signal to be learned is presented, and the learning is controlled by the deviation, i.e. error signal, between the teacher signal and the output signal of the network with respect to the input signal.
The production of an intermediate layer cell occurs in the following case. When the network gives no response to a certain input signal, the intermediate layer cell necessary for discriminating the attribute of the input vector of this input signal is absent, and a new intermediate layer cell has to be produced. The input vector is stored in the newly produced intermediate layer cell as the coupling vector W, and the teacher signal is stored therein as the attribute vector P. An appropriate initial value is set for the threshold .xi..
In a case where the response to the input signal is erroneous, the reason may reside in the activation of an intermediate layer cell which should not have been activated, and accordingly, such an error in the response is eliminated by suppressing the activation of this erroneously activated intermediate layer cell. This is carried out by reducing the threshold .xi. of that intermediate layer cell. Namely, by making the threshold small, the activation area is reduced so that the input vector falls outside the area.
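The two learning operations may be sketched as follows. This is a minimal illustration under stated assumptions rather than the method of the cited references: the dictionary keys `"W"`, `"xi"` and `"P"`, the function names, the initial threshold of 1.0, the Euclidean distance, and the choice of shrinking the threshold to exactly the distance to the input vector are all assumptions.

```python
import math

def active_cells(cells, x):
    """Cells whose activation area (radius xi around W) contains x."""
    return [c for c in cells if math.dist(x, c["W"]) < c["xi"]]

def learn_example(cells, x, teacher, initial_xi=1.0):
    """Apply one learning step for the pair (input vector x, teacher signal)."""
    active = active_cells(cells, x)
    if not active:
        # No response: produce a new cell storing x as W and the teacher as P.
        cells.append({"W": list(x), "xi": initial_xi, "P": tuple(teacher)})
        return
    for cell in active:
        if cell["P"] != tuple(teacher):
            # Erroneous response: reduce the threshold so that x lies outside.
            cell["xi"] = math.dist(x, cell["W"])

cells = []
learn_example(cells, [0.0, 0.0], (1, 0))  # no response, so a cell is produced
learn_example(cells, [0.6, 0.0], (0, 1))  # wrong response, threshold is reduced
learn_example(cells, [0.6, 0.0], (0, 1))  # no response now, so a cell is produced
```

Because activation requires the distance to be strictly smaller than the threshold, setting the threshold equal to the distance is sufficient to exclude the input vector; the second presentation of the same pair then finds no response and produces a new cell.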
The characteristic features of the RCE neural network described hereinbefore will be summarized as follows.
(1) Since the intermediate layer can be self-organized, there is no limit on the amount of memory.
(2) Since each of the memories is stored in each of the intermediate layer cells, the mutual interference between the respective memories is within the range of the threshold of the intermediate layer cell. Accordingly, only some of the intermediate layer cells are affected by the learning, so that any blocking against correct memory in accordance with the increase of an amount of learning is substantially reduced.
(3) Since a new mapping is stored by the production of the intermediate layer cell, the learning is finished in a short time period in comparison with the error back propagation technique.
As described above, the RCE neural network is effective for pattern classification, but presents the following problem.
The learning of the RCE neural network is based on storing all of the examples in the plurality of intermediate layer cells, in such a manner that each pair of a presented input signal and its teacher signal is stored in one intermediate layer cell corresponding to one example. Accordingly, when the number of examples to be learned increases, the number of intermediate layer cells must be increased in proportion. Namely, increasing the dimensions of the input vectors necessarily increases the number of learning examples, and therefore a large number of intermediate layer cells has to be produced during the learning process in order to realize a neural network having an acceptable pattern identifying ability.
However, in the actual classification, there is a case in which the attribute of the input vector can be discriminated only by searching a partial dimensional space of a multidimensional input vector space. This case will be explained with reference to FIG. 1, for example, which represents the pattern classification in the three-dimensional space.
Referring to FIG. 1, it is supposed that the neural network is trained for the classification so that a point in an area .alpha. has the attribute .alpha. and a point in a cylindrical area .beta. perpendicular to an X-Y plane has the attribute .beta.. The RCE neural network learns the area .alpha. by covering the entire area .alpha. with several three-dimensional globes each having an appropriate radius. In this sense, FIG. 1 shows an example of a mapping representing the attribute of the area .alpha. realized by eight globes, i.e. eight intermediate layer cells. In FIG. 1, the radius of each globe is equal to the threshold stored in the corresponding intermediate layer cell.
Regarding the area .beta., since this area extends indefinitely in the Z-axis direction, it is impossible to cover it entirely with a finite number of intermediate layer cells. However, the attribute of the input vector can be discriminated by whether or not the projection I' of the input vector on the X-Y plane lies in an area .gamma., which is the projection of the area .beta. on the X-Y plane. Since the area .gamma. can be substantially entirely covered with a finite number of circles, the mapping necessary for the discrimination of the indefinitely extending area .beta. can be prepared with a finite number of intermediate layer cells by shifting the problem to such a partial dimensional space. For such a problem, in the conventional technique, a separate network specific to the partial dimensional space had to be prepared for the discrimination of the attribute .beta..
Thus, when such mapping in a partial dimensional space is involved, the conventional RCE neural network technology cannot carry out the pattern classification with only one neural network.
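The discrimination in the partial dimensional space may be illustrated by the following sketch. The function name `in_area_beta` and the two covering circles in `gamma_cover` are hypothetical values chosen for illustration; they are not taken from FIG. 1.

```python
import math

def in_area_beta(x, circles):
    """True if the projection I' of x on the X-Y plane lies in a covering circle."""
    projected = x[:2]  # discard the Z component: I' = (x, y)
    return any(math.dist(projected, center) < radius for center, radius in circles)

# Assumed finite cover of the area gamma by two circles (center, radius).
gamma_cover = [((0.0, 0.0), 1.0), ((1.0, 0.0), 1.0)]

far_along_z = [0.2, 0.1, 1000.0]
inside_by_projection = in_area_beta(far_along_z, gamma_cover)

# A three-dimensional globe of the same radius at the origin misses this point.
inside_globe = math.dist(far_along_z, [0.0, 0.0, 0.0]) < 1.0
```

The point lies far along the Z axis, so no globe of finite radius near the origin contains it, whereas its projection falls inside the first circle covering the area .gamma..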