1. Field of the Invention
The present invention relates to a pattern recognition method for categorizing input data, such as speech input data or image input data.
2. Description of the Related Art
The back propagation type neural network (hereinafter referred to simply as a neural network), as proposed by Rumelhart, et al., has been considered in connection with pattern recognition apparatus which categorize input data into one of plural categories. The construction of a neural network for determining the category will now be explained with reference to FIG. 3.
In FIG. 3, reference numeral 301 denotes elements U^I_1, U^I_2, . . . of an input layer. The elements of this layer send a received input value to all the elements of the next layer 302. Reference numeral 302 denotes elements U^S_1, U^S_2, . . . of an intermediate layer. These elements send an output of

y = f( Σ_i W_i x_i − θ )

with respect to inputs x_i (i = 1, 2, . . . ) from the elements of the preceding layer 301 to all the elements of the output layer 303, where W_i is called a weight coefficient and θ is a bias, both of which are values inherent to the element. f is called a transfer function, and usually is a sigmoid function f(s) = 1/(1 + exp(−s)). Reference numeral 303 denotes elements U^O_1, U^O_2, . . . of an output layer. These elements output

z = f( Σ_j W_j y_j − θ )

with respect to inputs y_j (j = 1, 2, . . . ) from the elements of the intermediate layer 302 in the same manner as above. The number of elements of the output layer 303 is the same as the number of categories to be recognized. If the weight coefficients and biases of all elements are ideally determined, only the output from the element which corresponds to the category to which the input belongs becomes 1, and the outputs from the other elements become 0.
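The computation performed by a single element above can be sketched as follows (a minimal sketch; the function name `unit_output` and the numeric values are illustrative, not part of the specification):

```python
import math

def unit_output(x, w, theta):
    """Output of one element: f(sum_i w_i * x_i - theta),
    with the sigmoid transfer function f(s) = 1 / (1 + exp(-s))."""
    s = sum(wi * xi for wi, xi in zip(w, x)) - theta
    return 1.0 / (1.0 + math.exp(-s))

# Example with two inputs, two weights, and a bias (illustrative values).
# Here the weighted sum minus the bias is ~0, and f(0) = 0.5.
y = unit_output(x=[0.5, -1.0], w=[0.8, 0.3], theta=0.1)
```

Because f is a sigmoid, the element's output always lies strictly between 0 and 1, which is what allows the ideal outputs described above to be interpreted as 1 for the matching category and 0 for the others.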
While FIG. 3 shows only a single intermediate layer, plural intermediate layers may be used.
A learning algorithm for determining the weight coefficient of each element and the value of the bias is described in detail in "Parallel Distributed Processing, Vol. 1" (Rumelhart et al., MIT Press, 1986).
Next, an explanation will be given of the meaning of signal processing in such a neural network. For simplicity, a neural network having a single intermediate layer will be considered. If the outputs from the input layer and those from the intermediate layer are collectively denoted by vectors x and y, respectively, the following relation holds between them:

y = f(W·x − θ)   (1)
In the above equation, f represents a vector-valued function that applies the sigmoid function to each component of its argument. W·x − θ obviously represents an affine transformation. The mapping from the input layer to the intermediate layer can thus be construed as an affine transformation followed by a sigmoid transformation that limits each value to the interval [0, 1].
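This layer-level view, an affine transformation W·x − θ followed by the component-wise sigmoid, might be sketched as follows (a sketch only; `layer` and the numeric values are illustrative):

```python
import math

def layer(x, W, theta):
    """Apply the affine transformation W.x - theta, then squash each
    component into (0, 1) with the sigmoid transfer function."""
    affine = [sum(wij * xj for wij, xj in zip(row, x)) - t
              for row, t in zip(W, theta)]
    return [1.0 / (1.0 + math.exp(-s)) for s in affine]

x = [1.0, 0.0]
W = [[2.0, -1.0],   # weights of intermediate element 1
     [0.5,  0.5]]   # weights of intermediate element 2
theta = [1.0, 0.5]  # one bias per element
y = layer(x, W, theta)  # every component lies strictly in (0, 1)
```

Stacking two such calls, `layer(layer(x, W1, theta1), W2, theta2)`, reproduces the full input-to-output signal flow of the single-intermediate-layer network considered here.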
FIG. 2 is a view which illustrates an example of the distribution of an input vector. For simplicity, let the input vector x be a two-dimensional vector, and let the object categories be of three types, C_1, C_2 and C_3. It is considered that when learning in accordance with the back propagation algorithm is completed, θ converges on a vector close to the average of all the learning data, as shown by reference numeral 201 in FIG. 2.
The outputs from the intermediate layer and those from the output layer are represented in the same manner as in equation (1):

z = f(W·y − θ)   (2)
This should be construed in another way. That is, in equation (2), W·y − θ can be regarded as a set of linear discriminant functions, and the values obtained by putting each of its components into the sigmoid function become the final output z.
In the back propagation algorithm, an amount of correction ΔW_ij for the weight coefficient W_ij is calculated by the following equation:

ΔW_ij = −η ∂E/∂W_ij

where η is a positive constant, and E is the error of the entire network, that is,

E = (1/2) Σ_j (t_j − y_j)²

where t_j is a supervisor signal, and y_j is an output from the output layer. The partial derivative ∂E/∂W_ij is calculated by using the output from each layer. A derivation thereof is known and is therefore omitted.
W_ij is sequentially corrected by using this amount of correction ΔW_ij:

W_ij(n+1) = W_ij(n) + ΔW_ij
Therefore, it is necessary that the weight coefficients be corrected (this step is called learning) starting from a good initial value W_ij(0). A similar correction is performed for the bias θ by regarding it as a weight coefficient for an input whose value is 1 at all times.
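The sequential correction above can be sketched for a single sigmoid element trained on one learning sample (a minimal sketch assuming the squared error E = (t − y)²/2 from the preceding equations; `train_step` and all numeric values are illustrative):

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def train_step(w, x, t, eta):
    """One correction W(n+1) = W(n) + dW with dW_i = -eta * dE/dW_i,
    for a single sigmoid element and error E = (t - y)^2 / 2.
    The last component of x is fixed at 1, so the last weight plays
    the role of the bias theta (its sign absorbed into the weight)."""
    s = sum(wi * xi for wi, xi in zip(w, x))
    y = sigmoid(s)
    # dE/dW_i = -(t - y) * f'(s) * x_i, where f'(s) = y * (1 - y)
    delta = (t - y) * y * (1.0 - y)
    return [wi + eta * delta * xi for wi, xi in zip(w, x)]

# Repeated corrections starting from an initial value W(0).
w = [0.2, -0.1, 0.0]           # last entry acts as the bias
for _ in range(1000):
    w = train_step(w, x=[1.0, 0.5, 1.0], t=1.0, eta=0.5)
# After learning, the element's output for this input approaches
# the supervisor signal t = 1.
```

Appending a constant input of 1 is exactly the trick described above for treating θ as an ordinary weight coefficient, so a single update rule corrects both the weights and the bias.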
Because the weight coefficient W and the bias θ are corrected sequentially, the back propagation algorithm depends upon the manner in which their initial values are given. That is, if the initial values are not given appropriately, the solution may fall into a local minimum, or may diverge or oscillate. At the same time, the greater the number of parameters to be determined, such as W and θ, the greater the degree of freedom, making it difficult to arrive at a correct solution. As explained above, since both W and θ are important, it is difficult to omit them.