1. Field of the Invention
The present invention relates to a feature extracting device for calculating features in pattern recognition such as image recognition, and more particularly to a feature extracting device having a pattern learning function such as a neural network.
2. Description of the Related Art
As a method of deciding features for use in pattern recognition from a group of learning patterns, discriminant analysis is well known and widely used (see, for example, the methods disclosed in Japanese Patent Publication Laid-Open (Kokai) No. Heisei 01-321591).
Discriminant analysis is a method of deciding the features to be extracted so as to obtain the greatest difference between classes (categories) while restraining the variation of features within a class (within-category variation) (for example, refer to "Mathematical Study on Feature Extraction in Pattern Recognition" written by Ootsu, Electro-Technical Laboratory Report No. 818, 1981). The method is characterized by a high ability to separate classes compared with other feature deciding methods such as principal component analysis.
A brief description of discriminant analysis is given here. Assume that a group of learning patterns is given, together with the classes to which these patterns belong.
In discriminant analysis, the within-class covariance matrix $S_w$ and the between-class covariance matrix $S_b$ are obtained from these learning patterns, and the characteristic equation $S_w^{-1} S_b \cdot f_i = \lambda_i \cdot f_i$ is then solved.
A predetermined number M of characteristic vectors $f_i$ are selected from the characteristic vectors thus obtained, in decreasing order of the characteristic value $\lambda_i$.
Feature extraction is performed by calculating the inner products $Z_i = (f_i, X)$ ($i = 1$ to $M$) of an objective input pattern X with these characteristic vectors; each $Z_i$ is an extracted feature.
According to the above discriminant analysis, linear feature extraction in which variation within class is small and difference between classes is large can be achieved, as is well known.
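As an illustration only, the procedure above can be sketched in Python for the special case of two classes of two-dimensional patterns. In that case $S_b$ has rank one, so the single characteristic vector with a non-zero characteristic value is proportional to $S_w^{-1}(m_a - m_b)$, where $m_a$ and $m_b$ are the class means, and no general eigenvalue solver is needed. The function names and the sample patterns are assumptions made for this sketch and do not come from the cited references.

```python
def mean(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def within_class_scatter(classes):
    """Sum over classes of (x - class mean)(x - class mean)^T for 2-D patterns.
    The scale differs from a covariance matrix but the direction found is the same."""
    sw = [[0.0, 0.0], [0.0, 0.0]]
    for patterns in classes:
        m = mean(patterns)
        for x in patterns:
            d = [x[0] - m[0], x[1] - m[1]]
            for r in range(2):
                for c in range(2):
                    sw[r][c] += d[r] * d[c]
    return sw

def discriminant_vector(class_a, class_b):
    """Two-class case: f is proportional to Sw^-1 (m_a - m_b)."""
    sw = within_class_scatter([class_a, class_b])
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    ma, mb = mean(class_a), mean(class_b)
    d = [ma[0] - mb[0], ma[1] - mb[1]]
    f = [inv[0][0] * d[0] + inv[0][1] * d[1],
         inv[1][0] * d[0] + inv[1][1] * d[1]]
    norm = (f[0] ** 2 + f[1] ** 2) ** 0.5
    return [f[0] / norm, f[1] / norm]

def extract_feature(f, x):
    """Inner product Z = (f, X)."""
    return f[0] * x[0] + f[1] * x[1]

# Hypothetical learning patterns for two classes.
class_a = [[1.0, 2.0], [2.0, 3.0], [3.0, 3.0], [2.0, 2.0]]
class_b = [[5.0, 1.0], [6.0, 2.0], [7.0, 2.0], [6.0, 1.0]]
f = discriminant_vector(class_a, class_b)
z_a = [extract_feature(f, x) for x in class_a]
z_b = [extract_feature(f, x) for x in class_b]
```

For these sample patterns the extracted values of the two classes do not overlap, which is the separation the discriminant analysis aims at. With more than two classes, or when M is greater than 1, a general eigenvalue solver would be required instead of the closed form used here.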
On the other hand, as a method of learning the input/output relationship of patterns using a group of learning patterns consisting of pairs of an input pattern and an output pattern, error back propagation learning (back propagation) using a multi-layered perceptron neural network is known and widely used (for example, refer to "Neuro-Computer" compiled under the supervision of Nakano Hajime, Gijutsu-Hyoron Co., Ltd., 1989, and "Parallel Distributed Processing", written by D. E. Rumelhart, MIT Press, 1986).
FIG. 7 shows the structure of a three-layered perceptron neural network. In FIG. 7, an input pattern entered into an input layer is sequentially processed through an intermediate layer and an output layer to calculate the output pattern.
In error back propagation learning, each parameter (connection weight) of each layer of the neural network is updated so that the output pattern conforms as closely as possible to the desired output pattern given as a learning pattern.
This point is described in detail below.
In FIG. 7, an output $H_j$ of a unit j of the intermediate layer is calculated from an input pattern $I_i$, using a connection weight $W_{ji}$ and a threshold $\theta_j$, by the following expression.
$$H_j = f(U_j), \quad U_j = \sum_i W_{ji} \cdot I_i + \theta_j, \quad f(x) = \frac{1}{1 + \exp(-2x/u_0)}$$
The symbol $f(x)$ is a function called a sigmoid function.
The symbol $u_0$ is a predetermined parameter.
An output $O_k$ of a unit of the output layer is calculated from the output $H_j$ of the intermediate layer unit thus calculated, by the following expression.
$$O_k = f(S_k), \quad S_k = \sum_j V_{kj} \cdot H_j + \gamma_k$$
($V_{kj}$ is the connection weight, and $\gamma_k$ is the threshold.)
At this time, assuming that the desired output pattern is $T_k$, learning is performed by updating each parameter (such as a connection weight; generally represented as p) according to the gradient $(-\partial E/\partial p)$ so as to reduce the error shown in the following expression.
$$E = \left\langle \sum_k (T_k - O_k)^2 \right\rangle$$
Here, the symbol $\langle \cdot \rangle$ indicates the mean over the learning patterns. As a result, the output of the neural network approaches the desired one.
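The forward calculation and the gradient update described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the value chosen for $u_0$, the network size, the initial weights, the learning rate, and the use of a single learning pattern (so that the mean over patterns reduces to one term) are all hypothetical and are not taken from the cited references.

```python
import math

U0 = 1.0  # assumed value of the predetermined parameter u0

def f(x):
    """Sigmoid function f(x) = 1 / (1 + exp(-2x/u0))."""
    return 1.0 / (1.0 + math.exp(-2.0 * x / U0))

def forward(I, W, theta, V, gamma):
    """Compute intermediate outputs H_j and final outputs O_k."""
    H = [f(sum(W[j][i] * I[i] for i in range(len(I))) + theta[j])
         for j in range(len(W))]
    O = [f(sum(V[k][j] * H[j] for j in range(len(H))) + gamma[k])
         for k in range(len(V))]
    return H, O

def error(O, T):
    """Squared error for one pattern: sum_k (T_k - O_k)^2."""
    return sum((t - o) ** 2 for t, o in zip(T, O))

def backprop_step(I, T, W, theta, V, gamma, rate):
    """Move every parameter p along -dE/dp, using f'(x) = (2/u0) f(x) (1 - f(x))."""
    H, O = forward(I, W, theta, V, gamma)
    d_S = [-2.0 * (T[k] - O[k]) * (2.0 / U0) * O[k] * (1.0 - O[k])
           for k in range(len(O))]                      # dE/dS_k
    d_U = [sum(d_S[k] * V[k][j] for k in range(len(O)))
           * (2.0 / U0) * H[j] * (1.0 - H[j])
           for j in range(len(H))]                      # dE/dU_j
    for k in range(len(V)):
        for j in range(len(H)):
            V[k][j] -= rate * d_S[k] * H[j]
        gamma[k] -= rate * d_S[k]
    for j in range(len(W)):
        for i in range(len(I)):
            W[j][i] -= rate * d_U[j] * I[i]
        theta[j] -= rate * d_U[j]

# Hypothetical 2-2-1 network and a single learning pattern.
I, T = [1.0, 0.0], [1.0]
W, theta = [[0.1, -0.2], [0.3, 0.4]], [0.0, 0.0]
V, gamma = [[0.2, -0.1]], [0.0]
error_before = error(forward(I, W, theta, V, gamma)[1], T)
for _ in range(50):
    backprop_step(I, T, W, theta, V, gamma, rate=0.1)
error_after = error(forward(I, W, theta, V, gamma)[1], T)
```

After the repeated updates the error for the sample pattern is smaller than before, which corresponds to the output approaching the desired output.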
The features obtained by the above-mentioned conventional discriminant analysis, however, have the drawback of being fragile with respect to pattern variation, because they are linear features.
Discriminant analysis is, of course, a feature selecting method that reduces the variation of features within a class caused by pattern variation (relative to the variation between classes). Nevertheless, because the obtained features are linear, it cannot absorb variations such as displacement, rotation, and scaling of a pattern.
A multi-layered perceptron neural network, on the other hand, can learn a non-linear input/output relationship, so in principle it can be robust against the above-mentioned pattern variation. However, making a network learn to absorb the pattern variation and perform pattern recognition actually requires an excessive amount of learning, which is not practical.
Therefore, either a method of restraining the influence of pattern variation by pre-processing such as size normalization and alignment of the input pattern, or a method of extracting in advance a feature amount decided experimentally and performing multi-layered perceptron learning with this feature amount as a new input, is adopted.
In other words, in practice a multi-layered perceptron neural network also has the problem of being fragile with respect to pattern variation.
A first object of the present invention is to provide a feature extracting device that is suitable for pattern recognition and robust against pattern variation, in order to solve the above conventional problem.
A second object of the present invention is to provide a feature extracting device that is robust against pattern variation without requiring an excessive amount of learning.
According to the first aspect of the invention, a feature extracting device comprises
feature vector calculating means for projecting a learning pattern to be recognized on a subspace group, so as to calculate squares of projection length on each subspace as feature vectors, and
subspace basis vector learning means including at least parameter updating means for updating basis vectors of each subspace forming the subspace group, so as to increase the ratio of variation between classes to variation within a class, as for each component of the feature vectors.
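The feature calculation of the first aspect can be illustrated by the following Python sketch, which computes the square of the projection length of a pattern on each subspace of a subspace group. The sketch assumes that each subspace is described by orthonormal basis vectors, in which case the squared projection length equals the sum of the squared inner products with the basis vectors. The function names, the subspaces, and the patterns are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def squared_projection_length(x, basis):
    """Square of the length of the projection of x on the subspace
    spanned by `basis` (assumed orthonormal)."""
    return sum(dot(b, x) ** 2 for b in basis)

def feature_vector(x, subspaces):
    """One feature component per subspace of the subspace group."""
    return [squared_projection_length(x, basis) for basis in subspaces]

# Hypothetical subspace group in a 3-dimensional pattern space.
subspaces = [
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],  # the x-y plane
    [[0.0, 0.0, 1.0]],                   # the z axis
]
pattern = [3.0, 4.0, 12.0]
rotated = [4.0, -3.0, 12.0]  # same pattern rotated by 90 degrees within the x-y plane
features = feature_vector(pattern, subspaces)
features_rotated = feature_vector(rotated, subspaces)
```

Both patterns give the same feature vector, because a variation of the pattern that stays inside a subspace does not change the projection length on that subspace. This is one way in which such non-linear features can absorb a pattern variation that a linear feature cannot.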
In the preferred construction, the feature vector calculating means normalizes the learning pattern and projects it on the subspace group, and calculates squares of projection length on each subspace, or a quantity derived therefrom, as feature vectors.
In another preferred construction, the subspace basis vector learning means
includes calibrating means for calibrating the feature vectors by performing restraint processing among features based on a restraint parameter predetermined as for the calculated feature vectors.
In another preferred construction, the feature vector calculating means
normalizes the learning pattern and projects it on the subspace group, and calculates squares of projection length on each subspace, or a quantity derived therefrom, as feature vectors, and
the subspace basis vector learning means
includes calibrating means for calibrating the feature vectors by performing restraint processing among features based on a restraint parameter predetermined as for the calculated feature vectors.
In another preferred construction, the parameter updating means performs normalized orthogonalization on the basis vectors obtained by update processing, according to the Gram-Schmidt orthogonalization.
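The normalized orthogonalization mentioned above can be sketched in Python as follows. The sketch uses the numerically more stable "modified" ordering of the Gram-Schmidt procedure and drops vectors that become (nearly) zero; both choices, as well as the function names and sample vectors, are assumptions made for illustration.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def gram_schmidt(vectors, eps=1e-12):
    """Return orthonormal vectors spanning the same subspace as `vectors`."""
    basis = []
    for v in vectors:
        w = list(v)
        # Remove the components along the basis vectors found so far.
        for b in basis:
            coeff = dot(b, w)
            w = [wi - coeff * bi for wi, bi in zip(w, b)]
        norm = dot(w, w) ** 0.5
        if norm > eps:  # skip vectors that are linearly dependent
            basis.append([wi / norm for wi in w])
    return basis

# Hypothetical basis vectors of one subspace after an update step.
updated = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0]]
orthonormal = gram_schmidt(updated)
```

After this step every basis vector has unit length and distinct basis vectors are mutually orthogonal, so the squared projection length on the subspace can again be computed as a sum of squared inner products.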
In another preferred construction, the feature vector calculating means
normalizes the learning pattern and projects it on the subspace group, and calculates squares of projection length on each subspace, or a quantity derived therefrom, as feature vectors, and
the parameter updating means
performs normalized orthogonalization on the basis vectors obtained by update processing, according to the Gram-Schmidt orthogonalization.
In another preferred construction, the feature vector calculating means
normalizes the learning pattern and projects it on the subspace group, and calculates squares of projection length on each subspace, or a quantity derived therefrom, as feature vectors,
the subspace basis vector learning means
includes calibrating means for calibrating the feature vectors by performing restraint processing among features based on a restraint parameter predetermined as for the calculated feature vectors, and
the parameter updating means
performs normalized orthogonalization on the basis vectors obtained by update processing, according to the Gram-Schmidt orthogonalization.
In another preferred construction, the feature vector calculating means
normalizes the learning pattern and projects it on the subspace group, and calculates squares of generalized projection length on each subspace as feature vectors.
In another preferred construction, the subspace basis vector learning unit
performs update processing of the basis vectors for increasing the ratio of the variation between classes to the variation within a class as for the feature vectors, by updating the basis vectors so as to make the respective components of the feature vectors have no correlation to each other or make the same independent, and simultaneously so as to increase the ratio of the variation between classes to the variation within a class as for the respective components of the feature vectors.
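The two quantities that this construction acts on, namely the ratio of variation between classes to variation within a class for each component, and the correlation between components, can be measured as in the following Python sketch. The particular definitions used here (class-size-weighted variances and a plain sample covariance), the function names, and the sample data are assumptions for illustration; the update rule itself is not shown.

```python
def class_groups(features, labels):
    groups = {}
    for z, label in zip(features, labels):
        groups.setdefault(label, []).append(z)
    return groups

def variance_ratio(features, labels, c):
    """Between-class variance divided by within-class variance of component c."""
    n = len(features)
    overall = sum(z[c] for z in features) / n
    within = 0.0
    between = 0.0
    for members in class_groups(features, labels).values():
        m = sum(z[c] for z in members) / len(members)
        within += sum((z[c] - m) ** 2 for z in members) / n
        between += len(members) * (m - overall) ** 2 / n
    return between / within

def covariance(features, a, b):
    """Sample covariance between components a and b (zero if uncorrelated)."""
    n = len(features)
    mean_a = sum(z[a] for z in features) / n
    mean_b = sum(z[b] for z in features) / n
    return sum((z[a] - mean_a) * (z[b] - mean_b) for z in features) / n

# Hypothetical two-component feature vectors for two classes.
features = [[1.0, 5.0], [1.2, 1.0], [3.0, 4.0], [3.2, 2.0]]
labels = ["a", "a", "b", "b"]
ratio_0 = variance_ratio(features, labels, 0)
ratio_1 = variance_ratio(features, labels, 1)
cov_01 = covariance(features, 0, 1)
```

In this sample the first component separates the classes well (large ratio) while the second does not (ratio of zero), and the two components are slightly correlated. An update of the kind described above would aim to raise the ratio of each component while driving such covariances toward zero.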
According to the second aspect of the invention, a feature extracting device for deciding features, using a data set, as learning data, consisting of input patterns, class names C the patterns belong to, and a series of subclass names Cm (m=1 to n, where n is an integer of 1 or more, assuming that the subclasses are hierarchically classified more finely as m becomes larger), the device being formed by
(n+1) stages of feature extraction layers,
the first stage of feature extracting layer comprising
first feature vector calculating means for projecting an input learning pattern, after having been normalized, on a first subspace group, and calculating squares of generalized projection length on each subspace, or a quantity derived therefrom, as first feature vectors, and
first subspace basis vector learning means including at least first parameter updating means for updating basis vectors of each subspace forming the first subspace group, so as to increase the ratio of variation between subclasses to variation within a subclass as for the n-th subclass, namely the most segmented subclass, with respect to the first feature vectors,
the k-th (k=2 to n) stage of feature extraction layer comprising
k-th feature vector calculating means for projecting the (k−1)-th feature vectors calculated in the (k−1)-th stage of feature extraction layer, after having been normalized, on the k-th subspace group, and calculating squares of generalized projection length on each subspace, or a quantity derived therefrom, as the k-th feature vectors, and
k-th subspace basis vector learning means including at least k-th parameter updating means for updating basis vectors of each subspace forming the k-th subspace group, so as to increase the ratio (variation between subclasses/variation within a subclass) as for the (n+1-k)-th subclass, with respect to the k-th feature vectors.
In the preferred construction, the (n+1)-th stage of feature extraction layer comprises
(n+1)-th feature vector calculating means for projecting the n-th feature vectors calculated in the n-th stage of feature extraction layer, after having been normalized, on the (n+1)-th subspace group, and calculating squares of generalized projection length on each subspace, or a quantity derived therefrom, as the (n+1)-th feature vectors, and
(n+1)-th subspace basis vector learning means including at least (n+1)-th parameter updating means for updating basis vectors of each subspace forming the (n+1)-th subspace group, so as to increase the ratio of variation between classes to variation within a class as for the final feature vectors.
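The correspondence between the stages and the labels used for their learning can be summarized by the following Python sketch: the k-th stage (k=1 to n) is trained against the (n+1-k)-th subclass, and the (n+1)-th stage against the class itself. The function name and the string labels are hypothetical and serve only to make the indexing explicit.

```python
def label_level_for_stage(k, n):
    """Return the label level against which stage k (1-based) is trained,
    for a device with n subclass levels and (n+1) stages."""
    if not 1 <= k <= n + 1:
        raise ValueError("stage index must be between 1 and n+1")
    if k == n + 1:
        return "class C"
    return "subclass C%d" % (n + 1 - k)

# Example with n = 2 subclass levels, hence three stages.
schedule = [label_level_for_stage(k, 2) for k in range(1, 4)]
```

For n = 2 the schedule is subclass C2 for the first stage, subclass C1 for the second stage, and class C for the third stage, so the earliest stage separates the most finely segmented subclasses and later stages separate progressively coarser groupings.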
In another preferred construction, the subspace basis vector learning means of the respective feature extraction layers
include calibrating means for calibrating the feature vectors by performing restraint processing among features based on a restraint parameter predetermined as for the calculated feature vectors.
In another preferred construction, the (n+1)-th stage of feature extraction layer comprises
(n+1)-th feature vector calculating means for projecting the n-th feature vectors calculated in the n-th stage of feature extraction layer, after having been normalized, on the (n+1)-th subspace group, and calculating squares of generalized projection length on each subspace, or a quantity derived therefrom, as the (n+1)-th feature vectors, and
(n+1)-th subspace basis vector learning means including at least (n+1)-th parameter updating means for updating basis vectors of each subspace forming the (n+1)-th subspace group, so as to increase the ratio of variation between classes to variation within a class as for the final feature vectors, and
the subspace basis vector learning means of the respective feature extraction layers
include calibrating means for calibrating the feature vectors by performing restraint processing among features based on a restraint parameter predetermined as for the calculated feature vectors.
In another preferred construction, the parameter updating means
performs normalized orthogonalization on the basis vectors obtained by update processing, according to the Gram-Schmidt orthogonalization.
In another preferred construction, the subspace basis vector learning means of the respective feature extraction layers
include calibrating means for calibrating the feature vectors by performing restraint processing among features based on a restraint parameter predetermined as for the calculated feature vectors, and
the parameter updating means
performs normalized orthogonalization on the basis vectors obtained by update processing, according to the Gram-Schmidt orthogonalization.
In another preferred construction, the (n+1)-th stage of feature extraction layer comprises
(n+1)-th feature vector calculating means for projecting the n-th feature vectors calculated in the n-th stage of feature extraction layer, after having been normalized, on the (n+1)-th subspace group, and calculating squares of generalized projection length on each subspace, or a quantity derived therefrom, as the (n+1)-th feature vectors, and
(n+1)-th subspace basis vector learning means including at least (n+1)-th parameter updating means for updating basis vectors of each subspace forming the (n+1)-th subspace group, so as to increase the ratio of variation between classes to variation within a class as for the final feature vectors,
the subspace basis vector learning means of the respective feature extraction layers
include calibrating means for calibrating the feature vectors by performing restraint processing among features based on a restraint parameter predetermined as for the calculated feature vectors, and
the parameter updating means
performs normalized orthogonalization on the basis vectors obtained by update processing, according to the Gram-Schmidt orthogonalization.
In another preferred construction, the feature vector calculating means of each feature extraction layer normalizes an input to the corresponding layer, projects it on a subspace group, and calculates squares of projection length on each subspace, or a quantity derived therefrom, as feature vectors, and
the parameter updating means of each feature extraction layer updates normalized orthogonal basis vectors of each subspace forming the subspace group, so as to increase the ratio of variation between subclasses to variation within a subclass, or the ratio of variation between classes to variation within a class, as for the calculated feature vectors.
In another preferred construction, the hierarchical subspace basis vector learning unit
performs update processing of the basis vectors, or the normalized orthogonal basis vectors increasing the ratio of the variation between classes to the variation within a class, or the ratio of the variation between subclasses to the variation within a subclass, as for the feature vectors, by updating the basis vectors, or the normalized orthogonal basis vectors so as to make the respective components of the feature vectors have no correlation to each other or make the same independent, and simultaneously so as to increase the ratio of the variation between classes to the variation within a class, or the ratio of the variation between subclasses to the variation within a subclass, as for the respective components of the feature vectors.
In another preferred construction, the (n+1)-th stage of feature extraction layer comprises
(n+1)-th feature vector calculating means for projecting the n-th feature vectors calculated in the n-th stage of feature extraction layer, after having been normalized, on the (n+1)-th subspace group, and calculating squares of generalized projection length on each subspace, or a quantity derived therefrom, as the (n+1)-th feature vectors, and
(n+1)-th subspace basis vector learning means including at least (n+1)-th parameter updating means for updating basis vectors of each subspace forming the (n+1)-th subspace group, so as to increase the ratio of variation between classes to variation within a class as for the final feature vectors, and
the hierarchical subspace basis vector learning unit
performs update processing of the basis vectors, or the normalized orthogonal basis vectors increasing the ratio of the variation between classes to the variation within a class, or the ratio of the variation between subclasses to the variation within a subclass, as for the feature vectors, by updating the basis vectors, or the normalized orthogonal basis vectors so as to make the respective components of the feature vectors have no correlation to each other or make the same independent, and simultaneously so as to increase the ratio of the variation between classes to the variation within a class, or the ratio of the variation between subclasses to the variation within a subclass, as for the respective components of the feature vectors.
According to the third aspect of the invention, a feature extracting device having n (n is an integer more than 1) stages of feature extraction layers and hierarchical subspace basis vector learning means for updating each parameter for describing operations of the respective feature extraction layers, in which
the first stage of feature extracting layer comprises
first feature vector calculating means for projecting an input pattern, after having been normalized, on a first subspace group, and calculating squares of generalized projection length on each subspace, or a quantity derived therefrom, as first feature vectors,
the k-th (k=2 to n) stage of feature extraction layer comprises
k-th feature vector calculating means for projecting the (k−1)-th feature vectors calculated in the (k−1)-th stage of feature extraction layer, after having been normalized, on the k-th subspace group, and calculating squares of generalized projection length on each subspace, or a quantity derived therefrom, as the k-th feature vectors, and
the hierarchical subspace basis vector learning means includes means for updating the basis vectors of each subspace forming the subspace group of the respective feature extraction layers, so as to increase the ratio of variation between classes to the variation within a class as for the n-th feature vectors that are the final feature vectors calculated in the n-th stage of feature extraction layer.
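The chaining of the feature extraction layers in this aspect can be sketched in Python as follows: each layer normalizes its input, projects it on its own subspace group, and passes the squared projection lengths to the next layer. The sketch substitutes the ordinary projection length on orthonormal bases for the generalized projection length, which is not defined in this section; that substitution, the function names, and the sample subspaces are assumptions.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(x):
    norm = math.sqrt(dot(x, x))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [v / norm for v in x]

def layer_features(x, subspaces):
    """One feature extraction layer: normalize, project, square the lengths."""
    xn = normalize(x)
    return [sum(dot(b, xn) ** 2 for b in basis) for basis in subspaces]

def extract(x, layers):
    """Feed the feature vectors of each stage to the next stage."""
    for subspaces in layers:
        x = layer_features(x, subspaces)
    return x

# Hypothetical two-stage device: stage 1 maps 3-D patterns to 2 features,
# stage 2 maps those 2 features to 2 final features.
layers = [
    [[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], [[0.0, 0.0, 1.0]]],
    [[[1.0, 0.0]], [[0.0, 1.0]]],
]
pattern = [3.0, 4.0, 12.0]
features = extract(pattern, layers)
features_scaled = extract([2.0 * v for v in pattern], layers)
```

Because every layer normalizes its input, scaling the input pattern leaves the final feature vectors unchanged. The learning means would then adjust the basis vectors in all layers against a criterion evaluated on the final feature vectors only.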
In the preferred construction, the hierarchical subspace basis vector learning means
updates the normalized orthogonal basis vectors of each subspace forming the subspace group of the respective feature extraction layers, so as to increase the ratio of the variation between classes to the variation within a class as for the final feature vectors.
In another preferred construction, the subspace basis vector learning means of the respective feature extraction layers
include calibrating means for calibrating the feature vectors by performing restraint processing among features based on a restraint parameter predetermined as for the calculated feature vectors.
In another preferred construction, the hierarchical subspace basis vector learning means
updates the normalized orthogonal basis vectors of each subspace forming the subspace group of the respective feature extraction layers, so as to increase the ratio of the variation between classes to the variation within a class as for the final feature vectors, and
the subspace basis vector learning means of the respective feature extraction layers
include calibrating means for calibrating the feature vectors by performing restraint processing among features based on a restraint parameter predetermined as for the calculated feature vectors.
In another preferred construction, the parameter updating means
performs normalized orthogonalization on the basis vectors obtained by update processing, according to the Gram-Schmidt orthogonalization.
In another preferred construction, the hierarchical subspace basis vector learning means
updates the normalized orthogonal basis vectors of each subspace forming the subspace group of the respective feature extraction layers, so as to increase the ratio of the variation between classes to the variation within a class as for the final feature vectors, and
the parameter updating means
performs normalized orthogonalization on the basis vectors obtained by update processing, according to the Gram-Schmidt orthogonalization.
In another preferred construction, the hierarchical subspace basis vector learning means
updates the normalized orthogonal basis vectors of each subspace forming the subspace group of the respective feature extraction layers, so as to increase the ratio of the variation between classes to the variation within a class as for the final feature vectors,
the subspace basis vector learning means of the respective feature extraction layers
include calibrating means for calibrating the feature vectors by performing restraint processing among features based on a restraint parameter predetermined as for the calculated feature vectors, and
the parameter updating means
performs normalized orthogonalization on the basis vectors obtained by update processing, according to the Gram-Schmidt orthogonalization.
In another preferred construction, the hierarchical subspace basis vector learning unit
performs update processing of the basis vectors, or the normalized orthogonal basis vectors increasing the ratio of the variation between classes to the variation within a class, or the ratio of the variation between subclasses to the variation within a subclass, as for the feature vectors, by updating the basis vectors, or the normalized orthogonal basis vectors so as to make the respective components of the feature vectors have no correlation to each other or make the same independent, and simultaneously so as to increase the ratio of the variation between classes to the variation within a class, or the ratio of the variation between subclasses to the variation within a subclass, as for the respective components of the feature vectors.
In another preferred construction, the hierarchical subspace basis vector learning means
updates the normalized orthogonal basis vectors of each subspace forming the subspace group of the respective feature extraction layers so as to increase the ratio of the variation between classes to the variation within a class as for the final feature vectors, and
the hierarchical subspace basis vector learning unit
performs update processing of the basis vectors, or the normalized orthogonal basis vectors increasing the ratio of the variation between classes to the variation within a class, or the ratio of the variation between subclasses to the variation within a subclass, as for the feature vectors, by updating the basis vectors, or the normalized orthogonal basis vectors so as to make the respective components of the feature vectors have no correlation to each other or make the same independent, and simultaneously so as to increase the ratio of the variation between classes to the variation within a class, or the ratio of the variation between subclasses to the variation within a subclass, as for the respective components of the feature vectors.
In another preferred construction, the subspace basis vector learning means of the respective feature extraction layers
include calibrating means for calibrating the feature vectors by performing restraint processing among features based on a restraint parameter predetermined as for the calculated feature vectors, and
the hierarchical subspace basis vector learning unit
performs update processing of the basis vectors, or the normalized orthogonal basis vectors increasing the ratio of the variation between classes to the variation within a class, or the ratio of the variation between subclasses to the variation within a subclass, as for the feature vectors, by updating the basis vectors, or the normalized orthogonal basis vectors so as to make the respective components of the feature vectors have no correlation to each other or make the same independent, and simultaneously so as to increase the ratio of the variation between classes to the variation within a class, or the ratio of the variation between subclasses to the variation within a subclass, as for the respective components of the feature vectors.
In another preferred construction, the hierarchical subspace basis vector learning means
updates the normalized orthogonal basis vectors of each subspace forming the subspace group of the respective feature extraction layers, so as to increase the ratio of the variation between classes to the variation within a class as for the final feature vectors,
the subspace basis vector learning means of the respective feature extraction layers
include calibrating means for calibrating the feature vectors by performing restraint processing among features based on a restraint parameter predetermined as for the calculated feature vectors, and
the hierarchical subspace basis vector learning unit
performs update processing of the basis vectors, or the normalized orthogonal basis vectors increasing the ratio of the variation between classes to the variation within a class, or the ratio of the variation between subclasses to the variation within a subclass, as for the feature vectors, by updating the basis vectors, or the normalized orthogonal basis vectors so as to make the respective components of the feature vectors have no correlation to each other or make the same independent, and simultaneously so as to increase the ratio of the variation between classes to the variation within a class, or the ratio of the variation between subclasses to the variation within a subclass, as for the respective components of the feature vectors.
According to another aspect of the invention, a pattern learning device for learning the relationship between input and output, using a learning data set consisting of pairs of an input vector and a desired output vector corresponding to the input vector, comprising:
n stages (n is an integer of 1 or more) of processing layers; and
parameter updating means for updating each parameter for describing operations of the respective processing layers,
the first stage of processing layer comprising first output calculating means for projecting an input vector, after having been normalized, on a first subspace group, and calculating squares of generalized projection length on each subspace, or a quantity derived therefrom, as first output vectors,
the k-th (k=2 to n) stage of processing layer, when n is 2 or more, comprising
k-th output calculating means for projecting the (k−1)-th output vectors calculated in the (k−1)-th stage of processing layer, after having been normalized, on the k-th subspace group, and calculating squares of generalized projection length on each subspace, or a quantity derived therefrom, as the k-th output vectors, and
the parameter updating means including a means for updating the basis vectors of each subspace of the respective processing layers, so as to decrease the average square error of the n-th output vectors calculated in the n-th stage of processing layer, that are the final output vectors, and desired output vectors corresponding to the input vector.
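For the simplest case of a single processing layer (n = 1), an update that decreases the mean square error can be sketched in Python as follows. The sketch uses plain gradient descent on the basis vectors with the ordinary projection length, without re-normalizing the basis vectors afterwards; the source states only that the update decreases the error, so the update rule, the learning rate, the function names, and the sample data are all assumptions.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(x):
    norm = math.sqrt(dot(x, x))
    return [v / norm for v in x]

def outputs(xn, subspaces):
    """Squared projection length of the normalized input on each subspace."""
    return [sum(dot(b, xn) ** 2 for b in basis) for basis in subspaces]

def mean_square_error(data, subspaces):
    total = 0.0
    for x, target in data:
        out = outputs(normalize(x), subspaces)
        total += sum((t - o) ** 2 for t, o in zip(target, out))
    return total / len(data)

def gradient_step(data, subspaces, rate):
    """Update the basis vectors in place along the negative error gradient."""
    grads = [[[0.0] * len(b) for b in basis] for basis in subspaces]
    for x, target in data:
        xn = normalize(x)
        out = outputs(xn, subspaces)
        for s, basis in enumerate(subspaces):
            d_error = -2.0 * (target[s] - out[s]) / len(data)
            for v, b in enumerate(basis):
                # d(out_s)/d(b) = 2 (b . xn) xn
                coeff = d_error * 2.0 * dot(b, xn)
                for i, xi in enumerate(xn):
                    grads[s][v][i] += coeff * xi
    for s, basis in enumerate(subspaces):
        for v, b in enumerate(basis):
            for i in range(len(b)):
                b[i] -= rate * grads[s][v][i]

# Hypothetical learning data: one input vector with its desired output vector.
data = [([1.0, 1.0], [1.0, 0.0])]
subspaces = [[[1.0, 0.0]], [[0.0, 1.0]]]
error_before = mean_square_error(data, subspaces)
gradient_step(data, subspaces, rate=0.1)
error_after = mean_square_error(data, subspaces)
```

For this sample a single step lowers the mean square error. With several processing layers the same idea applies, except that the gradient with respect to the basis vectors of earlier layers has to be propagated back through the later layers.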
According to another aspect of the invention, a computer readable memory storing a feature extraction program for extracting features for pattern recognition, controlling a computer,
the feature extraction program comprising
a function of projecting a learning pattern to be recognized on a subspace group, so to calculate squares of projection length on each subspace as feature vectors, and
a function of updating basis vectors of each subspace forming the subspace group, so as to increase the ratio of variation between classes to variation within a class, as for each component of the feature vectors.
According to a further aspect of the invention, a computer readable memory storing a feature extraction program for deciding features, using a data set, as learning data, consisting of input patterns, class names C the patterns belong to, and a series of subclass names Cm (m=1 to n, where n is an integer of 1 or more, assuming that the subclasses are hierarchically classified more finely as m becomes larger),
the feature extraction program
formed by (n+1) stages of feature extraction layers,
the first stage of feature extracting layer comprising
a first feature vector calculating function for projecting an input learning pattern, after having been normalized, on a first subspace group, and calculating squares of generalized projection length on each subspace, or a quantity derived therefrom, as first feature vectors, and
a first subspace basis vector learning function including at least first parameter updating means for updating basis vectors of each subspace forming the first subspace group, so as to increase the ratio of variation between subclasses to variation within a subclass as for the n-th subclass, namely the most segmented subclass, with respect to the first feature vectors,
the k-th (k=2 to n) stage of feature extraction layer comprising
a k-th feature vector calculating function for projecting the (k−1)-th feature vectors calculated in the (k−1)-th stage of feature extraction layer, after having been normalized, on the k-th subspace group, and calculating squares of generalized projection length on each subspace, or a quantity derived therefrom, as the k-th feature vectors, and
a k-th subspace basis vector learning function including at least k-th parameter updating means for updating basis vectors of each subspace forming the k-th subspace group, so as to increase the ratio (variation between subclasses/variation within a subclass) as for the (n+1-k)-th subclass, with respect to the k-th feature vectors.
According to a further aspect of the invention, a computer readable memory storing a feature extraction program for realizing n (n is an integer more than 1) stages of feature extraction layers and a hierarchical subspace basis vector learning function for updating each parameter for describing operations of the respective feature extraction layers,
the feature extraction program including,
in a first stage of feature extracting layer,
a first feature vector calculating function for projecting an input pattern, after having been normalized, on a first subspace group, and calculating squares of generalized projection length on each subspace, or a quantity derived therefrom, as first feature vectors, and
in the k-th (k=2 to n) stage of feature extraction layer,
a k-th feature vector calculating function for projecting the (k−1)-th feature vectors calculated in the (k−1)-th stage of feature extraction layer, after having been normalized, on the k-th subspace group, and calculating squares of generalized projection length on each subspace, or a quantity derived therefrom, as the k-th feature vectors, and
the hierarchical subspace basis vector learning function updating the basis vectors of each subspace forming the subspace groups of the respective feature extraction layers, so as to increase the ratio of the variation between classes to the variation within a class as for the n-th feature vectors that are the final feature vectors calculated in the n-th stage of feature extraction layer.
Other objects, features and advantages of the present invention will become clear from the detailed description given below.