1. Field of the Invention
The present invention relates to feature extraction, and more particularly to a feature extraction system which statistically analyzes a set of samples of feature vectors to calculate a feature serving as an index for pattern identification, allowing more accurate identification than a prior system. In addition, this invention relates to a face image recognition system useful for identification systems using human faces, the certification of licenses and passports, user identification for man-machine interfaces and security, and information compression for low-bit-rate picture communications. More particularly, it relates to a face image recognition system which, in a first-stage model selection (first model selection means), narrows down the candidate models plausibly matching an input image and then, in a second-stage selection, performs the final recognition among the selected candidates using a feature extraction different from that of the first stage.
2. Description of the Prior Art
Pattern identification or recognition is applicable to various fields, such as face identification for security systems. For pattern identification, a method based on the KL (Karhunen-Loeve) expansion is well known as a way of extracting a feature from a set of samples of feature vectors. As the feature serving as an index for pattern recognition, this method yields an orthonormal basis (eigenvectors, or proper vectors) determined from the dispersion (variance) of the set of samples of feature vectors, together with values (eigenvalues) representing the dispersion of the feature vectors along each basis direction. The basic properties of the KL expansion are described below.
The KL expansion is the optimal orthogonal expansion under the minimum mean squared error criterion. More specifically, when a subspace U(M) is formed by selecting M (≤N) of the orthonormal basis vectors obtained by the KL expansion of a set of samples of N-dimensional feature vectors, in order of decreasing eigenvalue, U(M) is the M-dimensional subspace that minimizes the mean squared error between the feature vectors and their projections. U(M) is also characterised as the M-dimensional subspace that maximizes the expected squared distance between the projected feature vectors. These properties are described in detail in Chapter 9 of "Statistical Pattern Recognition (Second Edition)" by Keinosuke Fukunaga (Academic Press, 1990).
Furthermore, when identification is actually performed using the features obtained from the KL expansion, the feature vector closest to the input feature vector is selected from the standard data using the distance described below. That is, each feature vector is orthogonally projected onto the aforesaid subspace U(M), and the distance is measured in U(M). As this distance, either the Euclidean distance defined by the following equation (1):
$$d(x, y) = \sqrt{\sum_i (x_i - y_i)^2}, \qquad x = (x_i),\; y = (y_i) \tag{1}$$
or the distance expressed by the following equation (2) is used:
$$d_M(x, y) = \sum_i \frac{\big((x, e_i) - (y, e_i)\big)^2}{\lambda_i + \sigma^2} \tag{2}$$
where λ_i represents the eigenvalue corresponding to the eigenvector e_i, (x, e_i) denotes the inner product of x and e_i, and σ² designates a bias.
The distance expressed by equation (2) coincides with the Mahalanobis distance when the bias σ is zero.
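A minimal Python sketch of equations (1) and (2) follows. The function names are hypothetical, vectors are plain Python lists, and the eigenvectors e_i are assumed to be orthonormal, with the eigenvalues λ_i supplied alongside them; setting σ = 0 gives the Mahalanobis-type form mentioned above.

```python
import math


def euclidean_distance(x, y):
    """Equation (1): d(x, y) = sqrt(sum_i (x_i - y_i)^2)."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def inner(a, b):
    """Inner product (a, b) of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def biased_subspace_distance(x, y, eigvecs, eigvals, sigma):
    """Equation (2): sum_i ((x, e_i) - (y, e_i))^2 / (lambda_i + sigma^2).

    eigvecs: orthonormal eigenvectors e_i spanning the subspace U(M) (assumed)
    eigvals: eigenvalues lambda_i paired with eigvecs
    sigma:   bias term; sigma = 0 gives the Mahalanobis-type form
    """
    total = 0.0
    for e, lam in zip(eigvecs, eigvals):
        diff = inner(x, e) - inner(y, e)
        total += diff * diff / (lam + sigma ** 2)
    return total


if __name__ == "__main__":
    x, y = [3.0, 4.0], [0.0, 0.0]
    basis, lambdas = [[1.0, 0.0], [0.0, 1.0]], [4.0, 1.0]
    print(euclidean_distance(x, y))                             # 5.0
    print(biased_subspace_distance(x, y, basis, lambdas, 0.0))  # 18.25
```

The bias σ² keeps the denominator away from zero for directions with very small eigenvalues, which is why the sketch exposes it as a parameter rather than fixing it.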
As a patented feature extraction technique using the KL expansion, the "Face Recognition System" disclosed in U.S. Pat. No. 5,164,992 is known. In this patent, the two components shown in FIG. 3 extract a feature for pattern identification. One component, designated by numeral 201, serves as a face image acquisition input section, and the other, denoted by numeral 202, serves as a KL expansion execution section. The face image acquisition input section 201 obtains a face image as a set of density values at every pixel (an n-dimensional feature vector, where n is the total number of pixels in the image) and passes it to the KL expansion execution section 202. The KL expansion execution section 202 calculates an orthonormal basis through the KL expansion on the basis of N face images received from the face image acquisition input section 201, then selects M (≤n, N) basis vectors from the results in order of decreasing eigenvalue and retains the eigenvalues and the coordinates of the corresponding vectors.
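The processing of the KL expansion execution section 202 can be sketched in pure Python as follows. This is an illustrative sketch, not the patented implementation: the function names are hypothetical, the covariance is assumed to be normalized by the number of samples N, and the symmetric eigenvalue problem is solved with the classical cyclic Jacobi method, which is only practical for small n (face images with thousands of pixels would call for a numerical linear-algebra library).

```python
import math


def covariance(samples):
    """Sample mean and covariance (normalized by N) of a list of n-dim vectors."""
    N, n = len(samples), len(samples[0])
    mean = [sum(s[k] for s in samples) / N for k in range(n)]
    cov = [[0.0] * n for _ in range(n)]
    for s in samples:
        d = [s[k] - mean[k] for k in range(n)]
        for a in range(n):
            for b in range(n):
                cov[a][b] += d[a] * d[b] / N
    return mean, cov


def jacobi_eigen(A, tol=1e-12, max_sweeps=100):
    """Eigenvalues and eigenvectors (columns of V) of a symmetric matrix."""
    n = len(A)
    A = [row[:] for row in A]
    V = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (A[q][q] - A[p][p]) / (2.0 * A[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P (columns p and q)
                    akp, akq = A[k][p], A[k][q]
                    A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A (rows p and q)
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors: V <- V P
                    vkp, vkq = V[k][p], V[k][q]
                    V[k][p], V[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [A[i][i] for i in range(n)], V


def kl_expansion(samples, M):
    """Return the M (eigenvalue, eigenvector) pairs with the largest eigenvalues."""
    _, cov = covariance(samples)
    vals, V = jacobi_eigen(cov)
    n = len(vals)
    pairs = [(vals[j], [V[k][j] for k in range(n)]) for j in range(n)]
    pairs.sort(key=lambda pv: pv[0], reverse=True)
    return pairs[:M]


if __name__ == "__main__":
    data = [[2.0, 2.0], [-2.0, -2.0], [1.0, -1.0], [-1.0, 1.0]]
    print(kl_expansion(data, 1))  # eigenvalue close to 4 along the (1, 1) direction
```

Returning the pairs sorted by eigenvalue mirrors the selection "in order of decreasing eigenvalue"; the retained eigenvalues are exactly what the distance of equation (2) needs.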
However, although the KL expansion is an optimal orthogonal expansion in the sense of minimizing the mean squared error or enhancing the scattering of the data, it poses a problem when used to extract features for pattern identification. More specifically, the most important point in pattern identification is how to accurately distinguish confusing feature vectors that are close to each other in distance; for this purpose, it is preferable to enhance only the scattering between data close to each other rather than the mutual scattering of all data. For this reason, a prior system that performs feature extraction directly through the KL expansion has difficulty identifying confusing data.
On the other hand, in a face image recognition system, image data (for example, a variable-density image with M pixels in the vertical direction and N pixels in the horizontal direction) is completely expressible as an M·N-dimensional vector, with each pixel treated as one independent coordinate axis and the coordinate value given by the density value of that pixel (for example, 10000 dimensions for a 100×100 image). Accordingly, if L (L > 10000) input images span this entire space, a 10000-dimensional space is needed to express the information in the L images. For human faces, however, recent studies have shown that almost all faces can be expressed in a space of extremely low dimension. This is because, compared with general images, human faces closely resemble each other (they have eyes, a nose, a mouth and other parts in common, and these parts are in similar positional relations to each other). These matters are discussed in detail, for example, in "Application of the Karhunen-Loeve procedure for the characterization of human faces" by M. Kirby and L. Sirovich (IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 12, no. 1, 1990).
The KL expansion method, well known as a face image recognition method, exploits this property of face images in general: it extracts a feature from each face image through the KL expansion and uses it for recognition. A detailed description is given in "Face Recognition Using Eigenfaces" by Matthew A. Turk and Alex P. Pentland (CVPR '91, Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 1991). Although implementations differ slightly in their inputs and outputs, face image recognition can generally be defined as a method which registers model face images in a database in advance and recognizes the face image of a person as the model most similar to an input image. From this point of view, and taking the above into consideration, the KL expansion method approximates an input face image I and a model image M through linear combinations of P eigenvectors E_i (i = 1 … P), as shown by the following equation (3), and collates the approximated data:
$$\hat{M} = \sum_{i=1}^{P} \langle M, E_i \rangle E_i \tag{3}$$
where M̂ represents the approximation of a model face M and ⟨M, E_i⟩ designates the inner product of the vectors M and E_i.
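As a minimal sketch of equation (3), assuming the eigenvectors E_i are orthonormal and given as plain Python lists, the approximation M̂ is obtained by taking the inner products ⟨M, E_i⟩ as coordinates and summing the correspondingly weighted eigenvectors. The function names are hypothetical, and, following equation (3) as written, no mean face is subtracted.

```python
def inner(a, b):
    """Inner product <a, b>."""
    return sum(ai * bi for ai, bi in zip(a, b))


def project(vec, eigvecs):
    """Coordinates <M, E_i> of vec in the eigenspace spanned by eigvecs."""
    return [inner(vec, e) for e in eigvecs]


def reconstruct(coeffs, eigvecs):
    """Equation (3): M_hat = sum_i <M, E_i> E_i, given the coordinates <M, E_i>."""
    dim = len(eigvecs[0])
    return [sum(c * e[k] for c, e in zip(coeffs, eigvecs)) for k in range(dim)]


E = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # P = 2 orthonormal eigenvectors in 3-D
M = [3.0, 4.0, 5.0]
coords = project(M, E)          # [3.0, 4.0]
M_hat = reconstruct(coords, E)  # [3.0, 4.0, 0.0]: the component outside the span is lost
```

Only the P coordinates need to be stored per face; collation can then be carried out directly on these coordinates instead of on the full M·N-dimensional images.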
The KL method uses, as basis vectors, the eigenvectors corresponding to the P largest eigenvalues (for example, approximately 100) of the covariance matrix obtained from W teaching face images. Using this representation reduces the space in which an image is expressed from M·N (=10000) dimensions down to approximately P (=100) dimensions while minimizing the loss of image information; this technique, referred to as the KL expansion, has therefore long been known to be effective for image compression and the like. Beyond this information-compression property, the KL expansion also acts as feature extraction: the projected teaching data are most widely separated, that is, most distinguishable, in the space formed by the eigenvectors (referred to as the eigenface space). In fact, in the experiment reported in the aforesaid document by Matthew A. Turk and Alex P. Pentland, in which the set of persons in the teaching data coincides with the set of registered persons and newly photographed faces of the persons used for the teaching data serve as input images, considerably good recognition results are obtained.
FIG. 4 illustrates the arrangement of the prior face image recognition system disclosed in the aforesaid document by Matthew A. Turk and Alex P. Pentland; its operation is described below with reference to FIG. 4. In FIG. 4, a KL expansion unit 21 serves as both a first feature extraction means and a second feature extraction means. In off-line processing, it extracts features from each model image in a model image memory 22 and stores the resulting feature vectors in a model feature vector memory 23. In on-line processing, an input image taken by a camera and held in an object image memory 24 is transferred to the KL expansion unit 21, which extracts a feature vector from it in the same way as for the model face images. Next, a model selection unit 25 finds the model whose feature vector is most similar to the feature vector of the input object face by checking the contents of the model feature vector memory 23. More specifically, using the basis vectors stored in an eigenspace memory (not shown), namely, as described above, the eigenvectors corresponding to the P largest eigenvalues (for example, approximately 100) of the covariance matrix obtained from W teaching face images, each model face vector and the input object face vector are projected onto the eigenspace (feature extraction), and their coordinates in the eigenspace are used to evaluate similarity and output the face image recognition result. At this time, the following equation (4) is used to evaluate similarity, and the model vector minimizing its value is detected:
$$\sum_l (m_l - i_l)^2 \tag{4}$$
where m_l and i_l represent the l-th components of a model feature vector and the object face feature vector, respectively.
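The matching step of the model selection unit 25 can be sketched as a nearest-neighbour search over the stored model feature vectors. This sketch reads equation (4) as a sum of squared component differences; the dictionary layout standing in for the model feature vector memory and the function name are assumptions for illustration.

```python
def select_model(object_features, model_memory):
    """Return the model name whose feature vector minimizes sum_l (m_l - i_l)^2.

    object_features: eigenspace coordinates i_l of the input face
    model_memory:    hypothetical dict mapping model name -> coordinates m_l
    """
    def score(name):
        m = model_memory[name]
        return sum((ml - il) ** 2 for ml, il in zip(m, object_features))

    return min(model_memory, key=score)


models = {"A": [0.0, 0.0], "B": [1.0, 1.0], "C": [5.0, 5.0]}
print(select_model([0.9, 1.2], models))  # B
```

Because every registered model is compared in the same global eigenspace, the search cost and the crowding of feature vectors both grow with the number of registered persons.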
However, the prior method proposed in the aforesaid document by Matthew A. Turk and Alex P. Pentland has the problem that only a small-scale system can be realized, for the following reason. The KL expansion exhibits its advantages to the fullest when the persons in the teaching data are the same as the persons in the model data. Even if the number of teaching data increases, however, the number of effective eigenvectors obtainable is limited for the reason mentioned above; hence, as the number of model data (registered persons) increases, the feature vectors become crowded in the eigenface space and the recognition ability deteriorates. The number of persons who can be registered (or recognized) is therefore limited, which makes it difficult to realize a large-scale system.
The present invention has been developed in order to eliminate this problem, and it is therefore an object of the present invention to provide a feature extraction system capable of identifying confusing data more robustly than the prior art.
Another object of this invention is to provide a face image recognition system which is capable of realizing a large-scale system.
For the first-mentioned purpose, a feature extraction system according to the present invention comprises neighborhood vector selection means for selecting neighborhood vectors of a feature vector, and feature vector space production means for finding a subspace in which the local scattering of the feature vectors is maximized when they are orthogonally projected onto it. The feature extraction system according to this invention thus finds a subspace which enhances only the scattering between data close to each other, and by performing identification after projecting the feature vectors onto this subspace, confusing data can be identified more robustly than in the prior art.
According to one aspect of the present invention, a feature extraction system, which statistically analyzes a set of samples of feature vectors to calculate a quantity indicative of a feature that plays an important role in pattern identification, comprises storage means for storing feature vectors inputted through an input means, and neighborhood vector selection means for selecting, from the feature vectors in the storage means, neighborhood vectors, that is, feature vectors close to a feature vector inputted and stored in the storage means. The system also includes feature vector space production means for outputting a partial vector space (subspace) in which the feature vectors are locally most dispersed when orthogonally projected onto it.
With this arrangement, unlike the prior art, in which the orthogonal expansion enhances the scattering of the feature vectors as a whole, this invention produces a subspace through an orthogonal expansion that enhances the scattering of data close to each other, thus allowing confusing data to be identified more robustly than in the prior art.
Furthermore, when the feature vectors stored in the storage means are denoted V_i (1 ≤ i ≤ N) and their dimension is n, the aforesaid neighborhood vector selection means selects, for each of these feature vectors, M (≤N) feature vectors close to it using the distance d(x, y) given by the above-mentioned equation (1). The feature vector space production means obtains eigenvectors by solving the eigenvalue problem of a local covariance matrix given by equation (8), which will be written later, selects m (≤N, n) of the obtained eigenvectors in order of decreasing eigenvalue, and outputs them as the basis of a partial vector space which maximizes the local scattering. Thus, the basis of the vector space which maximizes the local scattering is given by the eigenvectors of the local covariance matrix of equation (8).
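The neighborhood vector selection step described above can be sketched as follows: for each stored vector V_i, the M closest other vectors under the distance of equation (1) are kept. Excluding V_i from its own neighborhood and breaking ties by index are assumptions of this sketch, the function names are hypothetical, and the local covariance matrix of equation (8) that consumes these neighborhoods is not reproduced here.

```python
import math


def euclidean_distance(x, y):
    """Distance d(x, y) of equation (1)."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def select_neighborhoods(vectors, M):
    """For each V_i, return the indices of its M nearest other vectors (M <= N - 1)."""
    neighborhoods = []
    for i, v in enumerate(vectors):
        others = [j for j in range(len(vectors)) if j != i]
        # Python's sort is stable, so equal distances keep index order.
        others.sort(key=lambda j: euclidean_distance(v, vectors[j]))
        neighborhoods.append(others[:M])
    return neighborhoods


V = [[0.0, 0.0], [1.0, 0.0], [10.0, 0.0], [11.0, 0.0]]
print(select_neighborhoods(V, 1))  # [[1], [0], [3], [2]]
```

In this toy set the two clusters never appear in each other's neighborhoods when M = 1, which is the point of the local approach: only nearby, easily confused vectors contribute to the scattering that is maximized.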
Still further, the neighborhood vector selection means selects the M feature vectors close to each feature vector using a correlation value determined according to equation (5), which will be shown later.
Moreover, the neighborhood vector selection means first solves the eigenvalue problem of a covariance matrix obtained according to equation (6), which will be described later, selects l eigenvectors e_i (1 ≤ i ≤ l) corresponding to large eigenvalues, and then calculates a distance according to an equation which will be written later. Alternatively, after solving the eigenvalue problem of the covariance matrix obtained according to equation (6) and selecting the l eigenvectors e_i (1 ≤ i ≤ l) corresponding to large eigenvalues in the same way, the neighborhood vector selection means calculates the distance according to equation (2). Further, the feature vector space production means outputs, in addition to the m eigenvectors, the eigenvalues corresponding to them, so that the distance of equation (2) can be calculated from these eigenvalues as the distance for identification. Moreover, the feature vector space production means outputs, in addition to the m eigenvectors, dispersion values corresponding to these eigenvectors, determined according to equation (9), which will be written later.
In addition, in accordance with the present invention, a face image recognition system is arranged to narrow down gradually the set of model faces to be collated with an input face. That is, even if the database grows in scale, as long as the set of model faces collated at each stage can be restricted without excluding the correct answer, the problem at each stage resembles that of a small-scale database, and hence the larger scale of the system has little influence on the recognition rate.
To achieve this arrangement, a face image recognition system according to this invention includes: first and second model limitation means, each for deriving, for each model face image, a set of models whose feature vectors satisfy a given condition on their variation from the feature vector extracted from that model face image; model eigenspace holding means for calculating, for each model face image, the largest through the Nth eigenvalues of a scattering property evaluation matrix defined by equation (11), which will be shown later, with respect to the set of models chosen by the first model limitation means, together with the corresponding eigenvectors, and for holding the calculation results for every model; first model selection means for selecting a model face vector M from among the model face vectors restricted by the second model limitation means as a first-stage candidate; and second model selection means for selecting, from among the model face vectors restricted by the second model limitation means, the model face vector for which the variation between a point m, obtained by projecting the model face vector according to equation (12), which will be described later, onto the space spanned by the N eigenvectors E_j (j = 1 … N) held in the model eigenspace holding means, and a point i, obtained by projecting an object face vector I onto the same eigenspace by the same procedure, is at a minimum.
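The coarse-to-fine flow can be illustrated with a small sketch. Equations (11) and (12) are not reproduced in this section, so the sketch uses hypothetical stand-ins: the first stage keeps the K models closest to the input in a shared feature space, and the second stage re-ranks those candidates by the squared distance between the model vector and the object vector after both are projected onto that model's own eigenvectors E_j. All names, and the per-model eigenvectors in the example, are illustrative assumptions.

```python
def inner(a, b):
    """Inner product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def project(vec, basis):
    """Coordinates of vec on a model's eigenvectors E_j (assumed orthonormal)."""
    return [inner(vec, e) for e in basis]


def two_stage_select(obj, models, K):
    """models: hypothetical dict name -> (feature_vector, eigenvectors E_j).

    Stage 1 narrows the models to K candidates; stage 2 picks the candidate
    whose projected point m is closest to the projected object point i.
    """
    ranked = sorted(models, key=lambda name: sq_dist(models[name][0], obj))
    candidates = ranked[:K]

    def stage2_score(name):
        vec, basis = models[name]
        return sq_dist(project(vec, basis), project(obj, basis))

    return min(candidates, key=stage2_score)


models = {
    "A": ([1.0, 0.0], [[0.0, 1.0]]),
    "B": ([0.0, 1.5], [[1.0, 0.0]]),
    "C": ([10.0, 10.0], [[1.0, 0.0]]),
}
print(two_stage_select([0.2, 0.3], models, K=2))  # B
```

With K = 1 the same example returns "A", so the second stage changes the decision only when the first stage keeps more than one candidate; model "C" is never examined by the second stage, which is how the narrowing keeps each stage close to the small-database case.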