In the field of face recognition, statistical dimension-reduction procedures are nowadays used for classification. Statistical models of dimension reduction are constructed on the basis of a set of images of faces that is referred to hereinafter as the “learning base” (or BA for short). One and the same person is preferably represented several times in the learning base. The various representations of the face of one and the same person are then referred to as a “class”.
The two major applications of face recognition are “authentication” and “identification”.
Within the framework of authentication, the face to be recognized is compared with the faces of the learning base so as to assign it an identity. Biometry or video surveillance, for example, are applications based on authentication.
Within the framework of identification, once the model has been constructed, the learning base is no longer used. One has a first unknown face and a second unknown face, and one seeks to compare these two faces so as to determine whether or not they belong to the same person.
The indexing of movie sequences, for example video sequences, may use applications of this type. The interest could for example be to detect, in a video, all the sequences in which one and the same person appears.
The recognition of faces in digital images currently proposes various methods that can be classed in the following manner:
- structural procedures based on the analysis of the geometrical components of the face;
- so-called “neural network based” procedures (or “support vector machines”); and
- statistical procedures, which currently play a fundamental role in dimension reduction and show excellent performance.
Tools for statistical analysis of data of large dimension n make it possible to reduce a complex system of correlations by projecting the initial data into a space of lesser dimension k << n. According to the intended objective, the projection space thus obtained may give:
- optimal reconstruction of the learning base for a fixed dimension k; this is the object of the statistical procedure termed “principal component analysis” (or “PCA” hereinbelow); or
- better discrimination between the different persons (or “classes”, as indicated hereinabove) of the learning base; this is the object of the statistical procedure termed “linear discriminant analysis” (“LDA” hereinafter).
Described hereinafter is the procedure termed “principal component analysis” of the prior art and set forth in particular in:
“Eigenfaces for recognition”, M. Turk and A. Pentland, Journal of Cognitive Neuroscience, vol. 3, March 1991, pages 71-86.
A commonly used definition of the PCA is that which associates, with an input set of vectors, a set of orthogonal principal axes (termed “principal components”) onto which the projection of the input vectors has maximum variance.
Here, it is indicated that the term “vector” refers to a column vector. Moreover, X denotes the vector to be projected, of large size l·m. Its orthogonal projection is denoted $\hat{X}$. This projection is performed onto the space of dimension k, an orthogonal base of which is stored in the form of columns in the matrix P. The matrix P is therefore of size (l·m)×k. The projection of the vector X is then expressed by:

$$\hat{X} = P^T X \qquad (1)$$
The matrix P is called the “matrix of projection of the initial space into the space of principal components”.
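As a minimal numerical sketch of relation (1), assuming toy sizes and random data (all names and dimensions below are illustrative, not taken from the procedure itself):

```python
import numpy as np

# Project a face vector X of size l*m onto a k-dimensional subspace
# whose orthonormal basis is stored as the columns of P, per relation (1).
rng = np.random.default_rng(0)
lm, k = 12, 3                                       # toy sizes for l*m and k
P, _ = np.linalg.qr(rng.standard_normal((lm, k)))   # orthonormal columns
X = rng.standard_normal(lm)                         # flattened face image

X_hat = P.T @ X                                     # relation (1): size k
```

The projected vector X_hat is the compact representation used for all subsequent comparisons.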
As criterion to be maximized we choose:

$$J(P) = \operatorname{trace}(S_p) \qquad (2)$$

where $S_p$ denotes the covariance matrix of the learning base projected into the base of P, i.e.:

$$S_p = \frac{1}{n} \sum_{i=1}^{n} \left(\hat{X}_i - \bar{\hat{X}}\right)\left(\hat{X}_i - \bar{\hat{X}}\right)^T \qquad (3)$$

where n denotes the number of images present in the learning base.
If $X_i$ denotes the input vector corresponding to the i-th vector of the learning base BA, we have:

$$\hat{X}_i = P^T X_i \quad \text{and} \quad \bar{\hat{X}} = \frac{1}{n} \sum_{i=1}^{n} \hat{X}_i$$
It is indicated that maximizing the criterion according to relation (2) amounts to maximizing the criterion:

$$P = \operatorname*{Argmax}_{P \in \mathcal{R}^{(l \cdot m) \times k}} \operatorname{trace}\left(P^T S P\right) \qquad (4)$$

where $\mathcal{R}^{(l \cdot m) \times k}$ is the set of matrices with real coefficients of size (l·m)×k.
S is the covariance matrix of the learning base BA, of size (l·m)×(l·m), given by the following relation:

$$S = \frac{1}{n} \sum_{i=1}^{n} \left(X_i - \bar{X}\right)\left(X_i - \bar{X}\right)^T \qquad (5)$$

where, with the notation given previously:

$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i.$$
It is shown that, under the hypothesis that the vectors $X_i$ of the learning base are Gaussian vectors that are pairwise independent and identically distributed (a property denoted “iid” hereinafter), P is composed of the k eigenvectors of S that are associated with the k largest eigenvalues (k fixed).
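This eigenvector characterization can be sketched as follows, on a toy random learning base (sizes and data are illustrative assumptions):

```python
import numpy as np

# Build the covariance matrix S of relation (5) and keep the k
# eigenvectors associated with the k largest eigenvalues.
rng = np.random.default_rng(1)
n, lm, k = 20, 12, 3
BA = rng.standard_normal((n, lm))       # rows are the vectors X_i

X_bar = BA.mean(axis=0)                 # mean vector of the learning base
centered = BA - X_bar
S = centered.T @ centered / n           # covariance matrix, relation (5)

eigvals, eigvecs = np.linalg.eigh(S)    # eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]       # descending order
P = eigvecs[:, order[:k]]               # the k principal components
```

Note that np.linalg.eigh applies here because S is symmetric.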
The PCA is commonly used to represent or recognize faces. The procedure for recognizing faces that is best known and based on the PCA has been proposed in the aforesaid document: “Eigenfaces for recognition”, M. Turk and A. Pentland, Journal of Cognitive Neuroscience, vol. 3, March 1991, pages 71-86.
The procedure requires a learning base consisting of a set of images presented as input in the form of one vector per image. Each image $X_i$, consisting of l rows and m columns of pixels as grey levels, is thus reduced to a vector of size l·m by concatenating its rows of pixels. A PCA is performed directly on these vectors, giving a set of k principal components of the same size l·m as the initial image vectors; these components are designated by the term “eigenfaces”. The number k of principal components to be retained may be fixed or else determined from the eigenvalues.
The comparison between two images of faces is made following a projection into the base of the eigencomponents according to relation (1) hereinabove. The two projected vectors are compared according to a measure based on a predetermined criterion of similarity.
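One possible instance of such a similarity measure is the L2 distance between projected vectors; the sketch below assumes toy data and an arbitrary orthonormal basis P, not an actual eigenface basis:

```python
import numpy as np

# Compare faces after projection into the eigenface basis (relation (1)),
# using an L2 distance as one possible similarity criterion.
rng = np.random.default_rng(2)
lm, k = 12, 3
P, _ = np.linalg.qr(rng.standard_normal((lm, k)))

face_a = rng.standard_normal(lm)
face_b = face_a + 0.01 * rng.standard_normal(lm)    # near-duplicate of face_a
face_c = rng.standard_normal(lm)                    # unrelated face

def distance(u, v):
    """L2 distance between the projections of two face vectors."""
    return np.linalg.norm(P.T @ u - P.T @ v)
```

With such a measure, the near-duplicate face_b is expected to lie closer to face_a than the unrelated face_c.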
It is shown in particular that the principal components constitute the subspace of dimension k minimizing the mean quadratic error of reconstruction, defined as the L2 distance between the learning base and its orthogonal projection onto the base consisting of the principal components.
However, a drawback of the procedure resides in the fact that this base does not necessarily offer an optimal classification of the data. Specifically, the principal components maximize the total variance of the learning base, without distinguishing the variations internal to each class from the variations between classes.
Described hereinafter is a procedure arising out of Linear Discriminant Analysis (or LDA), of the prior art, commonly used in shape recognition and described in particular in:
“Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection”, P. Belhumeur, J. Hespanha and D. Kriegman, Special Theme Issue on Face and Gesture Recognition of the IEEE Trans. on Pattern Analysis and Machine Intelligence, 19(7), pages 711-720, July 1997.
It makes it possible to find the subspace that jointly maximizes the variations between classes and minimizes the mean variation inside the classes, so as to give a subspace called the “discriminant subspace” of the initial space and ensuring better discrimination between classes.
The LDA is distinguished from the PCA in particular in that the LDA is a supervised procedure, that is to say the construction of the model requires, for each image of the learning base, the datum of a vector as well as the datum of its membership class.
Here the hypothesis is made that the images of one and the same class, represented in the form of vectors, are Gaussian vectors having the iid property described hereinabove. P denotes the orthogonal projection matrix from the initial space into the discriminating subspace within the sense of equation (1) hereinabove; P is a matrix of size (l·m)×k. The criterion to be maximized is chosen as:

$$J(P) = \operatorname{trace}\left((S_w^p)^{-1} S_b^p\right) \qquad (6)$$

where $S_w^p$ (respectively $S_b^p$) is the intra-class (respectively inter-class) covariance matrix of the learning base projected by the orthogonal projection P. It is recalled that a class is a set of representations of one and the same face in the learning base. It is indicated also that the term “intra-class” is concerned with properties within one and the same class, whereas the term “inter-class” is concerned with properties from one class to another.
If C denotes the total number of classes of the learning base, we define:
$$S_b^p = \frac{1}{n} \sum_{c=1}^{C} n_c \left(\bar{\hat{X}}_c - \bar{\hat{X}}\right)\left(\bar{\hat{X}}_c - \bar{\hat{X}}\right)^T \quad \text{and} \quad S_w^p = \frac{1}{n} \sum_{c=1}^{C} \sum_{i \in c} \left(\hat{X}_i - \bar{\hat{X}}_c\right)\left(\hat{X}_i - \bar{\hat{X}}_c\right)^T \qquad (7)$$

where:
- $n_c$ is the number of images of the person corresponding to class c and contained in the learning base,
- $\bar{\hat{X}}_c = \frac{1}{n_c} \sum_{i \in c} \hat{X}_i$, and
- $\hat{X}_i = P^T X_i$ is a vector of size k corresponding to the image $X_i$ projected into the base P according to equation (1) hereinabove.
It is then indicated that maximizing the criterion according to relation (6) amounts to choosing the matrix P in the following manner:
$$P = \operatorname*{Argmax}_{P \in \mathcal{R}^{(l \cdot m) \times k}} \frac{\left|P^T S_b P\right|}{\left|P^T S_w P\right|} \qquad (8)$$

where:
- $S_b$ is the inter-class covariance matrix of the learning base, such that

$$S_b = \frac{1}{n} \sum_{c=1}^{C} n_c \left(\bar{X}_c - \bar{X}\right)\left(\bar{X}_c - \bar{X}\right)^T, \qquad (9)$$

- and $S_w$ is the intra-class covariance matrix, such that:

$$S_w = \frac{1}{n} \sum_{c=1}^{C} \sum_{i \in c} \left(X_i - \bar{X}_c\right)\left(X_i - \bar{X}_c\right)^T. \qquad (10)$$
The columns of P contain the k eigenvectors of the matrix $S_w^{-1} S_b$ associated with the k largest eigenvalues, where $S_w^{-1}$ is the inverse of $S_w$.
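On a toy learning base where n is large enough for $S_w$ to be invertible, this solution can be sketched as follows (the class structure, sizes and random data are illustrative assumptions):

```python
import numpy as np

# LDA solution: eigenvectors of Sw^{-1} Sb with the largest eigenvalues.
rng = np.random.default_rng(3)
lm, k, C, n_c = 6, 2, 3, 10               # 3 classes of 10 vectors of size 6
means = 3.0 * rng.standard_normal((C, lm))
BA = np.vstack([mu + rng.standard_normal((n_c, lm)) for mu in means])
labels = np.repeat(np.arange(C), n_c)
n = BA.shape[0]

X_bar = BA.mean(axis=0)
Sb = np.zeros((lm, lm))                   # inter-class covariance, relation (9)
Sw = np.zeros((lm, lm))                   # intra-class covariance, relation (10)
for c in range(C):
    Xc = BA[labels == c]
    mc = Xc.mean(axis=0)
    Sb += len(Xc) * np.outer(mc - X_bar, mc - X_bar) / n
    Sw += (Xc - mc).T @ (Xc - mc) / n

eigvals, eigvecs = np.linalg.eig(np.linalg.inv(Sw) @ Sb)
order = np.argsort(eigvals.real)[::-1]
P = eigvecs[:, order[:k]].real            # the k discriminant components
```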
As a general rule, the dimension of the input vectors is much bigger than the number of examples acquired in the learning base (l·m>>n). The matrix Sw is then singular and noninvertible.
It is then possible to perform the LDA in a base, determined previously by applying the PCA procedure, the dimension of this base being less than the number of examples of the learning base. This approach is designated hereinafter by the abbreviation “PCA+LDA”.
The number of components to be retained, called “Fisherfaces”, may be determined thereafter in the same manner as for the PCA described hereinabove. The classification is performed after orthogonal projection into the space of Fisherfaces, in the same manner as for the PCA.
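The “PCA+LDA” combination can be sketched as follows; the sizes are illustrative assumptions, chosen so that l·m >> n and the PCA step is genuinely needed before the LDA step:

```python
import numpy as np

# PCA+LDA: reduce the l*m-dimensional vectors to a dimension d < n by
# PCA, so that the intra-class matrix Sw computed in the reduced space
# (size d x d) becomes invertible; the LDA then operates on `reduced`.
rng = np.random.default_rng(4)
lm, n, d = 50, 15, 8                      # l*m >> n: Sw would be singular
BA = rng.standard_normal((n, lm))

centered = BA - BA.mean(axis=0)
S = centered.T @ centered / n             # covariance, relation (5)
w, V = np.linalg.eigh(S)
P_pca = V[:, np.argsort(w)[::-1][:d]]     # d leading principal components

reduced = BA @ P_pca                      # n vectors of dimension d < n
```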
Following the numerous comparative studies between the PCA and the LDA in the prior art, it is taken as read that, if one considers a learning base of sufficient size that is sufficiently representative, the LDA gives better results than the PCA and, above all, makes it possible to better manage the differences of illumination in the pictures of the faces, as well as the differences of facial expression and of pose.
Very recently, another procedure based on a “bidimensional” PCA has been proposed in:
“Two-Dimensional PCA: A New Approach to Appearance-Based Face Representation and Recognition”, J. Yang, D. Zhang, A. F. Frangi and J. Y. Yang, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 1, January 2004.
This procedure is described hereinafter.
One constructs a model on the basis of the set of images of the learning base that are stored in the form of matrices of pixels having l rows and m columns. More particularly, we seek k orthonormal vectors $P = [P_1, \ldots, P_k]$ of length m, such that the projection of the learning base onto this base of vectors ensures a maximum of variance of the learning base. The projection of an image X of size l×m onto the matrix P of size m×k is given by the following linear relation, where $\hat{X}$ is a matrix of size l×k:

$$\hat{X} = X P \qquad (11)$$
The criterion to be maximized is as follows:

$$J(P) = \operatorname{trace}(S_p) \qquad (12)$$
$S_p$ designates the covariance matrix (termed “bidimensional”, as opposed to that of single-column vectors) of the n images of the learning base projected onto the vector base P. If we consider:
- that $X_i$ is the matrix of size l×m of the i-th image of the learning base,
- that $\hat{X}_i = X_i P$ is the matrix of size l×k projected from $X_i$ by P according to equation (11) hereinabove, and
- that $\bar{\hat{X}} = \frac{1}{n} \sum_{i=1}^{n} \hat{X}_i$ is the mean projected matrix (of size l×k) of the learning base on P,

we obtain:

$$S_p = \frac{1}{n} \sum_{i=1}^{n} \left(\hat{X}_i - \bar{\hat{X}}\right)^T \left(\hat{X}_i - \bar{\hat{X}}\right) \qquad (13)$$

It is shown that the criterion according to relation (12) is equivalent to:

$$J(P) = \operatorname{trace}\left(P^T S P\right) \qquad (14)$$
In this expression, S is the covariance matrix of the columns of the images and it is calculated as follows:

$$S = \frac{1}{n} \sum_{i=1}^{n} \left(X_i - \bar{X}\right)^T \left(X_i - \bar{X}\right) \qquad (15)$$

where $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$ is the mean matrix, of size l×m, of the n images of the learning base.
The criterion to be maximized according to relation (14) is called the “generalized total dispersion criterion”. As for the PCA, the k vectors $[P_1, \ldots, P_k]$ to be retained are the eigenvectors of the matrix S corresponding to the k largest eigenvalues. $P = [P_1, \ldots, P_k]$ denotes the projection matrix within the sense of relation (11). The projection of the image $X_i$ by P is denoted $\hat{X}_i = [\hat{X}_i^1, \ldots, \hat{X}_i^k]$, where $\hat{X}_i^j = X_i P_j$ is the vector of length l obtained by projecting the image $X_i$ onto the vector $P_j$.
The number k of components to be retained may be determined in the same manner as for the PCA, seen hereinabove.
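A toy sketch of the bidimensional PCA model construction (sizes and random data are illustrative assumptions):

```python
import numpy as np

# Bidimensional PCA: S is the m x m covariance of the image columns
# (relation (15)); P keeps the k eigenvectors with the largest
# eigenvalues, and each image projects by X P (relation (11)).
rng = np.random.default_rng(5)
n, l, m, k = 10, 8, 6, 2                  # 10 toy images of 8x6 pixels
images = rng.standard_normal((n, l, m))

X_bar = images.mean(axis=0)               # mean image, size l x m
S = sum((Xi - X_bar).T @ (Xi - X_bar) for Xi in images) / n
w, V = np.linalg.eigh(S)
P = V[:, np.argsort(w)[::-1][:k]]         # m x k projection matrix

projections = images @ P                  # each projected image is l x k
```

Unlike the classical PCA, no flattening of the images into vectors of size l·m is needed here, and S stays of modest size m×m.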
As before, the comparison between two faces is performed in the projected space. Here, the projection of an image onto P no longer gives a vector, but a matrix. Therefore, a measure of similarity between matrices $\hat{X}_i$ of size l×k is used here. The distance between the matrices $\hat{X}_i$ and $\hat{X}_t$ may be as follows:
$$d(\hat{X}_i, \hat{X}_t) = \sum_{j=1}^{k} \left\|\hat{X}_i^j - \hat{X}_t^j\right\|^2 = \sum_{j=1}^{k} \left(\hat{X}_i^j - \hat{X}_t^j\right)^T \left(\hat{X}_i^j - \hat{X}_t^j\right) \qquad (16)$$
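The distance (16) can be sketched as follows on two arbitrary projected matrices (sizes and data are illustrative); note that it coincides with the squared Frobenius norm of their difference:

```python
import numpy as np

# Distance (16): sum over the k projected columns of the squared
# Euclidean distances between corresponding columns.
rng = np.random.default_rng(6)
l, k = 8, 2
Xi_hat = rng.standard_normal((l, k))      # projection of a first image
Xt_hat = rng.standard_normal((l, k))      # projection of a second image

d = sum(np.linalg.norm(Xi_hat[:, j] - Xt_hat[:, j]) ** 2 for j in range(k))

# Matrix form: the same quantity as a squared Frobenius norm.
d_frob = np.linalg.norm(Xi_hat - Xt_hat, "fro") ** 2
```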
The bidimensional PCA, conducted with a very small number of components k, gives better results than the classical PCA on the bases “Yale Face DataBase B”, “ORL” and “AR”, which are among the best-known bases of faces.
As indicated hereinabove, although numerous procedures for reducing dimensions have been proposed in the prior art, the major disadvantage of the PCA and LDA procedures resides in the fact that the size m·l of the vectors of the learning base is generally very large, this leading to:
- an excessive number of calculations during classification;
- difficulties during the evaluation of the covariance matrices S, $S_b$ and $S_w$;
- the non-invertibility of the matrix $S_w$.
Within the framework of the LDA, to circumvent this problem, one generally performs two projections instead of just one, by combining the PCA and LDA procedures (the so-called “PCA+LDA” processing). However, this approach considerably increases the complexity of the LDA. Moreover, the choice of linear combinations of the principal components as discriminating components is, on the one hand, not justified and, on the other hand, lacking in rigor.
The bidimensional PCA, even though it does not exhibit this drawback and though it guarantees optimal conservation of the global variance of the projection learning base, does not make it possible to distinguish variations between classes and variations inside classes. Like the PCA procedure, this technique is well suited to the reconstruction of images of faces after compression by reduction of dimension. However, as for the PCA, it does not necessarily ensure good discrimination between classes.