1. Field of the Invention
The present invention relates generally to biometrics and computer applications for facial recognition, and particularly to a method of performing facial recognition using genetically modified fuzzy linear discriminant analysis, including modification of the Fuzzy Fisherface classification scheme.
2. Description of the Related Art
Facial recognition has recently found many applications and has attracted substantial research effort from the fields of computer vision, bioinformatics, and machine learning. The techniques used for facial recognition are broadly classified as either appearance-based or geometrical feature-based. Appearance-based techniques use holistic features of the face image, whereas geometrical feature-based techniques use the geometrical features of the image. Some researchers have also adopted a hybrid methodology, applying appearance-based techniques to localized regions of the facial image.
Principal Component Analysis (PCA) is one of the most successful techniques used in face image recognition. PCA can be used for prediction, redundancy removal, feature extraction, and data compression. PCA essentially reduces the large dimensionality of the data space by projecting the data onto the directions of maximum variance, which are used as the features. However, this subspace is not necessarily optimal in terms of face classification. Large 1-D vectors of pixels are constructed from the 2-D facial images by concatenating the columns, and these vectors are then projected onto the eigenvectors of the covariance matrix of the training image vectors. If there are N image vectors (N being the number of images), each of size M (the number of rows times the number of columns of an image), then the mean vector of all of the images is given by:
$$m = \frac{1}{N}\sum_{i=1}^{N} x_i. \tag{1}$$
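As an illustration of the column concatenation and of equation (1), the following is a minimal NumPy sketch. The function names `images_to_vectors` and `mean_vector` and the tiny 2×2 "images" are assumptions made here for the example and are not part of the described method.

```python
import numpy as np

def images_to_vectors(images):
    """Stack each 2-D image column by column into a 1-D vector of length M.

    Returns an (M, N) array whose k-th column is the image vector x_k.
    """
    # order="F" reads each array column-major, i.e. it concatenates the columns.
    return np.stack(
        [np.asarray(img, dtype=float).flatten(order="F") for img in images],
        axis=1,
    )

def mean_vector(X):
    """Equation (1): m = (1/N) * sum_i x_i over the columns x_i of X."""
    return X.mean(axis=1)

# Toy data (made up for illustration): N = 3 images of 2 rows by 2 columns.
images = [
    [[1, 2], [3, 4]],
    [[5, 6], [7, 8]],
    [[9, 10], [11, 12]],
]
X = images_to_vectors(images)   # shape (M, N) = (4, 3)
m = mean_vector(X)              # mean image vector of length M = 4
```

Here the first image becomes the vector (1, 3, 2, 4): its first column followed by its second column.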
In PCA, the image vectors are first mean-centered. A set of T orthonormal vectors $w_i$ is then sought; these vectors form the columns of the projection matrix W of order (M×T), and the feature vectors are given by the following linear transformation:
$$y_k = W^T x_k. \tag{2}$$
PCA relies on maximizing the total scatter of the training vectors. The total scatter matrix $S_T$ is given by:
$$S_T = \sum_{k=1}^{N} (x_k - m)(x_k - m)^T. \tag{3}$$
The scatter of the transformed feature vectors is given by $W^T S_T W$. The projection matrix $W_{PCA}$ satisfies the following:
$$W_{PCA} = \arg\max_{W} \left| W^T S_T W \right|. \tag{4}$$
It can be shown from linear algebra that the $w_i$ are the eigenvectors of the covariance matrix $C = PP^T$, where P is the matrix whose columns are the mean-centered image vectors $x_k - m$ placed side by side. Because C is a sum of N rank-one terms built from mean-subtracted vectors, which sum to zero, its rank cannot exceed N−1.
The non-zero eigenvalues of the covariance matrix have corresponding orthonormal eigenvectors. The eigenvector associated with the largest eigenvalue is the one that reflects the greatest variance in the images. The eigenvalues decrease very rapidly: roughly 90% of the total variance is contained in the first 5% to 10% of the dimensions, as shown in FIG. 2.
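The PCA steps described above can be sketched as follows. This is an illustrative NumPy example on synthetic random data, assuming the image vectors are stored as the columns of an (M, N) array; the function name `pca_projection` and the chosen sizes are hypothetical.

```python
import numpy as np

def pca_projection(X, T):
    """Return (m, W, eigvals) for image vectors stored as columns of X.

    m is the mean vector, W is the (M, T) matrix of leading eigenvectors
    of the total scatter matrix, and eigvals holds all of its eigenvalues
    in decreasing order.
    """
    m = X.mean(axis=1, keepdims=True)        # equation (1)
    P = X - m                                # mean-centered image vectors
    S_T = P @ P.T                            # equation (3), total scatter
    eigvals, eigvecs = np.linalg.eigh(S_T)   # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]        # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return m, eigvecs[:, :T], eigvals

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 6))                 # synthetic data: M = 20, N = 6
m, W, eigvals = pca_projection(X, T=3)
Y = W.T @ (X - m)                            # equation (2) on centered vectors

# The scatter matrix of N mean-subtracted vectors has rank at most N - 1.
rank = np.linalg.matrix_rank((X - m) @ (X - m).T)
# Share of the total variance captured by the first T eigenvectors.
explained = eigvals[:3].sum() / eigvals.sum()
```

For realistic image sizes the M×M scatter matrix is large, so this direct decomposition is meant only to make the definitions concrete.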
FIG. 2 shows only the first one hundred eigenvectors used for projection, but it is clear that projection using only the first fifteen eigenvectors of the covariance matrix yields a recognition accuracy above 90%. Thus, image vectors are projected onto a subspace spanned by the most significant eigenvectors (i.e., the principal components) of the covariance matrix. When a test image is projected onto this T-dimensional subspace, it is assigned the class of the training vector that lies at the smallest Euclidean distance from it.
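The classification rule in the projected subspace can be illustrated with a short sketch. The function name `nearest_neighbor_class`, the feature values, and the labels below are made up for the example.

```python
import numpy as np

def nearest_neighbor_class(y_test, Y_train, labels):
    """Return the label of the training feature vector (a column of
    Y_train) with the smallest Euclidean distance to y_test."""
    distances = np.linalg.norm(Y_train - y_test[:, None], axis=0)
    return labels[int(np.argmin(distances))]

# Hypothetical projected training features: T = 2, four training images.
Y_train = np.array([[0.0, 0.2, 5.0, 5.1],
                    [0.0, 0.1, 5.0, 4.9]])
labels = ["person_a", "person_a", "person_b", "person_b"]

# A projected test image that lies near the second group of features.
predicted = nearest_neighbor_class(np.array([4.8, 5.2]), Y_train, labels)
```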
Linear Discriminant Analysis (LDA) looks for the projection matrix that provides the best discrimination among the different classes. LDA tries to achieve this by finding a subspace in which the projected vectors of the different classes are maximally separated. The between-class scatter matrix $S_B$ and the within-class scatter matrix $S_W$ are defined as:
$$S_B = \sum_{i=1}^{c} M_i\,(m_i - m)(m_i - m)^T \tag{5}$$
$$S_W = \sum_{i=1}^{c} \sum_{x_k \in X_i} (x_k - m_i)(x_k - m_i)^T, \tag{6}$$
where $M_i$ is the number of training vectors in the i-th class, c is the number of distinct classes, $m_i$ is the mean of all the vectors belonging to the i-th class, m is the overall mean of equation (1), and $X_i$ represents the set of samples belonging to the i-th class, with $x_k$ being the k-th image of that class.
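As a rough illustration, the two scatter matrices could be computed as follows. This is a minimal NumPy sketch on synthetic data; the function name `class_scatter` and the toy inputs are assumptions made here. It uses the class means $m_i$ in the between-class scatter and the deviations $x_k - m_i$ in the within-class scatter.

```python
import numpy as np

def class_scatter(X, labels):
    """Return (S_B, S_W) for image vectors stored as the columns of X.

    labels[k] is the class of column k.
    """
    labels = np.asarray(labels)
    M = X.shape[0]
    m = X.mean(axis=1, keepdims=True)            # overall mean
    S_B = np.zeros((M, M))
    S_W = np.zeros((M, M))
    for c in np.unique(labels):
        X_i = X[:, labels == c]                  # samples of this class
        M_i = X_i.shape[1]                       # number of vectors in the class
        m_i = X_i.mean(axis=1, keepdims=True)    # class mean
        S_B += M_i * (m_i - m) @ (m_i - m).T     # between-class term
        D = X_i - m_i
        S_W += D @ D.T                           # within-class term
    return S_B, S_W

rng = np.random.default_rng(1)
X = rng.normal(size=(4, 9))                      # synthetic: M = 4, N = 9
labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]             # c = 3 classes
S_B, S_W = class_scatter(X, labels)

# Total scatter of equation (3), for comparison.
P = X - X.mean(axis=1, keepdims=True)
S_T = P @ P.T
```

With these definitions the two matrices add up to the total scatter matrix of equation (3), which gives a quick sanity check on an implementation.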
$S_W$ represents the scatter of the features around the mean of each class, and $S_B$ represents the scatter of features around the overall mean for all the classes. In Fisher's LDA, the aim is to maximize $S_B$ while minimizing $S_W$, which translates to maximizing the ratio of their determinants, $\det(S_B)/\det(S_W)$, after projection:
$$W_{LDA} = \arg\max_{W} \frac{\left| W^T S_B W \right|}{\left| W^T S_W W \right|}. \tag{7}$$
This ratio is maximized when the column vectors of the projection matrix $W_{LDA}$ are the eigenvectors of $S_W^{-1} S_B$. To prevent $S_W$ from becoming singular, PCA is used as a preprocessing step. Thus, the final transformation is given by the following matrix:
$$W^T = W_{LDA}^T \cdot W_{PCA}^T. \tag{8}$$
LDA produces well-separated classes in a low-dimensional subspace, even under severe variations in lighting and facial expressions.
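The two-stage transformation of equation (8) can be sketched as follows. This is an illustrative NumPy example on synthetic data, not a definitive implementation. The helper names are hypothetical, and the number of PCA components (N − c) and of LDA components (c − 1) are assumptions made here, since the text above does not fix them.

```python
import numpy as np

def scatter_matrices(X, labels):
    """Between-class and within-class scatter of the columns of X."""
    labels = np.asarray(labels)
    m = X.mean(axis=1, keepdims=True)
    S_B = np.zeros((X.shape[0], X.shape[0]))
    S_W = np.zeros_like(S_B)
    for c in np.unique(labels):
        X_i = X[:, labels == c]
        m_i = X_i.mean(axis=1, keepdims=True)
        S_B += X_i.shape[1] * (m_i - m) @ (m_i - m).T
        S_W += (X_i - m_i) @ (X_i - m_i).T
    return S_B, S_W

def pca_basis(X, T):
    """Leading T eigenvectors of the total scatter matrix."""
    P = X - X.mean(axis=1, keepdims=True)
    eigvals, eigvecs = np.linalg.eigh(P @ P.T)
    return eigvecs[:, np.argsort(eigvals)[::-1][:T]]

def lda_basis(S_B, S_W, n_components):
    """Eigenvectors of S_W^{-1} S_B with the largest eigenvalues."""
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S_W, S_B))
    order = np.argsort(eigvals.real)[::-1][:n_components]
    return eigvecs[:, order].real

# Synthetic data: c = 3 classes, 4 images each, M = 30 pixels per image.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2], 4)
centers = rng.normal(scale=5.0, size=(30, 3))
X = centers[:, labels] + rng.normal(size=(30, 12))

N, c = X.shape[1], 3
W_pca = pca_basis(X, N - c)               # assumed choice: N - c components
Y = W_pca.T @ (X - X.mean(axis=1, keepdims=True))
S_B, S_W = scatter_matrices(Y, labels)    # scatter in the PCA subspace
W_lda = lda_basis(S_B, S_W, c - 1)        # assumed choice: c - 1 components
W = W_pca @ W_lda                         # equation (8): W^T = W_lda^T W_pca^T
```

Reducing the dimension with PCA first is what allows the within-class scatter matrix to be inverted here: in the original 30-dimensional space, 12 training vectors cannot give it full rank.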
In the Fuzzy Fisherface LDA (FLDA), the basic LDA is modified by introducing fuzziness into the “belong-ness” of every projected vector to the classes. In the conventional approach, every vector is assumed to have a crisp membership in the class to which it belongs. However, this does not take into account the resemblance of images belonging to different classes, which occurs under varying conditions. In FLDA, each vector is assigned a membership grade for every class based upon the class labels of its k nearest neighbors. This fuzzy k-nearest neighbor algorithm is used to calculate the membership grades of all the vectors. In this manner, the inter-class image resemblance is accounted for. The fuzzy C-class partitioning of the vectors defines the degrees of membership of each vector in all the classes.
In the following, $\mu_{ij}$ represents the membership grade of the j-th vector in the i-th class. The membership functions satisfy the two obvious conditions:
$$\sum_{i=1}^{C} \mu_{ij} = 1 \tag{9}$$
$$0 < \sum_{j=1}^{N} \mu_{ij} < N. \tag{10}$$
During the training phase, the class labels of the k vectors located in the closest neighborhood of each vector are collected. Then, the membership grade of the j-th vector in the i-th class is calculated as:
$$\mu_{ij} = \begin{cases} 0.51 + 0.49\left(\dfrac{n_{ij}}{k}\right) & \text{if } i \text{ is the same as the label of the } j\text{-th pattern} \\[2ex] 0.49\left(\dfrac{n_{ij}}{k}\right) & \text{otherwise} \end{cases} \tag{11}$$
where $n_{ij}$ stands for the number of the neighbors of the j-th vector that belong to the i-th class. The membership allocation formula refines the membership grades of the labeled vectors while leaving the dominant membership unaffected: each vector retains its largest grade in the class given by its label. These modified membership grades are used in the computation of the statistical properties of the patterns, such as the mean value and the scatter matrices $S_B$ and $S_W$:
$$\bar{m}_i = \frac{\sum_{j=1}^{N} \mu_{ij}^{p}\, x_j}{\sum_{j=1}^{N} \mu_{ij}^{p}} \tag{12}$$
$$FS_B = \sum_{i=1}^{C} \sum_{j=1}^{N} \mu_{ij}^{p}\, (\bar{m}_i - m)(\bar{m}_i - m)^T \tag{13}$$
$$FS_W = \sum_{i=1}^{C} \sum_{x_k \in X_i} \mu_{ik}^{p}\, (x_k - \bar{m}_i)(x_k - \bar{m}_i)^T \tag{14}$$
where i = 1, 2, . . . , C, and p, a fuzzy modifier, is a constant that controls the influence of the fuzzy membership degree.
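The membership assignment of equation (11) and the fuzzy statistics of equations (12) to (14) can be sketched together as follows. This is a minimal NumPy illustration under stated assumptions: a vector is not counted among its own neighbors, class labels are integers 0 to C−1, the within-class sum in equation (14) runs over the vectors labeled with class i and weights each by its own grade in that class, and the values k = 3 and p = 2 are arbitrary. The function names and the toy data are hypothetical.

```python
import numpy as np

def fuzzy_knn_memberships(X, labels, k):
    """Equation (11): grades U[i, j] of vector j (column of X) in class i."""
    labels = np.asarray(labels)
    C, N = int(labels.max()) + 1, X.shape[1]
    diff = X[:, :, None] - X[:, None, :]
    dist = np.sqrt((diff ** 2).sum(axis=0))      # pairwise Euclidean distances
    np.fill_diagonal(dist, np.inf)               # assumption: exclude the vector itself
    U = np.zeros((C, N))
    for j in range(N):
        neighbors = np.argsort(dist[:, j])[:k]   # k nearest neighbors of vector j
        n_ij = np.bincount(labels[neighbors], minlength=C)
        U[:, j] = 0.49 * n_ij / k
        U[labels[j], j] += 0.51                  # extra weight for the labeled class
    return U

def fuzzy_scatter(X, labels, U, p):
    """Equations (12) to (14): fuzzy class means, FS_B and FS_W."""
    labels = np.asarray(labels)
    Up = U ** p
    m = X.mean(axis=1, keepdims=True)            # overall mean
    means = (X @ Up.T) / Up.sum(axis=1)          # column i is the i-th fuzzy mean
    M = X.shape[0]
    FS_B = np.zeros((M, M))
    FS_W = np.zeros((M, M))
    for i in range(U.shape[0]):
        d = means[:, [i]] - m
        FS_B += Up[i].sum() * d @ d.T            # equation (13)
        in_class = labels == i                   # vectors labeled with class i
        D = X[:, in_class] - means[:, [i]]
        FS_W += (D * Up[i, in_class]) @ D.T      # equation (14), as read here
    return means, FS_B, FS_W

# Toy 2-D feature vectors; the last one is labeled 1 but lies near class 0.
X = np.array([[0.0, 0.1, 0.2, 1.0, 1.1, 0.3],
              [0.0, 0.1, 0.0, 1.0, 0.9, 0.2]])
labels = [0, 0, 0, 1, 1, 1]
U = fuzzy_knn_memberships(X, labels, k=3)
means, FS_B, FS_W = fuzzy_scatter(X, labels, U, p=2)
```

Because the k neighbor counts of a vector always add up to k, each column of `U` sums to 1, as required by condition (9). For the last vector all three neighbors belong to class 0, so its grades are 0.49 in class 0 and 0.51 in its labeled class 1.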
Although the Fuzzy Fisherface LDA modification results in improved facial recognition, there is still a need for further improvement in computer applications for facial recognition. Thus, a method of performing facial recognition using genetically modified fuzzy linear discriminant analysis solving the aforementioned problems is desired.