Modern computer systems increasingly rely on data processing techniques for rapid training and for accurate identification and classification of datasets. These datasets may be sparse and overconstrained. For example, a radio communications receiver may receive only a few messages comprising data with many dimensions. Such a situation is referred to as "overconstrained" since the system must infer a general characteristic from only a few, very complex, samples. Despite this difficulty, the receiver must classify message patterns to accurately distinguish errors from authentic messages.
Various tools may be used to reformulate data into a form more amenable to analysis and data classification. Fisher's linear discriminant analysis (LDA) is one method for distinguishing classes of data within a dataset. Traditionally, LDA may be used in statistics and pattern recognition to linearly project high-dimensional observations from two or more classes onto a low-dimensional feature space before classification. By projecting data onto a lower dimensional feature space it may be easier to classify incoming data than if classification were attempted on the higher dimensional space. Furthermore, operating in a lower dimensional feature space may facilitate more efficient classification than in the original space.
Development of the Fisher Method
FIG. 1 depicts two data classifiers, 101a and 101b. Data falling within these classifiers may comprise audio data, video data, image data, or any dataset upon which classification may be performed. The classifiers may be generated from a plurality of "training" datapoints fed into the system, i.e., data with a corresponding classification already provided. A new data point, i.e., a "test" or "live" datapoint, whose values fall within the classifier 101a would be classified as data of the type corresponding to the classifier 101a. Similarly, a new data point whose values fall within the classifier 101b will be classified as data of the type corresponding to the classifier 101b. Here, the data comprises only two dimensions 102a, 102b, for ease of explanation, though one will readily recognize that data may regularly be represented in many more dimensions.
While one could simply identify the appropriate classification for a set of new data points by referring to the default coordinates of 102a, 102b, it is regularly the case that these default coordinates are not the best coordinates in which to represent the data for classification. Instead, another unidentified coordinate system may be more amenable to rapid classification. Furthermore, it may be preferable to use fewer dimensions when performing classification, as certain of the default dimensions 102a, 102b may be less useful for classification than others of the default dimensions (as mentioned above, not all 1600 pixels of an image are likely equally useful for facial classification). Identifying a smaller number of dimensions within which to perform classification is sometimes referred to as "dimensionality reduction".
Once a new set of coordinates (103a, 103b) has been identified, the classifiers and the incoming data points may then be projected upon these new coordinates to facilitate data classification. In the example of FIG. 1, rather than consider the two dimensions 102a and 102b, one could instead project the classifiers and new incoming data upon the vector 103b. Classification could then be performed by noting the new data point's projected location upon the vector 103b. In this example, the distributions of classifiers 101a and 101b comprise means μ1 and μ2 respectively when projected along the vector φ 103b.
One method for identifying the vector 103b is the Fisher Discrimination method, which relies upon the Fisher Discrimination Criterion. The Fisher Discrimination Criterion relates the between-class variation (Sb) to the within-class variation (Sw) of the classifiers, as projected upon a candidate vector 103b. One may also refer to the total scatter St as Sw+Sb. The between-class scatter Sb may be defined as:

Sb = (μ1−μ2)(μ1−μ2)^T ∈ R^(N×N)  (1)
In this example, the within-class scatter may be defined as

Sw = S1 + S2 ∈ R^(N×N)  (2)

where S1 and S2 denote the scatter matrices of the two individual classes, and the total scatter may be defined as

St = Sb + Sw ∈ R^(N×N)  (3)
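For concreteness, the scatter matrices of equations (1)-(3) may be computed as in the following sketch (the function name and array layout are illustrative only, and are not part of the method itself):

```python
import numpy as np

def scatter_matrices(X1, X2):
    """Compute the between-class (Sb), within-class (Sw), and total (St)
    scatter matrices for two classes of N-dimensional observations.
    X1, X2: arrays of shape (n_samples, N), one observation per row."""
    mu1 = X1.mean(axis=0)
    mu2 = X2.mean(axis=0)

    # Between-class scatter: rank-one outer product of the mean difference, eq. (1)
    d = (mu1 - mu2).reshape(-1, 1)
    Sb = d @ d.T

    # Within-class scatter: sum of the per-class scatter matrices, eq. (2)
    Sw = (X1 - mu1).T @ (X1 - mu1) + (X2 - mu2).T @ (X2 - mu2)

    # Total scatter, eq. (3)
    St = Sb + Sw
    return Sb, Sw, St
```

Note that Sb is the outer product of a single vector and is therefore rank one, a fact relied upon later in the derivation.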
Intuitively, projected classifiers with high between-class variation and low within-class variation will facilitate better datapoint segregation than the converse. This is reflected in the Fisher Criterion, which is defined as:
(φ^T Sb φ)/(φ^T Sw φ)  (4)
A high between-class variation (Sb) and a low within-class variation (Sw) will yield a higher Fisher Criterion and will better facilitate classification. This criterion may be used to identify, of all the possible vectors in the space of coordinates 102a, 102b, the vector φ 103b which best segregates the classifiers 101a and 101b. Some methods first identify the vector transpose φ0 103a, but the general concept is the same, as would be recognized by one in the art. Although in the simplified example of FIG. 1 one may readily determine that the vector φ 103b best segregates classifiers 101a and 101b, in a many-dimensional system with complicated classifiers the proper vector may be much more difficult to determine. Thus, the Fisher Criterion provides a valuable metric for assessing a candidate vector's merit for improving classification.
The vector φ 103b may be identified by iterating through possible vectors in the space of 102a, 102b, and finding the vector which maximizes the Fisher Criterion for the classifiers. This "maximum vector" φ*F may be represented as
φ*F = argmax_{φ ∈ R^N} (φ^T Sb φ)/(φ^T St φ);  (5)
One may alternatively determine φ*F by computing the maximum of an equivalent criterion λF.
λF(φ) = (φ^T Sb φ)/(φ^T St φ); 0 ≤ λF(φ) ≤ 1.  (6)
For the sake of simplicity, the total scatter St is used, so that the values of λF fall within the range of 0 to 1. λF is referred to as the Fisher's Linear Discrimination Criterion (FLDC).
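As a sketch of how the normalized criterion of equation (6) might be evaluated for a candidate vector (the function name is illustrative, not part of the original method):

```python
import numpy as np

def fisher_criterion(phi, Sb, St):
    """Evaluate the Fisher's Linear Discrimination Criterion of eq. (6).
    Since St = Sb + Sw, the denominator is at least as large as the
    numerator, so the result always lies in the range [0, 1]."""
    phi = np.asarray(phi, dtype=float)
    return float(phi @ Sb @ phi) / float(phi @ St @ phi)
```

The value is unchanged if phi is rescaled, so only the direction of the candidate vector matters.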
It can be shown that a vector φ that maximizes the FLDC must satisfy (a proof is provided in the attached appendix):

Sb φ = λ St φ,  (7)
for some constant λ. This is a generalized eigenvalue decomposition problem.
When Sb and St are both N×N symmetric matrices, there are N pairs of eigenvalues and eigenvectors that satisfy (7): (λ0, φ0), . . . , (λN−1, φN−1). The eigenvalues λ0, . . . , λN−1 are all real and, when Sb and St are scatter matrices, lie in the range from 0 to 1. Without loss of generality, assume λ0 ≥ . . . ≥ λN−1. Since Sb is a rank-one matrix, it can additionally be inferred that only one of the N eigenvalues is non-zero:

0 < λ0 < 1 and λ1 = . . . = λN−1 = 0  (8)
Thus, the Fisher's Linear Discriminant Vector is the generalized eigenvector, φ0, corresponding to the only non-zero generalized eigenvalue, λ0, of Sb and St:
φ*F = φ0  (9)

λ*F = λ0 = (φ0^T Sb φ0)/(φ0^T St φ0);  (10)
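Under the assumption that St is invertible, the generalized eigenvalue problem of equation (7) reduces to a standard eigenvalue problem for St⁻¹Sb, and the pair (λ0, φ0) of equations (9) and (10) may be recovered directly, as in this illustrative sketch (the function name is an assumption for illustration):

```python
import numpy as np

def fisher_vector_eig(Sb, St):
    """Solve Sb @ phi = lam * St @ phi (eq. 7) by reducing it to a
    standard eigenvalue problem for inv(St) @ Sb, assuming St is invertible.
    Returns the largest generalized eigenvalue and its eigenvector
    (eqs. 9-10); for a rank-one Sb this is the only non-zero eigenvalue."""
    evals, evecs = np.linalg.eig(np.linalg.solve(St, Sb))
    i = np.argmax(evals.real)
    return evals[i].real, evecs[:, i].real
```

This direct solution is useful as a reference; the search procedure described below instead works with eigendecompositions of (Sb − λSt) and so avoids forming St⁻¹ explicitly.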
The following is one proposed method for identifying λ0. From (7), consider performing a classical eigenvalue decomposition of (Sb−λSt) for a fixed λ. Let Eλ = [e0^λ, . . . , eN−1^λ] and Dλ = diag[d0^λ, . . . , dN−1^λ] respectively denote the eigenvector and eigenvalue matrices of (Sb−λSt). The eigenvalue decomposition can be written as

Dλ = Eλ^T (Sb − λSt) Eλ  (11)
An eigenvalue di^λ is related to its eigenvector ei^λ by di^λ = (ei^λ)^T (Sb − λSt) ei^λ. Without loss of generality, assume d0^λ ≥ . . . ≥ dN−1^λ.
Thus, the optimal value of the Fisher's Discriminant criterion may be computed as the value of 0 < λ < 1 that makes (Sb−λSt) semi-negative definite. It can be shown that there exists only one unique value of λ in the range [0,1] that satisfies the above condition (a proof is provided in the Appendix). Therefore, let f(λ): [0,1] → R represent the largest eigenvalue of (Sb−λSt) as a function of λ, i.e.
f(λ) ≡ max_{φ: ‖φ‖=1} φ^T (Sb − λSt) φ  (12)

= (e0^λ)^T (Sb − λSt) e0^λ  (13)

= d0^λ  (14)
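The function f(λ) of equations (12)-(14) is straightforward to evaluate numerically: for a fixed λ the matrix (Sb − λSt) is symmetric, so its largest eigenvalue may be obtained with a symmetric eigensolver. A minimal sketch (the function name is illustrative):

```python
import numpy as np

def f(lam, Sb, St):
    """f(lambda): the largest eigenvalue d0 of (Sb - lambda * St),
    per eqs. (12)-(14). eigvalsh applies because the matrix is symmetric,
    and it returns eigenvalues in ascending order, so the last one is d0."""
    return float(np.linalg.eigvalsh(Sb - lam * St)[-1])
```

Note that f(0) is the largest eigenvalue of Sb (positive), while f(1) is the largest eigenvalue of −Sw (negative when Sw is positive definite), so f changes sign exactly once on [0,1].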
The optimal value of the Fisher's criterion, λ*F, may then be computed as the value of λ for which

f(λ) = (e0^λ)^T (Sb − λSt) e0^λ = 0  (15)
The Fisher's discriminant vector φ*F may then be given by

φ*F = e0^(λ*F)  (16)
The function f(λ) is bounded on [0,1] and satisfies the following properties on the closed interval:

λ < λ*F ⇒ f(λ) > 0  (17)

λ > λ*F ⇒ f(λ) < 0  (18)

λ = λ*F ⇒ f(λ) = 0  (19)

Generalized Summary of the Fisher Discrimination Analysis
While the preceding section and attached appendices are intended to provide a thorough treatment of the Fisher Discrimination Analysis methodology as used in certain embodiments, FIG. 2 provides a more generalized overview of this reasoning for ease of comprehension. Particularly, FIG. 2 summarizes the analysis producing the function f(λ) and the corresponding search algorithm which will be improved upon by certain embodiments discussed in greater detail below.
As discussed above, the analysis begins 201 by recognizing that we would like to use the Fisher criterion to determine an appropriate projection vector φ*F 202. Determining φ*F requires that we find the maximum argument of
(φ^T Sb φ)/(φ^T St φ)  (20)
This may be rewritten as an eigenvalue decomposition problem 203. By the proof provided in Appendix B, it may then be shown that the optimal value of the Fisher's Discriminant criterion can be computed by finding a value between 0 and 1 that makes (Sb−λSt) semi-negative definite. Fortuitously, there is only one value in that range which will make (Sb−λSt) semi-negative definite. From these conditions we may define the function 204.
f(λ) ≡ max_{φ: ‖φ‖=1} φ^T (Sb − λSt) φ  (21)

= (e0^λ)^T (Sb − λSt) e0^λ  (22)
This function has various properties 205. In view of these properties, we recognize that we may find λ* by iterating through possible values of λ and substituting them into the equation 204, until we identify a value of λ which produces an f(λ) of 0. This λ will be λ*, which we may then use in conjunction with the equation (21) to determine the projection vector φ*F, which we had originally sought.
The following section discusses one possible algorithm for finding λ* from the equation f(λ) 204.
Algorithmic Search for λ* Using the Function f(λ)
Referring to the conditions 205 of FIG. 2, the λ which is λ* may be found by a bisection search algorithm. That is, if λ is too low (condition #1) then f(λ) will be positive. Thus a larger value of λ must be selected. However, if too large a value of λ is selected, then f(λ) will be negative (condition #2). One could iterate, selecting successively narrower intervals, until a satisfactorily small magnitude of f(λ) were achieved. In this manner, λ may be made as close to λ* as desired.
FIG. 3 is a plot of a function f(λ) representing the largest eigenvalue of (Sb−λSt) as a function of λ. As indicated, the function takes on values of λ 301 along the range from 0 to 1. In this particular example, the function passes through 0 at the value of 0.35. As discussed above, the λ that produces this zero value is the λ* 305 which we seek. Thus, in this example, λ* is 0.35. As the shape of the function f is known, it is possible to iteratively search for the value of λ which sets f(λ) to zero. For example, one could begin at 0.5 and calculate the value of f(λ) by solving the equation 204. If f(λ) 302 is less than zero (as is the case at 0.5 in this example), one could then select a value smaller than the previously selected value, say, 0.25. Calculating f(λ) for λ=0.25 generates a positive f(λ) 302, and so one may then select a value to the right of the previous value, 0.25, but before the first selected value, 0.5. Thus, one might select λ=0.375, which would generate a slightly negative f(λ). The process may continue, ad infinitum, or until a desired level of precision is reached.
FIG. 4 is a generalized algorithm of this process, i.e., iteratively driving the largest eigenvalue of (Sb−λSt) to zero using bisection. Line 401 initializes variables a and b, which represent the smallest and largest values respectively of the λ range to be considered. For as many iterations K as desired 402, the system then iterates in search of λ*. A candidate λ is determined by averaging the values 403. A corresponding eigenvector 404 may be noted. The system may then calculate f(λ) 405. As indicated above, it may require considerable computational resources to perform this calculation. Calculating f(λ) iteratively may impose too great a burden for some systems. If f(λ) is greater than zero 406, the system assigns 407 the selected λ as the smallest value in the range to consider before continuing the iterations. If f(λ) is negative 408, then the range is instead updated by assigning the candidate value as the largest value in the range to be considered 409. When K iterations are reached, the system assigns 412 the Kth eigenvalue and eigenvector for output. One will readily recognize that many modifications to this example exist in the art.
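The algorithm of FIG. 4 may be sketched as follows (a minimal NumPy rendering; the function name and the fixed iteration count K are illustrative, and the numbered comments track the description above):

```python
import numpy as np

def bisection_search(Sb, St, K=50):
    """Bisection search driving the largest eigenvalue of (Sb - lam*St)
    toward zero, per FIG. 4. Returns the final candidate lam and its
    corresponding eigenvector after K iterations."""
    a, b = 0.0, 1.0                  # 401: smallest/largest lambda considered
    lam, e0 = 0.5, None
    for _ in range(K):               # 402: iterate K times
        lam = (a + b) / 2.0          # 403: candidate lambda is the midpoint
        d, E = np.linalg.eigh(Sb - lam * St)
        f_lam = d[-1]                # 405: f(lam), the largest eigenvalue
        e0 = E[:, -1]                # 404: its corresponding eigenvector
        if f_lam > 0:
            a = lam                  # 406-407: lambda too small, raise lower bound
        else:
            b = lam                  # 408-409: lambda too large, lower upper bound
    return lam, e0                   # 412: final eigenvalue/eigenvector pair
```

Each iteration halves the search interval, so K iterations locate λ* to within 2^(−K); the per-iteration cost is dominated by the eigendecomposition, which motivates the efficiency concerns discussed below.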
Search algorithms such as the bisection search of FIG. 4, pertaining to the Fisher Discrimination embodiment, are common to many classification problems. In many of these problems a metric function, such as f(λ), must be repeatedly calculated. This metric function may comprise eigenvectors and eigenvalues which must be rapidly calculated, or else the iterations will take far too long to complete.
Unfortunately, as the computational complexity of linear feature extraction increases linearly with dimensionality of the observation samples, computation can become intractable for high dimensional data, particularly where the operation is to be performed in real time. As mobile devices and portable computers become more prevalent, there is an increasing need for more efficient and robust classification systems. In particular, the calculation of the metric f(λ) as part of the search algorithm for λ* discussed above is computationally intensive and represents a barrier to more efficient training.