The present invention relates generally to pattern recognition systems and, more particularly, to methods and apparatus for performing discriminant feature space analysis in pattern recognition systems such as, for example, speech recognition systems.
State-of-the-art speech recognition systems use cepstral features augmented with dynamic information from the adjacent speech frames. The standard MFCC+Δ+ΔΔ scheme (Mel-Frequency Cepstral Coefficients plus first and second derivatives, or delta and double delta), while performing relatively well in practice, has no real basis from a discriminant analysis point of view. The same argument applies to the computation of the cepstral coefficients from the spectral features: it is not clear that the discrete cosine transform, among all linear transformations, has the best discriminatory properties, even if its use is motivated by orthogonality considerations.
Linear discriminant analysis (LDA) is a standard technique in statistical pattern classification for dimensionality reduction with a minimal loss in discrimination, see, e.g., R. O. Duda et al., “Pattern Classification and Scene Analysis,” Wiley, New York, 1973; and K. Fukunaga, “Introduction to Statistical Pattern Recognition,” Academic Press, New York, 1990, the disclosures of which are incorporated by reference herein. Its application to speech recognition has shown consistent gains for small vocabulary tasks and mixed results for large vocabulary applications, see, e.g., R. Haeb-Umbach et al., “Linear Discriminant Analysis for Improved Large Vocabulary Continuous Speech Recognition,” Proceedings of ICASSP '92, Volume 1, pp. 13-16, 1992; E. G. Schukat-Talamazzini et al., “Optimal Linear Feature Space Transformations for Semi-Continuous Hidden Markov Models,” Proceedings of ICASSP '95, pp. 369-372, 1994; and N. Kumar et al., “Heteroscedastic Discriminant Analysis and Reduced Rank HMMs for Improved Speech Recognition,” Speech Communication, 26:283-297, 1998, the disclosures of which are incorporated by reference herein.
One reason for the mixed results could be the diagonal modeling assumption that is imposed on the acoustic models in most systems: if the dimensions of the projected subspace are highly correlated, then a diagonal covariance modeling constraint will result in distributions with large overlap and low sample likelihood. In this case, a maximum likelihood feature space transformation which aims at minimizing the loss in likelihood between full and diagonal covariance models is known to be very effective, see, e.g., R. A. Gopinath, “Maximum Likelihood Modeling with Gaussian Distributions for Classification,” Proceedings of ICASSP '98, Seattle, 1998; and M. J. F. Gales, “Semi-tied Covariance Matrices for Hidden Markov Models,” IEEE Transactions on Speech and Audio Processing, 7:272-281, 1999, the disclosures of which are incorporated by reference herein.
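The likelihood loss caused by the diagonal modeling assumption can be illustrated with a minimal sketch. It assumes a single two-dimensional class whose Gaussian parameters are estimated by maximum likelihood, in which case the average log-likelihood lost per sample by keeping only the diagonal of the covariance is one half of the log ratio of the two determinants. The function name `diagonal_loss_per_sample` and the toy covariances are illustrative assumptions and do not appear in the references cited above.

```python
import math

def diagonal_loss_per_sample(cov):
    """Average log-likelihood (in nats) lost per sample when the ML
    full-covariance Gaussian for a 2x2 sample covariance `cov` is replaced
    by the ML diagonal-covariance Gaussian fitted to the same data."""
    det_full = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
    det_diag = cov[0][0] * cov[1][1]  # determinant of diag(cov)
    return 0.5 * math.log(det_diag / det_full)

# Toy covariances with unit variances and increasing correlation rho.
for rho in (0.0, 0.5, 0.9):
    cov = [[1.0, rho], [rho, 1.0]]
    loss = diagonal_loss_per_sample(cov)
    print(f"rho={rho:.1f}  loss={loss:.4f} nats/sample")
```

The loss is zero for uncorrelated dimensions and grows as the correlation increases, which is the effect a likelihood-maximizing feature space transformation is intended to reduce.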
Secondly, it is not clear what the best definition for the classes should be: phone, subphone, allophone or even prototype-level classes can be considered, see, e.g., R. Haeb-Umbach et al., “Linear Discriminant Analysis for Improved Large Vocabulary Continuous Speech Recognition,” Proceedings of ICASSP '92, Volume 1, pp. 13-16, 1992, the disclosure of which is incorporated by reference herein. Related to this argument, the class assignment procedure has an impact on the performance of LDA; EM-based (Expectation Maximization algorithm based) approaches which aim at jointly optimizing the feature space transformation and the model parameters have been proposed, see, e.g., the above-referenced E. G. Schukat-Talamazzini et al. article; the above-referenced N. Kumar et al. article; and the above-referenced M. J. F. Gales article.
Chronologically, the extension of LDA to Heteroscedastic Discriminant Analysis (HDA) under the maximum likelihood framework appears to have been proposed first by E. G. Schukat-Talamazzini in the above-referenced article (called maximum likelihood rotation). N. Kumar, in the above-referenced N. Kumar et al. article, studied the case for diagonal covariance modeling and general (not necessarily orthogonal) transformation matrices and made the connection with LDA. Following an argument of Campbell, in N. A. Campbell, “Canonical Variate Analysis—A General Model Formulation,” Australian Journal of Statistics, 26(1):86-96, 1984, the disclosure of which is incorporated by reference herein, N. Kumar showed that HDA is a maximum likelihood solution for normal populations with common covariances in the rejected subspace. In R. A. Gopinath, “Maximum Likelihood Modeling with Gaussian Distributions for Classification,” Proceedings of ICASSP '98, Seattle, 1998, the disclosure of which is incorporated by reference herein, a maximum likelihood linear transformation (MLLT) was introduced which turns out to be a particular case of Kumar's HDA when the dimensions of the original and the projected space are the same. Interestingly, M. J. F. Gales' global transform for semi-tied covariance matrices, in the above-referenced M. J. F. Gales article, is identical to MLLT but applied in the model space (all other cases are feature space transforms). Finally, Demuynck in K. Demuynck, et al., “Optimal Feature Sub-space Selection Based On Discriminant Analysis,” Proceedings of Eurospeech '99, Budapest, Hungary, 1999, the disclosure of which is incorporated by reference herein, uses a minimum divergence criterion between posterior class distributions in the original and transformed space to estimate an HDA matrix.
Thus, as suggested above, LDA is known to be inappropriate for the case of classes with unequal sample covariances. While, in recent years, there has been an interest in generalizing LDA to HDA by removing the equal within-class covariance constraint, as mentioned above, no substantially satisfactory approach has been developed. One main reason is that existing approaches deal with objective functions related to the rejected dimensions, which are irrelevant to the discrimination of the classes in the final projected space. Thus, a need exists for an improved HDA approach for use in pattern recognition systems.
The present invention provides a new approach to heteroscedastic discriminant analysis (HDA) by defining an objective function which maximizes the class discrimination in the projected subspace while ignoring the rejected dimensions. Accordingly, in one aspect of the invention, a method for use in a pattern recognition system of processing feature vectors extracted from a pattern signal input to the system comprises the following steps. First, a projection matrix is formed based on a heteroscedastic discriminant objective function which, when applied to the feature vectors extracted from the pattern signal, maximizes class discrimination in a resulting subspace associated with the feature vectors, while ignoring one or more rejected dimensions in the objective function. The projection matrix is then applied to the feature vectors extracted from the pattern signal to generate transformed feature vectors for further processing in the pattern recognition system. For example, further processing may comprise classifying the transformed features associated with the input pattern signal. It may also include filtering, re-ranking or sorting the output of the classification operation.
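The two steps described above (forming a projection from a heteroscedastic discriminant objective, then applying it to the feature vectors) can be illustrated by the following minimal sketch. It rests on several assumptions that are not stated in this section: the objective is taken to be, in logarithmic form, N log(θBθᵀ) minus the sum over classes of N_j log(θΣ_jθᵀ), where B is the between-class scatter and Σ_j, N_j are the covariance and sample count of class j; the projection is from two dimensions to one; and the maximization is a plain grid search over directions rather than any particular optimizer. All function names and the toy statistics are hypothetical.

```python
import math

def quad_form(theta, matrix):
    """Return theta * matrix * theta^T for a 2-vector and a 2x2 matrix."""
    t0, t1 = theta
    return (t0 * (matrix[0][0] * t0 + matrix[0][1] * t1)
            + t1 * (matrix[1][0] * t0 + matrix[1][1] * t1))

def hda_objective(theta, class_covs, class_counts, between_scatter):
    """Assumed log-objective for a one-dimensional projection:
    N log(theta B theta^T) - sum_j N_j log(theta S_j theta^T).
    Only projected quantities appear; rejected dimensions are ignored."""
    n_total = sum(class_counts)
    value = n_total * math.log(quad_form(theta, between_scatter))
    for cov, n_j in zip(class_covs, class_counts):
        value -= n_j * math.log(quad_form(theta, cov))
    return value

def best_projection(class_covs, class_counts, between_scatter, steps=180):
    """Grid-search unit directions and return the one maximizing the objective."""
    best_theta, best_value = None, -math.inf
    for k in range(steps):
        angle = math.pi * k / steps
        theta = (math.cos(angle), math.sin(angle))
        if quad_form(theta, between_scatter) <= 1e-12:
            continue  # class means collapse in this direction; skip log(0)
        value = hda_objective(theta, class_covs, class_counts, between_scatter)
        if value > best_value:
            best_theta, best_value = theta, value
    return best_theta

def project(theta, vectors):
    """Apply the 1x2 projection to a list of 2-D feature vectors."""
    return [theta[0] * x + theta[1] * y for x, y in vectors]

# Toy statistics: two equally sized classes with unequal covariances and
# means at (-1, -1) and (1, 1), so that B = [[1, 1], [1, 1]].
class_covs = [[[1.0, 0.0], [0.0, 1.0]], [[4.0, 0.0], [0.0, 0.25]]]
class_counts = [100, 100]
between_scatter = [[1.0, 1.0], [1.0, 1.0]]

theta = best_projection(class_covs, class_counts, between_scatter)
print("direction:", theta)
print("projected:", project(theta, [(1.0, 2.0), (-1.0, 0.5)]))
```

Under these assumptions the objective is unchanged when θ is rescaled, so restricting the search to unit vectors loses nothing, and when all class covariances are equal the expression reduces to N times the logarithm of the usual LDA ratio.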
In addition, we present a link between discrimination and the likelihood of the projected samples and show that HDA can be viewed as a constrained maximum likelihood (ML) projection for a full covariance Gaussian model, the constraint being given by the maximization of the projected between-class scatter volume.
The present invention also provides that, under diagonal covariance Gaussian modeling constraints, applying a diagonalizing linear transformation (e.g., MLLT, a maximum likelihood linear transformation) to the HDA space results in an increased classification accuracy.
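The effect of a diagonalizing transformation can be illustrated with a simplified sketch. As an assumption made purely for illustration, the transformation is restricted to a two-dimensional rotation, for which the log-determinant term of the likelihood vanishes, so that maximizing the diagonal-covariance likelihood amounts to minimizing the sum over classes of N_j log|diag(RΣ_jRᵀ)|. MLLT as described in the cited literature allows a general nonsingular matrix; the grid search and the function names below are hypothetical.

```python
import math

def rotate_cov(cov, angle):
    """Return R * cov * R^T for a symmetric 2x2 covariance, where
    R = [[c, s], [-s, c]] is a rotation by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    a, b, d = cov[0][0], cov[0][1], cov[1][1]
    r00 = c * c * a + 2.0 * c * s * b + s * s * d
    r11 = s * s * a - 2.0 * c * s * b + c * c * d
    r01 = -c * s * a + (c * c - s * s) * b + c * s * d
    return [[r00, r01], [r01, r11]]

def diag_cost(class_covs, class_counts, angle):
    """Sum over classes of N_j * log|diag(R S_j R^T)| (smaller is better)."""
    total = 0.0
    for cov, n_j in zip(class_covs, class_counts):
        rotated = rotate_cov(cov, angle)
        total += n_j * math.log(rotated[0][0] * rotated[1][1])
    return total

def best_rotation(class_covs, class_counts, steps=360):
    """Grid-search rotation angles in [0, pi/2) for the lowest cost."""
    angles = [0.5 * math.pi * k / steps for k in range(steps)]
    return min(angles, key=lambda a: diag_cost(class_covs, class_counts, a))

# One class with strongly correlated dimensions.
covs, counts = [[[1.0, 0.9], [0.9, 1.0]]], [100]
angle = best_rotation(covs, counts)
print("best angle (degrees):", math.degrees(angle))
print("cost before:", diag_cost(covs, counts, 0.0))
print("cost after: ", diag_cost(covs, counts, angle))
```

With several classes whose covariances have different orientations, no single rotation diagonalizes all of them, and the search settles on a compromise that lowers the total cost.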
In another embodiment of the invention, the heteroscedastic discriminant objective function assumes that models associated with the function have diagonal covariances thereby resulting in a diagonal heteroscedastic discriminant objective function. This is referred to as diagonal heteroscedastic discriminant analysis (DHDA).
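For orientation, the full-covariance and diagonal forms of the objective can be written side by side as sketched below. The notation is an assumption and is not defined in this section: θ denotes the p×n projection matrix, Σ_j and N_j the sample covariance and sample count of class j, N the total sample count, and B the between-class scatter matrix.

```latex
% Assumed logarithmic form of the HDA objective (full covariances):
H(\theta) = \sum_{j} -N_j \log\left|\theta \Sigma_j \theta^{T}\right|
            + N \log\left|\theta B \theta^{T}\right|

% Assumed logarithmic form of the DHDA objective (diagonal covariances):
G(\theta) = \sum_{j} -N_j \log\left|\operatorname{diag}\!\left(\theta \Sigma_j \theta^{T}\right)\right|
            + N \log\left|\theta B \theta^{T}\right|
```

The two expressions differ only in that the diagonal form keeps the diagonal of each projected class covariance before taking the determinant; for a one-dimensional projection they coincide.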
As will be explained below, the methodologies of the present invention are generally applicable to pattern recognition systems such as, for example, a speech recognition system. However, the invention may be applied to many other domains which employ pattern recognition, for example, any classification problem based on real-valued features. In addition to speech recognition, examples of such classification problems may include: handwriting recognition; optical character recognition (OCR); speaker identification; 2-dimensional (2D) or 3-dimensional (3D) object recognition in a 2D or 3D scene; forensic applications (e.g., fingerprints, face recognition); and security applications; just to name a few.