Modern speech recognition systems use cepstral features characterizing the short-term spectrum of the speech signal for classifying frames into phonetic classes. Cepstral features are typically obtained through an orthogonal transformation (such as a discrete cosine transform) of short-term spectral features. These cepstral features are augmented with dynamic information from adjacent frames to capture transient spectral events in the signal. The features commonly referred to as MFCC+Δ+ΔΔ comprise "static" mel-frequency cepstral coefficients (usually 13) plus their first- and second-order derivatives computed over a sliding window of typically 9 consecutive frames, yielding 39-dimensional feature vectors every 10 ms. One major drawback of this front-end scheme is that the same computation is performed regardless of the application, channel conditions, speaker variability, etc. In recent years, an alternative feature extraction procedure based on discriminant techniques has emerged, wherein consecutive cepstral frames are spliced together to form a supervector which is then projected down to a manageable dimension. One of the better known objective functions for designing the feature space projection is linear discriminant analysis (LDA).
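As a minimal sketch of the Δ/ΔΔ augmentation described above, the following computes first- and second-order derivatives by linear regression over a 9-frame sliding window and stacks them onto the static coefficients; the function name and the edge-padding strategy are illustrative assumptions, not part of the original text.

```python
import numpy as np

def add_deltas(cepstra, window=4):
    """Append first- and second-order time derivatives to static cepstra.

    cepstra: (num_frames, num_coeffs) array of static MFCCs (e.g. 13 coeffs).
    Derivatives are estimated by linear regression over +/- `window` frames,
    i.e. a 9-frame sliding window when window=4 (a common choice).
    """
    num_frames, _ = cepstra.shape
    # Pad by repeating edge frames so every frame has a full context.
    padded = np.pad(cepstra, ((window, window), (0, 0)), mode="edge")
    taps = np.arange(-window, window + 1)
    denom = np.sum(taps ** 2)
    delta = np.zeros_like(cepstra)
    for t in range(num_frames):
        ctx = padded[t:t + 2 * window + 1]      # (9, num_coeffs) context
        delta[t] = taps @ ctx / denom           # regression slope per coeff
    # Second derivative: apply the same regression to the deltas.
    padded_d = np.pad(delta, ((window, window), (0, 0)), mode="edge")
    delta2 = np.zeros_like(delta)
    for t in range(num_frames):
        delta2[t] = taps @ padded_d[t:t + 2 * window + 1] / denom
    return np.hstack([cepstra, delta, delta2])  # e.g. 13 -> 39 dims
```

With 13 static coefficients per frame, the output is the 39-dimensional MFCC+Δ+ΔΔ vector mentioned above.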
LDA, as discussed in Duda et al., "Pattern classification and scene analysis" (Wiley, New York, 1973) and Fukunaga, "Introduction to statistical pattern recognition" (Academic Press, New York, 1973), is a standard technique in statistical pattern classification for dimensionality reduction with a minimal loss in discrimination. Its application to speech recognition has shown consistent gains for small vocabulary tasks and mixed results for large vocabulary applications (see Haeb-Umbach et al., "Linear Discriminant Analysis for improved large vocabulary continuous speech recognition", Proceedings of ICASSP '92, and Kumar et al., "Heteroscedastic discriminant analysis and reduced rank HMM's (Hidden Markov Models) for improved speech recognition", Speech Communication, 26:283-297, 1998). Recently, there has been interest in extending LDA to heteroscedastic discriminant analysis (HDA) by incorporating the individual class covariances in the objective function (see Kumar et al., supra, and Saon et al., "Maximum likelihood discriminant feature spaces", Proceedings of ICASSP '2000, Istanbul, 2000). Indeed, the equal class covariance assumption made by LDA does not always hold true in practice, making the LDA solution highly suboptimal in such cases (see Saon et al., supra).
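The standard LDA projection discussed above can be sketched as follows: pool within-class scatter, form between-class scatter from the class means, and take the leading eigenvectors of the resulting generalized eigenproblem. The function name and interfaces are illustrative assumptions; this is the textbook formulation, not the specific implementation of any cited work.

```python
import numpy as np

def lda_projection(X, y, p):
    """Compute a p-dimensional LDA projection matrix.

    X: (n_samples, d) feature vectors (e.g. spliced cepstral supervectors);
    y: (n_samples,) integer class labels (e.g. phonetic classes);
    returns theta of shape (d, p) maximizing between-class scatter
    relative to within-class scatter.
    """
    classes = np.unique(y)
    mean = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))  # within-class scatter (pooled covariance)
    Sb = np.zeros((d, d))  # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    # Generalized eigenproblem Sb v = lambda Sw v; keep leading eigenvectors.
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(evals.real)[::-1]
    return evecs[:, order[:p]].real
```

Note that the pooled matrix Sw embodies exactly the equal class covariance assumption criticized above; HDA replaces this pooling with per-class covariances in the objective function.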
However, since both LDA and HDA are heuristics, neither guarantees an optimal projection in the sense of a minimum Bayes classification error (i.e., a minimum probability of misclassification). A need has thus been recognized in connection with selecting features on the basis of a minimum probability of misclassification.