For years, the de facto standard in acoustic modeling has been hidden Markov models (HMMs) with state-dependent Gaussian mixture models (GMMs) for expressing the distributions of the acoustic feature vectors within each state. Traditionally, the estimation of the GMM parameters (means, variances, mixture weights) is performed with maximum likelihood via the expectation-maximization (EM) algorithm. There have been advances in the estimation of GMM-HMMs through the advent of discriminative training techniques such as maximum mutual information and minimum phone error (MPE) training. Discriminative training can be carried out either in model space or in feature space as in feature minimum phone error (fMPE), where the goal is to estimate a transform that maps high-dimensional vectors of Gaussian posteriors to time-dependent offsets which are added to the regular acoustic feature vectors. The projection is trained to enhance the discrimination between correct and incorrect word sequences.
The prominence of GMM-HMMs in acoustic modeling has led to an entire ecosystem of front-end processing and speaker-adaptation techniques specifically tailored to maximize the recognition performance under this model. Linear transforms such as the semi-tied covariance (STC) transform and maximum likelihood linear regression (MLLR) are examples of such techniques that were developed in the context of diagonal-covariance GMMs.
Additionally, transformations of feature spaces are common methods to improve pattern recognition accuracy. Examples of features include the following:
in textual processing: term frequency-inverse document frequency (TF-IDF), likelihood scores associated with textual units, etc.;
acoustic features in speech recognition tasks; and
color, font, layout, symbols and concepts in image interpretation tasks.
Typically, transformations of feature spaces are carried out as follows. One set of features represents test data X, and a set of “meta-features” represents training data. The features that represent the test data are then moved toward the “meta-features” that represent the training data.
By way of example, assume that there is some map of features: $F(A, V_1, V_2): \{x_1, x_2, \ldots, x_T\} \to \{y_1, y_2, \ldots, y_T\}$, where the $V_i$ are subsets of linear spaces, and A is a linear map from the subset $V_1$ to $V_2$. This map A is found by maximizing, over A, some objective function G of the data $\{x_1, x_2, \ldots, x_T\}$, the transform $F(A, V_1, V_2)$, and model parameters θ. That is, consider
$$\max_{A} \; G\bigl(F(A, V_1, V_2),\ \{x_1, x_2, \ldots, x_T\},\ \theta\bigr).$$
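By way of illustration only, the maximization above can be sketched for a toy case. The sketch assumes a scalar map A (written `a`), a single Gaussian with mean `mu` and standard deviation `sigma` standing in for the model parameters θ, a log-likelihood objective G that includes the log-Jacobian of the transform, and a simple grid search. None of these choices is specified in the text, and all function names are hypothetical.

```python
import math

def log_likelihood(a, xs, mu, sigma):
    """Toy objective G for the scalar map y = a * x under N(mu, sigma^2).

    The log|a| term is the log-Jacobian of the feature transform, so the
    returned value is a log-likelihood of the original features x.
    """
    total = 0.0
    for x in xs:
        y = a * x
        total += -0.5 * math.log(2 * math.pi * sigma ** 2)
        total += -0.5 * ((y - mu) / sigma) ** 2
        total += math.log(abs(a))
    return total

def argmax_over_grid(xs, mu, sigma, grid):
    """Return the candidate a in grid that maximizes the toy objective."""
    return max(grid, key=lambda a: log_likelihood(a, xs, mu, sigma))

# Hypothetical one-dimensional features and model parameters.
xs = [0.8, 1.1, 0.9, 1.2]
grid = [0.1 * k for k in range(1, 51)]  # candidates 0.1, 0.2, ..., 5.0
best_a = argmax_over_grid(xs, mu=2.0, sigma=0.5, grid=grid)
```

A practical system would replace the grid search with a closed-form or iterative update and would use a matrix A with a mixture model, but the structure (transform the data, score it under θ, pick the best A) is the same.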
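The objective function can be a likelihood function or a discriminative function (for example, the one used in fMPE). Examples can include:
Maximum likelihood linear regression (MLLR). In MLLR: $F(A, V_1, V_2): \{x_1, x_2, \ldots, x_T\} \to \{y_1, y_2, \ldots, y_T\} = \{A x_1, A x_2, \ldots, A x_T\}$.
fMPE. In fMPE: $F(A, V_1, V_2): x_i \to y_i = x_i + A\{g_{i-r}, g_{i-r+1}, \ldots, g_{i+r}\}$, where $g_j$ denotes the vector of Gaussian posteriors at frame j and the window spans r frames on either side of frame i.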
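The two maps above can be sketched as follows. This is a minimal illustration, assuming that features and posterior vectors are plain Python lists, that the fMPE window is concatenated into one long vector before A is applied, and that frames beyond the ends of the utterance are replaced by the nearest boundary frame. The edge handling and all function names are assumptions, not something the text specifies.

```python
def mat_vec(A, v):
    """Multiply matrix A (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def linear_feature_map(A, xs):
    """Linear map as written above: y_t = A x_t for every frame t."""
    return [mat_vec(A, x) for x in xs]

def fmpe_feature_map(A, xs, gs, r):
    """fMPE-style map: y_i = x_i + A [g_{i-r}, ..., g_{i+r}]."""
    T = len(xs)
    ys = []
    for i in range(T):
        spliced = []
        for j in range(i - r, i + r + 1):
            j = min(max(j, 0), T - 1)  # assumption: repeat the boundary frame
            spliced.extend(gs[j])
        offset = mat_vec(A, spliced)
        ys.append([x_k + o_k for x_k, o_k in zip(xs[i], offset)])
    return ys

# Hypothetical data: 3 frames, 2-dimensional features, 2 Gaussians, r = 1.
xs = [[1.0, 2.0], [0.0, 1.0], [2.0, 2.0]]
gs = [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]]
A = [[1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, 0.0, 0.0, 1.0]]  # 2 x (2 * (2r + 1)) projection
ys = fmpe_feature_map(A, xs, gs, r=1)
identity = linear_feature_map([[1.0, 0.0], [0.0, 1.0]], xs)
```

With an all-zero A the fMPE-style map returns the input features unchanged, which is a convenient sanity check when experimenting with such a transform.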
In the context of fMPE, this means that there are features, a set of Gaussians that represent training data (“meta-features” for training data), and a process of creating posteriors over test features. These posteriors can then be projected back to the test features (as an offset that is added to the test features). The above transformations involve linear projections of high-dimensional vectors of Gaussian posteriors to time-dependent offsets that are added to regular feature vectors. Other examples of transformations of feature data are linear transforms such as the semi-tied covariance transform and maximum likelihood linear regression. However, even with these existing approaches, challenges remain in integrating the methodologies and in achieving higher pattern recognition accuracy.
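The posterior-and-offset step described above can be sketched as follows. This is a simplified illustration, assuming diagonal-covariance Gaussians with prior weights, a single frame with no context window, and a hand-picked projection matrix `M`. In an actual fMPE system the projection would be trained discriminatively, and every name and value here is hypothetical.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log-density of x under a diagonal-covariance Gaussian."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def gaussian_posteriors(x, gaussians, weights):
    """Posterior probability of each Gaussian given the test frame x."""
    logs = [math.log(w) + log_gauss_diag(x, mean, var)
            for w, (mean, var) in zip(weights, gaussians)]
    top = max(logs)  # subtract the maximum for numerical stability
    exps = [math.exp(l - top) for l in logs]
    norm = sum(exps)
    return [e / norm for e in exps]

def add_offset(x, M, post):
    """Project the posterior vector with M and add the result to x."""
    offset = [sum(m * p for m, p in zip(row, post)) for row in M]
    return [xi + oi for xi, oi in zip(x, offset)]

# Hypothetical "meta-features": two Gaussians given as (mean, variance).
gaussians = [([0.0, 0.0], [1.0, 1.0]), ([3.0, 3.0], [1.0, 1.0])]
weights = [0.5, 0.5]
M = [[0.5, -0.5], [0.0, 1.0]]  # assumed projection, not a trained one
x = [0.1, -0.2]                # one test frame
post = gaussian_posteriors(x, gaussians, weights)
y = add_offset(x, M, post)
```

Because the test frame lies close to the first Gaussian, nearly all of the posterior mass falls on that Gaussian, and the offset is dominated by the first column of `M`.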