Designing appropriate feature representations for speech recognition has been an active area of research for many years. For example, in large vocabulary continuous speech recognition (LVCSR) systems, large gains in performance are observed by using speaker-adapted and discriminatively trained features, learned via objective functions such as feature-space maximum likelihood linear regression (fMLLR) and feature-space boosted maximum mutual information (fBMMI). Designing appropriate classifiers given these features has been another active area of research, where popular modeling approaches include Deep Neural Networks (DNNs) and discriminatively trained Gaussian Mixture Models (GMMs).