1. Technical Field
The present invention relates generally to feature spaces and, in particular, to regularized feature space discrimination adaptation.
2. Description of the Related Art
A state-of-the-art automatic speech recognition (ASR) system is usually trained on speech from more than a few hundred speakers in a target domain to provide robustness. Since ASR performance is highly dependent on the acoustic environment in the target domain, an acoustic model (AM) in the system should ideally be built with a large amount of target domain data. However, creating a large speech corpus for each ASR application involves enormous costs, and constructing the AM for the application from scratch takes time. Therefore, AM adaptation techniques are often used to convert the AM of a deployed system into a target domain AM using only a small amount of target domain data.
Typical adaptation techniques such as maximum likelihood linear regression (MLLR) and maximum a posteriori (MAP) adaptation adapt acoustic model parameters. However, the frontend pipeline in a modern ASR system includes a discriminative feature space transform, which is statistically trained with a large speech corpus to map cepstrum-based or linear discriminant analysis (LDA) features into a canonicalized (discriminative) feature space. This means that the transform depends on the acoustic conditions of the training data and, like the acoustic model parameters, should also be adapted to the target domain.
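By way of illustration only, the model-parameter adaptation referred to above can be sketched for a single Gaussian mean. The following is a minimal sketch of the commonly used MAP mean update, in which the adapted mean interpolates between the mean of the deployed model and the posterior-weighted average of the target domain frames. The function name `map_adapt_mean`, the prior weight `tau`, and the toy values are hypothetical and are not taken from this disclosure; a practical system would apply the update to every Gaussian in the AM.

```python
def map_adapt_mean(prior_mean, frames, posteriors, tau=10.0):
    """Return a MAP-adapted mean for one Gaussian component.

    prior_mean -- mean vector from the deployed (source domain) model
    frames     -- target domain feature vectors, one list per frame
    posteriors -- occupation probability of this Gaussian for each frame
    tau        -- prior weight (hypothetical value); larger values keep
                  the result closer to prior_mean
    """
    dim = len(prior_mean)
    occupancy = sum(posteriors)          # total soft count for this Gaussian
    weighted_sum = [0.0] * dim           # posterior-weighted sum of frames
    for frame, gamma in zip(frames, posteriors):
        for d in range(dim):
            weighted_sum[d] += gamma * frame[d]
    # Interpolate between the prior mean and the target domain statistics.
    return [
        (tau * prior_mean[d] + weighted_sum[d]) / (tau + occupancy)
        for d in range(dim)
    ]


if __name__ == "__main__":
    # Toy example: two target domain frames fully assigned to this Gaussian.
    adapted = map_adapt_mean(
        prior_mean=[0.0, 0.0],
        frames=[[2.0, 4.0], [2.0, 4.0]],
        posteriors=[1.0, 1.0],
        tau=2.0,
    )
    print(adapted)
```

With little target domain data the occupancy is small and the adapted mean stays near the deployed model; as more data is observed, the target domain statistics dominate. An update of this kind changes only the acoustic model parameters and leaves the feature space transform in the frontend untouched.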