Speech recognition systems commonly use Hidden Markov Models (HMMs) to automatically recognize speech (e.g., by decoding the linguistic content of the speech). In a training phase, this type of speech recognition system estimates the parameters which define the HMMs; it performs this task by training the HMMs based on a set of speech signals labeled with linguistic content items (e.g., words or other forms). In a recognition phase, the speech recognition system uses the trained HMMs to recognize new speech signals.
One type of HMM is referred to as a Gaussian mixture HMM. This type of HMM uses a combination of Gaussian components (referred to as “mixture components” herein) to model the distribution of speech and noise within the environment for each state. Different environments are characterized by noise having different characteristics. Thus, to provide more robust speech recognition, Gaussian mixture HMMs can be trained using a training set that accounts for speech within different types of environments.
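For illustration only, the per-state output distribution of a Gaussian mixture HMM can be sketched as a weighted sum of Gaussian densities. The sketch below assumes one-dimensional observations, and the function names, weights, means, and variances are hypothetical values chosen for the example; they are not taken from any particular speech recognition system.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mixture_pdf(x, weights, means, variances):
    """Emission density of one HMM state: a weighted sum of mixture components."""
    return sum(
        w * gaussian_pdf(x, m, v)
        for w, m, v in zip(weights, means, variances)
    )


# Hypothetical two-component state; the numbers are illustrative assumptions.
WEIGHTS = [0.6, 0.4]    # mixture weights, which sum to 1
MEANS = [0.0, 3.0]      # fixed component means
VARIANCES = [1.0, 2.0]  # fixed component variances

if __name__ == "__main__":
    # Likelihood of a single observed feature value under this state.
    print(mixture_pdf(1.0, WEIGHTS, MEANS, VARIANCES))
```

In a practical system the observations would be multi-dimensional feature vectors and the component parameters would be estimated during the training phase rather than set by hand.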
In the classic case, the Gaussian mixture components used by Gaussian mixture HMMs are fixed, meaning that these components do not vary as a function of the characteristics of the environment. In a more recently proposed case, the Gaussian mixture components can vary as a function of a measurable characteristic of the environment, such as the signal-to-noise ratio (SNR). For example, each Gaussian mixture component can include a mean component μ and a variance component Σ, each of which varies as a function of SNR. This variable type of HMM is referred to herein as a variable-parameter hidden Markov model (VPHMM). VPHMMs potentially offer better performance than fixed-component HMMs because VPHMMs adapt to different types of environments, as opposed to using a single set of parameters to account for every circumstance that may be encountered within an environment.
While VPHMMs have been shown to be effective, there remains room for further improvement in this technology. In particular, one known version of VPHMM technology uses global polynomial functions to approximate the way in which the Gaussian parameters (the μ's and Σ's) vary as a function of utterance SNR. There may be various inefficiencies and limitations associated with this approach.
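As a minimal sketch of the global polynomial approach described above, the mean and variance of a single Gaussian component can each be written as a polynomial in the utterance SNR and then evaluated before the likelihood is computed. The function names, the coefficient values, and the variance floor in the sketch are assumptions introduced for illustration; the known technique may parameterize and constrain these quantities differently.

```python
import math


def polyval(coeffs, snr):
    """Evaluate c0 + c1*snr + c2*snr**2 + ... using Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * snr + c
    return result


def variable_gaussian_log_pdf(x, snr, mean_coeffs, var_coeffs, var_floor=1e-3):
    """Log density of a 1-D Gaussian whose mean and variance depend on SNR.

    var_floor is a hypothetical safeguard: a polynomial is not guaranteed
    to stay positive, so the evaluated variance is clipped from below.
    """
    mean = polyval(mean_coeffs, snr)
    var = max(polyval(var_coeffs, snr), var_floor)
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


# Hypothetical second-order coefficients (c0, c1, c2), for illustration only.
MEAN_COEFFS = [1.0, 0.05, -0.001]
VAR_COEFFS = [2.0, -0.04, 0.0005]

if __name__ == "__main__":
    # The same observation scored under two different SNR conditions (in dB).
    for snr_db in (5.0, 20.0):
        print(snr_db, variable_gaussian_log_pdf(1.2, snr_db, MEAN_COEFFS, VAR_COEFFS))
```

Because a single polynomial spans the entire SNR range, each coefficient influences the parameter value at every SNR, which is the sense in which the function is global.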