Recently, as the processing power of portable electronic devices has increased, there has been growing interest in adding speech recognition capabilities to such devices. Wireless telephones capable of operating under the control of voice commands have been introduced to the market. Speech recognition has the potential to reduce the effort and attention required of users operating wireless phones. This is especially advantageous for users who are frequently engaged in other critical activities (e.g., driving) while operating their wireless phones.
The most widely used algorithms for performing automated speech recognition (ASR) are based on Hidden Markov Models (HMMs). In an HMM ASR system, speech is modeled as a sequence of states. These states are assumed to be hidden; only the output generated from the states, i.e., the speech, is observed. According to the model, transitions between states are governed by a matrix of transition probabilities. For each state there is an output function, specifically a probability density function, that determines an a posteriori probability that the HMM was in that state, given measured features of an acoustic signal. The matrix of transition probabilities and the parameters of the output functions are determined during a training procedure, which involves feeding known words and/or sentences into the HMM ASR system and fine-tuning the transition probabilities and output function parameters to optimize recognition performance.
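The two kinds of trained parameters can be illustrated with a deliberately simplified sketch. It assumes (hypothetically, since the text does not say so) that every training frame already carries a known state label and a single scalar feature, and that each state's output function is a one-dimensional Gaussian. Under those assumptions the maximum-likelihood transition matrix is just row-normalized transition counts, and each Gaussian is fitted to the features labeled with its state. Practical systems usually lack such labels and instead iterate an expectation-maximization procedure (Baum-Welch); the function name `train_hmm` is illustrative only.

```python
from collections import defaultdict

def train_hmm(state_seqs, feature_seqs, n_states):
    """Hypothetical supervised training: maximum-likelihood estimates of an
    HMM's transition matrix and per-state 1-D Gaussian output densities,
    assuming every training frame already carries a known state label."""
    counts = [[0] * n_states for _ in range(n_states)]
    per_state = defaultdict(list)
    for states, feats in zip(state_seqs, feature_seqs):
        # Count each observed transition prev -> nxt.
        for prev, nxt in zip(states, states[1:]):
            counts[prev][nxt] += 1
        # Group scalar features by the state that emitted them.
        for s, x in zip(states, feats):
            per_state[s].append(x)

    # Row-normalize counts; use a uniform row for states never left.
    trans = []
    for row in counts:
        total = sum(row)
        trans.append([c / total if total else 1.0 / n_states for c in row])

    # Fit (mean, variance) per state; defaults for states with no data.
    gauss = []
    for s in range(n_states):
        xs = per_state[s]
        if not xs:
            gauss.append((0.0, 1.0))
            continue
        mean = sum(xs) / len(xs)
        var = sum((x - mean) ** 2 for x in xs) / len(xs)
        gauss.append((mean, max(var, 1e-6)))  # variance floor avoids zero
    return trans, gauss

trans, gauss = train_hmm([[0, 0, 1, 1, 1, 0]],
                         [[0.1, 0.2, 2.0, 2.2, 1.8, 0.0]], n_states=2)
print(trans)  # [[0.5, 0.5], [0.3333333333333333, 0.6666666666666666]]
```

Each row of the resulting matrix sums to one, as a matrix of transition probabilities must.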
To accommodate accents and other variations in the way words are pronounced, spoken messages to be identified by an HMM ASR system are processed to extract feature vectors, each of which characterizes one of a series of successive periods of the spoken message.
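Dividing a message into successive periods and computing one feature vector per period can be sketched as follows. The text does not specify which features are extracted, so the two used here (log energy and zero-crossing rate) are illustrative stand-ins, as are the 25 ms frame length, 10 ms hop, and 8 kHz sampling rate; the name `frame_features` is hypothetical.

```python
import math

def frame_features(signal, frame_len, hop):
    """Split a sampled signal into overlapping frames (frame_len >= 2) and
    return one feature vector per frame: (log energy, zero-crossing rate).
    These two features are illustrative stand-ins only."""
    features = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        log_energy = math.log(energy + 1e-10)  # offset avoids log(0) on silence
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
        features.append((log_energy, crossings / (frame_len - 1)))
    return features

# Example: 25 ms frames with a 10 ms hop at an assumed 8 kHz sampling rate.
rate = 8000
tone = [math.sin(2 * math.pi * 440 * n / rate) for n in range(rate // 10)]
feats = frame_features(tone, frame_len=200, hop=80)
print(len(feats))  # 8
```

Overlapping frames are a common choice because they keep short acoustic events from being split awkwardly across frame boundaries.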
In performing ASR, the most likely sequence of HMM states is determined in view of the transition probability for each transition in the sequence, the extracted feature vectors, and the a posteriori probabilities associated with the states.
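The search for the most likely state sequence is commonly performed with the Viterbi algorithm, a dynamic program that keeps, for each state, the best-scoring path ending there and a back-pointer to that path's previous state. The sketch below assumes a scalar feature per frame, nonzero probabilities, and 1-D Gaussian output densities; it works in the log domain so long sequences do not underflow. The two-state model (state 0 for background noise, state 1 for speech) and all its numbers are hypothetical.

```python
import math

def log_gauss(x, mean, var):
    """Log of a 1-D Gaussian density, used here as each state's output score."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def viterbi(features, init, trans, emit):
    """Most likely state sequence for scalar features, computed in the log domain.
    init[s], trans[p][s] are probabilities (assumed nonzero); emit[s] = (mean, var)."""
    n = len(init)
    # delta[s]: best log score of any state path that ends in s at the current frame.
    delta = [math.log(init[s]) + log_gauss(features[0], *emit[s]) for s in range(n)]
    backptrs = []
    for x in features[1:]:
        new_delta, ptrs = [], []
        for s in range(n):
            # Best predecessor p for state s, weighing the p -> s transition.
            p = max(range(n), key=lambda q: delta[q] + math.log(trans[q][s]))
            ptrs.append(p)
            new_delta.append(delta[p] + math.log(trans[p][s]) + log_gauss(x, *emit[s]))
        delta = new_delta
        backptrs.append(ptrs)
    # Trace back from the best final state.
    state = max(range(n), key=lambda s: delta[s])
    path = [state]
    for ptrs in reversed(backptrs):
        state = ptrs[state]
        path.append(state)
    return path[::-1]

# Hypothetical two-state model: state 0 = background noise, state 1 = speech.
init = [0.5, 0.5]
trans = [[0.9, 0.1], [0.1, 0.9]]
emit = [(0.0, 1.0), (5.0, 1.0)]
print(viterbi([0.1, -0.2, 5.1, 4.8, 5.3, 0.2], init, trans, emit))  # [0, 0, 1, 1, 1, 0]
```

The transition probabilities discourage rapid switching between states, while the output densities pull each frame toward the state whose features it resembles; the decoded path balances the two.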
Background noise, which predominates during pauses in speech, is also modeled by one or more states of the HMM so that the ASR system properly identifies pauses and does not try to interpret background noise as speech.
One problem for ASR systems, particularly those used in portable devices, is that the characteristics of the background noise in the environment of the ASR system are not fixed. If an ASR system is trained in an acoustic environment with no background noise, or with one particular type of background noise, the system will be prone to errors when operated in an environment with a different type of background noise. Background noise that is unfamiliar to the ASR system may be construed as parts of speech.
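The failure mode can be made concrete with a toy comparison of two state output densities over a single log-energy feature. All parameter values are hypothetical: the noise state is assumed to have been trained on quiet background noise, and the test frame is louder noise from a new environment. Because the frame lies far from the trained noise density, the speech state scores it higher, so the noise is construed as speech; a noise state whose parameters matched the new environment would score it correctly. The helper name `classify_frame` is illustrative.

```python
import math

def log_gauss(x, mean, var):
    """Log of a 1-D Gaussian density used as a state's output score."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def classify_frame(log_energy, noise_model, speech_model):
    """Label a frame by whichever state's Gaussian output density scores it higher."""
    noise = log_gauss(log_energy, *noise_model)
    speech = log_gauss(log_energy, *speech_model)
    return "noise" if noise >= speech else "speech"

speech_model = (0.0, 4.0)        # hypothetical (mean, variance) of log energy for speech
quiet_noise_model = (-6.0, 1.0)  # hypothetical noise state learned in quiet training audio
loud_noise_frame = -2.0          # hypothetical background noise in a louder environment

print(classify_frame(loud_noise_frame, quiet_noise_model, speech_model))  # speech
# A noise state whose parameters matched the new environment avoids the error.
print(classify_frame(loud_noise_frame, (-2.0, 1.0), speech_model))        # noise
```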
What is needed is an ASR system that can achieve high speech recognition rates when operated in environments with different types of background noise.
What is needed is an ASR system that can adapt to different types of background noise.