The present invention relates to speech recognition. In particular, the present invention relates to noise rejection in speech recognition.
In speech recognition systems, an input speech signal is converted into words that represent the verbal content of the speech signal. This conversion is complicated by many factors including interfering sounds, which are generically referred to as noise. Noise includes such things as the sounds made when the speaker clears their throat or smacks their lips. It also includes external sounds such as the sound of footsteps, the sound of someone knocking at a door, and the sound of a phone ringing.
Since most speech recognition systems work by matching sounds to the basic acoustic units of speech, for example senones or phonemes, many speech recognition systems will identify noise as one or more words. For instance, if a user types on a keyboard during speech recognition, the sound of the typing may be interpreted as the word “its”.
To avoid such false acceptance, some speech recognition systems add models of noise to the acoustic models used for speech recognition. These models rely on a noise entry found in a lexicon for the speech recognizer. For example, a model would be created for the sound associated with knocking on a door. Because the model relies on an entry in the lexicon, noises that are not in the lexicon cannot be identified as noise by these models and are usually identified as a word. Since there is a wide variety of noises, it is impossible to include all noises in the lexicon. As such, there are a large number of noises that are improperly recognized as words in prior art speech recognition systems.
A method and apparatus are provided for two-tier noise rejection in speech recognition. The method and apparatus convert an analog speech signal into a digital signal and extract features from the digital signal. In a first tier of noise rejection, hypothesis speech words and hypothesis noise words are identified from the extracted features by modeling common noises as words in a lexicon. In a second tier of noise rejection, the features associated with the hypothesis speech words are examined to determine whether the features are more likely to represent noise than speech. If so, the hypothesis speech words are replaced by a noise marker.
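The two tiers summarized above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the claimed implementation: the names `Hypothesis`, `Gaussian`, `two_tier_reject`, and the `<noise>` marker are hypothetical, each feature is reduced to a single number per frame, and the second-tier test is assumed to be a comparison of log-likelihoods under one generic speech model and one generic noise model. The text above does not specify how the likelihood comparison is performed, nor whether first-tier noise words receive the same marker, so both choices are assumptions.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Set

NOISE_MARKER = "<noise>"  # hypothetical marker token


@dataclass
class Gaussian:
    """One-dimensional Gaussian, standing in for a generic acoustic model (assumption)."""
    mean: float
    var: float

    def log_likelihood(self, frames: Sequence[float]) -> float:
        # Sum of per-frame log densities.
        return sum(
            -0.5 * (math.log(2 * math.pi * self.var) + (x - self.mean) ** 2 / self.var)
            for x in frames
        )


@dataclass
class Hypothesis:
    """A hypothesis word from the decoder and the feature frames it spans."""
    word: str
    frames: List[float]


def two_tier_reject(
    hypotheses: Sequence[Hypothesis],
    noise_lexicon: Set[str],
    speech_model: Gaussian,
    noise_model: Gaussian,
) -> List[str]:
    output = []
    for hyp in hypotheses:
        if hyp.word in noise_lexicon:
            # Tier 1: the decoder matched a noise entry in the lexicon.
            output.append(NOISE_MARKER)
        elif noise_model.log_likelihood(hyp.frames) > speech_model.log_likelihood(hyp.frames):
            # Tier 2: a hypothesis speech word whose features look more like noise.
            output.append(NOISE_MARKER)
        else:
            output.append(hyp.word)
    return output


speech = Gaussian(mean=0.0, var=1.0)
noise = Gaussian(mean=5.0, var=1.0)
decoded = [
    Hypothesis("hello", [0.1, -0.2, 0.3]),   # genuine speech
    Hypothesis("+knock+", [4.0, 5.0]),       # noise modeled in the lexicon
    Hypothesis("its", [4.8, 5.1, 5.3]),      # unlisted noise decoded as a word
]
result = two_tier_reject(decoded, {"+knock+"}, speech, noise)
print(result)  # ['hello', '<noise>', '<noise>']
```

In this sketch the third hypothesis corresponds to the keyboard-typing case: the noise has no lexicon entry, so the first tier passes it through as a word, and only the second-tier comparison replaces it with the marker.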