1. Technical Field
This invention relates to the field of speech recognition, and more particularly, to determining the accuracy of a speech recognition system.
2. Description of the Related Art
Speech recognition is the process by which an acoustic signal received by a microphone is converted to a set of text words, numbers, or symbols by a computer or a microprocessor-based device. These recognized words may be used in a variety of computer-based software applications for purposes such as document preparation, data entry, and command and control. Improvements to speech recognition systems provide an important way to enhance user productivity.
Speech recognition systems can model and classify acoustic signals to form acoustic models, which are representations of basic linguistic units referred to as phonemes. Upon receiving and digitizing an acoustic speech signal, the speech recognition system can analyze the digitized speech signal, identify a series of acoustic models corresponding to the speech signal, and derive a list of potential word candidates based upon the identified series of acoustic models. The speech recognition system also can determine a measurement reflecting the degree to which the potential word candidates phonetically match the digitized speech signal.
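The matching step described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the recognizer has already reduced the digitized signal to a sequence of phoneme labels, and it uses a normalized edit distance as a stand-in for the phonetic match measurement. The lexicon, the phoneme labels, and the function names are hypothetical and are not taken from any particular speech recognition system.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences of phoneme labels."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            # deletion, insertion, substitution (or match)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def match_score(recognized, pronunciation):
    """Score in [0, 1]; 1.0 means the phoneme sequences are identical."""
    longest = max(len(recognized), len(pronunciation), 1)
    return 1.0 - edit_distance(recognized, pronunciation) / longest


# Hypothetical lexicon: word -> phoneme sequence (illustrative labels only).
LEXICON = {
    "cat": ["k", "ae", "t"],
    "cap": ["k", "ae", "p"],
    "dog": ["d", "ao", "g"],
}


def rank_candidates(recognized, lexicon=LEXICON):
    """Return (word, score) pairs sorted from best to worst phonetic match."""
    scored = [(word, match_score(recognized, phones))
              for word, phones in lexicon.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Calling `rank_candidates(["k", "ae", "t"])` places "cat" first, followed by "cap", which differs by one phoneme. A production recognizer would typically derive such a measurement from statistical acoustic models and not from a symbolic edit distance.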
Speech recognition systems further can analyze the potential word candidates with reference to a contextual model. This analysis can determine a probability that one of the word candidates accurately reflects received speech based upon previously recognized words. The speech recognition system can factor subsequently received words into the probability determination as well. The contextual model, often referred to as a language model, can be developed through an analysis of many hours of human speech. Typically, the development of a language model can be domain specific. For example, a language model can be built reflecting language usage within a telephony context, a legal context, a medical context, or a general user context.
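As a rough illustration of how a contextual model can weigh word candidates against a previously recognized word, the following sketch builds a bigram model with add-one smoothing. The bigram form, the smoothing method, the toy corpus, and all function names are assumptions chosen for brevity; the text above does not specify how the language model is constructed.

```python
from collections import Counter

# Hypothetical toy corpus standing in for transcribed domain speech.
CORPUS = [
    "please call the doctor",
    "call the office",
    "the doctor will call",
]


def train_bigrams(sentences):
    """Count unigrams and bigrams over whitespace-tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = sentence.split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams


def bigram_prob(prev, word, unigrams, bigrams):
    """Estimate P(word | prev) with add-one smoothing over the vocabulary."""
    vocab_size = len(unigrams)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)


def best_candidate(prev, candidates, unigrams, bigrams):
    """Pick the candidate most probable after the previously recognized word."""
    return max(candidates,
               key=lambda word: bigram_prob(prev, word, unigrams, bigrams))
```

With this corpus, `best_candidate("the", ["doctor", "dog"], *train_bigrams(CORPUS))` selects "doctor", because "the doctor" occurs in the training text while "the dog" does not. Training the same code on text from a different domain would shift these probabilities, which is the sense in which a language model can be domain specific.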
The accuracy of speech recognition systems can be dependent on a number of factors. One such factor can be the audio environment in which speech is detected. The audio environment can significantly affect the resulting quality of the speech audio signal. User speech obtained in high-noise environments, for example in automobiles or in public places where a user communicates with a speech recognition system over a public telephone, can include a significant amount of environmental noise. This can lead to poor speech recognition. Further, telephony systems often utilize low-quality audio signals to represent speech. The use of low-quality audio signals within a voice processing system can exacerbate this problem, because a low-quality audio channel may introduce noise that overpowers the user speech.
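One common way to quantify how strongly noise competes with speech is the signal-to-noise ratio (SNR) in decibels. The sketch below is illustrative only: it assumes separate sample sequences for the speech and the noise are available, which is rarely true in practice, and the function names are hypothetical.

```python
import math


def mean_power(samples):
    """Average power of a sequence of audio samples (mean of squares)."""
    return sum(s * s for s in samples) / len(samples)


def snr_db(speech_samples, noise_samples):
    """Signal-to-noise ratio in decibels: 10 * log10(P_speech / P_noise)."""
    return 10.0 * math.log10(mean_power(speech_samples) / mean_power(noise_samples))
```

A negative result indicates that the noise carries more power than the speech, which corresponds to the case described above in which noise overpowers the user speech.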
Another factor that can significantly affect the accuracy of a speech recognition system is the configuration of the speech recognition system itself. System configuration can be particularly relevant for speech recognition systems that operate in diverse audio environments, in audio environments having a significant amount of noise, or both.