Automatic speech recognition (ASR) technology interacts with human users by recognizing spoken commands and responding with some action, such as providing users with information. ASR relies on processor-intensive evaluation of digitized voice signals to recognize human speech. For instance, ASR compares a digitized voice signal against a glossary, also known as a vocabulary, of expected responses and identifies the digitized voice signal as an expected response if a match is found with sufficiently high confidence. To improve the reliability of an ASR system, glossaries of expected responses are typically fine-tuned to accommodate, as far as possible, variations in human voices and background noise for a likely set of commands. ASR technology has steadily improved in reliability and speed as processing capability and processing techniques have advanced, so ASR is growing increasingly popular as a user-friendly interface for businesses.
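As a rough illustration of the final matching step described above, the following Python sketch assumes the recognizer has already scored the digitized voice signal against candidate phrases and produced a confidence value between 0 and 1 for each. The function name `match_response`, the default threshold of 0.7, and the example scores are hypothetical and are not taken from any particular ASR product; the acoustic scoring itself is out of scope here.

```python
from typing import AbstractSet, Mapping, Optional


def match_response(
    scores: Mapping[str, float],
    glossary: AbstractSet[str],
    threshold: float = 0.7,
) -> Optional[str]:
    """Pick the best-scoring expected response, or None if no match is confident enough.

    `scores` maps candidate phrases to confidence values that a real recognizer
    would compute from the digitized voice signal (assumed to be given here).
    """
    # Only phrases in the active glossary count as expected responses.
    candidates = {phrase: score for phrase, score in scores.items() if phrase in glossary}
    if not candidates:
        return None
    # Choose the highest-confidence expected response.
    best = max(candidates, key=candidates.__getitem__)
    # Accept it only if its confidence clears the threshold.
    return best if candidates[best] >= threshold else None


if __name__ == "__main__":
    # Hypothetical scores for one utterance heard in a weather menu.
    scores = {"weather": 0.82, "whether": 0.91, "directions": 0.05}
    glossary = {"weather", "directions", "flight status"}
    print(match_response(scores, glossary))       # weather
    print(match_response(scores, glossary, 0.9))  # None
```

Restricting candidates to the active glossary is what lets the out-of-vocabulary "whether" be ignored despite its higher score, while raising the threshold trades fewer false matches for more rejected requests.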
One application of ASR technology that is gaining wide acceptance is the use of voice recognition to provide services through a telephone network. Voice recognition offers a friendly alternative to touch-tone services based on DTMF signals and also reduces the cost otherwise associated with live operator support of customer inquiries. In particular, voice-recognition-based telephone services have grown increasingly popular for services delivered over wireless or cellular networks to mobile devices such as cell phones, because users are able to access information "hands-free," making cell phone use safer, for example while driving. As the quality of voice recognition applications has improved, an increasing number of services have become available, ranging from driving directions, weather information, and flight information and reservations to stock quotes. For instance, Cingular Wireless offers a variety of services supported by voice recognition through Cingular's VOICE CONNECT service.
When it works, voice recognition technology offers clear advantages for entering requests into a telephone system compared with touch-pad DTMF signaling, and it offers considerable cost advantages over the use of live operators. However, when voice recognition fails or performs unreliably, it causes considerable user frustration. Thus, to improve reliability, voice recognition applications are typically tuned for a given set of expected commands and conditions. For instance, within a given service, separate glossaries of responses are often used, each designed to address a particular set of commands, which increases the likelihood that a voice request will be recognized. Further, glossaries are fine-tuned periodically to adapt to changing conditions and to address reliability problems. This fine-tuning is in addition to changes made to menu items and the addition of new services.
One significant difficulty in updating and improving the reliability of services supported by voice recognition is that changes and updates to voice recognition glossaries made to support menu changes affect the service as a whole, for instance by altering recognition rates where a glossary is applied in different contexts. When voice recognition is deployed in a telephone service, the overall impact of fine-tuning a glossary is difficult to predict across the different contexts in which the glossary is applied, such as in combination with other glossaries, especially when real-world factors such as noise and variations in voices are taken into account.