1. Field of the Invention
The invention is in the field of automatic speech recognition, and more particularly relates to the accuracy and perceived responsiveness of automatic speech recognition systems.
2. Related Art
The ability of a user to control or interoperate with electronic machinery by voice has been a longstanding objective. To this end, automatic speech recognition (ASR) systems, which convert digitized audio samples of the human voice into recognized text, have been developed. Despite the development of such systems, they are still not widely used, as many feel that their accuracy and responsiveness are inadequate.
Speech recognition accuracy is affected by an ASR system's ability to capture clear, complete and noise-free speech from a user. In general, ASR systems will not work well if the audio input is in some way defective or corrupted. Such defects or corruptions include: i) speech that is too soft (and therefore poorly transduced by a microphone), ii) speech that is too loud (and therefore subject to clipping or other non-linear distortions within the audio capture system), iii) a missing start of an utterance (e.g., a user beginning to speak before pressing a push-to-talk button in an ASR system that requires such user interaction to start the audio capture), iv) a missing end of an utterance (e.g., a user continuing to speak after releasing a push-to-talk button), and v) intrusion of either environmental or channel noise.
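By way of illustration only, defects i) and ii) above might be detected with a simple level check on a buffer of captured audio. The following is a minimal sketch, assuming signed 16-bit PCM samples; the function name `check_level` and the thresholds `soft_rms` and `clip_fraction` are hypothetical and are not taken from any particular ASR system.

```python
import math

FULL_SCALE = 32767  # largest positive value of a signed 16-bit PCM sample


def check_level(samples, soft_rms=500.0, clip_fraction=0.01):
    """Classify a buffer of 16-bit PCM samples by signal level.

    Returns "empty", "clipped", "too_soft", or "ok".
    soft_rms and clip_fraction are illustrative thresholds (assumptions).
    """
    if not samples:
        return "empty"
    n = len(samples)
    # Root-mean-square amplitude as a rough measure of loudness.
    rms = math.sqrt(sum(s * s for s in samples) / n)
    # Fraction of samples pinned at (or beyond) positive or negative full scale.
    clipped = sum(1 for s in samples if abs(s) >= FULL_SCALE) / n
    if clipped > clip_fraction:
        return "clipped"
    if rms < soft_rms:
        return "too_soft"
    return "ok"
```

A practical system would likely tune such thresholds to the microphone and capture hardware in use. This sketch does not address truncated utterances or noise intrusion.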
The responsiveness of an ASR system is likewise important because users can be impatient and are unlikely to tolerate a system that they regard as sluggish. The metaphor for interaction with an ASR system is conversational, and users are conditioned by human conversation to expect a response within a few seconds of giving a spoken command. Unfortunately, ASR systems are not always able to respond this quickly, which leads to user dissatisfaction and abandonment of the product, application or service employing the ASR system.
What is needed, therefore, is an improvement in the accuracy and perceived responsiveness of ASR systems.