Speech recognition telephony applications such as voice response systems (VRS) often ask a caller to input alphanumeric information. For example, a VRS used by a company's personnel department might ask callers to enter employee identification numbers. Likewise, retail customers might place orders by entering part numbers. One way for a VRS to accept such entries is to use a speech recognition system. Speech recognition is especially helpful to callers who use limited-function terminals such as cellular telephones, because data entry on such terminals is awkward.
Unfortunately, speech recognition is an imperfect art, and a speech recognition system is often able to provide only a best estimate of a caller's intended alphanumeric entry. This is all the more so when callers use cellular telephones. Cellular telephones are inherently low-fidelity devices, because they use low-bit-rate speech coders to minimize the radio-frequency spectrum needed per call and its associated cost. Further, the call may originate from a location with a high level of background noise, for example a moving automobile or a construction site. Background noise and speech-coder distortion both degrade the performance of the speech recognition system.
When the performance of the speech recognition system degrades to the point that it cannot recognize spoken input with adequate confidence, the VRS may request further information from the caller. For example, the VRS might ask the caller to repeat an entire alphanumeric entry, or to repeat selected characters of the entry. Of course, repetition does not ensure success, and in difficult situations the VRS may go back to the caller numerous times to ask for help. Alternatively, the VRS might present a list of possibilities, from which the caller is instructed to choose his or her intended entry.
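The conventional fallback behavior described above can be illustrated with a minimal sketch. It is not taken from any particular VRS. The confidence scale of 0 to 1, the two thresholds, the rule that an entry is only as reliable as its weakest character, the order in which the fallbacks are tried, and the names `Hypothesis` and `next_prompt` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union

# Hypothetical thresholds; a real system would tune these empirically.
ACCEPT_THRESHOLD = 0.90  # accept the entry without asking the caller anything
CHAR_THRESHOLD = 0.60    # below this, a single character is considered doubtful


@dataclass
class Hypothesis:
    text: str                      # candidate alphanumeric entry
    char_confidences: List[float]  # assumed: one score in [0, 1] per character, non-empty

    @property
    def confidence(self) -> float:
        # Assumed scoring rule: the entry is only as reliable as its weakest character.
        return min(self.char_confidences)


Action = Tuple[str, Union[str, List[int], List[str], None]]


def next_prompt(hypotheses: List[Hypothesis]) -> Action:
    """Choose what the VRS does next, given the recognizer's candidate entries."""
    if not hypotheses:
        return ("repeat_entry", None)

    ranked = sorted(hypotheses, key=lambda h: h.confidence, reverse=True)
    best = ranked[0]

    if best.confidence >= ACCEPT_THRESHOLD:
        return ("accept", best.text)

    # Positions of doubtful characters in the best candidate.
    weak = [i for i, c in enumerate(best.char_confidences) if c < CHAR_THRESHOLD]
    if weak and len(weak) < len(best.char_confidences):
        # Only some characters are doubtful: ask the caller to repeat just those.
        return ("repeat_characters", weak)

    if len(ranked) > 1:
        # Several plausible candidates: let the caller choose from a list.
        return ("choose_from_list", [h.text for h in ranked])

    # Nothing better is available: ask for the whole entry again.
    return ("repeat_entry", None)
```

Every branch other than "accept" costs the caller another round of interaction, which is the burden discussed in the following paragraph.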
All of these measures to increase speech-recognition confidence, however, are typically inconvenient for the caller, especially when the caller uses a cellular telephone that has limited display and input capabilities. Thus, there is a need for a way of minimizing requests to the caller when a speech recognition telephony application such as a voice response system is unable to recognize a spoken alphanumeric input with adequate confidence.