ASR technologies enable microphone-equipped computing devices to interpret speech and thereby provide an alternative to conventional human-to-computer input devices such as keyboards or telephone keypads. For example, many telecommunications devices are equipped with hands-free voice dialing features to initiate a telecommunication session. Such voice dialing features are enabled by ASR technology to detect the presence of discrete speech, such as a command like CALL or nametags like HOME or OFFICE. Moreover, a user may use ASR-enabled voice dialing to initiate a telephone call by speaking a command like DIAL followed by a plurality of digits constituting a complete telephone number.
But with such discrete digit dialing, ASR systems typically repeat every single digit immediately after recognizing the user's utterance of each digit. Although this approach may be reliable in a high-noise environment, it requires a significant amount of time to enter a lengthy string of digits. Moreover, this single-digit verification process annoys users when a digit utterance is incorrectly recognized. This is because users may forget a subsequent digit to be uttered when stopping to say a command like CLEAR, repeating the misrecognized digit, and then listening to a system verification before resuming with the rest of the digits.
To address this inconvenience, some ASR-enabled voice dialing systems allow a user to initiate a call by speaking lengthy telephone numbers in predefined groups of multiple-digit strings of any length, one at a time with pauses therebetween. For example, a user can dial the telephone number 1-313-667-8888 by uttering ONE-THREE-ONE-THREE <pause> <wait for verification and correct if necessary> SIX-SIX-SEVEN <pause> <wait for verification and correct if necessary> EIGHT-EIGHT-EIGHT-EIGHT (or EIGHTY-EIGHT EIGHTY-EIGHT).
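The group-by-group entry described above can be sketched roughly as follows. This is only an illustrative model, not the behavior of any particular ASR system: the names DIGIT_WORDS, words_to_digits, and dial_in_groups are hypothetical, the recognizer output is represented as plain strings, and the use of CLEAR to reject the most recently verified group is an assumption. Compound forms such as EIGHTY-EIGHT are not handled.

```python
from typing import List

# Hypothetical vocabulary mapping spoken digit words to digit characters.
DIGIT_WORDS = {
    "ZERO": "0", "OH": "0", "ONE": "1", "TWO": "2", "THREE": "3",
    "FOUR": "4", "FIVE": "5", "SIX": "6", "SEVEN": "7", "EIGHT": "8",
    "NINE": "9",
}


def words_to_digits(utterance: str) -> str:
    """Convert one recognized group such as 'ONE-THREE-ONE-THREE' to '1313'."""
    return "".join(DIGIT_WORDS[w] for w in utterance.replace("-", " ").split())


def dial_in_groups(utterances: List[str]) -> str:
    """Accumulate a number from groups spoken one at a time with pauses between.

    Each list entry stands for the recognition result of one group. The
    literal entry 'CLEAR' (an assumption) models the user rejecting the
    group that was just read back for verification.
    """
    accepted: List[str] = []
    for utterance in utterances:
        if utterance == "CLEAR":
            if accepted:
                accepted.pop()  # discard only the most recent group
            continue
        accepted.append(words_to_digits(utterance))
    return "".join(accepted)
```

For the example number, the entries ONE-THREE-ONE-THREE, SIX-SIX-SEVEN, and EIGHT-EIGHT-EIGHT-EIGHT yield 13136678888. If the second group were misrecognized as SIX-SIX-SIX, a CLEAR followed by a repeated SIX-SIX-SEVEN would restore the intended number before the third group is spoken.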
However, these variable-length dialing schemes have problems of their own. For example, these voice dialing systems normally require users to utter only one digit string at a time and require the user to correct that one string before uttering any subsequent strings. In other words, such systems do not allow a user to speak a telephone number in a customary, natural manner. For instance, if a user utters multiple digit strings including a first correctly recognized string, then an incorrectly recognized string, and a subsequent correctly recognized string, the user would have to clear the entire number recognized thus far and start all over from the beginning. In fact, such systems do not allow a user to enter multiple strings and then back up and skip over a correctly recognized string to correct an incorrectly recognized string.
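The cost of this limitation can be made concrete with a small sketch. It assumes a simplified model in which any misrecognized string among several already-entered strings forces the whole number to be cleared and spoken again; the function name reentry_cost and its list-of-strings inputs are hypothetical and chosen only for illustration.

```python
from typing import List


def reentry_cost(intended: List[str], recognized: List[str]) -> int:
    """Count digits the user must speak again under a clear-everything scheme.

    Assumes one recognized string per intended string. Any mismatch means
    the entire number is cleared, so every digit is repeated, including
    those in strings that were recognized correctly.
    """
    if len(intended) != len(recognized):
        raise ValueError("expected one recognized string per intended string")
    if intended == recognized:
        return 0  # nothing was misrecognized, so nothing is repeated
    return sum(len(s) for s in intended)
```

With intended strings 1313, 667, 8888 and a middle string misrecognized as 661, the function reports 11 digits to re-enter, even though only the three-digit middle string was wrong.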