Automatic speech recognition (ASR) technologies enable microphone-equipped computing devices to interpret speech and thereby provide an alternative to conventional human-to-computer input devices such as keyboards or keypads. One application of ASR is in telecommunication devices equipped with voice dialing functionality to initiate telecommunication sessions. An ASR system detects the presence of discrete speech, like spoken commands, nametags, and numbers, and is programmed with predefined acceptable vocabulary that the system expects to hear from a user at any given time, known as in-vocabulary speech. For example, during voice dialing, the ASR system may expect to hear command vocabulary (e.g., Call, Dial, Cancel, Help, Repeat, Go Back, and Goodbye), nametag vocabulary (e.g., Home, School, and Office), and digit or number vocabulary (e.g., Zero through Nine, Pound, and Star).
One general problem encountered with voice dialing is that ASR-enabled devices sometimes misrecognize a user's intended input speech, producing rejection, insertion, and/or substitution errors. A rejection error occurs when the ASR system fails to interpret a user's intended input utterance. An insertion error occurs when the ASR system interprets unintentional input, such as background noise or a user cough, as an intended user input utterance. A substitution error occurs when the ASR system mistakes a user's intended input utterance for a different input utterance.
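The three error types can be illustrated with a minimal Python sketch. The function name `classify_outcome`, its arguments, and the two labels for correct behavior are hypothetical and introduced only for illustration; `None` stands for "no intended input" or "nothing recognized".

```python
from typing import Optional


def classify_outcome(intended: Optional[str], recognized: Optional[str]) -> str:
    """Label one recognition attempt using the error taxonomy above.

    intended:   what the user meant to say, or None if the audio was
                unintentional input (background noise, a cough).
    recognized: what the ASR system reported, or None if it reported nothing.
    """
    if intended is None:
        # Unintentional input: reporting anything at all is an insertion error.
        return "insertion" if recognized is not None else "correctly ignored"
    if recognized is None:
        # The user spoke an intended utterance but the system failed to interpret it.
        return "rejection"
    # The user spoke and the system reported something; compare the two.
    return "correct" if recognized == intended else "substitution"


print(classify_outcome("Call Home", None))        # rejection
print(classify_outcome(None, "Dial Nine"))        # insertion
print(classify_outcome("Call Home", "Call Nine")) # substitution
```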
A substitution error is usually due to confusability between similar-sounding words. In a general example, when a user tries to store a nametag that sounds exactly like an already-stored nametag, number, or command, the ASR system will have difficulty processing the nametag because it is not sufficiently unique. In a more specific example, a substitution error sometimes occurs where a nametag is misinterpreted as one or more digits to be dialed, for instance, where a user has defined a nametag to include a number. As a result, the ASR system may process the incorrect word, or may repeatedly ask the user to repeat the nametag. In either case, the user can become frustrated.
Existing solutions to this problem are not optimal. One solution is to prompt the user not to store nametags having numbers in them. This solution is flawed because some nametags sound like numbers even though the nametags do not include numbers. Another solution is to separate voice dialing dialogs into a “Call” <nametag> dialog and a “Dial” <digits> dialog. This solution is imperfect because it reduces the user's flexibility to enter numbers or nametags in a conjoined command dialog (e.g., Call/Dial).
Another solution is nametag confusability detection, which is carried out using both stored nametags and supported ASR commands simultaneously. For example, a user utters a command word like “Store” at a main menu of an ASR system, the system prompts for a telephone number to be stored in association with a nametag, and the user responds by uttering the digits comprising the number to be stored. Then, the system prompts for a nametag, and the user responds by uttering the particular nametag. Thereafter, the system calculates a confusability score for the uttered nametag by comparing the uttered nametag with all previously stored nametags and commands combined, in the same check, proceeding sequentially nametag-by-nametag and command-by-command. If the confusability score is too high, the system prompts the user to use a different nametag. But this solution does not account for nametag confusability with digits, which, for a sixteen-digit string, could include trillions of different numbers and, thus, trillions of resource-intensive computations as the nametag is compared sequentially number-by-number through all numbers of up to sixteen digits. Such confusability checks are resource-prohibitive and, thus, those of ordinary skill in the art are discouraged from carrying out nametag confusability checks with numbers.
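The sequential check described above, and the reason it does not scale to digit strings, can be sketched in Python as follows. This is an illustration under stated assumptions, not the scoring method of any actual ASR system: text similarity from the standard library's `difflib.SequenceMatcher` stands in for an acoustic confusability score, and the names `confusability`, `most_confusable`, `should_reprompt`, `count_digit_strings`, and the threshold value are hypothetical.

```python
from difflib import SequenceMatcher

# Command vocabulary taken from the voice dialing example above.
COMMANDS = ["Call", "Dial", "Cancel", "Help", "Repeat", "Go Back", "Goodbye"]

# Hypothetical cutoff: scores at or above this count as "too high".
THRESHOLD = 0.8


def confusability(a: str, b: str) -> float:
    """Stand-in score in [0, 1]; a real system would compare acoustics, not text."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def most_confusable(candidate, stored_nametags, commands=COMMANDS):
    """Compare the candidate entry-by-entry against stored nametags and commands."""
    best_entry, best_score = None, 0.0
    for entry in list(stored_nametags) + list(commands):
        score = confusability(candidate, entry)
        if score > best_score:
            best_entry, best_score = entry, score
    return best_entry, best_score


def should_reprompt(candidate, stored_nametags) -> bool:
    """True if the user should be asked to choose a different nametag."""
    _, score = most_confusable(candidate, stored_nametags)
    return score >= THRESHOLD


def count_digit_strings(max_len: int = 16, symbols: int = 10) -> int:
    """Number of distinct strings of 1..max_len digits (empty string excluded)."""
    return sum(symbols ** k for k in range(1, max_len + 1))


print(should_reprompt("Home", ["Home", "Office"]))     # True
print(should_reprompt("Grandma", ["Home", "Office"]))  # False
print(count_digit_strings(16))                         # 11111111111111110
```

Checking a nametag against a handful of stored entries takes one comparison per entry, whereas extending the same entry-by-entry loop to every digit string of up to sixteen digits would require on the order of 10^16 comparisons, which is why the exhaustive approach is impractical.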