The present invention generally pertains to voice-activated command systems. More specifically, the present invention pertains to methods for improving the accuracy of voice-dialing applications through processing of homonyms.
Homonyms pose unique challenges to voice-dialing applications, even beyond speech recognition accuracy problems. In many instances, known applications treat two names as collisions only if the spelling of the names is identical. Therefore, even with perfect speech recognition, it is not uncommon for known systems to ask a caller to make a selection from a plurality of terms having identical pronunciations but different spellings. Since the caller cannot “see” spelling differences over the phone, it becomes easy to understand why homonyms are prone to being a source of confusion and incorrect call transfers.
An example will help to further define the nature of challenges posed by homonyms to voice-dialing systems. For the purpose of illustration, it will be assumed that “craig” and “kraig” are pronounced the same. Under these circumstances, in the context of many voice-dialing systems, a caller will be presented with a voice prompt in the nature of “Are you looking for Craig or Kraig?”. Because the caller is essentially blind to the difference in spelling, there is a fifty percent chance that a caller seeking a connection to “kraig” will be connected to “craig”, and vice versa. As the number of homonyms within a system increases, there are corresponding decreases in system connection accuracy and consistency.
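The distinction between spelling-based and pronunciation-based collisions can be pictured with a minimal sketch. The pronunciation table, the function names, and the sample directory below are hypothetical and serve only as illustration; they are not taken from any known system. Under these assumptions, grouping by spelling reports no collision between “Craig” and “Kraig”, while grouping by pronunciation places them in the same set.

```python
from collections import defaultdict

# Hypothetical pronunciation lexicon, assumed for illustration only.
PRONUNCIATIONS = {
    "craig": "K R EY G",
    "kraig": "K R EY G",
    "mike": "M AY K",
}


def group_by_spelling(names):
    """Group names that share an identical (case-insensitive) spelling."""
    groups = defaultdict(list)
    for name in names:
        groups[name.lower()].append(name)
    return dict(groups)


def group_by_pronunciation(names):
    """Group names that share a pronunciation in the assumed lexicon.

    Names missing from the lexicon fall back to their own spelling as key.
    """
    groups = defaultdict(list)
    for name in names:
        key = PRONUNCIATIONS.get(name.lower(), name.lower())
        groups[key].append(name)
    return dict(groups)


directory = ["Craig", "Kraig", "Mike"]

spelling_groups = group_by_spelling(directory)
pronunciation_groups = group_by_pronunciation(directory)

# Sets of differently spelled names that a caller cannot tell apart by ear.
homonym_sets = [g for g in pronunciation_groups.values() if len(g) > 1]
```

In this sketch, a system relying on `group_by_spelling` alone would treat all three entries as distinct, which mirrors the behavior described for known applications.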
Some voice-dialing solutions are configured to empower a caller to distinguish between names having a common pronunciation utilizing an identifier other than spelling. For example, a caller might ask for “Mike Andersen”. The system might include one listing for “Mike Andersen” and two listings for “Mike Anderson”. Presented with this homonym scenario, known systems generally are not equipped to accurately determine which listing the caller desires. Some systems are configured to present additional identifying information in order to empower the caller to make an informed selection decision. For example, the system might pose a selection inquiry to the caller such as “Are you looking for Mike Anderson in building 6, Mike Anderson in building 7, or Mike Anderson in building 12?”. Despite being ignorant of any differences in the spelling of Anderson, the caller can make a selection based on an alternate criterion (i.e., building location). In many cases, the caller will be more familiar with spelling differences than with a given set of additional identifying information.