The present invention generally pertains to voice-activated command systems. More specifically, the present invention pertains to methods and apparatus for improving name confirmation in voice-dialing systems.
Voice-dialing systems typically use an introductory message recorded by a voice talent (a person hired for his or her voice) to greet the caller and to ask whom the caller would like to contact. The caller then speaks the name of the person he or she wishes to contact, and the voice-dialing system uses a speech recognition technique to recognize the name of this intended recipient of the call. Typically, the voice-dialing system confirms the recognized name with the caller prior to connecting the call to the phone or voice mail associated with the recognized name.
Names with similar pronunciations, such as homonyms or even identically spelled names, present unique challenges to voice-dialing applications. These “name collisions” are problematic in voice-dialing, not only in speech recognition but also in name confirmation. In fact, some research has shown that name collision is one of the most confusing (for users) and error-prone (for users and for voice-dialing systems) areas in the name confirmation process.
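By way of illustration only, the notion of a name collision can be sketched as grouping directory entries that share a pronunciation. The sketch below assumes a hypothetical directory in which each entry already carries a phoneme string; the entry identifiers, names, phoneme strings, and the function `find_name_collisions` are illustrative assumptions and are not taken from any particular voice-dialing system.

```python
from collections import defaultdict


def find_name_collisions(directory):
    """Group directory entries that share the same pronunciation.

    `directory` maps a hypothetical entry id to a (spelling, pronunciation)
    pair. Returns only the pronunciations shared by two or more entries.
    """
    groups = defaultdict(list)
    for entry_id, (spelling, pronunciation) in directory.items():
        groups[pronunciation].append((entry_id, spelling))
    # A collision exists when more than one entry has the same pronunciation.
    return {p: entries for p, entries in groups.items() if len(entries) > 1}


# Illustrative data only: two identically spelled names, one homonym,
# and one unrelated name.
directory = {
    101: ("John Smith", "JH AA N S M IH TH"),
    102: ("Jon Smith", "JH AA N S M IH TH"),
    103: ("John Smith", "JH AA N S M IH TH"),
    104: ("Mary Jones", "M EH R IY JH OW N Z"),
}

collisions = find_name_collisions(directory)
for pronunciation, entries in collisions.items():
    print(pronunciation, entries)
```

In this example the three "Smith" entries collide: a recognizer cannot tell them apart from the spoken name alone, and a spoken confirmation of the name does not tell the caller which entry was selected.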
Many standard voice dialers rely on plain text-to-speech (TTS) synthesis to pronounce the recognized names during the process of confirming the name with the caller. Due to the lower sound quality and frequently mismatched pronunciations of TTS, name confirmation becomes a new performance bottleneck of such speech applications. Recently, some voice dialers have begun to use voice talents to record all the names used in the application to improve the quality of the prompts. Thus, recordings from the voice talent are used both to greet and prompt the caller, and to pronounce the recognized name during the name confirmation process. This approach adds a huge burden to the maintenance effort, since names are frequently added to, or deleted from, voice-dialing systems, and there are increased costs associated with this additional burden. Moreover, even with this added burden, the approach still cannot eliminate mismatched pronunciations.
The present invention provides solutions to one or more of the above-described problems and/or provides other advantages over the prior art.