1.1 Field of the Invention
The present invention relates to systems and methods for both voice messaging and speech recognition. More particularly, the present invention is a voice messaging system and method responsive to speech commands issued by a voice messaging subscriber.
1.2 Description of the Background Art
Voice messaging systems have become well-known in recent years. A typical Voice Messaging System (VMS) interacts with a subscriber through a Dual-Tone Multi-Frequency (DTMF), or touchtone, voice messaging User Interface (UI). During subscriber interactions, the VMS issues a voice prompt requesting the subscriber to press one or more DTMF keys to initiate corresponding operations. In the event that the subscriber presses a valid DTMF key sequence, the VMS performs a particular set of operations.
Under certain circumstances, it may be inconvenient or even dangerous for a subscriber to focus their attention on a keypad. For example, in a wireless telephone environment where a subscriber is driving or walking while on the telephone, requiring the subscriber to select an option from a set of DTMF keys could result in an accident or difficult situation. As a result, systems and methods have been developed for using speech as a means for providing hands-free interaction with a VMS, through speech-based selection of commands, user interface navigation, and entry of digits and/or digit strings.
Those skilled in the art will recognize that a conventional DTMF voice messaging UI usually has a fairly complex or extensive hierarchy of menus. Some systems that provide speech-based VMS interaction simply implement a speech UI having a menu hierarchy that is identical or essentially identical to that of a conventional DTMF UI. When a subscriber must concurrently perform multiple tasks, such as driving and VMS interaction, reducing the complexity of lower-priority tasks is very important. Thus, systems that implement a speech UI in this manner are undesirable because they fail to reduce VMS interaction complexity.
Those skilled in the art will recognize that speech recognition is an inexact technology. In contrast to DTMF signals, speech is uncontrolled and highly variable. The difficulty of recognizing speech in telephone environments is increased because telephone environments are characterized by narrow bandwidth, multiple stages of signal processing or transformation, and considerable noise levels. Wireless telephone environments in particular tend to be noisy due to high levels of background sound arising from, for example, a car engine, nearby traffic, or voices within a crowd.
To facilitate the successful determination of a subscriber's intentions, speech-based voice messaging systems must provide a high level of error prevention and tolerance, and significantly reduce the likelihood of initiating an unintended operation. Speech-based voice messaging systems should also provide a way for subscribers to successfully complete a set of desired voice messaging tasks in the event that repeated speech recognition failures are likely. Prior art speech-based voice messaging systems are inadequate in each of these respects.
The difficulties associated with successfully recognizing subscribers' speech and determining their intentions necessitate a high level of support and maintenance to achieve optimal system performance. The availability of particular speech recognition data and system performance measures can be very useful in this regard, especially for system testing and problem analysis. Prior art systems and methods fail to provide an adequate means for flexibly controlling when and how speech recognition data and system performance measures are stored and/or generated. Moreover, prior art systems and methods fail to collect maximally useful speech recognition data, namely, the speech data generated during actual in-field system use. What is needed is a speech-responsive voice messaging system and method that overcomes the shortcomings in the prior art.
The present invention is a system and method for speech-responsive voice messaging, in which a Speech-Responsive VMS (SRVMS) preferably provides a hierarchically-simple speech UI that enables subscribers to specify mailboxes, passwords, digits, and/or digit strings. In the SRVMS, a recognition command generator and a speech and logging supervisor control the operation of a speech recognizer. A recognition results processor evaluates the quality of candidate results generated by the speech recognizer according to a set of quality thresholds that may differ on a word-by-word basis. In the preferred embodiment, the recognition results processor determines whether individual candidate results are good, questionable, or bad; and whether two or more candidate results are ambiguous due to a significant likelihood that each such result could be a valid command. The recognition results processor additionally identifies a best candidate result.
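By way of illustration only, the quality evaluation described above might be sketched as follows. The numeric thresholds, the per-word override table, the score scale of 0 to 1, and the margin-based ambiguity test are all assumptions introduced for this sketch; the text above states only that thresholds may differ on a word-by-word basis and that results are classified as good, questionable, or bad, and possibly ambiguous.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Thresholds:
    good: float          # scores at or above this value are "good"
    questionable: float  # scores at or above this value, but below good, are "questionable"


# Hypothetical values: a default threshold pair plus a stricter pair for a risky word.
DEFAULT_THRESHOLDS = Thresholds(good=0.80, questionable=0.50)
PER_WORD_THRESHOLDS: Dict[str, Thresholds] = {
    "delete": Thresholds(good=0.90, questionable=0.60),
}
# Assumed ambiguity rule: two acceptable candidates whose scores differ by less than this.
AMBIGUITY_MARGIN = 0.10


def classify(word: str, score: float) -> str:
    """Classify one candidate result using the thresholds for its word."""
    t = PER_WORD_THRESHOLDS.get(word, DEFAULT_THRESHOLDS)
    if score >= t.good:
        return "good"
    if score >= t.questionable:
        return "questionable"
    return "bad"


def evaluate(candidates: List[Tuple[str, float]]) -> Tuple[Optional[str], str, bool]:
    """Return (best candidate word, its quality, whether the top results are ambiguous)."""
    if not candidates:
        return None, "bad", False
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    best_word, best_score = ranked[0]
    quality = classify(best_word, best_score)
    ambiguous = False
    if quality != "bad" and len(ranked) > 1:
        runner_word, runner_score = ranked[1]
        ambiguous = (
            classify(runner_word, runner_score) != "bad"
            and best_score - runner_score < AMBIGUITY_MARGIN
        )
    return best_word, quality, ambiguous
```

In this sketch, a score of 0.85 for "delete" is only questionable because that word carries a stricter threshold, whereas the same score for "play" is good.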
Based upon the outcome of a quality evaluation, an interpreter facilitates navigation through speech UI menus or invocation of voice messaging functions, in conjunction with a speech UI structure, a voice messaging function library, and the recognition command generator. If the recognition results processor has determined that candidate results are questionable or ambiguous, the interpreter, in conjunction with an ambiguity resolution UI structure and the recognition command generator, initiates confirmation operations in which the subscriber is prompted to confirm whether the best candidate result is what the subscriber intended.
In response to repeated speech recognition failures, the interpreter initiates a transfer to a DTMF UI, in conjunction with a DTMF UI structure and the voice messaging function library. Transfer to the DTMF UI is also performed in response to detection of predetermined DTMF signals issued by the subscriber while the speech UI is in context. The present invention therefore provides for both automatic and subscriber-selected transfer to a reliable backup UI.
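The interpreter behavior described in the two preceding paragraphs could be sketched, under stated assumptions, as a small state holder. The class name, the action strings, the failure limit of three, and the choice to reset the failure count after a successful command are hypothetical; the text above specifies only that questionable or ambiguous results trigger confirmation and that repeated failures or predetermined DTMF signals trigger transfer to the DTMF UI.

```python
class Interpreter:
    """Illustrative sketch: chooses the next action from a quality evaluation."""

    def __init__(self, max_failures: int = 3):  # limit of 3 is an assumed value
        self.max_failures = max_failures
        self.failures = 0
        self.ui = "speech"

    def handle(self, quality: str, ambiguous: bool = False,
               transfer_key_pressed: bool = False) -> str:
        if self.ui == "dtmf":
            # Already in the backup UI; speech results are no longer interpreted.
            return "dtmf"
        if transfer_key_pressed:
            # Subscriber-selected transfer via a predetermined DTMF signal.
            self.ui = "dtmf"
            return "transfer_to_dtmf"
        if quality == "bad":
            self.failures += 1
            if self.failures >= self.max_failures:
                # Automatic transfer after repeated recognition failures.
                self.ui = "dtmf"
                return "transfer_to_dtmf"
            return "reprompt"
        if quality == "questionable" or ambiguous:
            # Ask the subscriber to confirm the best candidate result.
            return "confirm"
        self.failures = 0  # assumption: a successful command clears the count
        return "execute"
```

A caller would pass in the outcome of the quality evaluation for each utterance and act on the returned string, for example by playing a confirmation prompt when "confirm" is returned.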
If a best candidate result corresponds to a voice messaging function, the interpreter directs the mapping of the best candidate result to a digit sequence, and subsequently transfers control to a voice messaging function to which the digit sequence corresponds. Because the present invention provides both a speech and a DTMF UI, the mapping of candidate results allows the speech UI to seamlessly overlay portions of a standard DTMF UI, and utilize functions originally written for the DTMF UI. The present invention also relies upon this mapping to facilitate simultaneous availability of portions of the speech UI and DTMF UI while remaining within the context of the speech UI. Thus, while at particular positions or locations within the speech UI, the present invention can successfully process either speech or DTMF signals as valid input for speech UI navigation.
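The mapping described above may be illustrated by the following minimal sketch, in which a recognized word is translated to a digit sequence and the digit sequence selects a function from a shared library. The particular words, digit assignments, and function names are hypothetical and are not taken from any actual DTMF UI.

```python
from typing import Callable, Dict


def play_message() -> str:
    return "playing message"


def delete_message() -> str:
    return "message deleted"


# Voice messaging function library keyed by DTMF digit sequence (hypothetical keys).
FUNCTION_LIBRARY: Dict[str, Callable[[], str]] = {
    "1": play_message,
    "7": delete_message,
}

# Mapping from a best candidate result to the digit sequence it overlays.
SPEECH_TO_DIGITS: Dict[str, str] = {
    "play": "1",
    "delete": "7",
}


def dispatch(user_input: str) -> str:
    """Accept either a recognized word or a DTMF digit sequence as valid input."""
    if user_input in FUNCTION_LIBRARY:
        digits = user_input                      # input was already a digit sequence
    else:
        digits = SPEECH_TO_DIGITS.get(user_input)
    if digits is None or digits not in FUNCTION_LIBRARY:
        raise KeyError(f"no voice messaging function for {user_input!r}")
    return FUNCTION_LIBRARY[digits]()
```

Because both input paths converge on the same digit-keyed library, functions written for the DTMF UI need no modification, and saying "play" or pressing "1" reaches the same function.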
The SRVMS thus provides a high level of error tolerance and error prevention to successfully determine a subscriber's intentions, and further provides access to a DTMF UI in parallel with portions of the speech UI or as a backup in situations where repeated speech recognition failure is likely.
A logging unit and a reporting unit operate in parallel with the speech UI, in a manner that is transparent to subscribers. The logging unit directs the selective logging of subscriber utterances, and the reporting unit selectively generates and maintains system performance statistics on multiple detail levels.
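As a rough sketch of the reporting behavior described above, statistics could be accumulated at two detail levels, system-wide and per menu. The class name, the two specific levels, and the success-rate measure are assumptions made for illustration; the text above states only that statistics are maintained on multiple detail levels.

```python
from collections import Counter
from typing import Dict, Optional


class Reporter:
    """Illustrative sketch: recognition statistics at system and per-menu levels."""

    def __init__(self) -> None:
        self.system: Counter = Counter()
        self.by_menu: Dict[str, Counter] = {}

    def record(self, menu: str, quality: str) -> None:
        """Count one recognition outcome at both detail levels."""
        self.system[quality] += 1
        self.by_menu.setdefault(menu, Counter())[quality] += 1

    def success_rate(self, menu: Optional[str] = None) -> float:
        """Fraction of 'good' results, system-wide or for a single menu."""
        counts = self.system if menu is None else self.by_menu.get(menu, Counter())
        total = sum(counts.values())
        return counts["good"] / total if total else 0.0
```

Recording happens alongside normal call handling, so the subscriber experiences no change in the speech UI.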
The present invention flexibly controls speech recognition, candidate result quality evaluation, utterance logging, and performance reporting through a plurality of parameters stored within a Speech Parameter Block (SPAB). Each SPAB preferably corresponds to a particular speech UI menu.
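One possible representation of a SPAB, offered only as a sketch, is a per-menu record of parameters. Every field name and value below is hypothetical; the text above establishes only that a SPAB holds parameters controlling recognition, quality evaluation, utterance logging, and performance reporting, and that each SPAB preferably corresponds to one speech UI menu.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Spab:
    """Illustrative Speech Parameter Block; all fields are assumed for this sketch."""
    menu: str                            # speech UI menu this block belongs to
    vocabulary: Tuple[str, ...]          # words the recognizer should listen for
    good_threshold: float = 0.80         # quality evaluation parameters
    questionable_threshold: float = 0.50
    max_failures: int = 3                # failures tolerated before DTMF transfer
    log_utterances: bool = False         # utterance logging control
    report_detail: int = 0               # performance reporting detail level


SPABS: Dict[str, Spab] = {
    s.menu: s
    for s in (
        Spab(menu="main", vocabulary=("play", "send", "exit")),
        Spab(
            menu="password",
            vocabulary=tuple(str(d) for d in range(10)),
            good_threshold=0.90,         # stricter acceptance for password digits
            log_utterances=True,
            report_detail=2,
        ),
    )
}


def spab_for(menu: str) -> Spab:
    """Look up the parameter block that governs the given menu."""
    return SPABS[menu]
```

Keeping the parameters in one block per menu means that recognition strictness, logging, and reporting can be tuned for a single menu without affecting any other.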