Voice messaging systems (VMSs) have become well known in recent years. Such VMSs have been developed to implement various communications-related applications, among other things. In a typical application when a caller reaches a conventional VMS, a series of multilevel menus and prompts are often played to the caller. The menus and prompts invite the caller's responsive entry of a sequence of Dual-Tone Multi-Frequency (DTMF) tones, or touchtones, to navigate the various menu levels. The DTMF tones are generated by pressing buttons on the caller's telephone keypad. The conventional VMS is designed to receive and process the DTMF tones provided by the caller to implement desired voice messaging features. However, under certain circumstances, it may be inconvenient or even dangerous for a caller to focus their attention on a keypad. For example, in a wireless telephone environment where a caller is driving or walking while on the telephone, requiring the caller to select an option from a set of DTMF keys could result in an accident or difficult situation.
To address this problem, current VMSs provide for hands-free interaction with callers by utilizing speech recognition platforms, also referred to as voice response units, which interpret speech from the callers and provide the appropriate DTMF tones to the VMS. More specifically, as depicted in the prior art architecture shown in FIG. 1, a conventional speech recognition platform 20 receives and recognizes a caller's voice commands, which the caller could alternatively have entered by providing an appropriate sequence of DTMF tones. Upon receipt of a voice command, the speech recognition platform 20 generates the sequence of DTMF tones that corresponds to the voice command. This sequence is then provided to a VMS 24, as if the caller himself had provided the DTMF tones. In this way, the conventional speech recognition platform 20 simply imitates a caller's DTMF keypresses. The VMS 24 has no knowledge of the function that the speech recognition platform 20 performed. Rather, the VMS 24 simply detects the DTMF tones and reacts as if the caller were pressing keys.
As an example, assume that a subscriber to the VMS 24 dials into his account in the VMS 24 wanting to change the outgoing greeting played to persons trying to reach him. To do so without the use of the speech recognition platform 20, the subscriber must navigate a multilevel menu structure by providing DTMF tones at the appropriate times. In response to a host of menu options, and depending on the particular design of the menu structure, the subscriber would, for example, first press “2” on the telephone keypad to access a “greetings and names” menu. Second, the subscriber would, for example, press “2” on the telephone keypad to select greeting options, instead of name options. Third, the subscriber would, for example, press “3” to indicate an intention to re-record the greeting.
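The keypad navigation described above can be pictured as a walk down a menu tree, one key per level. The following is a minimal sketch for illustration only, not a description of any actual VMS: the `MENU` structure, the prompt wording, and the `navigate` function are hypothetical, and only the “2,” “2,” “3” path is taken from the example above.

```python
# Hypothetical menu tree. Only the 2 -> 2 -> 3 path follows the example
# in the text; the layout and prompt wording are assumptions.
MENU = {
    "prompt": "Main menu",
    "options": {
        "2": {
            "prompt": "Greetings and names",
            "options": {
                "2": {
                    "prompt": "Greeting options",
                    "options": {
                        "3": {"prompt": "Re-record greeting", "options": {}},
                    },
                },
            },
        },
    },
}


def navigate(menu, keys):
    """Follow one DTMF key per menu level and return the prompts reached."""
    prompts = [menu["prompt"]]
    node = menu
    for key in keys:
        if key not in node["options"]:
            raise ValueError(f"key {key!r} is not valid at menu {node['prompt']!r}")
        node = node["options"][key]
        prompts.append(node["prompt"])
    return prompts


path = navigate(MENU, ["2", "2", "3"])
```

Each keypress moves exactly one level deeper, which is why the subscriber must wait for, or at least know, the options at every level before reaching the desired feature.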
However, where the speech recognition platform 20 is utilized in front of the VMS 24, the architecture provides for the use of voice commands by a caller. In such a case, the speech recognition platform 20 would first recognize and process the subscriber's voice command to change the greeting. Following the example above, the speech recognition platform 20 would then provide to the VMS 24 the sequence of DTMF tones that corresponds to the depression of the “2,” “2,” and “3” keys. The DTMF tones would be provided in rapid succession. As a result, a menu prompted by a particular DTMF tone, which would otherwise be played in its entirety to the subscriber, would be cut short by the provision of the next DTMF tone. In this regard, a series of aborted audio prompts would be played to the subscriber, presenting a nonintegrated “look and feel” to the subscriber.
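The translation step and the resulting truncated prompts can be illustrated with a small simulation. This is a hedged sketch under stated assumptions: the command table, the prompt durations, the 0.2-second gap between tones, and the names `translate` and `simulate` are all hypothetical. Only the mapping of the change-greeting command to the “2,” “2,” “3” keys comes from the example above.

```python
# Hypothetical command table. Only "change greeting" -> 2, 2, 3 follows
# the text; the phrase itself is an assumed wording of the command.
COMMAND_TO_DTMF = {"change greeting": ["2", "2", "3"]}

# Assumed full length, in seconds, of the prompt played at each menu reached.
PROMPT_SECONDS = {("2",): 6.0, ("2", "2"): 5.0, ("2", "2", "3"): 4.0}


def translate(command):
    """Map a recognized voice command to the DTMF keys that imitate keypresses."""
    return COMMAND_TO_DTMF[command.strip().lower()]


def simulate(keys, gap_seconds):
    """Return (menu path, seconds of prompt played) for each menu reached.

    Each prompt starts when its tone is detected and is cut off by the next
    tone, which arrives gap_seconds later. The final prompt plays in full.
    """
    played = []
    for i in range(1, len(keys) + 1):
        path = tuple(keys[:i])
        full = PROMPT_SECONDS[path]
        is_last = i == len(keys)
        played.append((path, full if is_last else min(full, gap_seconds)))
    return played


tones = translate("Change greeting")
heard = simulate(tones, gap_seconds=0.2)
```

Under these assumed timings, the two intermediate prompts are each cut off after a fraction of a second and only the last prompt is heard in full, which corresponds to the aborted audio described above.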
In other cases, some VMSs that provide speech-based interaction simply implement a speech user interface having a menu hierarchy that is identical or essentially identical to that of a conventional DTMF user interface. Systems that implement a speech user interface in this manner are undesirable because they fail to reduce the complexity of interacting with the voice messaging system.
Therefore, in light of the above problems, there is a need for a new system architecture that reduces voice messaging system interaction complexity; presents an integrated “look and feel” appearance to a caller or subscriber; dispenses with the need to generate DTMF tones in response to voice commands; and does not use existing DTMF keypad-based platforms for voice messaging.