1. Field of the Invention
This invention relates to a method for network-based speech recognition of subscriber (or “user”) voice-commands for invoking call information and management features and text-to-speech translation of call information and call management features.
2. Description of the Related Art
Real-time systems with telephony interfaces, including telephony and computer systems, offer a large variety of useful network-based features, such as Caller-ID, conferencing (call merge), call forwarding, call hold and messaging. However, these features can generally be accessed only with some difficulty in a real-time interactive environment. Often, users cannot effectively access certain features, at least in part because such access requires knowledge of subject-specific details with which the user may be unfamiliar. Although the user can learn some subset of the feature set and use those features effectively with cues and practice, if the user does not need to use a particular system for some time, it is likely that his or her ability to use the system and understand the features will diminish. Users may also be unable to access certain features because the access device has a limited set of features, such as a small display on a cell phone handset.
While in operation, a system can be in one of many different “states” at which services or features are available. An example of such a system state is a state in which a Call Waiting call arrives and a caller-ID is to be displayed. The system transitions from a “Call in Progress” state to a “Caller ID on Call Waiting” state, at which point the subscriber has several options. Another example is when a subscriber calls someone and the called line rings busy. The system enters a state of “Busy” for the caller and an option is available to have the network feature continually re-try (redial) the called party until there is a “Ringing” system state. When the called party picks up, another system state is entered. If the called party does not answer after a predefined number of rings, then the system state changes to a “Ring-No-Answer” state and other features are available to the caller at this latter state, such as “Leave a Message”, “Continue Trying the Number for 24 hours”, etc.
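By way of illustration only, the system states described above may be modeled as a small finite-state machine in which each state exposes the features available at that point. The following Python sketch is hypothetical and is not taken from any actual network implementation: the event names, the "Dialing" starting state, the reuse of "Call in Progress" as the state entered when the called party picks up, and the function names are all assumptions introduced for the example.

```python
from enum import Enum, auto


class CallState(Enum):
    DIALING = auto()                    # assumed starting state, not named in the text
    BUSY = auto()
    RINGING = auto()
    RING_NO_ANSWER = auto()
    CALL_IN_PROGRESS = auto()
    CALLER_ID_ON_CALL_WAITING = auto()


# (current state, event) -> next state. Event names are hypothetical.
TRANSITIONS = {
    (CallState.DIALING, "line_busy"): CallState.BUSY,
    (CallState.DIALING, "line_free"): CallState.RINGING,
    (CallState.BUSY, "redial_line_free"): CallState.RINGING,
    (CallState.RINGING, "answered"): CallState.CALL_IN_PROGRESS,
    (CallState.RINGING, "max_rings_reached"): CallState.RING_NO_ANSWER,
    (CallState.CALL_IN_PROGRESS, "call_waiting_arrives"):
        CallState.CALLER_ID_ON_CALL_WAITING,
}

# Features offered in each state, using the examples given in the text.
FEATURES = {
    CallState.BUSY: ["Continually re-try (redial)"],
    CallState.RING_NO_ANSWER: [
        "Leave a Message",
        "Continue Trying the Number for 24 hours",
    ],
}


def next_state(state: CallState, event: str) -> CallState:
    """Return the state reached on `event`; stay put if the event does not apply."""
    return TRANSITIONS.get((state, event), state)


def available_features(state: CallState) -> list[str]:
    """Return the features offered in `state` (empty if none are modeled)."""
    return FEATURES.get(state, [])


if __name__ == "__main__":
    state = CallState.DIALING
    for event in ("line_busy", "redial_line_free", "max_rings_reached"):
        state = next_state(state, event)
        print(state.name, available_features(state))
```

In this sketch, an event that does not apply in the current state leaves the state unchanged, which is one possible design choice; a real system might instead reject or log such events.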
A call flow is a pathway of steps that a call follows from the time that the call is initiated until termination of the call. Each step in the call flow may also be considered a different system state. The call flow may be controlled by the user to the extent that the user determines whether to initiate some calls, stay on the line, select features, answer a call, or subscribe to messaging services. Other types of system states include states wherein the caller communicates with the system or causes the system to communicate with another system, such as another network.
To remind users of features available at a particular point in a call flow or some other system state, specialized equipment is often used to display which features are available in the current state of a call or communication transaction. Computer and telephony systems, for example, require that users learn to interface with the systems using specialized devices, such as keypads, keyboards, mice, and trackballs, and special or reserved procedures which may appear in the form of an interaction on a computer screen or in a voice response system. Another limitation on feature accessibility is that the telephone keypad, keyboard, and mouse do not provide wide bandwidth for input to a system. In a real-time transaction environment, this constraint reduces the number of sophisticated features that may be made available in a telephony session or transaction dialog.
Some feature sets attempt to offer simplified interfaces by utilizing visual cues and mnemonic devices. An enhanced version of the Caller-ID feature, Caller-ID on Call Waiting, represents one attempt to provide a simplified interface with visual cues. Ordinary Caller-ID is provided using specialized equipment, such as an adjunct display device or a telephone with an integral display and special protocols. Currently available Caller-ID class 2 services, such as Caller-ID on Call Waiting, however, require more specialized equipment, such as an Analog Display Service Interface (ADSI) screen phone. There is an automated communication sequence between the service provider switch and the premise equipment that allows a user who receives Caller-ID information or originating station information to utilize that information to make decisions as to how to handle (“manage”) the incoming call. For example, using one feature call flow, when a person is already on the phone and another call comes in, the person already on the phone will know who is calling from the displayed Caller-ID information and can decide from a displayed menu whether to play a message and put the person on hold, conference the call with the current call, drop the current call and take the new call, send the call to voice mail, forward the call, or take other actions. But if one has only an ordinary non-ADSI phone, these actions must currently be entered using Star Features, such as *82, which are difficult to remember.
The specialized ADSI device displays in text form a full list of options which can be used to respond to the Caller-ID information. The subscriber can then select a desired option using the keypad which generates a DTMF (dual tone multi-frequency) signal understood by the service provider switch, or using soft keys on the ADSI screen phone which correspond to functional options displayed to the called party. Caller-ID information is displayed on a screen in either case.
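As an illustrative aside, each keypad press is signaled as a pair of simultaneous tones, one from a low-frequency (row) group and one from a high-frequency (column) group. The following Python sketch uses the standard DTMF keypad frequencies; the function names `dtmf_pair` and `dtmf_sequence` are hypothetical and introduced only for this example, and the sketch does not represent how any particular switch decodes the tones.

```python
# Standard DTMF frequency groups, in Hz.
ROW_HZ = (697, 770, 852, 941)        # low group, one per keypad row
COL_HZ = (1209, 1336, 1477, 1633)    # high group, one per keypad column

# 4x4 keypad layout; the A-D column is absent on most consumer telephones.
KEYPAD = ("123A", "456B", "789C", "*0#D")


def dtmf_pair(key: str) -> tuple[int, int]:
    """Return the (row, column) tone frequencies in Hz for a single key."""
    if len(key) == 1:
        for row_index, row in enumerate(KEYPAD):
            col_index = row.find(key)
            if col_index != -1:
                return ROW_HZ[row_index], COL_HZ[col_index]
    raise ValueError(f"not a DTMF key: {key!r}")


def dtmf_sequence(keys: str) -> list[tuple[int, int]]:
    """Return the tone pairs for a dialed string such as a Star Feature code."""
    return [dtmf_pair(key) for key in keys]


if __name__ == "__main__":
    # The Star Feature code mentioned in the text.
    print(dtmf_sequence("*82"))
```

For the code *82, the sketch yields the pairs (941, 1209), (852, 1336) and (697, 1336), which shows how little information each key press conveys to the network.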
The specialized ADSI equipment is expensive and its functionality is only available at the location of that phone. When a subscriber uses a different phone, he or she cannot access these features. Even in one household, only those extensions with the specialized phones will be able to use the enhanced feature set. Moreover, subscribers who are visually impaired may not be able to use the display devices at all.
There accordingly exists a need for network-based speech recognition. It would also be particularly helpful to combine the network-based speech recognition with a network-based text-to-speech translator of call state or progress information and available call management features. This would enable network service providers to offer a wide variety of features to mobile phone/web users by “translating” features available on a network to an audio format recognizable to the device upon which the audio is to be played, such as a sound or wave file, to which a user could respond with a voice command upon which speech recognition is performed. (The device-specific audio capabilities may be referred to as the device's audio form factor.)