The present invention relates generally to digital telephones and telephone systems, such as private branch exchange (PBX) systems. More particularly, the invention relates to a multimodal telephone that provides both voice and touchpad control through an integrated system employing speech recognition and speech generation together with an optical display such as an LCD panel. The user communicates with the telephone to perform voice dialing and other system control functions by interacting with an integrated dialog manager that ensures the voice mode and visual/touchpad mode remain synchronized.
The telephone has evolved quite considerably since Alexander Graham Bell. Today, complex telephone stations connect to sophisticated switching systems to perform a wide range of different telecommunication functions. Indeed, the modern-day telephone device has become so sophisticated that the casual user needs an instruction manual to be able to operate it. The typical modern-day telephone device features a panoply of different function buttons, including a button to place a conference call, a button to place a party on hold, a button to flash the receiver, a button to select different outside lines or extensions and buttons that can be programmed to automatically dial different frequently called numbers. Clearly, there is a practical limit to the number of buttons that may be included on the telephone device, and that limit is rapidly being approached.
It has been suggested that voice operated telephones may provide the answer. With a sufficiently robust speech recognizer, the telephone could, in theory, be controlled entirely by voice. It is doubtful, however, that such a device could be achieved using today's technology, and simply incorporating speech recognition into the telephone would not result in a device that is easy to use.
Anyone who has been caught in the endless loop of a voice mail system will understand why voice control of the telephone is a significant challenge. It is difficult to offer the telephone user a wide assortment of control functions and operations when those options are prompted by speech synthesis and must be responded to by voice. The user typically has difficulty remembering all of the different choices that are possible and difficulty remembering the precise commands that invoke those operations. Also, speech recognizers will occasionally misinterpret a user's command, resulting in the need to abort the command or enter it again. If the user's speech differs significantly from the model on which the recognizer has been trained, the recognizer may also fail to recognize the abort command. When this happens, the system may execute an unwanted command, causing user frustration and inconvenience.
The problem is compounded when voice dialing is desired, because voice dialing significantly increases the size of the dictionary of words that must be recognized. Essentially, every new name that is added to the phone directory becomes another word that must be properly interpreted by the recognizer.
The present invention solves the problem with a new approach that integrates voice prompts, visual prompts, spoken commands and push button commands so that the user always has a choice. The telephone includes a dialog manager that monitors the user's spoken commands and push button commands, maintaining both modes in synchronism at all times. The result is a natural, easy-to-use system that does not require an extensive user's manual. The dialog manager displays the commands that are possible, which the user can select by pressing the soft key buttons on the keypad adjacent the visual display or by speaking the commands into the handset. The soft key buttons are push buttons whose function changes according to the state of the dialog. The current function of each soft key button is indicated on the visual display adjacent the button. As the user is first learning the system, the visual display provides convenient prompts so that the user will always know what commands are possible at any given time. As the user begins to learn these commands, he or she may choose to simply enter them by speaking into the handset, without even looking at the visual display. Of course, even the experienced user may occasionally choose to use the soft key push buttons, such as when the user cannot use the spoken commands or when entering an abort command to cancel an earlier command that was misinterpreted by the recognizer.
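The soft key behavior described above can be illustrated with a minimal sketch. It is not taken from the Appendix pseudocode; the state names ("idle", "in_call"), the command labels, and the function names are hypothetical and chosen only for illustration. The sketch assumes that each dialog state carries one list of labels, and that a key press and a spoken word are both resolved against that same list.

```python
# Illustrative sketch only: state names and command labels are hypothetical.
# Each dialog state defines the labels shown next to the soft keys, in key order.
SOFT_KEY_LABELS = {
    "idle": ["Dial", "Directory", "Redial"],
    "in_call": ["Hold", "Conference", "Transfer"],
}


def render_soft_keys(state):
    """Return (key_index, label) pairs to draw on the display beside each key."""
    return list(enumerate(SOFT_KEY_LABELS[state]))


def resolve_command(state, key_index=None, spoken=None):
    """Map a soft-key press or a spoken word to the same command label.

    Returns None when the input is not a valid command in the given state.
    """
    labels = SOFT_KEY_LABELS[state]
    if key_index is not None:
        return labels[key_index] if 0 <= key_index < len(labels) else None
    if spoken is not None:
        wanted = spoken.strip().lower()
        for label in labels:
            if label.lower() == wanted:
                return label
    return None
```

Because both input modes consult the same table, the first soft key carries a different meaning in each state, and a spoken command is accepted only if it is currently displayed.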
The preferred embodiment of the telephone system is implemented in a modular way, with the voice recognition and synthesis functions as well as the dialog manager being disposed on a circuit card that plugs into a separate card supporting the touchpad, soft keys and visual display functions. The preferred architecture allows the telephone to be manufactured either with or without voice capability or the sophisticated dialog manager. Later, these features can be added to the telephone by simply plugging in the voice card.
By way of summary, the multimodal telephone of the invention comprises a telephone unit having a microphone and a speaker for supporting voiced communication by a user. The microphone and speaker may be incorporated into the handset of the telephone unit according to conventional practice, or they may be separate from the handset. A visual display device is disposed on the telephone unit, the display being adapted for displaying a plurality of different command prompts to the user. The presently preferred embodiment employs a multiline liquid crystal display (LCD) for this purpose. The multimodal telephone further comprises at least one programmable function key for entry of keyed commands by the user. The function key is disposed on the telephone unit adjacent the visual display, so that at least a portion of the command prompts are displayed approximately adjacent the function key. The preferred embodiment uses several such function keys, with adjacent command prompts defining the current function of the key.
A speech module is disposed in the telephone unit. The speech module includes a voice recognizer and a speech generator or synthesizer. The speech module is coupled to the telephone unit so that the voice recognizer is responsive to voiced commands entered through the microphone, and the speech synthesizer provides audible prompts through the speaker.
The multimodal telephone further comprises a dialog manager coupled to the visual display as well as to the function keys and the speech module. The dialog manager defines a hierarchically arranged set of control function states. Each state is associated with one of the command prompts and at least a portion of the states are further associated with one of the audible prompts. The dialog manager is responsive to the voiced commands, and also to the function keys, to traverse the hierarchically arranged set of control function states and select one of the control function states as the active state.
The dialog manager is operative to maintain synchronism between the command prompts and the audible prompts. The dialog manager is also operative to maintain synchronism between voiced commands and keyed commands, so that the state hierarchically adjacent to the active state is displayed as a command prompt and the user has the option to move from the active state to the hierarchically adjacent state by either voiced command or keyed command.
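The dialog manager described in the two preceding paragraphs can be sketched as a small state tree. This is an illustration under stated assumptions and does not reproduce the Appendix pseudocode: the class names `State` and `DialogManager`, the example states, and the prompt strings are all hypothetical. The sketch assumes that the display and speaker are modeled as simple lists, and that keyed and voiced commands both pass through one selection routine, which is one straightforward way to keep the two modes synchronized.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class State:
    """One control function state; its name doubles as the command prompt."""
    name: str
    audible_prompt: Optional[str] = None  # only some states have one
    children: Dict[str, "State"] = field(default_factory=dict)
    parent: Optional["State"] = field(default=None, repr=False, compare=False)

    def add(self, child: "State") -> "State":
        child.parent = self
        self.children[child.name.lower()] = child
        return child


class DialogManager:
    """Tracks the active state and refreshes both output modes together."""

    def __init__(self, root: State) -> None:
        self.active = root
        self.display: List[str] = []  # stand-in for the LCD prompts
        self.spoken: List[str] = []   # stand-in for synthesized prompts
        self._refresh()

    def _refresh(self) -> None:
        # Show the hierarchically adjacent (child) states as command prompts.
        self.display = [c.name for c in self.active.children.values()]
        if self.active.audible_prompt:
            self.spoken.append(self.active.audible_prompt)

    def _select(self, state: State) -> None:
        # Single transition path shared by keyed and voiced commands.
        self.active = state
        self._refresh()

    def on_key(self, index: int) -> bool:
        children = list(self.active.children.values())
        if 0 <= index < len(children):
            self._select(children[index])
            return True
        return False

    def on_voice(self, word: str) -> bool:
        child = self.active.children.get(word.strip().lower())
        if child is None:
            return False
        self._select(child)
        return True

    def cancel(self) -> None:
        # Abort: step back up to the parent state, if there is one.
        if self.active.parent is not None:
            self._select(self.active.parent)


def build_demo_tree() -> State:
    """Hypothetical example hierarchy used only for demonstration."""
    root = State("Main", audible_prompt="Say a command")
    root.add(State("Dial"))
    directory = root.add(State("Directory", audible_prompt="Say a name"))
    directory.add(State("Call"))
    return root
```

With this arrangement, saying "directory" and pressing the second soft key lead to the same active state, and the display is refreshed in the same step that queues any audible prompt.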
For a more complete understanding of the invention, its objects and advantages, reference may be had to the following specification and drawings and to the pseudocode listing in the Appendix.