Spoken language is the most natural and convenient communication tool for people. Advances in speech recognition technology have allowed an increased use of spoken language interfaces with a variety of different machines and computer systems. Interfaces to various systems and services through voice commands offer people convenience and efficiency, but only if the spoken language interface is reliable. This is especially important for applications in eye-busy and hand-busy situations, such as driving a car or performing sophisticated computing tasks. Human-machine interfaces that utilize spoken commands and voice recognition are generally based on dialog systems. A dialog system is a computer system that is designed to converse with a human using a coherent structure and text, speech, graphics, or other modalities of communication on both the input and output channels. Dialog systems that employ speech are referred to as spoken dialog systems and generally represent the most natural type of human-machine interface. With the ever-greater reliance on electronic devices, spoken dialog systems are increasingly being implemented in many different systems.
In many human-machine interaction (HMI) systems, users can interact with the system through multiple input devices or types of devices, such as through voice input, gesture control, and traditional keyboard/mouse/pen inputs. This provides user flexibility with regard to data input and allows users to provide information to the system more efficiently and in accordance with their own preferences.
Present HMI systems typically limit particular modalities of input to certain types of data, or allow the user to use only one of multiple modalities at a time. For example, a vehicle navigation system may include both a voice recognition system for spoken commands and a touch screen. However, the touch screen is usually limited to allowing the user to select certain menu items by contact, rather than through voice commands. Such multi-modal systems do not coordinate user commands through the different input modalities, nor do they utilize input data from one modality to inform and/or modify data for another modality. Thus, present multi-modal systems do not adequately provide a seamless user interface system in which data from all possible input modalities can be used to provide accurate information to the system.
What is desired, therefore, is a multi-modal information user input interface for HMI systems that can synchronize and integrate information obtained from different modalities, and that can disambiguate input and recover from errors with the assistance of the multi-modal input information. Such a system would greatly improve user satisfaction, system performance, and system robustness.
What is further desired is an HMI user input system that can synchronize and integrate the multi-modal information obtained from different modalities in any order.