1. Field of the Invention
The present invention relates to multi-modal computer interfaces and more specifically to a system and method of using graphical widgets to increase the efficiency of multi-modal computer interaction.
2. Discussion of Related Art
The availability of multi-modal interfaces is expanding as speech recognition and gesture recognition technologies improve and computing power increases. For example, known speech recognition technology enables a user to provide basic instructions such as “call mom” to a computer device, such as a telephone system. In response, the telephone system retrieves the telephone number for “mom” and dials the number, thus enabling the user to dial a phone number while driving without the distraction of pressing the touch-tone telephone buttons. Such systems are “multi-modal” because the user can interact with the device in more than one manner, such as via touch-tone buttons or speech.
Graphical user interfaces (“GUIs”) are also well known in the art. Interfaces such as the Microsoft® Windows system, the Macintosh® operating system, and handheld systems such as the Palm Pilot® operating system provide users with a graphical interface including menus with selectable options for navigating and accomplishing tasks. For example, the well-known Microsoft “Start” option in the GUI pops up a menu with user-selectable options like “Programs” or “Settings.” These menus enable the user to navigate and control the computer and complete tasks.
Other computer devices provide graphical user interfaces for users to provide and receive information in an efficient manner. Some attempts have been made to combine speech recognition technology with graphical user interfaces. One example is the Multi-Modal Voice Post Query (MVPQ) Kiosk, discussed in S. Narayanan, G. Di Fabbrizio, C. Kamm, J. Hubbell, B. Buntschuh, P. Ruscitti, J. Wright, “Effects of Dialog Initiative and Multi-Modal Presentation Strategies on Large Directory Information Access,” ICSLP, pp. 636, 639, Beijing, China, 2000 (“Kamm et al.”), incorporated herein. The MVPQ Kiosk allows users to select among a number of different options when they request information about a particular person in a telephone and address directory software application. FIG. 1(a) illustrates an example opening GUI 10 for an MVPQ Kiosk. This GUI enables the user to either type a name in the field 12 or say the name that the user wishes to look up.
For example, if the user asks for “Kowalski,” the system presents the name and information for the person named Kowalski or, if there is more than one match, lists the different Kowalskis on the display screen 10. The user can then use touch input or mouse control to select the desired person. FIG. 1(b) illustrates the display screen 10 with the various Kowalski names 14 from which the user can select. The multi-modal disambiguation display 14 shown in FIG. 1(b) lists the Kowalskis and asks the user to choose the one that is wanted. The Kamm et al. system thus provides some improved interaction in a multi-modal context. While there are some benefits to this interactive operation, the Kamm et al. system fills the entire display screen with the disambiguation information, thus precluding the presentation of any other information. Consequently, any other information being presented at the time the disambiguation routine executes is covered or removed. Such multi-modal interfaces provide some improvement in efficiently providing users with information in a small number of interactions, but they still have deficiencies.
One of the primary deficiencies is that menus or dialogs that take the user away from the primary task are distracting and tend to cause the user to lose focus. Further, besides taking the user to a dialog outside the primary task, the typical menu or form-filling query presents the user with too much information. As a result, time and energy are wasted, and the user must regain momentum and return his or her attention to the main objective.
The benefits of multi-modal interfaces include increasing the speed of interaction and reducing the number of inputs necessary to obtain desired information. While speech recognition systems, graphical user interfaces and menu options provide some advantages, they still fail to intelligently enable a user to provide information to, and receive information from, a computer device in the fewest possible steps.