Many types of wireless mobile devices are in use today, including mobile phones, personal digital assistants (“PDAs”), hand-held devices, and combinations of these devices. Wireless transport networks and wireless local area networks allow electronic content to flow to and from these mobile devices. With the growing popularity of mobile devices, mobile information access and remote transactions are fast becoming commonplace. However, mobile devices impose their own limitations on the end-user experience. For example, mobile phones have relatively small visual displays and cumbersome keypad input. PDAs have better visual displays, but have the same input limitations. As devices become smaller, modes of interaction other than keyboard and stylus become a necessity. One such alternative is the use of multimodal access methods.
Multichannel access is the ability to access enterprise data and applications through multiple methods or channels such as a phone, laptop, or PDA. The term “channel” refers to the different browsing platforms or user agents that access, browse, and interact with online applications. Multichannel applications are designed for universal access across different channels, one channel at a time, with no particular attention paid to synchronization or coordination among the different channels. A user has an array of channels with which to access content, and the content on each channel appears separate, though functional and consistent. For example, a user may access his or her bank account balances on the Web using Microsoft® Internet Explorer when in the office or at home and may access the same information over a dumb phone using voice recognition and text-to-speech when on the road.
By contrast, multimodal access is the ability to combine multiple modes or channels in the same interaction or session. The methods of input include speech recognition, keyboard, touch screen, and stylus. Depending on the situation and the device, a combination of input modes can make a small device easier to use. For example, in a Web browser on a PDA, a user can select items by tapping or by providing spoken input. Similarly, a user can use voice or stylus to enter information into a field. With multimodal technology, information on the device can be both displayed and spoken. This can be especially important in automobiles or other situations where hands-free and eyes-free operation is essential.
Multimodal applications represent the convergence of content—video, audio, text and images—with various modes of user interface interaction. This enables a user to interact with an application in a variety of ways, for example: input with speech, a keyboard, keypad, mouse and/or stylus, and output such as synthesized speech, audio, plain text, motion video and/or graphics.
The term “mode” denotes a mechanism for input and output to a user interface. A user can employ each of these modes independently or concurrently. Multimodal applications incorporate any number of modes simultaneously, so a user can vocalize his or her name, type in an address, and send a phone number from a wireless handset, all within the same session, form, and application context. The browser will typically let a user select the most appropriate mode of interaction based on the user's situation, activity, or environment.
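As a rough illustration of several modes feeding one session and one form, consider the following minimal sketch. The class `FormSession`, its `submit` method, and the mode labels are hypothetical names chosen for this example; they do not come from any particular browser or multimodal platform.

```python
from dataclasses import dataclass, field


@dataclass
class FormSession:
    """One form in one session, fillable through any input mode (hypothetical)."""
    fields: dict = field(default_factory=dict)
    history: list = field(default_factory=list)

    def submit(self, mode: str, name: str, value: str) -> None:
        # Every mode writes into the same shared form state.
        self.fields[name] = value
        self.history.append((mode, name))

    def modes_used(self) -> set:
        # Distinct input modes that contributed to this form.
        return {mode for mode, _ in self.history}


session = FormSession()
session.submit("speech", "name", "Jane Doe")          # spoken input
session.submit("keyboard", "address", "12 Main St")   # typed input
session.submit("handset", "phone", "555-0100")        # sent from a wireless handset
```

In this sketch the three inputs arrive through different modes but populate a single form, which is the distinction drawn above between multimodal and multichannel access.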
The different modes may be supported on a single device or on separate devices working in tandem. When separate devices work in tandem, this is typically referred to as distributed multimodal computing. An example of distributed multimodal computing is a user talking into a cell phone and seeing the results on a PDA. Voice may also be offered as an adjunct to browsers with high-resolution graphical displays, providing an accessible alternative to using the keyboard or screen.
Multimodal applications are an improvement over multichannel applications, with several advantages. Multimodal interfaces improve the usability of data services such as weather, driving directions, stock quotes, personal information management, and unified messaging. Application service providers can offer users a wider range of personalized and differentiated offerings using multimodal interfaces. Many call center applications and enterprise data services, such as account management, brokerage accounts, customer service, and sales force automation, offer voice-only interfaces, and adding multimodal interfaces to these applications enhances a user's experience. By combining multiple input and output devices, a user can easily access and enter information, especially when using small devices. Multimodal applications improve a user's experience with mobile devices and encourage the growth and acceptance of m-Commerce. A user need not be constrained by the limitations of a particular interaction mode at any given moment. For example, while listening to instructions on a voice browser, a user is constrained by the ephemeral nature of the interface and may wish to listen to the instructions again. Multimodal interfaces give a user the flexibility to choose the interaction mode that best suits the task and purpose. They can also exploit the resources of multiple interfaces to give a user an enhanced computing experience.
Users of multimodal interfaces, however, do face certain issues. These include ergonomic issues and appropriateness issues. Ergonomic issues may arise as a user switches from one mode to another, such as alternating between listening and watching. Appropriateness issues arise when a mode does not suit the surroundings, such as when a user disables speech input and output because they would distract nearby people. Considering all of these issues, however, a user must still select the most appropriate mode of interaction based on the user's situation, activity, or environment.