The invention relates generally to communication systems and methods, and more particularly to multimodal communication systems and methods.
An emerging area of technology involving communication devices, such as handheld devices, mobile phones, laptops, PDAs, internet appliances, non-mobile devices and other suitable devices, is the application of multimodal interactions for access to information and services. Typically resident on a communication device is at least one user agent program, such as a browser, or any other suitable software that can operate as a user interface. The user agent program can respond to fetch requests (entered by a user through the user agent program or from another device or software application), receive fetched information, navigate through content servers via internal or external connections and present information to the user. The user agent program may be a graphical browser, a voice browser, or any other suitable user agent program as recognized by one of ordinary skill in the art. Such user agent programs may include, but are not limited to, J2ME applications, Netscape™, Internet Explorer™, Java applications, WAP browsers, Instant Messaging, Multimedia Interfaces, Windows CE™ or any other suitable software implementations.
Multimodal technology allows a user to access information, such as voice, data, video, audio or other information, and services, such as e-mail, weather updates, bank transactions and news or other information, through one mode via the user agent programs and to receive information in a different mode. More specifically, the user may submit an information fetch request in one or more modalities, such as by speaking a fetch request into a microphone, and the user may then receive the fetched information in the same mode (i.e., voice) or in a different mode, such as through a graphical browser which presents the returned information in a viewing format on a display screen. Within the communication device, the user agent program works in a manner similar to a standard Web browser or other suitable software program resident on a device connected to a network or other terminal devices.
As such, multimodal communication systems are being proposed that may allow users to utilize one or more user input and output interfaces to facilitate communication in a plurality of modalities during a session. The user agent programs may be located on different devices. For example, a network element, such as a voice gateway, may include a voice browser. A handheld device, for example, may include a graphical browser, such as a WAP browser or other suitable text-based user agent program. Hence, with multimodal capabilities, a user may input in one mode and receive information back in a different mode.
Systems have been proposed that attempt to provide user input in two different modalities, such as input of some information in a voice mode and other information through a tactile or graphical interface. One proposal suggests using a serial asynchronous approach which would require, for example, a user to input voice first and then send a short message after the voice input is completed. The user in such a system may have to manually switch modes during the same session. Hence, such a proposal may be cumbersome.
Another proposed system utilizes a single user agent program and markup language tags in existing HTML pages so that a user may, for example, use voice to navigate to a Web page instead of typing a search word and then the same HTML page can allow a user to input text information. For example, a user may speak the word “city” and type in an address to obtain visual map information from a content server. However, such proposed methodologies typically force the multimode inputs in differing modalities to be entered in the same user agent program on one device (entered through the same browser). Hence, the voice and text information are typically entered in the same HTML form and are processed through the same user agent program. This proposal, however, requires the use of a single user agent program operating on a single device.
Accordingly, for less complex devices, such as mobile devices that have limited processing capability and storage capacity, complex browsers can reduce device performance. Also, such systems cannot facilitate concurrent multimodal input of information through different user agent programs. Moreover, it may be desirable to provide concurrent multimodal input over multiple devices to allow distributed processing among differing applications or differing devices.
Another proposal suggests using a multimodal gateway and a multimodal proxy, wherein the multimodal proxy fetches content and outputs the content to a user agent program (e.g., a browser) in the communication device and to a voice browser, for example, in a network element, so that the system allows both voice and text output for a device. However, such approaches do not appear to allow concurrent input of information by a user in differing modes through differing applications, since the proposal appears to again be a single user agent approach requiring the fetched information of the different modes to be output to a single user agent program or browser.
Accordingly, a need exists for an improved concurrent multimodal communication apparatus and methods.