1. Field of the Invention
The invention relates to a multimodal system and method and, more particularly, to a multiple sub-session multimodal system and method.
2. Description of the Related Art
As computing permeates society, the need to develop multimodal network systems increases. Multimodal systems involve a single user simultaneously interacting with several applications using a variety of input and output mechanisms. Multimodal systems allow for multiple interface modes, including voice (e.g., via speech recognition engines), keypad, keyboard, mouse, and/or stylus input, and text, video, graphics, audio (e.g., prompts), and voice (e.g., via text-to-speech engines) output. Multimodal systems use each of these modes independently or concurrently.
A car's telematics unit might initiate a multimodal session with a location based server (LBS). In this context, a driver in a moving car uses voice to direct the car's telematics unit to acquire directions to a certain location from the LBS. The LBS responds in two ways: 1) the LBS relays voice instructions to guide the driver to the location; and 2) the LBS graphically displays a location map on the telematics unit's screen. Alternatively, a driver in a parked car interfaces with the LBS using a combination of button presses and voice input.
Another example of a multimodal session involves a user filling out an airline reservation form on the web. The input modality here depends on the user's circumstances. For example, if the user is on foot, he might prefer to fill out the reservation form by speaking into his web-enabled cellular phone. The cellular phone transmits the speech to a server that, in turn, converts the speech into the entries needed to fill out the form. Having reached the comfort of his office, the user might choose to complete the form using his keyboard and mouse as inputs while looking at a graphical representation of the form on his screen.
Yet another example of a multimodal session involves a user obtaining flight information through the web. The user might click on a flight icon on a device and say “Show me flights from San Francisco to Boston after 7 p.m. on Saturday.” The browser then displays a web page with the corresponding flights, allowing the user to click on a specific flight and obtain further information.
Development in multimodal systems has focused on multimodal interaction (MI) within a single multimodal web session. MI extends the web interface to allow use of the multiple input and output modalities described earlier. Other solutions for MI use a multimodal markup language (MML) together with a hypertext transfer protocol (HTTP) session. One solution that uses a mixed-mode MML for MI is described in U.S. patent application Ser. No. 10/293,529, filed Nov. 12, 2002, and assigned to the same assignee as the present application. The Speech Application Language Tags (SALT) group and the World Wide Web Consortium (W3C) are each defining an MML for an MI user interface. The MI in each of these cases depends on the underlying MML.
Accordingly, a need remains for an improved multimodal system and method that is MML independent.