1. Field of the Invention
The present invention relates to multimodal browsers and voice servers and, more particularly, to voice-enabled multimodal applications for multimodal browsers and voice servers.
2. Description of the Related Art
Recent developments, many founded on the data-description eXtensible Markup Language (XML), have given rise to new Web-based applications including multimodal interfaces or browsers. A multimodal browser allows a user to access multimodal content, that is, content that can be both graphical and audible. Traditionally, the user accessed Web content utilizing graphic input from a keyboard or manually directed screen-pointer entry. Later, the user was also able to utilize speech input. More recently, the user has been able to access Web content through multimodal interfaces which permit the use of both graphic and speech inputs.
One type of multimodal browser is provided by the eXtensible Hypertext Markup Language (XHTML) + Voice eXtensible Markup Language (VoiceXML or VXML), also denoted more succinctly as the X+V markup language. The X+V markup language extends the traditional graphic browser to include spoken interactions. The X+V markup language integrates XHTML and XML Events technologies with XML vocabularies that were developed as part of the World Wide Web Consortium (W3C) Speech Interface Framework. The integration includes voice modules that support speech synthesis, speech dialogs, command and control applications, and speech grammars. Voice handlers can be attached to XHTML elements and respond to specific Document Object Model (DOM) events of a visual browser.
Notwithstanding these developments, a number of user-desirable capabilities are not present in conventionally implemented multimodal interfaces, such as a user-friendly capability to fill form fields based upon speech utterances. Forms requiring user input have become commonplace. For example, users must commonly complete a form before being granted access privileges to enter a secure Web site. Inputting form information can be tedious, time consuming, and even frustrating. This can be especially true for a user who repeatedly accesses content from various Web sites, each of which requires form-based input of user data before access is allowed. Moreover, the user may be accessing Web content with a device that has limited or inconvenient input options. For example, a telephone, mobile phone, personal digital assistant (PDA), or similar device often includes only a limited array of keys, a very small keypad, or nothing other than a voice input mechanism. It is desirable, therefore, that multimodal browsers be extended to provide an efficient way of voice enabling the automatic filling of form fields.