Typical interaction with content, such as web content, takes place in only one browser at a time. For example, a user may interact with a web site by downloading a Hypertext Markup Language (HTML) page and using a visual browser to interact with the content represented by the page (i.e., “visual/tactile” mode), or may request the same content in the form of a Voice eXtensible Markup Language (VXML) page and use a voice browser to interact with that content (i.e., “voice” or “audio” mode). The disadvantage of this arrangement is that the user can generally interact with the content in only one mode at a time. For example, if the user downloads the HTML version of a portal site's home page, the user cannot then speak the word “mail” to be taken to the portal's mail service. Instead, the user must select a link to the mail service using a mouse, stylus, keyboard, etc.
It would be advantageous if a user could use both browsers to interact with content simultaneously, for example by saying the word “mail” to be taken to the mail service and then entering the user's ID and password using a keyboard. It would be further advantageous if such a system could be implemented using existing browsers and markup languages, or with only minor extensions thereto.
In view of the foregoing, there is a need for a system that overcomes the drawbacks of the prior art.