The present invention relates generally to the field of networked computing. More particularly, the invention provides a technique for synchronizing the state of plural browsers at various levels of granularity. The technique may be used to synchronize between visual and voice browsers in wireless multi-modal applications.
Wireless communications devices such as wireless telephones are increasingly being adapted for use with the Internet. It is estimated that, by the year 2003, half of all Internet traffic will come from wireless devices. Many present-day wireless telephones have at least some capability not only to capture and render audio information but also to allow users to interact with data using a visual display and some form of data input device. Many wireless carriers market access to the so-called “wireless web” as part of their wireless telephone service.
While wireless data access is clearly a boon to electronic device users, current wireless data access technology has a notable deficiency: a user must generally choose to interact with the data using either the voice components of the wireless device (i.e., microphone and speaker) or the visual components (i.e., screen and keypad), but cannot use both at the same time. Some aspects of communication work best with a visual interface, and others work best with a voice interface. For example, suppose that an application provides directions to a specified location (e.g., for use while driving). It is convenient to speak the name of the desired location as input to the application, but it is cumbersome to receive the directions themselves in the form of speech. A visual map combined with written directions such as “turn right on Elm” is a very convenient format in which to receive the directions, but it may be less convenient to enter the desired location using a keypad or stylus than simply to speak it into a microphone. Thus, for many applications the ideal interface is not visual or voice alone, but rather a combination of the two. Present wireless applications, however, often allow one mode or the other, but not both.
One problem that impedes the integrated use of voice and visual interfaces to data is that each mode of communication generally requires its own browser. Typically, a particular piece of content (e.g., a web page) may be represented both in a visual markup language (such as Wireless Markup Language, or “WML”) and in a voice markup language (such as Voice eXtensible Markup Language, or “VXML”). A visual browser permits the user to navigate through the WML content using the screen and keypad. Similarly, a voice browser, which is generally a software component separate from the visual browser, permits the user to navigate through the VXML content using the microphone and speaker. Not only are the visual and voice browsers separate software components; they often execute on separate, and distantly located, devices. A visual browser typically executes on a wireless handset (such as a wireless-web-enabled telephone). However, the handset is generally “dumb” with respect to voice: it can capture and render audio signals, but it lacks the means to navigate content in response to the received audio or to generate audio signals from VXML data. Thus, a voice browser typically executes on a voice server and communicates with the user through the microphone and speaker of the wireless device, exchanging digital signals with the device over an ordinary voice circuit within a wireless network.
Because the voice and visual browsers are separate, it is difficult to switch seamlessly back and forth between visual and voice modes of interacting with wireless data: the voice browser may be unaware of what the visual browser is doing, and vice versa. That is, the voice and visual browsers are not normally “synchronized,” in the sense that neither knows the other's state with respect to the underlying content that both are manipulating. For example, suppose that a wireless handset user uses a visual browser to navigate through a series of web pages, eventually arriving at a particular URL. If the user then decides to switch to the voice interface, the voice browser does not know where the user has navigated, because it is unaware of what the visual browser has been doing. Upon the switch, the voice browser can simply restart the user at a “home” URL, but this is inconvenient because the user loses the benefit of all the navigation already performed. Similarly, within the page located at that URL, the user may have navigated through several cards and positioned the cursor at a particular field on a particular card using the visual browser, but the voice browser will be unaware of all of this activity. The problem, in this example, is that the voice and visual browsers are not “synchronized.”
In view of the foregoing, there is a need for a browser synchronization technique that overcomes the drawbacks of the prior art.