The present invention relates generally to systems and methods for building multi-modal browsers applications and, in particular, to systems and methods for building modular multi-modal browsers using a DOM (Document Object Model) and MVC (Model-View-Controller) framework that enables a user to interact in parallel with the same information via a multiplicity of channels, devices, and/or user interfaces, while presenting a unified, synchronized view of such information across the various channels, devices and/or user interfaces supported by the multi-modal browser.
The computing world is evolving towards an era where billions of interconnected pervasive clients will communicate with powerful information servers. Indeed, this millennium will be characterized by the availability of multiple information devices that make ubiquitous information access an accepted fact of life. This evolution towards billions of pervasive devices being interconnected via the Internet, wireless networks or spontaneous networks (such as Bluetooth and Jini) will revolutionize the principles underlying man-machine interaction. In the near future, personal information devices will offer ubiquitous access, bringing with them the ability to create, manipulate and exchange any information anywhere and anytime using interaction modalities most suited to the an individual's current needs and abilities. Such devices will include familiar access devices such as conventional telephones, cell phones, smart phones, pocket organizers, PDAs and PCs, which vary widely in the interface peripherals they use to communicate with the user.
The increasing availability of information, along with the rise in the computational power available to each user to manipulate this information, brings with it a concomitant need to increase the bandwidth of man-machine communication. The ability to access information via a multiplicity of appliances, each designed to suit the individual's specific needs and abilities at any given time, necessarily means that these interactions should exploit all available input and output (I/O) modalities to maximize the bandwidth of man-machine communication. Indeed, users will come to demand such multi-modal interaction in order to maximize their interaction with information devices in hands-free, eyes-free environments.
The current networking infrastructure is not configured for providing seamless, multi-modal access to information. Indeed, although a plethora of information can be accessed from servers over a communications network using an access device (e.g., personal information and corporate information available on private networks and public information accessible via a global computer network such as the Internet), the availability of such information may be limited by the modality of the client/access device or the platform-specific software applications with which the user is interacting to obtain such information.
By way of example, one of the most widely used methods for accessing information over a communications network is using a conventional HTML browser to access information over the WWW (world wide web) using, for example, portals such as Yahoo! and AOL. These portals typically include a directory of Web sites, a search engine, news, weather information, e-mail, stock quotes, etc. Typically, only a client/access device having full GUI capability can take advantage of such Web portals for accessing information.
Other conventional portals and access channels include wireless portals/channels that are typically offered by telephone companies or wireless carriers (which provide proprietary content to subscribing users and/or access to the Internet or a wireless portion of the Internet, with no restrictions or access control). These wireless portals may be accessed via WAP (wireless application protocol) by client/access devices (via a WAP browser) having limited GUI capabilities declaratively driven by languages such as WML (wireless markup language), CHTML (compact hypertext markup language) such as NTT DocoMo imode), or XHTML (eXtensible HTML) Mobile Profile (XHTML-MP) as specified by WAP 2.0. WAP together with WML and XHTML-MP and iMode with CHRML allow a user to access the Internet over a cellular phone with constrained screen rendering and limited bandwidth connection capabilities. Currently, wireless portals do not offer seamless multi-modal access (such as voice and GUI) or multi-device (more than one device simultaneously available) regardless of the access device. Instead, a separate voice mode is used for human communication and a separate mode is used for WAP access and WML browsing.
In addition, IVR services and telephone companies can provide voice portals having only speech I/O capabilities. The IVR systems may be programmed using, e.g., proprietary interfaces (state tables, scripts beans, etc.) or VoiceXML (a current speech ML standard) and objects. With a voice portal, a user may access an IVR service and perform voice browsing using a speech browser (or using telephone key pads). Unfortunately, a client/access device having only GUI capability would not be able to directly access information from a voice portal. Likewise, a client/access device having only speech I/O would not be able to access information in a GUI modality.
Currently, new content and applications are being developed for Web accessibility with the intent of delivering such content and application via various channels with different characteristics, wherein the content and applications must be adapted to each channel/device/modality. These “multi-channel applications” (an application that provides ubiquitous access through different channels (e.g., VoiceXML, HTML), one channel at a time) do not provide synchronization or coordination across views of the different channels.
One challenge of multi-channel applications/content is that since new devices and content emerge continuously, this adaptation must be made to work for new devices not originally envisioned during the development process. In addition, it is important to be able to adapt existing content that may not have been created with this multi-channel or multi-modal deployment model in mind.
Further challenges of multi-channel applications is that, notwithstanding that multi-channel applications enable access to information through any device, it is difficult to enter and access data using small devices since keypads and screens are tiny. Further, voice access is more prone to errors and voice output is inherently sequential. One interaction mode does not suit all circumstances: each mode has its pros and cons. One optimal interaction mode at a moment can no more be optimal at another moment or for another user. All-in-one devices are no panacea, and many different devices will coexist. In fact, no immediate relief is in sight for making multi-channel e-business easier. Devices are getting smaller, not larger. Devices and applications are becoming more complex requiring more complex or efficient user interfaces. Adding color, animation, streaming, etc. does not simplify the e-business issues mentioned above. Considering these factors leads to the conclusion that an improved user interface will accelerate the growth of mobile e-business.
Accordingly, systems and methods for building and implementing user interfaces an applications that operate across various channels and information appliances, and which allow a user to interact in parallel with the same information via a multiplicity of channels and user interfaces, while presenting a unified, synchronized view of information across the various channels, are highly desirable. Indeed, there will be an increasingly strong demand for devices and browsers that provide such capabilities.