1. Technical Field
The present invention relates generally to a user interface for rendering intent-based markup language scripts in a multi-modal environment and, more specifically, to systems and methods for presenting a modality-independent markup script in a plurality of modalities (e.g., speech and GUI) and synchronizing I/O (input/output) events between the different modalities presented.
2. Description of Related Art
The computing world is evolving towards an era where billions of interconnected pervasive clients will communicate with powerful information servers. Indeed, this millennium will be characterized by the availability of multiple information devices that make ubiquitous information access an accepted fact of life. This evolution towards billions of pervasive devices being interconnected via the Internet, wireless networks or spontaneous networks (such as Bluetooth and Jini) will revolutionize the principles underlying man-machine interaction. In the near future, personal information devices will offer ubiquitous access, bringing with them the ability to create, manipulate and exchange any information anywhere and anytime using interaction modalities most suited to an individual's current needs and abilities. Such devices will include familiar access devices such as conventional telephones, cell phones, smart phones, pocket organizers, PDAs and PCs, which vary widely in the interface peripherals they use to communicate with the user.
The increasing availability of information, along with the rise in the computational power available to each user to manipulate this information, brings with it a concomitant need to increase the bandwidth of man-machine communication. The ability to access information via a multiplicity of appliances, each designed to suit the individual's specific needs and abilities at any given time, necessarily means that these interactions should exploit all available input and output (I/O) modalities to maximize the bandwidth of man-machine communication. Indeed, users will come to demand such multi-modal interaction in order to maximize their interaction with information devices in hands-free, eyes-free environments.
The current infrastructure is not configured for providing seamless, multi-modal interaction between man and machine, although new and emerging protocols are being generated and advanced to provide such broad multi-modal interaction. Indeed, various components are preferred to provide seamless multi-modal interaction. One component comprises an application user interface that preferably provides coordinated, synchronized, multi-modal user interaction over a plurality of modalities (e.g., speech, GUI, etc.). For example, one thinks of the various manners in which users interact with a computer as “user interfaces.” Thus, the keyboard/mouse/display may be viewed as one user interface, while the microphone/sound card/speakers may be considered a different “user interface” or modality. One can readily appreciate the advantages associated with a speech-enabled application that can be controlled through a voice/aural interface while simultaneously retaining the ability to provide control via the display/keyboard/pointing device interface. For example, an operator of a motor vehicle might desire to interact in a hands-free manner with a navigation system to obtain driving directions through a speech interface, while being able to view an image of a selected map through a visual interface to gain a better sense of orientation.
Another component for providing seamless, multi-modal interaction comprises applications that are authored using a modality-independent programming paradigm, wherein such applications can be created once and rendered and presented across different user interfaces or modalities (a concept referred to as “single authoring”). The adoption of single-authored, modality-independent applications has been slow, due in part to difficulties in dealing with the differences in presentation styles across interfaces. In fact, new markup languages (e.g., VoiceXML and WML (Wireless Markup Language)) have been developed to address the vagaries of new user interfaces. By way of example, IVR (interactive voice response) services and telephone companies provide voice portals having only speech I/O capabilities. The IVR systems may be programmed using, e.g., proprietary interfaces (state tables, scripts, beans, etc.) or VoiceXML (a current speech ML standard) and objects. With a voice portal, a user may access an IVR service and perform voice browsing using a speech browser (or using telephone keypads). Unfortunately, a client device having only GUI capability would not be able to directly access information from a voice portal. Likewise, a client/access device having only speech I/O would not be able to access information in a GUI modality.
Accordingly, a need exists for systems and methods (e.g., an application user interface) to render modality-independent applications in a multi-modal environment.