1. Technical Field
The present invention relates generally to a system and method for providing conversational computing and, in particular, to a protocol for providing dialog management and automatic arbitration among a plurality of conversational (multi-modal) applications and an architecture that supports the protocol.
2. Description of Related Art
The computing world is evolving towards an era where billions of interconnected pervasive clients will communicate with powerful information servers. Indeed, this millennium will be characterized by the availability of multiple information devices that make ubiquitous information access an accepted fact of life. This evolution towards billions of pervasive devices being interconnected via the Internet, wireless networks or spontaneous networks (such as Bluetooth and Jini) will revolutionize the principles underlying man-machine interaction. In the near future, personal information devices will offer ubiquitous access, bringing with them the ability to create, manipulate and exchange any information anywhere and anytime using interaction modalities most suited to an individual's current needs and abilities. Such devices will include familiar access devices such as conventional telephones, cell phones, smart phones, pocket organizers, PDAs and PCs, which vary widely in the interface peripherals they use to communicate with the user.
The increasing availability of information, along with the rise in the computational power available to each user to manipulate this information, brings with it a concomitant need to increase the bandwidth of man-machine communication. The ability to access information via a multiplicity of appliances, each designed to suit the individual's specific needs and abilities at any given time, necessarily means that these interactions should exploit all available input and output (I/O) modalities to maximize the bandwidth of man-machine communication. Indeed, users will come to demand such multi-modal interaction in order to maximize their interaction with information devices in hands-free, eyes-free environments.
The current infrastructure is not configured for providing seamless, multi-modal access across a plurality of conversational applications and frameworks. Indeed, although a plethora of information can be accessed from servers over a communications network using an access device (e.g., personal information and corporate information available on private networks and public information accessible via a global computer network such as the Internet), the availability of such information may be limited by the modality of the client/access device or the platform-specific software applications with which the user is interacting to obtain such information.
With the increased deployment of conversational systems, however, new technical challenges and limitations must be addressed. For example, even in current frameworks that support the co-existence of various conversational applications, moving naturally from one application to another across all modalities, especially ambiguous modalities such as speech, is not possible without significant modification to the programming model of such applications and to the platform on which such applications are executed. For instance, explicit (or pre-built) grammars need to be defined for speech applications to shift from one application to another. Thus, arbitration in such systems cannot be performed in an automatic manner without knowledge of the applications that have been installed on the platform.
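By way of illustration only, the limitation described in the preceding paragraph may be sketched as follows. The sketch assumes a hand-built switching grammar that maps fixed spoken phrases to applications. The phrases, the application names and the `arbitrate` function are hypothetical and are not drawn from any particular framework.

```python
from typing import Dict, Optional

# Hypothetical pre-built grammar: each phrase is hard-wired to one
# application that the grammar author knew was installed.
SWITCH_GRAMMAR: Dict[str, str] = {
    "open my calendar": "calendar",
    "check my email": "email",
}


def arbitrate(utterance: str, grammar: Dict[str, str]) -> Optional[str]:
    """Return the application that should receive the utterance, or None."""
    return grammar.get(utterance.strip().lower())


# A phrase covered by the pre-built grammar is routed to its application.
routed = arbitrate("Check my email", SWITCH_GRAMMAR)

# A newly installed application (here an assumed "banking" application)
# cannot be reached, because no phrase in the grammar refers to it.
unrouted = arbitrate("show my account balance", SWITCH_GRAMMAR)

# Reaching it requires a manual edit made with knowledge of what is installed.
updated_grammar = {**SWITCH_GRAMMAR, "show my account balance": "banking"}
rerouted = arbitrate("show my account balance", updated_grammar)

print(routed, unrouted, rerouted)
```

In this sketch, routing succeeds only for phrases that were anticipated when the grammar was written, which is why arbitration of this kind cannot proceed automatically as applications are added to the platform.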
Furthermore, developing a conversational application using current technologies requires not only knowledge of the goal of the application and how the interaction with the users should be defined, but also knowledge of a wide variety of other interfaces and modules external to the application at hand, such as (i) connection to input and output devices (telephone interfaces, microphones, web browsers, Palm Pilot displays); (ii) connection to a variety of engines (speech recognition, natural language understanding, speech synthesis and possibly language generation); (iii) resource and network management; and (iv) synchronization between various modalities for multi-modal applications.
Accordingly, there is a need for a system that provides dialog management and automatic arbitration among a plurality of conversational (multi-modal) applications, and for a protocol that supports such an architecture.