The computing world is evolving towards an era where billions of interconnected pervasive clients will communicate with powerful information servers. Indeed, this millennium will be characterized by the availability of multiple information devices that make ubiquitous information access an accepted fact of life. This evolution towards billions of pervasive devices being interconnected via the Internet, wireless networks or spontaneous networks (such as Bluetooth and Jini) will revolutionize the principles underlying man-machine interaction. In the near future, personal information devices such as cell phones, smart phones, pocket organizers, PDAs, PCs, etc., will offer ubiquitous access, affording the ability to create, manipulate and exchange information anywhere and anytime using interaction modalities most suited to the user's current needs and abilities.
The increasing availability of information, along with the rise in the computational power available to each user to manipulate this information, brings with it a concomitant need to increase the bandwidth of man-machine communication. The ability to access information using various devices, each designed to suit the user's specific needs and abilities at any given time, necessarily means that these interactions should exploit all available input and output (I/O) modalities to maximize the bandwidth of man-machine communication.
The current networking infrastructure is not configured to provide seamless, multi-channel, multi-modal and/or conversational access to resources in a distributed environment. For instance, in a distributed environment where appliances and devices can be controlled by voice commands, the applications, user interfaces, and servers that enable such control are constructed based on the localization and user language of the environment in which such applications are implemented.
By way of example, assume that lights in a public room can be automatically controlled using, e.g., speech commands in a given language, to turn off, turn on or dim the lights. Suppose a foreign visitor entering the room wishes to dim the lights. If the visitor does not know the local commands and/or local language, the visitor would not be able to personally control the lights. The visitor may, however, make a certain gesture (e.g., frowning, or saying something in a foreign language) that would be understood by an assistant accompanying the visitor to mean that the visitor would like the lights dimmed. In such an instance, the human assistant would then be able to proceed with the visitor's request to dim the lights by, e.g., uttering the known command in the appropriate language or by engaging in dialog with another person or entity to dim the lights.
Therefore, in the above example, it is disadvantageous to require a user to interact with a networked entity using particular gestures, e.g., a set of specific verbal commands in a particular language. Currently, the localization of application dialogs (i.e., their adaptation for use in other languages) or the adaptation of dialog components to different regional settings (e.g., the format of an address) requires redesigning the application for each language or regional setting.
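By way of illustration only, the limitation described above may be sketched as follows. The sketch assumes a hypothetical light controller whose recognized phrases are fixed in a single language; the names COMMANDS_EN and handle_utterance, the phrases, and the action labels are all assumptions introduced for this example and do not describe any particular system.

```python
# Hypothetical, locale-bound light controller (all names are illustrative).
# The set of accepted commands is fixed to one language at design time.
COMMANDS_EN = {
    "turn on the lights": "ON",
    "turn off the lights": "OFF",
    "dim the lights": "DIM",
}


def handle_utterance(utterance, commands=COMMANDS_EN):
    """Return the light action for a known phrase, or None if unrecognized."""
    return commands.get(utterance.strip().lower())


# A local user who knows the expected phrase is understood.
local_result = handle_utterance("Dim the lights")

# A visitor expressing the same request in another language is not.
visitor_result = handle_utterance("Baissez la lumière")

print(local_result)    # DIM
print(visitor_result)  # None
```

In such a sketch, supporting the visitor's language would require authoring and deploying a further command table for that language, which mirrors the per-language redesign noted above.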