1. Technical Field
The present application relates generally to conversational systems and, more particularly, to a system and method for automatic and coordinated sharing of conversational functions/resources between network-connected devices, servers and applications.
2. Description of Related Art
Conventional conversational systems (i.e., systems with purely voice I/O or multi-modal systems with voice I/O) are typically limited to personal computers (PCs) and local machines having suitable architecture and sufficient processing power. On the other hand, for telephony applications, conversational systems are typically located on a server (e.g., the IVR server) and accessible via conventional and cellular phones. Although such conversational systems are becoming increasingly popular, typically all the conversational processing is performed either on the client side or on the server side (i.e., all the configurations are either fully local or fully client/server).
With the emergence of pervasive computing, it is expected that billions of low resource client devices (e.g., PDAs, smartphones, etc.) will be networked together. Due to the decreasing size of these client devices and the increasing complexity of the tasks that users expect such devices to perform, the user interface (UI) becomes a critical issue since conventional graphical user interfaces (GUIs) on such small client devices would be impractical. For this reason, it is to be expected that conversational systems will be a key element of the user interface to provide purely speech/audio I/O or multi-modal I/O with speech/audio I/O.
Consequently, speech embedded conversational applications in portable client devices are being developed and reaching maturity. Unfortunately, because of limited resources, it is to be expected that such client devices may not be able to perform complex conversational services such as speech recognition (especially when the vocabulary size is large or specialized or when domain specific/application specific language models or grammars are needed), NLU (natural language understanding), NLG (natural language generation), TTS (text-to-speech synthesis), audio capture and compression/decompression, playback, dialog generation, dialog management, speaker recognition, topic recognition, and audio/multimedia indexing and searching. For instance, the memory and CPU (and other resource) limitations of a device can limit the conversational capabilities that such device can offer.
Moreover, even if a networked device is “powerful” enough (in terms of CPU and memory) to execute all these conversational tasks, the device may not have the appropriate conversational resources (e.g., engines) or conversational arguments (i.e., the data files used by the engines, such as grammars, language models, vocabulary files, parsing, tags, voiceprints, TTS rules, etc.) to perform the appropriate task. Indeed, some conversational functions may be too specific to a given service, thereby requiring back end information that is only available from other devices or machines on the network. For example, NLU and NLG services on a client device typically require server-side assistance since the complete set of conversational arguments or functions needed to generate the dialog (e.g., parser, tagger, translator, etc.) either requires a large amount of memory for storage (not available in the client devices) or is too extensive (in terms of communication bandwidth) to transfer to the client side. This problem is further exacerbated with multi-lingual applications when a client device or local application has insufficient memory or processing power to store and process the arguments that are needed to process speech and perform conversational functions in multiple languages. Instead, the user must manually connect to a remote server for performing such tasks.
Also, the problems associated with a distributed architecture and distributed processing between clients and servers require new methods for conversational networking. Such methods comprise management of traffic and resources distributed across the network to guarantee appropriate dialog flow for each user engaged in a conversational interaction across the network.
Accordingly, a system and method that allows a network device with limited resources to perform complex, specific conversational tasks using networked resources, in a manner that is automatic and transparent to the user, is highly desirable.