Computer-supported communication via the Internet has quickly become an important determinant of social life and a driving force of economic development in developed and developing countries alike. Driven by a rapidly evolving process of innovation, global markets with huge potential for development have formed in this field within a very short time.
To meet the requirements of the explosive growth of Internet technology, speech researchers have been putting a great deal of effort into integrating speech functions into Internet applications. For simple applications, voice playback and voice recording functions may be sufficient. Complex applications, such as voice access to the World Wide Web on a personal digital assistant or wireless phone, need speech recognition functions, which are computationally intensive. Distributed speech recognition (DSR) can effectively balance computational loads and utilize the bandwidth of heterogeneous networks for voice recognition applications.
Generally, there are three alternative strategies in the design of DSR architectures. The first strategy is server-only processing, in which all processing is done at the server side and the speech signal is transmitted to the server either through the Internet by using speech coding or via a second channel such as a telephone line. Because all the recognition processing takes place at the server side, this approach has the smallest computational and memory requirements on the client, thereby allowing a wide range of client machines to access the speech-enabled application. The disadvantage of this approach is that users cannot access applications through a low-bandwidth connection.
The second conventional strategy is client-only processing, in which most of the speech processing is done at the client side and the results are transmitted to the server. The typical advantages are that a high-bandwidth connection is not required and that recognition can be based on high-quality speech, because the sampling and feature extraction take place at the client side. This approach is also less dependent on the transmission channel and is therefore more reliable. However, the range of clients that the speech-enabled applications can support is significantly limited with this approach, because the client must be powerful enough to perform the heavy computation that takes place in the speech recognition process.
The third strategy is a client-server approach. In the client-server DSR processing model, front-end processing is done at the client side. The speech features are then transmitted to the server, where speech decoding and language understanding are performed.
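The division of labor in the client-server model can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from any particular DSR system: the frame sizes, the single log-energy feature per frame, the wire format used by the hypothetical `pack_features` and `unpack_features` helpers, and the threshold-based `server_back_end`, which merely stands in for a real speech decoder.

```python
import math
import struct

FRAME_LEN = 200   # samples per frame (25 ms at 8 kHz; assumed values)
FRAME_STEP = 80   # hop between frames (10 ms at 8 kHz; assumed values)


def client_front_end(samples):
    """Client side: reduce raw samples to one log-energy feature per frame."""
    features = []
    for start in range(0, len(samples) - FRAME_LEN + 1, FRAME_STEP):
        frame = samples[start:start + FRAME_LEN]
        energy = sum(s * s for s in frame) / FRAME_LEN
        # Small offset avoids log(0) on fully silent frames.
        features.append(math.log(energy + 1e-10))
    return features


def pack_features(features):
    """Serialize features for transmission (hypothetical wire format):
    a little-endian 32-bit count followed by 32-bit floats."""
    return struct.pack(f"<I{len(features)}f", len(features), *features)


def unpack_features(payload):
    """Server side: recover the feature list from the received payload."""
    (count,) = struct.unpack_from("<I", payload, 0)
    return list(struct.unpack_from(f"<{count}f", payload, 4))


def server_back_end(features, threshold=-5.0):
    """Server side: placeholder for decoding. Labels each frame as
    'speech' or 'silence' by comparing log energy to a threshold."""
    return ["speech" if f > threshold else "silence" for f in features]
```

In this sketch only the packed features cross the network, so the client needs no recognition models and the server never receives the raw waveform. A real front end would typically send a richer feature vector per frame, and a real back end would run acoustic and language models instead of a threshold.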
At present, the client-server based DSR approach has not been exploited, because integrating a client-server based DSR system into Internet and wireless communication applications requires a great deal of effort.
The difficulty of implementing an efficient client-server DSR architecture stems from diverse handheld device designs, network infrastructure complexity, network gateway or server diversity, and unstructured information content. Bringing a client-server based DSR system to industry and the commercial market requires development efforts in many diverse areas. As such, a comprehensive, practical, and efficient architecture for DSR in a client/server design has not been achieved in the conventional art.