There is a desire in the telecommunication industry to execute large-scale speech recognition applications on mobile handheld devices, e.g., personal digital assistants, mobile phones, and other similar devices and combinations. These devices typically lack the processing power needed to perform the computing-intensive tasks of speech recognition and natural language understanding on the device itself (also referred to as the endpoint). Methods to relocate the recognition portion of these computing-intensive tasks to a remote "backend" in-network recognition system have been proposed and implemented. "Backend" in-network recognition systems are separate from the handheld device employed by a user, but are connected to the user device via a telecommunication path, whether that path is a wireless or a wired connection.
A problem to be addressed in a distributed telecommunication system is where to perform speech recognition. A variety of solutions have been suggested and/or tried in the past:
    speech recognition performed on the mobile device;
    speech recognition performed on a telecom server; and
    speech recognition performed on a third-party/application server.
With respect to speech recognition performed on the mobile device, existing mobile devices such as handheld devices, tablet-based personal computers (PCs), and cellular phones are equipped with computing platforms that, in most cases, can perform only lightweight operations. Speech recognition is a complex process requiring analysis of speech signals, extraction of features, searching of statistical models (such as Gaussian Mixture Models, Neural Networks, etc.), and combination of word and language statistics. Resources such as memory and processing power are usually limited on a mobile device because of the nature and size of the device. Therefore, embedded speech recognition software (e.g., Sensory, available at www.sensoryinc.com, or fonix, available at www.fonix.com) is suitable for simple tasks, whereas complex tasks require a larger, more capable computing platform.
Performing speech recognition on the end-user mobile device may have the following advantages:
    1) recognition is immediate, because the recognition task starts on the mobile device right away and there is no network transfer delay;
    2) recognition requires less network connection time; and
    3) it is convenient for simple recognition tasks.
Speech recognition on the mobile device has the following disadvantages:
    1) embedded recognizers usually have limited processing capabilities; and
    2) the recognition task consumes the computing resources of the device and slows down other operations executing on the device.
With respect to speech recognition performed on a telecom server, many telecommunication operators provide support for backend interactive voice response systems. For example, cellular telecommunication carriers such as Sprint provide support for voice browsing of voicemail messages using a mobile telephone. The telecommunication provider hosts a voice recognizer subsystem (usually on a separate server) to perform speech recognition. The recognizers on such servers are usually powerful, high-end systems, because sufficient computing resources are available to perform complex recognition tasks.
With respect to speech recognition performed on application/third-party servers, the telecommunication operator sometimes does not provide the voice recognition service to the user. In such systems, the voice signal is routed to a third-party application provider, which performs the speech recognition and the requested function. As in the telecommunication provider-hosted solution, complex recognition tasks in this solution are performed by the computing platform of the third-party application provider.
Performing speech recognition on a server (whether that of the telecommunication provider or of a third party) may have the following advantages:
    1) it is suitable for complex recognition tasks;
    2) recognition accuracy is generally higher than that achieved on the mobile device; and
    3) the mobile device is offloaded from heavy recognition operations, enabling the device to be more responsive to the user.
Speech recognition performed on the server has the following disadvantages:
    1) it requires a network connection and uses network bandwidth to transfer voice data; and
    2) server computing resources are shared among multiple users, so the server load depends on how many callers are using the system simultaneously.