This application relates generally to the field of automated speech recognition (ASR) and, more specifically, to ASR systems embedded in mobile devices.
A number of mobile devices, such as smart phones, include embedded ASR systems. A typical application for such systems is voice control for operating the telephone (dialing) or for looking up various kinds of information (search). Such ASR systems are capable of running in two modes: a local mode that functions entirely on the mobile device, and a remote mode in which processing is accomplished over a network connection to a server. As its name suggests, the local mode relies on embedded software on the client device to perform the entire speech recognition task. The remote mode (also referred to as server-based or cloud-based mode) transmits a recognition request to a server, which performs the task and sends results back to the mobile device. Even in remote mode, subtasks such as feature extraction and the calculation of acoustic scores can be performed on the client device; this split architecture is sometimes referred to as distributed recognition.
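By way of illustration only, the three arrangements described above (local, remote, and distributed) can be sketched as a simple dispatch. This is a minimal sketch and not an implementation from this application: the names `Mode`, `Recognizer`, `extract_features`, `decode_on_device`, `server_recognize_audio`, and `server_decode_features` are hypothetical, and the server calls are represented by plain callables rather than an actual network protocol.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Sequence

# Hypothetical types: audio is a sequence of samples, features a list of frames.
Audio = Sequence[float]
Features = List[List[float]]


class Mode(Enum):
    LOCAL = "local"              # entire task runs on the mobile device
    REMOTE = "remote"            # entire task runs on the server
    DISTRIBUTED = "distributed"  # device extracts features, server decodes


@dataclass
class Recognizer:
    # All four callables are assumptions made for this sketch.
    extract_features: Callable[[Audio], Features]      # runs on the device
    decode_on_device: Callable[[Features], str]        # embedded decoder
    server_recognize_audio: Callable[[Audio], str]     # server receives audio
    server_decode_features: Callable[[Features], str]  # server receives features

    def recognize(self, audio: Audio, mode: Mode) -> str:
        if mode is Mode.LOCAL:
            # Feature extraction and decoding both happen on the device.
            return self.decode_on_device(self.extract_features(audio))
        if mode is Mode.REMOTE:
            # The recognition request is sent to the server as-is.
            return self.server_recognize_audio(audio)
        # Distributed: a client-side subtask precedes server-side decoding.
        return self.server_decode_features(self.extract_features(audio))
```

In this sketch the only difference between the remote and distributed modes is where feature extraction runs, which mirrors the distinction drawn in the paragraph above.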
Local mode ASR offers advantages of speed and responsiveness, but a local system is inherently limited in processing and data storage capabilities, both of which may degrade the quality of the speech recognition result. In contrast, server-based recognition offers full-featured results, but depends upon a fast, reliable communications link, which may not always be available. In some instances, achieving the high quality of server-based speech recognition comes at the cost of unacceptable delays.
Thus, a need exists for an ASR system for mobile devices that combines the responsiveness of a local system with the quality of a server-based system.