It is known in the prior art to have an automatic speech recognition (ASR) system for determining a semantic meaning of a speech input. Typically, the speech is processed into a sequence of digital speech feature frames. Each speech feature frame can be thought of as a multi-dimensional vector that represents various characteristics of the speech signal present during a short time window of the speech. For example, the multi-dimensional vector of each speech frame can be derived from cepstral features of the short-time Fourier transform spectrum of the speech signal, the short-time power or component of a given frequency band, as well as the corresponding first- and second-order derivatives. In a continuous recognition system, variable numbers of speech frames are organized as utterances, each representing a period of speech followed by a pause, which in real life loosely corresponds to a spoken sentence or phrase.
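By way of illustration only, the framing and derivative steps described above can be sketched as follows. This is a simplified sketch under stated assumptions, not the front end of any particular ASR system: it uses the short-time log power as the only static feature (cepstral features are omitted for brevity), and the function names, the frame length of 400 samples, the hop of 160 samples, and the central-difference derivative are all hypothetical choices made for the example.

```python
import math


def split_into_frames(samples, frame_len, hop):
    """Slice a sample sequence into overlapping short-time windows."""
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += hop
    return frames


def log_power(frame, floor=1e-10):
    """Short-time log power of one frame, floored to avoid log(0)."""
    power = sum(s * s for s in frame) / len(frame)
    return math.log(max(power, floor))


def deltas(values):
    """Central-difference derivative over time, replicating the edge values."""
    n = len(values)
    out = []
    for t in range(n):
        prev = values[max(t - 1, 0)]
        nxt = values[min(t + 1, n - 1)]
        out.append((nxt - prev) / 2.0)
    return out


def feature_frames(samples, frame_len=400, hop=160):
    """Return one (static, first-order, second-order) vector per frame."""
    frames = split_into_frames(samples, frame_len, hop)
    static = [log_power(f) for f in frames]
    first = deltas(static)    # first-order derivative of the static feature
    second = deltas(first)    # second-order derivative
    return list(zip(static, first, second))


if __name__ == "__main__":
    # One second of a 440 Hz tone sampled at 16 kHz (assumed test signal).
    tone = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
    vectors = feature_frames(tone)
    print(len(vectors), "frames of dimension", len(vectors[0]))
```

A practical front end would typically produce a higher-dimensional static vector per frame and append its derivatives in the same way; the sketch keeps a single static value so that the structure of the feature frame remains easy to follow.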
Recently, ASR technology has advanced enough to have applications that are implemented on the limited footprint of a mobile device. This can involve a somewhat limited stand-alone ASR arrangement on the mobile device. Alternatively, more extensive capability can be provided in a client-server arrangement in which the local mobile device does initial processing of speech inputs, and possibly some local recognition processing, while the main ASR processing is performed at a remote server with greater resources. The recognition results are then returned for use at the mobile device.
U.S. Patent Publication 20110054899 describes a hybrid client-server ASR arrangement for a mobile device in which speech recognition may be performed locally by the device, remotely by an ASR server, or both, depending on one or more criteria such as time, policy, confidence score, network availability, and the like.
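For illustration, a criteria-based choice between local and remote recognition of the general kind described above might be sketched as follows. This is a hypothetical sketch and does not reproduce the method of the cited publication: the `RoutingContext` fields, the `choose_recognizer` function, the ordering of the checks, and the 0.8 confidence threshold are all assumptions introduced for the example.

```python
from dataclasses import dataclass


@dataclass
class RoutingContext:
    """Hypothetical inputs to the routing decision."""
    network_available: bool          # network availability criterion
    policy_allows_remote: bool       # policy criterion
    local_confidence: float          # confidence score of the local result, 0..1
    latency_budget_ms: int           # time criterion: how long the caller can wait
    expected_remote_latency_ms: int  # assumed estimate of the server round trip


def choose_recognizer(ctx, confidence_threshold=0.8):
    """Return "local" or "remote" according to the assumed criteria."""
    # Without a network, or when policy forbids it, only local recognition remains.
    if not ctx.network_available or not ctx.policy_allows_remote:
        return "local"
    # A sufficiently confident local result makes the server round trip unnecessary.
    if ctx.local_confidence >= confidence_threshold:
        return "local"
    # A remote result that would arrive too late is of no use to the caller.
    if ctx.expected_remote_latency_ms > ctx.latency_budget_ms:
        return "local"
    return "remote"


if __name__ == "__main__":
    ctx = RoutingContext(
        network_available=True,
        policy_allows_remote=True,
        local_confidence=0.4,
        latency_budget_ms=1500,
        expected_remote_latency_ms=600,
    )
    print(choose_recognizer(ctx))
```

The checks are ordered so that the hard constraints (network and policy) are evaluated before the soft ones (confidence and time); other orderings, or running both recognizers and arbitrating between their results, are equally consistent with the criteria listed.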