1. Field
The present invention relates generally to speech processing and, more specifically, to phoneme lattice construction and its application to speech recognition and keyword spotting.
2. Description
Automatic speech recognition (ASR) and automatic keyword spotting (AKS) are processes that transform an audio input into a textual representation. Such a process may comprise two phases: transforming the audio input into a sequence of phonemes, and either transforming the sequence of phonemes into a sequence of words or detecting keywords in the sequence of phonemes. These two phases, however, are mathematically coupled and usually must be performed jointly in a single process. A typical ASR system uses hidden Markov models (HMMs) and a dynamic programming search to perform the two phases jointly. A typical AKS system uses similar techniques.
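By way of illustration only, the dynamic programming search over an HMM mentioned above may be sketched as a Viterbi decoder. The present description does not name a particular algorithm; Viterbi decoding is assumed here as one common choice, and the function name, argument names, and array layout below are hypothetical.

```python
import numpy as np

def viterbi(log_init, log_trans, log_emit):
    """Return the most likely HMM state path and its log score.

    Hypothetical argument layout (an assumption, not from the description):
      log_init:  (S,)   log P(state at frame 0)
      log_trans: (S, S) log P(next state j | current state i)
      log_emit:  (T, S) log P(observation at frame t | state s)
    """
    T, S = log_emit.shape
    score = np.empty((T, S))             # best log score ending in each state
    back = np.zeros((T, S), dtype=int)   # best predecessor of each state
    score[0] = log_init + log_emit[0]
    for t in range(1, T):
        # cand[i, j]: best score at t-1 in state i, then transition i -> j
        cand = score[t - 1][:, None] + log_trans
        back[t] = np.argmax(cand, axis=0)
        score[t] = cand[back[t], np.arange(S)] + log_emit[t]
    # Trace back from the best final state to recover the full path.
    path = [int(np.argmax(score[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1], float(np.max(score[-1]))
```

In a recognizer of the kind described, the states would correspond to phoneme or word model states, so that a single search determines the phoneme sequence and the word sequence jointly.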
Recently, the concept of distributed speech recognition (DSR) was introduced, and the speech processing research community has invested considerable effort in this approach. The main idea in DSR is to distribute the computation of a speech recognition application between a client and a server. The current standard defined by the European Telecommunications Standards Institute (ETSI) is very limited because only a small fraction of the computation is performed by the client. This limitation is largely due to the difficulty of separating the two phases of the computing process in a typical ASR or AKS system. The portion of the computation performed by the client, as specified by the ETSI, is parameterization of a speech signal, specifically, extraction of Mel-frequency cepstral coefficients (MFCC) for each short segment of the speech signal. Nowadays even a small handheld device (e.g., a personal digital assistant (PDA) based on the Intel XScale architecture) can have much more computing power than is required for parameterization of a speech signal. Thus, it is desirable for a DSR system to distribute more of the computation to a client device without sacrificing recognition accuracy.
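By way of illustration only, the client-side parameterization described above may be sketched as follows: the signal is cut into short overlapping segments, and a vector of MFCCs is computed for each segment. The frame length, frame shift, filter count, and other parameter values are illustrative assumptions and are not asserted to match the ETSI specification; all function names are hypothetical.

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                          n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):      # rising edge of the triangle
            fbank[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge of the triangle
            fbank[m - 1, k] = (right - k) / (right - center)
    return fbank

def dct_ii(x, n_out):
    """Unnormalized DCT-II along the last axis, keeping n_out coefficients."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    basis = np.cos(np.pi * k * (2 * np.arange(n) + 1) / (2 * n))
    return x @ basis.T

def mfcc(signal, sample_rate=8000, frame_len=200, hop=80,
         n_fft=256, n_filters=23, n_ceps=13):
    """Return an (n_frames, n_ceps) array; all defaults are assumed values."""
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Row t of idx holds the sample indices of the t-th short segment.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    # Small constant guards against log(0) in silent segments.
    return dct_ii(np.log(energies + 1e-10), n_ceps)
```

Each row of the returned array is the feature vector for one short segment. In a DSR arrangement of the kind described, only these vectors, rather than the audio samples, would need to be sent from the client to the server.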
For an AKS application, on the one hand, the audio data to be searched may be too large to store on a client device. On the other hand, a user may want to submit a search request from a mobile device. Therefore, it is also desirable to distribute AKS processing between a client and a server.