Speech recognition systems generally use one of two principal techniques to search for the best-scoring sentence hypothesis: a frame-synchronous beam search or a priority-queue-based search (also called a “stack decoder”).
The priority queue search uses a priority queue so that the most promising hypotheses can be evaluated first. In practice, however, current recognition systems do not achieve this potential advantage. It has been discovered (X. Huang, A. Acero, H. W. Hon, Spoken Language Processing, Prentice Hall, 2001, p. 639) that when the hypotheses on the priority queue are sorted strictly by their match scores, the priority queue search often requires much more computation than the frame-synchronous beam search. This extra computation arises even though the most promising hypotheses are evaluated first, because many alternate hypotheses must still be evaluated. A priority queue search whose priority is based solely on match scores does not prune these alternate hypotheses as efficiently as a frame-synchronous search, so overall it does more computation. In particular, extra computation is incurred whenever the previously most promising hypothesis matches poorly on new frames and must be replaced by a new hypothesis from the priority queue. Potentially, the computation in this search of the tree of alternate hypotheses can grow exponentially with the length of the sentence.
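The mechanics of a priority queue search ordered strictly by match score can be illustrated with a minimal sketch. It is not taken from any particular recognizer: the two-word vocabulary, the per-step log scores in STEP_SCORES, and the function name best_first_search are all hypothetical, and acoustic matching is reduced to a table lookup. The sketch pops the best-scoring hypothesis, extends it by one word, and counts how many hypotheses are popped before a complete one is reached.

```python
import heapq

# Hypothetical log scores for two candidate words at each of three steps.
# Higher (closer to zero) means a better match.
STEP_SCORES = [
    {"a": -1.0, "b": -2.0},
    {"a": -3.0, "b": -1.0},
    {"a": -1.0, "b": -4.0},
]


def best_first_search(step_scores):
    """Return (word sequence, score, hypotheses popped) for a score-only queue."""
    # heapq pops the smallest item, so scores are stored negated.
    heap = [(0.0, ())]
    popped = 0
    while heap:
        neg_score, words = heapq.heappop(heap)
        popped += 1
        if len(words) == len(step_scores):
            # First complete hypothesis popped; all extensions only lower the
            # score, so no partial hypothesis left on the queue can beat it.
            return words, -neg_score, popped
        # Extend the popped hypothesis by every candidate word at the next step.
        for word, log_score in step_scores[len(words)].items():
            heapq.heappush(heap, (neg_score - log_score, words + (word,)))
    return None, float("-inf"), popped


words, score, popped = best_first_search(STEP_SCORES)
```

With this toy data, the one-word hypothesis ("b",) is popped and extended after a two-word hypothesis has already been extended, which is the kind of return to an alternate hypothesis described above. The example is far too small to show the growth in computation itself.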
Therefore, speech recognition systems using a priority queue search generally use a priority queue that ranks shorter hypotheses ahead of longer hypotheses, regardless of their actual match scores. A hypothesis is considered “shorter” or “longer” based on its estimated ending time in the speech data. Such systems are sometimes called “multi-stack decoders,” although the multi-stack implementation is mathematically equivalent to a single stack or priority queue in which the priority sort is hierarchical: hypotheses are ordered first by ending time, and scores are compared only for hypotheses that end at the same time. This priority scheme makes such a priority queue search comparable in computational efficiency to frame-synchronous beam search, but it also removes most of the potential advantage of priority queue search, because all shorter hypotheses must be evaluated before the most promising longer hypothesis can be evaluated.
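The equivalence between a multi-stack arrangement and a single hierarchically sorted priority queue can be illustrated with a short sketch. The hypotheses, ending frames, and scores below are invented for illustration, as are the function names. The sketch only orders a fixed set of hypotheses and does not extend them as a real decoder would.

```python
import heapq
from collections import defaultdict

# Hypothetical hypotheses: (label, estimated ending frame, match score).
# Higher scores are better.
HYPOTHESES = [
    ("the cat", 40, -12.0),
    ("the", 15, -9.5),
    ("a", 15, -4.0),
    ("the cat sat", 70, -3.0),
    ("a cap", 40, -20.0),
]


def single_queue_order(hyps):
    """Pop order from one queue keyed by ending time first, then by score."""
    heap = [(end, -score, label) for label, end, score in hyps]
    heapq.heapify(heap)
    order = []
    while heap:
        _, _, label = heapq.heappop(heap)
        order.append(label)
    return order


def multi_stack_order(hyps):
    """Pop order from one stack per ending time, visited in time order."""
    stacks = defaultdict(list)
    for label, end, score in hyps:
        heapq.heappush(stacks[end], (-score, label))
    order = []
    for end in sorted(stacks):
        while stacks[end]:
            _, label = heapq.heappop(stacks[end])
            order.append(label)
    return order


single = single_queue_order(HYPOTHESES)
multi = multi_stack_order(HYPOTHESES)
```

Both functions return the same order for this data. The hypothesis "the cat sat" has the best match score of the five, yet it is popped last because it has the latest ending time, which illustrates why this priority scheme gives up most of the potential advantage of evaluating the most promising hypothesis first.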