The present invention relates generally to speech recognition systems. More particularly, the invention relates to dynamic programming pattern sequence recognition techniques in isolated word and continuous speech recognition applications.
Dynamic programming techniques are commonly used today for time-warping problems in both isolated and continuous speech recognition and for optimum word sequence searching problems in continuous speech (connected word) recognition. A well-known type of dynamic programming recognition that can be used in the context of the Hidden Markov Model (HMM) is the Viterbi algorithm. Dynamic programming techniques can also be used with a variety of other types of speech models besides HMMs, such as neural network models.
The classic Viterbi algorithm is an inductive algorithm in which at each instant (each frame) the algorithm stores, for each of the n states, the best possible state sequence that ends in that state and accounts for the portion of the desired observation sequence O seen so far. In this way, the algorithm ultimately discovers the best path for each of the n states as the last state for the desired observation sequence. Out of these, the algorithm selects the one with the highest probability. The classic Viterbi algorithm proceeds frame by frame, seeking to find the best match between a spoken utterance and the previously trained models.
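The frame-by-frame induction described above can be illustrated with a minimal sketch. It is a textbook form of the Viterbi recursion over a discrete HMM, not the recognizer described in this specification. The two-state model, the state names S0 and S1, the observation symbols x, y and z, and all probability values are hypothetical and chosen only for illustration.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (probability, state path) of the single most likely path."""
    # best[s] holds (probability, path) for the best path ending in state s
    # after the frames processed so far.
    best = {
        s: (start_p[s] * emit_p[s][observations[0]], [s]) for s in states
    }
    for obs in observations[1:]:
        new_best = {}
        for s in states:
            # Pick the predecessor state that maximizes the path probability.
            prob, prev = max((best[p][0] * trans_p[p][s], p) for p in states)
            new_best[s] = (prob * emit_p[s][obs], best[prev][1] + [s])
        best = new_best
    # Out of the best paths ending in each state, select the most probable.
    return max(best.values(), key=lambda item: item[0])


# Hypothetical two-state model (illustrative values only).
STATES = ["S0", "S1"]
START_P = {"S0": 0.6, "S1": 0.4}
TRANS_P = {"S0": {"S0": 0.7, "S1": 0.3}, "S1": {"S0": 0.4, "S1": 0.6}}
EMIT_P = {
    "S0": {"x": 0.5, "y": 0.4, "z": 0.1},
    "S1": {"x": 0.1, "y": 0.3, "z": 0.6},
}
OBS = ["x", "y", "z"]

prob, path = viterbi(OBS, STATES, START_P, TRANS_P, EMIT_P)
print(path, prob)
```

For this toy model the selected path is S0, S0, S1 with a probability of approximately 0.01512. A practical recognizer would normally work with log probabilities to avoid numerical underflow over long utterances; plain probabilities are used here only to keep the sketch short.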
Taking the case of a Hidden Markov Model recognizer as an example, the probability of the observed sequence (the test speaker's utterance) being generated by the model (HMM) is the sum of the probabilities of each possible path, that is, each possible state sequence through the model that could have produced the observed sequence. The probability of each path is calculated and the most likely one identified. The Viterbi algorithm calculates the most likely path and remembers the states through which it passes.
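The distinction between the total probability of the observed sequence and the probability of the single most likely path can be made concrete by brute-force enumeration. The sketch below is for illustration only: it lists every state sequence of a small hypothetical model (state names, symbols and probability values are all assumptions), computes each path probability, and then reports both the sum over all paths and the best path. Enumeration grows exponentially with the number of frames, which is why dynamic programming is used in practice.

```python
from itertools import product

# Hypothetical two-state model (illustrative values only).
STATES = ["S0", "S1"]
START_P = {"S0": 0.6, "S1": 0.4}
TRANS_P = {"S0": {"S0": 0.7, "S1": 0.3}, "S1": {"S0": 0.4, "S1": 0.6}}
EMIT_P = {
    "S0": {"x": 0.5, "y": 0.4, "z": 0.1},
    "S1": {"x": 0.1, "y": 0.3, "z": 0.6},
}
OBS = ["x", "y", "z"]


def path_probability(path, observations):
    """Joint probability of one state sequence and the observations."""
    prob = START_P[path[0]] * EMIT_P[path[0]][observations[0]]
    for prev, cur, obs in zip(path, path[1:], observations[1:]):
        prob *= TRANS_P[prev][cur] * EMIT_P[cur][obs]
    return prob


# Probability of every possible path through the model.
all_paths = {
    path: path_probability(path, OBS)
    for path in product(STATES, repeat=len(OBS))
}

# Probability of the observed sequence: the sum over all paths.
total_probability = sum(all_paths.values())

# The most likely single path, as the Viterbi algorithm would select.
best_path = max(all_paths, key=all_paths.get)

print(total_probability, best_path, all_paths[best_path])
```

With these values the sum over all eight paths is approximately 0.03628, while the best single path (S0, S0, S1) contributes approximately 0.01512 of that total.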
The classic Viterbi algorithm is computationally expensive. It keeps extensive linked lists or hash tables to maintain the list of all active hypotheses, or tokens. A great deal of computational effort is expended in the bookkeeping operations of storing items in, and retrieving items from, these lists or tables.
Because the classic Viterbi algorithm is computationally expensive, it can noticeably slow down the apparent speed of the speech recognizer. This is especially problematic in real-time systems where a prompt response time is needed. The current solution is simply to use more powerful processors, an expensive solution which can be undesirable in some embedded systems and small consumer products, like cellular telephones and home entertainment equipment.
The present invention seeks to improve upon the classic Viterbi algorithm and is thus useful in applications where processing power is limited. In our experiments we have shown that our new technique improves recognition speed by at least a factor of three. The invention employs a unique lexical tree structure with associated searching algorithms that greatly improve performance. While the system is well-suited for embedded applications and consumer products, it can also be deployed in large, high-speed systems for even greater performance improvement. The algorithm can be used for isolated word recognition, or as a first-pass fast match for continuous speech recognition. It can also be extended to cross-word modeling.
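For readers unfamiliar with the term, a lexical tree in its general form is a prefix tree over the pronunciations in a lexicon, so that words sharing initial phonemes share nodes. The sketch below shows only that generic idea. It is not the particular tree structure or search algorithm of the invention, which are described in the specification that follows. The class name Node, the helper functions, the four-word lexicon and its phoneme symbols are all hypothetical.

```python
class Node:
    """One node of a generic lexical (prefix) tree."""

    def __init__(self):
        self.children = {}  # phoneme symbol -> child Node
        self.words = []     # words whose pronunciation ends at this node


def build_lexical_tree(lexicon):
    """Build a prefix tree from a mapping of word -> phoneme list."""
    root = Node()
    for word, phonemes in lexicon.items():
        node = root
        for phoneme in phonemes:
            # Reuse the existing child when the prefix is already present.
            node = node.children.setdefault(phoneme, Node())
        node.words.append(word)
    return root


def count_nodes(node):
    """Count this node and all of its descendants."""
    return 1 + sum(count_nodes(child) for child in node.children.values())


# Hypothetical lexicon with made-up phoneme symbols.
LEXICON = {
    "cat": ["k", "ae", "t"],
    "cab": ["k", "ae", "b"],
    "can": ["k", "ae", "n"],
    "dog": ["d", "ao", "g"],
}

tree = build_lexical_tree(LEXICON)
flat_units = sum(len(phonemes) for phonemes in LEXICON.values())
tree_units = count_nodes(tree) - 1  # exclude the root, which has no phoneme

print(flat_units, tree_units)
```

In this toy lexicon the four pronunciations contain 12 phoneme units when stored separately but only 8 when the shared prefix "k", "ae" is merged, which suggests why a tree-organized lexicon can reduce the number of hypotheses a search has to evaluate.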
For a more complete understanding of the invention, its objects and advantages, refer to the following specification and to the accompanying drawings.