In known speech recognition systems, an input speech signal representing an audible utterance or word is analyzed to provide a set of feature or template signals which characterize the word. Such templates may typically be derived from spectral analysis such as linear prediction analysis. Initially, the recognition system is trained through the use of input utterances of identified reference words. Each input utterance is analyzed to provide a set of reference feature signals which are stored for subsequent use in identifying unknown words. During operation of the system, unknown utterance feature signals representing unknown words are compared with the sets of stored reference feature signals to determine the correspondence between the unknown utterance and stored reference signals. A common comparison technique is the dynamic time warping technique which is based on dynamic programming. The dynamic time warping technique allows the unknown feature signals to be non-linearly stretched or compressed in either time or space to optimally match the reference feature signals. The technique compensates for the variable displacement in time of the unknown features due to the many different ways of pronouncing the same word. Different utterances of the same word, even by the same individual, may be widely out of time alignment. An overview of automatic speech recognition may be found in the article by S. E. Levinson and M. Y. Liberman entitled, "Speech Recognition by Computer", Scientific American, April, 1981, Vol. 244, No. 4, pages 64-76.
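The comparison step described above can be illustrated with a minimal sketch of dynamic time warping. It is not the formulation of any cited reference: the Euclidean local distance, the three-way step pattern, and the names `local_distance`, `dtw_distance`, and `recognize` are all assumptions chosen for illustration, and feature vectors are assumed to be plain lists of floats.

```python
import math


def local_distance(a, b):
    """Euclidean distance between two feature vectors (an assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(unknown, reference):
    """Accumulated distance of the best non-linear time alignment.

    Each argument is a sequence of feature vectors, one per frame.
    Assumed step pattern: a frame may align by a diagonal, vertical,
    or horizontal step from the previous cell.
    """
    n, m = len(unknown), len(reference)
    if n == 0 or m == 0:
        raise ValueError("both sequences must contain at least one frame")

    inf = math.inf
    # Only the previous row is kept; prev[0] = 0 anchors the path start.
    prev = [inf] * (m + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        curr = [inf] * (m + 1)
        for j in range(1, m + 1):
            d = local_distance(unknown[i - 1], reference[j - 1])
            # Dynamic programming recurrence: cheapest way to reach (i, j).
            curr[j] = d + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[m]


def recognize(unknown, templates):
    """Return the label of the stored reference closest to the unknown."""
    return min(templates, key=lambda word: dtw_distance(unknown, templates[word]))
```

In this sketch an unknown utterance whose middle frame is held for longer than in the reference still aligns with zero accumulated distance, which is the time-alignment compensation the technique is meant to provide.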
The basic dynamic time warping technique is described in the article by F. Itakura, "Minimum Prediction Residual Principle Applied to Speech Recognition", IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. ASSP-23, No. 1, February, 1975. Implementing this technique in real time has required either special purpose hardware as described in U.S. Pat. No. 4,509,187 or special purpose computers. Prior implementations of the technique are not only computationally intensive but also require the execution of many decisions and a large amount of random access memory (RAM) to store intermediate results. Those requirements have precluded the complete implementation of the technique on a Very Large Scale Integrated (VLSI) circuit digital signal processor (DSP), such as described in R. C. Chapman, Guest Editor, "Digital Signal Processor", Bell System Technical Journal, Vol. 60, No. 7, Part 2, September, 1981. Such DSP devices are programmable computers capable of performing computations very rapidly because of their pipeline architecture. However, that architecture makes it difficult to perform the various decision operations of the dynamic time warping technique as previously implemented, since the instructions in the pipeline stream must be aborted after each decision operation requiring a transfer, and time is lost reinitializing the pipeline stream. Further, since such devices are designed for fast throughput at minimum cost, they have limited RAM capacity.
From the foregoing, it can be seen that there exists a need for a method of implementing dynamic time warping on DSP devices which is tailored to the capabilities of these devices.