The present invention relates to a method and apparatus for the computerized recognition of speech.
Many attempts have been made to develop a computer data base which would allow a computer to understand human speech. Early attempts at computerized speech recognition tried to represent a speech window of approximately 10-20 milliseconds by using a fast Fourier transform, or a similar technique such as linear predictive coding. A data bank consisting of templates (acquired averages) of the transform coefficients of known speech elements was developed. Unknown speech elements were compared with the various templates in an attempt to recognize what was said.
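The template approach described above can be sketched as follows. This is only an illustrative sketch, not a reconstruction of any particular prior system: the 8 kHz sampling rate, the 128-sample (16 millisecond) window, the Euclidean distance measure, the synthetic tones standing in for speech, and all function names are assumptions introduced for the example. A naive discrete Fourier transform is used for self-containment; a fast Fourier transform would yield the same coefficients.

```python
import cmath
import math

SAMPLE_RATE = 8000   # assumed sampling rate in Hz (not from the source)
FRAME_LEN = 128      # 128 samples at 8 kHz = 16 ms, within the 10-20 ms range


def magnitude_spectrum(frame):
    """Magnitudes of a naive DFT of one window (lower half of the bins)."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2)
    ]


def make_template(frames):
    """Average the spectra of several windows of one known sound."""
    spectra = [magnitude_spectrum(f) for f in frames]
    return [sum(column) / len(spectra) for column in zip(*spectra)]


def classify(frame, templates):
    """Return the label of the template closest to the unknown window."""
    spectrum = magnitude_spectrum(frame)
    return min(templates, key=lambda label: math.dist(spectrum, templates[label]))


def tone(freq_hz, phase=0.0):
    """Hypothetical stand-in for a speech window: a single pure tone."""
    return [math.sin(2 * math.pi * freq_hz * t / SAMPLE_RATE + phase)
            for t in range(FRAME_LEN)]


# Build two templates from "known" sounds, then classify an unknown window.
templates = {
    "low": make_template([tone(500, p) for p in (0.0, 0.5, 1.0)]),
    "high": make_template([tone(1500, p) for p in (0.0, 0.5, 1.0)]),
}
result = classify(tone(520, 0.3), templates)
print(result)
```

Because each template is an average over every coefficient of the window, it retains everything in the known utterance, whether or not it helps to tell one sound from another.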
Such early attempts to perform speech recognition were generally failures. One problem was that the templates would incorporate all of the aspects of the known utterance, and would fail to discriminate relevant from irrelevant information. The templates would generally contain so much irrelevant information that the relevant and useful information was essentially buried. Also, the templates could be confused by changes in the tempo of the speech, which occur even for the same speaker. In addition, these techniques assume a constant frequency distribution for the entire time window. It has been found that the speech pattern changes significantly during the period of the window, which blurs the data when averages are computed and seriously degrades their utility.
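The blurring caused by the constant-frequency assumption can be illustrated numerically. The sketch below is an assumption-laden illustration rather than a measurement on real speech: the 8 kHz sampling rate, the 16 millisecond window, the synthetic steady tone and swept tone, and the "peak energy fraction" measure are all introduced here for the example. It compares how much of the spectral energy falls in the strongest coefficient when the frequency is constant over the window and when it changes during the window.

```python
import cmath
import math

SAMPLE_RATE = 8000   # assumed sampling rate in Hz
FRAME_LEN = 128      # one 16 ms window


def magnitude_spectrum(frame):
    """Magnitudes of a naive DFT of one window (lower half of the bins)."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2)
    ]


def tone(freq_hz):
    """A window whose frequency stays constant, as the transform assumes."""
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(FRAME_LEN)]


def chirp(f0_hz, f1_hz):
    """A window whose frequency sweeps linearly from f0_hz to f1_hz."""
    duration = FRAME_LEN / SAMPLE_RATE
    samples = []
    for i in range(FRAME_LEN):
        t = i / SAMPLE_RATE
        phase = 2 * math.pi * (f0_hz * t + 0.5 * (f1_hz - f0_hz) * t * t / duration)
        samples.append(math.sin(phase))
    return samples


def peak_energy_fraction(frame):
    """Share of the spectral energy held by the single strongest coefficient."""
    energy = [m * m for m in magnitude_spectrum(frame)]
    return max(energy) / sum(energy)


steady_share = peak_energy_fraction(tone(500))
swept_share = peak_energy_fraction(chirp(500, 1500))
print(steady_share, swept_share)
```

For the steady tone nearly all of the energy sits in one coefficient, while for the swept tone it is spread across many coefficients, so a template averaged from such windows no longer points to any one frequency.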
More recent attempts at speech recognition have focused on features of the speech which distinguish one spoken utterance from another. One example is to identify sounds by their change in frequency rather than their base frequency. This represents a significant improvement over earlier techniques, because there is less irrelevant information incorporated into templates which are based on frequency change. However, ferreting out identifiable features in the templates which are useful in discriminating one sound from another has been difficult. Also, such techniques generally use a Fourier transform to identify the frequency change. Such a Fourier transform still assumes that the characteristics of the signal remain constant over a period of time, which is not the case for speech.
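The idea of characterizing a sound by its change in frequency rather than its base frequency can be sketched as follows. The source does not say how the frequency change was measured, so everything here is an assumption made for illustration: the dominant-coefficient frequency estimate, the frame-to-frame difference, the 8 kHz sampling rate, the 16 millisecond window, and the synthetic tones. Note that the sketch still applies a Fourier transform to each window, and so shares the limitation noted above.

```python
import cmath
import math

SAMPLE_RATE = 8000   # assumed sampling rate in Hz
FRAME_LEN = 128      # one 16 ms window; bin spacing is 62.5 Hz


def tone(freq_hz):
    """Hypothetical stand-in for one window of speech: a pure tone."""
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(FRAME_LEN)]


def dominant_frequency(frame):
    """Frequency in Hz of the strongest DFT coefficient (DC bin excluded)."""
    n = len(frame)
    mags = [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(1, n // 2)
    ]
    strongest_bin = 1 + mags.index(max(mags))
    return strongest_bin * SAMPLE_RATE / n


def frequency_track(frames):
    """Dominant frequency of each successive window."""
    return [dominant_frequency(f) for f in frames]


def frequency_changes(track):
    """Window-to-window change in frequency: the feature of interest."""
    return [later - earlier for earlier, later in zip(track, track[1:])]


# The same rising pattern produced at two different base frequencies.
track_a = frequency_track([tone(f) for f in (500, 625, 750)])
track_b = frequency_track([tone(f) for f in (1000, 1125, 1250)])
changes_a = frequency_changes(track_a)
changes_b = frequency_changes(track_b)
print(track_a, track_b, changes_a, changes_b)
```

The two tracks differ in base frequency, yet their frequency changes are identical, which is why a template built on change carries less speaker-dependent information than one built on the raw spectrum.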
Speech recognition has succeeded in certain very limited and controlled environments. For example, a voice pattern analysis can be performed which can recognize a known speaker, but such devices do not recognize actual speech. Also, devices have been constructed which can recognize commands in a limited vocabulary spoken slowly and distinctly, in the absence of ambient noise. However, no computerized system has come close to the speech recognition capabilities of a human listener, and none has general utility.