Our invention relates to pattern recognition and, more particularly, to arrangements for automatically recognizing a continuous speech pattern as a series of words.
In communication, data processing, and control systems, it is often desirable to use speech as a direct input for inquiries, commands, data or other information. Speech recognition devices obviate the need for expensive manually operated terminal equipment and permit individuals to interact with automated equipment while simultaneously engaging in other activities. The variability of speech patterns from speaker to speaker and even for a particular speaker, however, has limited the accuracy of speech recognition. As a result, speech recognition arrangements have been successful only in specially designed environments.
Most speech recognition systems are adapted to receive input speech signals and to transform the speech signals into sets of prescribed acoustic features. The input speech acoustic features are compared to stored sets of previously obtained reference features for identified words. The speech signal is identified when the input speech features match the stored features of a particular reference word in accordance with predetermined criteria. The accuracy of such recognition systems is highly dependent on the selected features and on the prescribed recognition criteria. Best results are obtained when the reference features and the input speech features are derived from the same individual and the speech pattern to be recognized is spoken with distinct pauses between individual words.
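By way of illustration only, the comparison of input features with stored reference features may be sketched as follows. The sketch assumes that each utterance has already been reduced to a sequence of feature vectors of the same length as each reference (practical systems align sequences of differing duration), that the similarity measure is an average Euclidean distance, and that the "predetermined criteria" reduce to a single distance threshold. The names `distance`, `recognize`, and `threshold` are hypothetical and do not come from any cited arrangement.

```python
import math


def distance(features, reference):
    """Average Euclidean distance between two equal-length feature sequences."""
    if len(features) != len(reference):
        raise ValueError("this sketch assumes equal-length feature sequences")
    return sum(math.dist(x, y) for x, y in zip(features, reference)) / len(features)


def recognize(features, references, threshold):
    """Return the reference word closest to the input, or None if no word
    is within the (assumed) acceptance threshold."""
    best_word, best_dist = None, math.inf
    for word, reference in references.items():
        d = distance(features, reference)
        if d < best_dist:
            best_word, best_dist = word, d
    return best_word if best_dist <= threshold else None


# Toy reference set: two words, two feature frames each (illustrative values).
references = {
    "one": [(0.0, 0.0), (1.0, 1.0)],
    "two": [(5.0, 5.0), (6.0, 6.0)],
}
spoken = [(0.1, 0.0), (1.0, 0.9)]
result = recognize(spoken, references, threshold=0.5)
```

In this sketch an input that is not sufficiently close to any stored reference is rejected rather than forced onto the nearest word, which is one simple reading of a recognition criterion.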
Recognition of continuous speech patterns may be accomplished by comparing the sequence of input speech features with every possible combination of reference word feature signal patterns derived from continuous speech. Such arrangements, however, require time-consuming testing of all possible reference word pattern combinations, that is, an exhaustive search through the large number of reference word combinations. As is well known, the number of possible sequences increases exponentially with the number of words in the series. Consequently, it is generally impractical to perform the exhaustive search even for a limited number of words in a pattern. Semantic and syntactic rules may be devised to limit the number of possible sequences in a search so that certain classes of information can be readily analyzed. U.S. Pat. No. 4,156,868, issued to S. E. Levinson on May 29, 1979 and assigned to the same assignee, discloses a recognition arrangement based on syntactic analysis. Recognition of random sequences of unrelated words, such as a series of numbers, however, is not improved by resorting to such contextual constraints.
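The exponential growth may be illustrated with a short calculation. The sketch below assumes a vocabulary of V reference words and a series of exactly n words, so that the number of candidate sequences is V raised to the power n; the function names are hypothetical.

```python
from itertools import product


def count_sequences(vocabulary_size, series_length):
    """Number of distinct word sequences of a fixed length: V ** n."""
    return vocabulary_size ** series_length


def enumerate_sequences(vocabulary, series_length):
    """Lazily yield every candidate sequence an exhaustive search would test."""
    return product(vocabulary, repeat=series_length)


# Ten spoken digits and a seven-digit series.
digit_candidates = count_sequences(10, 7)

# A tiny vocabulary makes the enumeration small enough to list in full.
small = list(enumerate_sequences(["zero", "one"], 3))
```

For the ten digits and a seven-digit series the count is 10,000,000 candidate sequences, each of which would have to be compared against the input pattern in an exhaustive search.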
U.S. Pat. Nos. 4,049,913 and 4,059,725 disclose continuous speech recognition systems in which the similarities between the reference word feature patterns and all possible portions of the input speech pattern are calculated. Partial recognition results are derived from the similarity measures, and both the partial similarity measures and the partial recognition results are stored in a table. All possible partial pattern series which form continuous patterns are selected from the table. The continuous pattern for which the similarity is maximum is then chosen. The recognized results from the table are extracted to provide the reference word series corresponding to the input speech pattern. These systems have been effective in continuous speech recognition. The signal processing to obtain reference patterns and partial pattern similarity measures, however, is exceedingly complex and uneconomical for many applications.
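The general idea of such table-based arrangements may be sketched as follows. This is a simplified illustration and not the method of the cited patents: it assumes a caller-supplied similarity function over frame ranges, stores the best word and score for every portion of the input in a table, and then selects, by dynamic programming, the chain of adjoining portions with the greatest total similarity. All names and the toy similarity measure are hypothetical.

```python
import math


def build_table(n_frames, words, similarity):
    """For every portion [start, end) store (best score, best word)."""
    table = {}
    for start in range(n_frames):
        for end in range(start + 1, n_frames + 1):
            best = max(words, key=lambda w: similarity(w, start, end))
            table[(start, end)] = (similarity(best, start, end), best)
    return table


def best_series(n_frames, table):
    """Choose the chain of adjoining portions covering all frames whose
    summed similarity is maximum; return (total score, word series)."""
    best = [(-math.inf, [])] * (n_frames + 1)
    best[0] = (0.0, [])
    for end in range(1, n_frames + 1):
        for start in range(end):
            score, word = table[(start, end)]
            total = best[start][0] + score
            if total > best[end][0]:
                best[end] = (total, best[start][1] + [word])
    return best[n_frames]


# Toy input: each frame is a symbol; each word is a short symbol template.
frames = list("aabbb")
templates = {"A": "aa", "B": "bbb"}


def toy_similarity(word, start, end):
    """Count matching frames; penalize portions of the wrong length."""
    segment, template = frames[start:end], templates[word]
    if len(segment) != len(template):
        return -10.0
    return float(sum(x == y for x, y in zip(segment, template)))


table = build_table(len(frames), list(templates), toy_similarity)
score, words = best_series(len(frames), table)
```

Even in this reduced form the table holds an entry for every start and end pair, and each entry requires a comparison against every reference word, which suggests why the signal processing in such systems becomes costly as the input lengthens.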
Alternative arrangements have been proposed in which an input speech pattern is segmented and each segment is recognized as one of a set of reference words. These alternatives require much less signal processing but do not take into account the high degree of coarticulation in continuous speech, which makes accurate segmentation difficult. The coarticulation, or merging together, of adjacent words in a continuous speech pattern makes recognition unreliable and also makes selection of reference word training patterns difficult. It is an object of the invention to provide improved continuous speech recognition utilizing economical signal processing arrangements and simplified reference word training patterns.