My invention relates to pattern recognition arrangements and, more particularly, to automatic speech recognition systems incorporating syntactic analysis.
In communication, data processing and control systems, it is often desirable to use speech as direct input for inquiries, commands, data or other information. Speech input arrangements may be utilized to record information, to request information from processing equipment, or to control machine tools or other apparatus. Because the speech signal varies from speaker to speaker, and varies even for a particular speaker, the accuracy of speech recognition has been limited.
One type of priorly known speech recognition system receives an input speech signal and transforms the speech signal into a set of prescribed acoustic features. The set of features is compared to stored sets of previously obtained reference features corresponding to the possible words to be recognized. When the prescribed features of the input speech signal correspond to a particular set of reference features in accordance with predetermined criteria, the word associated with the corresponding set of reference features is identified as the input speech signal. The reliability of such a recognition system is therefore highly dependent on the selected features and on the prescribed recognition criteria. Where the reference features and the features of the input speech signal are obtained from the same speaker and the word to be recognized is spoken in isolation, the recognition system is relatively simple and its accuracy is improved.
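The feature comparison described above can be sketched as nearest-reference matching. This is a minimal illustration under stated assumptions, not the method of any particular system: each word's features are assumed to be a fixed-length vector of numbers, Euclidean distance is assumed as the measure of correspondence, and a maximum-distance threshold stands in for the "predetermined criteria". The names `recognize_word`, `references` and `threshold` are hypothetical.

```python
import math
from typing import Dict, Optional, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Distance between two equal-length feature vectors (assumed measure)."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recognize_word(
    features: Sequence[float],
    references: Dict[str, Sequence[float]],
    threshold: float,
) -> Optional[str]:
    """Return the reference word whose stored features are closest to the
    input features, or None if even the best match exceeds the assumed
    distance threshold (i.e. no reference set corresponds closely enough)."""
    best_word, best_dist = None, math.inf
    for word, ref in references.items():
        d = euclidean(features, ref)
        if d < best_dist:
            best_word, best_dist = word, d
    return best_word if best_dist <= threshold else None


# Hypothetical reference store: word -> previously obtained feature vector.
references = {"yes": [1.0, 0.2, 0.1], "no": [0.1, 0.9, 0.8], "stop": [0.5, 0.5, 0.0]}
print(recognize_word([0.9, 0.25, 0.15], references, threshold=0.5))  # yes
print(recognize_word([5.0, 5.0, 5.0], references, threshold=0.5))    # None
```

Both the choice of features and the threshold visibly control the outcome here, which is the dependence on "selected features" and "recognition criteria" noted in the text.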
The accuracy of recognition of a series of spoken words can be further improved by resorting to various non-acoustic sources of information, such as syntax or semantics, to detect and correct inaccuracies in the acoustical recognition of single words on the basis of prescribed rules governing the relationship among the acoustically recognized words in the series. For example, a series of acoustically recognized words may be compared to each of a set of previously stored allowable sequences of reference words. In this manner, impermissible sequences can be discarded and permissible sequences similar to the combination of acoustically recognized words can be detected. Such an arrangement requires an exhaustive search of all syntactically or semantically allowable sequences. It is known, however, that even a limited series of words results in a large set of allowable sequences and that the number of allowable sequences increases exponentially with the number of words in the series. Therefore, an exhaustive search through the store of all allowable sequences of reference words to find the allowable sequence with the closest correspondence to the series of acoustically recognized words is impractical.
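An exhaustive search of this kind can be sketched as follows. The toy word-succession syntax (`FOLLOWS`), the use of a count of mismatched word positions as the measure of correspondence, and the function names are all assumptions made for illustration; they are not taken from any described system.

```python
from typing import Dict, List, Set

# Hypothetical toy syntax: the words permitted to follow each word;
# "<s>" marks the start of the series.
FOLLOWS: Dict[str, Set[str]] = {
    "<s>": {"show", "list"},
    "show": {"all", "the"},
    "list": {"all", "the"},
    "all": {"flights"},
    "the": {"fares"},
}


def allowable_sequences(length: int) -> List[List[str]]:
    """Enumerate every word sequence of the given length permitted by FOLLOWS."""
    seqs: List[List[str]] = [["<s>"]]
    for _ in range(length):
        seqs = [s + [w] for s in seqs for w in sorted(FOLLOWS.get(s[-1], ()))]
    return [s[1:] for s in seqs]


def mismatches(a: List[str], b: List[str]) -> int:
    """Number of word positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def exhaustive_match(recognized: List[str]) -> List[str]:
    """Compare the recognized series against EVERY allowable sequence and
    return the one with the fewest mismatched words (first found on ties)."""
    candidates = allowable_sequences(len(recognized))
    return min(candidates, key=lambda s: mismatches(recognized, s))


# "fall" stands for an acoustic misrecognition of "all".
print(exhaustive_match(["show", "fall", "flights"]))  # ['show', 'all', 'flights']
```

The toy syntax yields only four 3-word sequences, but if each word could be followed by, say, 10 permissible words, a 7-word series would already have 10**7 = 10,000,000 allowable sequences to compare, which is why the exhaustive approach is impractical.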
The article, "On the Use of Syntax in a Low-Cost Real Time Speech Recognition System," by Richard B. Neely and George M. White, appearing in Information Processing 74, published by North Holland Publishing Co., 1974, describes a selective syntax recognition technique in which a heuristic search is made through lists of acoustically recognized candidate words to select a sequence of candidate words that conforms to prescribed word juxtaposition rules and is acoustically likely. Once a syntactically correct and acoustically likely sequence is found, it is identified as the series of spoken words. Other allowable, acoustically likely sequences for the series of spoken words, however, are ignored.
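A selective search of this general kind can be sketched as a best-first search over the candidate lists that stops at the first complete sequence satisfying the juxtaposition rules. This is a generic sketch, not the procedure of the cited article: the allowed word pairs (`ALLOWED_PAIRS`), the representation of each candidate list as a mapping from word to acoustic distance (lower meaning more likely), and the name `best_first_parse` are hypothetical.

```python
import heapq
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical word-juxtaposition rules: permitted (previous, next) word pairs.
ALLOWED_PAIRS: Set[Tuple[str, str]] = {
    ("<s>", "call"), ("<s>", "dial"),
    ("call", "home"), ("call", "office"),
    ("dial", "nine"),
}


def best_first_parse(candidates: List[Dict[str, float]]) -> Optional[List[str]]:
    """candidates[i] maps each acoustically recognized candidate word at
    position i to its acoustic distance. Partial sequences are extended
    cheapest-first, only along permitted word pairs, and the FIRST complete
    sequence reached is returned; other acceptable sequences are never
    examined. Returns None if no sequence satisfies the rules."""
    heap: List[Tuple[float, List[str]]] = [(0.0, ["<s>"])]
    while heap:
        cost, seq = heapq.heappop(heap)
        pos = len(seq) - 1  # number of words placed so far
        if pos == len(candidates):
            return seq[1:]
        for word, dist in candidates[pos].items():
            if (seq[-1], word) in ALLOWED_PAIRS:
                heapq.heappush(heap, (cost + dist, seq + [word]))
    return None


# "dial" is the acoustically best first word, but "dial home" is not permitted.
lists = [{"dial": 0.2, "call": 0.3}, {"home": 0.1, "nine": 0.6, "office": 0.4}]
print(best_first_parse(lists))  # ['call', 'home']
```

Because the search returns as soon as one acceptable sequence is completed, alternatives such as "call office" are never reported, mirroring the limitation noted in the text.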
Another syntactic analysis arrangement is described in the article, "The Vocal Speech Understanding System," by S. E. Levinson, appearing in Proceedings of the 4th International Joint Conference on Artificial Intelligence, Tbilisi, U.S.S.R., September 1975. The article discloses a speech understanding system that tests an acoustically recognized input word sequence to determine whether it conforms to prescribed syntactic rules and, for sequences that fail the test, uses semantic analysis to correct words in the acoustically recognized sequence so that the corrected sequence follows the syntactic rules. Since the semantic analysis is heuristic, several syntactically correct candidate sentences may be generated. The candidate sentence that conforms to the syntactic and semantic constraints and is most similar to the acoustically recognized sequence is identified as the input word sequence. Although the identified sequence is selected from a plurality of heuristically formed sequences, other equally likely sequences are ignored.