Our invention relates to pattern recognition arrangements and, more particularly, to automatic continuous speech recognition systems incorporating syntactic analysis.
In communication, data processing and control systems, it is often desirable to use speech as a direct input for inquiries, commands, data or other information. Speech input arrangements may be utilized to record information, to request information from processing equipment, or to control machine tools or other apparatus. Because of the variability of the speech signal from speaker to speaker and the variability for even a particular speaker, the degree of accuracy of speech recognition has been limited.
One type of previously known speech recognition system receives an input speech signal and transforms the speech signal into a set of prescribed acoustic features. The set of features is compared to stored sets of previously obtained reference features for possible words to be recognized. When the prescribed features of the input speech signal correspond to a particular set of reference features in accordance with predetermined criteria, the input speech signal is identified as the word associated with the corresponding set of reference features. It is readily seen that the reliability of the recognition system is highly dependent on the selected features and on the prescribed recognition criteria. Where the reference features and the features of the input speech signal are obtained from the same speaker and the word to be recognized is spoken in isolation, the recognition system is relatively simple and its accuracy is improved.
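The feature comparison described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each word is represented by a fixed-length feature vector, that correspondence is measured by Euclidean distance, and that the predetermined criterion is a distance threshold; the names `recognize_word`, `references` and `max_distance` are hypothetical and do not come from any cited system.

```python
import math

def recognize_word(features, references, max_distance):
    """Return the reference word closest to `features`, or None if no
    reference lies within `max_distance` (the assumed recognition criterion)."""
    best_word, best_dist = None, math.inf
    for word, ref_features in references.items():
        # Euclidean distance stands in for the acoustic correspondence measure.
        dist = math.dist(features, ref_features)
        if dist < best_dist:
            best_word, best_dist = word, dist
    return best_word if best_dist <= max_distance else None

# Hypothetical reference features, e.g. obtained earlier from the same speaker.
references = {
    "fare": [1.0, 0.0, 2.0],
    "time": [0.0, 3.0, 1.0],
}

print(recognize_word([0.9, 0.1, 2.1], references, max_distance=0.5))
```

Both the choice of features and the threshold directly determine which inputs are accepted, which is why reliability depends so heavily on them.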
The accuracy of recognition of a series of spoken words can be further improved by resorting to various non-acoustic sources of information such as syntax or semantics. The non-acoustic information sources are used to detect and correct errors in the acoustical recognition of single words on the basis of prescribed rules governing the relationship among the acoustically recognized words in the series. For example, a series of acoustically recognized words may be compared to each of a set of previously stored allowable sequences of reference words. In this manner, impermissible sequences can be discarded and permissible sequences similar to the combination of acoustically recognized words can be detected. Such an arrangement requires an exhaustive search of all syntactically or semantically allowable sequences. It is known, however, that even a limited series of words results in a large set of allowable sequences and that the number of allowable sequences increases exponentially with the number of words in the series. Therefore, an exhaustive search through the store of all allowable sequences of reference words to find the allowable sequence with the closest correspondence to the series of acoustically recognized words is impractical.
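The cost of such an exhaustive search can be seen in a small sketch. It assumes, for illustration only, that correspondence between two sequences is the number of word positions at which they differ; the function name `closest_allowable` and the example sequences are hypothetical.

```python
def closest_allowable(recognized, allowable):
    """Exhaustively compare `recognized` with every stored allowable sequence
    and return the one with the fewest mismatched word positions."""
    def mismatches(candidate):
        return sum(a != b for a, b in zip(candidate, recognized))

    # Only sequences of the same length are considered in this sketch.
    candidates = [seq for seq in allowable if len(seq) == len(recognized)]
    return min(candidates, key=mismatches, default=None)

allowable = [
    ("what", "is", "the", "fare"),
    ("what", "is", "the", "time"),
]
# An acoustic error ("a" for "the") is corrected by the stored sequences.
print(closest_allowable(("what", "is", "a", "fare"), allowable))

# Upper bound on sequences to store and search for a vocabulary of v words
# and a series of n words; it grows exponentially in n.
v = 100
for n in (2, 4, 8):
    print(n, v ** n)
```

Every stored sequence is visited once per utterance, so the search time grows with the size of the store, which is the impracticality noted above.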
In U.S. Pat. No. 4,156,860, issued to S. E. Levinson on May 29, 1979, and assigned to the same assignee, a syntactic analyzer is described in which a series of spoken words is recognized as one of a plurality of predetermined sentences. A state sequence array defines the predetermined sentences in terms of state linked prescribed words. Each sentence corresponds to a selected plurality of state connected prescribed words ending in a final state. For each word position of the input series, a set of signals representative of the acoustic correspondence of the input series position word and the array prescribed words is generated.
A cumulative correspondence signal is produced for each sequence from the series position correspondence signals responsive to the state sequence array. Upon termination of the last word position of the input spoken word series, the sentence in its final state having the closest cumulative correspondence to the spoken word series is identified. This syntactic analyzer is adapted to recognize sentences or phrases when each word of the input series is spoken in isolation. There are many uses for speech recognizers, however, where the input utterances are not a series of isolated words but are continuous speech patterns with coarticulation. In such applications the utterance to be recognized must be segmented into separate words for syntactic analysis to be performed.
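The state sequence analysis of isolated-word input described above can be sketched in software as follows. This is an illustrative reading, not the circuitry of the cited patent: it assumes that the state sequence array is a list of hypothetical `(from_state, word, to_state)` entries, that each word position supplies a distance per prescribed word (smaller meaning closer correspondence), and that the starting state is numbered 1.

```python
def best_sentence(transitions, final_states, position_distances, start_state=1):
    """Accumulate correspondence along the state sequence array and return
    (cumulative distance, words) for the best sentence ending in a final state."""
    # best[state] = (cumulative distance, words so far) for the best path to state
    best = {start_state: (0.0, [])}
    for distances in position_distances:      # one dict per input word position
        reached = {}
        for src, word, dst in transitions:
            if src not in best or word not in distances:
                continue
            total = best[src][0] + distances[word]
            # Keep only the closest path into each state.
            if dst not in reached or total < reached[dst][0]:
                reached[dst] = (total, best[src][1] + [word])
        best = reached
    finals = [best[s] for s in final_states if s in best]
    return min(finals, key=lambda entry: entry[0]) if finals else None

# Hypothetical array covering "what is the fare" and "what is the time".
transitions = [(1, "what", 2), (2, "is", 3), (3, "the", 4),
               (4, "fare", 5), (4, "time", 5)]
position_distances = [{"what": 0.1}, {"is": 0.2}, {"the": 0.3},
                      {"fare": 0.6, "time": 0.4}]
print(best_sentence(transitions, {5}, position_distances))
```

Because only the best path into each state is retained at each word position, the work grows with the number of array entries rather than with the number of allowable sentences. The sketch presupposes that word boundaries are already known, which is exactly what continuous speech does not provide.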
Prior art U.S. Pat. Nos. 3,816,722, 4,049,913 and 4,059,725 disclose arrangements for automatic recognition of continuous speech in which similarity measures between stored reference word patterns and patterns of an unknown utterance are calculated to select a reference pattern corresponding to the utterance. Both segmentation and word selection are made on the basis of the reference word pattern with the greatest similarity to the partial pattern for the input utterance. Arrangements based solely on similarity measures are useful in systems receiving spoken digit series where any order of digits is possible and there are no syntactic constraints.
It is often required, however, to recognize an utterance as one of a set of sentences which do have syntactic restrictions. In airline reservation systems, for example, a typical request may be "What is the fare". The syntactic constraints, i.e., the arrangement of predetermined words in the sentence, inherent in this type of request can substantially aid in the recognition of an utterance. Prior art systems which segment and recognize solely on the basis of similarity are not adapted to utilize syntactic analysis in choosing the sentence corresponding to an input utterance. It is an object of the invention to provide improved recognition of continuous speech wherein syntactic arrangements are utilized.