The present invention relates to machine recognition of speech.
Computer recognition of speech has heretofore been a formidable problem. Rather than approach the entire problem, attempts have been made to restrict it to specific areas, thus simplifying the recognition task. Much previous work has addressed isolated word recognition, both speaker-dependent and speaker-independent, and digit recognition. Current systems perform acceptably well for some applications, although recognition is by no means completely reliable, even under highly constrained circumstances.
The problem of Continuous Word Recognition (CWR) is more difficult than isolated word recognition. Greater difficulties are encountered in determining which words are actually spoken, as well as how they should be linked together to form sentences.
Attempts to recognize all spoken sentences without any constraints do not appear to be within the capability of current technology. Prior attempts to narrow the problem area have resulted in the use of grammars or other constraining methods to determine which words are allowed to follow other words. When a word is recognized, the grammar is used to determine the set of words allowed to follow it. A word hypothesizer is then instructed to try to match the following utterance against that set.
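The prior-art feedback loop described above can be illustrated with a minimal sketch. It assumes a hypothetical toy grammar (`GRAMMAR`), hypothetical per-segment match scores, and a hypothetical function `constrained_decode`; none of these are taken from any particular prior system. After each word is chosen, only the grammar's successors of that word are ever compared against the next segment of speech.

```python
# Hypothetical toy grammar: the words allowed to follow each word.
# "<s>" marks the sentence start and "</s>" marks an allowed sentence end.
GRAMMAR = {
    "<s>": {"call", "dial"},
    "call": {"home", "office"},
    "dial": {"home"},
    "home": {"</s>"},
    "office": {"</s>"},
}


def constrained_decode(segments, match):
    """Tightly coupled decoding: grammar feedback restricts the hypothesizer.

    segments: a sequence of speech segments, one per word (hypothetical).
    match(segment, word): hypothetical acoustic match score, higher is better.
    """
    words = []
    prev = "<s>"
    for seg in segments:
        # The grammar dictates which words the hypothesizer may try next.
        allowed = GRAMMAR.get(prev, set()) - {"</s>"}
        if not allowed:
            break  # the grammar permits no further words
        # Only the allowed words are compared against this segment;
        # sorting makes tie-breaking deterministic.
        best = max(sorted(allowed), key=lambda w: match(seg, w))
        words.append(best)
        prev = best
    return words
```

With hypothetical scores such as `[{"call": 0.90, "dial": 0.95}, {"office": 0.90, "home": 0.30}]` and `match = lambda seg, w: seg.get(w, 0.0)`, a narrow early preference for "dial" forces the second word to "home", even though "office" matched that segment far better. The decoder has no way to revisit the first choice.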
Such prior art solutions produce systems that are too tightly constrained in the vocabulary and grammar they can handle, and that have unacceptably high recognition error rates. If an error occurs anywhere during an utterance, it is often difficult or impossible to recover, and the speech sequence must be aborted and started over.
It is therefore an object of the present invention to provide a robust speech recognition system which can handle relatively complex vocabularies and grammars. It is another object of the present invention to provide a speech recognition system which can consider several sentence alternatives in parallel, and select the best one.
Therefore, according to the present invention, a word hypothesizer and a sentence recognizer are provided which are loosely coupled. The word hypothesizer constantly generates word hypotheses based on an incoming speech signal. The sentence recognizer assembles the hypotheses into allowable partial and complete sentences. The output of the word hypothesizer is not restricted by feedback from the sentence recognizer. Measuring the time gaps and time overlaps between words provides additional criteria for selecting among candidate sentences.
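The loose coupling described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the embodiment itself: the `Hypothesis` record, the toy `GRAMMAR`, the linear `GAP_PENALTY` and `OVERLAP_PENALTY` costs, the `beam` width, and the functions `junction_cost` and `recognize` are all hypothetical. The hypothesizer's output is consumed unfiltered. The recognizer keeps several partial sentences in parallel, extends each one only with grammatically allowed words, penalizes the gap or overlap at each word junction, and returns the best-scoring complete sentence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    word: str
    start: float  # seconds
    end: float    # seconds
    score: float  # match score from the word hypothesizer; higher is better


# Hypothetical toy grammar: the words allowed to follow each word.
# "<s>" marks the sentence start and "</s>" marks an allowed sentence end.
GRAMMAR = {
    "<s>": {"call", "dial"},
    "call": {"home", "office"},
    "dial": {"home"},
    "home": {"</s>"},
    "office": {"</s>"},
}

GAP_PENALTY = 2.0      # hypothetical cost per second of silence between words
OVERLAP_PENALTY = 4.0  # hypothetical cost per second that two words overlap


def junction_cost(prev_end, start):
    """Penalize the time gap or time overlap between consecutive words."""
    if prev_end is None:  # first word of the sentence: no predecessor
        return 0.0
    delta = start - prev_end
    return GAP_PENALTY * delta if delta >= 0 else OVERLAP_PENALTY * -delta


def recognize(hypotheses, beam=10):
    """Assemble unfiltered word hypotheses into the best complete sentence.

    Returns (words, score), or None if no complete sentence can be formed.
    """
    root = (0.0, ("<s>",), None)  # (score, words, end time of last word)
    partials = []                  # partial sentences kept in parallel
    complete = []
    # The hypothesizer's output is processed in start-time order, unfiltered.
    for hyp in sorted(hypotheses, key=lambda h: h.start):
        new = []
        for score, words, end in [root] + partials:
            if hyp.word not in GRAMMAR.get(words[-1], set()):
                continue  # the grammar does not allow this word here
            if end is not None and hyp.end <= end:
                continue  # the word would not advance in time
            cand = (score + hyp.score - junction_cost(end, hyp.start),
                    words + (hyp.word,), hyp.end)
            new.append(cand)
            if "</s>" in GRAMMAR.get(hyp.word, set()):
                complete.append(cand)  # a grammatical complete sentence
        # Keep only the best `beam` partial sentences (the root is always kept).
        partials = sorted(partials + new, key=lambda p: p[0], reverse=True)[:beam]
    if not complete:
        return None
    best = max(complete, key=lambda p: p[0])
    return list(best[1][1:]), best[0]
```

For example, given overlapping hypotheses "call" (0.00 to 0.40 s, score 5.0), "dial" (0.02 to 0.38 s, score 4.5), "home" (0.45 to 0.90 s, score 4.0) and "office" (0.60 to 1.10 s, score 3.0), both "call home" and "dial home" are carried forward as alternatives, and "call home" is selected. Because no word is ever discarded on the basis of an earlier choice, a weak early decision does not block recovery later in the utterance.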
The novel features which characterize the present invention are defined by the claims. The foregoing and other objects and advantages of the present invention will hereafter appear, and for purposes of illustration, but not of limitation, a preferred embodiment is shown in the accompanying drawings.