Systems for speech recognition have been developed to recognize and understand an utterance of a word or words in a sequence. The recognition of an utterance comprises analyzing the sequential phonetic parts of the utterance and generating probabilities that spoken words are found in a sequence of words. Other systems add a Rejection Grammar, with its own vocabulary of words, alongside the Main Grammar. The highest probability is developed for the utterance being in the Main Grammar and for the utterance being in the Rejection Grammar, and the higher of these two probabilities determines whether the speech recognizer accepts or rejects the utterance.
FIG. 1 is a block diagram/flow chart for a Main Grammar system. Each path 2 starts from a first node 4 and ends at a second stop node 6. Each path consists of a word or a sequence of words. For a given utterance, the speech recognition (SR) system will map the utterance onto every path in the Main Grammar and compute a probability score for every path. The word sequence in the Main Grammar that has the highest score is accepted by the SR system as the "correct" one, and the appropriate/sensible response is output as determined by the system design.
However, a limitation of a system using only a Main Grammar as in FIG. 1 is illustrated by the following example. A question is posed, "Are you past your twentieth birthday?" The only utterances that are acceptable answers are "yes" and "no," which are the only two word paths of the Main Grammar. If a "yes" or a "no" is returned, the system will compute the score for each path, and the score for the matching path will be high. The system will determine that the response was in the Main Grammar and was thus "understood" by the system. The system can then make a sensible response depending on the specific application. For example, a line on a form can be completed by the system. However, if some other utterance is received, e.g. "good-bye," the system will compute the probability scores that the "good-bye" response was a "yes" or a "no." The system will accept the higher of the two scores, and an error will occur.
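The failure described above can be sketched in a few lines. This is only an illustration, not the recognizer of FIG. 1: the function name `recognize_main_only` and all score values are hypothetical, and a real system would derive the scores from acoustic models rather than from a hand-written table.

```python
def recognize_main_only(path_scores):
    """Return the Main Grammar path with the highest score.

    With no Rejection Grammar there is no way to answer "none of these":
    some path is always accepted, however poorly it matches.
    """
    return max(path_scores, key=path_scores.get)


# Hypothetical scores for the two Main Grammar paths (assumed values).
scores_for_yes = {"yes": 0.92, "no": 0.03}      # speaker said "yes"
scores_for_goodbye = {"yes": 0.04, "no": 0.07}  # speaker said "good-bye"

print(recognize_main_only(scores_for_yes))      # yes
print(recognize_main_only(scores_for_goodbye))  # no (an erroneous acceptance)
```

Both scores for "good-bye" are low, but the higher one still wins, which is the error the example describes.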
MAIN GRAMMARS:
Techniques for recognizing words in the Main Grammar using triphoneme probabilistic models have been developed as described in the references below. The present invention relates only to Rejection Grammars and will operate with substantially any Main Grammar. One prior art speech recognition approach (Main Grammar) is described in a paper, State of the Art in Continuous Speech Recognition, by John Makhoul and Richard Schwartz, published in the Proceedings of the National Academy of Sciences on Feb. 8-9, 1993. This paper is incorporated herein by reference as if set out in full. The words are recognized by use of the triphoneme model using Hidden Markov Models and algorithms such as the Viterbi algorithm. The grammars used to recognize the speech can become complex as the size of the vocabulary and the length of the utterance being analyzed increase.
WORD REJECTION GRAMMARS
In order to handle out-of-grammar responses, researchers have developed a "Rejection Grammar" theory, with the Rejection Grammar formed in parallel to the Main Grammar as shown in FIG. 2. The Main Grammar 8 remains as in FIG. 1. When an utterance is received, the system will compute the highest scoring path of all the paths in the Main Grammar as well as the highest scoring path of all the paths in the Rejection Grammar. If the highest score is found for a path in the Rejection Grammar (higher than any score found for the Main Grammar), the utterance is rejected as "out of grammar." If the highest score is found for a path in the Main Grammar, the utterance is accepted as being in the grammar and the SR system so reports. The system will then respond, as designed, to that "understood" utterance.
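The accept/reject decision of FIG. 2 can be sketched as a comparison of the best score in each grammar. The function name `accept_or_reject`, the return format, and the score values below are assumptions made for illustration only.

```python
def accept_or_reject(main_scores, rejection_scores):
    """Compare the best Main Grammar path with the best Rejection Grammar path.

    The utterance is rejected only when a Rejection Grammar path scores
    strictly higher than every Main Grammar path.
    """
    best_main_path = max(main_scores, key=main_scores.get)
    best_main = main_scores[best_main_path]
    best_rejection = max(rejection_scores.values())
    if best_rejection > best_main:
        return ("reject", None)
    return ("accept", best_main_path)


# Hypothetical scores: the speaker said "good-bye".
print(accept_or_reject({"yes": 0.04, "no": 0.07},
                       {"good-bye": 0.81, "maybe": 0.10}))
# ('reject', None)

# Hypothetical scores: the speaker said "yes".
print(accept_or_reject({"yes": 0.92, "no": 0.03},
                       {"good-bye": 0.02, "maybe": 0.05}))
# ('accept', 'yes')
```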
A practical implementation of a Rejection Grammar is shown in FIG. 3, which is a Select Word Rejection Grammar. In this system, there is a vocabulary of words in a Main Grammar, and the vocabulary used in the Select Word Rejection Grammar comprises all the words in the Main Grammar vocabulary plus some number (possibly large) of additional words. As shown in FIG. 3A, the Select Word Rejection Grammar provides a back arc 12, and a weighting factor 15 path to a Begin node 14, that are improvements over the system of FIG. 2. The loop back 12 acts to process the sequence of words in the utterance and to calculate probabilities over multiple parallel word paths representing all the possible word combinations in the Rejection Grammar.
For example, with respect to FIG. 3, WORD 1, WORD 2 . . . WORD N are word paths which run from the BEGIN node 14 to the LAST node 16. These are the first group of word paths used. However, because there is a BACK ARC 12 from the LAST node 16 back to the BEGIN node 14, the same word paths, WORD 1, WORD 2 . . . WORD N, are used again and again. The actual operation of the Rejection Grammar with the BACK ARC supports all combinations of the possible word paths as follows, first showing the single word paths and then the combinations obtained by using the BACK ARC once and the word paths twice:
(WORD 1)
(WORD 2)
. . .
(WORD N)
(WORD 1)(WORD 1)
(WORD 1)(WORD 2)
. . .
(WORD N)(WORD 1)
(WORD N)(WORD 2)
. . .
(WORD N)(WORD N)
Continued operation with the BACK ARC 12 produces:
(WORD 1)(WORD 1)(WORD 1)
(WORD 1)(WORD 1)(WORD 2)
. . .
(WORD N) . . . (WORD N)
The operation of the BACK ARC 12 word loop yields all the combinations of all the words in the Rejection Grammar vocabulary as possible word sequence paths.
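The set of paths opened up by the BACK ARC can be enumerated with a short sketch. The function name `back_arc_sequences` and the cap `max_words` are hypothetical; the grammar of FIG. 3 places no limit on the number of traversals, so the cap exists only to keep the enumeration finite.

```python
from itertools import product


def back_arc_sequences(vocabulary, max_words):
    """Yield every word sequence of 1 to max_words words.

    A sequence of k words corresponds to following the BACK ARC k - 1
    times, so a vocabulary of N words gives N**k sequences of length k.
    """
    for length in range(1, max_words + 1):
        for sequence in product(vocabulary, repeat=length):
            yield sequence


vocabulary = ["WORD 1", "WORD 2"]  # N = 2, for illustration
for sequence in back_arc_sequences(vocabulary, max_words=2):
    print("".join(f"({word})" for word in sequence))
# (WORD 1)
# (WORD 2)
# (WORD 1)(WORD 1)
# (WORD 1)(WORD 2)
# (WORD 2)(WORD 1)
# (WORD 2)(WORD 2)
```

The number of paths grows as N + N^2 + . . . + N^k, which is why a recognizer would score them through the looped grammar rather than list them explicitly.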
The Select Word Rejection Grammar is built using all the words from the Main Grammar plus some number of added "words" (which may be real words or simply non-word sounds). Of course, the words and the sounds are digitized representations suitable for processing via computer systems. All the words in the language should be included in the Rejection Grammar so that spoken words outside the Main Grammar, when analyzed in the Rejection Grammar, will have a high probability of being rejected. However, if the same recognition techniques are used in the Main and Rejection Grammars, an actual utterance analyzed in the parallel paths of the Main and the Rejection Grammars may produce the same probability score in both Grammars. To avoid this occurrence, the weighting factor 15 is set to less than one.
In order to make the speech recognition more accurate by reducing the false rejection rate, the additional "words" in the Rejection Grammar word set (those words over and above the words in the vocabulary of the Main Grammar) are constructed as follows:
1) include a specific word set; and
2) remove any word from the specific word set that has a similar sound to any word in the Main Grammar.
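The two construction steps above can be sketched as a filter over a candidate word list. The text does not say how "similar sound" is measured, so this sketch substitutes spelling similarity from Python's `difflib` as a crude stand-in; a real system would compare phonetic representations. The function names, the 0.6 threshold, and the word lists are all assumptions.

```python
from difflib import SequenceMatcher


def sounds_similar(word_a, word_b, threshold=0.6):
    """Crude stand-in for acoustic similarity (ASSUMPTION: spelling-based)."""
    return SequenceMatcher(None, word_a, word_b).ratio() >= threshold


def build_rejection_vocabulary(main_words, candidate_words, threshold=0.6):
    """Main Grammar words plus the candidates that do not resemble any of them."""
    kept = [
        candidate
        for candidate in candidate_words
        if not any(sounds_similar(candidate, main, threshold) for main in main_words)
    ]
    return list(main_words) + kept


main_words = ["yes", "no"]
candidate_words = ["yet", "goodbye", "know", "maybe"]  # hypothetical word set
print(build_rejection_vocabulary(main_words, candidate_words))
# ['yes', 'no', 'goodbye', 'maybe']
```

Here "yet" and "know" are dropped because they resemble "yes" and "no"; leaving them in would let a legal answer score well on a rejection-only path and be falsely rejected.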
Still referring to FIG. 3, a weighting factor 15 of less than one reduces the calculated probabilities for the Rejection Grammar. This factor is known in the art as "Rejection Sensitivity." Rejection Sensitivity is a tradeoff between two main types of recognition errors: false acceptance (false positive), where an illegal utterance is accepted as a legal utterance; and false rejection (false negative), where a legal utterance is rejected as an illegal utterance. If the Rejection Sensitivity is lowered, there will be fewer false negatives but more false positives. The designer can use the Rejection Sensitivity to skew the relative number of false positives and false negatives towards one or the other in specific applications. A discussion of the tradeoff and the implementation is found in Addendum 1 to this application, HARK Recognizer Reference Manual, Chapter 4: Rejection, release 2.0, BBN Corp. It should be noted that a basic aim of all SR systems is to reduce both false accept and false reject recognition errors in order to maximize accuracy.
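The tradeoff can be made concrete with a small sketch that applies a weighting factor to the best Rejection Grammar score and counts both error types. It assumes the scores are plain probabilities scaled by multiplication; the function names and the four sample utterances are hypothetical and are not taken from the HARK manual.

```python
def decide_weighted(best_main, best_rejection, weight):
    """Accept unless the weighted rejection score beats the Main Grammar score."""
    return "accept" if best_main >= weight * best_rejection else "reject"


def error_counts(samples, weight):
    """Return (false_accepts, false_rejects) for a given weighting factor."""
    false_accepts = false_rejects = 0
    for best_main, best_rejection, is_legal in samples:
        decision = decide_weighted(best_main, best_rejection, weight)
        if decision == "accept" and not is_legal:
            false_accepts += 1
        elif decision == "reject" and is_legal:
            false_rejects += 1
    return false_accepts, false_rejects


# (best Main score, best Rejection score, utterance is legal) -- assumed values
samples = [
    (0.90, 0.90, True),   # legal word scoring the same in both grammars
    (0.60, 0.70, True),   # legal word, rejection path slightly ahead
    (0.30, 0.80, False),  # clearly out of grammar
    (0.50, 0.60, False),  # out of grammar, but close to a legal word
]

print(error_counts(samples, weight=1.0))  # (0, 1)
print(error_counts(samples, weight=0.8))  # (1, 0)
```

With these assumed numbers, lowering the factor from 1.0 to 0.8 removes the false rejection but introduces a false acceptance, which is the direction of the tradeoff described above.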
The above prior art systems, utilizing a Select Word Rejection Grammar, are limited by the need for large memories when large vocabularies are used. The total size of a grammar including a Main and a Rejection Grammar will be at least two times the size of a grammar using only a Main Grammar. This large size limits the use of prior art word Rejection Grammars, and it can also increase the cost and decrease the speed of the SR systems. Moreover, as described and recognized in the prior art, Select Word Rejection Grammars have been found to be unsuited for applications with large vocabularies of nearly all the words in any given language.
It is an object of the present invention to provide a practical speech recognition system with rejection functionality applicable to vocabularies of any size.
It is an object of the present invention to provide a speech recognition system based on a Main Grammar and a Rejection Grammar wherein the Rejection Grammar Vocabulary remains substantially the same size regardless of the size of the Main Grammar vocabulary.
Another object of the present invention is to improve the accuracy, speed, cost, and size of vocabularies of speech recognition systems with rejection functionality.
It is another object of the present invention to provide a speech recognition system with rejection functionality for use with nearly all the words in a language.