This invention relates to pattern classification, such as speech recognition in trained systems.
More particularly, the present invention relates to apparatus and method for the rejection of out-of-class inputs.
Pattern classification systems can be used in two typical situations. A closed set situation involves a fixed set of known classes. Given an input, the classifier is required to pick the best choice from this list of classes. Alternatively, a classifier may be used in an open set manner. A general scenario for open set classification is that the classifier is presented with feature vectors from a single class. The classifier then determines whether the features are from a previously known class or an unknown class. An application of the open set problem is out-of-vocabulary rejection in speech recognition. Speech recognition is used as an example herein because it is widely known and used, but it will be recognized by those skilled in the art that the discussion applies equally to virtually all pattern recognition systems. In the speech recognition case, the recognizer has a known vocabulary; typically, the user would prefer that an unknown word be flagged rather than misrecognized.
Using a speech recognition system as a typical example, assume that in a specific instance the system is looking for a “yes” or a “no”. In many different situations, the user might utter some out-of-vocabulary sounds, such as “oh”, “ah”, “er”, or the user might cough or clear his throat. Typically, the speech recognition system looks at components of the utterance, compares them to components of the words it is looking for, i.e., yes and no, and uses a threshold to determine whether the utterance is sufficiently close to one of the words to be positively recognized. Here the problem is two-fold. In many instances the utterance may be so close to a vocabulary word (e.g., “oh” and “no”) that it is misclassified. To add to this problem, in a noisy system or under noisy conditions, much of an utterance may be masked or lost. Thus, while the threshold remains constant, the masking (e.g., noise, closeness of the received word, or signal, etc.) may vary substantially under different operating conditions.
Current speech recognition applications experience significant limitations in customer acceptance due to inadequate rejection of out-of-class input. However, reject options for decision rules are not a new area for pattern recognition. The optimum rejection rule for pattern recognition (Bayes rejection rule) was introduced over 30 years ago in an article by C. K. Chow, entitled “On optimum recognition error and reject tradeoff”, IEEE Trans. Inf. Theory, IT-16, no. 1, pp. 41-46, January 1970. Nevertheless, this work, and extensions to it, assume exact knowledge of the class statistics.
In the case of speech recognition, only an estimate of the class statistics is available to the pattern recognition system. Furthermore, when the input speech is corrupted by noise, resulting in mismatched conditions, the original probability distribution estimates are no longer good approximations to the actual distributions. Thus, the optimal Bayes reject rule in quiet conditions is no longer valid.
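For reference, the Bayes rejection rule discussed above can be sketched as follows. This is a minimal illustration, assuming the true class posteriors are available as a list of probabilities; the function name `chow_decide` and the parameter `reject_threshold` are hypothetical names chosen for this sketch and do not come from the cited article.

```python
from typing import Optional, Sequence


def chow_decide(posteriors: Sequence[float], reject_threshold: float) -> Optional[int]:
    """Sketch of a Chow-style reject rule (names are illustrative assumptions).

    posteriors: class posterior probabilities P(class | input), assumed exact.
    reject_threshold: t in [0, 1]; the input is rejected when the largest
        posterior falls below 1 - t.
    Returns the index of the accepted class, or None for a rejection.
    """
    # Pick the class with the largest posterior probability.
    best = max(range(len(posteriors)), key=lambda i: posteriors[i])
    # Reject when even the best class is not probable enough.
    if posteriors[best] < 1.0 - reject_threshold:
        return None
    return best


confident = chow_decide([0.9, 0.1], reject_threshold=0.2)    # accepted as class 0
ambiguous = chow_decide([0.55, 0.45], reject_threshold=0.2)  # rejected
```

The rule is only as good as the posteriors supplied to it. When those values are estimates obtained in quiet conditions and the input is noisy, the comparison against `1 - reject_threshold` no longer reflects the actual class probabilities.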
Recent work to improve the rejection criterion for out-of-vocabulary words has focused on likelihood ratios between in-class model scores, as well as garbage, or filler, models to model the out-of-class feature space. See for example: C. S. Ramalingam et al., “Speaker-dependent name dialing in a car environment with out-of-vocabulary rejection”, Proc. ICASSP, pp. 165-168, 1999; A. Bayya, “Rejection in speech recognition systems with limited training”, Proc. ICSLP, 1998; H. Bourlard, B. D'hoore and J. M. Boite, “Optimizing recognition and rejection performance in wordspotting systems”, Proc. ICASSP, pp. I-373-I-376, 1994; and R. C. Rose and D. B. Paul, “A hidden Markov model based keyword recognition system”, Proc. ICASSP, pp. 129-132, 1990. However, a score-based threshold is still employed in order to provide a mechanism for trading off recognition and rejection error rates based on some cost function. The use of a threshold leads to significant performance degradation in severely mismatched conditions. This is due to the altered statistics of the input features, resulting in a compression of the score range.
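The effect of a fixed score threshold under mismatched conditions can be illustrated with a small sketch. It assumes a simple log-likelihood ratio between the best in-class model and a single filler model; the function `llr_decide`, its parameters, and all numeric scores are hypothetical values made up for illustration and are not taken from any of the cited systems.

```python
from typing import Dict, Optional


def llr_decide(in_class_loglik: Dict[str, float],
               filler_loglik: float,
               threshold: float) -> Optional[str]:
    """Accept the best-scoring word only if its log-likelihood ratio
    against the filler (garbage) model reaches a fixed threshold."""
    # Best in-vocabulary hypothesis by log-likelihood.
    best_word = max(in_class_loglik, key=in_class_loglik.get)
    # Log-likelihood ratio between the best word model and the filler model.
    llr = in_class_loglik[best_word] - filler_loglik
    return best_word if llr >= threshold else None


# Made-up scores for the same spoken "yes" (assumption, for illustration only).
# Quiet conditions: the scores are well separated (ratio of 12).
quiet = llr_decide({"yes": -40.0, "no": -55.0}, filler_loglik=-52.0, threshold=8.0)
# Noisy conditions: the score range is compressed (ratio of 3), so the same
# fixed threshold now rejects an in-vocabulary word.
noisy = llr_decide({"yes": -61.0, "no": -64.0}, filler_loglik=-64.0, threshold=8.0)
```

In this sketch the threshold of 8.0 is tuned for the quiet scores. Once the scores are compressed, no single fixed value separates in-class from out-of-class inputs as well as it did in quiet conditions, which is the degradation described above.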
Accordingly, it is highly desirable to provide pattern recognition apparatus and a method that overcome these problems. Further, it is highly desirable to provide apparatus and methods that overcome these problems without substantially increasing the computations required or the amount of memory used.