Speech recognition is the process by which computers interpret acoustic patterns in human speech. Generally, there are two types of speech recognition. The first type is often called “dictation speech recognition.” With dictation speech recognition, a user's speech can include a continuous stream of spoken words that are each recognized and converted to text. Dictation speech recognition is often used for applications such as medical transcription, legal and business dictation, and general word processing.
The second type of speech recognition is commonly called “command and control speech recognition.” Command and control speech recognition systems are often integrated into larger systems such as personal computers, mobile phones, call-routing systems, or interactive data retrieval systems such as travel information systems. Generally, such systems perform speech recognition in order to cause the larger system to perform functions and actions in response to the user's speech. Thus, recognizing the user's utterance “Open Netscape” or “Call Bill Gates” would cause the larger system to carry out the spoken command.
For such applications, it is important for speech recognition systems to reliably detect and reject Out-Of-Vocabulary (OOV) words and misrecognized in-vocabulary words. An OOV word is a word spoken by the user that is not in the list of words, lexicon, or vocabulary that the system can recognize. A misrecognized word is an in-vocabulary word, that is, a word within the vocabulary, that is recognized incorrectly. Misrecognition can be caused, for example, by background noise or by a user's speaking style or accent.
Generally, command and control speech recognition systems search the user's utterance to identify or select the words that are most likely to be specific command and control words. However, OOV or misrecognized words can cause the speech recognition system to output an erroneous command, which in turn causes the larger system to perform actions and functions the user did not intend. Thus, it is important for speech recognition systems to reliably detect OOV words and misrecognized words so that they can be rejected.
Various confidence measures have been proposed to measure recognition reliability. Generally, a confidence measure is an estimate of the probability that a word has been recognized correctly. Often, a word is accepted as recognized only when its confidence measure exceeds a particular threshold probability. Confidence measures and thresholds are thus intended to increase the reliability of the speech recognition system.
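As an illustration only, threshold-based rejection can be sketched as follows. The `Hypothesis` structure, the `accept` function, and the default threshold of 0.5 are hypothetical choices for this sketch; real systems tune the threshold per application and compute confidence in many different ways.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    word: str          # recognized word (hypothetical structure)
    confidence: float  # estimated probability that the word is correct, in [0, 1]


def accept(hypotheses, threshold=0.5):
    """Split hypotheses into accepted and rejected lists.

    A word is accepted only when its confidence strictly exceeds the
    threshold; everything else is rejected. The 0.5 default is an
    assumed value for illustration.
    """
    accepted = [h for h in hypotheses if h.confidence > threshold]
    rejected = [h for h in hypotheses if h.confidence <= threshold]
    return accepted, rejected


accepted, rejected = accept([Hypothesis("open", 0.92), Hypothesis("netscape", 0.31)])
print([h.word for h in accepted], [h.word for h in rejected])
```

Raising the threshold rejects more OOV and misrecognized words at the cost of also rejecting more correctly recognized ones.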
Some methods of measuring confidence rely on posterior probability. Posterior probability can be viewed as a revised probability obtained by updating a prior probability after new information, here the observed speech, has been received. Word-graph-based and analogous methods are often used to estimate posterior probability. However, for some context-free grammar (CFG)-based applications, such as command and control speech recognition, the word graph generated by the speech decoder can be too sparse for reliable posterior probability computation.
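For reference, the standard Bayesian form of the posterior of a word sequence W given acoustic observations X is shown below. The symbols are introduced here for illustration and are not taken from the original text. Word-graph methods approximate the sum in the denominator by summing over the paths in the graph, and obtain the posterior of an individual word by summing over the paths that contain it.

```latex
P(W \mid X) = \frac{p(X \mid W)\, P(W)}{\sum_{W'} p(X \mid W')\, P(W')}
```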
With a sparse word graph, the best or most probable path can dominate the few competing paths, producing an artificially high posterior probability estimate regardless of whether the recognition is correct. This artificially high posterior probability can allow OOV and misrecognized words to escape detection and rejection.
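The effect of graph sparsity can be illustrated with a simplified sketch. It assumes, hypothetically, that the word graph is represented as an explicit list of complete paths, each with a combined acoustic and language-model log score. It also ignores word timing, which real lattice-based methods account for. The function name `word_posterior` and the scores are made up for illustration.

```python
import math


def word_posterior(paths, word):
    """Approximate the posterior of `word` from a list of complete paths.

    paths: list of (word_sequence, log_score) pairs, a hypothetical
    stand-in for a decoder's word graph. The posterior is the probability
    mass of paths containing `word`, normalized by the mass of all paths.
    Word timing is ignored in this simplification.
    """
    # Subtract the maximum log score before exponentiating (log-sum-exp trick)
    # so that large negative scores do not underflow to zero.
    m = max(score for _, score in paths)
    total = sum(math.exp(score - m) for _, score in paths)
    hit = sum(math.exp(score - m) for seq, score in paths if word in seq)
    return hit / total


# Sparse graph: only one path survives, so its words get posterior 1.0
# whether or not they were recognized correctly.
sparse = [(("call", "bill", "gates"), -120.0)]

# Denser graph: close competitors share the probability mass.
dense = [
    (("call", "bill", "gates"), -120.0),
    (("call", "phil", "gates"), -120.5),
    (("call", "will", "gates"), -121.0),
    (("call", "jill", "gates"), -121.2),
]

print(word_posterior(sparse, "bill"))  # 1.0
print(word_posterior(dense, "bill"))   # roughly 0.44
```

The same top path receives a much lower posterior once plausible competitors are present. This is why a sparse graph can make an incorrect word look highly reliable.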
The discussion above is merely provided for general background information and is not intended to be used as an aid in determining the scope of the claimed subject matter.