1. Field of the Invention
The present invention is related to language recognition methods for understanding language input from a user, and more particularly, to a method and apparatus for improving language recognition performance and accuracy by resolving ambiguities in the language input at a stage intermediate to the recognition process.
2. Background Description
Existing language recognition decoders such as automatic speech recognition (ASR) systems, automatic handwriting recognition (AHR) systems and machine translation (MT) systems must deal with large numbers of decoding alternatives in their particular decoding process. Examples of these decoding alternatives include candidate word lists and N-best lists, e.g., for ASR. Because the number of decoding alternatives may be so large, decoding errors occur frequently.
There are several approaches to minimizing such errors. Typically, in these approaches, users are allowed to correct errors only after the decoder has produced an output. Unfortunately, these approaches still result in too many decoding errors and a cumbersome process, i.e., requiring users to correct all of the errors which, ideally, would be caught by the system.
Other approaches use ASR in a voice response telephone system, e.g., to make appointments or place orders. In such a voice response system, after the user speaks, the system repeats its understanding and provides the user with an opportunity to verify whether the system has recognized the utterance correctly. This may require several iterations to reach the correct result.
So, a recognition system can misrecognize the phrase “meet at seven” having a temporal sense as being “meet at Heaven” which may have a positional sense, e.g., as the name of a restaurant. Unfortunately, using these prior art systems requires the user to do more than just indicate that the recognition is incorrect. Otherwise, the recognition system still has not been informed of the correct response. In order to improve its recognition capability, the system must be informed of the correct response. Further, repeating the recognition decoding or querying other alternative responses increases user interaction time and inconvenience.
Thus, there is a need for language response systems with improved recognition accuracy.
It is a purpose of the invention to provide a method and system for improving language decoding performance and accuracy.
It is another purpose of the invention to resolve language decoding ambiguities during voice recognition, thereby improving language decoding performance and accuracy.
The present invention is a method of language recognition wherein decoding ambiguities are identified and at least partially resolved intermediate to the language decoding procedures. The user is questioned about these identified decoding ambiguities as they are being decoded. These identified decoding ambiguities are resolved early to reduce the subsequent number of final decoding alternatives. This early ambiguity resolution significantly reduces both decoding time and the number of questions that the user may have to answer for correct system recognition. In the preferred embodiment of a speech recognition system, there are two language decoding levels: fast match and detailed match. During the fast match decoding level, a comparatively large potential candidate list is generated very quickly. Then, during the more comprehensive (and slower) detailed match decoding level, the fast match candidate list is applied to the ambiguity to reduce the potential selections for final recognition. During the detailed match decoding level, a unique candidate is selected for decoding. In one embodiment, decoding is interactive and, as each ambiguity is encountered, recognition is suspended to present questions to the user that will discriminate between potential response classes. Thus, recognition performance and accuracy are improved by interrupting recognition, intermediate to the decoding process, and allowing the user to select appropriate response classes to narrow the number of final decoding alternatives.
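By way of illustration only, the interactive flow described above might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the disclosed implementation: the function names (group_by_class, resolve_and_decode), the class labels ("time", "location"), the numeric scores, and the use of a highest-score selection as a stand-in for the detailed match are all hypothetical. The sketch groups a fast match candidate list by response class, puts one question to the user when more than one class is present, and selects a single candidate from the class the user chooses.

```python
from collections import defaultdict


def group_by_class(candidates):
    """Group (word, response_class, score) tuples by response class."""
    classes = defaultdict(list)
    for word, response_class, score in candidates:
        classes[response_class].append((word, score))
    return dict(classes)


def resolve_and_decode(candidates, ask_user):
    """Narrow a fast match candidate list with one question, then pick one word.

    ask_user is a hypothetical callback that receives the list of competing
    response classes and returns the one the user selects.
    """
    if not candidates:
        raise ValueError("fast match produced no candidates")
    classes = group_by_class(candidates)
    if len(classes) > 1:
        # Ambiguity found: suspend decoding and let the user pick a class.
        chosen = ask_user(sorted(classes))
    else:
        chosen = next(iter(classes))
    survivors = classes[chosen]
    # Stand-in for the detailed match: keep the highest-scoring survivor.
    best_word, _ = max(survivors, key=lambda pair: pair[1])
    return best_word


# Hypothetical fast match output for the final word of "meet at seven".
fast_match_candidates = [
    ("Heaven", "location", 0.52),
    ("seven", "time", 0.48),
    ("eleven", "time", 0.30),
]

# The user answers that a time was meant, so "Heaven" is discarded early.
decoded = resolve_and_decode(fast_match_candidates, ask_user=lambda options: "time")
print(decoded)
```

In this sketch, the top-scoring candidate overall is "Heaven", so a decoder that did not question the user would select it. A single class-level answer removes it before the final selection, which is the narrowing effect the summary attributes to early ambiguity resolution.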