A dialog system is a computer system that is designed to converse with a human through a coherent structure using text, speech, graphics, or other modes of communication on both the input and output channels. Dialog systems that employ speech are referred to as spoken dialog systems and generally represent the most natural type of human-machine interface. A major challenge in designing dialog systems is to ensure that they correctly understand the user's vocal input. At present, no speech recognition system is perfectly accurate, and thus all systems suffer from some degree of error in input recognition. A user often responds to misunderstood input by repeating the utterance that is problematic for the system. Thus, the presence of repeated utterances by a user is a good indication that the dialog system is not operating properly. However, many systems do not detect such repeated utterances reliably enough for corrections to be made properly. This problem can be due to several factors, such as the variability among users, which makes it difficult to detect repeats accurately, or changes in the user's input while repeating a word or phrase. When a spoken dialog system conveys back a misunderstood message, it is not uncommon for a user to repeat the utterance with certain acoustic variations. These variations can degrade recognition performance and create further misunderstanding, thus leading to low user satisfaction.
Present methods of repeat detection in dialog systems often use dynamic time warping (DTW) processes. DTW is a process for measuring the similarity between two sequences that may vary in time or speed. It finds an optimal match between two sequences by warping them non-linearly in the time dimension, which yields a measure of their similarity that is independent of certain non-linear variations in the time dimension. A common application of this method is in present car-navigation systems, which use DTW and N-best hypothesis overlap measures on a location name input task. Such a system takes only the misrecognized parts of the original utterance as the correction utterance, and judges whether the correction utterance is included in the original utterance. For more spontaneous speech, the system can be extended to detect the common parts between the original and correction utterances, where the repeated content may appear at any position within the correction utterance. A significant drawback to this system is that the order of the words or components of the repeated utterance must be the same as in the original utterance. Since users can easily alter the order of words while repeating them to the system, such a limitation can significantly degrade the speech recognition performance of the system.
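As a minimal illustrative sketch of the DTW measure described above (not the implementation used by any particular car-navigation or dialog system), the cumulative alignment cost between two sequences can be computed with dynamic programming. The function name `dtw_distance`, the use of scalar feature values, and the absolute-difference local cost are assumptions made only for illustration; a real system would typically compare frames of acoustic feature vectors.

```python
from math import inf


def dtw_distance(a, b, dist=lambda x, y: abs(x - y)):
    """Return the minimal cumulative alignment cost between sequences a and b.

    Hypothetical helper for illustration: `dist` is an assumed local cost
    function between one element of `a` and one element of `b`.
    """
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = dist(a[i - 1], b[j - 1])
            # Each step may stretch a, stretch b, or advance both sequences.
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] aligned to an already-used element of b
                cost[i][j - 1],      # b[j-1] aligned to an already-used element of a
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]


original = [1, 2, 3]
slower_repeat = [1, 1, 2, 2, 3, 3]   # same content, spoken more slowly
reordered_repeat = [3, 2, 1]         # same content, different order

print(dtw_distance(original, slower_repeat))
print(dtw_distance(original, reordered_repeat))
```

Under these assumptions, the time-stretched repeat aligns with the original at zero cost, whereas the reordered repeat yields a nonzero cost. This reflects the drawback noted above: the warping path can only move forward through both sequences, so a change in the order of the components is penalized even when the content is the same.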
Present repeat detection systems also often require access to the internal components of the speech recognizer. Such access is often inconvenient or unavailable, which further limits the effectiveness of these repeat detection systems.
What is needed, therefore, is a dialog system that reliably detects repetitions in user input without requiring access to the internal information of the speech recognition engine.
What is further needed is a dialog system that detects repeats without requiring that the order of the words within a repeated phrase be the same in the correction utterance and the original utterance.