This invention relates to active error detection and resolution for interactive linguistic translation, and more particularly to speech-to-speech translation.
Most research on speech-to-speech (S2S) translation systems has focused on maximizing the performance of the constituent automatic speech recognition (ASR), machine translation (MT), and text-to-speech (TTS) components, in order to improve the success rate of cross-lingual information transfer. Comparatively little effort has been invested in approaches that detect errors made by these components and resolve those errors interactively, with the goal of improving translation and concept-transfer accuracy.
Previous work presented by Stallard et al., “Recent Improvements and Performance Analysis of ASR and MT in a Speech-To-Speech Translation System,” Proc. ICASSP 2008, pp. 4973-4976, included a methodology for assessing the severity of various types of errors in BBN's English/Iraqi S2S system. These error types can be broadly categorized as (1) out-of-vocabulary concepts; (2) sense ambiguities due to homographs; and (3) ASR errors caused by mispronunciations, homophones, etc. Several approaches, including implicit confirmation of ASR output with barge-in, and back-translation, have been explored to prevent such errors from causing communication failures or stalling the conversation, for example, as described in U.S. Pat. No. 8,515,749, titled “Speech-to-Speech Translation”.
However, previous approaches place the burden of error detection, localization, and recovery on the user, who must infer the likely cause of an error and determine an alternate way to convey the same concept. This may require substantial technical knowledge of how S2S systems work, which can be impractical for the broad population of users.