This description relates to speech-to-speech translation.
Speech-to-speech translation systems generally cascade three stages: (1) speech recognition in the source language, (2) translation of the recognized text into the target language, and (3) text-to-speech synthesis in the target language. It is desirable to catch speech recognition errors before they propagate through the translation and text-to-speech stages. It is also desirable to reduce the delay between an utterance in the source language and its presentation in the target language.
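The following toy sketch shows how a recognition error propagates through the cascade. Each stage consumes only the previous stage's output, so a misrecognized word reaches the listener unchanged. It assumes hypothetical stand-in functions (`recognize`, `translate`, `synthesize`, `cascade`) that are not from this description. The "audio" is already a string, translation is a word-by-word lexicon lookup, and synthesis merely tags the text. Real systems would use trained recognition, translation, and synthesis models.

```python
from typing import Callable, List


def recognize(audio: str) -> str:
    # Hypothetical toy "speech recognizer": the input is already a transcript,
    # and one recognition error is simulated by mishearing "ship" as "sheep".
    return audio.replace("ship", "sheep")


def translate(text: str) -> str:
    # Hypothetical word-by-word English-to-Spanish lookup; unknown words pass through.
    lexicon = {"the": "el", "ship": "barco", "sheep": "oveja", "sailed": "zarpó"}
    return " ".join(lexicon.get(word, word) for word in text.split())


def synthesize(text: str) -> str:
    # Hypothetical toy "text-to-speech": tags the text instead of producing audio.
    return f"<speech>{text}</speech>"


def cascade(audio: str, stages: List[Callable[[str], str]]) -> str:
    # Feed each stage's output into the next stage, in order.
    result = audio
    for stage in stages:
        result = stage(result)
    return result


print(cascade("the ship sailed", [recognize, translate, synthesize]))
```

Running it prints `<speech>el oveja zarpó</speech>`. Translation and synthesis work correctly, yet the listener hears "sheep" rather than "ship" because nothing downstream can detect the recognition error. The output also waits for all three stages to finish in sequence.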