Speech translation systems combine recognition of speech with translation from one language (“source language”) to another language (“target language”), followed by optional synthesis or text output in the target language. Developing such systems requires high-performance speech recognition systems and translation systems, which in turn require substantial data resources, given how the recognition and translation engines are trained or developed. Thousands of spoken sentences have to be transcribed, and thousands of sentences in one language have to be translated into another. Moreover, data collection has to be redone for each new language and, when necessary, for different domains and genres.
Thus, there is a need for methods and apparatuses that allow speech translation systems to be trained or to “learn” from examples provided by human simultaneous translators. There is a further need for methods and apparatuses in which speech data in both source and target languages are presented and in which speech and translation engines iteratively learn together, thereby forgoing the labor-intensive and costly steps of annotating speech data and translating texts, and of first training and optimizing the speech recognition and translation engines independently before system combination is attempted. Also, there is a need for a field-correctable translation system in which a person in the field of use can correct errors made by the system so that the system will adapt. There is a further need for a translation system that is adept at translating languages for which there is not a large written corpus.