1. Field of the Invention
The present invention relates to a speech translating apparatus, a speech translating method, and a speech translating program product.
2. Description of the Related Art
In recent years, research into elemental technologies including speech recognition, machine translation, and speech synthesis has been progressing. Through combination of speech recognition, machine translation, and speech synthesis, practical application of a speech translation system can be realized. In the speech translation system, when an input of speech in a source language is received, speech in a target language is outputted.
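The combination described above can be sketched, for illustration only, as three stages chained in sequence. The class name `SpeechTranslator`, the three callables, and the toy stand-in functions below are hypothetical and are not taken from any actual system; real recognition, translation, and synthesis components would replace the stand-ins.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SpeechTranslator:
    """Hypothetical chain of the three elemental technologies."""
    recognize: Callable[[bytes], str]    # source-language speech -> source text
    translate: Callable[[str], str]      # source text -> target text
    synthesize: Callable[[str], bytes]   # target text -> target-language speech

    def run(self, source_speech: bytes) -> bytes:
        source_text = self.recognize(source_speech)
        target_text = self.translate(source_text)
        return self.synthesize(target_text)


# Toy stand-ins (assumptions): "speech" is simply UTF-8 encoded text here.
TOY_DICTIONARY = {"hello": "konnichiwa"}


def toy_recognize(speech: bytes) -> str:
    return speech.decode("utf-8")


def toy_translate(text: str) -> str:
    # Word-by-word lookup; unknown words pass through unchanged.
    return " ".join(TOY_DICTIONARY.get(word, word) for word in text.split())


def toy_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


translator = SpeechTranslator(toy_recognize, toy_translate, toy_synthesize)
target_speech = translator.run(b"hello world")
```

Because each stage consumes the output of the preceding stage, an error introduced in an early stage is carried through to the final speech output.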
However, numerous technical problems remain in each elemental technology. Therefore, it is difficult to realize a system accurate enough to always correctly recognize and translate speech uttered by a user.
For example, in speech recognition, measures are required against surrounding noise present in the environment in which speech recognition is performed, a sudden pause in the user's speech, and the like. However, it is difficult to completely eliminate errors caused by the surrounding noise, sudden pauses, and the like. Moreover, a correct result cannot be attained if a text including a speech recognition error, such as the errors described above, is machine-translated.
Furthermore, in machine translation, contextual processing technology for performing discriminative translation depending on context remains underdeveloped. As a result, a correct translation cannot always be made.
Therefore, numerous interfaces used to detect errors in speech recognition, machine translation, and the like have been proposed. When users converse in real time, such interfaces serve an important role in reducing the complicated operations and waiting time that arise because a system is interposed between the users.
For example, the following technology is proposed in JP-A 2000-29492 (KOKAI). A phrase including a recognition error is automatically detected from a text that has been converted from an inputted speech. The detected phrase is presented by text or by speech to the speaker who made the speech. The speaker corrects the error.
Only the erroneous phrase is presented to the speaker who speaks in the source language. Therefore, the work involved in checking the content of the entire speech is reduced. A technology such as this can shorten the time required for checking.
However, in the technology described in JP-A 2000-29492 (KOKAI), the following series of procedures does not change. A source language speaker speaks. A speech recognition result is audibly outputted. A correction speech made by the user is recognized again. Then, a speech is outputted in a target language. Therefore, the time lag between when the source language speaker speaks and when the speech is transmitted to a speaking partner is long.
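The effect of the above series of procedures on the time lag can be illustrated with a minimal sketch. The step durations below are arbitrary placeholders chosen only for illustration, not measured values, and the names `CONFIRM_THEN_TRANSLATE`, `DIRECT`, and `time_lag` are hypothetical.

```python
from typing import List, Tuple

# (step, duration in seconds). All durations are arbitrary placeholders.
CONFIRM_THEN_TRANSLATE: List[Tuple[str, float]] = [
    ("source language speaker speaks", 3.0),
    ("speech recognition result is audibly outputted", 3.0),
    ("speaker makes a correction speech", 2.0),
    ("correction speech is recognized again", 1.0),
    ("speech is outputted in the target language", 3.0),
]

# Hypothetical comparison case with no confirmation steps in between.
DIRECT: List[Tuple[str, float]] = [
    ("source language speaker speaks", 3.0),
    ("speech is outputted in the target language", 3.0),
]


def time_lag(steps: List[Tuple[str, float]]) -> float:
    """Seconds spent between the first step (the speaker's speech)
    and the last step (output to the speaking partner)."""
    return sum(duration for _, duration in steps[1:-1])


lag_with_confirmation = time_lag(CONFIRM_THEN_TRANSLATE)
lag_direct = time_lag(DIRECT)
```

Under these placeholder values, every intermediate confirmation and correction step adds its full duration to the time that elapses before the speaking partner hears anything.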
Furthermore, although automatic error detection is performed, not all erroneous phrases can be automatically detected. As a result, the speech in the target language may be outputted to the speaking partner without the source language speaker noticing the error, thereby causing a misunderstanding between the source language speaker and the speaking partner.