1. Field of the Invention
The present invention relates to a method, an apparatus, and a computer program product for machine-translating a first language speech into a second language speech.
2. Description of the Related Art
A human interface technology using a speech input has been put to practical use in recent years. As an example of the speech-based human interface technology, a voice-activated operating system is operated by a user's speech. Upon receiving a speech input instructing a predetermined command that is set by the user in advance, the voice-activated operating system recognizes the speech input and executes the command. Another example of the speech-based human interface technology is a system that analyzes a user's speech and converts the speech into a character string, to create a document.
Moreover, a speech translation system translates speech in a first language into speech in a second language and outputs the speech in the second language, thereby supporting communication among people who speak different languages. Furthermore, a speech interaction system makes it possible for a user to interact with the system in spoken language.
In the systems described above, a speech recognition technique is used in which a speech signal included in a user's speech is converted into a digital signal, and the digital signal is compared with predetermined patterns so that the content of the speech is recognized as a source text. In the speech recognition technique, to improve recognition accuracy, a statistical language model such as an N-gram language model is used to select the most probable candidate from a plurality of candidates obtained from the comparison with the predetermined patterns. In this case, the most probable candidate is selected by referring to examples of speech content that are stored in advance.
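The candidate selection described above can be illustrated with a minimal sketch, assuming a bigram (N = 2) model with add-one smoothing trained on a handful of stored example sentences. The function names, the example corpus, and the candidate list are hypothetical and are not taken from any particular recognizer; a practical system would also combine this language-model score with an acoustic score, which is omitted here.

```python
import math
from collections import Counter


def train_bigram(corpus):
    """Count context words and word pairs over stored example sentences."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens)
        contexts.update(tokens[:-1])             # every token that precedes another
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent word pairs
    return contexts, bigrams, len(vocab)


def log_prob(sentence, model):
    """Log probability of a sentence under the add-one-smoothed bigram model."""
    contexts, bigrams, vocab_size = model
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        total += math.log((bigrams[(prev, word)] + 1) / (contexts[prev] + vocab_size))
    return total


def select_candidate(candidates, model):
    """Return the recognition candidate that the language model scores highest."""
    return max(candidates, key=lambda c: log_prob(c, model))


# Hypothetical stored examples and recognition candidates (assumptions).
stored_examples = ["where is the station", "where is the hotel", "how much is this"]
candidates = ["wear is the station", "where is the station", "where is this station"]

model = train_bigram(stored_examples)
best = select_candidate(candidates, model)
```

Because the word sequences in the second candidate occur in the stored examples, it receives the highest score, whereas candidates containing unseen word pairs are penalized.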
Furthermore, a machine translation technique is used in the above systems, by which a source text, that is, the content of speech in a first language obtained by using the speech recognition technique, is machine-translated into a target text in a second language, i.e., the target language. As methods for machine translation, for example, rule-based translation, example-based translation, and statistical translation are currently used. In the rule-based translation method, a first language text is translated into a second language text based on rules for a correspondence between lexical structures or a correspondence between sentence structures in both languages. In the example-based translation method, as many bilingual example pairs as possible, which are semantically equivalent examples in the first language and the second language, are collected so that a target second-language translation can be obtained by referring to the bilingual example pairs. In the statistical translation method, a translation of a first-language input, i.e., a second-language output, is obtained by referring to statistical information based on a massive amount of example data.
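As an illustration of the example-based translation method only, the following minimal sketch retrieves the stored bilingual example pair whose first-language side is most similar to the input. The example pairs, the similarity measure (a token-level ratio from Python's standard `difflib` module), and the threshold value are all assumptions chosen for the sketch.

```python
from difflib import SequenceMatcher

# Hypothetical bilingual example pairs (first language, second language).
EXAMPLE_PAIRS = [
    ("where is the station", "dónde está la estación"),
    ("how much is this", "cuánto cuesta esto"),
]


def similarity(a, b):
    """Token-level similarity between two sentences, in the range 0.0 to 1.0."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()


def translate_by_example(source, pairs=EXAMPLE_PAIRS, threshold=0.8):
    """Return the target side of the closest example, or None if none is close enough."""
    best_source, best_target = max(pairs, key=lambda pair: similarity(source, pair[0]))
    if similarity(source, best_source) < threshold:
        return None  # no sufficiently similar example was collected in advance
    return best_target
```

In this sketch, an input that matches a stored example yields the stored translation, while an input with no similar example yields `None`.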
However, in the speech recognition technique, a recognition result may be affected by the surrounding environment, such as noise, or may vary depending on the user's vocal conditions such as tone, volume, and speaking speed. In addition, supporting every type of spoken sentence increases the processing load, for example, the load for comparison with the predetermined patterns. Therefore, it is difficult to achieve sufficient recognition accuracy.
Furthermore, in the example-based translation, it is virtually impossible to collect examples relevant to all sentences in advance because phrases can vary without limit. Therefore, a suitable second-language example can seldom be retrieved by the example-based translation method. In addition, although any sentence can be translated by applying generic rules in the rule-based translation method, it is still difficult to obtain a natural translation.
To solve the above problems and achieve a highly accurate translation result, U.S. Pat. No. 6,356,865 discloses a hybrid translation method that is a combination of a plurality of machine translation methods, for example, a combination of the example-based machine translation method and the rule-based machine translation method.
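The general idea of combining two translation methods can be sketched as follows. This is a toy illustration under stated assumptions and is not the method disclosed in the cited patent: the example table, the word-for-word lexicon standing in for translation rules, and all function names are hypothetical.

```python
# Hypothetical example table and lexicon (assumptions for illustration only).
HYBRID_EXAMPLES = {"how much is this": "cuánto cuesta esto"}
LEXICON = {"where": "dónde", "is": "está", "the": "la", "station": "estación"}


def example_based(source):
    """Return a stored translation for an exactly matching example, else None."""
    return HYBRID_EXAMPLES.get(source)


def rule_based(source):
    """Toy stand-in for rule-based translation: word-for-word lexical substitution.

    Words missing from the lexicon are passed through unchanged.
    """
    return " ".join(LEXICON.get(word, word) for word in source.split())


def hybrid_translate(source):
    """Prefer the example-based result; fall back to the rule-based result."""
    result = example_based(source)
    if result is not None:
        return result, "example"
    return rule_based(source), "rule"
```

Note that both methods in this sketch receive the same single source text, so any error in that text propagates to whichever method is selected.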
However, in the hybrid translation method, it is not possible to provide an appropriate input for each of the translation methods that it employs. For example, in the hybrid translation method described above, only a recognition result obtained by a typical speech recognition method using, for example, a hidden Markov model (HMM), is provided as an input for translation processing.
Therefore, even in a case in which the accuracy of speech recognition could be increased by using a different speech recognition method, the result of the machine translation is not sufficiently accurate, because the machine translation process is performed based on a low-accuracy recognition result obtained by the predetermined speech recognition method.