A machine translation device accepts an input character string written in a source language (a first language) and translates it into a character string in a second language (the language desired by the user). In addition, recent advances in spoken language processing have made possible speech translation devices, which translate speech uttered in the first language by one user (a first speaker) into the second language and output the result to another user (a second speaker).
The speech recognition dictionary and the translation dictionary used by current speech translation devices store only the vocabulary provided by the developer. In speech translation devices in particular, performance is improved by limiting or switching the recognizable and translatable vocabulary according to the scene or situation in which the device is used. The reason is analogous to the fact that a listener who has some prior knowledge of a topic understands a talk more easily than a listener who has none.
Meanwhile, PCT International Publication No. WO 2009/129315 discloses a technique for translating a new word of the first language into the second language and registering the translation result in the speech recognition dictionary for the second language.
In an actual conversation through a speech interpretation device, the conversation advances as the first and second speakers listen to each other's utterances and confirm the respective interpretation results. A word that does not exist in the other party's language (the second speaker's language) is handled as follows. In translation from Japanese to Chinese or from Japanese to English, for example, the word is transliterated into the Latin alphabet. In translation from English to Japanese, the word is either output as is in the alphabet or transliterated into the Japanese syllabary (katakana, the square form of kana).
In this case, when the other party (the second speaker) cannot infer the pronunciation from the written form of the translation result, the other party often pronounces the word by imitating the first speaker's utterance. Accordingly, if the speech recognition dictionary is updated using only the written form of the word, as in the conventional technique, the word is not recognized correctly when it is uttered with a pronunciation different from the one registered in the speech recognition dictionary.
Consider, for example, the Japanese word “納豆 (Nattou)” (fermented soybeans). This word is translated into the Chinese word “纳豆” (Pinyin, the Chinese romanization system: na4dou4) and the English word “Natto”. When a foreigner sees this translation result and tries to say the word in the next utterance, unless the foreigner reads it according to the written form in his/her native language, the foreigner generally imitates the Japanese pronunciation “Nattou”. This Japanese pronunciation is not directly related to the pronunciation estimated from the Chinese word “纳豆” or from the English word “Natto”. Accordingly, the conventional technique, which recognizes the foreigner's utterance using only a pronunciation estimated from the character string of the translation result, fails to recognize the utterance.
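The recognition failure described for “Natto” can be illustrated with a minimal sketch. Everything in it is hypothetical: the lexicon is a plain dictionary, recognition is reduced to an exact lookup, and the phoneme strings are invented placeholders rather than output of any real pronunciation estimator. The second lexicon, which also holds a pronunciation imitating the Japanese original, is included only as a contrast and is not taken from the text above.

```python
def build_lexicon(entries):
    """Map each registered pronunciation string to its written word."""
    lexicon = {}
    for word, pronunciations in entries:
        for pronunciation in pronunciations:
            lexicon[pronunciation] = word
    return lexicon


def recognize(lexicon, heard):
    """Toy recognizer: exact match on the pronunciation, else None."""
    return lexicon.get(heard)


# Assumed placeholder phoneme strings (not from a real phoneme set).
FROM_ENGLISH_SPELLING = "n ae t ow"    # estimated from the spelling "Natto"
IMITATING_JAPANESE = "n a t t o o"     # speaker imitates "Nattou"

# Conventional update: only the pronunciation estimated from the spelling.
conventional = build_lexicon([("Natto", [FROM_ENGLISH_SPELLING])])

# Contrast case: the imitated pronunciation is registered as well.
extended = build_lexicon(
    [("Natto", [FROM_ENGLISH_SPELLING, IMITATING_JAPANESE])]
)

heard = IMITATING_JAPANESE
print(recognize(conventional, heard))  # no entry matches
print(recognize(extended, heard))      # the word is found
```

A real recognizer scores acoustic similarity rather than testing string equality, so the sketch only shows where the mismatch arises, not how large it is in practice.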
Furthermore, in order to raise translation accuracy, one might consider registering all translatable words and all words obtained as translation results in the speech recognition dictionary in advance. However, if the number of recognizable words is increased indiscriminately, the likelihood rises that an utterance is misrecognized as an incorrect word that sounds similar to the correct word. As a result, interpretation accuracy is not necessarily improved.
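The effect of an indiscriminately enlarged vocabulary can be sketched roughly as follows. The sketch assumes that spelling stands in for pronunciation and that two words are "confusable" when their Levenshtein distance is at most 1; both are simplifications chosen for illustration, and the word lists are invented.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def confusable(target, vocabulary, max_distance=1):
    """Words other than target that lie within max_distance of it."""
    return [w for w in vocabulary
            if w != target and edit_distance(w, target) <= max_distance]


# Hypothetical vocabularies; spelling is used as a proxy for sound.
small_vocab = ["natto", "sushi", "tempura"]
large_vocab = small_vocab + ["nato", "matto", "natty", "ramen"]

print(confusable("natto", small_vocab))  # no near neighbours
print(confusable("natto", large_vocab))  # several near neighbours
```

With the small list the target has no close competitors, while the enlarged list introduces three, each of which is a candidate for misrecognition.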