1. Field of the Invention
The present invention relates to an apparatus and a method for translating input speech to synthesize and output the translated speech, thereby enabling communication by speech between people who use different languages as their native languages, and to a computer program product for executing the method.
2. Description of the Related Art
Recently, there has been high demand for a speech translation apparatus that supports communication between people whose native languages differ. Such an apparatus basically uses a unit that recognizes speech, a unit that translates the character string obtained as a result of speech recognition, and a unit that synthesizes speech from the character string obtained as a result of translation. It can be constituted by sequentially executing a speech recognition process, a translation process, and a speech synthesis process.
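The sequential arrangement described above can be sketched as a simple composition of three stages. This is only an illustrative sketch: the names `make_speech_translator`, `toy_recognize`, `toy_translate`, `toy_synthesize`, and `TOY_DICTIONARY` are hypothetical, and the toy stages merely stand in for real recognition, translation, and synthesis units.

```python
from typing import Callable

# Hypothetical stage signatures: audio bytes -> text -> text -> audio bytes.
Recognizer = Callable[[bytes], str]
Translator = Callable[[str], str]
Synthesizer = Callable[[str], bytes]


def make_speech_translator(
    recognize: Recognizer, translate: Translator, synthesize: Synthesizer
) -> Callable[[bytes], bytes]:
    """Chain the three processes in the order recognition, translation, synthesis."""

    def run(audio: bytes) -> bytes:
        source_text = recognize(audio)        # speech recognition process
        target_text = translate(source_text)  # translation process
        return synthesize(target_text)        # speech synthesis process

    return run


# Toy stand-ins (assumptions for illustration, not real engines).
TOY_DICTIONARY = {"hello": "konnichiwa"}


def toy_recognize(audio: bytes) -> str:
    # Pretend the audio already carries its transcript as UTF-8 bytes.
    return audio.decode("utf-8")


def toy_translate(text: str) -> str:
    # Words missing from the dictionary pass through untranslated.
    return TOY_DICTIONARY.get(text, text)


def toy_synthesize(text: str) -> bytes:
    # Pretend the synthesized waveform is just the encoded text.
    return text.encode("utf-8")


speech_translator = make_speech_translator(toy_recognize, toy_translate, toy_synthesize)
```

Because each stage consumes the output of the previous one, an error made in recognition is carried into translation and synthesis, which is one reason the accuracy of the individual stages matters.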
A speech recognition system that recognizes speech spoken by a user and outputs character information has already been put to practical use in the form of packaged software or the like. A machine translation system that takes written words (text) as an input has likewise been put to practical use in the form of packaged software or the like. A speech synthesis system has also been put to practical use, and by appropriately combining such software, a speech translation apparatus can be realized.
In realizing speech translation, various speech recognition methods and machine translation methods can be used. Whichever method is employed, improving the accuracy of speech recognition and machine translation has been a major issue.
For example, in example-based machine translation, in which translation is performed by using a bilingual corpus of a source language and a target language, not every text can be prepared as an example, and as the number of examples increases, the number of texts to be searched for a given input text also increases. Therefore, there is a problem in that the user's time and labor are required to select an appropriate text.
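The retrieval step of example-based translation can be illustrated with a minimal sketch. It assumes a tiny hypothetical corpus `EXAMPLES` and uses string similarity from Python's standard `difflib` module as a stand-in for whatever matching a real system would use; the function name `candidate_examples` and the `limit` and `cutoff` parameters are likewise assumptions.

```python
import difflib
from typing import Dict, List, Tuple

# Hypothetical toy bilingual corpus: source text -> target text.
EXAMPLES: Dict[str, str] = {
    "where is the station": "eki wa doko desu ka",
    "how much is this": "kore wa ikura desu ka",
}


def candidate_examples(
    input_text: str, examples: Dict[str, str], limit: int = 3, cutoff: float = 0.5
) -> List[Tuple[str, str]]:
    """Return up to `limit` stored examples similar to the input, best match first."""
    matches = difflib.get_close_matches(
        input_text, list(examples), n=limit, cutoff=cutoff
    )
    # Pair each matched source text with its stored translation.
    return [(source, examples[source]) for source in matches]
```

With a large corpus, many examples can exceed the similarity cutoff, so the candidate list presented to the user grows, which corresponds to the selection burden noted above.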
Further, because the contents of communication handled by the speech translation apparatus are diverse, accurate speech translation requires speech recognition, machine translation, and speech synthesis that cover large-scale vocabularies. The numbers of words in categories such as common nouns, verbs, adjectives, and adverbs are limited to some extent, and therefore these words can be registered in advance in a dictionary used for speech recognition, machine translation, and speech synthesis. However, proper nouns such as place names, personal names, cuisine names, store names, and company names are created almost daily, and in general not all proper nouns can be registered in the dictionary.
Thus, there are occasions, as experienced in overseas travel, in which a tourist needs to speak proper nouns such as place names and store names of the country or area in the local language himself, because no appropriate translation of those words is registered in the dictionary. However, for a tourist whose native language has a phonetic system largely different from that of the language of the country in which he is traveling, it is difficult to pronounce the words accurately as they are pronounced in the local language, and it frequently occurs that the pronounced proper noun cannot be understood.
As the simplest solution to this problem, the user can use a mobile terminal that has a function of displaying travel guide information and map information, and can indicate a desired place by pointing at a specific part of the travel guide information or map information displayed on the display of the mobile terminal.
However, the intention of the user cannot be sufficiently communicated only by indicating a place or a place name. For example, merely indicating a certain facility does not communicate whether the user wishes to go to the facility, or wishes to confirm how long it takes to get there, what kind of event is being held there now, or how much it costs to get there.
Therefore, a method can be considered in which a display unit that displays the travel guide information and the map information, a unit that indicates a place name or a facility name in the presented information, and a speech translation unit are combined, and the translated speech of the user is output to communicate the user's intention.
As a technique related to this method, a technique has been proposed in which a speech recognizing unit and a map display unit are included, and a pointing operation performed by the user on a map is recognized simultaneously with speech recognition. Based on the temporal relation between the two, the technique outputs a semantic structure of the text in which a demonstrative pronoun included in the spoken text is replaced by position information on a specific map (for example, see JP-A H09-114634 (KOKAI)).
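The general idea of replacing a demonstrative pronoun by a pointed-at position based on temporal relation can be sketched as follows. This is not the method of the cited publication, whose details are not given here; it is a minimal illustration under assumed data structures. The names `Word`, `Pointing`, `DEMONSTRATIVES`, `resolve_demonstratives`, and the `max_gap` threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical set of demonstrative words to be resolved.
DEMONSTRATIVES = {"this", "that", "here", "there"}


@dataclass
class Word:
    text: str
    time: float  # time in seconds at which the word was spoken


@dataclass
class Pointing:
    place: str   # place indicated on the map
    time: float  # time in seconds at which the pointing occurred


def resolve_demonstratives(
    words: List[Word], pointings: List[Pointing], max_gap: float = 1.5
) -> List[str]:
    """Replace each demonstrative by the place pointed at closest in time.

    A demonstrative is left unchanged when no pointing operation lies
    within `max_gap` seconds of it (an assumed threshold).
    """
    resolved = []
    for word in words:
        replacement = None
        if word.text.lower() in DEMONSTRATIVES and pointings:
            nearest = min(pointings, key=lambda p: abs(p.time - word.time))
            if abs(nearest.time - word.time) <= max_gap:
                replacement = nearest.place
        resolved.append(replacement if replacement is not None else word.text)
    return resolved
```

For instance, if the word "here" is spoken at 0.6 seconds and a place is pointed at on the map at 0.7 seconds, the sketch substitutes that place for "here".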
According to the method disclosed in JP-A H09-114634 (KOKAI), the accuracy of speech recognition can be improved by analyzing a semantic representation that includes the demonstrative pronoun with reference to the content indicated by the user. However, there is a problem in that the accuracy of machine translation cannot be improved by using the indicated content.