1. Technical Field
The present invention relates to a translation apparatus, translation method and translation program using a bilingual example sentence dictionary.
2. Related Art
Machine translation is the transformation of text from one language into another language using a computer. It has been researched and developed around the world for half a century. Approaches to machine translation can be roughly classified into three types: (1) the analysis-based machine translation approach, (2) the example-based machine translation approach, and (3) the statistic-based machine translation approach.
The analysis-based machine translation approach is a technique that performs, for example, morphemic analysis and syntactic and semantic analysis of a sentence in a first language, transforms the result of the analysis into a second language, and generates a translation sentence in the second language. The techniques for analyzing natural language are still immature, and thus the practical use of the analysis-based machine translation approach has been limited. In addition, because the approach lacks learning capability, it is difficult to improve or alter a translation engine.
The statistic-based machine translation approach is a technique that builds a translation model by using a language model and a statistical model. The learning data (corpus) necessary for building each model is limited, and thus it is difficult to put this approach to practical use.
The example-based machine translation approach mimics the mechanism by which a human learns a foreign language. It translates a new document by referring to the translations of example sentences that have already been learned. This approach was first proposed by Professor Nagao in the 1980s. Since then, research and development of this approach have been intensively conducted.
Alternatively, there are translation assisting systems for assisting translation work. Translation assisting software differs from machine translation software in that, when a sentence cannot be correctly translated, the translation assisting software provides the translator with a similar example sentence, together with a translation sentence or a partial translation result of that example sentence, from a stored bilingual example sentence dictionary.
FIG. 23 illustrates an outline of a bilingual example sentence dictionary. The bilingual example sentence dictionary shown in FIG. 23 includes a memory 1 that stores plural example sentence pairs, each consisting of an example sentence in Chinese and a corresponding example sentence in Japanese. When an input sentence 2 in Chinese is inputted by a user, an example sentence search portion 3 searches for an example sentence in Chinese that matches the input sentence 2, and outputs a translation sentence 4 in Japanese that corresponds to the input sentence 2.
With a bilingual example sentence dictionary of the related art, the search looks only for an example sentence that matches the input sentence, and thus no translation information can be obtained other than the translation of an example sentence that matches the input sentence. Therefore, even if an example sentence that is similar to the input sentence is stored, a user cannot use the similar example sentence, which means that the bilingual example sentence dictionary has not been used effectively. In addition, when document data scanned using an OCR (optical character recognition) system is used as the input sentence, any recognition error in the scanning causes the matching with example sentences to fail, and a translation of the input sentence cannot be obtained even though the matching example sentence is stored.
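The exact-match search described above, and the way a single misrecognized character defeats it, can be illustrated with a minimal sketch. The example sentence pairs and the function name `exact_match_translate` are hypothetical and are introduced only for illustration; they are not taken from FIG. 23.

```python
from typing import Optional

# Hypothetical example sentence pairs (Chinese -> Japanese),
# standing in for the contents of memory 1.
EXAMPLE_PAIRS = {
    "我喜欢音乐。": "私は音楽が好きです。",
    "今天天气很好。": "今日は天気がいいです。",
}


def exact_match_translate(input_sentence: str) -> Optional[str]:
    """Return the stored translation only if the whole input sentence
    is identical to a stored example sentence; otherwise return None."""
    return EXAMPLE_PAIRS.get(input_sentence)


# An input identical to a stored example sentence is translated.
matched = exact_match_translate("我喜欢音乐。")

# The same sentence with one character misrecognized (as an OCR error
# might produce) finds nothing, although a nearly identical example
# sentence is stored.
missed = exact_match_translate("我喜欢音东。")
```

In this sketch `matched` holds the stored Japanese sentence while `missed` is `None`, which corresponds to the situation in which a similar example sentence exists but cannot be used.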
Methods for searching for an example sentence in a bilingual example sentence dictionary include a character index method and a word index method. The former creates a character index for every character that exists in a bilingual corpus. With this method, it is difficult to translate in real time because the amount of data to be searched becomes huge. The latter creates a word index for every word that exists in a bilingual corpus. This requires a morphemic analysis to extract words from the input sentence, and thus if the result of the morphemic analysis is not correct, translation becomes difficult. Morphemic analysis is particularly inadequate for technical terms and idioms.
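As a rough illustration of the character index method, the following sketch builds an index from each character to the example sentences that contain it. The function names and the sample sentences are assumptions made for illustration, and the sketch is not the indexing scheme of any particular system.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_character_index(sentences: List[str]) -> Dict[str, Set[int]]:
    """Map every character to the ids of the sentences that contain it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for sentence_id, sentence in enumerate(sentences):
        for character in sentence:
            index[character].add(sentence_id)
    return index


def candidate_sentences(index: Dict[str, Set[int]], query: str) -> Set[int]:
    """Collect ids of sentences sharing at least one character with the query."""
    hits: Set[int] = set()
    for character in query:
        hits |= index.get(character, set())
    return hits


# Hypothetical corpus of three short example sentences.
corpus = ["abc", "bcd", "xyz"]
char_index = build_character_index(corpus)
hits = candidate_sentences(char_index, "bq")
```

Because a frequently occurring character is listed against every sentence that contains it, the sets to be examined for one query grow with the size of the corpus, which is the source of the real-time difficulty noted above. A word index would use the same structure keyed by words, but would first need a morphemic analysis step to split each sentence into words.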