(a) Field of the Invention
The present invention relates to a method of automatically constructing translation knowledge for use in a translation apparatus. More particularly, the invention relates to a method of cumulatively and automatically constructing translation knowledge for a translation apparatus that automatically translates Korean, a language with extreme agglutination and inflection, into English and Chinese, languages with little or no agglutination or inflection, by using a previously held bilingual dictionary, unsupervised learning, and a language processing module.
(b) Description of the Related Art
Translation knowledge is useful in many ways regardless of the methodology of machine translation. In particular, translation knowledge at the level of a sentence frame and a subcategory of an inflected word, above the level of a word and syntax, is also useful in, for example, a translation system and foreign language learning. That is, acquired translation knowledge can be used in an example-based machine translation system and a program for foreign language learning as well as in a statistics-based machine translation system.
A machine translation system receives text composed of source-language sentences, translates the received source language into a target language, and outputs the translated result. In this case, the translation is performed by using a bilingual dictionary of words and syntax, translation rules, translation patterns, and so on. Alternatively, the translation is performed by training a statistical translation model. Generally, since accurate knowledge is essential for a translation system that uses translation rules or translation patterns, the translation knowledge is typically acquired by experts. However, considerable time and cost are required to acquire the translation knowledge in this way. To overcome this problem, many studies have been conducted on automatically extracting the knowledge or on semi-automatically acquiring it by developing tools.
One existing approach is "Feedback Cleaning of Machine Translation Rules using Automatic Evaluation" by Kenji Imamura. This relates to a method of automatically learning and refining parallel-translation knowledge and conversion rules for Japanese-English machine translation that operates in an example-based and conversion-driven way.
According to the approach of Kenji Imamura, for each pair of parallel Japanese and English sentences, the Japanese sentence and the English sentence are each syntactically analyzed by a syntactic analyzer for the respective language. Words are first aligned by using a word alignment algorithm, and the syntaxes are then aligned by connecting syntaxes that have the same syntactic category in English and Japanese to each other. Parallel-translation information on the syntax and parallel-translation knowledge on the word are acquired from the results of this syntactic alignment. In addition, conversion rules are extracted from the results of the syntactic alignment, the conversion rules being composed of the syntactic category and function words carrying out a grammatical role. The conversion rules include a syntactic category of the syntax to be converted, a pattern of source-language syntax, a pattern of target-language syntax, and examples of the source-language syntax.
Furthermore, a process of refining the bilingual corpus is performed so as to extract accurate conversion rules. The bilingual corpus is classified into a literal translation corpus and a non-literal translation corpus. The literal translation corpus is composed of sentence pairs that maximize the alignment links between the words constituting the source-language sentence and the target-language sentence, and all other pairs are regarded as the non-literal translation corpus. The conversion rules are extracted from the literal translation corpus. If a phrase exists in the non-literal translation corpus, a generalized syntactic conversion rule is also extracted from the phrase. For other parts that are difficult to generalize, the translation pattern is extracted by using the vocabulary itself.
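For illustration only, the category-based alignment and the literal/non-literal distinction described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the cited approach: the names `Phrase`, `align_phrases`, and `link_coverage` are hypothetical, the rule that no word link may cross a candidate pair is an assumed consistency check, and the coverage score is an assumed stand-in for "maximizing alignment links". The romanized Japanese tokens and the category labels are likewise illustrative.

```python
from typing import NamedTuple


class Phrase(NamedTuple):
    category: str  # syntactic category label, e.g. "NP" or "VP"
    start: int     # index of the first token in the span (inclusive)
    end: int       # index of the last token in the span (inclusive)


def covers(phrase, index):
    """Return True if the token index lies inside the phrase span."""
    return phrase.start <= index <= phrase.end


def align_phrases(src_phrases, tgt_phrases, word_links):
    """Pair source and target phrases that share a syntactic category.

    Assumed consistency check: a pair is kept only if at least one word
    link lies inside both spans and no link has exactly one end inside.
    """
    pairs = []
    for s in src_phrases:
        for t in tgt_phrases:
            if s.category != t.category:
                continue
            inside = any(covers(s, i) and covers(t, j) for i, j in word_links)
            crossing = any(covers(s, i) != covers(t, j) for i, j in word_links)
            if inside and not crossing:
                pairs.append((s, t))
    return pairs


def link_coverage(src_len, tgt_len, word_links):
    """Fraction of tokens on both sides that take part in a word link.

    Hypothetical score: a higher value suggests a more literal sentence pair.
    """
    total = src_len + tgt_len
    if total == 0:
        return 0.0
    linked_src = {i for i, _ in word_links}
    linked_tgt = {j for _, j in word_links}
    return (len(linked_src) + len(linked_tgt)) / total


# Illustrative sentence pair: "kare wa hon o yonda" / "he read a book".
src_tokens = ["kare", "wa", "hon", "o", "yonda"]
tgt_tokens = ["he", "read", "a", "book"]
links = {(0, 0), (2, 3), (4, 1)}  # kare-he, hon-book, yonda-read

src_phrases = [Phrase("NP", 0, 0), Phrase("NP", 2, 2),
               Phrase("VP", 2, 4), Phrase("S", 0, 4)]
tgt_phrases = [Phrase("NP", 0, 0), Phrase("NP", 2, 3),
               Phrase("VP", 1, 3), Phrase("S", 0, 3)]

for s, t in align_phrases(src_phrases, tgt_phrases, links):
    print(s.category,
          src_tokens[s.start:s.end + 1],
          tgt_tokens[t.start:t.end + 1])
print(link_coverage(len(src_tokens), len(tgt_tokens), links))
```

In this sketch, a sentence pair whose coverage score exceeds some chosen threshold would be placed in the literal translation corpus; the threshold itself is an assumption and is not given in the description above. The sketch also makes the category constraint visible: a source phrase with no same-category counterpart on the target side is never paired.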
However, according to the methodology of Kenji Imamura, the syntactic alignment can be attempted only when the two languages have the same syntactic category. When it is applied to two languages that differ in linguistic structure and cultural background, the above-described methodology lowers the reproduction ratio (recall) of the extracted translation knowledge. Moreover, even when the translation knowledge is constructed, it becomes overly generalized knowledge of the sentence. Accordingly, word ambiguity and complexity of word rearrangement arise in the above-described methodology.
The above information disclosed in this Background section is only for enhancement of understanding of the background of the invention and therefore it may contain information that does not form the prior art that is already known in this country to a person of ordinary skill in the art.