1. Field of the Invention
The present invention relates to a domain-adaptive portable machine translation device for closed captions using dynamic translation resources and a method thereof. More particularly, the present invention relates to a machine translation device and a method thereof, which can improve translation performance in various specialized domains by dynamically constructing a specified translation module and knowledge suitable for automatically recognized style and domain of a caption sentence, and automatically translating a corresponding sentence with optimized translation resources.
2. Discussion of Related Art
While a Japanese-Korean/Korean-Japanese machine translation system has been successfully commercialized, most machine translation systems for translation between heterogeneous pattern languages such as Korean-English/English-Korean and Korean-Chinese/Chinese-Korean translation, etc. have enjoyed little commercial success. The reason for this is that, although translation performance varies depending on the target language and the passage to be translated, machines generally exhibit poor performance when translating between heterogeneous pattern languages.
Recently, attempts have been made to enhance output quality by creating specialized machine translation systems tailored to the sentence characteristics of a specific application domain. As a result of such efforts, translation systems for partially specialized domains such as technical manuals, patents, and Bible translation have been commercialized with varying degrees of success.
In particular, fueled by the spread of satellite TV, there is increasing demand for a machine translation system for closed captions that can provide viewers with captions in their language of choice by automatically translating a caption signal extracted from a broadcast signal.
Here, closed captions refer to an image signal output from a broadcasting station containing the caption signal. Many broadcasting companies now provide such closed captioning for the hearing impaired. In 1990, the United States made it obligatory for televisions 13 inches or larger to have a closed captioning function, and domestic television broadcasting stations and CATV companies are expanding closed captioned programs as well. Also, closed captioned programs in foreign languages provided by CNN, NHK, AFKN, etc. are expected to continue to expand.
However, closed captions on TV mix colloquial and literary styles across various genres, for example, drama, culture and current events, and entertainment, as well as the news. The news in particular uses technical terms and expressions from almost all domains, so it is technically difficult to develop a machine translation system for closed captions that can provide high quality output on a commercial scale.
To overcome this technical difficulty, Korean Patent Publication No. 1997-56985 (Publication date: 1997 Jul. 31) discloses a TV with a function for translating closed captions. The TV has separate Korean and foreign-language translation parts so as to display caption data in a language selected by a viewer, thus conveniently meeting viewers' needs.
However, the TV with a closed captioning function performs a process of extracting the caption data from the input broadcast signal, translating the caption data into the selected language, and then displaying the translated result on the TV screen. Thus, it has disadvantages in that a TV that supports closed captioning must be separately purchased, and when the broadcast signal is input through another media device, for example, a satellite set-top box, a video player, a DMB terminal, etc., the captioning function cannot be provided.
Moreover, the TV with the closed captioning function performs translation only on the caption data, and thus it cannot provide high output quality for colloquial style and literary style sentences, and sentences used in various technical domains dealt with in captioned programs.
For instance, in the news, “die” is usually used as a verb meaning “stop living,” but in a science domain, “die” is most often used as a noun meaning “mold.” So, if “die” is mistranslated as a verb meaning “stop living” in a science domain caption, it is because the translation was performed without consideration of the application domain.
That is, since the TV with a function of translating closed captions applies the same translation module and knowledge to all domains as a whole, when various styles and technical sentences are input, it is obvious that translation quality will be degraded.
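The "die" example above can be illustrated with a minimal sketch. It is not the method of either cited publication: the glossary contents, the domain labels, and the function names select_sense and single_resource_sense are all hypothetical and serve only to contrast a single shared translation resource with a domain-keyed one.

```python
# Hypothetical per-domain glossaries; the entries mirror the "die" example
# in the text and are illustrative only.
DOMAIN_GLOSSARIES = {
    "news": {"die": ("verb", "stop living")},
    "science": {"die": ("noun", "mold")},
}


def single_resource_sense(word):
    """Look up a word in one shared resource, ignoring the domain.

    This mimics applying the same translation knowledge to all domains.
    """
    return DOMAIN_GLOSSARIES["news"].get(word)


def select_sense(word, domain, default_domain="news"):
    """Look up a word in the glossary of the given domain.

    Falls back to the default domain when the domain is unknown or the
    word is missing from the domain glossary.
    """
    glossary = DOMAIN_GLOSSARIES.get(domain, {})
    if word in glossary:
        return glossary[word]
    return DOMAIN_GLOSSARIES[default_domain].get(word)


if __name__ == "__main__":
    print(single_resource_sense("die"))    # same sense regardless of domain
    print(select_sense("die", "science"))  # sense chosen for the science domain
```

With the shared resource, a science caption receives the news sense of "die"; with the domain-keyed lookup, the sense follows the domain label, which is assumed here to be supplied by the caller.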
Therefore, there have been many attempts to improve the translation quality in consideration of technical domains. A representative method is a user-adaptive machine translation method in which errors in the translated result are corrected and the corrected result is stored as additional translation knowledge and automatically applied the next time, thereby improving translation quality for similar input sentences.
As the user-adaptive machine translation method described above, a translation memory-based adaptive translation method is generally used, in which a user adds his/her own translation dictionary or manually registers a pattern-based translation corpus and then applies the result to sentence translation.
A related, conventional adaptive machine translation method is disclosed in Korean Patent Publication No. 2004-0111188 (Publication date: 2004 Dec. 31). The adaptive machine translation method disclosed in Korean Patent Publication No. 2004-0111188 improves translation quality by preventing repetition of errors. This is accomplished by a user of the machine translation system directly correcting errors and converting the corrected result into an input knowledge format of the system, and then applying the converted result to the translation system again.
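The correct-store-reapply cycle of such user-adaptive methods can be sketched as follows. This is a simplified illustration under stated assumptions, not the method of Korean Patent Publication No. 2004-0111188: the class name TranslationMemory, the base_translate callback, and the similarity cutoff are hypothetical, and Python's standard difflib string matching stands in for whatever similarity measure a real system would use.

```python
import difflib


class TranslationMemory:
    """Stores user-corrected translations and reuses them for similar input."""

    def __init__(self, base_translate, cutoff=0.9):
        self._base = base_translate  # hypothetical underlying MT engine
        self._memory = {}            # normalized source -> corrected translation
        self._cutoff = cutoff        # minimum similarity ratio for reuse

    @staticmethod
    def _normalize(sentence):
        # Lowercase and collapse whitespace so trivial variations still match.
        return " ".join(sentence.lower().split())

    def register_correction(self, source, corrected):
        """Record a user's corrected translation for a source sentence."""
        self._memory[self._normalize(source)] = corrected

    def translate(self, source):
        """Reuse a stored correction for a close match, else call the engine."""
        key = self._normalize(source)
        matches = difflib.get_close_matches(
            key, list(self._memory), n=1, cutoff=self._cutoff
        )
        if matches:
            return self._memory[matches[0]]
        return self._base(source)


if __name__ == "__main__":
    tm = TranslationMemory(lambda s: "MT:" + s)
    tm.register_correction("The die is cast in steel.", "CORRECTED")
    print(tm.translate("the die is cast in steel"))  # reuses the correction
    print(tm.translate("Stocks fell today."))         # falls back to the engine
```

Note that this sketch returns the stored correction verbatim for a near match, and every entry in the memory has to be supplied by a user, which is the manual burden discussed next.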
However, in the above methods, the user has to continuously proofread and correct an enormous amount of machine translated documents. Proofreading on a small scale is not effective in improving machine translation performance for documents in various domains.
Also, the adaptive machine translation method uses a data-driven machine translation engine to avoid conflict of translation data or rules. In this case, since there is a limit to how far adding word-by-word translation correction knowledge can improve translation performance, a separate statistics database built from copious amounts of translation knowledge is required.
Consequently, in order to commercialize a portable machine translation system for closed captions, it is necessary to improve translation performance by automatically recognizing target domains and styles and constructing a specialized translation environment, and to enable linking with various types of media devices.