The performance of machine translation systems, which translate sentences of one language into another language, has been continuously improving. However, such systems still make many translation errors. One way to remove these errors is to improve the performance of the relevant modules in the translation engine. This approach is problematic in that the individual module causing each error must be corrected directly, so a new translation module must be implemented for error correction even when development of the translation system has already been completed. In addition, since error correction in individual modules does not consider the generated sentence as a whole, there is a high probability that the translation remains inaccurate and that errors persist, and various types of errors cannot be resolved at once. Because of these problems, post-editing of translation errors, which automatically corrects errors occurring in the final translation result by means of a post-processing scheme, is useful for improving the performance of a machine translation system.
Recently, many statistics-based machine translation systems have been developed, but they do not perform well for language pairs such as Korean-English, in which the two languages differ greatly in word order. In practice, commercialized machine translation systems are rule-based or pattern-based. A notable characteristic of the translations produced by rule-based or pattern-based machine translation systems is that, in many cases, the meaning of a translated sentence is correct but the sentence is unnatural or is awkward because of a grammatical error.
Meanwhile, a language model may be used to estimate the errors of a machine translation system. Such a language model is built in the form of a database (DB) of the probabilities with which specific word sequences appear in a large corpus. In statistics-based machine translation, language models serve as indicators of which expressions are appropriate in the target language. Therefore, by comparing a translation created by a machine translation system with the built language model, the language model may provide a basis for automatically finding the portion in which an error has occurred and for accurately correcting that portion.
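As an illustration of how such a comparison might locate a suspect portion, the following is a minimal sketch only. It assumes a bigram model estimated by relative frequency over a toy corpus, with no smoothing; the names `build_bigram_model`, `bigram_prob`, `suspicious_positions`, the corpus, and the threshold value are hypothetical and are not taken from any particular system.

```python
from collections import Counter

def build_bigram_model(corpus):
    """Count contexts and bigrams over whitespace-tokenized sentences."""
    contexts, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        contexts.update(tokens[:-1])              # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))
    return contexts, bigrams

def bigram_prob(model, prev, word):
    """Relative-frequency estimate of P(word | prev); 0.0 for an unseen context."""
    contexts, bigrams = model
    if contexts[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / contexts[prev]

def suspicious_positions(model, translation, threshold=0.1):
    """Return the word pairs whose estimated probability is below the threshold."""
    tokens = ["<s>"] + translation.split() + ["</s>"]
    flagged = []
    for prev, word in zip(tokens, tokens[1:]):
        if bigram_prob(model, prev, word) < threshold:
            flagged.append((prev, word))
    return flagged

# Toy corpus (hypothetical); a real model would be built from a large corpus.
TOY_CORPUS = ["he goes to school", "she goes to work", "he goes to work"]
MODEL = build_bigram_model(TOY_CORPUS)
```

With this toy model, `suspicious_positions(MODEL, "he go to school")` flags the pairs around "go", while "he goes to school" produces no flags. A practical system would additionally need smoothing so that unseen but valid sequences are not all treated as errors.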
Errors of a machine translation system may be estimated using an n-gram language model, which is one of the conventional basic language models. As n increases, the language model can take more of the surrounding context into account, but data sparseness may occur because fewer of the longer word sequences are observed in the corpus. Further, with a simple n-gram model it is difficult to estimate errors involving long-distance dependencies. Moreover, since only the simple arrangement of words is considered when the n-gram language model is built, unnecessary word sequences, i.e., erroneous word sequences such as noise, are recognized as correct word sequences, which decreases the accuracy of error detection and correction.
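The two limitations described above can be made concrete with a small sketch. It assumes whitespace tokenization and measures, for a given n, the fraction of a sentence's n-grams that were observed in a toy corpus; the function names and example sentences are hypothetical.

```python
def ngrams(tokens, n):
    """All contiguous n-token windows of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def coverage(corpus, sentence, n):
    """Fraction of the sentence's n-grams that occur somewhere in the corpus."""
    seen = set()
    for s in corpus:
        seen.update(ngrams(s.split(), n))
    candidates = ngrams(sentence.split(), n)
    if not candidates:
        return 0.0
    return sum(g in seen for g in candidates) / len(candidates)

# Sparseness: a valid sentence looks increasingly "unseen" as n grows.
SPARSE_CORPUS = ["the cat sat on the mat", "the dog sat on the rug"]
VALID_SENTENCE = "the dog sat on the mat"

# Long-distance dependency: the subject "dogs" and the verb "is" disagree,
# but they are too far apart for a bigram or trigram window to notice.
AGREEMENT_CORPUS = ["the dogs near the cat are tired", "the cat is tired"]
WRONG_SENTENCE = "the dogs near the cat is tired"
```

Here `coverage(SPARSE_CORPUS, VALID_SENTENCE, n)` is 1.0 for n = 2, 0.5 for n = 5, and 0.0 for n = 6, even though the sentence is correct. Conversely, `coverage(AGREEMENT_CORPUS, WRONG_SENTENCE, 2)` is 1.0 although the sentence contains an agreement error, because every local word pair was observed in the corpus.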
Therefore, there is a need to build a new language model for post-editing that is capable of handling long-distance dependencies and of preventing noise from entering the language model.
Although several translation errors may coexist in one translated sentence, conventional post-editing systems for correcting translation errors do not consider the order in which the coexisting errors are processed. Therefore, in order to improve the overall correction performance of a language model-based post-editing system, a technique is required that takes the priorities of the coexisting errors into account and corrects the error having the higher priority first.
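One way priority-ordered correction could be organized is sketched below. This is an assumption for illustration only: the text does not state how priorities are assigned or how conflicting corrections are resolved, so the `DetectedError` structure, the numeric priorities, and the rule of skipping a lower-priority error that overlaps an already accepted one are all hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class DetectedError:
    start: int               # index of the first token in the erroneous span
    end: int                 # index one past the last token in the span
    replacement: List[str]   # tokens proposed as the correction
    priority: int            # lower value = corrected first (hypothetical scale)

def correct_by_priority(tokens, errors):
    """Accept errors in priority order, skipping any that overlap an accepted one."""
    accepted = []
    for err in sorted(errors, key=lambda e: e.priority):
        overlaps = any(err.start < a.end and a.start < err.end for a in accepted)
        if not overlaps:
            accepted.append(err)
    result = list(tokens)
    # Apply from right to left so earlier indices stay valid after each splice.
    for err in sorted(accepted, key=lambda e: e.start, reverse=True):
        result[err.start:err.end] = err.replacement
    return result

EXAMPLE_TOKENS = "he go to the school yesterday".split()
EXAMPLE_ERRORS = [
    DetectedError(1, 3, ["goes", "to"], priority=2),  # overlaps the next error
    DetectedError(1, 2, ["went"], priority=1),        # handled first
    DetectedError(3, 5, ["school"], priority=3),
]
```

In this example, `correct_by_priority(EXAMPLE_TOKENS, EXAMPLE_ERRORS)` yields the tokens of "he went to school yesterday": the priority-1 correction is accepted first, the overlapping priority-2 correction is dropped, and the independent priority-3 correction is still applied.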
Furthermore, existing post-editing systems are configured in a loosely coupled structure in which it is difficult for the post-editing system to refer to information analyzed and generated by the translation engine of the translation system that performs the actual translation. However, better translation performance may be achieved if errors are corrected with reference to the information that a rule-based or pattern-based translation engine produces when analyzing the original text or the translated text.