1. Technical Field
The present invention relates generally to active learning systems and methods in machine translation systems, and more specifically to active learning systems and methods for developing a parallel corpus in a statistical machine translation system for new language pairs or new domains.
2. Description of the Related Art
The increasing globalization of the international community has brought about an ever-growing demand for machine translation systems. A parallel corpus is an essential resource for developing many machine translation systems, particularly those based on statistical learning algorithms. A parallel corpus is a collection of words, phrases, and/or sentences in two different languages that are translations of each other.
Generally speaking, the performance and accuracy of a machine translation system increase with the size of the parallel corpus. Thus, when developing a statistical machine translation (SMT) system for new language pairs or new domains, the creation of a large, accurate parallel corpus is extremely important.
Current methods of parallel corpus creation rely solely on human translators to create translations and to correct inaccurate translations produced by the SMT system. Because of this reliance on human translators, updating a parallel corpus in the current state of the art is typically expensive and slow.