Accurate measurement of translation quality is essential to improving machine translation algorithms. A wide range of metrics for evaluating the quality of machine translated text has been proposed, and these metrics are used for various purposes. Automatically generated metrics, such as Bilingual Evaluation Understudy (“BLEU”) scores, Metric for Evaluation of Translation with Explicit ORdering (“METEOR”) scores, Human-targeted Translation Error Rate (“HTER”) scores, and others, quantify the quality of the output generated by a machine translation system by comparing that output to human-generated reference translations. Once a set of human reference translations has been provided, such metrics can be computed quickly. Often, however, automatically generated metrics do not correspond perfectly to translation quality as perceived by humans.
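By way of illustration only, the comparison of machine translated output to reference translations can be sketched with clipped n-gram precision, which is one component of a BLEU-style score. The sketch below is a simplification rather than a complete BLEU implementation: it omits the brevity penalty and the combination of multiple n-gram orders, it assumes whitespace tokenization, and the function names `ngrams` and `clipped_ngram_precision` are hypothetical names chosen for this example.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_ngram_precision(candidate, references, n=1):
    """Fraction of candidate n-grams that also occur in a reference.

    Each candidate n-gram is credited at most as many times as it
    appears in any single reference ("clipping").
    Assumption: sentences are tokenized by splitting on whitespace.
    """
    cand_counts = Counter(ngrams(candidate.split(), n))
    if not cand_counts:
        # The candidate is too short to contain any n-gram of this order.
        return 0.0

    # For each n-gram, record the largest count seen in any one reference.
    max_ref_counts = Counter()
    for ref in references:
        ref_counts = Counter(ngrams(ref.split(), n))
        for gram, count in ref_counts.items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)

    # Credit each candidate n-gram up to its clipped reference count.
    matched = sum(
        min(count, max_ref_counts[gram])
        for gram, count in cand_counts.items()
    )
    return matched / sum(cand_counts.values())


references = ["the cat is on the mat", "there is a cat on the mat"]
print(clipped_ngram_precision("the cat sat on the mat", references, n=1))
print(clipped_ngram_precision("the the the the the the the", references, n=1))
```

In this sketch, a candidate is scored only on surface overlap with the references, so an acceptable translation that uses different wording than the references would receive a low score. This is one way in which such a metric can diverge from translation quality as perceived by humans.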
Other mechanisms, in which humans judge the quality of machine translated output, can also be utilized to evaluate translation quality. These mechanisms can be more reliable than automatically generated metrics. However, they carry their own challenges, such as subjectivity and disagreement between human annotators as to translation quality.
Because of the challenges described above, existing mechanisms for evaluating the quality of machine translations can produce inaccurate quality measurements of machine translated output. Inaccurate measurements can, in turn, result in unnecessary retraining of machine translation models, thereby wasting valuable computing resources and power. For example, an evaluation that indicates that the quality of translations generated by a machine translation system is lower than it actually is can result in significant unnecessary retraining of the models utilized by that system.
The disclosure made herein is presented with respect to these and other considerations.