The exemplary embodiment relates to machine translation and finds particular application in connection with a system and method for predicting the quality of automatic translation of a whole document.
Automated translation, also called machine translation (MT), is concerned with the automatic translation of a textual document from a source language to a target language. Statistical Machine Translation (SMT) is a common approach. In a phrase-based method, the translation entails, for each source sentence, drawing biphrases (source-target phrase pairs) from a biphrase library to cover the sentence. The candidate translation is then scored with a scoring function which takes into account probabilities of occurrence, in a parallel corpus, of the biphrases which were used. Other SMT systems are based on the translation of syntactic units, using partial parse trees. Although the quality of SMT is usually lower than what a professional human translator can achieve, it is valuable in many business applications.
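By way of illustration only, the scoring of a candidate translation from the biphrases used to build it may be sketched as follows. The toy biphrase library, its probability values, and the function name `score_candidate` are hypothetical, and the sum of log-probabilities is an assumed simplification; a practical SMT scoring function typically combines several weighted features.

```python
import math

# Hypothetical biphrase library: (source phrase, target phrase) -> probability
# of occurrence estimated from a parallel corpus. Values are illustrative only.
BIPHRASE_LIBRARY = {
    ("the cat", "le chat"): 0.8,
    ("sat", "s'est assis"): 0.5,
    ("sat", "assis"): 0.2,
}


def score_candidate(biphrases_used, library):
    """Score a candidate translation as the sum of the log-probabilities
    of the biphrases that were used to cover the source sentence."""
    return sum(math.log(library[biphrase]) for biphrase in biphrases_used)


# Two candidate translations covering the same source sentence "the cat sat".
candidate_a = [("the cat", "le chat"), ("sat", "s'est assis")]
candidate_b = [("the cat", "le chat"), ("sat", "assis")]

score_a = score_candidate(candidate_a, BIPHRASE_LIBRARY)
score_b = score_candidate(candidate_b, BIPHRASE_LIBRARY)
best = candidate_a if score_a >= score_b else candidate_b
```

In this sketch, the candidate built from the more probable biphrases receives the higher score and would be retained.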
To cope with translation errors in automated translation, translation quality estimation methods have been developed to predict the quality of the translation independently of the translation process itself. Such methods include Confidence Estimation (CE) and Quality Prediction (QP). See, for example, Blatz, et al., “Confidence Estimation for Machine Translation,” Proc. 20th Intern'l Conf. on Computational Linguistics Article No. 315 (2004), hereinafter, “Blatz 2004”; Specia, et al., “Estimating the sentence-level quality of machine translation,” 13th Annual Conf. of the European Association for Machine Translation (EAMT), pp. 28-37 (2009), hereinafter, “Specia 2009.” While the quality and confidence approaches differ slightly from each other, from a practical, applicative perspective, their computation and usage are generally the same.
The quality estimation is often cast as a multi-class classification problem or as a regression problem. First, a training set is generated by human evaluation of the quality of translation of a textual dataset at the sentence level. Then a classifier (or regressor) is learnt in order to predict a score for a new translation. Often, a coarse grained scale is used for labeling the training data, since the evaluation of the quality of a translation is highly subjective, as evidenced by the low level of agreement among multiple human evaluators. In some cases, the labels can be integers from 1-4, with 4 indicating the highest quality and 1, the lowest. Often, a binary classifier suffices because in a typical business context, the goal is simply to decide whether to trust the machine translation or not. In some approaches, two dimensions are evaluated separately, such as fluency (of the output text) and adequacy (of its content with respect to the input text). If the score is below a threshold quality (or 0 in the binary case), a manual translation of the input text is obtained, generally from a person who is fluent in the two languages.
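The thresholding decision described above may be sketched, purely as an example, as follows. The function names, the cutoff of 3 used to collapse the 1-4 scale into a binary label, and the threshold values are assumptions made for illustration and are not drawn from the cited references.

```python
def to_binary_label(human_score, cutoff=3):
    """Collapse a coarse 1-4 human quality label into a binary label:
    1 (trust the translation) if the score reaches the assumed cutoff, else 0."""
    return 1 if human_score >= cutoff else 0


def needs_manual_translation(predicted_score, threshold):
    """Return True when the predicted quality falls below the threshold,
    in which case a manual translation of the input text would be obtained."""
    return predicted_score < threshold


# Training labels on the 1-4 scale, converted for a binary classifier.
binary_labels = [to_binary_label(s) for s in [1, 2, 3, 4]]

# Regression-style score compared against an assumed threshold of 0.5.
send_to_human = needs_manual_translation(0.3, threshold=0.5)

# Binary case: a predicted label of 0 is below a threshold of 1.
send_to_human_binary = needs_manual_translation(0, threshold=1)
```

The classifier or regressor that produces the predicted score is outside the scope of this sketch; only the labeling and the routing decision are shown.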
QP is often used by professional translators, who combine the use of a translation memory and machine translation to perform their translation tasks. When the quality of the machine translation is good, post-editing it is faster than producing the target sentence from scratch. However, in other applications, it may not be feasible for the output of machine translation to be post-edited, for example, because the user does not speak the target language. In such a situation, the goal is to decide whether to trust the machine translation or to have a speaker of the language process the input source text. In such cases, a binary classifier can suffice.
Machine translation quality predictors are sometimes built on a statistical model operating on a feature set obtained from both the input and output texts (black box features) as well as from information from the inner functioning of the SMT system (glass box features). See, for example, Specia, et al., “Improving the Confidence of Machine Translation Quality Estimates,” Proc. MT Summit XII (2009).
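As a non-limiting example, black box features of the kind referred to above may be computed from the input and output texts alone. The particular features chosen here (token counts, length ratio, punctuation counts), the whitespace tokenization, and the function name `black_box_features` are assumptions made for illustration. Glass box features are not shown, since they depend on the internals of a specific SMT system.

```python
import string


def black_box_features(source, target):
    """Compute simple features from the source and target sentences only,
    without access to the inner functioning of the translation system."""
    src_tokens = source.split()  # naive whitespace tokenization (assumption)
    tgt_tokens = target.split()
    return {
        "src_len": len(src_tokens),
        "tgt_len": len(tgt_tokens),
        # Ratio of target length to source length; guard against empty source.
        "len_ratio": len(tgt_tokens) / max(len(src_tokens), 1),
        # Number of ASCII punctuation characters on each side.
        "src_punct": sum(ch in string.punctuation for ch in source),
        "tgt_punct": sum(ch in string.punctuation for ch in target),
    }


features = black_box_features("The cat sat .", "Le chat s'est assis .")
```

A feature dictionary of this kind could then be supplied to a statistical model, such as a classifier or regressor, trained on human-annotated translations.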
One problem with existing quality predictors is that an SMT system works at the sentence-level, so quality estimation is performed on each sentence independently. It is therefore difficult to estimate the quality of an entire document, which may be composed of a number of sentences. Often, it is a document-level quality estimate which is needed.
U.S. application Ser. No. 14/244,385, filed Apr. 3, 2014, describes an approach for computing message translation quality. The method employs annotation both at the sentence level and at the document level, and relies on the availability of a relatively large set of annotated documents in the same domain.
There remains a need for a system and method which are able to estimate the quality of translation of a document composed of a series of sentences that were translated individually, without the need for a training set that is annotated at both the sentence level and the message level.