The exemplary embodiment relates to machine translation and finds particular application in connection with a system and method for predicting an optimal translation system for a given user, for use in performing machine translation of text for that user.
With the availability of large amounts of data, customer applications are tending to become more personalized, accommodating the customer's behavior in order to create a unique experience and provide a better service to each user. Such personalization is done through customer modeling, based on the customers' attributes, such as demographics (gender, age, etc.), preferences, and personalities. For example, in the field of information retrieval, search results are often customized based on location or the user's search history. Automatic speech recognition and targeted advertising are other areas where personalization has been successfully employed.
When customers speak different languages, modeling customers and interacting with them involve multilingual aspects. Multilingual customers may express themselves differently, discuss different topics, and even have a somewhat different personality in each language. Machine translation (MT) systems, however, have not been personalized for customers, in part because training statistical machine translation (SMT) models relies on the availability of a large amount of parallel training data for the language pair, i.e., a corpus where each sentence in one language (the source language) is translated to the other language (the target language). Using training data from the same domain as the text being translated has a significant positive impact on the quality of the translation. Domain adaptation allows SMT models to be adapted to a particular topic, genre, or style.
Domain adaptation can be performed, for example, using an organization's corpora. Other methods include data selection (Axelrod, A., et al., “Domain adaptation via pseudo in-domain data selection,” EMNLP '11, pp. 355-362 (2011); Gascó, G., et al., “Does more data always yield better translations?” Proc. 13th Conf. of EACL, pp. 152-161 (2012); Mirkin, S., et al., “Data selection for compact adapted SMT models,” AMTA-2014, pp. 301-314 (2014)), mixture models (Foster, G., et al., “Mixture-model adaptation for SMT,” Proc. WMT, pp. 128-135 (2007)), and table fill-up (Bisazza, A., et al., “Fill-up versus interpolation methods for phrase-based SMT adaptation,” Proc. IWSLT, pp. 136-143 (2011)).
However, although domain adaptation, which adapts translation systems to the topic, genre, or style of the translated material, can be achieved with a smaller amount of training data than is used for building a machine translation model from scratch, such methods do not factor in the user's own particular preferences for one translation over another.
In some cases, it is feasible to deploy several distinct translation systems (or different models of the same system) and then choose among the multiple alternative translations that are produced for a given input text. The choice may be made by a professional translator or by using automatic estimation of the quality of the translations. These approaches assume that the choice of the best system is independent of the actual user receiving the translation. However, this assumption does not account for the customer's personal translation preferences (TPs) when choosing which translation system to use. TPs are a factor when alternative translations are all correct or when each of them is wrong in a different way. In the former case, a preference may be a stylistic choice, and in the latter, a matter of comprehension or a selection of the least intolerable error, in the user's opinion. For example, one user may place a priority on having the syntax correct, even if some words remain untranslated, while another may prefer that all the words are translated, even at the cost of some syntactic flaws. One user may prefer shorter sentences than other users do, or may favor a more formal style, while another would prefer a more casual style. One user may accept reordering errors but be more demanding concerning punctuation. Such differences may be the result of the type of translation system being employed (e.g., syntax-based vs. phrase-based), the specific training data, or many other factors. On the user's side, a preference may be attributed, for example, to native language, age, personality, or other factors.
Users can also utilize their own glossaries (Federico, M., et al., “The Matecat tool,” Proc. COLING 2014: System Demonstrations, pp. 129-132 (2014)), corpora (parallel or monolingual) and translation memories (TM), either shared or private ones (U.S. Pat. No. 8,805,672 to Caskey, et al.). Through Adaptive and Interactive MT, the system learns from the translator's edits, to avoid repeating errors that have already been corrected (Nepveu, L., et al., “Adaptive language and translation models for interactive machine translation,” Proc. EMNLP, pp. 190-197 (2004)). Post-editions can continuously be added to the translator's TM or be used as additional training material, for tighter adaptation to the domain of interest, through batch or incremental training. While useful, such methods take time to implement and may discourage users from trying new translation models. Additionally, most of the focus is on customization for companies or professional translators.
Given a set of two or more translation models to choose from, metrics for inter-rater reliability or inter-annotator agreement, such as Cohen's Kappa (Cohen, J., “A Coefficient of Agreement for Nominal Scales,” Educational and Psychological Measurement, 20(1) pp. 37-46 (1960)), can measure the extent to which annotators agree on which is the better of two or more translations. Disagreement may be the result of an untrained annotator, a task that is not well defined, or the absence of an obvious truth. In the case of the evaluation of translation quality, it is not always straightforward to tell whether one translation is better than another. A single sentence can be translated in multiple correct ways or in multiple incorrect ways. The choice of what is best often depends on the users' preferences. Kappa levels, even when measured on simpler tasks, such as short segments, are often low (Macháček, M., et al., “Evaluating machine translation quality using short segments annotations,” The Prague Bulletin of Mathematical Linguistics, No. 103, pp. 85-110 (2015)).
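By way of illustration only, the agreement metric mentioned above can be sketched as follows. Cohen's Kappa compares the observed agreement between two annotators with the agreement expected by chance from each annotator's label frequencies, i.e., kappa = (p_o - p_e) / (1 - p_e). The function name `cohens_kappa`, the labels "A" and "B" (standing for "translation from system A preferred" and "translation from system B preferred"), and the sample annotations are hypothetical and are not taken from any of the cited works.

```python
from collections import Counter
import math


def cohens_kappa(labels_a, labels_b):
    """Cohen's Kappa for two annotators labeling the same items.

    labels_a, labels_b: equal-length sequences of nominal labels,
    one label per item (here, which translation each annotator preferred).
    """
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("both annotators must label the same non-empty item set")

    n = len(labels_a)

    # Observed agreement p_o: fraction of items given identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement p_e: for each label, the product of the two
    # annotators' marginal probabilities of using it, summed over labels.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    # Kappa is undefined when chance agreement is already total
    # (both annotators used one and the same label throughout);
    # returning NaN here is an assumption of this sketch.
    if p_e == 1.0:
        return math.nan

    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical preferences of two annotators over eight sentence pairs.
annotator_1 = ["A", "A", "B", "B", "A", "B", "A", "A"]
annotator_2 = ["A", "B", "B", "A", "A", "B", "B", "A"]

kappa = cohens_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.2f}")
```

In this hypothetical example the annotators agree on five of the eight items (p_o = 0.625), while chance agreement is p_e = 0.5, giving a Kappa of 0.25. A value this far below 1 would be consistent with the observation that annotators often disagree on which translation is better.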
There remains a need for a system and method for personalized machine translation (PMT).