Many business organizations today have to manage assets, customers and employees distributed across several different countries or regions which are linguistically distinct from one another. As such, accurate and efficient translation of various types of text documents (and/or speech) between the languages used in those countries or regions may become an important factor contributing to the success of the organizations. A large web-based retailer may, for example, sell products in dozens of countries, and the efficient translation of product descriptions, reviews and the like may be required to enhance international sales. Translations may also be required at high volumes for governmental and other non-business entities, such as multi-national political groups like the European Union or the United Nations, scientific/technical journals, and the like. Tourists and other international travelers may also require quick and accurate translations. Automating text/voice translation, if translations of a sufficiently high quality can be obtained using automation, may often represent the most cost-effective approach.
A number of different approaches may be taken towards automated translation, including rule-based techniques, example-based techniques, statistical machine translation (SMT), and, more recently, neural network-based machine translation (NMT). In SMT, translations are usually generated using statistical models whose parameters are derived from existing translated data sets. A given model may comprise, for example, mappings between words or phrases of a source language and words or phrases of the target language (the language into which source language text is to be translated), together with various parameters and/or other metadata regarding the mappings. After such a model has been trained, it is utilized by a decoding algorithm to perform translations in production environments. In NMT, words, phrases or sentences in a source language are typically mapped to high-dimensional vectors within a model using layers of interconnected artificial neurons, and then corresponding words, phrases or sentences in a target language are generated from those vectors.
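As an illustration of the SMT description above, the following is a minimal sketch of a phrase-mapping model and a greedy decoder. It is not the approach of any particular system: the phrase table contents, the probabilities, the longest-match strategy, and the names PHRASE_TABLE and translate are all hypothetical and chosen only for illustration. A production decoder would typically also consider reordering, a language model, and multiple competing hypotheses.

```python
from typing import Dict, List, Tuple

# Hypothetical toy "model": each source phrase maps to candidate target
# phrases, each with an illustrative probability (the "parameters").
PHRASE_TABLE: Dict[Tuple[str, ...], List[Tuple[str, float]]] = {
    ("good", "morning"): [("bonjour", 0.9), ("bon matin", 0.1)],
    ("good",): [("bon", 0.7), ("bien", 0.3)],
    ("morning",): [("matin", 1.0)],
    ("friends",): [("amis", 0.8), ("copains", 0.2)],
}

# Longest source phrase in the table, used to bound the search window.
MAX_PHRASE_LEN = max(len(key) for key in PHRASE_TABLE)


def translate(sentence: str) -> Tuple[str, float]:
    """Greedy, monotone decoding: at each position take the longest
    matching source phrase and its most probable target phrase."""
    words = sentence.lower().split()
    output: List[str] = []
    score = 1.0
    i = 0
    while i < len(words):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(MAX_PHRASE_LEN, len(words) - i), 0, -1):
            candidates = PHRASE_TABLE.get(tuple(words[i:i + n]))
            if candidates:
                target, prob = max(candidates, key=lambda c: c[1])
                output.append(target)
                score *= prob
                i += n
                break
        else:
            # No mapping found: pass the unknown word through unchanged.
            output.append(words[i])
            i += 1
    return " ".join(output), score


if __name__ == "__main__":
    print(translate("Good morning friends"))
```

In this sketch the two-word phrase "good morning" is preferred over the separate one-word mappings because longer matches are tried first; the returned score is simply the product of the chosen phrase probabilities.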
Although NMT has been shown to be superior to other approaches in terms of translation quality in various scenarios, as with all automated translation techniques it has its flaws. One of the potential problems with NMT is that it is more opaque than alternative approaches; that is, it may not be very clear how exactly translations are generated, and it may therefore be harder to correct translation errors. Evaluating and analyzing the capabilities of a trained NMT system at a suitably granular level, so that appropriate proactive responses can be taken with respect to potential translation problems, remains a non-trivial technical challenge.
While embodiments are described herein by way of example for several embodiments and illustrative drawings, those skilled in the art will recognize that embodiments are not limited to the embodiments or drawings described. It should be understood that the drawings and detailed description thereto are not intended to limit embodiments to the particular form disclosed, but on the contrary, the intention is to cover all modifications, equivalents and alternatives falling within the spirit and scope as defined by the appended claims. The headings used herein are for organizational purposes only and are not meant to be used to limit the scope of the description or the claims. As used throughout this application, the word “may” is used in a permissive sense (i.e., meaning having the potential to), rather than the mandatory sense (i.e., meaning must). Similarly, the words “include,” “including,” and “includes” mean including, but not limited to. When used in the claims, the term “or” is used as an inclusive or and not as an exclusive or. For example, the phrase “at least one of x, y, or z” means any one of x, y, and z, as well as any combination thereof.