The present invention relates to machine translation. More specifically, the present invention relates to machine learning a confidence metric associated with machine translation results.
Machine translation refers to the process of receiving an input string in a source language and automatically generating an output string in a target language. The output string will desirably be an accurate and fluent translation of the input string from the source language to the target language.
When a machine translation system translates a set of sentences, the quality of its output typically varies widely. Some sentences are translated accurately and fluently; others are translated adequately, but not necessarily accurately or fluently; and some (hopefully a small set) yield a translation result that is simply incomprehensible.
One primary application of a machine translation system is to aid human translators. As a human translator translates a document, a piece of helper software, sometimes referred to as a translator's workbench, attempts to minimize the human effort involved by consulting a database of past translations and suggesting translations that match the input string within a certain threshold. To perform properly, the translator's workbench must somehow decide which of the translation hypotheses are most useful to the human translator. It has been found that if the translator's workbench chooses the wrong translation hypotheses to display to the user, it may actually waste more time than it saves, because it confuses or misleads the human translator.
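As an illustration of the threshold-based lookup described above, the following is a minimal sketch and not the workbench of any particular system. The names `suggest_translations`, `memory`, and `threshold` are hypothetical, and the use of Python's `difflib.SequenceMatcher` as the similarity measure is an assumption made only for this example.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

# Hypothetical database of past translations: (source string, target string).
MEMORY: List[Tuple[str, str]] = [
    ("Open the file.", "Ouvrez le fichier."),
    ("Close the window.", "Fermez la fenêtre."),
]


def suggest_translations(
    input_string: str,
    memory: List[Tuple[str, str]],
    threshold: float = 0.8,
) -> List[Tuple[float, str, str]]:
    """Return (score, source, target) for stored entries whose source
    string is similar enough to the input, best match first."""
    suggestions = []
    for source, target in memory:
        # Assumed similarity measure: a character-level ratio in [0, 1].
        score = SequenceMatcher(None, input_string, source).ratio()
        if score >= threshold:
            suggestions.append((score, source, target))
    # Highest-scoring suggestions are shown to the translator first.
    suggestions.sort(key=lambda item: item[0], reverse=True)
    return suggestions


if __name__ == "__main__":
    for score, source, target in suggest_translations("Open the file.", MEMORY):
        print(f"{score:.2f}  {source!r} -> {target!r}")
```

A string-similarity score of this kind only measures how close the input is to a stored source string. It says nothing about whether the associated translation is good, which is why choosing the most useful hypotheses remains a separate problem.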
In prior systems, each individual rule used in the machine translation process was given a hand-coded score indicative of how well that rule worked in the machine translation process. However, this scoring required a slow, manual pass through the entire machine translation system, which is extremely expensive and subject to errors, and the resulting scores are difficult to customize to different domains.