The present invention relates to automatic translation systems. In particular, the present invention relates to resolving ambiguities in translations.
In translation systems, a string of characters in one language is converted into a string of characters in another language. One challenge to such translation systems is that a word in one language can have multiple possible translations in the other language depending on the sense of the word. For example, the English word “plant” can be translated either to the Chinese word “gongchang”, which corresponds to the sense of “factory”, or to “zhiwu”, which corresponds to the sense of “vegetation”.
Under the prior art, this problem has been viewed as one of classification in that the word in one language must be classified to a particular sense before it can be translated into the other language. Such classifiers typically operate by examining the context in which the word is found and applying this context to a set of probability models that provide the likelihood that a word will have a particular sense given a particular context.
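As an illustration only, the following minimal Python sketch shows one way such a context-based sense classifier could be built. It assumes a naive Bayes model with add-one smoothing, which is a choice made for this example and is not attributed to any particular prior art system. The function names train_sense_model and sense_probabilities and the sense labels are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_sense_model(examples):
    """Count senses and context words.

    examples: list of (context_words, sense) pairs, where context_words
    is a list of words found near the ambiguous word.
    """
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, sense in examples:
        sense_counts[sense] += 1
        for w in words:
            word_counts[sense][w] += 1
            vocab.add(w)
    return sense_counts, word_counts, vocab

def sense_probabilities(model, context):
    """Return P(sense | context) for every sense seen in training."""
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())
    log_scores = {}
    for sense, n in sense_counts.items():
        # Add-one smoothing over the training vocabulary.
        denom = sum(word_counts[sense].values()) + len(vocab)
        score = math.log(n / total)
        for w in context:
            if w in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[sense][w] + 1) / denom)
        log_scores[sense] = score
    # Normalize in log space so the probabilities sum to one.
    top = max(log_scores.values())
    unnorm = {s: math.exp(v - top) for s, v in log_scores.items()}
    z = sum(unnorm.values())
    return {s: v / z for s, v in unnorm.items()}
```

For the word “plant”, a model trained on a few labeled contexts would assign a higher probability to the “vegetation” sense when a word such as “leaves” appears nearby, and the translation would then be chosen according to the most probable sense.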
Such classifiers are typically trained on hand-labeled data that identifies the sense for an ambiguous word in the text. Hand labeling data is expensive because it requires a significant amount of labor. To solve this problem, one system of the prior art developed by Yarowsky used a bootstrapping method that iterates between training a classifier using labeled data and using the classifier to label data. Under this method, a small set of data is first labeled by an expert. This labeled data is then used to build an initial classifier. The remaining unlabeled data in the corpus is applied to the classifier to classify the data. Classifications with a high probability are accepted as correct and are added to the set of labeled data. The classifier is then retrained using the updated set of labeled data. Under Yarowsky, this bootstrapping method is performed with data of a single language.
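The bootstrapping loop described above can be sketched as follows. This is a minimal illustration under stated assumptions and not Yarowsky's actual implementation: the function name bootstrap, the caller-supplied train and predict functions, the 0.9 acceptance threshold, and the round limit are all hypothetical choices made for the example.

```python
def bootstrap(labeled, unlabeled, train, predict, threshold=0.9, max_rounds=10):
    """Iterate between training a classifier and labeling data with it.

    labeled:   list of (item, label) pairs labeled by an expert
    unlabeled: list of items with no label
    train:     function(list of (item, label)) -> model
    predict:   function(model, item) -> (label, probability)
    """
    labeled = list(labeled)
    pool = list(unlabeled)
    model = train(labeled)  # initial classifier from the expert-labeled seed
    for _ in range(max_rounds):
        accepted, remaining = [], []
        for item in pool:
            label, prob = predict(model, item)
            if prob >= threshold:
                # High-probability classifications are accepted as correct.
                accepted.append((item, label))
            else:
                remaining.append(item)
        if not accepted:
            break  # nothing new was labeled, so retraining would not help
        labeled.extend(accepted)
        pool = remaining
        model = train(labeled)  # retrain on the enlarged labeled set
    return model, labeled, pool
```

The function returns the final classifier together with the enlarged labeled set and any items that never reached the threshold. The threshold controls the trade-off between how quickly the labeled set grows and how many incorrect labels are admitted into it.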
The field of classification extends beyond translation systems, and other systems for training classifiers have been developed. One system, known as co-training, was developed by Blum and Mitchell. Under co-training, two classifiers are constructed in parallel and are used to identify a topic for a web page. One classifier uses text segments from a web page to classify the web page, and the other classifier uses links to the web page to classify the web page. The topics identified for the web pages by the classifiers are then used to retrain the classifiers. Under Blum and Mitchell, both classifiers are trained for the same set of classes or topics. Thus, the classes or topics identified for the web pages from the text of the web pages are the same as the classes identified from the links to the web pages.
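A simplified sketch of the co-training idea is given below. It is an illustration under stated assumptions rather than the exact procedure published by Blum and Mitchell: here each unlabeled item is a pair of views (for example, page text and link text), the more confident of the two classifiers supplies the label, and the function name co_train, the threshold, and the round limit are hypothetical.

```python
def co_train(labeled, unlabeled, train_a, predict_a, train_b, predict_b,
             threshold=0.9, max_rounds=10):
    """Train two classifiers on two views of the same items.

    labeled:   list of ((view_a, view_b), label) pairs
    unlabeled: list of (view_a, view_b) pairs
    train_x:   function(list of (view, label)) -> model
    predict_x: function(model, view) -> (label, probability)
    Both classifiers share the same set of labels.
    """
    labeled = list(labeled)
    pool = list(unlabeled)

    def fit():
        model_a = train_a([(a, y) for (a, _), y in labeled])
        model_b = train_b([(b, y) for (_, b), y in labeled])
        return model_a, model_b

    model_a, model_b = fit()
    for _ in range(max_rounds):
        accepted, remaining = [], []
        for item in pool:
            view_a, view_b = item
            label_a, p_a = predict_a(model_a, view_a)
            label_b, p_b = predict_b(model_b, view_b)
            # Take the label from whichever view is more confident.
            label, p = (label_a, p_a) if p_a >= p_b else (label_b, p_b)
            if p >= threshold:
                accepted.append((item, label))
            else:
                remaining.append(item)
        if not accepted:
            break
        labeled.extend(accepted)
        pool = remaining
        # Each classifier is retrained on labels that may have come
        # from the other view.
        model_a, model_b = fit()
    return model_a, model_b, labeled, pool
```

The point of the design is that a label assigned confidently from one view becomes training data for the classifier on the other view, so each classifier can learn from examples it could not have labeled on its own.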
Although the classification systems described above have been useful, there is a continuing need for improved classifiers that can be trained using as little labeled data as possible.