The present invention relates to classifier training systems. In particular, the present invention relates to uncertainty reduction in collaborative bootstrapping.
Collaborative bootstrapping systems include both co-training and bilingual bootstrapping. Generally, collaborative bootstrapping is iterative and begins with a small amount of labeled data and a large amount of unlabeled data. Two classifiers or types of classifiers are trained from the labeled data. The two classifiers then label some of the unlabeled data, and two new classifiers are trained from all of the labeled data, including the newly labeled data. The process then repeats. During the process, the two classifiers collaborate with each other by exchanging labeled data. Generally, in co-training, the two classifiers have different feature structures, and in bilingual bootstrapping, the two classifiers have different class structures.
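The iterative loop described above can be sketched as follows. This is a minimal illustration in the co-training style, not the method of any cited work: the nearest-centroid classifier, the margin-based confidence score, the shared labeled pool, and all names (`train_centroid`, `train_pair`, `collaborative_bootstrap`, `per_round`, `max_rounds`) are assumptions introduced for the example. Each example is assumed to carry two numeric features, one per classifier, standing in for the two different feature structures.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Example = Tuple[float, float]  # (feature seen by classifier 1, feature seen by classifier 2)
Predictor = Callable[[float], Tuple[str, float]]  # x -> (label, confidence)
Labeled = Tuple[Example, str]


def train_centroid(values: Sequence[float], labels: Sequence[str]) -> Predictor:
    """Fit a 1-D nearest-centroid classifier (an illustrative stand-in)."""
    groups: Dict[str, List[float]] = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    centroids = {y: sum(vs) / len(vs) for y, vs in groups.items()}

    def predict(x: float) -> Tuple[str, float]:
        # Rank classes by distance to their centroid; ties break by label name.
        ranked = sorted(centroids, key=lambda y: (abs(x - centroids[y]), y))
        best = ranked[0]
        if len(ranked) == 1:
            return best, 0.0
        # Confidence: how much closer the best centroid is than the runner-up.
        margin = abs(x - centroids[ranked[1]]) - abs(x - centroids[best])
        return best, margin

    return predict


def train_pair(pool: Sequence[Labeled]) -> Tuple[Predictor, Predictor]:
    """Train two classifiers from the same labeled pool, one per feature."""
    ys = [y for _, y in pool]
    first = train_centroid([x[0] for x, _ in pool], ys)
    second = train_centroid([x[1] for x, _ in pool], ys)
    return first, second


def collaborative_bootstrap(
    labeled: Sequence[Labeled],
    unlabeled: Sequence[Example],
    per_round: int = 1,   # hypothetical knob: examples each classifier labels per round
    max_rounds: int = 10,  # hypothetical stopping rule
) -> Tuple[Predictor, Predictor, List[Labeled]]:
    pool = list(labeled)
    remaining = list(unlabeled)
    for _ in range(max_rounds):
        if not remaining:
            break
        # Train two classifiers from all data labeled so far.
        classifiers = train_pair(pool)
        for view, clf in enumerate(classifiers):
            # Each classifier labels the examples it is most confident about;
            # the shared pool is how the classifiers exchange labeled data.
            ranked = sorted(remaining, key=lambda x: clf(x[view])[1], reverse=True)
            for x in ranked[:per_round]:
                pool.append((x, clf(x[view])[0]))
                remaining.remove(x)
    final_first, final_second = train_pair(pool)
    return final_first, final_second, pool


if __name__ == "__main__":
    seed = [((0.0, 0.0), "A"), ((10.0, 10.0), "B")]
    raw = [(1.0, 2.0), (9.0, 8.0), (2.0, 1.0), (8.0, 9.0), (4.0, 3.0), (6.0, 7.0)]
    _, _, pool = collaborative_bootstrap(seed, raw)
    for example, label in pool:
        print(example, label)
```

Selecting only the most confident predictions in each round is one common design choice in sketches of this kind; other selection rules could be substituted without changing the overall loop.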
Under co-training, which was developed by Blum and Mitchell (1998), two classifiers were constructed in parallel and used to identify a topic for a web page. One classifier classified the web page using text segments from the page, and the other classifier classified it using the anchor texts of links pointing to the page. The topics identified or labeled for the web pages by the classifiers were then used to retrain the classifiers. Other types of co-training were developed by Collins and Singer (1999) and Nigam and Ghani (2000). Under bilingual bootstrapping, which was developed by Li and Li (2002), two classifiers were constructed in parallel, exchanged information with one another, and were used to disambiguate words that had two possible translations in another language.
In certain situations, the classifiers in conventional collaborative bootstrapping fail to improve their classification performance even as they bootstrap more labeled data. Therefore, a system and/or method that addresses this problem would enhance the performance or accuracy of classifiers.