In recent years, machine learning applications, which typically include computer applications learning from a set of examples to perform a recognition task, have becoming increasingly popular. A task typically performed by these types of machine learning applications is classification, such as automatically classifying documents under one or more topic categories. This technology is used in filtering, routing and filing information, such as news articles or web pages, into topical directories or e-mail inboxes. For example, the text documents may be represented using a fixed set of attributes, each representing the number of times a particular key word appears in the document. Using an induction algorithm, also referred to as a classifier learning algorithm, that examines the input training set, the computer ‘learns’ or generates a classifier, which is able to classify a new document under one or more categories. In other words, the machine learns to predict whether a text document input into the machine, usually in the form of a vector of predetermined attributes describing the text document, belongs to a category. When a classifier is trained, classifier parameters for classifying documents are determined by examining a training set of documents that have been assigned labels indicating to which category each training example belongs. After the classifier is trained, the classifier's goal is to predict to which category a case provided to the classifier for classification belongs.
Known classification techniques treat each classification task independently. However, in many practical situations the same document must be classified with respect to multiple sets of categories. For example, A(A1, A2) and B(B1,B2) represent two different classification tasks, each having a pair of categories. For example, the task A may be to classify whether cases are in a ‘surfing’ category (A1) or not (A2), and the task B may represent the ‘Hawaii’ category (B1) or not (B2).
The existing literature on machine learning mainly treats each of the classifications for the examples described above as separate and independent classification problems. This would be fine if there were ample training data for each of the categories. Unfortunately, in practice there may be many documents to learn from that are labeled for classification task A, but few for classification task B. This generally results in either poor prediction accuracy for task B, or spending additional time and resources to obtain more training data to improve the learned classifier for task B.