The following relates to the machine learning arts and related applications such as Internet spam filtering, document relevance assessment, and so forth.
The simplest machine learning problem is a single-task, single-class problem in which a sample is classified as belonging to a class or not belonging to the class. Substantially any binary classifier can be used for such a problem. Conceptually, the binary classifier defines a (hyper)plane in the feature space that divides samples belonging to the class from samples not belonging to the class. The classifier is learned (or trained) based on a training set of samples that are annotated as to whether or not they belong to the class, and the learning optimizes the position of the hyperplane. The learned classifier can then be used as a predictor to predict whether or not an input sample (which is not annotated, in general) belongs to the class.
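By way of illustration only, the learning of a separating hyperplane from annotated samples can be sketched with a perceptron, which is merely one of many suitable binary classifiers. The function names `train_perceptron` and `predict`, the label encoding (+1 for "belongs to the class", -1 otherwise), and the parameters `epochs` and `lr` are assumptions made for this sketch and are not drawn from any particular system.

```python
def train_perceptron(samples, labels, epochs=100, lr=1.0):
    """Learn a hyperplane w.x + b = 0 from annotated samples.

    samples: list of equal-length feature vectors (lists of floats)
    labels:  list of +1 (in the class) or -1 (not in the class)
    """
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in zip(samples, labels):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:
                # Misclassified (or on the plane): move the hyperplane toward x.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
                errors += 1
        if errors == 0:
            break  # every training sample is on the correct side
    return w, b


def predict(w, b, x):
    """Predict +1 or -1 for an unannotated sample x."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1
```

The perceptron reaches a separating hyperplane only when the training set is linearly separable; otherwise the loop simply stops after `epochs` passes.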
A multi-task single-class problem performs multiple such “binary” classification tasks, each for a different class. An example of such a problem is a document annotation system in which a document may, or may not, be annotated with each of a set of classes of the classification system. To illustrate, the classes may be article categories, e.g. “sports”, “politics”, “national news”, “weather”, . . . . A given document may belong to none, one, two, or more of these classes, e.g., an article about a former athlete running for political office may properly belong in both “sports” and “politics”. The simplest approach here is to separately learn a binary classifier for each task. However, this approach cannot leverage task interrelatedness. For example, an article classification of “politics” may increase the likelihood that the article also properly belongs in the “national news” category, but such a correlation will not be captured by independent classifiers that are separately learned for the two classes. Multi-task learning approaches simultaneously learn an integrated predictor that outputs predictions for all tasks of the multi-task problem. The multi-task learning approach can leverage correlations between the tasks.
An illustrative example of multi-task single-class learning is set forth in Faddoul et al., “Boosting Multi-Task Weak Learners with Applications to Textual and Social Data”, in Proceedings of the Ninth Intl Conf. on Machine Learning and Applications (ICMLA) pages 367-72 (2010), which extends adaptive boosting (AdaBoost) to the multi-task setting (MT-Adaboost). The boosted weak classifiers were multi-task “stumps”, which are trees having at each node a decision stump for one task. (A stump can be thought of as a one-level decision tree which has a test node and decision leaves.) In this approach, suitable re-weighting of examples from different tasks, without label correspondences or shared examples, was used to leverage the local relatedness of tasks.
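For concreteness, an ordinary single-task decision stump (one test node, two decision leaves) can be sketched as follows. This is not the multi-task stump of Faddoul et al.; it only illustrates the one-level tree described above, fitted by exhaustive search over thresholds to minimize weighted error as is typical for boosted weak learners. The names `fit_stump` and `stump_predict`, and the +1/-1 label encoding, are assumptions of the sketch.

```python
def fit_stump(xs, ys, weights=None):
    """Fit a one-level decision tree to labels in {+1, -1}.

    Returns (weighted_error, feature_index, threshold, polarity), or None
    if no feature takes two distinct values. The stump predicts `polarity`
    when x[feature_index] > threshold and `-polarity` otherwise.
    """
    n = len(xs)
    if weights is None:
        weights = [1.0 / n] * n  # uniform weights, as in the first boosting round
    best = None
    for f in range(len(xs[0])):
        values = sorted(set(x[f] for x in xs))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for t in [(a + b) / 2 for a, b in zip(values, values[1:])]:
            for polarity in (1, -1):
                err = sum(
                    w for x, y, w in zip(xs, ys, weights)
                    if (polarity if x[f] > t else -polarity) != y
                )
                if best is None or err < best[0]:
                    best = (err, f, t, polarity)
    return best


def stump_predict(stump, x):
    """Apply a fitted stump to a single sample."""
    _, f, t, polarity = stump
    return polarity if x[f] > t else -polarity
```

In a boosting loop, `weights` would be updated after each round so that the next stump concentrates on the samples the current ensemble misclassifies.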
A single-task multi-class problem performs a single classification task, but in this case the output is not binary but rather includes three or more possibilities. For example, rather than deciding whether a sample belongs to class A or not, a multi-class problem may decide to which class (or, in a multi-label setting, which class or classes) of the group of classes A, B, C, . . . the sample belongs. The various possible outputs are sometimes called “labels”, and so the multi-class problem assigns one label (or one or more labels, in a multi-label setting) to the sample as selected by the classifier from the set of labels. Some intrinsically multi-class classifiers are known; additionally, a set of binary classifiers for the various classes can be employed as a multi-class classifier by using a combinational strategy such as “one versus all”.
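The “one versus all” strategy mentioned above can be sketched as follows: one binary scorer is trained per label, treating that label's samples as positive and all others as negative, and the label whose scorer is most confident is selected. The toy centroid-based scorer and all names here (`train_centroid_scorer`, `train_one_vs_all`, `predict_one_vs_all`) are assumptions chosen to keep the sketch self-contained; any binary learner that returns a real-valued score could be substituted.

```python
def train_centroid_scorer(samples, is_positive):
    """Toy binary learner: higher score means closer to the positive centroid."""
    def centroid(points):
        return [sum(c) / len(points) for c in zip(*points)]

    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    pos = centroid([x for x, p in zip(samples, is_positive) if p])
    neg = centroid([x for x, p in zip(samples, is_positive) if not p])
    return lambda x: dist2(x, neg) - dist2(x, pos)


def train_one_vs_all(samples, labels, train_binary=train_centroid_scorer):
    """Train one binary scorer per label (that label versus all the others)."""
    return {
        label: train_binary(samples, [y == label for y in labels])
        for label in sorted(set(labels))
    }


def predict_one_vs_all(scorers, x):
    """Assign the single label whose binary scorer gives the highest score."""
    return max(scorers, key=lambda label: scorers[label](x))
```

This sketch assigns exactly one label per sample and assumes at least two distinct labels in the training set; a multi-label variant would instead return every label whose score exceeds a threshold.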
Finally, a multi-task multi-class problem includes multiple tasks, at least one of which is a multi-class task. A further distinction that can be made here is whether the label sets of the various tasks overlap. (In this context, a single-class task can be viewed as having a “label set” of two labels: the label “belongs to the class” and the label “does not belong to the class”.) The assumption of no label overlap between tasks (i.e., label distinctness or no label correspondence) maximizes versatility. This assumption can be made even if some tasks actually do share some labels, by treating the labels as distinct in the different tasks. However, even with label distinctness there remains the possibility of correlations between tasks. (Indeed, if two tasks actually share a common label which is treated as distinct in the learning of the two tasks, it is likely that one task outputting the common label will strongly correlate with the other task outputting the common label.) Thus, it is advantageous to apply a multi-task learning framework to a multi-task multi-class problem.
One approach for machine learning is the decision tree (DT) approach. In a DT, at each node a decision rule is learned that optimally splits the available training data, and the processing proceeds recursively from node to node, splitting at each node, until a decision node is reached, which is a leaf of the DT. Multi-class decision tree learning algorithms such as the C4.5 algorithm are known, and can be combined with adaptive boosting or bagging. See, e.g., Quinlan, “Bagging, Boosting, and C4.5”, AAAI-96 pages 725-730 (1996); Schapire et al., “Improved Boosting Algorithms Using Confidence-rated Predictions”, Machine Learning vol. 37 pages 297-336 (1999). The information gain (IG) is sometimes used as the criterion for optimizing the split performed by each decision rule.
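As a brief illustration of the information gain criterion, the gain of a candidate split is the entropy of the labels at the node minus the size-weighted entropy of the labels in the resulting child nodes. The sketch below uses the standard Shannon entropy in bits; the function names `entropy` and `information_gain` are chosen for this illustration only.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent, children):
    """Entropy of the parent labels minus the weighted entropy of the children.

    parent:   list of labels reaching the node
    children: list of label lists, one per branch of the candidate split
    """
    n = len(parent)
    weighted = sum(len(ch) / n * entropy(ch) for ch in children if ch)
    return entropy(parent) - weighted
```

A split that separates two equally frequent classes perfectly yields a gain of one bit, whereas a split whose children have the same class proportions as the parent yields a gain of zero; a decision rule would be chosen to maximize this quantity over the candidate splits.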
In spite of substantial work in machine learning as briefly outlined above, there remains a need for multi-task multi-class learning approaches that are applicable in the most versatile context of label distinctness (that is, which do not assume sharing of labels between tasks) and that effectively leverage local relatedness between tasks, which may vary across the learning space.