The following relates to the multi-label classification arts, document archiving and retrieval arts, database arts, and related arts.
Single-label classification is concerned with learning from a set of examples x_i ∈ X, each associated with a corresponding label y_i ∈ Y, where Y is a set of k disjoint labels. A classifier for the special case of k=2 is known as a binary classifier, whereas for k>2 the classifier is termed a multi-class classifier.
For a given input object x_IN, the output of a single-label classifier is a single label y. A single-label classifier cannot accommodate more complex situations in which an input object x_IN may be meaningfully associated with more than one label. In document classification, by way of illustration, a single-label classifier can only assign the document to a single document class, such as “sports” or “politics” or “law”. However, it is not uncommon for a document to be meaningfully associated with more than one document class. By way of further example, a document pertaining to legal difficulties of a famous athlete might be meaningfully associated with both “sports” and “law”, as well as perhaps some other document class or classes such as “celebrities”.
Multi-label classifiers can accommodate such situations. A multi-label classifier assigns to each x_i ∈ X one or more labels selected from a set of labels Y = {y_j}, j = 1, ..., k, where the various labels y_j are no longer mutually exclusive; that is, a single given object x_IN may be assigned two or more of the labels y_j ∈ Y. For example, in one suitable notation x_i is assigned a set y_i = (y_i1, ..., y_ik), where each element y_ij associates x_i with the j-th label or class of the set of labels Y.
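By way of illustration only, the label-set notation above can be sketched in Python as a binary indicator tuple. The binary encoding (element j is 1 if the j-th label is assigned, otherwise 0), the four-label set, and the helper names `to_indicator` and `from_indicator` are assumptions made for this sketch and are not taken from the text.

```python
# Hypothetical label set Y with k = 4, reusing the document classes named above.
LABELS = ["sports", "politics", "law", "celebrities"]

def to_indicator(assigned, labels=LABELS):
    """Encode a set of assigned labels as a tuple (y_i1, ..., y_ik) of 0/1 values."""
    unknown = set(assigned) - set(labels)
    if unknown:
        raise ValueError(f"labels not in Y: {sorted(unknown)}")
    return tuple(1 if label in assigned else 0 for label in labels)

def from_indicator(y, labels=LABELS):
    """Decode an indicator tuple back into the list of assigned labels."""
    return [label for label, bit in zip(labels, y) if bit]

# A document about the legal difficulties of a famous athlete.
y_doc = to_indicator({"sports", "law", "celebrities"})
print(y_doc)
print(from_indicator(y_doc))
```

Because the labels are not mutually exclusive, any number of elements of the tuple may be set at once.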
Multi-label classification is more complex than single-label classification. A given input x_IN may in general be assigned one, two, or more classes, and any two classes y_j, y_j′ ∈ Y with j′ ≠ j may be wholly independent (i.e., assigning x_IN to class y_j has no impact on whether x_IN is also assigned to y_j′); or may be strongly correlated (i.e., assigning x_IN to class y_j strongly implies x_IN is also assigned to y_j′) or strongly anti-correlated (i.e., assigning x_IN to class y_j strongly implies x_IN is not also assigned to y_j′). The complexity generally increases with the size of the set of labels Y = {y_j}, j = 1, ..., k, as the possibilities for correlations increase. For example, in a larger set of labels there may be correlations involving three or more labels.
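One simple way to look for such pairwise dependencies in a training set is to compare how often a label occurs overall with how often it occurs among examples that carry another label. The following Python sketch does this on made-up label sets; the data and the function names `base_rate` and `conditional_rate` are hypothetical and serve only to illustrate the distinction drawn above.

```python
def base_rate(Y, j_prime):
    """Fraction of all label sets that contain j_prime."""
    return sum(1 for y in Y if j_prime in y) / len(Y)

def conditional_rate(Y, j, j_prime):
    """Fraction of label sets containing j that also contain j_prime."""
    with_j = [y for y in Y if j in y]
    if not with_j:
        return None  # label j never occurs, so the rate is undefined
    return sum(1 for y in with_j if j_prime in y) / len(with_j)

# Made-up training label sets, for illustration only.
Y_train = [
    {"sports", "celebrities"},
    {"sports", "celebrities"},
    {"law"},
    {"politics", "law"},
    {"sports"},
    {"politics"},
]

# A conditional rate well above the base rate suggests correlation.
print(base_rate(Y_train, "celebrities"), conditional_rate(Y_train, "sports", "celebrities"))
# A conditional rate well below the base rate suggests anti-correlation.
print(base_rate(Y_train, "law"), conditional_rate(Y_train, "sports", "law"))
```

A pairwise check of this kind does not capture correlations involving three or more labels.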
Various techniques for multi-label classification are known.
One-against-all (1AA) techniques transform the problem into k separate binary problems and train k independent single-label binary classifiers h_i, each deciding whether to label an input respective to one label of the set of labels Y = {y_j}, j = 1, ..., k. The outputs of the k binary classifiers are ranked or thresholded and combined to generate the multi-label result. These techniques are fast and easy to implement, as they decompose the multi-label classification problem into k single-label binary classifications. However, the k binary classifiers operate independently, which imposes a strong assumption of independence among the labels Y = {y_j}, j = 1, ..., k.
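The one-against-all decomposition can be sketched as follows. The text does not specify the binary classifier, so this sketch assumes a nearest-centroid scorer as a stand-in; the toy feature vectors, the zero threshold, and all function names are likewise assumptions for illustration. It also assumes every label has at least one positive and one negative training example.

```python
def centroid(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))

def train_one_against_all(X, Y, k):
    """Train k independent binary models; model j only sees 'has label j' or not."""
    models = []
    for j in range(k):
        pos = [x for x, y in zip(X, Y) if j in y]
        neg = [x for x, y in zip(X, Y) if j not in y]
        models.append((centroid(pos), centroid(neg)))
    return models

def label_scores(models, x):
    """Score for label j is positive when x lies closer to the positive centroid."""
    return [sq_dist(x, mu_neg) - sq_dist(x, mu_pos) for mu_pos, mu_neg in models]

def predict(models, x, threshold=0.0):
    """Threshold the k independent scores and combine them into a label set."""
    return [j for j, s in enumerate(label_scores(models, x)) if s > threshold]

# Toy data: three features, k = 3 labels indexed 0..2.
X_train = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0]]
Y_train = [{0}, {1}, {2}, {0, 1}]

models = train_one_against_all(X_train, Y_train, k=3)
print(predict(models, [1, 1, 0]))
print(predict(models, [0, 0, 2]))
```

Each model is fitted and applied without reference to the other labels, which is where the independence assumption enters.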
Length-based one-against-all (L-1AA) classification is a modification of the one-against-all approach, which additionally trains a length predictor h_L on the composite training set {x_i, |y_i|}, where |y_i| is the number of labels in y_i. This approach assumes a probabilistic setting for all the binary classifiers h_i, in which the length predictor first predicts a length L_i for a given x_i, and x_i is then labeled with the classes having the top L_i scores. This modification can improve performance and is scalable. However, the performance is sensitive to the performance of the length predictor h_L, and the approach retains the assumption of independence of the classes that is made in the one-against-all approach.
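The length-based selection step can be sketched as below. The text does not say how h_L is built, so this sketch assumes a 1-nearest-neighbour lookup over the training inputs as a stand-in, and it takes the per-label scores as given; the scores shown are hypothetical values, and the function names are illustrative.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))

def train_length_predictor(X, Y):
    """Stand-in h_L: store (x_i, |y_i|) pairs for a 1-nearest-neighbour lookup."""
    return [(x, len(y)) for x, y in zip(X, Y)]

def predict_length(h_L, x):
    """Return the label count of the training input closest to x."""
    return min(h_L, key=lambda pair: sq_dist(pair[0], x))[1]

def select_top(scores, length):
    """Keep the indices of the `length` highest-scoring labels."""
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:length])

# Toy training set: inputs and their label sets (k = 3 labels indexed 0..2).
X_train = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0]]
Y_train = [{0}, {1}, {2}, {0, 1}]
h_L = train_length_predictor(X_train, Y_train)

# Hypothetical per-label scores for one input, as produced by k binary classifiers.
scores = [0.9, 0.7, 0.1]
x = [1, 1, 0]
L = predict_length(h_L, x)
print(L, select_top(scores, L))
```

An error in the predicted length directly adds or drops labels, which illustrates why the result is sensitive to h_L.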
Unique multi-class (UMC) multi-label classification approaches take each label set y_i present in the training set T = {x_i, y_i} as a unique label, and perform single-label classification using this set of composite labels. The approach is in principle easy to implement, since it employs a single single-label classifier. However, the number of disjoint composite labels for this classifier can be very large, since it equals the number of unique label combinations y_i in the training set T. By way of example, even for a small value of k=4 for the set of labels Y={1,2,3,4}, the number of composite labels can be as high as 15. The training set T may have few (or even no) examples of some possible composite labels, and there is no way to generalize to encompass any composite labels that are not included in the training set. This technique also assumes independence of the labels of the set of labels Y.
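The count of 15 for k=4 is the number of non-empty subsets of Y, namely 2^4 - 1. The Python sketch below enumerates these subsets and shows how label sets seen in training map to composite classes; the training label sets and the function names are made up for illustration.

```python
from itertools import combinations

def composite_labels(Y_train):
    """Map each distinct label set seen in training to one composite class id."""
    mapping = {}
    for y in Y_train:
        mapping.setdefault(frozenset(y), len(mapping))
    return mapping

def max_composite_labels(k):
    """Number of non-empty subsets of a k-label set."""
    return 2 ** k - 1

Y = [1, 2, 3, 4]
all_subsets = [c for r in range(1, len(Y) + 1) for c in combinations(Y, r)]
print(len(all_subsets), max_composite_labels(len(Y)))

# Made-up training label sets: only three distinct combinations occur.
Y_train = [{1}, {2}, {1, 2}, {1}]
mapping = composite_labels(Y_train)
print(len(mapping))

# A combination absent from training has no composite class and cannot be predicted.
print(frozenset({3, 4}) in mapping)
```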
Collective multi-label (CML) classification is intended to address situations in which the set of labels Y includes correlated labels. It utilizes multi-label conditional random field (CRF) classification models that directly parameterize label co-occurrences. This approach performs well on small datasets, but the learning and inference are complex and are not scalable.
Latent variables in Y (Latent-Y) classification techniques discover relevant groups or “topics” among the set of labels Y and replace the discovered groups with single “topic” labels. This approach can also accommodate correlations. Most techniques require a priori knowledge or selection of the number of topics, and information loss can be incurred when decoding from topics to labels to generate the final multi-label assignments. Latent variables in (X, Y) (Latent-X-Y) techniques are an extension aimed at discovering topics in the (X, Y) space, using a multi-label informed latent semantic indexing (MLSI) algorithm which preserves the information of inputs and captures correlations between multiple outputs. The recovered “latent semantics” thus incorporate the annotated category information and can be used to improve the prediction accuracy. However, implementation is difficult, and scalability is limited.
The various multi-label classification techniques each have advantages and disadvantages, which flow from the assumptions made in each technique. For example, techniques such as one-against-all that make an independence assumption are fast to implement because the multi-label classification is reduced (or at least partially reduced) to a set of single-label classification problems. However, these techniques can be expected to perform poorly in the presence of correlations or anti-correlations between labels. On the other hand, techniques such as CML or Latent-Y techniques are designed to accommodate correlations between labels, but at the expense of higher complexity and possible information loss (e.g., when a subset of labels is replaced by a single “topic” label).
The following discloses methods and apparatuses for system monitoring and other time series processing which accommodate delayed disclosure.