Technical Field
The present invention relates to label filtering and more particularly to a label filtering system for preselecting a number of labels for consideration by a multi-label classifier.
Description of the Related Art
With the advent of big data, classifiers can make increasingly fine-grained distinctions between classes, leading to the rise of classification problems with very large numbers of labels. Data sets with tens or hundreds of thousands of labels are already becoming standard benchmarks in domains such as, e.g., object recognition and text classification. Additionally, multi-label problems with millions of labels have recently been tackled in the literature. As more and more data becomes available, the number of problems with large label sets, as well as the number of labels per problem, is going to increase.
One consequence of the explosion in the number of labels is a significant increase in the test-time (production-time) computational burden. Most approaches to multiclass and multi-label classification, such as, e.g., the popular one-vs-all scheme or the Crammer-Singer multiclass SVM, have to systematically evaluate the match between each label and the test instance in order to make a prediction, leading to a test-time complexity that is linear in the number of labels. As the number of labels grows, the systematic evaluation of all labels becomes prohibitive, especially for applications that require real-time response and/or have limited computational resources. Thus, in problems with a large number of labels, most multi-label and multiclass techniques incur a significant computational burden at test time.
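By way of illustration only, the linear test-time cost of a one-vs-all scheme can be sketched as follows. The sketch assumes one linear scorer (weight vector) per label and a fixed decision threshold; the function names, the threshold, and the example weights are hypothetical and are not taken from any particular classifier described herein.

```python
from typing import List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def one_vs_all_predict(
    weights: Sequence[Sequence[float]],
    x: Sequence[float],
    threshold: float = 0.0,  # assumed decision threshold, for illustration
) -> Tuple[List[int], int]:
    """Score every label against the test instance x.

    Returns the indices of labels whose score exceeds the threshold,
    together with the number of label evaluations performed. The second
    value always equals len(weights): one evaluation per label.
    """
    predicted: List[int] = []
    evaluations = 0
    for label, w in enumerate(weights):
        evaluations += 1  # one scorer evaluated per label
        if dot(w, x) > threshold:
            predicted.append(label)
    return predicted, evaluations


# Hypothetical three-label example with two-dimensional features.
example_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
example_instance = [2.0, -1.0]
labels, cost = one_vs_all_predict(example_weights, example_instance)
```

In this sketch the number of evaluations equals the number of labels regardless of how many labels are ultimately predicted, which is the burden that grows prohibitive as the label set grows.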