1. Field of the Invention
The invention disclosed herein generally pertains to efficient information management. More particularly, the invention pertains to reducing redundancy in information management processes by combining and sharing models to create efficient learning systems for multimedia content annotation, management, access or distribution.
2. Description of the Related Art
“Multi-label classification” refers to the simultaneous categorization of a given input into a set of multiple labels. The need for multi-label classification in large-scale learning systems is ever increasing. Examples include diverse applications such as music categorization, image and video annotation, text categorization and medical applications. Detecting multiple semantic labels from images based on their low-level visual features and automatically classifying text documents into a number of topics based on their textual context are some of the typical multi-label applications.
In practice, multi-label data collections may contain hundreds, thousands or millions of data items. Such items may be associated with as many different labels, although the number of labels actually used is typically some subset. Current multi-class learning solutions assume that each data item can be associated with only a single class. Accordingly, the algorithms used are not efficient for handling multiple labels. Other existing multi-label learning approaches call for learning an independent classifier for every possible label using all the data samples and the entire feature space. As each label set contains some redundant information (e.g., the label “mountain” often overlaps with the label “sky”), there is a great opportunity to improve the accuracy of a learning system that can simultaneously learn and predict multiple labels.
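The label redundancy noted above (e.g., “mountain” and “sky”) can be made concrete by counting how often labels co-occur. The following is only a minimal sketch: the function names `label_cooccurrence` and `conditional` and the toy annotations are hypothetical illustrations and are not drawn from any of the cited references.

```python
from collections import Counter
from itertools import combinations


def label_cooccurrence(label_sets):
    """Count single-label and pairwise label occurrences over all items."""
    single = Counter()
    pair = Counter()
    for labels in label_sets:
        uniq = sorted(set(labels))
        single.update(uniq)
        # Each unordered pair is stored once, in sorted order.
        pair.update(combinations(uniq, 2))
    return single, pair


def conditional(single, pair, a, b):
    """Estimate P(b | a): the fraction of items labeled a that also carry b."""
    key = tuple(sorted((a, b)))
    return pair[key] / single[a] if single[a] else 0.0


# Hypothetical toy annotations, for illustration only.
annotations = [
    {"mountain", "sky"},
    {"mountain", "sky", "tree"},
    {"sky"},
    {"car", "road"},
    {"mountain", "sky"},
]
single, pair = label_cooccurrence(annotations)
p_sky_given_mountain = conditional(single, pair, "mountain", "sky")
p_mountain_given_sky = conditional(single, pair, "sky", "mountain")
```

A conditional estimate close to 1 indicates that one label carries much of the information needed to predict the other, which is the kind of redundancy that independently trained per-label classifiers do not exploit.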
To speed up multi-label classification without performance degradation, one approach is to exploit the information redundancy in the learning space. To this end, researchers have proposed several ensemble learning algorithms based on random feature selection and data bootstrapping. Examples include those described by R. E. Schapire, “Using output codes to boost multiclass learning problems,” Proceedings of the Fourteenth International Conference on Machine Learning, pages 313-321, San Francisco, Calif., USA, 1997, Morgan Kaufmann Publishers Inc.; D. Tao, X. Tang, X. Li, and X. Wu, “Asymmetric bagging and random subspace for support vector machines-based relevance feedback in image retrieval,” IEEE Trans. Pattern Anal. Mach. Intell., 28(7):1088-1099, 2006; U.S. Pat. No. 6,453,307, “Method and apparatus for multi-class, multi-label information categorization,” Robert E. Schapire, Yoram Singer; U.S. Pat. No. 6,662,170, “System and method for boosting support vector machines,” Byron Edward Dom, Jianchang Mao, Dmitry Pavlov; and U.S. Pat. No. 7,139,754, “Method for multi-class, multi-label categorization using probabilistic hierarchical modeling,” Cyril Goutte, Eric Gaussier.
The random subspace method (RSM) (see “The random subspace method for constructing decision forests,” T. K. Ho, IEEE Trans. Pattern Anal. Mach. Intell., 1998) takes advantage of both feature space bootstrapping and model aggregation, and combines multiple base models, each learned on a randomly selected subset of the feature space. Although RSM considerably reduces the size of the feature space in each base model, and model computation is more efficient than for a classifier built directly on the entire feature space, certain problems are known.
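The general idea of the random subspace method, namely training each base model on a random subset of the features and aggregating the results, may be sketched as follows. This is an illustrative sketch only: a nearest-centroid base model and majority voting are assumed here for brevity (the cited work constructs decision forests), and all function names, parameters and data values are hypothetical.

```python
import random
from collections import Counter


def train_centroid_model(X, y, feature_idx):
    """Fit a nearest-centroid base model restricted to the given feature indices."""
    sums, counts = {}, Counter()
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(feature_idx))
        for j, f in enumerate(feature_idx):
            acc[j] += row[f]
        counts[label] += 1
    centroids = {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}
    return feature_idx, centroids


def predict_one(model, row):
    """Return the class whose centroid is closest within the model's subspace."""
    feature_idx, centroids = model

    def dist(centroid):
        return sum((row[f] - centroid[j]) ** 2 for j, f in enumerate(feature_idx))

    return min(centroids, key=lambda lab: dist(centroids[lab]))


def train_rsm(X, y, n_models=15, subspace_size=2, seed=0):
    """Train n_models base models, each on a random subset of the features."""
    rng = random.Random(seed)
    n_features = len(X[0])
    return [
        train_centroid_model(X, y, sorted(rng.sample(range(n_features), subspace_size)))
        for _ in range(n_models)
    ]


def predict_rsm(models, row):
    """Aggregate the base-model predictions by majority vote."""
    votes = Counter(predict_one(m, row) for m in models)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated classes in four features.
X = [[0.1, 0.3, 0.2, 0.0], [0.4, 0.1, 0.0, 0.2],
     [9.8, 10.1, 9.9, 10.0], [10.2, 9.7, 10.3, 9.9]]
y = [0, 0, 1, 1]
models = train_rsm(X, y)
low_vote = predict_rsm(models, [0.5, 0.2, 0.1, 0.3])
high_vote = predict_rsm(models, [9.0, 9.5, 10.0, 9.0])
```

Each base model in this sketch sees only two of the four features, so it is cheaper to fit than a model on the full feature space. Note that the ensemble still predicts a single class per input; applying it to multiple labels would require one such ensemble per label.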
By combining bagging (bootstrap aggregation) and RSM, Breiman developed a more general algorithm called “random forest.” Reference may be had to “Random Forests,” Breiman, L., Machine Learning, 45, 5-32 (2001). This technique aims to aggregate an ensemble of unpruned classification/regression trees using both bootstrapped training examples and random feature selection in the tree induction process. Random forest can be learned more efficiently than the baseline method, and it has empirically demonstrated superiority compared to a single tree classifier. In the multi-label scenario, however, the above algorithms must either learn an independent classifier for every label or assume that the underlying base models can produce multi-label predictions. In doing so, they ignore the important fact that different labels are not independent of, or orthogonal to, one another.
In the machine learning community, the idea of sharing common information among multiple labels has been investigated through methods known as “multi-task learning.” One example is described by R. Ando and T. Zhang in the publication entitled “A framework for learning predictive structures from multiple tasks and unlabeled data,” Technical Report RC23462, IBM T. J. Watson Research Center, 45, 2004. These methods handle the multi-label classification problem by treating each label as a single task and generalizing the correlations among multiple tasks using neural networks, regularization learning methods, etc. These approaches often use the single-task learners in an iterative process and require a complex inference effort to estimate the task parameters.
The problem of exploring and leveraging the connections across labels has also been seen in many other research areas, such as image annotation and object recognition. However, the foregoing methods do not provide mechanisms to reduce the redundancy among labels and improve computational efficiency.
Connections across labels have likewise been explored in areas such as neural networks (see U.S. Pat. No. 6,324,532, “Method and apparatus for training a neural network to detect objects in an image,” issued to Clay Douglas Spence, Paul Sajda) and text categorization (see U.S. Pat. No. 6,253,169, “Method for improvement accuracy of decision tree based text categorization,” issued to Chidanand Apte, Frederick J. Damerau, Sholom M. Weiss). One such example is the domain of image annotation and object recognition, which aims to detect a number of scenes and objects in given images. See, for instance, “The MediaMill TRECVID 2004 Semantic Video Search Engine,” C. Snoek, M. Worring, J. Geusebroek, D. Koelma, and F. Seinstra, Proceedings of TRECVID, 2004, in which a semantic value chain architecture including a multi-label learning layer called “context link” was proposed. In “Mining relationship between video concepts using probabilistic graphical model,” R. Yan and A. G. Hauptmann, Proceedings of IEEE International Conference on Multimedia and Expo (ICME), 2006, the authors studied various multi-label relational learning approaches via a unified probabilistic graphical model representation.
However, these methods need to construct an additional, computationally intensive layer on top of the base models, and they do not provide any mechanism to reduce the redundancy among labels other than utilizing the multi-label relations.
Accordingly, what are needed are techniques for reducing redundancy in information management processes by combining and sharing models to create efficient learning systems for multimedia content annotation, management, access or distribution.