1. Field of the Invention
The invention disclosed and claimed herein generally pertains to selection of resources from among various methods of automatic content tagging in large-scale systems. More particularly, the invention pertains to a method and apparatus for automatically classifying multimedia artifacts by scoring and selecting appropriate ontologies from amongst all possible sets of ontologies, such as by recursive routing selection. Even more particularly, the invention pertains to a method of the above type wherein the semantic tagging of the multimedia artifact is improved or enhanced by applying only classifiers selected from the selected ontologies, based on the context of the multimedia artifact.
2. Description of the Related Art
Vast amounts of multimedia content are being created in many areas of science and commerce, creating a need for new automatic data analysis and knowledge discovery tools for more efficient data management. Semantic classification algorithms are being developed to add metadata and facilitate semantic search. A principal challenge in semantic content modeling is the complexity of the modeled domain. Projecting multimedia content into a high-dimensional semantic space requires a suitable set of semantic classifiers that effectively and efficiently capture the underlying semantics of the data stream. Classifiers are often mapped to classes belonging to a specific ontology and specific domain, such as Broadcast News Video, Surveillance Video, Medical Imaging, Personal Photos, and the like.
As processing power increases and data size increases exponentially, the number of classes that need to be detected, and can be detected, is also increasing considerably. Moreover, as the number and variety of automatic content classifiers increases, so does the entropy or randomness of the respective analysis system. Evaluating thousands of existing semantic concepts against terabytes of data is computationally expensive and redundant, and results in a computational bottleneck, or in an increased need for human experts who can select the appropriate set of classifiers to be automatically evaluated against a content item. Adding automatic classifiers results in a less efficient and less effective system if the proper context of the automatic tagging is not included. At present, no solution exists for efficiently traversing the set of existing ontologies, and for the smart selection of classifiers associated with the respective ontologies, in order to accommodate a large scale of classifiers and data. Moreover, automatic classifiers for the same class can differ significantly in the context of different ontologies; for example, Person Activity in Surveillance Videos versus Person Activity in Broadcast News. The known solutions for selecting the most appropriate classifiers either adopt (or build) a single ontology, or else evaluate against a manually selected set of concepts within all available ontologies. This can compromise the quality of the content retrieval, since weak and redundant classifiers can have the same relevance in the semantic tagging as the more reliable ones.
In recent years, a substantial amount of effort has been put into designing semantic concept detectors for various concepts of interest in different domains. The Large Scale Concept Ontology for Multimedia (LSCOM) initiative, described in “Large Scale Concept Ontology for Multimedia,” IEEE Trans. Multimedia, July 2006, has identified nearly 1000 concepts of interest for visual analysis. For example, Kender and Naphade, in “Visual concepts for news story tracking: Analyzing and exploiting the NIST TRECVID video annotation experiment,” IEEE Proc. Int. Conf. Computer Vision and Pattern Recognition (CVPR), 2005, exploited the relationships between concepts, and used various criteria to determine the maturity of LSCOM concept definition and ontology completeness. Also, performance of semantic classifiers can be enhanced using context, as shown on a moderate-size lexicon by Naphade and Smith in “Mining the Semantics of Concepts and Context,” Intl. Workshop on Multimedia Data Management (MDM-KDD), 2003. However, ontologies offer varying interpretations of concepts when used within context.
Moreover, analysis of vast amounts of image and video data available on Internet blogs and web chat rooms has produced a need to analyze multiple modalities such as associated text, audio, speech, URL and XML data. This type of data is needed to automatically place a multimedia artifact in a context, and to offer clues that will result in correct ontology selection. For example, Benitez, Smith, and Chang introduced a multimedia knowledge representation framework of semantic and perceptual information in “MediaNet: A Multimedia Information Network for Knowledge Representation,” Proc. SPIE 2000 Conference on Internet Multimedia Management Systems (IS&T/SPIE-2000), Vol. 4210, 2000.
Reconciling ontology entries to create a normalized omniscient ontology may be virtually impossible. Thus, choosing the right set of ontologies and the right set of classifiers for a multimedia artifact is one of the key problems with regard to large simultaneous information feeds of video streams that need to be analyzed and indexed. Statistical approaches that determine both classifiers and ontologies simultaneously need exhaustive evaluation and pruning in order to make an ontology manageable for a large number of classes.
In the absence of a solution that addresses the above situation, selecting the right set of classifiers for multimedia artifacts, based on the appropriate context and determined by the appropriate ontologies in a large-scale classification system, will continue to be a problem.