The growing amount of digital information in the form of video, images, text, and other multimedia documents is driving the need for more effective methods for indexing, searching, categorizing, and organizing that information. Recent advances in content analysis, feature extraction, and classification are improving capabilities for effectively searching and filtering multimedia documents. However, a significant gap remains between the low-level feature descriptions that can be automatically extracted from multimedia content, such as colors, textures, shapes, and motions, and the semantic descriptions, such as objects, events, scenes, and people, that are meaningful to users of multimedia systems.
The problem of multimedia indexing can be addressed by a number of approaches that require manual, semiautomatic, or fully automatic processing. One approach uses annotation or cataloging tools that allow humans to manually ascribe labels, categories, or descriptions to multimedia documents. For example, authors M. Naphade, C.-Y. Lin, J. R. Smith, B. Tseng, and S. Basu, in a paper entitled “Learning to Annotate Video Databases,” IS&T/SPIE Symposium on Electronic Imaging: Science and Technology—Storage & Retrieval for Image and Video Databases X, San Jose, Calif., January, 2002, describe a video annotation tool that allows labels to be assigned to shots in video. The authors also teach a semiautomatic method for assigning labels based on active learning. Fully automatic approaches are also possible. For example, authors M. Naphade, S. Basu, and J. R. Smith teach, in “A Statistical Modeling Approach to Content-based Video Retrieval,” IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP-2002), May, 2002, methods for automatically assigning labels to video content based on statistical modeling of low-level visual features. The automatic labeling technique allows video to be searched using the automatically assigned labels. However, the indexing is limited to matching terms from a small vocabulary: if the user enters a search term that does not match one of the label terms, the search finds no target multimedia documents.
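The vocabulary limitation described above can be illustrated with a minimal sketch. The document identifiers, the labels, and the function `search_by_label` are hypothetical and are not taken from the cited papers; the sketch only assumes that each document carries a small set of automatically assigned labels and that a search succeeds only on an exact label match.

```python
# Hypothetical index: document id -> set of automatically assigned labels.
# The ids and labels are illustrative assumptions, not data from the cited work.
LABEL_INDEX = {
    "clip_001": {"outdoors", "sky"},
    "clip_002": {"indoors", "face"},
    "clip_003": {"outdoors", "face"},
}


def search_by_label(term, index=LABEL_INDEX):
    """Return the sorted ids of documents whose label set contains the term exactly."""
    term = term.lower()
    return sorted(doc_id for doc_id, labels in index.items() if term in labels)


# A term that belongs to the label vocabulary finds documents.
in_vocabulary = search_by_label("outdoors")

# A term outside the vocabulary finds nothing, even though some of the
# outdoor clips might well be relevant to it.
out_of_vocabulary = search_by_label("beach")
```

In this sketch a query for "beach" returns an empty result even if some of the outdoor clips show a beach, which is the shortcoming of label-only indexing noted above.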
Given that automatic systems are becoming more capable of assigning labels, categories, and descriptions to multimedia documents, new techniques are needed that leverage these descriptions to provide more meaningful ways of indexing, searching, classifying, and clustering the documents. Furthermore, the systems should take into account the uncertainty or reliability of the automatic systems as well as the relevance of any labels, categories, or descriptions assigned to multimedia documents in order to provide an effective index.
It is, therefore, an objective of the present invention to provide a method and apparatus for indexing multimedia documents using a model vector representation that captures the results of any automatic labeling and its corresponding scores, such as confidence, reliability, and relevance.
It is another objective of the invention to use the model vector representation in applications of information discovery, personalization of multimedia content, and querying of a multimedia information repository.
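As a purely illustrative sketch of the kind of representation these objectives refer to, a model vector may be pictured as a fixed-length list of scores, one per label in a lexicon, which can then be compared between documents. The lexicon, the score values, the zero default for missing labels, and the use of Euclidean distance for querying are all assumptions made for this example and are not stated in the text above.

```python
import math

# Hypothetical lexicon: one vector dimension per label that can be scored.
LEXICON = ("outdoors", "sky", "face")


def build_model_vector(scores, lexicon=LEXICON):
    """Arrange per-label scores (e.g. confidences) in lexicon order.

    Labels with no score default to 0.0, which is an assumption of this sketch.
    """
    return [float(scores.get(label, 0.0)) for label in lexicon]


def euclidean_distance(a, b):
    """Distance between two model vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_by_similarity(query_vector, repository):
    """Return document ids ordered from nearest to farthest from the query."""
    return sorted(
        repository,
        key=lambda doc_id: euclidean_distance(query_vector, repository[doc_id]),
    )


# Hypothetical repository of documents with made-up confidence scores.
repository = {
    "clip_001": build_model_vector({"outdoors": 0.9, "sky": 0.8, "face": 0.1}),
    "clip_002": build_model_vector({"outdoors": 0.1, "face": 0.9}),
    "clip_003": build_model_vector({"outdoors": 0.7, "sky": 0.2, "face": 0.8}),
}

# Query for content that is confidently "outdoors" and "sky".
query = build_model_vector({"outdoors": 1.0, "sky": 1.0})
ranking = rank_by_similarity(query, repository)
```

Because every document is compared through its scores rather than through exact label matches, each document in the repository receives a rank for any query vector, including documents that carry no hard label matching the query.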