The growing volume and importance of digital video data are driving the need for more sophisticated techniques and systems for video and multimedia indexing. Some recent techniques include extracting rich audio-visual feature descriptors, classifying multimedia content and detecting concepts using statistical models, extracting and indexing speech information, and so forth. While progress continues to be made in these directions toward more effective and efficient techniques, the challenge remains to integrate this information to effectively answer user queries of multimedia repositories.
There are a number of approaches for multimedia database access, which include search methods based on the above extracted information, as well as techniques for browsing, clustering, visualization, and so forth. Each approach provides an important capability. For example, content-based retrieval (CBR) allows searching and matching based on perceptual similarity of video content. On the other hand, model-based retrieval (MBR) allows searching based on automatically extracted labels and detection results. For example, M. Naphade, et al., “Modeling semantic concepts to support query by keywords in video,” IEEE Proc. Int. Conf. Image Processing (ICIP), September 2002, teaches a system for modeling semantic concepts in video to allow searching based on automatically generated labels. New hybrid approaches, such as model vectors, allow similarity searching based on semantic models. For example, J. R. Smith, et al., in “Multimedia semantic indexing using model vectors,” in IEEE Intl. Conf. on Multimedia and Expo (ICME), 2003, teaches a method for indexing multimedia documents using model vectors that describe the detection of concepts across a semantic lexicon. Text-based retrieval (TBR) applies to textual forms of information related to the video, which include transcripts, embedded text, speech, metadata, and so on. Furthermore, video retrieval using speech techniques can leverage important information that often cannot be extracted or detected in the visual aspects of the video.
A typical video database system provides a number of facilities for searching based on feature descriptors, models, concept detectors, clusters, speech transcript, associated text, and so on. These techniques are broadly classified into three basic search functions: content-based retrieval (CBR), model-based retrieval (MBR), and text-based retrieval (TBR), which will now be discussed.
Content-based retrieval (CBR): Content-based retrieval (CBR) is an important technique for indexing video content. While CBR is not a robust surrogate for indexing based on semantics of image content (scenes, objects, events, and so forth), CBR has an important role in searching. For one, CBR complements traditional querying by allowing “looks like” searches, which can be useful for pruning or re-ordering result sets based on visual appearance. Since CBR requires example images or video clips, CBR can typically be used only to initiate the query when the user provides the example(s), or within an interactive query in which the user selects from the retrieved results to search the database again. CBR produces a ranked, scored results list in which the similarity is based on distance in feature space.
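The ranking step of CBR can be illustrated with a minimal sketch. It assumes that each clip is already represented by a fixed-length feature vector and that Euclidean distance is the similarity measure; the text above does not specify either choice, and the names `cbr_search`, `index`, and the three-element vectors are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cbr_search(query_features, index, top_k=3):
    """Rank indexed items by distance to the query example (smaller is better)."""
    scored = [
        (item_id, euclidean(query_features, features))
        for item_id, features in index.items()
    ]
    scored.sort(key=lambda pair: pair[1])
    return scored[:top_k]


# Hypothetical index: item id -> feature vector (e.g. a coarse color histogram).
index = {
    "clip_a": [0.9, 0.1, 0.0],
    "clip_b": [0.2, 0.7, 0.1],
    "clip_c": [0.8, 0.2, 0.0],
}

# The user supplies an example whose features are [1.0, 0.0, 0.0].
results = cbr_search([1.0, 0.0, 0.0], index)
for item_id, distance in results:
    print(item_id, round(distance, 3))
```

In an interactive query, a clip selected from `results` could supply the feature vector for the next call to `cbr_search`.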
Model-based retrieval (MBR): Model-based retrieval (MBR) allows the user to retrieve matches based on the concept labels produced by statistical models, concept detectors, or other types of classifiers. Since both supervised and unsupervised techniques are used, MBR applies to labels assigned from a lexicon with some confidence as well as to clusters in which the labels do not necessarily have a specific meaning. In MBR, the user enters the query by typing label text, or the user selects from an inverted list of label terms. Since a confidence score is associated with each automatically assigned label, MBR ranks the matches using a distance D derived from confidence C using D=1−C. MBR applies equally well in manual and interactive searches, since it can be used to initiate a query, or can be applied at an intermediate stage to fuse with prior search results.
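The D=1−C ranking can be sketched as follows. The data layout (a dictionary of per-shot label confidences in the range 0 to 1) and the names `mbr_search` and `detections` are assumptions made for illustration only.

```python
def mbr_search(label, detections):
    """Return (item_id, distance) pairs for items carrying `label`,
    ranked by distance D = 1 - C, where C is the detector confidence."""
    matches = []
    for item_id, labels in detections.items():
        if label in labels:
            confidence = labels[label]
            matches.append((item_id, 1.0 - confidence))
    matches.sort(key=lambda pair: pair[1])
    return matches


# Hypothetical detector output: item id -> {label: confidence in [0, 1]}.
detections = {
    "shot_1": {"outdoors": 0.75, "face": 0.25},
    "shot_2": {"outdoors": 0.5},
    "shot_3": {"face": 1.0},
}

ranked = mbr_search("outdoors", detections)
print(ranked)  # [('shot_1', 0.25), ('shot_2', 0.5)]
```

Because the output is a list of distances, with smaller values indicating better matches, it has the same form as a CBR result list, which is convenient when the two are later combined.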
Text-based retrieval (TBR): Text-based retrieval (TBR) applies to various forms of textual data associated with video, which include speech recognition results, transcripts, closed captions, extracted embedded text, and metadata. In some cases, TBR is scored and results are ranked. For example, similarity of words is often used to allow fuzzy matching. In other cases, such as crisp matching of search text against indexed text, the matches are retrieved but not scored or ranked. As in the case of MBR, TBR applies equally well in manual and interactive searches.
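The difference between crisp and fuzzy matching in TBR can be sketched as follows. Using `difflib.SequenceMatcher` from the Python standard library as the word-similarity measure is an assumption made for illustration, as are the 0.8 threshold and the names `crisp_search`, `fuzzy_search`, and `transcripts`.

```python
from difflib import SequenceMatcher


def crisp_search(term, transcripts):
    """Exact word match: items are retrieved but not scored or ranked."""
    term = term.lower()
    return [
        doc_id
        for doc_id, text in transcripts.items()
        if term in text.lower().split()
    ]


def fuzzy_search(term, transcripts, threshold=0.8):
    """Score each item by its best word similarity to the query term and
    rank items at or above the threshold, highest score first."""
    term = term.lower()
    scored = []
    for doc_id, text in transcripts.items():
        best = max(
            (SequenceMatcher(None, term, word).ratio()
             for word in text.lower().split()),
            default=0.0,
        )
        if best >= threshold:
            scored.append((doc_id, best))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored


# Hypothetical speech transcripts keyed by shot id.
transcripts = {
    "shot_1": "the president spoke today",
    "shot_2": "presidents of several nations met",
    "shot_3": "weather report for tuesday",
}

crisp = crisp_search("president", transcripts)
fuzzy = fuzzy_search("president", transcripts)
print(crisp)
print(fuzzy)
```

Here the crisp search returns only the shot containing the exact word, while the fuzzy search also returns the shot containing "presidents", with a lower score.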
Given these varied multimedia database search approaches, there is a great need for a solution that integrates these search methods and data sources, given their complementary nature, to bring the maximum resources to bear on satisfying a user's information need from a video database.