As the amount of available digital media grows exponentially, the inability to search this media content efficiently becomes increasingly apparent. In the past, research has focused on the extraction of features at either the low level or the semantic level to aid in indexing and retrieval. However, known techniques for interactively searching (or querying) large media databases are unsatisfactory, and significant challenges in this area remain.
Exploring a large collection of media data, such as video, images, or audio, is a non-trivial task. When a user approaches a new search task, formulating a query (i.e., search criteria) can be difficult. Most modern search systems support searching with textual input. Such systems have been studied by the information retrieval community at large, but several problems become apparent when text-based systems are used to search media content.
First, the choice of query words can significantly affect the output of a video search system, yet a user often does not know which words would best match the content he or she is looking for. Second, more advanced systems rely on automatically detected visual concepts, whose detectors are derived from low-level image features and trained on a labeled data set, such as the systems disclosed in S. F. Chang et al., “Columbia University's Semantic Video Search Engine,” ACM International Conference on Image and Video Retrieval, Amsterdam, Netherlands, July 2007, and J. R. Smith et al., “Multimedia semantic indexing using model vectors,” Proceedings of IEEE International Conference on Multimedia and Expo, Baltimore, Md., July 2003. When using such systems, non-expert users may lack knowledge of the concept vocabulary and of the accuracy of the concept detectors.
Fully automated techniques have been proposed for combining descriptors from multiple modalities (text, low-level features, and concepts). However, these solutions are not well suited for direct use in an interactive search system. In such a system, once search results are returned, the user may struggle to navigate efficiently through a large set of media content, and a typical interface showing a linear list of thumbnail images is often insufficient. Such interfaces provide little information to help users understand why a given set of results was selected, how each returned image, video, or other media portion relates to the concepts chosen in the query, and how to adjust their exploration strategy (e.g., fast skimming vs. in-depth browsing) for the result set. These difficulties arise from a fundamental disconnect between the search result interface and the query criteria. Once the query is evaluated, the query criteria are typically discarded, and the user is presented with a set of results without any information about how those results correlate with the concepts or search criteria used to identify them.
Some visualization techniques have been proposed to help users browse and explore result sets quickly. However, these techniques do not relate the search results to the search criteria, and they cannot dynamically adjust the influence of each query criterion, which would allow a user to modify searches interactively and dynamically. Thus, there is a need in the art for a technique for searching a media database that provides guided query formulation as well as dynamic, interactive query adaptation.