1. Field of the Invention
The present invention generally relates to retrieval of multimedia data (images, video and audio) from a database and, more particularly, to a system which understands the user's perception from the query object(s) themselves through user interaction, thereby increasing the relevance of the data retrieved from the database and, consequently, the speed with which the objects of interest are retrieved.
2. Background Description
Effective utilization of rapidly growing collections of digital images, audio and video necessitates the development of efficient content-based query and retrieval methods. Traditional content-based image retrieval (CBIR) methods primarily focused on finding the "best" representations for image-centric visual features (e.g., color, texture, shape). During retrieval, the user assigns weights to the different visual features, and the retrieval system finds images similar to the user's query based on the specified weights. See J. R. Smith and S.-F. Chang, "VisualSEEk: A fully automated content-based image query system", Proc. ACM Multimedia 96, 1996, and W. Y. Ma and B. S. Manjunath, "NeTra: A toolbox for navigating large image databases", Proc. IEEE Int. Conf. on Image Processing, 1997.
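The weighted-feature retrieval described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of the cited systems: each image is assumed to be already reduced to per-feature vectors, the per-feature distance is assumed to be Euclidean, and the feature names (`color`, `texture`), helper names (`weighted_distance`, `retrieve`) and toy data are all hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical representation: feature name -> feature vector for one image.
Features = Dict[str, List[float]]

def weighted_distance(query: Features, candidate: Features,
                      weights: Dict[str, float]) -> float:
    """Sum of per-feature Euclidean distances, each scaled by its user-given weight."""
    return sum(
        w * math.dist(query[name], candidate[name]) for name, w in weights.items()
    )

def retrieve(query: Features, database: Dict[str, Features],
             weights: Dict[str, float], k: int) -> List[Tuple[str, float]]:
    """Rank every database image by weighted distance and return the k closest."""
    scored = [(img_id, weighted_distance(query, feats, weights))
              for img_id, feats in database.items()]
    scored.sort(key=lambda item: item[1])
    return scored[:k]

# Toy example (made-up feature values, for illustration only).
db = {
    "sunset": {"color": [0.9, 0.4, 0.1], "texture": [0.2, 0.1]},
    "forest": {"color": [0.1, 0.8, 0.2], "texture": [0.7, 0.6]},
    "beach":  {"color": [0.8, 0.7, 0.5], "texture": [0.3, 0.2]},
}
query = {"color": [0.85, 0.5, 0.2], "texture": [0.25, 0.15]}
top = retrieve(query, db, {"color": 0.8, "texture": 0.2}, k=2)
print([img_id for img_id, _ in top])  # ['sunset', 'beach']
```

In this sketch the weights come entirely from the user; nothing in the system learns them, which is the limitation that relevance feedback tries to address.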
Performance of such a computer-centric retrieval paradigm is not satisfactory (i.e., the number of images irrelevant to the user is large), essentially because of the gap between high-level concepts (i.e., the user's actual intention) and low-level visual features, and because of the inherent subjectivity of human perception. By contrast, in the emerging relevance-feedback-based approach to CBIR, retrieval is an interactive process between the computer and the user. Based on the initial query image, the computer returns a set of similar images from the database. The user assigns a relevance to each retrieved image (from highly relevant to irrelevant). The computer then tries to correlate the user's perception of the image with the low-level features, typically by employing some machine learning technique, and performs the retrieval again. This interactive process is repeated until the user finds the image of interest. See Y. Rui, Thomas S. Huang, M. Ortega and S. Mehrotra, "Relevance Feedback: A Power Tool for Interactive Content-Based Image Retrieval", IEEE Trans. Circuits and Systems for Video Technology, Special Issue on Interactive Multimedia Systems for the Internet, 1998, and C. Nastar, M. Mitschke and C. Meilhac, "Efficient Query Refinement for Image Retrieval", Proc. IEEE CVPR, 1998. Because every iteration searches the database again, this process can become a bottleneck as the database size and the number of users grow.
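The interactive loop described above can be illustrated with a short simulation. As an assumption for illustration, it uses the classic Rocchio query-point-movement update from text retrieval (with the commonly quoted defaults alpha = 1, beta = 0.75, gamma = 0.15) as the learning step; the cited papers use their own learning schemes. The helper names, the simulated user `judge`, and the toy 2-D feature vectors are hypothetical. The returned counter makes the bottleneck concrete: every feedback round triggers another full database search.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]

def mean(vectors: Sequence[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def rocchio_update(query: Vector, relevant: Sequence[Vector],
                   irrelevant: Sequence[Vector], alpha: float = 1.0,
                   beta: float = 0.75, gamma: float = 0.15) -> Vector:
    """Move the query toward relevant examples and away from irrelevant ones."""
    new = [alpha * x for x in query]
    if relevant:
        new = [n + beta * r for n, r in zip(new, mean(relevant))]
    if irrelevant:
        new = [n - gamma * r for n, r in zip(new, mean(irrelevant))]
    return new

def search(query: Vector, database: Dict[str, Vector], k: int) -> List[str]:
    """One full pass over the database: ids of the k nearest images (Euclidean)."""
    return sorted(database, key=lambda img: math.dist(query, database[img]))[:k]

def feedback_loop(query: Vector, database: Dict[str, Vector], target: str,
                  judge: Callable[[str], bool], k: int = 2,
                  max_rounds: int = 5) -> Tuple[Vector, int]:
    """Repeat search plus (simulated) user feedback until `target` is in the top k.

    Returns the final query vector and the number of database searches performed.
    """
    for searches in range(1, max_rounds + 1):
        results = search(query, database, k)
        if target in results:
            return query, searches
        relevant = [database[i] for i in results if judge(i)]
        irrelevant = [database[i] for i in results if not judge(i)]
        query = rocchio_update(query, relevant, irrelevant)
    return query, max_rounds

db = {"a": [0.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
# Simulated user: images with a large second feature are "relevant".
final_query, searches = feedback_loop(
    [0.1, 0.1], db, target="c", judge=lambda img: db[img][1] > 0.5)
print(searches)  # 3
```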
Because of the subjectivity of human perception, irrelevant images are frequently retrieved from an image database in response to a user's query. Existing relevance feedback techniques learn the user's perception by repeatedly searching the database, which may be remotely located, and downloading relevant and irrelevant images to the user after each search. This repeated searching and downloading slows down retrieval of the image of interest.
It is therefore an object of the present invention to provide an understanding of the user's perception from the query object(s) themselves through user interaction, thereby increasing the relevance of the multimedia objects retrieved from the database and, consequently, the speed with which the object of interest is retrieved. The query can consist of any of the following: an image, an image set, image(s) derived from a video sequence, a video sequence or an audio clip. Although we describe the invention in detail for image retrieval, we provide sufficient examples for video and audio that anyone skilled in the art can apply the same techniques to general media queries.
In this invention, we present a new methodology that incorporates interactive intra-query object relevance feedback and learning to understand the user's perception of the query object. The query is adjusted using the user's feedback on the relevance of part(s) previously extracted from the query object itself, so that the adjusted query better approximates the user's perception. Since the system according to the invention works with a single query object, high-performance learning techniques, which are often computationally intensive, can be employed for this intra-query object learning of the user's perception. The refined query can subsequently be used with prior-art techniques for inter-query object relevance feedback, in which data is retrieved from the database based on the parameters learned by the intra-query object feedback mechanism, and the user provides feedback by ranking the retrieved data in order of relevance to him or her. In the system according to the invention, inter-query object learning of the user's perception is expedited by reusing the parameters learned during intra-query object relevance feedback. Furthermore, the methodology of the invention allows refined queries to be built from part(s) of the query object rather than the entire object, thereby reducing the amount of irrelevant data retrieved from the database. Also, unlike existing systems, in which the query object(s) must belong to the set of database objects, the system according to the invention imposes no such restriction. In addition, the present invention allows the user to synthesize or modify the query object(s) starting from a textual query or from a set of dictionary objects, e.g., using a "drag-and-drop" approach. The user's actions during the synthesis and modification process are further used to learn his or her perception of the image.
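The intra-query object feedback step can be sketched in the same style. This is a hypothetical illustration, not the claimed method: it assumes the query object has already been split into named parts, each described by a feature vector, and that the user simply marks some parts as relevant. The refined query is taken as the mean of the relevant parts, and each feature dimension is weighted by the inverse of its standard deviation across those parts (an assumed heuristic: dimensions on which the relevant parts agree count more). All names, features and values are made up. Note that the learning step touches only the query object; the database is searched once, afterwards, with the learned parameters.

```python
import math
import statistics
from typing import Dict, List, Sequence, Tuple

Vector = List[float]

def refine_from_parts(parts: Dict[str, Vector], relevant_ids: Sequence[str],
                      eps: float = 1e-6) -> Tuple[Vector, Vector]:
    """Learn a refined query and normalized per-dimension weights from the
    parts of the query object that the user marked as relevant.
    No database access happens here."""
    chosen = [parts[p] for p in relevant_ids]
    dims = list(zip(*chosen))  # one tuple of values per feature dimension
    query = [statistics.fmean(d) for d in dims]
    # Low spread among relevant parts -> high weight (eps avoids division by zero).
    raw = [1.0 / (statistics.pstdev(d) + eps) for d in dims]
    total = sum(raw)
    return query, [w / total for w in raw]

def weighted_search(query: Vector, weights: Vector,
                    database: Dict[str, Vector], k: int) -> List[str]:
    """Single database pass using the parameters learned from the query object."""
    def dist(v: Vector) -> float:
        return math.sqrt(sum(w * (q - x) ** 2 for w, q, x in zip(weights, query, v)))
    return sorted(database, key=lambda img: dist(database[img]))[:k]

# Hypothetical query image split into parts; features are [hue, brightness].
parts = {"sky": [0.60, 0.90], "sea": [0.62, 0.50], "boat": [0.10, 0.40]}
query, weights = refine_from_parts(parts, ["sky", "sea"])  # user keeps sky and sea

db = {"lake": [0.61, 0.10], "desert": [0.10, 0.70]}
print(weighted_search(query, weights, db, k=1))     # ['lake']
print(weighted_search(query, [0.5, 0.5], db, k=1))  # ['desert']
```

With equal weights the brightness mismatch dominates and the desert image wins; with the weights learned from the relevant parts, the shared hue dominates and the lake image is retrieved instead.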