The exemplary embodiment relates to information retrieval and finds particular application in connection with multimedia information retrieval.
Retrieval systems enable selective retrieval of digital objects (e.g., text documents, images, audio files, video files, sound recordings, multimedia documents such as web pages, and the like) from a database (for example, a dedicated database, Internet accessible database content, or some other collection of documents). Retrieval systems can be useful as stand-alone systems, for example being employed by a user to retrieve documents of interest to the user, or can serve as a component of another system, such as an object annotation system. To retrieve digital objects, a query is submitted which may be textual, e.g., keywords, or non-textual, such as an image or information extracted from an image. The retrieval system may output the top K most similar objects which are responsive to the query.
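By way of illustration only, the top K retrieval step described above can be sketched as follows. The sketch assumes that the query and each object are already represented as fixed-length feature vectors and that cosine similarity is the comparison measure; neither assumption is taken from the systems described here, and the names `cosine`, `top_k`, and `database` are hypothetical.

```python
import heapq
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def top_k(query, database, k):
    """Return the ids of the k objects most similar to the query, best first.

    `database` maps an object id to its feature vector (an assumed layout).
    """
    scored = ((cosine(query, vec), obj_id) for obj_id, vec in database.items())
    return [obj_id for _, obj_id in heapq.nlargest(k, scored)]


# Hypothetical toy collection of three objects with 2-dimensional features.
database = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
print(top_k([1.0, 0.1], database, 2))
```

A practical system would typically replace the exhaustive scan with an index, but the ranking principle is the same.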
Digital information is no longer mono-modal. Web pages can contain text, images, animations, sound and video. Photographs on photo-sharing websites often have tags and comments. The same is true for corporate documents and document collections. This shift in the way content is stored has generated a need for tools that enable interaction with multi-modal information.
Retrieval systems have been developed which provide multimedia functionality. These systems retrieve objects that include content of more than one type of medium. One multimedia retrieval method employs a query including information represented by two or more different media types, for example, a query which includes both textual and visual parts. The textual part of the query is used to access the textual component of the objects being queried and the visual part of the query is used to access the visual component of the objects. Another multimedia retrieval operation uses cross-media relevance querying, in which a query whose content is purely of one media type (e.g., a stand-alone image) is used to retrieve multimedia objects, based on their relevance to the mono-modal query. The non-queried textual content of the multimedia objects retrieved is then used to form further queries.
One problem which arises in combining the results of two types of query is how to fuse the results to provide meaningful multimedia information retrieval. Current data fusion techniques, such as late fusion methods, use a weighting scheme to try to account for the relative importance of the two types of media, such as text and images, to the query. However, the results of the search can be highly dependent on the weights used and the type of database in which the search is performed.
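The sensitivity to the weights can be seen in a minimal sketch of one common late fusion scheme, a linear combination of per-object scores. The sketch assumes that the textual and visual scores are already normalized to a comparable range and that a single weight `alpha` sets the relative importance of text; the function name `late_fusion`, the object identifiers, and the score values are hypothetical and chosen only for illustration.

```python
def late_fusion(text_scores, image_scores, alpha):
    """Rank objects by alpha * text score + (1 - alpha) * image score.

    Both arguments map an object id to a score; an object missing from one
    modality is given a score of 0.0 for that modality (an assumption).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    ids = set(text_scores) | set(image_scores)
    fused = {
        i: alpha * text_scores.get(i, 0.0) + (1.0 - alpha) * image_scores.get(i, 0.0)
        for i in ids
    }
    # Highest fused score first; ties are broken by object id for determinism.
    return sorted(fused, key=lambda i: (-fused[i], i))


# Hypothetical scores for three objects.
text_scores = {"doc_a": 0.9, "doc_b": 0.2, "doc_c": 0.5}
image_scores = {"doc_a": 0.1, "doc_b": 0.8, "doc_c": 0.5}

print(late_fusion(text_scores, image_scores, alpha=0.8))  # text-weighted ranking
print(late_fusion(text_scores, image_scores, alpha=0.2))  # image-weighted ranking
```

With these illustrative scores, the top-ranked object changes from doc_a to doc_b when alpha is lowered from 0.8 to 0.2, although the underlying scores are unchanged.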