The following relates to the information processing arts, information retrieval arts, classification and clustering arts, and related arts.
Information retrieval systems provide a user-friendly interface by which a user can retrieve, from a database, documents that are relevant to or match a query. Typically, an information retrieval system returns a ranked list of the “top N” documents that best match the query. An example of such a system is an Internet search engine.
Many information retrieval systems are text-based. That is, the information retrieval system receives a textual query and searches textual content of documents for similarities with the textual query, such as the same or similar words or terms, common semantic content (based, for example, on derivation of semantically related words determined using an on-line thesaurus), or so forth. In a more complex approach, language models may be developed to represent the query and documents to be searched, and the information retrieval is based on similarity of query and document language models.
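By way of a non-limiting illustration, the simplest form of text-based matching described above (shared words or terms) might be sketched as follows. The sketch assumes a bag-of-words representation and cosine similarity; the function names `term_vector`, `cosine`, and `top_n`, and the toy documents, are hypothetical and are not taken from any particular system.

```python
import math
from collections import Counter

def term_vector(text):
    """Lowercase the text, split on whitespace, and count terms."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def top_n(query, documents, n=2):
    """Return the ids of the n documents most similar to the query."""
    q = term_vector(query)
    scored = [(cosine(q, term_vector(text)), doc_id)
              for doc_id, text in documents.items()]
    # Highest score first; ties broken by document id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:n]]

docs = {
    "d1": "sailing boat on the ocean",
    "d2": "mountain hiking trail",
    "d3": "ocean water and boat harbor",
}
print(top_n("boat ocean", docs))
```

A thesaurus-based or language-model-based approach would replace `term_vector` and `cosine` with richer representations, while the ranking step would remain essentially the same.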
Advantageously, textual content is commonplace, and can be efficiently stored and searched. However, digital information repositories enable storage and processing of information in many different media types or modalities, such as text, images, audio, video, or so forth. It is not unusual for a single document to include content of two or more different media types or modalities. Many, and perhaps most, Internet websites today include both text and images. Numerous Internet sites further include audio content, video content, and/or further media modalities.
In view of this, there is interest in information retrieval systems that are capable of retrieving documents based on non-textual content. Toward this end, it is known to represent image content in the form of image “features” that are believed to have semantic significance, that is, to be discriminative of the subject matter depicted in the image. For example, a feature indicating the fractional image content that is blue or green or bluish or greenish may be useful for detecting seascapes. A feature indicating a characteristic mammalian shape may be useful in detecting images of animals. Facial recognition features are also known that are indicative of human facial images, and so forth. Features can also be defined for other modalities. For example, a feature indicative of audio pitch may be useful for discriminating between male and female voice audio. The feature-based paradigm is also applicable to text, by defining textual features such as counts of semantically rich terms and so forth. Depending upon the available text layout information, textual features may also include layout information such as font type, column layout, or so forth. (For example, if a particular medical journal is published in a distinctive font, then the font type feature may be highly discriminative for identifying articles from that medical journal.)
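As a hedged illustration of the seascape example above, a single scalar image feature might be computed as in the following sketch. The rule used to decide that a pixel is “bluish or greenish” (its blue or green channel exceeds its red channel) is an assumption made purely for illustration, as are the function name `blue_green_fraction` and the toy pixel values.

```python
def blue_green_fraction(pixels):
    """Fraction of (r, g, b) pixels whose green or blue channel exceeds red.

    The dominance rule is a hypothetical stand-in for whatever notion of
    "bluish or greenish" a practical feature extractor would use.
    """
    if not pixels:
        return 0.0
    hits = sum(1 for r, g, b in pixels if max(g, b) > r)
    return hits / len(pixels)

# Toy images represented as short lists of RGB pixels.
seascape = [(10, 80, 200), (20, 120, 180), (200, 190, 60), (15, 60, 220)]
portrait = [(210, 160, 140), (190, 140, 120), (30, 40, 90), (220, 170, 150)]

print(blue_green_fraction(seascape))  # 0.75
print(blue_green_fraction(portrait))  # 0.25
```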
In sum, it is known that for a given media type or modality one can identify semantically discriminative features. One can therefore generate information retrieval systems for the various types of media, for example for text content, image content, video content, audio content, or so forth. For example, an image-based information retrieval system may operate by comparing features of a query image with features of images in an image repository.
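The comparison of query image features with repository image features might, for instance, be sketched as a nearest-neighbor search over feature vectors. The Euclidean distance, the two-component feature vectors, the file names, and the function name `nearest_images` are all assumptions of this sketch rather than features of any particular known system.

```python
import math

def nearest_images(query_features, repository, n=2):
    """Return ids of the n repository images closest to the query features."""
    ranked = sorted(
        repository,
        key=lambda image_id: math.dist(query_features, repository[image_id]),
    )
    return ranked[:n]

# Hypothetical two-component feature vectors, e.g. a blue/green fraction
# and a face-likeness score, each in the range [0, 1].
repo = {
    "beach.jpg": (0.80, 0.10),
    "forest.jpg": (0.60, 0.05),
    "face.jpg": (0.10, 0.90),
}
print(nearest_images((0.75, 0.10), repo))
```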
Extending information retrieval to cross-media operation is more difficult. For example, given an image, one may wish to retrieve documents with textual content semantically related to the subject matter of the image. However, there is a “semantic gap” in that semantically relevant image features typically have no discernible analog in textual features, and vice versa.
For multimedia, some common approaches employ pseudo-relevance feedback. To illustrate using a query image as an example, one may perform a first information retrieval operation limited to image content on a multimedia reference repository to identify multimedia documents including images that are similar to the query image. The results of this first information retrieval operation are used to enrich the query with textual content. For example, if the image is a seascape, the first information retrieval operation is likely to return many multimedia documents relating to the sea, nautical themes, or the like. In these returned multimedia documents one may expect to identify nautically related terms such as “ocean”, “water”, “boat”, or so forth, and these derived terms may be used to enrich the original image query with textual query content. This textual query content may in turn be used in a second information retrieval operation limited to textual content to retrieve additional multimedia documents related to the textual query including “ocean”, “water”, “boat”, or so forth. The results of the first and second query operations then may be fused or combined to produce final query results, some of which may be cross-media in character (that is, some documents may have little or no image content that is similar to the query image, but may have instead been retrieved due to nautically related textual content alone).
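The pseudo-relevance feedback sequence just described (image retrieval, query enrichment with text, text retrieval, fusion) might be sketched as follows. Everything in the sketch is a simplifying assumption: the toy repository, the stopword list, the term-overlap text score, and the fusion by simple concatenation are hypothetical choices made only to keep the example short.

```python
import math
from collections import Counter

# Hypothetical repository: each document has image features and text.
REPOSITORY = {
    "d1": {"image": (0.9, 0.1), "text": "boat on the ocean water"},
    "d2": {"image": (0.8, 0.2), "text": "ocean water and a boat"},
    "d3": {"image": (0.1, 0.9), "text": "city street traffic"},
    "d4": {"image": (0.2, 0.8), "text": "boat ocean voyage log"},
}
STOPWORDS = {"on", "the", "and", "a"}

def image_search(query_image, k):
    """First pass: rank documents by image feature distance only."""
    ranked = sorted(
        REPOSITORY,
        key=lambda d: math.dist(query_image, REPOSITORY[d]["image"]),
    )
    return ranked[:k]

def enrich(doc_ids, n_terms):
    """Derive textual query terms from the text of the first-pass results."""
    counts = Counter()
    for d in doc_ids:
        counts.update(t for t in REPOSITORY[d]["text"].split()
                      if t not in STOPWORDS)
    ranked = sorted(counts, key=lambda t: (-counts[t], t))
    return ranked[:n_terms]

def text_search(terms, k):
    """Second pass: rank documents by number of shared terms only."""
    def overlap(d):
        return len(set(terms) & set(REPOSITORY[d]["text"].split()))
    ranked = sorted(REPOSITORY, key=lambda d: (-overlap(d), d))
    return [d for d in ranked[:k] if overlap(d) > 0]

def cross_media_search(query_image, k=2, n_terms=3):
    first = image_search(query_image, k)
    terms = enrich(first, n_terms)
    second = text_search(terms, k + 1)
    # Fusion by concatenation: image results first, then new text results.
    fused = first + [d for d in second if d not in first]
    return terms, fused

terms, results = cross_media_search((0.9, 0.1))
print(terms)    # ['boat', 'ocean', 'water']
print(results)  # ['d1', 'd2', 'd4']
```

In this toy run, document `d4` has image features unlike those of the query image and is returned solely because of its nautically related text, which corresponds to the cross-media character noted above.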
Brief Description
In some illustrative embodiments disclosed as illustrative examples herein, a multimedia information retrieval method performed by an electronic device is disclosed, the method comprising: performing an initial information retrieval process respective to a multimedia reference repository to return a set of initial repository documents; computing values of at least one monomodal pairwise similarity measure for candidate documents of the multimedia reference repository respective to repository documents of the set of initial repository documents; and identifying a set of top ranked documents of the multimedia reference repository based at least in part on the values computed for the candidate documents.
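One possible reading of the foregoing method, offered only as a minimal sketch, is shown below. The aggregation rule (a sum of pairwise similarities weighted by the initial retrieval scores), the Jaccard text similarity, and the names `rerank`, `jaccard`, and `TEXTS` are assumptions of the sketch and are not asserted to be the disclosed implementation.

```python
# Hypothetical text content of repository documents.
TEXTS = {
    "d1": "boat ocean water",
    "d2": "ocean water wave",
    "d3": "city street traffic",
    "d4": "boat ocean voyage",
}

def jaccard(a, b):
    """Monomodal (text) pairwise similarity between two repository documents."""
    ta, tb = set(TEXTS[a].split()), set(TEXTS[b].split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0

def rerank(initial, candidates, pairwise_sim, n=2):
    """Rank candidates by similarity to the initially retrieved documents.

    initial: list of (doc_id, score) pairs from the initial retrieval.
    candidates: iterable of candidate document ids.
    pairwise_sim: callable giving a similarity for a pair of document ids.
    The score-weighted sum used here is an assumed aggregation rule.
    """
    totals = {
        c: sum(score * pairwise_sim(c, d) for d, score in initial)
        for c in candidates
    }
    ranked = sorted(totals, key=lambda c: (-totals[c], c))
    return ranked[:n]

# Suppose an initial retrieval returned d1 and d2 with these scores.
initial = [("d1", 1.0), ("d2", 0.5)]
print(rerank(initial, TEXTS, jaccard, n=3))
```

Passing the similarity measure as a callable keeps the sketch agnostic as to modality: the initial retrieval could operate on image content while `pairwise_sim` operates on text content, or vice versa.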
In some illustrative embodiments disclosed as illustrative examples herein, a multimedia information retrieval method performed by an electronic device is disclosed, the method comprising: performing an initial monomodal information retrieval process operating on a first media modality to retrieve a set of initial repository documents from a multimedia reference repository; and identifying a set of top ranked documents of the multimedia reference repository based at least in part on pairwise similarity measure values respective to a second media modality different from the first media modality for document pairs that include documents of the set of initial repository documents.
In some illustrative embodiments disclosed as illustrative examples herein, a multimedia information retrieval system is disclosed, comprising a storage and an electronic processing device configured to perform a process including: performing an initial monomodal information retrieval process respective to a multimedia reference repository to return a set of initial repository documents, the monomodal information retrieval process operating on one member of the group consisting of text content and image content; and identifying a set of top ranked documents of the multimedia reference repository based at least on pairwise similarity measure values indicative of similarity with documents of the set of initial repository documents, the pairwise similarity measure values being indicative of similarity respective to the other member of the group consisting of text content and image content.