Visual databases exist that organize millions of digital images according to meaningful concepts described by synsets. A synset may include a set of one or more keyword units from the same lexical category (e.g., nouns, verbs, etc.) that are roughly synonyms of each other. A keyword unit may consist of a single word or a phrase or other logical grammatical unit.
A visual database may associate multiple digital images with each synset. Each of the images associated with a synset may have content that is representative of the concept described by the synset. For example, the concept of the act of jumping as far as possible from a standing or running start might correspond to a synset with the keyword units “broad jump” and “long jump.” This synset, then, may be associated with images in the database that depict a person broad jumping or long jumping, as in a track and field contest, for example.
Images in a visual database may be associated with tens of thousands of different synsets spanning a range of different concepts. For example, the IMAGENET visual database currently contains over fourteen million digital images associated with over twenty-one thousand different synsets. The synsets of IMAGENET themselves are based on a large lexical database known as WORDNET. More information on IMAGENET is available on the Internet in the image-net.org domain, the entire contents of which is hereby incorporated by reference. More information on WORDNET is available on the Internet in the wordnet.princeton.edu domain, the entire contents of which is hereby incorporated by reference. Another visual database is the OPEN IMAGES database which is available on the Internet at /openimages/web/index.html in the storage.googleapis.com domain, the entire contents of which is hereby incorporated by reference.
Using computer vision techniques, deep convolutional artificial neural networks can be trained based on visual databases such as IMAGENET to classify input digital images as to image content in different image content classes with a relatively high degree of accuracy. A classification of an image by such a trained network may indicate one or more image content classes that correspond to one or more synsets. For example, the trained network may output a set of softmax values for a given image. The set of softmax values may indicate, for example, that the most probable IMAGENET image content class for the given image is the image content class with identifier “n00440382,” which corresponds to a synset containing the keyword phrases “broad jump” and “long jump.” In like manner, a set of images can each be classified as to image content in one or more different image content classes.
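By way of illustration only, the selection of a most probable image content class from a set of softmax values can be sketched as follows. The sketch assumes the trained network has already produced one raw score per class; the function names, the example scores, and the placeholder identifier "class_c" are hypothetical and are not taken from any particular classifier.

```python
import math

def softmax(scores):
    """Convert raw per-class scores into values that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def most_probable_class(class_ids, scores):
    """Return the (class identifier, softmax value) pair with the highest value."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_ids[best], probs[best]

# Hypothetical scores for a single image; the values are made up for illustration.
class_ids = ["n00440382", "n02111277", "class_c"]
scores = [4.0, 1.0, 0.5]

best_id, best_prob = most_probable_class(class_ids, scores)
print(best_id)  # n00440382
```

A classifier of this kind could equally report several classes, for example every class whose softmax value exceeds a chosen threshold, rather than only the single most probable class.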
After images are classified as to image content in image content classes, it may be desirable to electronically retrieve certain images. For example, a user of a cloud-based content management service, such as the DROPBOX cloud-based content management service provided by Dropbox, Inc. of San Francisco, Calif., may wish to search or browse through the user's digital photos hosted with the service that are images of the user's pet poodle. To facilitate this, the service may provide a user interface to the user that allows the user to enter and submit the keyword unit “poodle” and then receive results that indicate ones of the user's images that are deemed relevant to that keyword unit.
One possible approach to identify which of the user's digital photos are relevant to a given keyword unit is to index the digital photos in an inverted keyword unit index by the keyword units of the synsets associated with the image content classes to which the digital photos belong, as determined by a trained image classifier. For example, a digital photo of a poodle may be classified in the IMAGENET image content class “n02111277.” The synset associated with this image content class contains the keyword units: “dog,” “domestic dog,” and “Canis familiaris.” The photo of a poodle may be indexed in an inverted keyword unit index by these keyword units of the associated synset. However, since this synset does not include the keyword unit “poodle,” the digital photo may not be indexed in the inverted keyword unit index by the keyword unit “poodle.” As a result, the photo, which is clearly relevant to the user's query, may not be identified as relevant to the query. Overall, the inverted keyword unit index approach may not identify images that are relevant to an input keyword unit that is roughly synonymous with, but not included in, the set of keyword units of the synsets by which the images are indexed in the inverted keyword unit index.
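The limitation described above can be illustrated with a minimal sketch of an inverted keyword unit index. The class identifiers and keyword units are those given in the examples above; the photo file names, function names, and the choice to lowercase keyword units are assumptions made purely for illustration.

```python
from collections import defaultdict

# Keyword units of the synsets associated with two image content classes,
# as given in the examples in the text.
SYNSETS = {
    "n02111277": ["dog", "domestic dog", "Canis familiaris"],
    "n00440382": ["broad jump", "long jump"],
}

def build_inverted_index(photo_classes, synsets):
    """Map each keyword unit to the set of photos indexed under it."""
    index = defaultdict(set)
    for photo_id, class_ids in photo_classes.items():
        for class_id in class_ids:
            for unit in synsets.get(class_id, []):
                index[unit.lower()].add(photo_id)
    return index

def search(index, keyword_unit):
    """Return the photos indexed under exactly the given keyword unit."""
    return sorted(index.get(keyword_unit.lower(), set()))

# Hypothetical classifier output: photo name -> image content classes.
photo_classes = {
    "poodle.jpg": ["n02111277"],
    "track_meet.jpg": ["n00440382"],
}

index = build_inverted_index(photo_classes, SYNSETS)
print(search(index, "dog"))     # ['poodle.jpg']
print(search(index, "poodle"))  # []
```

Because lookup is an exact match on keyword units, the query "poodle" returns no results even though the poodle photo is indexed under "dog."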
The present invention addresses this and other issues.
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art, or were well-known, routine, or conventional, merely by virtue of their inclusion in this section.