The use of text-based search engines to locate information, including images, stored within a single computing device or across a great many computing devices accessible via a network (e.g., the Internet) is commonplace. Unfortunately, while text-based searching has proven quite effective in finding items of text (e.g., literature, lyrics, etc.), as well as non-text items that are somehow linked to descriptive text, it often proves of limited value in locating an image (whether a still image or an image of a video) that depicts a particular sought-for item.
A longstanding approach to enabling a search for an image depicting a particular item has been the manual tagging of images, by people, with descriptive text. Unfortunately, given that images are captured by all kinds of people all over the world with varying opinions of what constitutes an effective description, the effectiveness of such descriptive text varies widely. Also, there is a tendency among many people to describe only objects in their captured images that are important to them, thereby failing to describe other objects in their captured images that may be of interest to other people. Further, there are many people who capture images, but never actually tag them with any descriptive text, at all, sometimes simply as a result of finding it difficult to use text-labeling features of image-handling devices to create such textual tags. As a result, there are a great many images that are not tagged with any searchable text description, whatsoever.
One approach to resolving insufficient or missing descriptive text for many images has been to employ people to review large numbers of images and manually create or edit textual tags. Unfortunately, such an approach is time consuming and quickly becomes cost-prohibitive. Another past approach attempts to resolve these problems by automating the textual tagging of images. Specifically, computing devices have been used to scan images, employ various visual recognition algorithms to identify everything that is depicted, and tag those images with automatically-generated text listing the objects identified. Unfortunately, such an approach tends to require considerable computing resources, and can misidentify or fail to identify objects in those images. Further, neither of these approaches can discern what object(s) in those images were of interest to the people who captured them, or the relationships between objects. It is with respect to these and other considerations that the techniques described herein are needed.