1. Field of the Disclosure
The present disclosure generally relates to labeling of images with words and, more particularly, to a method to determine the contents of an image through a computer game.
2. Brief Description of Related Art
There are millions of images on the World Wide Web portion of the Internet (conveniently referred to hereinbelow as “the Internet”), and it is important to have a method that can assign word descriptions to each image (so that the images can be searched and indexed, for instance). Writing a program that can automatically label arbitrary images in a meaningful way is not yet possible. Even recognizing slightly distorted text, a much simpler sub-problem, is hard for current computer programs. To get around this, image search engines on the World Wide Web label images according to file names: an image named “car.jpg”, for instance, is labeled as an image of a car. This method, though somewhat successful, is clearly not optimal. First, there is no reason for anybody other than the person who originally posted the image file to name it accurately; second, a single file name is not enough to describe the contents of an image. Text appearing adjacent to images in web pages can also be used as an aid in the labeling process, but most images have little or no associated text, and even when such text is present it is often unstructured, misleading, and difficult to process. Thus, a significant percentage of all images on the World Wide Web are incorrectly labeled and cannot be found through reasonable search queries.
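The file-name labeling approach described above can be illustrated with a minimal sketch. The function name `labels_from_url` and the example URLs are hypothetical and are used for illustration only; the sketch is not the method of any particular search engine. It also shows why the approach is fragile: a camera-generated file name yields a label that says nothing about the image.

```python
import posixpath
import re
from urllib.parse import unquote, urlparse


def labels_from_url(image_url):
    """Derive candidate labels from the file name in an image URL.

    Hypothetical helper: it only looks at the file name, never at the
    pixels, which is exactly the limitation discussed in the text.
    """
    # Take the path component of the URL and decode percent-escapes.
    path = unquote(urlparse(image_url).path)
    # Drop the directory part and the extension: ".../car.jpg" -> "car".
    stem = posixpath.splitext(posixpath.basename(path))[0]
    # Split on anything that is not a letter and lowercase the pieces.
    tokens = re.split(r"[^A-Za-z]+", stem)
    return [token.lower() for token in tokens if token]


# A descriptively named file gives a usable label.
car_labels = labels_from_url("http://example.com/images/car.jpg")
# A camera-generated name gives a label unrelated to the image contents.
camera_labels = labels_from_url("http://example.com/photos/IMG_0042.JPG")
```

In this sketch `car_labels` contains the single word taken from the file name, while `camera_labels` contains only the uninformative prefix of the camera-generated name.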
A possible solution to this problem is manual labeling. Manually labeled image databases such as the Corbis Collection and Getty Images (which can be viewed at www.corbis.com and www.gettyimages.com, respectively) allow for very accurate search results. However, manually classifying all images on the World Wide Web would be extremely expensive given the sheer volume of images on the Internet (there are over 1 billion images on the World Wide Web at this time).
Over the years, there has been considerable artificial intelligence work in the area of automatic determination of the contents of images. The most successful attempts learn from large databases of annotated images (annotations typically refer to the contents of the image, and are fairly specific and comprehensive). Some of these methods cluster image representations and annotations to produce a joint distribution linking images and words. Such methods can predict words for a given image by computing the words that have a high posterior probability given the image. Other algorithms attempt to combine large semantic text models with annotated image structures. Though impressive, such algorithms based on machine learning do not work very well in general settings and work only marginally well in restricted settings. For example, the work described in Duygulu, P., Barnard, K., de Freitas, N., and Forsyth, D. A., “Object recognition as machine translation: Learning a lexicon for a fixed image vocabulary” (Seventh European Conference on Computer Vision, 2002, IV 97-112) gave reasonable results for only 80 out of 371 vocabulary words (the evaluation procedure in this study consisted of searching for images using the vocabulary words, and only 80 queries resulted in reasonable images).
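The posterior-probability prediction mentioned above can be sketched as follows, assuming a simple mixture model in which each cluster c has a prior P(c), a likelihood P(image | c) for the image at hand, and a word distribution P(word | c). The function names, cluster names, and all numeric values are hypothetical and chosen only for illustration; they are not taken from the cited work.

```python
def word_posterior(cluster_prior, image_likelihood, word_given_cluster):
    """Return P(word | image) under an assumed cluster mixture model.

    P(word | image) = sum over c of P(c | image) * P(word | c),
    where P(c | image) is proportional to P(c) * P(image | c).
    """
    # Unnormalized cluster weights: P(c) * P(image | c).
    weights = {c: cluster_prior[c] * image_likelihood[c] for c in cluster_prior}
    total = sum(weights.values())
    cluster_posterior = {c: w / total for c, w in weights.items()}

    # Mix the per-cluster word distributions by the cluster posterior.
    posterior = {}
    for c, p_cluster in cluster_posterior.items():
        for word, p_word in word_given_cluster[c].items():
            posterior[word] = posterior.get(word, 0.0) + p_cluster * p_word
    return posterior


def top_words(posterior, k):
    """Return the k words with the highest posterior probability."""
    return sorted(posterior, key=posterior.get, reverse=True)[:k]


# Hypothetical two-cluster model and a single query image.
cluster_prior = {"outdoor": 0.5, "indoor": 0.5}
image_likelihood = {"outdoor": 0.8, "indoor": 0.2}
word_given_cluster = {
    "outdoor": {"sky": 0.6, "tree": 0.4},
    "indoor": {"chair": 0.7, "tree": 0.3},
}

posterior = word_posterior(cluster_prior, image_likelihood, word_given_cluster)
predicted = top_words(posterior, 2)
```

The quality of the predicted words depends entirely on how well the learned clusters and likelihoods match the image, which is consistent with the limited accuracy reported for such methods in general settings.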
Another relevant line of work attempts to find specific objects within images. Schneiderman and Kanade (“Object Detection Using the Statistics of Parts,” International Journal of Computer Vision, 2002), for instance, introduced a method to locate human faces in still photographs. Such algorithms are typically accurate, but they have not been developed for a wide range of objects. Additionally, combining algorithms for detecting different objects into a single general-purpose classifier is a non-trivial task. Thus, even a method that can produce reasonable labels (not necessarily good labels) for images in general is desirable.