With the advent of inexpensive digital cameras, camera phones, and other imaging devices, the number of digital images being taken and posted on the internet has grown dramatically. To be useful, however, these images must be identified and organized so that they may be browsed, searched, or retrieved.
One solution is manual image annotation, in which a person manually enters descriptive text or keywords when the image is taken, uploaded, or registered. Although manual image annotations are generally very accurate (e.g., people generally select accurate descriptions), manual image annotation is time-consuming, and consequently many digital images are not annotated. In addition, manual image annotation can be subjective in that the person annotating the image may disregard the key features of an image (e.g., people typically annotate images based on the person in the image, when the image was taken, or the location of the image).
Another solution is automatic image annotation, which annotates images with keywords automatically. Generally, automatic image annotation is either classification-based or probabilistic modeling-based. Classification-based methods attempt to associate words or concepts with images by learning classifiers (e.g., Bayes point machine, support vector machine, etc.). Probabilistic modeling methods, by contrast, attempt to infer the correlations or joint probabilities between images and the annotations (e.g., translation model, cross-media relevance model, continuous relevance model, etc.).
While classification-based and probabilistic modeling-based image annotation algorithms are able to annotate small-scale image databases, they are generally incapable of annotating large-scale databases with realistic images (e.g., digital pictures).
Moreover, these image annotation algorithms are generally incapable of annotating all the various types of realistic images. For example, many personal images do not contain textual information, while web images may include incomplete or erroneous textual information. While current image annotation algorithms are capable of annotating either personal images or web images, these algorithms are typically incapable of annotating both types of images.
Furthermore, in large-scale collections of realistic images the number of concepts that can be applied as annotation tags across numerous images is nearly unlimited, and depends on the annotation strategy. Therefore, to annotate large-scale realistic image collections the annotation method should be able to handle the unlimited concepts and themes that may occur in numerous images.
Lastly, given the sizeable number of images being generated every day, the annotation method must be fast and efficient. For example, approximately one million digital images are uploaded to the FLICKR™ image sharing website each day. To annotate one million images per day, approximately ten images per second must be annotated. Since the best image annotation algorithm annotates an image in about 1.4 seconds, or fewer than one image per second, it is incapable of annotating the large number of images that are generated daily.
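The throughput figures above can be checked with a short calculation. The following is only an illustrative sketch: it assumes that uploads arrive at a uniform rate over the day and that a single annotator processes images one at a time, and the function names are hypothetical rather than part of any existing annotation system.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def required_rate(images_per_day: float) -> float:
    """Images per second needed to keep up with a daily upload volume."""
    return images_per_day / SECONDS_PER_DAY


def daily_capacity(seconds_per_image: float) -> float:
    """Images one sequential annotator can process in a day (assumed model)."""
    return SECONDS_PER_DAY / seconds_per_image


images_per_day = 1_000_000   # approximate daily uploads cited above
seconds_per_image = 1.4      # approximate per-image annotation time cited above

rate = required_rate(images_per_day)          # about 11.6 images per second
capacity = daily_capacity(seconds_per_image)  # about 61,700 images per day
shortfall = images_per_day / capacity         # about 16 times too slow

print(f"required rate:  {rate:.1f} images/s")
print(f"daily capacity: {capacity:,.0f} images")
print(f"shortfall:      {shortfall:.1f}x")
```

Under these assumptions, the required rate works out to slightly more than the "approximately ten images per second" stated above, and a single annotator running at 1.4 seconds per image would cover only about six percent of the daily volume.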
Accordingly, there is a need for a large-scale image annotation technique that can annotate all types of real-life images containing an unlimited number of visual concepts, and that can do so in near real time.