Field
This disclosure relates generally to image annotation.
Background
As the availability of information grows, due at least in part to advances in computing technology and the growth of the Internet, searching for information has taken on great importance. To take advantage of the massive amounts of network-accessible data, such as text, image, and video data, each of these data types should be made searchable.
Searching for images presents many difficulties that are generally not encountered in searching text collections. Images, unlike text, do not necessarily have any uniform characters that are used across a broad spectrum of images. An image may include any number of characteristics and objects, and each object may itself have any number of characteristics. Descriptions of the same image by different persons may differ substantially. Decisions must be made as to which features of an image are most important and should be described. The most apt description for each of those features, and for combinations of those features, may also need to be decided. Annotating images with one or more labels may facilitate searching for images. However, inaccurate tagging can lead to numerous false positives and false negatives.
Image annotation methods include manual tagging, in which users assign one or more labels to describe an image. However, manual tagging may not scale to the task of annotating the millions of images that are network accessible. Labels may also be automatically generated based on image metadata, such as, for example, location information, user information, and the date and time of image capture.
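By way of illustration only, metadata-based label generation of the kind mentioned above might be sketched as follows. The function name `labels_from_metadata`, the metadata keys (`location`, `user`, `captured_at`), and the day/night rule are hypothetical choices made for this sketch and do not come from any particular system.

```python
from datetime import datetime


def labels_from_metadata(metadata):
    """Derive simple labels from an image's metadata dictionary.

    All keys and label formats here are illustrative assumptions.
    """
    labels = []

    # Location information becomes a lowercase place label.
    if "location" in metadata:
        labels.append(metadata["location"].lower())

    # User information becomes an attribution label.
    if "user" in metadata:
        labels.append("by:" + metadata["user"])

    # Capture date and time yield a year label and a coarse day/night label.
    captured_at = metadata.get("captured_at")
    if captured_at:
        moment = datetime.fromisoformat(captured_at)
        labels.append(str(moment.year))
        labels.append("night" if moment.hour < 6 or moment.hour >= 20 else "day")

    return labels


example_labels = labels_from_metadata(
    {"location": "Paris", "user": "alice", "captured_at": "2011-07-04T21:30:00"}
)
```

Labels produced this way describe the circumstances of capture rather than the visual content of the image, which is one reason metadata alone may not be sufficient for content-based search.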
Other image annotation methods include latent Dirichlet allocation, probabilistic latent semantic analysis, and hierarchical Dirichlet processes, which require that the joint distribution over image features and annotations be learned. This requirement can make these approaches difficult to scale to the large number of images available in web-scale environments. Methods based on discriminative models, nearest neighbor methods, and methods that rely on prior domain knowledge are also used for image annotation.
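As a minimal sketch of one of the approaches named above, a nearest neighbor method may annotate a new image by transferring the most common labels from its closest already-labeled images. The feature vectors, the Euclidean distance, the voting rule, and the names `annotate_by_neighbors`, `k`, and `top_n` are assumptions made for illustration, not a description of any specific prior method.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def annotate_by_neighbors(query, labeled, k=3, top_n=2):
    """Return up to top_n labels voted for by the k nearest labeled images.

    labeled is a list of (feature_vector, labels) pairs.
    """
    # Keep the k labeled images whose features are closest to the query.
    neighbors = sorted(labeled, key=lambda item: euclidean(query, item[0]))[:k]

    # Each neighbor casts one vote for each of its labels.
    votes = Counter(label for _, labels in neighbors for label in labels)

    # Rank by vote count, breaking ties alphabetically for determinism.
    ranked = sorted(votes.items(), key=lambda pair: (-pair[1], pair[0]))
    return [label for label, _ in ranked[:top_n]]


# Toy two-dimensional features; real systems use far richer descriptors.
labeled_images = [
    ([0.9, 0.1], ["beach", "sea"]),
    ([0.8, 0.2], ["beach", "sand"]),
    ([0.1, 0.9], ["forest", "tree"]),
    ([0.2, 0.8], ["forest"]),
]

suggested = annotate_by_neighbors([0.85, 0.15], labeled_images, k=2, top_n=1)
```

In this sketch, every query is compared against every labeled image, so the cost per query grows with the size of the labeled collection.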