Internet-based search engines traditionally employ common image search techniques for locating digital image content on the World Wide Web. One well-known category of these techniques is the “text-based” image search. A traditional text-based image search receives a text-based query and searches a database of keyword-tagged images to generate a resulting set of images, each of which has one or more keyword tags matching the query. These searches rely primarily on the quality and level of detail of the keyword tags in the image database on which the search is conducted. These keyword tags are often provided by automated tagging systems.
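The keyword-matching behavior described above can be illustrated with a minimal sketch. It is not drawn from any particular search engine: the names `build_tag_index` and `search`, the sample file names, and the sample tags are all hypothetical, and a query is assumed to match an image when at least one query term equals one of the image's tags.

```python
from collections import defaultdict


def build_tag_index(image_tags):
    """Map each lowercase keyword tag to the set of image IDs carrying it."""
    index = defaultdict(set)
    for image_id, tags in image_tags.items():
        for tag in tags:
            index[tag.lower()].add(image_id)
    return index


def search(index, query):
    """Return the sorted IDs of images with at least one tag matching a query term."""
    results = set()
    for term in query.lower().split():
        # .get avoids creating empty entries for unknown terms.
        results |= index.get(term, set())
    return sorted(results)


# Hypothetical keyword-tagged image database.
image_tags = {
    "img_001.jpg": ["dog", "beach"],
    "img_002.jpg": ["cat", "sofa"],
    "img_003.jpg": ["dog", "park"],
}

index = build_tag_index(image_tags)
print(search(index, "dog"))
```

Because the result set is determined entirely by the stored tags, an image whose tags are missing, too coarse, or wrong cannot be retrieved by such a search regardless of its visual content.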
Current tagging systems treat tagging as an image classification problem. In these systems, a large number of sample or training images are collected for each possible tag. A classifier can then be trained to determine the most likely tag for a given test image (e.g., an image that has not yet been tagged). However, when the number of tags is very large (e.g., greater than 10,000), training each classifier is computationally challenging. Additionally, these systems often ignore rare tags and are unable to assign very specific tags to a given image. Further, the keyword tags propagated by these systems can be corrupted when similar images are annotated by similar annotators. In these instances, even if the images differ in some respects, the similarity of the images and of the annotators may cause the images to be annotated with the same tags, a phenomenon commonly referred to as tagging bias. Even further, large sets of data are often clustered to group similar data points that the classifier can use to distinguish one group (e.g., tag) from another. However, current clustering algorithms often produce imbalanced data, where a majority of data points (e.g., images) fall in the same cluster, leaving other clusters with few or no data points. As a result of these and other limitations, such systems are often inadequate for tagging and retrieving real-world images.
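The cluster imbalance mentioned above can be made concrete with a small sketch. It does not implement any particular clustering algorithm; it only inspects a given assignment of images to clusters. The function names, the assignment list, and the choice of the largest cluster's share as the imbalance measure are illustrative assumptions.

```python
from collections import Counter


def cluster_sizes(assignments, num_clusters):
    """Count how many data points were assigned to each cluster index."""
    counts = Counter(assignments)
    return [counts.get(c, 0) for c in range(num_clusters)]


def largest_cluster_share(assignments, num_clusters):
    """Fraction of all data points that fall in the single largest cluster."""
    sizes = cluster_sizes(assignments, num_clusters)
    total = sum(sizes)
    return max(sizes) / total if total else 0.0


# Hypothetical assignment of 10 images to 4 clusters.
assignments = [0, 0, 0, 0, 0, 0, 0, 0, 1, 2]

sizes = cluster_sizes(assignments, 4)
share = largest_cluster_share(assignments, 4)
print(sizes, share)
```

In this made-up assignment, one cluster holds eight of the ten images and one cluster is empty, which is the kind of outcome that leaves a classifier with little or no data for distinguishing the remaining groups.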