The enormous and ever-growing number of images on the web has inspired many important applications for web image search, browsing, and clustering, all of which aim to give users easier access to web images. An essential issue underlying all these tasks is how to model the relevance of images on the web. The problem is particularly challenging because of the large diversity and complex structure of web images. Most search engines rely on textual information to index web images and measure their relevance, an approach with well-known drawbacks. Because textual descriptions are ambiguous, images indexed by the same keyword may come from unrelated concepts and exhibit large diversity in visual content. More importantly, relevant images filed under different keywords, such as “palm pixi” and “apple iphone”, are never connected by this approach. Another approach estimates image relevance by comparing visual features extracted from image content. Various approximate nearest neighbor (ANN) search algorithms (e.g., hashing) have been used to improve search efficiency. However, such visual features and ANN algorithms are effective only for images with very similar visual content, i.e., near duplicates, and cannot find relevant images that share the same semantic meaning but differ moderately in visual content.
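The hashing-based ANN search mentioned above can be illustrated with random-hyperplane (SimHash-style) locality-sensitive hashing, which maps cosine-similar feature vectors to binary codes that agree on most bits. This is only a minimal sketch: the 8-dimensional features, 16-bit codes, and toy vectors are illustrative assumptions, not the features or parameters of any particular search engine.

```python
import numpy as np

rng = np.random.default_rng(0)

def lsh_signature(features, planes):
    # Project feature vectors onto random hyperplanes; the sign pattern
    # forms a binary hash code (SimHash-style LSH for cosine similarity).
    return (features @ planes.T > 0).astype(np.uint8)

def hamming(a, b):
    # Number of differing bits between two binary codes.
    return int(np.count_nonzero(a != b))

# Toy 8-dimensional "visual features"; dimensions and values are illustrative.
dim, n_bits = 8, 16
planes = rng.standard_normal((n_bits, dim))

query = rng.standard_normal(dim)
near_dup = query + 0.05 * rng.standard_normal(dim)   # near-duplicate image
unrelated = rng.standard_normal(dim)                 # semantically different image

sig_q = lsh_signature(query, planes)
sig_n = lsh_signature(near_dup, planes)
sig_u = lsh_signature(unrelated, planes)

# Near-duplicates tend to collide on most bits; unrelated images usually do not,
# which is why such hashing only retrieves visually very similar images.
print(hamming(sig_q, sig_n), hamming(sig_q, sig_u))
```

This also illustrates the limitation noted above: two images of the same object photographed under moderately different conditions land far apart in feature space, so their hash codes disagree and the relevant image is never retrieved.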
Both of the above approaches allow users to interact with the huge web image collection only at a microscopic level, i.e., exploring images within a very small local region of either the textual or the visual feature space, which limits effective access to web images. Although efforts have been made to manually organize portions of the web's images, such organization derives from a human-defined ontology that has inherent discrepancies with the dynamic nature of web images, and it is very expensive to scale.