The web is a vast source of multimedia content. With the increasing use of images on the web, it is desirable to build an image search engine capable of covering and returning the most informative images along with their associated pages. Trillions of page hyperlinks (e.g., uniform resource locators (URLs)) can be discovered on the web, and each page may contain multiple images (e.g., more than ten). Indexing all of those pages for image search is infeasible because of limits on network bandwidth, data storage, and data processing at the offline stage, as well as on response time at the online serving stage.
Existing image page index selection is generally based on webpage static rank, which measures how important a page is for web search. However, as a vertical domain, image search has a different objective from broad-based web search: a page should be indexed if and only if it contains interesting and informative images. In practice, this difference results in a huge number of “good” image pages being omitted from the index, since pages with a high webpage static rank may not contain many good images, while pages with interesting images may not be recognized as sufficiently important. It is therefore desirable to enable an image search engine that is capable of covering and returning the most informative images as well as the associated pages.