Text-based search engines traditionally retrieve relevant images or other elements (e.g., videos or documents) based on textual information. For example, a search engine may receive the image query “car,” search billions of documents, and ultimately return images deemed relevant based on textual information in those documents. These traditional search engines may determine “relevance” by considering such factors as link structure, link text, anchor text, or any other suitable textual information. For example, web images may be indexed with words from image titles, surrounding text, or the like, such that a search engine determines a web image's relevance based on the image's title rather than on the visual content of the image itself.
While text-based search engines may work well in returning text-based documents, they do not take into account the visual information of images, and therefore may provide inaccurate results for image queries.
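The limitation described above can be illustrated with a minimal sketch of text-only image indexing. This is not the implementation of any particular search engine. The function names (`tokenize`, `build_index`, `search`), the metadata fields, and the sample images are all hypothetical, and the `shows` field is an assumed ground-truth label included only to make the mismatch visible; it is never indexed.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(images):
    """Build an inverted index from words to image ids.

    Only textual metadata (title and surrounding text) is indexed;
    the visual content of each image is never examined.
    """
    index = defaultdict(set)
    for image_id, meta in images.items():
        for field in ("title", "surrounding_text"):
            for word in tokenize(meta.get(field, "")):
                index[word].add(image_id)
    return index


def search(index, query):
    """Return the ids of images whose indexed text contains every query word."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical sample data. "shows" is an assumed ground-truth label
# describing what the image depicts; it is not part of the index.
images = {
    "img1": {"title": "Red car on highway",
             "surrounding_text": "A sports car at speed.",
             "shows": "car"},
    "img2": {"title": "Car wash coupon",
             "surrounding_text": "Print this coupon.",
             "shows": "coupon"},
    "img3": {"title": "IMG_0042",
             "surrounding_text": "Holiday photos.",
             "shows": "car"},
}

index = build_index(images)
results = search(index, "car")
```

In this sketch, the query “car” returns `img1` and `img2`. The coupon image `img2` is returned only because its title contains the word, while `img3`, which depicts a car but carries no matching text, is missed.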