1. Field
This disclosure relates generally to detection of placeholder images.
2. Background
Websites often display one or more images that clearly do not correspond to the context of the currently displayed webpage. A website may display a generic image in lieu of the actual image corresponding to the particular context of the webpage. For example, in a listing of people, a generic image with the label “Image Not Available” may be displayed next to the names of one or more people. In another example, a shopping website may display generic images with labels such as “No Image Available,” “coming soon,” “under construction,” “placeholder,” “photo coming soon,” or other labels indicating that the displayed image is not the actual image corresponding to the particular context of the webpage.
Generic images that are used in place of unavailable actual images are referred to as “placeholder images.” More generally, a placeholder image can be any image that is used in place of the actual image for any reason, such as when the actual image is unavailable. Placeholder images take many forms.
Although helpful in communicating to the user that the actual corresponding image is unavailable, placeholder images can diminish the user experience by reducing the quality of the presented results. For example, a user may be presented with the results of a product search where a majority of the products are displayed as placeholder images instead of actual images of the respective products.
Placeholder images can also reduce the accuracy of image search results. For example, depending on how the search query is structured, or on how the system searches for and categorizes results, placeholder images may appear in one or more of the resulting image sets and thereby skew the search results.
Tools that efficiently identify placeholder images can be used to improve the user experience and the accuracy of search results. For example, upon detecting a substantial number of placeholder images in a webpage to be presented to a user, a webpage rendering program can reformat the webpage so that products with available corresponding images are displayed first or more prominently than products that have only placeholder images. Likewise, placeholder images can be detected and removed from the result set before a user is presented with the results of a search.
Because of the large number of accessible websites, placeholder images may take numerous forms. It is therefore desirable for a method of detecting placeholder images to be capable of identifying placeholder images that appear on any website accessible over the Internet.
Manual identification of placeholder images becomes highly inefficient when the image corpus from which placeholder images are to be detected is large. Other methods may be available, such as using optical character recognition (OCR) to determine whether images include words such as “Image Not Available.” However, OCR-based approaches may not be sufficiently accurate because of the numerous variations of such words, the use of different languages, and the fact that many placeholder images do not include any characters. Other approaches to identifying placeholder images include using a detector trained on a large number of such images, and building a placeholder image model for each type of image (and possibly one per merchant). However, conventional techniques such as those described above are not sufficiently scalable to detect placeholder images that may appear on the numerous websites accessible over the Internet.