The amount of digital content users create and view is growing at an exponential rate. It is common for web providers to operate databases with petabytes of data, while leading content providers are already looking toward technology to handle exabyte implementations. One social networking website, for example, has attracted over a billion users, including over 30 million businesses and 1.5 million advertisers. In addition, 890 million of these users visit the website each day for an average of 40 minutes, viewing user posts and other content such as advertisements, promotions, events, news stories, etc. Much of this viewed content includes images. Users of this website, for example, have posted over 250 billion photos, with another 350 million photos being posted each day.
Sometimes these images include proprietary content, such as items subject to trademark, copyright, or patent protection, which the content poster does not have the appropriate rights to use. In addition, these images may contain other information that can be valuable to classify, such as who appears in the images, what letters are shown, whether the content includes objectionable portions, etc. However, due to the sheer volume of images, it is infeasible for each image to be analyzed by a human.
Several computational approaches have been used in attempts to match images to known images or to identify image features. For example, neural networks have been used to analyze images and classify them for particular features. As another example, image comparison has been attempted by applying warp transformations to images as a pre-processing step before matching. However, these methods are prone to accuracy errors.
The techniques introduced here may be better understood by referring to the following Detailed Description in conjunction with the accompanying drawings, in which like reference numerals indicate identical or functionally similar elements.