Content-sharing services that operate through websites on which members of the public can upload and share items of content, such as photographs and files, are commonly subject to abuse. Such abuse often involves a user's violation of the terms of use that are typically imposed as conditions for using a service. Such violations may include, for example, uploading copyrighted materials without the permission of the copyright owner and uploading inappropriate or offensive materials. In addition, third-party websites may utilize a service's servers as large object stores for items such as banner advertisements and stock images in a commercial context, in violation of the terms of use. In this latter case, the third-party website will often serve an “href” link destination in its HTML (Hypertext Markup Language) code that points to a file in the external large object store in order to circumvent the costs associated with local storage and bandwidth. The content-sharing website that serves the files then incurs the storage and bandwidth costs.
Abuse of content-sharing services can often be difficult to detect. Traditionally, detection is performed by an administrator or analyst who parses logs generated by the service's web servers and then tabulates hit counts for items being downloaded. If the administrator determines the hit count to be excessive, which can often be evidence of abuse, the administrator can then take steps to confirm the abuse. Unfortunately, this detection approach can be time-consuming, expensive, and computationally intensive, as it generally involves importing some portion of the logs into a database and performing queries.
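The traditional log-tabulation approach described above can be illustrated with a minimal sketch. It assumes the web servers write access logs in a common-log-style format. The regular expression, the function names `tabulate_hits` and `flag_excessive`, the sample log lines, and the threshold value are all hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Assumed (hypothetical) pattern for the request and status fields
# of a common-log-style access log line.
REQUEST_RE = re.compile(r'"GET (\S+) HTTP/[\d.]+" (\d{3})')


def tabulate_hits(log_lines):
    """Count successful (status 200) GET requests per requested path."""
    hits = Counter()
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if match and match.group(2) == "200":
            hits[match.group(1)] += 1
    return hits


def flag_excessive(hits, threshold):
    """Return (path, count) pairs whose count exceeds the threshold."""
    return [(path, n) for path, n in hits.most_common() if n > threshold]


# Made-up sample data; real logs would be far larger.
sample_log = [
    '10.0.0.1 - - [01/Jan/2000:00:00:01 +0000] "GET /files/banner.gif HTTP/1.1" 200 5120',
    '10.0.0.2 - - [01/Jan/2000:00:00:02 +0000] "GET /files/banner.gif HTTP/1.1" 200 5120',
    '10.0.0.3 - - [01/Jan/2000:00:00:03 +0000] "GET /files/photo.jpg HTTP/1.1" 200 20480',
    '10.0.0.4 - - [01/Jan/2000:00:00:04 +0000] "GET /files/banner.gif HTTP/1.1" 200 5120',
    '10.0.0.5 - - [01/Jan/2000:00:00:05 +0000] "GET /files/banner.gif HTTP/1.1" 404 0',
]

hits = tabulate_hits(sample_log)
suspects = flag_excessive(hits, threshold=2)  # threshold is an arbitrary example
print(suspects)
```

Even in this simplified form, every log line must be read and parsed before any item can be flagged, which suggests why the approach becomes slow and costly as log volume grows.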
Another difficulty is that the service may utilize many servers. Thus, the examination of a single log may not result in detection of abuse if the abusive item is downloaded across multiple servers but not often enough from any given server to arouse scrutiny. Although such a traditional abuse detection approach may still be effective, a significant drawback is that it cannot be performed in real time or near real time with the occurrence of the abuse. Because the databases containing the server logs can be very large, often exceeding a terabyte in size, the importing and querying can take hours or even days to complete. This can result in an increase in the service's costs and an increase in the time of exposure for the abusive items on the website.
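The multi-server difficulty can be seen in a small sketch. The server names, per-server hit counts, threshold, and helper functions `merge_server_counts` and `over_threshold` below are hypothetical and serve only to show how an item can stay under a threshold on every individual server while exceeding it in aggregate.

```python
from collections import Counter


def merge_server_counts(per_server_counts):
    """Sum per-item hit counts across all servers."""
    total = Counter()
    for counts in per_server_counts.values():
        total.update(counts)
    return total


def over_threshold(counts, threshold):
    """Return the set of items whose hit count exceeds the threshold."""
    return {item for item, n in counts.items() if n > threshold}


# Made-up per-server tallies for illustration only.
per_server = {
    "web01": Counter({"/files/banner.gif": 40, "/files/photo.jpg": 5}),
    "web02": Counter({"/files/banner.gif": 45, "/files/photo.jpg": 3}),
    "web03": Counter({"/files/banner.gif": 35}),
}
THRESHOLD = 100  # arbitrary example value

# Examining each server's log alone flags nothing.
flagged_per_server = {
    name: over_threshold(counts, THRESHOLD)
    for name, counts in per_server.items()
}

# Only the combined view reveals the heavily downloaded item.
combined = merge_server_counts(per_server)
flagged_combined = over_threshold(combined, THRESHOLD)
print(flagged_per_server, flagged_combined)
```

In this example, no single server exceeds the threshold for the banner image, so detection depends on first collecting and combining the logs from all servers, which adds to the delay noted above.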
This Background is provided to introduce a brief context for the Summary and Detailed Description that follow. This Background is not intended to be an aid in determining the scope of the claimed subject matter, nor to be viewed as limiting the claimed subject matter to implementations that solve any or all of the disadvantages or problems presented above.