A recurring problem in Internet usage is the transmission of unauthorized content. One very commercially important example of this problem relates to copyrighted materials. Copyrighted text, music and movies can be transmitted rapidly and cheaply over the Internet, allowing Internet users to easily obtain unauthorized or pirated copies to the detriment of copyright owners. Policing such unauthorized transmission is difficult for copyright owners, because the sources of copyrighted materials may be elusive, or may indeed be legitimate possessors of copyrighted materials who lack authorization to permit copies to be made. Pursuing the illegal distributors of such materials is problematic because the users are often numerous and diffuse, and individual legal action against multiple small users is expensive. Such action is also unsympathetic from a public relations standpoint when the users turn out to be teenagers or others whose motives are seldom to make a criminal profit.
Approaches to this problem at the source have included attempts to integrate copy-protection measures in the copyrighted materials, but these attempts have met with marginal success as hackers develop—and publish—countermeasures.
A second approach to dealing with the problem at its source is to try to identify Web sites and/or distribution networks/tools that contain copyrighted materials. For example, a form of structural comparison to detect copyright infringement is disclosed in Sergey Brin, James Davis and Hector Garcia-Molina, “Copy Detection Mechanisms for Digital Documents,” Proceedings of the ACM SIGMOD Annual Conference, San Jose 1995 (May 1995). An available version of the paper can be found at http://dbpubs.stanford.edu:8090/pub/showDoc.Fulltext?lang=en&doc=1995-43&format=pdf&compression=&name=1995-43.pdf. This paper discloses a method which determines whether an identified document is a copy of a specific preidentified copyrighted article. As described in the paper “the service will detect not just exact copies, but also documents that overlap in significant ways.” However, the method requires that the document to be tested be available to start with, would seem to require every data transmission to be tested, and thus does not lend itself to real-time application on Web traffic being transmitted at the high data traffic rates of a typical ISP.
In another example, U.S. Pat. No. 6,658,423 to Pugh et al. discloses duplicate and near-duplicate detection techniques for operating a search engine which assign a number of fingerprints to a given document by extracting parts from the document, assigning the extracted parts to one or more of a predetermined number of lists, and generating a fingerprint from each of the populated lists. Two documents are considered to be near-duplicates if any one of their fingerprints matches. This technique is used to find mirrored Web sites, which either are identical to hosts or are “near-duplicate” copies with insignificant content differences from the host. However, the technique would not be a practical solution for locating illicit content transmitted over an ISP network, because it involves completely crawling the Web (a process which is neither economical nor quick) to look for near-replicas of specific pages or portions of a Web site.
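The list-based fingerprinting just described can be illustrated with a minimal sketch. It is not the implementation disclosed by Pugh et al.; it assumes that the extracted "parts" are whitespace-separated words, that each part is assigned to exactly one list by hashing, and that four lists are used. The names `fingerprints`, `near_duplicates` and `NUM_LISTS` are hypothetical and chosen only for illustration.

```python
import hashlib

NUM_LISTS = 4  # assumed number of predetermined lists


def _hash64(text):
    """Deterministic 64-bit hash of a string (SHA-1 truncated to 8 bytes)."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def fingerprints(document, num_lists=NUM_LISTS):
    """Return one (list_index, fingerprint) pair per populated list."""
    lists = [[] for _ in range(num_lists)]
    # Assumption: the extracted parts are lower-cased words.
    for word in document.lower().split():
        # Assign each part to one list, chosen by hashing the part.
        lists[_hash64(word) % num_lists].append(word)
    # Generate a fingerprint from each list that received at least one part.
    return [(i, _hash64(" ".join(parts)))
            for i, parts in enumerate(lists) if parts]


def near_duplicates(doc_a, doc_b, num_lists=NUM_LISTS):
    """Two documents are near-duplicates if any one fingerprint matches."""
    shared = set(fingerprints(doc_a, num_lists)) & set(fingerprints(doc_b, num_lists))
    return bool(shared)
```

In this sketch a small edit changes only the lists that the edited words hash to, so the fingerprints of the remaining lists still match. Every document to be compared must nevertheless be fetched and fingerprinted first, which is the crawling cost noted above.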
These approaches have the drawback that they either require both the copyrighted work and the suspected copy to be already available (Brin article) or they require web-crawling of the entire Web content to locate duplicates or near duplicates (Pugh patent). In addition, they do not address the majority of today's distribution of copyright-infringing material, which occurs over Peer to Peer (P2P) networks.
Approaches which attempt to deal with the problem at the destination limit access to or block sites having copyrighted content. These approaches are problematic because the sites are often located outside of the US where copyright laws are not easily enforceable. In addition, techniques to block or limit access by US-based consumers can be thwarted either by the consumer or by the end site providing the content.
Other approaches have attempted to detect the Internet transmission of copyrighted material. These approaches require the participation of those managing transmission resources, such as ISPs, and have included deep packet inspection tools to look for specific protocol types or specific files. Other specialized network appliances have been used to investigate the payload of an IP packet to check for copyright infringement, such as the comparing service VideoTracker™ offered by Vobile, Inc. of Santa Clara, Calif. While these approaches eliminate many of the drawbacks associated with the source and destination approaches listed above, the combination of the vast amount of content transmitted over the Internet and high transmission speeds requires these prior art transmission inspection techniques to employ too many resources, both software and hardware, to cope with existing traffic throughput. Accordingly, none of these prior art techniques can perform this detection function in a cost effective and timely manner. These prior art techniques have the further drawback that they require a detailed examination of the transmissions of all customers, whether or not there is probable cause to believe they are infringing, which implicates issues of customer privacy.
While the detection of pirated copyrighted materials is an example that has high commercial visibility, there are other transmissions of content that are of interest. For example, law enforcement officials are interested in detecting the transmission of illicit content in the form of child pornography. As another example, national security officials, when permitted by governing law, may be interested in detecting the transmission of certain forms of content, such as that relating to bomb or weapons construction.
Accordingly, there remains a need for methods and systems capable of detecting the transmission of specific content, such as copyrighted content, over the Internet in a timely and cost effective manner while still preserving customer privacy.
Additionally, there remains a need for methods and systems which allow an ISP to offer a service to clients, such as copyright owners, to detect the transmission of content of interest, such as copyrighted content, over the ISP's network.