Web site operators sometimes receive DMCA “take-down” notices from media companies, alleging that content hosted on their web sites is copyrighted and should not be distributed. There is a growing need for automated tools to help web site operators proactively identify such content and treat it in a manner that might avoid the need for take-down notices. This need is perhaps felt most acutely by so-called “social networking” sites, to which individual users upload audio, video and picture files: content that is sometimes original, sometimes not, and sometimes a combination.
Various techniques can be employed to automatically identify copyrighted content. One is to examine content data for a digital watermark embedded by the content owner to signal that the content is copyrighted and should not be reproduced. Such techniques are detailed, for example, in commonly-owned application Ser. No. 09/620,019, filed Jul. 20, 2000, and patent publication US20020052885.
Another approach is to attempt to identify the content by pattern recognition techniques (sometimes termed “fingerprinting” or “robust hashing”). Once the content is identified, a metadata database can be consulted to determine whether distribution of the content should be allowed or prohibited. (Such techniques are detailed, e.g., in Haitsma, et al, “A Highly Robust Audio Fingerprinting System,” Proc. Intl Conf on Music Information Retrieval, 2002; Cano et al, “A Review of Audio Fingerprinting,” Journal of VLSI Signal Processing, 41, 271, 272, 2005; Kalker et al, “Robust Identification of Audio Using Watermarking and Fingerprinting,” in Multimedia Security Handbook, CRC Press, 2005, and in patent documents WO02/065782, US20060075237, US20050259819, US20050141707, and US20020028000.)
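By way of illustration only, the fingerprint-then-lookup flow just described might be sketched as follows. This is a toy under stated assumptions, not the method of any cited reference: the function names (toy_fingerprint, lookup, may_distribute), the block-energy fingerprint, the Hamming-distance threshold, the in-memory dictionary standing in for a metadata database, and the default of allowing unidentified content are all hypothetical choices made for the example. It also assumes clips of equal length.

```python
from typing import Dict, Optional, Sequence


def toy_fingerprint(samples: Sequence[float], block: int = 4) -> int:
    """Hypothetical stand-in for a robust hash.

    Emits one bit per pair of adjacent blocks: 1 if block energy rises, else 0.
    Uniform scaling of the signal leaves the bits unchanged.
    """
    energies = [
        sum(x * x for x in samples[i:i + block])
        for i in range(0, len(samples) - block + 1, block)
    ]
    bits = 0
    for prev, cur in zip(energies, energies[1:]):
        bits = (bits << 1) | (1 if cur > prev else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def lookup(fp: int, db: Dict[int, dict], max_distance: int = 1) -> Optional[dict]:
    """Return metadata of the closest reference within max_distance, else None."""
    best = None
    for ref_fp, meta in db.items():
        d = hamming(fp, ref_fp)
        if d <= max_distance and (best is None or d < best[0]):
            best = (d, meta)
    return best[1] if best else None


def may_distribute(samples: Sequence[float], db: Dict[int, dict]) -> bool:
    """Consult the metadata database; unidentified content is allowed (assumed policy)."""
    meta = lookup(toy_fingerprint(samples), db)
    return True if meta is None else meta["allowed"]


# Hypothetical reference clip registered by a rights holder.
reference = [0.0] * 4 + [1.0] * 4 + [0.5] * 4 + [2.0] * 4 + [0.1] * 4
database = {toy_fingerprint(reference): {"title": "Reference clip", "allowed": False}}

# A quieter, slightly offset copy of the reference, and an unrelated clip.
degraded_copy = [0.8 * x + 0.01 for x in reference]
unrelated = [1.0] * 4 + [0.0] * 16

copy_allowed = may_distribute(degraded_copy, database)
unrelated_allowed = may_distribute(unrelated, database)
```

In this sketch the degraded copy still maps to the registered fingerprint and is therefore treated as prohibited, while the unrelated clip matches nothing and is passed. The decision is a single allow/prohibit flag looked up per identified work.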
Other techniques and systems related to the technology detailed herein are disclosed in patent publications US20080051029, US20080059211, US20080027931, US20070253594, US20070242880, US20070220575, US20070208711, US20070175998, US20070162761, US20060240862, US20040243567, US20030021441, U.S. Pat. Nos. 7,185,201, 7,298,864 and 7,302,574, and in provisional application 61/016,321, filed Dec. 21, 2007.
Part of the difficulty in automating such determinations is that some of the content uploaded to web sites may include copyrighted material, yet qualify as “fair use,” such as parody, or commentary/criticism. (“Tolerated use” is a stepchild of fair use, and encompasses arguably infringing uses that are commonly overlooked by rights holders for reasons such as concern about adverse publicity, or out of desire for the exposure that such use affords.) Existing automated techniques make no provision for “fair use” (nor for “tolerated use”). Instead, known techniques typically flag as objectionable any content that is determined to include any copyrighted material.
Described below is an illustrative arrangement that allows a more nuanced assessment of content data—one that responds differently, depending on context, environmental factors, and/or other circumstances.