Content-based audio copy detection has many applications. It can be used to monitor peer-to-peer copying of music or any other copyrighted audio over the Internet. Global digital music trade revenues exceeded $3.7 billion in 2008 (“IFPI digital music report 2009” www.ifpi.org/cpmtemt/library/dmr2009.pdf) and grew rapidly in 2009 to reach an estimated $4.2 billion (“IFPI digital music report 2010” www.ifpi.org/cpmtemt/library/dmr2010.pdf). Digital track sales in the US increased from $0.844 billion in 2007 to $1.07 billion in 2008, and to $1.16 billion in 2009. These figures do not include peer-to-peer downloads of music, which may or may not be legal. The music industry believes that peer-to-peer file sharing has led to billions of dollars in lost sales. Fast and effective copy detection would allow ISPs to monitor such activity at a reasonable cost.
Content-based audio copy detection can also be used to monitor advertisement campaigns over TV and radio broadcasts. Many companies that advertise not only monitor their own advertisements but also follow the ad campaigns of their competitors for business intelligence purposes. Worldwide, the TV and radio advertising market amounted to over $214 billion in 2008. In the US alone, TV and radio advertisements amounted to over $82 billion in 2008.
Currently, monitoring of ad campaigns is offered as a service by many companies worldwide. Some companies offer watermarking for automated monitoring of ads. In audio watermarking, a unique code is embedded in the audio before it is broadcast. This code can then be retrieved by watermark monitoring equipment. However, watermarking every commercial and then monitoring with specialized equipment is expensive. Furthermore, watermarking only allows companies to monitor their own ads that have been watermarked; they cannot follow the campaigns of their competitors for business intelligence. Content-based audio copy detection would alleviate many of the constraints imposed by watermarking.
Published papers from the audio copy detection and advertisement detection fields show that the two fields have evolved differently. In audio copy detection (J. Haitsma, T. Kalker, “A highly robust audio fingerprinting system”, [online] ismir2002.ismir.net/proceedings/02-FP04-2.pdf and Y. Ke, D. Hoiem, and R. Sukthankar, “Computer vision for music identification”, Proc. Comp. Vision Pattern Recog., 2005), the emphasis is on speed, since the alleged copy is compared with a large repository of copyrighted audio pieces. A small percentage of misses will not make a big difference so long as most of the copies are captured. The system has to be robust to the various coding schemes and distortions that audio may go through over the Internet. Fast audio copy detection uses audio fingerprints. The audio fingerprints proposed by Haitsma and Kalker (J. Haitsma, T. Kalker, “A highly robust audio fingerprinting system”, [online] ismir2002.ismir.net/proceedings/02-FP04-2.pdf) have been found to be quite robust to various distortions of the audio signal. These fingerprints have been used for music search (N. Hurley, F. Balado, E. McCarthy, G. Silvestre, “Performance of Philips Audio Fingerprinting under Desynchronisation”, [online] ismir2007.ismir.net/proceedings/ISMIR2007_p133_hurley.pdf). They have also been proposed for controlling peer-to-peer music sharing over the Internet (P. Shrestha, T. Kalker, “Audio Fingerprinting In Peer-to-peer Networks”, [online] ismir2004.ismir.net/proceedings/p062-page-341-paper91.pdf), and for measuring sound quality (P. Doets, R. Lagendijk, “Extracting Quality Parameters for Compressed Audio from Fingerprints”, [online] ismir2005.ismir.net/proceedings/1063.pdf). These audio fingerprints use energy differences in consecutive bands to generate a feature expressed in 32 bits. The audio search using these fingerprints is sped up by looking for an exact match of these 32 bits in the stored repository.
A more complete search is performed only around the frames corresponding to these matching fingerprints. This complete search involves counting the matching bits and comparing the count against a threshold in order to find matching segments. The search is expensive because of the computation involved in the bit matching.
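The fingerprint search described above can be illustrated with a minimal Python sketch. It is not the implementation from the cited papers. It assumes that per-frame band energies have already been computed (33 bands per frame, so that 32 difference bits result), that each bit is the sign of the band-energy difference taken between consecutive frames, and that a bit error rate of 0.35 is an acceptable threshold. All function names and the threshold value are hypothetical choices made for illustration.

```python
from collections import defaultdict

NUM_BANDS = 33  # assumption: 33 band energies give 32 difference bits per frame


def sub_fingerprint(prev_energies, cur_energies):
    """Pack the signs of band-energy differences into one 32-bit integer."""
    bits = 0
    for m in range(NUM_BANDS - 1):
        # Difference between adjacent bands, differenced again across frames.
        diff = (cur_energies[m] - cur_energies[m + 1]) - (
            prev_energies[m] - prev_energies[m + 1]
        )
        bits = (bits << 1) | (1 if diff > 0 else 0)
    return bits


def fingerprint(energy_frames):
    """Turn a list of per-frame band energies into a list of 32-bit values."""
    return [
        sub_fingerprint(energy_frames[n - 1], energy_frames[n])
        for n in range(1, len(energy_frames))
    ]


def build_index(repository):
    """Map each 32-bit value to the (track_id, frame) positions where it occurs."""
    index = defaultdict(list)
    for track_id, prints in repository.items():
        for pos, value in enumerate(prints):
            index[value].append((track_id, pos))
    return index


def bit_error_rate(a, b):
    """Fraction of differing bits between two equal-length fingerprint blocks."""
    wrong = sum(bin(x ^ y).count("1") for x, y in zip(a, b))
    return wrong / (32 * len(a))


def search(query, repository, index, max_ber=0.35):
    """Return (track_id, offset, ber) tuples, best first, that pass the threshold.

    Stage 1: an exact 32-bit match proposes a candidate alignment.
    Stage 2: the whole query block is compared bit by bit at that alignment.
    """
    seen = set()
    results = []
    for q_pos, value in enumerate(query):
        for track_id, r_pos in index.get(value, ()):
            offset = r_pos - q_pos
            ref = repository[track_id]
            if offset < 0 or offset + len(query) > len(ref):
                continue  # query block would not fit inside the reference
            if (track_id, offset) in seen:
                continue  # this alignment was already verified
            seen.add((track_id, offset))
            ber = bit_error_rate(query, ref[offset:offset + len(query)])
            if ber <= max_ber:
                results.append((track_id, offset, ber))
    return sorted(results, key=lambda r: r[2])
```

In this sketch the dictionary lookup is cheap, while the cost is concentrated in `bit_error_rate`, which is evaluated once per candidate alignment. This mirrors the observation above that the bit matching dominates the search time.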
In contrast, within the advertisement detection field, the emphasis is more on finding all the ads broadcast in a campaign (M. Covell, S. Baluja, and M. Fink, “Advertisement Detection and Replacement using Acoustic and Visual Repetition”, IEEE Workshop Multimedia Sig. Proc., October 2006, pp. 461-466; P. Duygulu, M. Chen, and A. Hauptmann, “Comparison and combination of two novel commercial detection methods”, Proc. ICME, 2004, pp. 1267-1270; V. Gupta, G. Boulianne, P. Kenny, and P. Dumouchel, “Advertisement Detection in French Broadcast News using Acoustic repetition and Gaussian Mixture Models”, Proc. Interspeech 2008, Brisbane, Australia). This type of search is generally exhaustive. The process is sped up by first using a fast search strategy that overgenerates the possible advertisement matches. These candidate matches are then verified using a detailed match. In many instances, the detailed match includes comparing video features, although in some instances the same audio may be played over different video frames.
Accordingly, there exists in the industry a need to provide improved solutions for content-based copy detection.