The invention relates generally to digital data. More particularly, the invention relates to a method and system for generating a characteristic identifier for digital data and for detecting identical digital data.
In recent years, an increasing amount of audio data has been recorded, processed, distributed, and archived on digital media using numerous encoding and compression formats, such as WAVE, AIFF (Audio Interchange File Format), MPEG (Moving Picture Experts Group), and REALAUDIO. Transcoding or resampling techniques used to convert from one encoding format to another almost never produce a recording that is identical to a direct recording in the target format. Most compression schemes have a similar effect: changing the compression factor or other parameters results in a new encoding and a bit stream that bears little similarity to the original bit stream. Both effects make it difficult to establish that two files stored in different formats contain the same audio recording. Establishing whether different audio recordings are identical is a pressing need in audio production, archiving, and copyright protection.
During the production of a digital audio recording, numerous versions in various encoding formats are usually created as intermediate steps. These versions are spread across many different computer systems. In most cases, the recordings are not cross-referenced, and whether two versions are identical often has to be established by listening to them. An automatic procedure would greatly ease this task.
A similar problem exists in audio archives that hold material issued in a variety of compilations (such as jazz or popular songs) or on a variety of carriers (such as the famous recordings of Toscanini with the NBC Symphony Orchestra). The archive copy of the original master of such a recording is often not documented, and in most cases only listening can decide whether a track from a compilation is identical to a recording of the same piece on another sound carrier.
Copyright protection is a key issue for the audio industry. It has become even more relevant since new technology has made creating and distributing copies of audio recordings a simple task. While mechanisms that prevent unauthorized copying address one side of the problem, processes for detecting unauthorized copies are also required.
According to one aspect of the present invention, a characteristic identifier for digital data is generated. In generating it, the information contained in the data is reduced so that the resulting identifier can be compared with another identifier. Identifiers generated according to the present invention are resistant to the artifacts that all common compression techniques introduce into digital data. Such identifiers therefore allow identical digital data to be recognized independently of the chosen representation and compression method.
Furthermore, the generated identifiers are used to detect identical digital data: whether two sets of digital data are identical is decided on the basis of the distance between their identifiers. This establishes a faster, cheaper, and more reliable process for detecting identical digital data.
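The summary does not state how the identifier is computed or which distance measure is used. The following Python sketch only illustrates the decision rule described above (reduce the data to a compact identifier, then compare identifiers by distance) under explicitly hypothetical assumptions: the helper `toy_identifier` reduces a signal to the RMS energy of 16 equal blocks scaled to unit length, the distance is Euclidean, and the threshold of 0.1 is arbitrary. It is not the method of the invention, and it would not cope with real codec artifacts such as time shifts.

```python
import math


def toy_identifier(samples, n_blocks=16):
    """HYPOTHETICAL identifier: per-block RMS energy, scaled to unit length.

    Scaling to unit length makes the identifier insensitive to overall gain.
    """
    if len(samples) < n_blocks:
        raise ValueError("need at least n_blocks samples")
    block_len = len(samples) // n_blocks
    energies = []
    for b in range(n_blocks):
        block = samples[b * block_len:(b + 1) * block_len]
        energies.append(math.sqrt(sum(x * x for x in block) / block_len))
    norm = math.sqrt(sum(e * e for e in energies))
    if norm == 0.0:
        return [0.0] * n_blocks  # silent input: all-zero identifier
    return [e / norm for e in energies]


def distance(id_a, id_b):
    """Euclidean distance between two identifiers of equal length (an assumption)."""
    if len(id_a) != len(id_b):
        raise ValueError("identifiers must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(id_a, id_b)))


def is_identical(id_a, id_b, threshold=0.1):
    """Treat the underlying data as identical if the identifiers are close enough.

    The threshold value is an arbitrary, hypothetical choice.
    """
    return distance(id_a, id_b) <= threshold


# Demo signals: a tone whose loudness rises over time, a lower-gain copy with
# coarse quantization standing in for compression artifacts, and a different
# recording whose loudness falls over time.
original = [math.sin(0.05 * n) * (1 + n / 4000) for n in range(4000)]
copy = [round(0.5 * x, 2) for x in original]
other = [math.sin(0.05 * n) * (2 - n / 4000) for n in range(4000)]

id_orig = toy_identifier(original)
print(distance(id_orig, toy_identifier(copy)), is_identical(id_orig, toy_identifier(copy)))    # ... True
print(distance(id_orig, toy_identifier(other)), is_identical(id_orig, toy_identifier(other)))  # ... False
```

Because the decision depends only on the distance between two short vectors, identifiers can be computed once per file and compared without decoding or listening to the audio again.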
In a preferred embodiment of the present invention, the digital data is a digital audio signal and the characteristic identifier is called an audio signature. Audio data can thus be compared for identity according to the invention without a person listening to it.
The present invention can be used to establish automated processes for finding potential unauthorized copies of audio data, e.g., music recordings, and therefore enables better enforcement of copyrights in the audio industry.
A more complete understanding of the present invention, as well as further features and advantages of the present invention, will be obtained by reference to the following detailed description and drawings.