Digital goods are often distributed to consumers over private and public networks—such as Intranets and the Internet. In addition, these goods are distributed to consumers via fixed computer readable media, such as a compact disc (CD-ROM), digital versatile disc (DVD), soft magnetic diskette, or hard magnetic disk (e.g., a preloaded hard drive).
Unfortunately, it is relatively easy for a person to pirate the pristine digital content of a digital good at the expense and harm of the content owners, who include the content author, publisher, developer, distributor, etc. The content-based industries (e.g., entertainment, music, film, software, etc.) that produce and distribute content are plagued by lost revenues due to digital piracy.
“Digital goods” is a generic label, used herein, for electronically stored or transmitted content. Examples of digital goods include images, audio clips, video, multimedia, software, and data. Depending upon the context, digital goods may also be called a “digital signal,” “content signal,” “digital bitstream,” “media signal,” “digital object,” “object,” “signal,” and the like.
In addition, digital goods are often stored in massive databases—either structured or unstructured. As these databases grow, the need for streamlined categorization and identification of goods increases.
Hashing
Hashing techniques are employed for many purposes. Among those purposes are protecting the rights of content owners and speeding database searching/access. Hashing techniques are used in database management, querying, cryptography, and other fields involving large amounts of raw data.
In general, a hashing technique maps a large block of raw data into a relatively small and structured set of identifiers. These identifiers are also referred to as “hash values” or simply “hashes.” By introducing a specific structure and order into raw data, the hashing function reduces the raw data to a much smaller (and typically more manageable) representation.
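The mapping described above can be illustrated with a minimal sketch. It assumes SHA-256 from the Python standard library as a stand-in for a conventional hashing function; the helper name `hash_value` and the toy `index` table are hypothetical and are used only for illustration.

```python
import hashlib

def hash_value(data: bytes) -> str:
    """Return a fixed-size hexadecimal identifier for a block of raw data."""
    return hashlib.sha256(data).hexdigest()

small = b"a short record"
large = bytes(1_000_000)  # one million zero bytes standing in for a large digital good

# Both inputs map to identifiers of the same length, regardless of input size.
small_id = hash_value(small)
large_id = hash_value(large)

# A toy lookup table keyed by hash value instead of by the raw data itself.
index = {small_id: "record-1", large_id: "record-2"}
```

Looking up `index[hash_value(large)]` locates the record using a 64-character key rather than a comparison against the full block of raw data, which is the kind of speed-up in database access referred to above.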
Limitations of Conventional Hashing
Conventional hashing techniques are used for many kinds of data. These techniques have good characteristics and are well understood. Unfortunately, digital goods with visual and/or audio content present a unique set of challenges not experienced in other digital data. This is primarily due to the unique fact that the content of such goods is subject to perceptual evaluation by human observers. Typically, perceptual evaluation is visual and/or auditory.
For example, assume that the content of two digital goods is, in fact, different, but only in a perceptually insubstantial way. A human observer may consider the content of the two digital goods to be similar. However, even perceptually insubstantial differences in content properties (such as color, pitch, intensity, phase) between two digital goods result in the two goods appearing substantially different in the digital domain.
Thus, when using conventional hashing functions, a slightly shifted version of a digital good generates a very different hash value as compared to that of the original digital good, even though the digital good is essentially identical (i.e., perceptually the same) to the human observer.
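This sensitivity can be demonstrated with a short sketch. It assumes a synthetic 16-bit sine tone as the audio content and SHA-256 as the conventional hashing function; the helper names (`sine_pcm`, `digest`, `differing_bits`) and all parameter values are hypothetical choices made for illustration only.

```python
import hashlib
import math
import struct

def sine_pcm(num_samples=8000, freq=440.0, rate=8000, amplitude=10000):
    """Generate a synthetic tone as a list of 16-bit sample values."""
    return [int(round(amplitude * math.sin(2 * math.pi * freq * n / rate)))
            for n in range(num_samples)]

def digest(samples):
    """Hash the samples, packed as little-endian signed 16-bit integers."""
    raw = struct.pack("<%dh" % len(samples), *samples)
    return hashlib.sha256(raw).digest()

def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions at which two equal-length digests differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

original = sine_pcm()
altered = list(original)
altered[100] += 1  # change a single sample by one quantization step

d_original = digest(original)
d_altered = digest(altered)
changed = differing_bits(d_original, d_altered)
```

The two signals differ by one quantization step in one sample out of 8000, a change that would not be expected to be audible, yet `d_original` and `d_altered` are unrelated. For a cryptographic hash such as SHA-256, roughly half of the 256 output bits are expected to differ after any change to the input.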
The human observer is rather tolerant of certain changes in digital goods. For instance, human ears are less sensitive to changes in some ranges of frequency components of an audio signal than to changes in other ranges.
This human tolerance can be exploited for illegal or unscrupulous purposes. For example, a pirate may use advanced audio processing techniques to remove copyright notices or embedded watermarks from an audio signal without perceptually altering the audio quality.
Such malicious changes to the digital goods are referred to as “attacks,” and they result in changes in the data domain. Unfortunately, the human observer is unable to perceive these changes, allowing the pirate to successfully distribute unauthorized copies in an unlawful manner.
Although the human observer is tolerant of such minor (i.e., imperceptible) alterations, the digital observer, in the form of a conventional hashing technique, is not tolerant. Traditional hashing techniques are of little help in identifying the common content of an original digital good and a pirated copy of such good because the original and the pirated copy hash to very different hash values. This is true even though both are perceptually identical (i.e., appear to be the same to the human observer).
Applications for Hashing Techniques
There are many and varied applications for hashing techniques. Some include anti-piracy, content categorization, content recognition, watermarking, content-based key generation, and synchronization in audio or video streams.
Hashing techniques may be used to search on the Web for digital goods suspected of having been pirated. In addition, hashing techniques are used to generate keys based upon the content of a signal. These keys are used instead of or in addition to secret keys. Also, hashing functions may be used to synchronize input signals. Examples of such signals include video or multimedia signals. A hashing technique must be fast if synchronization is performed in real time.
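Content-based key generation, mentioned above, can be sketched as follows. This is only an illustration under stated assumptions: it uses SHA-256 and HMAC from the Python standard library, and the function name `content_key` and its parameters are hypothetical rather than part of any described system.

```python
import hashlib
import hmac

def content_key(content: bytes, secret: bytes = b"") -> bytes:
    """Derive a 32-byte key from signal content.

    With an empty secret, the key depends on the content alone (used
    instead of a secret key). With a non-empty secret, the key depends
    on both (used in addition to a secret key).
    """
    if secret:
        return hmac.new(secret, content, hashlib.sha256).digest()
    return hashlib.sha256(content).digest()

signal = b"\x00\x01\x02\x03" * 256  # placeholder bytes standing in for a content signal

key_from_content = content_key(signal)
key_with_secret = content_key(signal, secret=b"example-secret")
```

Because this sketch relies on a conventional hashing function, the derived key changes whenever any bit of the signal changes, including changes that a human observer cannot perceive.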