The advent of digital media, such as digital speech, audio, graphics, images, and video, has significantly improved many existing applications and introduced many new ones. This is due, in large part, to the relative ease with which digital media may be stored, transmitted, searched, and accessed.
Media data is often analog data that is converted into digital data using, e.g., Pulse Code Modulation (PCM), which may result in a significant amount of digital data. As an example, high-quality PCM digital music is available on Compact Disc (CD). When a music CD is encoded with stereo PCM digital music at a sampling rate of 44.1 kHz with 16 bits per sample (a raw data rate of about 1411 kbit/s), the CD can store about 650 megabytes of digital music with error correction (about 64 minutes of music) or about 746 megabytes of digital music without error correction (about 74 minutes of music).
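The CD figures above follow directly from the sampling parameters. The short sketch below reproduces them, assuming that "megabytes" means binary megabytes (2^20 bytes); that reading is an assumption, but it is the one that yields the quoted playing times of about 64 and 74 minutes.

```python
# Worked check of the PCM data rate and CD playing times quoted above.
# Assumption: "megabytes" is read as binary megabytes (2**20 bytes).

SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16
CHANNELS = 2  # stereo
BYTES_PER_MB = 2 ** 20


def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Raw PCM bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def playing_time_minutes(capacity_mb: float, bit_rate_bps: int) -> float:
    """Minutes of raw PCM audio that fit in capacity_mb binary megabytes."""
    capacity_bits = capacity_mb * BYTES_PER_MB * 8
    return capacity_bits / bit_rate_bps / 60


if __name__ == "__main__":
    rate = pcm_bit_rate(SAMPLE_RATE_HZ, BITS_PER_SAMPLE, CHANNELS)
    print(f"raw rate: {rate / 1000:.1f} kbit/s")                  # 1411.2 kbit/s
    print(f"650 MB: {playing_time_minutes(650, rate):.1f} min")   # 64.4 min
    print(f"746 MB: {playing_time_minutes(746, rate):.1f} min")   # 73.9 min
```

With decimal megabytes (10^6 bytes) instead, the 650 MB figure would correspond to roughly 61 minutes, which is why the binary interpretation is assumed here.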
Unfortunately, due to the size of such PCM digital music files, downloading or transferring them may take a considerable amount of time. Thus, for efficient communication, storage, and/or transmission, digital music files may be compressed using one of a plurality of compression techniques (e.g., those standardized by the MPEG and ITU-T committees, as well as proprietary solutions).
Over time, the approaches used for compression have grown very sophisticated. Indeed, these approaches may compress audio by a factor of 5 to 15 while still producing very high psychoacoustic quality, close to that of the uncompressed audio. Moreover, the compression factor attainable for video is even higher and, depending on the resolution, can range from 10 to 100. For instance, owing to such compression schemes, digitized movies at standard television resolution and quality have been available on Digital Video Discs (DVDs) for approximately 10 years, and they are now available in High Definition format as well.
However, digital multimedia, if unprotected, also brings an increased risk of piracy. For one, copying digital multimedia content does not incur the additional loss of quality caused by the multigenerational effects associated with analog audio or video tapes; e.g., the one-millionth copy is identical to the original. This is a problem not only with uncompressed multimedia, but even more so with compressed multimedia. With modest compression factors of, e.g., 5, music can remain perceptually perfect when state-of-the-art compression schemes are used, while a music album can be downloaded or transferred 5 times faster (e.g., in about 6 minutes instead of 30 minutes).
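The relationship between compression factor, bit rate, and transfer time can be sketched as below. The album size (650 binary megabytes of raw PCM) and the 3 Mbit/s link speed are hypothetical values chosen for illustration only; the text itself states only the ratio between the two download times.

```python
# Sketch: how the compression factor scales bit rate and download time.
# Assumptions (not from the text): a 650-binary-MB uncompressed album and a
# hypothetical 3 Mbit/s download link.

RAW_RATE_KBPS = 1411.2           # stereo, 44.1 kHz, 16-bit PCM
ALBUM_BITS = 650 * 2**20 * 8     # hypothetical album size in bits
LINK_BPS = 3_000_000             # hypothetical link speed in bits per second


def compressed_rate_kbps(factor: float) -> float:
    """Bit rate after compressing raw PCM by the given factor."""
    return RAW_RATE_KBPS / factor


def download_minutes(size_bits: float, factor: float, link_bps: float) -> float:
    """Transfer time in minutes for a payload compressed by `factor`."""
    return size_bits / factor / link_bps / 60


for factor in (1, 5, 15):
    print(f"factor {factor:>2}: {compressed_rate_kbps(factor):7.1f} kbit/s, "
          f"{download_minutes(ALBUM_BITS, factor, LINK_BPS):5.1f} min")
```

Under these assumed numbers, the uncompressed album takes roughly 30 minutes and the factor-5 version roughly 6 minutes; whatever the link speed, the transfer time scales as the inverse of the compression factor.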
Furthermore, tools for ripping CDs and protected DVDs are freely available on the internet, as are tools for re-compressing multimedia content into various formats. Coupled with the advent of peer-to-peer (P2P) networking, this allows large multimedia files to be easily posted on the internet and illegally shared with millions of users. The result is a significant amount of piracy and, thus, lost revenue for content owners. Further, this type of piracy requires complex monitoring to identify pirates and downloaders of pirated content. Thus, unprotected digital multimedia poses a significant challenge to piracy prevention.
To address this challenge, a committee effort, the Secure Digital Music Initiative (SDMI), was launched in 1998 to develop a standard comprising a specification for portable devices and an overall architecture for the delivery of digital music. Digital watermarks were proposed as a key component of the SDMI system. The embedded watermarks, when extracted by a suitable detector, could be used to control aspects of a digital music system (e.g., permit or deny recording, or allow copying a certain number of times). Other notable uses of digital watermarks include establishing authorship or ownership, defining usage rights and copyright control, and verifying the integrity of the content.
In September of 2000, SDMI invited the public to test the attack resistance of its watermarking technology. While a discussion of the results of these tests is beyond the scope of this application, many vulnerabilities of specific watermarking technologies were demonstrated during the SDMI challenge. This eventually led to the abandonment of SDMI's program.
In general, digital watermarks can be either robust or fragile, depending on their design. A robust watermark is intended to carry embedded information securely and survive common attacks, while a fragile watermark is intended to indicate whether the audio signal has been altered by certain processing methods, including compression and filtering, as well as some types of attack. In the past, watermarks, such as those developed by SDMI, have predominantly been used to carry information about a user's access rights to a multimedia file. However, they can also be used to carry information about a user-initiated multimedia file purchase transaction.
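As a minimal illustration of the fragile case, the sketch below uses a textbook least-significant-bit (LSB) scheme, which is not a method described in this application: each 16-bit integer PCM sample has its LSB overwritten with a bit from a pattern derived from a secret key. The function names and the use of Python's `random.Random` as the keyed generator are assumptions for illustration.

```python
import random


def keyed_bits(key: int, n: int) -> list[int]:
    """Pseudo-random bit pattern derived from a secret key (illustrative only)."""
    rng = random.Random(key)
    return [rng.getrandbits(1) for _ in range(n)]


def embed_fragile(samples: list[int], key: int) -> list[int]:
    """Overwrite each sample's least significant bit with a keyed bit.

    Each sample changes by at most one quantization step.
    """
    bits = keyed_bits(key, len(samples))
    return [(s & ~1) | b for s, b in zip(samples, bits)]


def intact_fraction(samples: list[int], key: int) -> float:
    """Fraction of samples whose LSB still matches the keyed pattern.

    1.0 means untouched; values near 0.5 indicate the signal was processed.
    """
    bits = keyed_bits(key, len(samples))
    matches = sum((s & 1) == b for s, b in zip(samples, bits))
    return matches / len(samples)
```

Any processing that rewrites sample values, even a simple two-tap averaging filter, scrambles the LSBs, so `intact_fraction` falls from 1.0 toward 0.5. That sensitivity is exactly what makes such a mark useful for integrity checking and useless for carrying transaction data through compression.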
Transactional watermarking may thus be described as the process of digitally watermarking each copy of multimedia content with a unique watermark that identifies the specific transaction, which may include information related to the purchase and/or download of the multimedia content itself. This type of watermarking introduces additional technological hurdles compared to general watermarking, because information about a transaction is only available at the time of the transaction. Therefore, embedding must be performed in realtime (i.e., at the time of the transaction).
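The text does not specify what a transactional payload contains. Purely as a hypothetical example, the sketch below packs a transaction ID, a customer ID, and a Unix timestamp (all assumed 32-bit fields) together with a CRC-32 checksum into a 128-bit message, which is the kind of binary code a watermark generator would hand to an embedder at the moment of purchase.

```python
import struct
import zlib
from typing import Optional


def transaction_payload(transaction_id: int, customer_id: int, unix_time: int) -> list[int]:
    """Pack hypothetical transaction fields plus a CRC-32 into a list of bits (MSB first)."""
    body = struct.pack(">III", transaction_id, customer_id, unix_time)
    packed = body + struct.pack(">I", zlib.crc32(body))
    return [(byte >> (7 - i)) & 1 for byte in packed for i in range(8)]


def bits_to_bytes(bits: list[int]) -> bytes:
    """Reassemble MSB-first bits into bytes."""
    out = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for b in bits[i:i + 8]:
            byte = (byte << 1) | b
        out.append(byte)
    return bytes(out)


def parse_payload(bits: list[int]) -> Optional[tuple[int, int, int]]:
    """Recover the fields, or return None if the length or CRC check fails."""
    if len(bits) != 128:
        return None
    data = bits_to_bytes(bits)
    body = data[:12]
    (crc,) = struct.unpack(">I", data[12:16])
    if zlib.crc32(body) != crc:
        return None
    return struct.unpack(">III", body)
```

For example, `transaction_payload(123456, 789, 1_700_000_000)` yields 128 bits that `parse_payload` maps back to the original fields, while a single flipped bit is rejected by the CRC. Because the fields exist only once the purchase occurs, such a message cannot be precomputed, which is the realtime constraint described above.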
Conventionally, transactional watermarking has yielded functional but less-than-ideal results. For example, FIG. 1a shows a high-level view of conventional watermarking system 10 for watermarking digital audio. Digital audio may be provided as input to watermark embedder 12 as well as to perceptual analyzer 14. Concurrently, the message to be embedded in the digital audio file may be provided to watermark generator 16, which converts the message to binary code (i.e., the watermark) for embedding. Watermark embedder 12 embeds this watermark in the audio signal of the digital audio file while ensuring that the watermark remains below the threshold of audibility. To accomplish this, perceptual analyzer 14 measures the amount of masking energy present and modulates the strength of the watermark accordingly. Watermark embedder 12 may employ any number of known watermarking principles; however, a spread spectrum embedder generally provides higher quality results. The resulting watermarked audio signal is then encoded (i.e., compressed) by digital audio encoder 18 (e.g., an MP3, AAC, Windows Media Audio (WMA), or RealAudio (RA) encoder), resulting in a watermarked compressed digital audio file of the corresponding format.
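The spread spectrum embedding and detection in FIG. 1a can be sketched in a few lines, under simplifying assumptions that are not taken from the text. Each watermark bit is spread over a block of samples by a keyed pseudo-random ±1 chip sequence. The per-block gain is set proportional to the block's RMS level, a crude, hypothetical stand-in for perceptual analyzer 14 (a real psychoacoustic model shapes the watermark across frequency). Detection correlates each block with the same chip sequence. All names and parameters are illustrative.

```python
import math
import random


def chip_sequence(key: int, n: int) -> list[int]:
    """Pseudo-random +/-1 spreading sequence derived from a secret key."""
    rng = random.Random(key)
    return [rng.choice((-1, 1)) for _ in range(n)]


def rms(block: list[float]) -> float:
    """Root-mean-square level of a block of samples."""
    return math.sqrt(sum(x * x for x in block) / len(block))


def embed(audio: list[float], bits: list[int], key: int,
          chips_per_bit: int, alpha: float) -> list[float]:
    """Add a spread spectrum watermark whose strength follows the local RMS level.

    The RMS-proportional gain is a simplified stand-in for a perceptual
    analyzer: louder passages are assumed to mask a stronger watermark.
    """
    if len(bits) * chips_per_bit > len(audio):
        raise ValueError("audio too short for payload")
    chips = chip_sequence(key, len(bits) * chips_per_bit)
    out = list(audio)
    for i, bit in enumerate(bits):
        start = i * chips_per_bit
        gain = alpha * rms(audio[start:start + chips_per_bit])
        sign = 1 if bit else -1
        for j in range(start, start + chips_per_bit):
            out[j] += sign * gain * chips[j]
    return out


def detect(audio: list[float], n_bits: int, key: int, chips_per_bit: int) -> list[int]:
    """Recover bits by correlating each block with the keyed spreading sequence."""
    chips = chip_sequence(key, n_bits * chips_per_bit)
    bits = []
    for i in range(n_bits):
        start = i * chips_per_bit
        corr = sum(audio[j] * chips[j] for j in range(start, start + chips_per_bit))
        bits.append(1 if corr > 0 else 0)
    return bits
```

With `alpha=0.1`, the watermark sits about 20 dB below the local signal level, and spreading each bit over a few thousand samples lets correlation pull it out of the host audio reliably. In system 10, the output of `embed` would then still have to pass through digital audio encoder 18 separately for every transaction, which is the cost discussed next.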
The primary limitations of this system are that it is not practical for large-scale realtime distribution and that the embedded watermark may produce audible interference. For example, if employed as an online music store application, conventional watermarking system 10 may result in a system of very high complexity that would be highly inefficient when serving a large number of music files simultaneously. As discussed above, due to size considerations, online music/media stores generally distribute compressed media files to promote efficient transmission (as well as efficient storage), and in this system compression is performed only after a media file has been watermarked. However, since transaction information is only known at the time of the transaction, such a system requires realtime watermarking and realtime encoding of thousands, and possibly tens of thousands, of music streams requested at any given time. Further, while perceptual analyzer 14 may attempt to mask the audible energy of the watermark with the native audio, audible interference may still result. In addition to less-than-ideal audio quality, such a system can be excessively complex, expensive, and delay sensitive for an online store application.
An example of a conventional approach intended to remedy the limitations of conventional watermarking system 10 is shown at FIG. 1b. Alternate conventional watermarking system 20 of FIG. 1b operates in compressed (bitstream) domain 22. In this system, digital music/audio files may first be encoded by a digital audio encoder 24 similar to that discussed earlier. The resulting compressed audio streams are then stored in compressed media server 26. When specific music content is requested, the corresponding stream is retrieved from compressed media server 26, and partially decoded in partial digital audio decoder 28 to prepare it for embedding of a watermark. The message provided as input to watermark generator 30 is converted to binary code (representing a watermark) and then provided to quantizer scale factors sequence changes mapper 32.
Quantizer scale factors sequence changes mapper 32 alters aspects of the audio encoding; for example, it can slightly change the quantization scale factors of the audio signal of the digital audio file so that they mimic a sequence of binary digits forming a watermark. The selected quantization scale factors may then need to be re-applied to the transform coefficients, and the changed scale factors, as well as the resulting coefficients, may need to be re-encoded in partial digital audio re-encoder 34. The resulting compressed stream is then output and carries a hidden watermark that a watermark extractor can recover by correctly interpreting the embedded variations in quantization scale factors.
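The text does not say how mapper 32 maps bits onto scale-factor changes. One simple, hypothetical mapping is parity modulation: each selected scale factor is nudged by at most one step so that its parity (even or odd) equals the corresponding watermark bit, and the extractor reads the parities back. The sketch below models scale factors as plain integer indices with an assumed upper bound; it ignores the re-quantization of transform coefficients that a real re-encoder 34 would also perform.

```python
MAX_SCALE_FACTOR = 255  # hypothetical upper bound on a scale-factor index


def embed_in_scale_factors(scale_factors: list[int], bits: list[int]) -> list[int]:
    """Force the parity of successive scale factors to match the watermark bits.

    Each scale factor moves by at most one step, mimicking the "slight change"
    made by the mapper; the coefficients would then need to be re-quantized.
    """
    if len(bits) > len(scale_factors):
        raise ValueError("not enough scale factors for payload")
    out = list(scale_factors)
    for i, bit in enumerate(bits):
        if out[i] % 2 != bit:
            # Step up unless already at the maximum, in which case step down.
            out[i] += 1 if out[i] < MAX_SCALE_FACTOR else -1
    return out


def extract_from_scale_factors(scale_factors: list[int], n_bits: int) -> list[int]:
    """Read the watermark back as the parity of each scale factor."""
    return [sf % 2 for sf in scale_factors[:n_bits]]
```

For example, embedding `[1, 0, 0, 1, 1, 1, 0, 0]` into scale factors `[40, 41, 38, 45, 50, 47, 39, 44]` yields `[41, 42, 38, 45, 51, 47, 40, 44]`, from which the bits are read back exactly. The same example also shows the weakness of bitstream-domain marks: any later re-encoding that shifts scale factors by a single step changes their parities and erases the message.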
The primary limitations of alternate conventional watermarking system 20 are that it is not highly robust and that it does not address the audio quality concerns of conventional watermarking system 10. Because it operates entirely in the compressed bitstream domain, the coding parameters (e.g., quantization scale factors) of the resulting watermarked compressed digital audio file are rather easy to modify, which can render the watermark useless. A secondary limitation arises from the practical need for fast processing. While alternate conventional watermarking system 20 is more efficient than conventional watermarking system 10 because it operates in the compressed domain, the amount of possible realtime processing is still limited. Moreover, as with conventional watermarking system 10, audio quality still relies on the ability of quantizer scale factors sequence changes mapper 32 to mask the embedded watermark's audible energy.
Overall, at the present time, no single commercial watermarking solution exists that can embed transactional watermarks into multimedia content efficiently, securely, in large numbers, and in realtime, with essentially no audible change to the native audio.