One of the enablers of online and mobile music has been digital rights management (DRM). It provides a means of protecting content ownership and copyrights by restricting unauthorized distribution and usage. However, traditional DRM solutions have proved controversial. Various techniques were tried to prevent the copying of audio CDs, but they caused compatibility problems with so many players that DRM is no longer used in audio CD distribution. In mobile music, separate groups of music player manufacturers and online music retailers use different DRM techniques that are not interoperable. This is far from ideal from the consumer's perspective, because DRM-protected music purchased from an online music store may be playable only on digital audio players from a single manufacturer.
The dominant digital music format is currently MPEG-1 Audio Layer 3 (MPEG: Moving Picture Experts Group), more commonly known as MP3. It is also the de facto standard encoding for music played on digital audio players. The problem with MP3 for mobile music distribution is that it does not support copy protection. This has led online music retailers to use other, proprietary audio formats that support DRM. The aim is to make it difficult to use the music files in ways not specified and permitted by the record companies. However, most current encryption-based solutions can be circumvented by burning the music to a CD and then ripping it back into an unprotected format such as MP3.
Digital watermarking can be used to build a solution to the rights management problem of digital audio. Because the content protection is embedded in the audio signal itself, the audio can remain unencrypted. An unprotected file format lets the music be played on any digital audio player and be burned to CD easily. This eliminates many of the attacks used against other DRM systems and improves consumer satisfaction through wider usability. The problem, however, is that digital watermarks can be vulnerable to signal processing attacks: the watermarked signal can be modified in a way that is inaudible to a human listener but destroys the watermark in the process. This is a major challenge for all watermarking applications.
A system that enforces a rights model is called a DRM system 10; one example is depicted in FIG. 1. Although the architecture of a DRM system depends heavily on the specific usage scenario, most systems share some common components. This common structure is called the DRM reference architecture. It consists of three major components: the content server 11, the license server 12 and the client 13.
The content server 11 includes a content database 111 for all content files and the functionality 113 to prepare content for DRM-controlled distribution. In addition to the content itself, the database stores metadata 112 about the content, such as title, author, format and price. For end users, the content server 11 provides access to DRM-enabled content downloads.
When content files are imported into the content repository 111, they are usually manipulated in some way to prepare them for controlled distribution. This is done by the content packager 113 of the content server 11. All files brought into the system by content providers are first processed by the content packager 113 and then stored in the content database 111. Another important task of the content packager 113 is specifying the rights the content provider wants to grant to the user. Separate rights can be specified for previewing, and several purchasing options can be offered to the user. The content packager 113 can be, for example, a web interface running on the server that gives content providers access to the database.
An essential feature of the content packager is batch processing. Because content providers generally add a large amount of content in a single session, it must be possible to input multiple files, each with a customizable rights model, into the system.
In a typical DRM system 10, the license server 12 uses a license generator 123 to create a license for each user from content rights 121, user identities 124 and content encryption keys 122. The rights 121 and any encryption keys 122 are provided by the content server, and the client provides information about the user's identity. Because the communication path between the license server and the client is usually insecure, the data transmissions must be protected with public-key cryptography.
In addition to generating and transmitting licenses to the client, the license server 12 handles the financial transaction of the licensing process. The license server uses the identity of the user to fetch the necessary transaction details, such as credit card or account details. The user identity can be derived from a username, a social security number or any other piece of information that uniquely identifies the user.
The DRM client-side application 13 can reside on a variety of platforms. The primary functionality of the client 13 is contained in a DRM controller 131, which can either be an independent piece of software or be integrated into the content rendering application itself. The main functions of the DRM controller are to gather identity information 132 from the user; obtain licenses 135, comprising user rights and encryption keys, from the license server 12; authorize the rendering application 133 to access the content package 134, comprising the content and metadata; and perform any content decryption. Additionally, the controller relays the user's commands to the license server to request licenses and check the payment options. The DRM controller must support public-key cryptography for secure data transmission between the client 13 and the license server 12.
The usage authorization scenarios depend on the rights models used for the content. The basic model authorizes the user to access the content 134 an unlimited number of times for a single fee. Other models may grant or restrict access to the content temporarily, depending on the selected payment option. Another possibility is to limit the number of renderings with a counter-based solution. Securing the usage counter on the client device remains an implementation problem, especially when the user is not required to be online when accessing the content. Trusted computing and hash-based solutions have been proposed for storing the usage counter securely.
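A hash-based counter can be sketched as follows. This is a minimal illustration under stated assumptions, not a proposed design: the device key `DEVICE_KEY`, the `UsageCounter` class and the message format are all hypothetical. The counter value is bound to the content identifier with an HMAC-SHA256 tag, so editing the stored number without the key is detected. Note that the sketch does not stop a user from restoring an older, validly tagged counter state (a rollback), which is one reason trusted computing is also proposed.

```python
import hashlib
import hmac
from dataclasses import dataclass

# Hypothetical device-bound secret. A real client would keep this in
# protected storage (e.g. trusted hardware), never in source code.
DEVICE_KEY = b"example-device-secret"


def _tag(content_id: str, remaining: int) -> bytes:
    """HMAC-SHA256 tag binding the remaining play count to the content ID."""
    message = f"{content_id}:{remaining}".encode()
    return hmac.new(DEVICE_KEY, message, hashlib.sha256).digest()


@dataclass
class UsageCounter:
    content_id: str
    remaining: int
    tag: bytes

    @classmethod
    def issue(cls, content_id: str, plays: int) -> "UsageCounter":
        """Create a counter allowing `plays` renderings of the content."""
        return cls(content_id, plays, _tag(content_id, plays))

    def is_intact(self) -> bool:
        """Check that the stored count has not been edited without the key."""
        expected = _tag(self.content_id, self.remaining)
        return hmac.compare_digest(self.tag, expected)

    def consume(self) -> bool:
        """Authorize one rendering; refuse if tampered with or exhausted."""
        if not self.is_intact() or self.remaining <= 0:
            return False
        self.remaining -= 1
        self.tag = _tag(self.content_id, self.remaining)
        return True
```

For example, a counter issued with `UsageCounter.issue("song-42", 2)` authorizes two renderings and then refuses, and a counter whose `remaining` field was edited directly fails the integrity check.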
The most important player in the mobile DRM industry is the Open Mobile Alliance (OMA), a standards body that develops open standards for the mobile phone industry.
OMA DRM 1.0 was the first industry-standard method for protecting mobile content. It was approved in 2004 and is currently supported by most mobile phones on the market. The goal of OMA DRM 1.0 is to follow common DRM practices while conforming to the special requirements and characteristics of the mobile domain, providing basic functionality with some level of security. Version 1.0 provides three methods for content protection and delivery: forward-lock, combined delivery and separate delivery.
In the first DRM release, OMA focused on the fundamental building blocks of a DRM system. The newer OMA DRM 2.0 addresses security issues with new features based on the separate delivery method.
The OMA DRM 2.0 security model relies heavily on the DRM agent in the user device. As in version 1.0, the content itself is packaged in a secure container encrypted with a symmetric content encryption key, but version 2.0 additionally uses PKI (Public Key Infrastructure) certificates for increased security. Every device that supports OMA DRM 2.0 has an individual PKI certificate and an associated public/private key pair. Each rights object is encrypted with the public key of the receiving device before it is sent over the network. The rights object contains the symmetric key that is used to decrypt the actual content files.
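The two-level key scheme described above (content under a symmetric key, that key wrapped with the device's public key) can be illustrated with a deliberately toy sketch. Everything here is an assumption made for illustration: the textbook RSA with two Mersenne primes and no padding, and the SHA-256 keystream standing in for a real symmetric cipher, are insecure and are not the constructions the OMA DRM 2.0 specification actually uses. The rights object is also reduced to just the wrapped key, with no permissions.

```python
import hashlib
import secrets

# Toy device key pair: textbook RSA over two Mersenne primes, no padding.
# NOT secure; for illustration of the key hierarchy only.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # modular inverse, Python 3.8+


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR data with SHA-256(key || block offset)."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ k for b, k in zip(chunk, block))
    return bytes(out)


def package_content(content: bytes) -> tuple[bytes, bytes]:
    """Content side: encrypt under a fresh 128-bit content encryption key."""
    cek = secrets.token_bytes(16)
    return cek, keystream_xor(cek, content)


def make_rights_object(cek: bytes, public_key: tuple[int, int] = (N, E)) -> int:
    """Rights issuer side: wrap the CEK with the device's public key."""
    n, e = public_key
    return pow(int.from_bytes(cek, "big"), e, n)


def device_decrypt(rights_object: int, protected: bytes) -> bytes:
    """Device side: unwrap the CEK with the private key, then decrypt."""
    cek = pow(rights_object, D, N).to_bytes(16, "big")
    return keystream_xor(cek, protected)
```

Usage: `cek, protected = package_content(audio)` followed by `device_decrypt(make_rights_object(cek), protected)` returns the original `audio` bytes. Only the holder of the private exponent `D` can recover the CEK, which is why the protected content itself can be distributed freely.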
Digital watermarking is a process in which information is embedded into a digital host signal, such as video, audio or an image. Depending on the application, the watermark can be perceptible or imperceptible. The idea of using an audible, removable watermark to protect audio content was presented in M. Löytynoja, N. Cvejic, and T. Seppänen, “Audio scrambling using removable watermarking”, Sixth International Conference on Information, Communications and Signal Processing (ICICS 2007), Singapore, 10-13 Dec. 2007.
Digital watermarks have three important characteristics whose required levels are determined by the application: capacity, robustness and imperceptibility. Capacity is the amount of data that can be embedded in the watermark, robustness is the ability of the watermark to survive modifications to the host signal, and imperceptibility means that the watermark cannot be perceived in the host signal by human senses. These characteristics partly conflict: emphasizing one of them typically comes at the expense of the others.
Watermarks can be embedded in audio in the time domain or in a transform domain, such as the Fourier domain. The choice of domain affects the imperceptibility and robustness of the watermark. Frequency domain watermarks are generally considered less audible, but they are especially vulnerable to frequency modifications such as pitch shifting or dynamic compression. Time domain watermarking techniques generally use spread spectrum watermarking. Other domains used for audio watermarking are the wavelet domain and the cepstrum domain; the cepstrum is essentially the Fourier transform of the logarithmic (decibel) spectrum of the signal.
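To make the cepstrum definition concrete, here is a minimal real-cepstrum sketch in plain Python: the inverse DFT of the log-magnitude spectrum. It uses natural logarithms; a decibel spectrum (20 log10) differs only by the constant factor 20/ln 10, which scales the cepstrum linearly, and for a real, even log spectrum the forward and inverse transforms differ only by a factor of N. The naive O(N²) DFT and the `floor` guard against log(0) are illustrative choices; a real implementation would use an FFT library.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform; fine for short examples."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Naive inverse DFT, including the 1/N normalization."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def real_cepstrum(x, floor=1e-12):
    """Real cepstrum: inverse DFT of the natural-log magnitude spectrum."""
    log_mag = [math.log(max(abs(c), floor)) for c in dft(x)]
    # The log-magnitude spectrum of a real signal is real and even,
    # so the result is real up to rounding error.
    return [c.real for c in idft(log_mag)]
```

One property that shows why the cepstrum is useful: scaling the signal by a gain g only shifts coefficient 0 by ln g, because the multiplicative gain becomes additive after the logarithm.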
Spread spectrum watermarking means that the power of the watermark information is deliberately spread over a wide frequency band in order to hide it more effectively in the cover signal. Two types of spread spectrum methods are generally used in digital watermarking: frequency hopping and direct sequence. The frequency hopping method is based on rapidly switching the carrier frequency according to a pseudorandom sequence, which must be known in both the embedding and extraction phases. The direct sequence method spreads the watermark signal into a wider-band signal, also generated from a pseudorandom sequence.
In direct sequence spread spectrum watermarking, the watermark signal constructed from pseudorandom sequences can be added to the cover signal by simply adding or subtracting samples. Because the pseudorandom sequence is generally much shorter than the host signal, it is repeated for every block of the host signal. One possible method is to add the pseudorandom signal to the block if the bit to be embedded is one and subtract it if the bit is zero. This approach keeps the computational complexity of the embedding algorithm very low, which facilitates real-time use.
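The add-or-subtract scheme can be sketched in a few lines, assuming one bit per block, a ±1 pseudorandom chip sequence seeded by a secret integer key, and a fixed embedding strength `alpha`; `pn_sequence`, `block_len` and `alpha` are hypothetical names and values chosen for illustration. Extraction shown here is plain correlation with the same sequence, which is an assumption on our part; practical detectors typically add filtering and use much longer blocks to overcome the host signal's energy.

```python
import random


def pn_sequence(key: int, length: int) -> list[int]:
    """Pseudorandom +1/-1 chip sequence; the integer key seeds the generator."""
    rng = random.Random(key)
    return [rng.choice((-1, 1)) for _ in range(length)]


def embed(host, bits, key, block_len=256, alpha=0.01):
    """Add the scaled PN block for bit 1, subtract it for bit 0."""
    if len(bits) * block_len > len(host):
        raise ValueError("host signal too short for the payload")
    pn = pn_sequence(key, block_len)  # the same sequence is reused per block
    out = list(host)
    for i, bit in enumerate(bits):
        sign = 1 if bit else -1
        start = i * block_len
        for j, chip in enumerate(pn):
            out[start + j] += sign * alpha * chip
    return out


def extract(signal, n_bits, key, block_len=256):
    """Correlate each block with the PN sequence; the sign gives the bit."""
    pn = pn_sequence(key, block_len)
    bits = []
    for i in range(n_bits):
        block = signal[i * block_len:(i + 1) * block_len]
        corr = sum(s * c for s, c in zip(block, pn))
        bits.append(1 if corr > 0 else 0)
    return bits
```

Embedding costs one multiply-add per sample, which illustrates the low complexity mentioned above. Correlation works because the watermark contributes roughly `alpha * block_len` to the sum with a consistent sign, while the host's contribution grows only with the square root of the block length.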
An important use of direct sequence spread spectrum methods in audio watermarking is synchronization, the procedure of determining the exact location of the watermark during extraction. Synchronization can be performed either by inserting the synchronization signal once at the beginning of the block sequence or at the beginning of each block.
The synchronization signal is usually a pseudorandom spread spectrum signal similar to those used in direct sequence methods, except that it can be much longer. In the extraction process, the synchronization point is found by computing the cross-correlation between the original synchronization signal and the watermarked signal. Separate synchronization signals must be used if the watermark is embedded with the frequency hopping method.
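Finding the synchronization point by cross-correlation can be sketched as follows, assuming a single ±1 synchronization sequence derived from an integer key and added once at a known offset. The function names, `sync_len` and `alpha` are illustrative assumptions. The brute-force loop is kept for clarity; in practice an FFT-based correlation (for example `numpy.correlate` or `scipy.signal.correlate`) would replace it.

```python
import random


def sync_sequence(key, length):
    """Pseudorandom +1/-1 synchronization sequence seeded by the key."""
    rng = random.Random(key)
    return [rng.choice((-1.0, 1.0)) for _ in range(length)]


def insert_sync(host, key, offset, sync_len=512, alpha=0.05):
    """Add the scaled synchronization sequence starting at `offset`."""
    sync = sync_sequence(key, sync_len)
    out = list(host)
    for j, chip in enumerate(sync):
        out[offset + j] += alpha * chip
    return out


def find_sync(signal, key, sync_len=512):
    """Return the lag where cross-correlation with the sync signal peaks."""
    sync = sync_sequence(key, sync_len)
    best_lag, best_corr = 0, float("-inf")
    for lag in range(len(signal) - sync_len + 1):
        corr = sum(signal[lag + j] * sync[j] for j in range(sync_len))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag
```

Because the search needs only the key, it still finds the marker after the beginning of the signal has been cut off; the returned lag simply shifts by the number of removed samples. A longer synchronization sequence makes the correlation peak stand out more clearly from the host signal.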
The frequency hopping method is fundamentally different from the direct sequence method. Instead of being a wideband signal, the frequency hopping watermark occupies a very narrow band at any given time. The frequency of the signal changes rapidly over time according to a predefined pseudorandom sequence, and the frequency hopping band sets the limits for the hopping sequence. The pseudorandom sequence that defines the hopping pattern can be used as the watermark key, keeping the exact location of the watermark signal among the frequency coefficients secret.
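A keyed hopping sequence can be sketched as a seeded pseudorandom choice of one coefficient index per block, restricted to the hopping band. The function name and the default band of FFT bins 100 to 399 are hypothetical values chosen for illustration, and Python's `random.Random` is used only for readability; it is not a cryptographically secure generator, so a real key schedule would need a keyed cryptographic generator instead.

```python
import random


def hopping_sequence(key: int, n_blocks: int, band=(100, 400)) -> list[int]:
    """One pseudorandom coefficient index per block, limited to the band.

    `band` is a hypothetical half-open range of FFT bin indices. The key
    seeds the generator, so only key holders can reproduce where the
    watermark sits in each block.
    """
    low, high = band
    rng = random.Random(key)
    return [rng.randrange(low, high) for _ in range(n_blocks)]
```

The embedder and the extractor call the function with the same key and block count and obtain the same indices. Without the key, an attacker would have to disturb the whole hopping band to hit the watermark reliably.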
An example of the frequency hopping method is presented in FIG. 2. It divides the host audio into blocks of 1024 FFT coefficients and selects two coefficients according to the pseudorandom frequency hopping sequence. The method sets the values of these coefficients relative to the sub-band mean, which is calculated from the coefficients surrounding the two selected coefficients. If bit “one” is embedded, the magnitude of the lower coefficient 21 is set K decibels above the mean and that of the higher coefficient 22 is set K decibels below it. If bit “zero” is embedded, the procedure is reversed. The watermark strength is thus directly determined by the value of K. Therefore, for the watermark to remain below the JND (Just Noticeable Difference) level, K cannot exceed the distance from the sub-band mean to the frequency masking threshold.
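The coefficient modification of FIG. 2 can be sketched on the magnitude spectrum of a single block, assuming the FFT has already been computed and that the phases are kept and reused for the inverse FFT (not shown). The neighbourhood width `half_width`, the default `k_db` and the function names are hypothetical. The extraction rule, which compares the two magnitudes, is our assumption; the description above specifies only the embedding. No psychoacoustic check is made here, so in practice `k_db` would have to be limited by the masking threshold as stated.

```python
import math


def embed_bit(magnitudes, lo, hi, bit, k_db=3.0, half_width=8):
    """Set two FFT magnitudes K dB above/below their local sub-band mean.

    lo < hi are the indices picked by the hopping sequence. The mean is
    taken over up to `half_width` neighbours on each side (excluding the
    two selected coefficients themselves).
    """
    start = max(0, lo - half_width)
    stop = min(len(magnitudes), hi + half_width + 1)
    neighbours = [m for i, m in enumerate(magnitudes[start:stop], start)
                  if i not in (lo, hi)]
    mean = sum(neighbours) / len(neighbours)
    up = mean * 10 ** (k_db / 20)     # +K dB in magnitude (20*log10 scale)
    down = mean * 10 ** (-k_db / 20)  # -K dB in magnitude
    out = list(magnitudes)
    if bit:  # bit "one": lower coefficient up, higher coefficient down
        out[lo], out[hi] = up, down
    else:    # bit "zero": the opposite
        out[lo], out[hi] = down, up
    return out


def extract_bit(magnitudes, lo, hi):
    """Assumed detector: bit is one if the lower coefficient is larger."""
    return 1 if magnitudes[lo] > magnitudes[hi] else 0
```

After embedding, the two selected magnitudes differ by exactly 2K dB, which gives the detector a margin that later processing must overcome before the bit flips. Larger K values make the watermark more robust but also more audible.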