Digital rights management (DRM) for multimedia has become a popular way to protect the intellectual property of media content owners, and it plays an important role in protecting copyrighted multimedia items such as music and movies. There is a growing market demand for DRM services. DRM has already been implemented in MICROSOFT® products, such as the WINDOWS MEDIA™ format and EBOOKS™ (Microsoft Corporation, Redmond, Wash., U.S.). New DRM schemes are needed, however, that are optimized for new scalable multimedia formats. The MPEG-4 Fine Granularity Scalability (FGS) video coding standard, for example, enables straightforward and flexible adaptation of a single multimedia stream to different transmission and application needs. These new multimedia formats, together with suitable DRM schemes, enable the growth of new business and service models.
Scalable video coding has gained wide acceptance due to its flexibility and easy adaptation to a wide range of application requirements and environments. “Scalable media adaptation and robust transport” (SMART and SMART++) is an example of a scalable multimedia scheme (see, e.g., http://research.microsoft.com/im/, Microsoft Corporation, Redmond, Wash.; Feng Wu, Shipeng Li, Ya-Qin Zhang, “A framework for efficient progressive fine granular scalable video coding”, IEEE Trans. on Circuits and Systems for Video Technology, vol. 11, no. 3, pp. 332-344, 2001; Xiaoyan Sun, Feng Wu, Shipeng Li, Wen Gao, Ya-Qin Zhang, “Macroblock-based temporal-SNR progressive fine granularity scalable video coding”, IEEE International Conference on Image Processing (ICIP), pp. 1025-1028, Thessaloniki, Greece, October 2001; and Yuwen He, Feng Wu, Shipeng Li, Yuzhuo Zhong, Shiqiang Yang, “H.26L-based fine granularity scalable video coding”, ISCAS 2002, vol. 4, pp. 548-551, Phoenix, USA, May 2002).
In MPEG-4 FGS, a video stream is divided into two layers: a base layer and an enhancement layer. The base layer is a non-scalable coding of the video at a low bitrate, e.g., the lowest bitrate used in an application. The residue of each frame (the difference between the original frame and its base-layer reconstruction) is encoded in the enhancement layer in a scalable manner: the discrete cosine transform (DCT) coefficients of the residue are compressed bit-plane by bit-plane, from the most significant bit to the least significant bit. A video is therefore compressed with MPEG-4 FGS only once. When it is transmitted over a network that lacks the required bandwidth, a server can discard the enhancement-layer data associated with the least significant bit-plane(s). Other rate shaping operations can also be carried out directly on the FGS compressed data, without decompressing and recompressing the video.
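To make the bit-plane idea concrete, the following Python sketch splits a few toy residue coefficients into magnitude bit-planes and shows that dropping the trailing (least significant) planes still yields a coarser approximation of the coefficients. It is a simplification: real MPEG-4 FGS also entropy-codes each bit-plane, whereas this sketch ignores entropy coding and keeps the signs in a separate list. The coefficient values and function names are hypothetical.

```python
def to_bit_planes(coeffs: list[int], num_planes: int) -> list[list[int]]:
    """Split coefficient magnitudes into bit-planes, most significant plane first."""
    return [[(abs(c) >> p) & 1 for c in coeffs]
            for p in range(num_planes - 1, -1, -1)]


def from_bit_planes(planes: list[list[int]], signs: list[int],
                    num_planes: int) -> list[int]:
    """Rebuild coefficients from however many leading bit-planes survived."""
    mags = [0] * len(signs)
    for i, plane in enumerate(planes):
        shift = num_planes - 1 - i          # plane i carries bit (num_planes - 1 - i)
        for j, bit in enumerate(plane):
            mags[j] |= bit << shift
    return [s * m for s, m in zip(signs, mags)]


coeffs = [13, -6, 3, 0, -1]                                # toy residue DCT coefficients
num_planes = max(abs(c) for c in coeffs).bit_length()     # 4 planes are enough for 13
signs = [-1 if c < 0 else 1 for c in coeffs]
planes = to_bit_planes(coeffs, num_planes)

full = from_bit_planes(planes, signs, num_planes)         # all planes received
coarse = from_bit_planes(planes[:2], signs, num_planes)   # two LSB planes discarded
print(full)    # [13, -6, 3, 0, -1]
print(coarse)  # [12, -4, 0, 0, 0]
```

Discarding planes never requires re-encoding the remaining ones, which is why a server can shape the rate by simple truncation.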
A multimedia encryption algorithm, whether for scalable or non-scalable codecs, ideally has the following features: high security, low complexity, low compression overhead, error resilience, rate shaping adaptability, and random-access playback. Security is an essential requirement for multimedia encryption. Compared with encryption for more critical applications such as military and banking, however, multimedia encryption has its own particular issues, including the large volume of video data to be encrypted and the usually low value of that data.
Low complexity is an issue because any encryption or decryption process adds processing overhead. Since a multimedia stream carries a large amount of data, many applications require, or at least strongly benefit from, an encryption system of very low complexity. This is especially true for decryption, because many applications must decrypt large volumes of multimedia data in real time, usually on user equipment with limited resources.
Compression overhead is also an issue, since encryption inevitably affects compression efficiency, either directly, by reducing the coding efficiency of the compression algorithm, or by adding bytes to the already compressed file. Multimedia encryption algorithms should therefore minimize compression overhead.
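A quick calculation illustrates the second kind of overhead. The sketch below assumes a 16-byte block cipher (such as AES) in CBC mode with PKCS#7 padding and an explicit 16-byte IV per packet, compared with a stream cipher that sends only an 8-byte nonce per packet. These parameters are illustrative assumptions, not values taken from any particular DRM system.

```python
def cbc_overhead(payload_len: int, block: int = 16, iv_len: int = 16) -> int:
    """Bytes added by a block cipher in CBC mode with PKCS#7 padding and an explicit IV."""
    padding = block - payload_len % block   # PKCS#7 always adds 1..block bytes
    return padding + iv_len


def stream_overhead(nonce_len: int = 8) -> int:
    """Bytes added by a stream cipher (e.g. counter mode): no padding, only a nonce."""
    return nonce_len


for size in (100, 1000, 1008):
    cbc, ctr = cbc_overhead(size), stream_overhead()
    print(f"{size:5d}-byte packet: CBC +{cbc} B ({cbc / size:.1%}), "
          f"stream +{ctr} B ({ctr / size:.1%})")
```

Under these assumptions a 1000-byte packet grows by 24 bytes with CBC but only 8 with the stream cipher, and the relative cost rises sharply for small packets. If the nonce can be derived from a sequence number already carried in the transport header, the stream-cipher overhead can shrink further.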
Error resilience is important because faults occur during multimedia storage and transmission. Wireless networks are notorious for transmission errors, and data packets may be lost in transmission due to congestion, buffer overflow, and other network imperfections. Encryption schemes should ideally be resilient to bit errors and packet losses, allowing quick recovery from bit errors and fast resynchronization after packet losses to prevent extensive error propagation. Most multimedia encryption algorithms, however, are designed under the assumption of a perfect transmission environment, and they can cause severe perceptual degradation that propagates over time when bit errors or packet losses occur during multimedia transmission.
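How far a single bit error spreads depends heavily on the cipher mode. The Python sketch below compares a counter-mode construction, where each keystream block depends only on the key and the block index, with cipher feedback (CFB), where each keystream block depends on the previous ciphertext block. To stay self-contained it uses truncated SHA-256 as a toy stand-in for a real block cipher such as AES; it is not a secure implementation, and all names are hypothetical.

```python
import hashlib

BLOCK = 16


def prf(key: bytes, data: bytes) -> bytes:
    # Toy pseudo-random function (truncated SHA-256); stands in for a block cipher.
    return hashlib.sha256(key + data).digest()[:BLOCK]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def ctr_crypt(key: bytes, data: bytes) -> bytes:
    # Counter mode: keystream block i depends only on (key, i); the same call decrypts.
    out = bytearray()
    for i in range(0, len(data), BLOCK):
        out += xor(data[i:i + BLOCK], prf(key, (i // BLOCK).to_bytes(8, "big")))
    return bytes(out)


def cfb_encrypt(key: bytes, iv: bytes, data: bytes) -> bytes:
    # Cipher feedback: each keystream block depends on the previous ciphertext block.
    out, prev = bytearray(), iv
    for i in range(0, len(data), BLOCK):
        c = xor(data[i:i + BLOCK], prf(key, prev))
        out += c
        prev = c
    return bytes(out)


def cfb_decrypt(key: bytes, iv: bytes, data: bytes) -> bytes:
    out, prev = bytearray(), iv
    for i in range(0, len(data), BLOCK):
        c = data[i:i + BLOCK]
        out += xor(c, prf(key, prev))
        prev = c
    return bytes(out)


def damaged_blocks(a: bytes, b: bytes) -> list[int]:
    # Indices of 16-byte blocks that differ between a and b.
    return [i // BLOCK for i in range(0, len(a), BLOCK) if a[i:i + BLOCK] != b[i:i + BLOCK]]


def flip_bit(data: bytes, byte_index: int) -> bytes:
    # Simulate a single channel bit error.
    buf = bytearray(data)
    buf[byte_index] ^= 0x01
    return bytes(buf)


key, iv = b"demo-key", bytes(BLOCK)
plain = bytes(range(64))                       # four 16-byte blocks

ctr_rx = ctr_crypt(key, flip_bit(ctr_crypt(key, plain), 5))
cfb_rx = cfb_decrypt(key, iv, flip_bit(cfb_encrypt(key, iv, plain), 5))
print("CTR damaged blocks:", damaged_blocks(plain, ctr_rx))   # [0]
print("CFB damaged blocks:", damaged_blocks(plain, cfb_rx))   # [0, 1]
```

In counter mode the error stays confined to the flipped bit, while CFB garbles the following block before resynchronizing. A design in which every block depends on all earlier ciphertext would never recover, which is the kind of propagation the paragraph above warns against.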
Rate shaping is the ability to vary the transmission bitrate (the number of bits transmitted per second of a stream) to suit various conditions. During delivery of a multimedia stream from the content owner to the user, the data is typically processed by many middle stages. Transcoding, for example, may change the bitrate to adapt to fluctuations in transmission bandwidth or to application requirements. If the data is encrypted, these middle stages typically must obtain the encryption and decryption keys, then decrypt the data, process it, and re-encrypt it. This increases processing overhead and reduces security, since encryption secrets must be shared with the middle stages.
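One way to let middle stages rate-shape without keys is to encrypt each scalable layer separately and leave a small header in the clear. The Python sketch below assumes that each layer of a frame travels in its own packet whose frame and layer numbers are unencrypted; the `Packet` fields, helper names, and toy SHA-256 keystream are illustrative assumptions rather than a secure or standardized design.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Packet:
    frame: int      # cleartext header: frame number
    layer: int      # cleartext header: 0 = base layer, 1.. = enhancement bit-planes
    payload: bytes  # encrypted body


def keystream(key: bytes, frame: int, layer: int, n: int) -> bytes:
    # Toy keystream (SHA-256 in counter mode), distinct for each (frame, layer) pair.
    out = bytearray()
    counter = 0
    while len(out) < n:
        seed = (key + frame.to_bytes(4, "big") + layer.to_bytes(2, "big")
                + counter.to_bytes(4, "big"))
        out += hashlib.sha256(seed).digest()
        counter += 1
    return bytes(out[:n])


def seal(key: bytes, frame: int, layer: int, data: bytes) -> Packet:
    ks = keystream(key, frame, layer, len(data))
    return Packet(frame, layer, bytes(a ^ b for a, b in zip(data, ks)))


def open_packet(key: bytes, pkt: Packet) -> bytes:
    ks = keystream(key, pkt.frame, pkt.layer, len(pkt.payload))
    return bytes(a ^ b for a, b in zip(pkt.payload, ks))


def rate_shape(packets: list[Packet], max_layer: int) -> list[Packet]:
    # Runs at a middle stage: reads only cleartext headers and never needs the key.
    return [p for p in packets if p.layer <= max_layer]


key = b"content-key"
layers = [b"base", b"plane-1", b"plane-2", b"plane-3"]
stream = [seal(key, 0, i, data) for i, data in enumerate(layers)]

shaped = rate_shape(stream, max_layer=1)          # bandwidth drop: keep base + 1 plane
print([open_packet(key, p) for p in shaped])      # [b'base', b'plane-1']
```

The middle stage sees only frame and layer numbers, so no key material leaves the content owner and the end user.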
Users are accustomed to playing audio and video in fast forward and reverse, and with random access. An ideal DRM system should not deprive users of these options. An encryption algorithm used in DRM should therefore either support random access within chain-encrypted data or, if the data is not chain-encrypted for the sake of random access, still resist “dictionary” and other attacks on the encryption.
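A common way to reconcile random access with security is to derive each frame's keystream from the key and the frame's index, so that no frame depends on its predecessors yet identical frames still encrypt differently. The Python sketch below assumes frame-level encryption with a toy SHA-256 keystream standing in for a real stream cipher; function names are hypothetical. It contrasts this with reusing one fixed keystream for every frame, which makes equal frames produce equal ciphertexts, the kind of pattern a dictionary attack exploits.

```python
import hashlib


def keystream(key: bytes, nonce: bytes, n: int) -> bytes:
    # Toy keystream (SHA-256 in counter mode); stands in for a real stream cipher.
    blocks = (n + 31) // 32
    return b"".join(hashlib.sha256(key + nonce + i.to_bytes(4, "big")).digest()
                    for i in range(blocks))[:n]


def crypt_frame(key: bytes, frame_index: int, data: bytes) -> bytes:
    # The keystream depends only on the key and the frame index, so any frame can be
    # decrypted on its own (seek, fast forward, reverse). XOR makes this its own inverse.
    ks = keystream(key, frame_index.to_bytes(8, "big"), len(data))
    return bytes(a ^ b for a, b in zip(data, ks))


def crypt_frame_fixed(key: bytes, data: bytes) -> bytes:
    # Anti-pattern: every frame reuses the same keystream.
    ks = keystream(key, bytes(8), len(data))
    return bytes(a ^ b for a, b in zip(data, ks))


key = b"content-key"
frames = [b"title-card", b"title-card", b"scene-two!"]
enc = [crypt_frame(key, i, f) for i, f in enumerate(frames)]

# Random access: jump straight to frame 2 without touching frames 0 and 1.
print(crypt_frame(key, 2, enc[2]))                 # b'scene-two!'
# Identical frames still produce different ciphertexts.
print(enc[0] != enc[1])                            # True
# With a fixed keystream, equal frames collide and leak their equality.
print(crypt_frame_fixed(key, frames[0]) == crypt_frame_fixed(key, frames[1]))  # True
```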
While many encryption algorithms have been proposed for non-scalable multimedia formats, only a few are designed specifically for scalable formats. Wee et al. propose a secure scalable streaming (SSS) scheme that enables transcoding without decryption (S. J. Wee and J. G. Apostolopoulos, “Secure Scalable Streaming Enabling Transcoding Without Decryption,” IEEE Int. Conf. Image Processing, 2001, vol. 1, pp. 437-440). For MPEG-4 FGS, the approach encrypts the video data in both the base and enhancement layers, except for header data. Hints for RD-optimal (rate-distortion-optimal) cutoff points must be inserted into the unencrypted header so that a middle stage can perform RD-optimal bitrate reduction. Encryption granularity depends on how the video stream is packetized: encryption is applied to each packet, and the packet size cannot be modified after encryption. SSS protects scalable media as a single access layer.
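With per-packet encryption, a middle stage can still approximate RD-optimal bitrate reduction by reading hints from cleartext headers and dropping whole packets. The sketch below assumes each header exposes a frame number, a packet index, and a rate-distortion slope hint (distortion reduction per byte) that is non-increasing within a frame; these fields and the greedy selection rule are illustrative assumptions, not the header format or algorithm specified by Wee et al.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    frame: int       # cleartext header: frame number
    index: int       # cleartext header: order of the packet within its frame
    slope: float     # cleartext header hint: distortion reduction per byte
    payload: bytes   # encrypted data; the middle stage never looks inside


def rd_shape(packets: list[Packet], budget: int) -> list[Packet]:
    """Pick whole packets for a byte budget using only cleartext RD hints.

    Packets are taken in order of decreasing slope, but a frame's packets are
    only useful as a prefix, so a packet is accepted only if all earlier packets
    of its frame were accepted. Assumes slopes are non-increasing within a frame.
    Packets are never split or resized.
    """
    next_index: dict[int, int] = {}
    kept, used = [], 0
    for p in sorted(packets, key=lambda q: (-q.slope, q.frame, q.index)):
        if p.index != next_index.get(p.frame, 0):
            continue                       # an earlier packet of this frame was dropped
        if used + len(p.payload) > budget:
            continue                       # does not fit; later packets of this frame stay blocked
        kept.append(p)
        used += len(p.payload)
        next_index[p.frame] = p.index + 1
    return sorted(kept, key=lambda q: (q.frame, q.index))


stream = [
    Packet(0, 0, 5.0, bytes(100)), Packet(0, 1, 2.0, bytes(100)), Packet(0, 2, 0.5, bytes(100)),
    Packet(1, 0, 4.0, bytes(100)), Packet(1, 1, 1.0, bytes(100)),
]
shaped = rd_shape(stream, budget=300)
print([(p.frame, p.index) for p in shaped])   # [(0, 0), (0, 1), (1, 0)]
```

Because packets are never resized, the achievable bitrates are quantized by the packetization, which is the granularity limitation described in the paragraph above.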
Grosbois et al. propose a scalable authentication and access control scheme for the JPEG 2000 image compression standard (Raphael Grosbois, Pierre Gergelot, and Touradj Ebrahimi, “Authentication and Access Control in the JPEG 2000 compressed domain,” Proc. of SPIE 46th Annual Meeting, Applications of Digital Image Processing XXIV, San Diego, 2001). The scheme is based on modifying the bitstream and inserting information into it. A keyed hash value is used to generate a pseudo-random sequence, which in turn is used to pseudo-randomly invert the signs of high-frequency wavelet coefficients. A layered access structure allows adaptation to different applications. One of the scheme's major drawbacks is that it inserts extra information to aid decryption, which reduces compression efficiency.
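The sign-inversion idea can be sketched in a few lines. This Python version assumes HMAC-SHA256 in counter mode as the keyed-hash-driven pseudo-random generator and operates on a plain list of high-frequency subband coefficients; the function names, the nonce, and the generator construction are assumptions, and the details of Grosbois et al.'s scheme may differ.

```python
import hashlib
import hmac


def sign_mask(key: bytes, nonce: bytes, n: int) -> list[bool]:
    # Pseudo-random bits from a keyed hash (HMAC-SHA256 in counter mode).
    bits: list[bool] = []
    counter = 0
    while len(bits) < n:
        digest = hmac.new(key, nonce + counter.to_bytes(4, "big"), hashlib.sha256).digest()
        for byte in digest:
            bits.extend(bool((byte >> k) & 1) for k in range(8))
        counter += 1
    return bits[:n]


def scramble_signs(coeffs: list[int], key: bytes, nonce: bytes) -> list[int]:
    # Invert the sign of each coefficient whose mask bit is set. Sign inversion is
    # its own inverse, so calling this again with the same key and nonce descrambles.
    mask = sign_mask(key, nonce, len(coeffs))
    return [-c if flip else c for c, flip in zip(coeffs, mask)]


key, nonce = b"access-key", b"tile-0/HH1"
hf = [5, -3, 0, 7, -2, 1, 4, -6]               # toy high-frequency wavelet coefficients
scrambled = scramble_signs(hf, key, nonce)
print([abs(c) for c in scrambled] == [abs(c) for c in hf])   # True: magnitudes untouched
print(scramble_signs(scrambled, key, nonce) == hf)           # True: same call restores
```

Because only signs change, the coefficient magnitudes, and therefore most of the statistics the entropy coder relies on, are left intact; zero-valued coefficients are unaffected.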