Scalable video coding imparts flexibility and adaptability to multimedia content so that it can be suited to many different applications and environments. Scalable multimedia formats include, among others, several MPEG stream formats and “scalable media adaptation and robust transport” (SMART and SMART++). (See, e.g., <http://research.microsoft.com/im/>, Microsoft Corporation, Redmond, Washington; Feng Wu, Shipeng Li, Ya-Qin Zhang, “A framework for efficient progressive fine granular scalable video coding”, IEEE Trans. on Circuits and Systems for Video Technology, vol. 11, no. 3, pp. 332-344, 2001; Xiaoyan Sun, Feng Wu, Shipeng Li, Wen Gao, Ya-Qin Zhang, “Macroblock-based temporal-SNR progressive fine granularity scalable video coding”, IEEE International Conference on Image Processing (ICIP), pp. 1025-1028, Thessaloniki, Greece, October, 2001; and Yuwen He, Feng Wu, Shipeng Li, Yuzhuo Zhong, Shiqiang Yang, “H.26L-based fine granularity scalable video coding”, ISCAS 2002, vol. 4, pp. 548-551, Phoenix, USA, May 2002).
MPEG-4 has adopted several profiles, some of which provide scalability for different devices. One such MPEG-4 profile is called the “Simple” profile. The “Simple” profile provides a video base layer upon which other profiles, such as “Simple Scalable,” can add an enhancement layer to provide scalability. Another profile, called “Advanced Simple,” performs rectangular video object coding by adding some coding tools to the “Simple” profile. Although the “Advanced Simple” profile includes the latest MPEG-4 coding efficiency tools, it provides no scalability. Yet another MPEG-4 profile, called “Fine Granularity Scalability (FGS),” provides a scalable video coding scheme by adding a fine-grained enhancement layer to the “Advanced Simple” base layer. FGS has become a standard.
The base layer used in some of these scalable MPEG-4 profiles is encoded in a non-scalable manner at the lowest bound of the bitrate range. The FGS profile adds an enhancement layer, predicted from this low-bitrate base layer, that strives to achieve the best possible video quality at every bitrate in the range that the same enhancement layer can serve. The enhancement layer is encoded in a scalable manner: the discrete cosine transform (DCT) coefficients of each frame's residue are compressed bit-plane by bit-plane, from the most significant bit to the least significant bit. Because a single enhancement layer serves every bitrate in the range, a video is compressed by MPEG-4 FGS only once. When it is transmitted over a network, a server can discard the enhancement-layer data associated with the least significant bit-plane(s) if the network lacks the required bandwidth. Other rate shaping operations can also be carried out directly on the compressed enhancement-layer data without resorting to either decompression or recompression. The above-described FGS profile will be referred to herein as MPEG-4 FGS, or just “FGS.”
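The bit-plane coding of the enhancement layer described above can be sketched as follows. This is a minimal illustration, assuming integer residual DCT coefficients; the function names `to_bit_planes` and `from_bit_planes` are hypothetical, signs are simply kept in a separate list, and the entropy coding that MPEG-4 FGS applies to each bit-plane is omitted.

```python
def to_bit_planes(coeffs):
    """Split |coeff| values into bit-planes, most significant plane first.

    Returns (signs, planes, num_planes); planes[0] is the MSB plane.
    Signs are stored separately here (a simplification of real FGS syntax).
    """
    signs = [c < 0 for c in coeffs]
    mags = [abs(c) for c in coeffs]
    num_planes = max(mags).bit_length() if mags else 0
    planes = []
    for p in range(num_planes - 1, -1, -1):  # MSB -> LSB
        planes.append([(m >> p) & 1 for m in mags])
    return signs, planes, num_planes


def from_bit_planes(signs, planes, num_planes):
    """Rebuild coefficients from however many leading (MSB-first) planes arrived."""
    mags = [0] * len(signs)
    for i, plane in enumerate(planes):
        shift = num_planes - 1 - i
        for j, bit in enumerate(plane):
            mags[j] |= bit << shift
    return [-m if s else m for m, s in zip(mags, signs)]
```

Passing only `planes[:k]` to `from_bit_planes` mimics a server that discards the least significant bit-planes: the decoder still obtains a coarser approximation of every coefficient, and no re-encoding is needed.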
Encryption techniques for multimedia streams conforming to the various scalable MPEG-4 profiles should ideally have the same features that are desirable when encrypting non-scalable content, namely high security, low complexity, low compression overhead, error resilience, rate shaping adaptability, and random-access playback. Some of these will now be discussed in greater detail.
Security is an essential requirement for multimedia encryption. Compared with encryption for more critical applications, such as military and banking, multimedia encryption has its own particular issues, including the relatively large amount of (video) data to be encrypted and the characteristically lower value of that data.
Low complexity is desirable because any encryption or decryption process adds processing overhead. Since a multimedia stream contains a relatively vast amount of data, many applications require, or at least strongly prefer, that the complexity of an encryption system be very low, especially for decryption, because many applications must decrypt this vast amount of multimedia data in real time, usually on a user's limited equipment.
Low compression overhead is also desirable, since encryption inevitably affects compression efficiency, either by reducing the coding efficiency of the compression algorithm or by adding bytes to the already compressed file. Compression overhead should therefore be minimized in multimedia encryption algorithms.
Error resilience is important for encryption because faults occur during multimedia storage and transmission. Wireless networks are notorious for transmission errors, and data packets may be lost in transmission due to congestion, buffer overflow, and other network imperfections. Encryption schemes should ideally be resilient to bit errors and packet losses. They should also allow quick recovery from bit errors and fast resynchronization after packet losses to prevent extensive error propagation. Many multimedia encryption algorithms, typically designed for error-free transmission environments, suffer severe perceptual degradation when bit errors or packet losses occur during transmission.
Rate shaping describes the ability to vary the transmission bitrate (the number of bits transmitted per second of a stream) to suit various conditions. During delivery of a multimedia stream from the content owner to the user, the data is typically processed by many intermediate stages. Transcoding, for example, may change the bitrate to adapt to fluctuations in transmission bandwidth or to application requirements. If the data is encrypted, these intermediate stages typically must obtain the decryption and encryption keys and then run a cycle of decryption and re-encryption in order to process the data. This increases processing overhead and reduces security, since the encryption secrets have to be shared with these intermediate stages.
Associated with the security feature discussed above, it is often desirable to encrypt a multimedia stream for digital rights management (DRM). Such encryption can help a content owner enforce copyright and other property rights, such as licensing of multimedia content.
Many algorithms have been proposed to encrypt non-scalable video. The most straightforward is the naive algorithm, which applies a standard encryption scheme such as DES to the compressed stream as if it were text data (I. Agi and L. Gong, “An Empirical Study of Secure MPEG Video Transmissions,” Proc. Symp. Network & Distributed System Security, 1996, pp. 137-144). A naive algorithm usually has a large processing overhead due to the large amount of video data to be processed. It has minimal error resilience and does not allow rate shaping to be performed directly on the ciphertext.
Another method is the selective algorithm, which exploits the structure of the video stream and encrypts only part of the compressed video data (T. B. Maples and G. A. Spanos, “Performance Study of a Selective Encryption Scheme for the Security of Networked, Real-time Video,” Proc. 4th Int. Conf. Computer Communications & Networks, 1995; and J. Meyer and F. Gadegast, “Security Mechanisms for Multimedia Data with the Example MPEG-1 Video,” http://www.gadegast.de/frank/doc/secmeng.pdf, 1995). The partial data to be encrypted can be the I-frames, the I-frames plus all I-blocks in P- and B-frames, or the DC coefficients and lower AC terms of the I-blocks. Encrypting only I-frames does not provide sufficient security because of the I-blocks left exposed in the P- and B-frames, as well as interframe correlation (see the Agi and Gong reference, above).
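The simplest selective variant listed above, encrypting only the I-frames, can be sketched as follows. The frame container (a list of `(frame_type, payload)` tuples) and the function names are hypothetical, and the SHA-256 counter-mode keystream is a toy stand-in for a standard cipher such as DES, not a vetted construction.

```python
import hashlib


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy SHA-256 counter-mode keystream (illustrative stand-in for DES/AES)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def selective_encrypt(frames, key: bytes):
    """Encrypt the payloads of I-frames only; P- and B-frames pass through unchanged.

    The payload is XORed with a keystream, so calling this function again
    with the same key decrypts the I-frames.
    """
    out = []
    for index, (frame_type, payload) in enumerate(frames):
        if frame_type == "I":
            ks = keystream(key, index.to_bytes(8, "big"), len(payload))
            payload = bytes(p ^ k for p, k in zip(payload, ks))
        out.append((frame_type, payload))
    return out
```

In this sketch the P- and B-frames, including any I-blocks they contain, remain in the clear, which is precisely the weakness pointed out in the Agi and Gong study.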
Qiao and Nahrstedt propose a method that reduces the amount of data to be encrypted (L. Qiao and K. Nahrstedt, “Comparison of MPEG Encryption Algorithms,” Int. J. Computers & Graphics, Special Issue: “Data Security in Image Communication and Network”, vol. 22, no. 3, 1998): for each chunk of an I-frame, only the bytes at even indexes are encrypted, and the odd-indexed bytes are replaced by the result of XORing the odd-indexed subsequence with the even-indexed subsequence.
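The chunk transform just described can be sketched as below. Indexes are counted from 0 here (an assumption), chunks are assumed to have even length, the function names are hypothetical, and a toy SHA-256 keystream stands in for the standard cipher applied to the even-indexed half.

```python
import hashlib


def keystream(key: bytes, length: int) -> bytes:
    """Toy SHA-256 counter-mode keystream standing in for a standard cipher."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def even_odd_encrypt(chunk: bytes, key: bytes) -> bytes:
    """Encrypt the even-indexed bytes; send the odd-indexed bytes as (odd XOR even)."""
    if len(chunk) % 2:
        raise ValueError("chunk length must be even")
    even, odd = chunk[0::2], chunk[1::2]
    mixed = bytes(o ^ e for o, e in zip(odd, even))
    enc_even = bytes(e ^ k for e, k in zip(even, keystream(key, len(even))))
    return enc_even + mixed


def even_odd_decrypt(cipher: bytes, key: bytes) -> bytes:
    """Decrypt the even half, then XOR it back out of the second half."""
    half = len(cipher) // 2
    enc_even, mixed = cipher[:half], cipher[half:]
    even = bytes(c ^ k for c, k in zip(enc_even, keystream(key, half)))
    odd = bytes(m ^ e for m, e in zip(mixed, even))
    chunk = bytearray(len(cipher))
    chunk[0::2] = even
    chunk[1::2] = odd
    return bytes(chunk)
```

Only half of each chunk passes through the cipher; the other half is hidden by the XOR because recovering it requires the decrypted even-indexed bytes.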
Another selective algorithm pseudo-randomly flips the sign bits of all DCT coefficients (C. Shi and B. Bhargava, “A Fast MPEG Video Encryption Algorithm,” Proc. of ACM Multimedia'98, 1998, pp. 81-88), or the sign bits of the differential values of the DC coefficients of I-blocks and the sign bits of the differential values of motion vectors (C. Shi and B. Bhargava, “An Efficient MPEG Video Encryption Algorithm,” IEEE Proc. 17th Symp. Reliable Distributed Systems, 1998, pp. 381-386).
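Sign-bit encryption of DCT coefficients, in the spirit of the first scheme above, can be sketched as follows. How the pseudo-random bits are derived from the key is an assumption (a toy SHA-256 counter-mode generator), as are the function names `key_bits` and `scramble_signs`.

```python
import hashlib


def key_bits(key: bytes, n: int) -> list:
    """Derive n pseudo-random bits from the key (toy SHA-256 counter-mode generator)."""
    bits = []
    counter = 0
    while len(bits) < n:
        block = hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        bits.extend((byte >> i) & 1 for byte in block for i in range(8))
        counter += 1
    return bits[:n]


def scramble_signs(coeffs, key: bytes):
    """Flip the sign of each coefficient whose key bit is 1.

    The operation is its own inverse, so the same call with the same key decrypts.
    Zero coefficients have no sign and pass through unchanged.
    """
    bits = key_bits(key, len(coeffs))
    return [-c if b else c for c, b in zip(coeffs, bits)]
```

Only signs change, so coefficient magnitudes, and with them the run-level statistics the entropy coder relies on, are left untouched.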
A third approach is the scrambling algorithm, which scrambles some compression parameters or shuffles codewords to prevent unauthorized users from decompressing the content correctly. A simple scheme uses a random permutation instead of the normal zigzag order to map a 2D block to a 1D vector (L. Tang, “Methods for Encrypting and Decrypting MPEG Video Data Efficiently,” Proc. ACM Multimedia'96, 1996, pp. 219-230). Motion vectors and selected DCT coefficients can be shuffled before entropy coding (W. Zeng and S. Lei, “Efficient Frequency Domain Video Scrambling for Content Access Control,” Proc. ACM Multimedia'99, 1999, pp. 285-294; and “Efficient Frequency Domain Selective Scrambling of Digital Video,” a preprint to appear in IEEE Trans. Multimedia).
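Replacing the zigzag scan with a secret permutation, as in the simple scheme above, can be sketched as below. Seeding Python's `random.Random` (a Mersenne Twister, not a cryptographic generator) with an integer key is an illustrative assumption, as are the helper names; an 8x8 block is represented as a list of 8 rows.

```python
import random


def keyed_scan_order(key: int, n: int = 64) -> list:
    """Key-dependent permutation of 0..n-1 used in place of the zigzag scan."""
    order = list(range(n))
    random.Random(key).shuffle(order)  # illustrative only, not cryptographically secure
    return order


def block_to_vector(block, order):
    """Flatten an 8x8 block (list of 8 rows) into a 1D vector following `order`."""
    flat = [v for row in block for v in row]
    return [flat[i] for i in order]


def vector_to_block(vec, order, size: int = 8):
    """Invert block_to_vector: put each value back at its raster position."""
    flat = [0] * len(vec)
    for pos, i in enumerate(order):
        flat[i] = vec[pos]
    return [flat[r * size:(r + 1) * size] for r in range(size)]
```

Because the secret order no longer places low-frequency coefficients first, long runs of trailing zeros get broken up, which helps explain why scrambling of this kind tends to lower compression efficiency.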
VLC (variable-length coding) codewords can also be shuffled in a format-compliant way (J. Wen, M. Severa, W. Zeng, M. H. Luttrell, and W. Jin, “A Format-compliant Configurable Encryption Framework for Access Control of Video,” IEEE Trans. Circuits & Systems for Video Technology, vol. 12, no. 6, 2002, pp. 545-557). These schemes change the statistical properties of the data and thus lower compression efficiency. A scheme that spatially shuffles the codewords of the compressed bitstream, and so avoids this bit overhead, has also been proposed (W. Zeng, J. Wen, and M. Severa, “Fast Self-synchronous Content Scrambling by Spatially Shuffling Codewords of Compressed Bitstreams,” IEEE Int. Conf. Image Processing, 2002, vol. 3, pp. 169-172).
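Shuffling whole codewords, rather than coefficient positions before entropy coding, leaves the total number of bits unchanged. The sketch below illustrates that property only: a segment is treated as a flat list of VLC codewords written as bit strings, and this representation, the integer key, and the use of `random.Random` are assumptions. The segment structure and self-synchronization of the cited scheme are not modeled.

```python
import random


def _keyed_permutation(n: int, key: int) -> list:
    """Key-dependent permutation of range(n) (Mersenne Twister: illustrative only)."""
    perm = list(range(n))
    random.Random(key).shuffle(perm)
    return perm


def shuffle_codewords(codewords, key: int):
    """Reorder a segment's codewords; each codeword itself stays intact."""
    perm = _keyed_permutation(len(codewords), key)
    return [codewords[i] for i in perm]


def unshuffle_codewords(shuffled, key: int):
    """Regenerate the permutation from the key and put each codeword back."""
    perm = _keyed_permutation(len(shuffled), key)
    original = [None] * len(shuffled)
    for pos, i in enumerate(perm):
        original[i] = shuffled[pos]
    return original
```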
While some of the schemes mentioned above, and others such as most scrambling algorithms, are equally applicable to MPEG-4 FGS, most encryption schemes developed for non-scalable multimedia diminish or destroy the scalability features when applied to a scalable multimedia stream. To apply a scalable manipulation to such an encrypted stream, intermediate transmission stages often have to decrypt the stream, apply rate shaping (such as transcoding and/or bit reduction), and then re-encrypt the stream. A technique for encrypting scalable multimedia streams that preserves their scalability features is therefore needed.
Schemes specifically designed for scalable formats have been reported recently. Wee et al. propose a secure scalable streaming (SSS) scheme that enables transcoding without decryption (S. J. Wee and J. G. Apostolopoulos, “Secure Scalable Streaming Enabling Transcoding Without Decryption,” IEEE Int. Conf. Image Processing, 2001, vol. 1, pp. 437-440). For MPEG-4 FGS, the approach encrypts all video data in both the base and enhancement layers except header data. Hints for rate-distortion optimal (RD-optimal) cutoff points have to be inserted into the unencrypted header so that an intermediate stage can perform RD-optimal bitrate reduction. Encryption granularity depends on the way a video stream is packetized; more precisely, encryption is applied to each packet. This means that the packet size must already be known when encryption is applied in SSS, and the packet size cannot be modified after encryption. In real applications, a packet size designed for one type of transmission channel may not be appropriate for another. For example, video packets for wireless transmission have to be small because the channel is error-prone, whereas for Internet transmission a video packet should be large for efficiency, since the error rate on the Internet is very low. Any intermediate stage that wants to change the packet size to best fit the transmission channel must therefore resort to the decryption/re-encryption cycle in SSS.
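The per-packet behavior described above can be illustrated with a stream cipher: each packet is encrypted independently with its own keystream, the header carrying the cutoff hints stays in the clear, and an intermediate stage can truncate a packet at a hinted cutoff without holding the key. The toy SHA-256 keystream, the one-byte-per-hint header layout, and all function names are assumptions; this does not reproduce the actual SSS packet format or cipher.

```python
import hashlib


def packet_keystream(key: bytes, packet_index: int, length: int) -> bytes:
    """Toy keystream unique to one packet (SHA-256 in counter mode)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        block_input = key + packet_index.to_bytes(8, "big") + counter.to_bytes(8, "big")
        out += hashlib.sha256(block_input).digest()
        counter += 1
    return bytes(out[:length])


def encrypt_packet(header: bytes, payload: bytes, key: bytes, index: int):
    """Keep the header (which carries the cutoff hints) in the clear; encrypt the payload."""
    ks = packet_keystream(key, index, len(payload))
    return header, bytes(p ^ k for p, k in zip(payload, ks))


def truncate_packet(header: bytes, enc_payload: bytes, hint: int):
    """Intermediate stage without the key: cut the payload at the hint-th cutoff.

    Hypothetical header layout: one byte per cutoff offset.
    """
    return header, enc_payload[:header[hint]]


def decrypt_packet(header: bytes, enc_payload: bytes, key: bytes, index: int) -> bytes:
    """A prefix of the ciphertext decrypts to the same prefix of the plaintext."""
    ks = packet_keystream(key, index, len(enc_payload))
    return bytes(c ^ k for c, k in zip(enc_payload, ks))
```

Because the keystream depends on the packet index and on the byte offset within that packet, splitting or merging packets would misalign it for the receiver, which is why repacketizing in such a design requires decryption and re-encryption.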
Grosbois et al. propose a scalable authentication and access control scheme for the JPEG 2000 image compression standard (Raphael Grosbois, Pierre Gergelot, and Touradj Ebrahimi, “Authentication and Access Control in the JPEG 2000 compressed domain,” Proc. of SPIE 46th Annual Meeting, Applications of Digital Image Processing XXIV, San Diego, 2001). It is based on modifying the bitstream and inserting information into it. A keyed hash value is used to generate a pseudo-random sequence, which in turn is used to pseudo-randomly invert the signs of wavelet coefficients in the high-frequency bands. A layered access structure allows adaptation to different applications. One of its major drawbacks is the insertion of extra information to aid decryption, which reduces compression efficiency.
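The sign-inversion step of this scheme can be sketched as follows: a keyed hash (HMAC-SHA-256 here) seeds a pseudo-random generator whose bits decide which high-frequency coefficients have their signs inverted. The subband dictionary layout, the choice to treat every subband except "LL" as high-frequency, the use of `random.Random`, and the function name are all assumptions; the layered access structure and the extra inserted information are not modeled.

```python
import hashlib
import hmac
import random


def invert_highband_signs(subbands, key: bytes, image_id: bytes):
    """Pseudo-randomly invert signs in high-frequency subbands only.

    subbands: dict mapping a subband name ("LL", "HL1", ...) to a list of integer
    coefficients (hypothetical layout). "LL" is left untouched. Applying the
    function twice with the same key and image_id restores the input.
    """
    digest = hmac.new(key, image_id, hashlib.sha256).digest()
    rng = random.Random(int.from_bytes(digest, "big"))  # illustrative PRNG, not a CSPRNG
    out = {}
    for name in sorted(subbands):  # fixed order so the bit sequence is reproducible
        coeffs = subbands[name]
        if name == "LL":
            out[name] = list(coeffs)
            continue
        out[name] = [-c if rng.getrandbits(1) else c for c in coeffs]
    return out
```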