Multimedia is often encrypted to prevent unauthorized consumption. Typical protection encrypts multimedia data and restricts access to the decryption key(s) to only authorized users. This approach is widely used in multimedia Digital Rights Management (DRM) which provides persistent protection for content from creation to consumption.
However, a secure encryption may create encrypted data that is not compliant with the syntax standard being applied to the unencrypted data. For example, a JPEG 2000 image is expected to conform to JPEG 2000 formatting, but when encrypted, the JPEG 2000 image may become “out-of-spec” or noncompliant with the JPEG 2000 syntax. This means that the encrypted form of the data cannot be handled in the same manner as the original, unencrypted data. A good cipher applied to multimedia data may produce “random” ciphertext that may emulate or accidentally create marker bits that the original syntax is carefully designed to avoid.
Data in a multimedia bitstream is often organized into structural groups, referred to as packets. A packet consists of header fields and data fields. Each packet starts with a unique marker to indicate the start of a packet, and may end with another unique marker. Boundaries between each field may also be indicated by delimiting markers.
Markers in a multimedia bitstream are a set of special binary strings that are defined and reserved by the multimedia format. Different formats use different markers. To facilitate easy identification of each packet and each individual field in a packet, data in each multimedia format is coded with a carefully designed coding schema that avoids emulating any marker in the data fields. Otherwise, misidentifying part of the data payload as a marker may cause the bitstream to be parsed incorrectly and produce an incorrect result.
For example, in the JPEG 2000 image coding standard, certain compressed bitstreams do not allow any values in the range of hexadecimal 0xFF90 through 0xFFFF (decimal 65,424 through 65,535) for any two consecutive bytes of coded data, or the value of hexadecimal 0xFF (decimal 255) as the last byte (see, Information Technology—JPEG 2000 Image Coding System, Part 1: Core Coding System, ISO/IEC 15444-1:2000). JPEG 2000 also allows an optional arithmetic coding bypass in which raw bits are output to the bitstream without arithmetic coding. In this arithmetic coding bypass mode, data is not allowed to have a byte of value hexadecimal 0xFF as the last byte or to be followed by a binary 1 as the most significant bit (MSB) of the next byte.
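The two byte-level constraints just described can be sketched as a simple check. This is an illustrative sketch only; the function name `violates_jpeg2000_rule` is hypothetical, and the arithmetic coding bypass rule is not covered.

```python
def violates_jpeg2000_rule(data: bytes) -> bool:
    """Return True if `data` breaks either constraint described above."""
    # Rule 1: no two consecutive bytes may form a value in 0xFF90..0xFFFF,
    # i.e. a 0xFF byte may not be followed by a byte of 0x90 or more.
    for first, second in zip(data, data[1:]):
        if first == 0xFF and second >= 0x90:
            return True
    # Rule 2: the last byte may not be 0xFF.
    return len(data) > 0 and data[-1] == 0xFF
```

A ciphertext produced by an ordinary cipher would have to pass such a check before it could be handled like the unencrypted coded data.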
The MPEG-4 Fine Granularity Scalability (FGS) video coding standard provides another example of values that are reserved for boundary markers and other syntax functions of a multimedia format (see, for example, W. Li, “Overview of Fine Granularity Scalability in MPEG-4 Video Standard,” IEEE Trans. on Circuits and Systems for Video Technology, vol. 11, no. 3, pp. 301-317, March 2001; or, “MPEG-4 Video Verification Model Version 17.0,” ISO/IEC JTC1/SC29/WG11 N3515, Beijing, July 2000). In the MPEG-4 FGS model, compressed bit-plane data in the enhancement layer is grouped into packets separated by a bit-plane start code denoted as “fgs_bp_start_code” or, if a flag “fgs_resync_marker_disable” is set to zero, then a resynchronization marker denoted as “fgs_resync_marker” is used. Both markers are byte-aligned, i.e., start at a byte boundary. The marker fgs_bp_start_code starts with 23 bits of 0's followed by hexadecimal 0xA (decimal 10) plus another five bits to indicate which bit-plane the data belongs to. The marker fgs_resync_marker is 22 bits of 0's followed by bit 1. Therefore, compressed bit-plane data in a packet may not contain 22 consecutive binary zeros that start at a byte boundary.
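The byte-aligned zero-run restriction can be illustrated with a short check. This is a sketch under the reading that a forbidden pattern is two zero bytes followed by a byte whose six most significant bits are zero; the function name is hypothetical.

```python
def has_byte_aligned_22_zeros(data: bytes) -> bool:
    """Return True if 22 consecutive zero bits start at some byte boundary."""
    for i in range(len(data) - 2):
        # 16 zero bits from two zero bytes, plus the top 6 bits of the
        # next byte: those are zero exactly when that byte is below 0x04.
        if data[i] == 0 and data[i + 1] == 0 and data[i + 2] < 0x04:
            return True
    return False
```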
To avoid this problem of the encryption technique accidentally forming ciphertext that includes forbidden or reserved values, a typical method adds information to the unencrypted header fields of a packet, such as the length of the ciphertext or the number of occurrences of marker emulation in the data field, to ensure correct decryption and decoding of encrypted multimedia content (see, C. Yuan, B. B. Zhu, M. Su, X. Wang, S. Li, and Y. Zhong, “Layered Access Control for MPEG-4 FGS Video,” IEEE Int. Conf. Image Processing, Barcelona, Spain, vol. 1, pp. 517-520, September 2003). The resulting bitstream, however, may not be syntax compliant. This syntax noncompliant approach has several drawbacks. First, the encrypted bitstream may not be backward compatible. Adding header fields to a packet may lead a compliant but encryption-unaware decoder to parse a packet incorrectly and produce undesirable results. This type of encryption may also impair fast random access of encrypted multimedia, a very desirable feature, especially when playing long audiovisual content. The syntax noncompliant approach may also cause incorrect parsing and false synchronization when errors or data loss occur, resulting in deteriorated error resilience and extended blackouts.
In many applications, it is very desirable that encrypted multimedia still be syntax compliant so that spurious markers do not appear in ciphertext. This is a goal pursued by many researchers in the multimedia encryption and protection arts. For example, in developing JPSEC (Part 8 of the JPEG 2000 standard), a security approach for protecting JPEG 2000 codestreams, experts on the standardization committee have tried over the past few years to develop encryption schemata whose encrypted bitstreams still meet the strict syntax compliance requirements of JPEG 2000. In addition to multimedia encryption such as JPEG 2000 and MPEG-4 FGS encryption, syntax compliant encryption may also find applications in encrypting other structured data such as extensible markup language (XML) data, Internet packet data, etc.
In the case of JPEG 2000, several encryption schemata have been proposed recently to generate syntax compliant ciphertext for JPSEC. One schema, proposed by H. Wu and D. Ma, “Efficient and Secure Encryption Schemes for JPEG2000,” IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, 2004 (ICASSP '04), vol. 5, pp. V869-872, May 2004, referred to herein as a partial encryption schema, encrypts a JPEG 2000 data stream in the following manner. A keystream is first generated with a stream cipher, and any byte of value 0xFF is removed from the keystream. If a byte in the plain stream is of value 0xFF, both this byte and the following byte are copied to the ciphertext without encryption. The remaining bytes are encrypted byte by byte in a modular addition, that is, for each byte m_i in the remaining plaintext and a corresponding byte s_i from the keystream, the output byte to the ciphertext is c_i = (m_i + s_i) mod 0xFF. This partial encryption schema leaves some portions of plaintext unencrypted. More alarmingly, locations and lengths of unencrypted substreams are publicly known. Further, this approach is syntax specific and difficult to extend to a general syntax specification.
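The partial encryption steps described above can be sketched as follows. This is a minimal illustration rather than the cited authors' implementation: `toy_keystream` is a hypothetical SHA-256 counter construction standing in for a real stream cipher, and the function names are assumptions.

```python
import hashlib


def toy_keystream(key: bytes):
    """Yield pseudo-random bytes; a stand-in for a real stream cipher."""
    counter = 0
    while True:
        yield from hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1


def partial_crypt(data: bytes, key: bytes, decrypt: bool = False) -> bytes:
    # Remove every 0xFF byte from the keystream, as the schema describes.
    stream = (s for s in toy_keystream(key) if s != 0xFF)
    out = bytearray()
    copy_next = False  # True when the previous byte was 0xFF
    for byte in data:
        if byte == 0xFF or copy_next:
            out.append(byte)  # copied without encryption
            copy_next = byte == 0xFF
        else:
            s = next(stream)
            # Modular addition (or subtraction) mod 0xFF never yields 0xFF.
            out.append((byte - s) % 0xFF if decrypt else (byte + s) % 0xFF)
    return bytes(out)
```

Because encrypted bytes never take the value 0xFF, the positions of 0xFF bytes are identical in plaintext and ciphertext, which is what lets the decryptor find the copied bytes and also what makes the unencrypted locations publicly visible.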
FIG. 1 shows another conventional approach for keeping an encrypted data stream syntax compliant: globally iterative encryption 100 (see, Y. Wu and R. H. Deng, “Compliant Encryption of JPEG2000 Codestreams,” IEEE Int. Conf. on Image Processing 2004 (ICIP'04), Singapore, October 2004). Globally iterative encryption 100 works with both block and stream ciphers, and is applicable for a general syntax specification. A block cipher can be used directly with the iterative encryption schema. A stream cipher needs to be modified to work with this schema since a typical stream cipher encrypts plaintext by XORing (the bitwise exclusive-or operation) the stream with a keystream generated from a secure generator. Applying the same keystream in the second round of encryption produces the plaintext. For a stream cipher to work with the iterative encryption, a slower modular addition is used in place of the XOR operation. If the (plaintext) data to be encrypted, M 102, has n bits, and S is a keystream of the same length generated by a secure sequence generator, modular addition produces an output C = (M + S) mod 2^n as the ciphertext 104. Decryption of such encrypted ciphertext is M = (C − S) mod 2^n. For a stream cipher of modular addition, if encryption is iteratively applied 106 to plaintext M 102 multiple times r, until the intermediate result of an encryption iteration meets a syntax specification 108, then the output compliant ciphertext can be expressed as

C = (M + rS) mod 2^n,  (1)

where S is the keystream.
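The iteration with modular addition can be sketched as below. This is an illustration under stated assumptions, not the cited authors' code: SHAKE-256 is a hypothetical stand-in for the secure keystream generator, `is_compliant` uses the JPEG 2000 byte constraints as an example syntax specification, and the `max_rounds` guard is added only so the sketch always terminates.

```python
import hashlib


def is_compliant(data: bytes) -> bool:
    """Example syntax check: no 0xFF90..0xFFFF byte pair, no trailing 0xFF."""
    if data and data[-1] == 0xFF:
        return False
    return not any(a == 0xFF and b >= 0x90 for a, b in zip(data, data[1:]))


def iterate(data: bytes, key: bytes, sign: int, max_rounds: int = 10_000):
    """Add (sign=+1) or subtract (sign=-1) S mod 2^n until compliant."""
    n_bytes = len(data)
    modulus = 1 << (8 * n_bytes)  # 2^n for n = 8 * n_bytes bits
    s = int.from_bytes(hashlib.shake_256(key).digest(n_bytes), "big")
    value = int.from_bytes(data, "big")
    for rounds in range(1, max_rounds + 1):
        value = (value + sign * s) % modulus
        candidate = value.to_bytes(n_bytes, "big")
        if is_compliant(candidate):
            return candidate, rounds
    raise RuntimeError("no compliant output within max_rounds")


def encrypt(plaintext: bytes, key: bytes):
    return iterate(plaintext, key, +1)


def decrypt(ciphertext: bytes, key: bytes):
    return iterate(ciphertext, key, -1)
```

Decryption retraces the encryption's intermediate values in reverse order, all of which were noncompliant, so for a compliant plaintext it stops after the same number of rounds r. Note that every round touches the entire message, which is the source of the global dependency.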
The globally iterative encryption schema 100 has a few drawbacks. One drawback is the computational complexity. Globally iterative encryption to produce a compliant ciphertext has much higher complexity than conventional block or stream cipher encryption, especially when the plaintext is long or the probability that an illegal substream appears is high.
FIG. 2 shows some disadvantages of the conventional globally iterative encryption schema. A first graph 200 shows the number of iterations needed to produce a syntax compliant ciphertext versus the length of the data stream to be encrypted. The complexity of the conventional globally iterative encryption schema 100 increases exponentially with the length of the ciphertext, because encryption or decryption of a portion of data in this schema is dependent on the entire data: the ending condition for globally iterative encryption 100 (or decryption) is that the whole data be syntax compliant. This is very different from conventional stream and block cipher encryption, in which encryption or decryption of a current block depends only on the current and previous data. A second graph 202 shows that the speed of the globally iterative encryption schema 100 decreases exponentially as the length of the data stream to be encrypted increases.
The global dependency inherent in a globally iterative encryption schema 100 does not allow any truncation of the entire cipherstream 104 (the entire encrypted data). If the encrypted data is not completely received at a deciphering agent, then all of the received ciphertext 104 has to be discarded, since decryption is useless when it may stop at a wrong iteration. This is very undesirable for encryption of scalable multimedia streams. An encrypted scalable stream should still be able to be truncated directly in the ciphertext so that the scalability functionalities offered by the scalable stream are not severely impaired after encryption (see, B. B. Zhu, M. D. Swanson, and S. Li, “Encryption and Authentication for Scalable Multimedia: Current State of the Art and Challenges,” Proc. of SPIE Internet Multimedia Management Systems V, vol. 5601, pp. 157-170, Philadelphia Pa., October 2004). If there is a bit error 110 in the original plaintext, the global dependency may also propagate bit errors to the whole ciphertext 104. The same is true for bit errors that occur in the ciphertext 104, which propagate to the whole decrypted plaintext. This occurs especially when an erroneous bit removes or generates an illegal substream during a decryption iteration so that the decryption stops at a wrong iteration.
Another problem with the globally iterative encryption schema 100 is that when r, i.e., the number of iterations in encryption, is not relatively prime to 2^n in Equation (1), some trailing bits are not encrypted by the globally iterative encryption schema 100. For example, if r = 2^q < 2^n, then the last q bits of plaintext are not encrypted. Since the number of unencrypted bits is the base-2 logarithm of the greatest common divisor (gcd) of r and 2^n, the number of unencrypted trailing bits is typically small. This may not amount to a very great vulnerability unless the last several bits are very important.
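A small worked example makes the trailing-bit effect concrete. The values of `m`, `s`, `r`, and `n` below are arbitrary illustrative choices, and the helper name is hypothetical.

```python
from math import gcd


def unencrypted_trailing_bits(r: int, n: int) -> int:
    """Base-2 logarithm of gcd(r, 2^n): low-order bits left unchanged."""
    return gcd(r, 1 << n).bit_length() - 1


n = 16
m, s = 0xABCD, 0x1357       # arbitrary plaintext and keystream values
r = 4                       # r = 2^q with q = 2
c = (m + r * s) % (1 << n)  # ciphertext after r rounds of modular addition
q = unencrypted_trailing_bits(r, n)
low_mask = (1 << q) - 1     # selects the last q bits
```

Since r·S is a multiple of 2^q, adding it cannot change the last q bits, so `c & low_mask` equals `m & low_mask`.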
A slightly different iterative block cipher encryption schema is proposed for syntax compliant JPEG 2000 encryption in H. Wu and D. Ma, “Efficient and Secure Encryption Schemes for JPEG2000,” IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, 2004 (ICASSP '04), vol. 5, pp. V869-872, May 2004. In this schema, every byte that is neither of hexadecimal value 0xFF nor preceded by a byte of value 0xFF is extracted from the plaintext to form a new stream of data, which is then encrypted iteratively with a block cipher until the output does not contain any byte of value 0xFF. The resulting ciphertext is then partitioned and placed back into the original positions in the plaintext to obtain the final ciphertext. In addition to the drawbacks of the globally iterative encryption schema 100 described just above, this schema may extract the wrong bytes when an error occurs in the ciphertext and so obtain a misaligned cipherstream to be decrypted, resulting in the whole decrypted data being garbled.
Yet another syntax compliant encryption schema, called ciphertext switching encryption, which works with stream ciphers such as RC4 and SEAL, has been proposed by B. B. Zhu, Y. Yang, and S. Li, in “Ciphertext Switching for Syntax Compliant Encryption,” submitted to IEEE Trans. on Image Processing. In ciphertext switching encryption, post-processing is used to make the output of a conventional stream cipher encryption syntax compliant. More specifically, illegal substreams in the output of conventional stream cipher encryption are switched back to the original substreams in the plaintext to obtain syntax compliant ciphertext. This “switching” occurs rarely. For example, for arithmetic coded data in JPEG 2000, only about 0.34% of the data is switched to plaintext, and each occurrence is about 2 bytes long on average. Unlike the partial encryption schema mentioned above, in which the unencrypted data is publicly known, the location of switched data in the ciphertext switching encryption schema depends on both the plaintext and the random and unpredictable keystream, and therefore is itself random and unpredictable. This ensures the security of the ciphertext switching encryption schema. One drawback of the schema is that it cannot be extended to work with block ciphers. A disadvantage of using a stream cipher for encryption is that there is a one-to-one correspondence between the ciphertext and the plaintext. It is possible to modify some bits in the ciphertext and observe the consequence after decryption to gain knowledge of the structure of the original data. Thus, many applications prefer to use more widely known block ciphers instead of stream ciphers.
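The one-to-one correspondence between ciphertext bits and plaintext bits can be illustrated with a toy XOR stream cipher. This is only a sketch: SHAKE-256 is a hypothetical stand-in keystream generator, not RC4 or SEAL, and the ciphertext switching step itself is not modeled.

```python
import hashlib


def xor_stream(data: bytes, key: bytes) -> bytes:
    """XOR data with a toy keystream; the same call encrypts and decrypts."""
    keystream = hashlib.shake_256(key).digest(len(data))
    return bytes(d ^ k for d, k in zip(data, keystream))


plain = b"header:payload"
cipher = bytearray(xor_stream(plain, b"key"))
cipher[0] ^= 0x01                            # attacker flips one ciphertext bit
tampered = xor_stream(bytes(cipher), b"key")  # receiver decrypts
```

Only the corresponding plaintext bit changes after decryption, so an observer who can see the decoder's reaction learns which positions in the stream matter.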
What is needed is an encryption technique that can produce syntax compliant ciphertext, yet avoid the many problems and disadvantages discussed above.