1. Field of the Invention
The present invention relates to a method for processing compressed data, and to apparatus and media for its implementation.
The technical field of the present invention is the manufacturing of audio and/or video data encoders.
2. Description of the Related Art
The present invention relates more particularly to a method for selectively (partially) ciphering audio or video data by a cipher algorithm, the data being compressed and organized according to a standardized format, and being capable of comprising codewords of variable length.
Today, the secured distribution of video documents is limited to the broadcasting of “pay-as-you-go” cable or satellite television; the security is provided by “proprietary” cipher systems, which are defined, implemented and controlled by a single provider: the broadcaster.
The new standards of low-rate video, broadband Internet and wireless-network handheld terminals, of 3G telephone or personal assistant type, should soon enable the distribution of video documents: teleconferencing, multimedia messages, film trailers, live sporting events and video on demand, in particular.
Some security requirements are emerging which cannot be met by the current solutions. The requirements are as follows:
a—the syntax of the ciphered stream must remain as compliant as possible with the coding standard, in order to facilitate the transport by network; the method for processing data must provide transparency to the transcoding and to the changes in data rates, as well as transparency to the routers and servers for reasons of confidence; the method must enable random access and other video processing without deciphering the complete stream, and must enable the transport by protocols provided for standard video;
b—the compression efficiency must not be reduced as a result of the securement of the data by cipher;
c—the securement must be compatible with various tools provided for by the video data compression standards (MPEG-4, H.264), particularly the resistance to errors, for wireless transmission and the losses of IP (Internet Protocol) packets, as well as the multi-level coding, for heterogeneous bandwidth client terminals;
d—the security and backward masking level must be adapted to the application: robustness to video-specific attacks;
e—the required computing power must remain compatible with embedded terminals, for applications like the wireless streaming of multimedia documents for example.
According to the MPEG standard, a video sequence is made up of a series of groups of images, each image group comprising a series of images of type I (intra-coded), P (predicted) and B (bi-directional); each type-I image is split into macroblocks; each macroblock is converted into four luminance blocks and into two chrominance blocks, this conversion resulting in a first loss of information.
Each 64-pixel block is converted into a 64-coefficient table by a DCT (“discrete cosine transform”); this table is compressed by quantization and then ordered and coded (“zig-zag ordering” and “run-length coding”) according to the number of zero-value coefficients encountered during a zig-zag scan of the table; the resulting compressed data are coded into words of variable length (“Huffman coding”); these transformations also result in a loss of information.
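The zig-zag ordering and run-length coding steps described above can be illustrated by the following minimal Python sketch. The function names and the sample block are hypothetical, the DCT and quantization steps are assumed to have been applied already, and the sketch does not reproduce the separate handling that a real coder applies to the DC coefficient.

```python
def zigzag_order(n=8):
    """Return the (row, col) positions of an n x n block in zig-zag scan order."""
    # Positions are grouped by anti-diagonal (row + col); the scan direction
    # alternates from one diagonal to the next.
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        key=lambda rc: (rc[0] + rc[1], rc[0] if (rc[0] + rc[1]) % 2 else rc[1]),
    )


def run_length(coeffs):
    """Code scanned coefficients as (zero_run, level) pairs plus an end-of-block marker."""
    pairs, run = [], 0
    for level in coeffs:
        if level == 0:
            run += 1          # count zero-value coefficients met during the scan
        else:
            pairs.append((run, level))
            run = 0
    pairs.append("EOB")       # trailing zeros are implied by the end-of-block marker
    return pairs


# Hypothetical quantized block: three non-zero coefficients, the rest zero.
block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[2][0] = 12, -3, 5
scanned = [block[r][c] for r, c in zigzag_order()]
pairs = run_length(scanned)
print(pairs)  # [(0, 12), (0, -3), (1, 5), 'EOB']
```

In a real coder, each (run, level) pair would then be mapped to a word of variable length from a Huffman table.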
Various methods for ciphering a standardized video data stream—particularly an MPEG-standard stream—have been proposed in order to meet some of the aforementioned requirements.
When a codeword that is part of a table of codewords of different lengths is entirely ciphered, the result is generally a codeword which does not belong to this table (“non-compliant” word); consequently, a decoder that analyses the codewords bit by bit and makes decisions at each bit, will not be able to recognize the boundary of the ciphered codeword, will “get confused” and will no longer know which data field it is analyzing; this disadvantage results from the fact that the codewords are of variable length.
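The loss of codeword boundaries can be illustrated with a toy example. In the sketch below, the three-entry table, the XOR masks and the function names are all hypothetical and are not taken from any coding standard; XOR merely stands in for a cipher applied to the whole codeword.

```python
# Hypothetical toy table of variable-length codewords; "111" is unassigned.
TABLE = {"0": "A", "10": "B", "110": "C"}
MAX_LEN = 3


def decode(bits):
    """Decode a bit string one bit at a time, as a bit-serial decoder would."""
    symbols, word = [], ""
    for b in bits:
        word += b
        if word in TABLE:
            symbols.append(TABLE[word])
            word = ""
        elif len(word) == MAX_LEN:
            raise ValueError(f"non-compliant codeword {word!r}")
    if word:
        raise ValueError("stream ends inside a codeword")
    return symbols


def xor_bits(word, mask):
    """XOR two equal-length bit strings (stand-in for ciphering a whole codeword)."""
    return "".join(str(int(a) ^ int(b)) for a, b in zip(word, mask))


clear = ["10", "110", "0"]                          # B, C, A
print(decode("".join(clear)))                       # ['B', 'C', 'A']

# Ciphering the whole middle codeword can shift the boundaries seen by the decoder...
shifted = "10" + xor_bits("110", "101") + "0"       # middle word becomes "011"
print(decode(shifted))                              # ['B', 'A', 'C']

# ...or produce a word that is not in the table at all.
broken = "10" + xor_bits("110", "001") + "0"        # middle word becomes "111"
try:
    decode(broken)
except ValueError as err:
    print(err)
```

In the second case the decoder parses the three ciphered bits as a one-bit word followed by the start of another word, so even a deciphering stage placed after the parser could not recover the original boundaries.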
The document “A Fast MPEG Video Encryption Algorithm”, Changgui Shi et al., ACM Multimedia 98, describes a method for ciphering MPEG-compressed video data with a secret key; the sign bits of the Huffman coefficients (AC and DC), which are codewords of variable length, are “XORed” bit by bit with a key of determined length (i.e. combined bit by bit with the bits of the key by means of “exclusive OR” logic gates), and are each replaced in the video data stream with the bit value resulting from this operation; this document proposes using one or several long key(s); a 128-bit key is used as an example.
According to this document, only the sign bits of the codewords are ciphered, and only for the codewords that represent useful data (motion vectors and DCT coefficients representing the texture), which results in a compliant codeword. If the codewords representing something else were ciphered, like the number of coded blocks for example, even if compliant codewords were obtained after ciphering, the decoder would be lost.
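The sign-bit ciphering principle can be sketched as follows. This is only an illustration under stated assumptions: each codeword is modelled as a bit string whose last bit is the sign bit, the key bits are reused cyclically, and the name `cipher_signs` is hypothetical; the key schedule of the cited document is not reproduced.

```python
from itertools import cycle


def cipher_signs(codewords, key_bits):
    """XOR the last bit of each codeword (the sign bit in this toy layout) with a key bit."""
    out = []
    for word, k in zip(codewords, cycle(key_bits)):
        sign = int(word[-1]) ^ k
        out.append(word[:-1] + str(sign))   # all other bits are left untouched
    return out


# Hypothetical codewords: variable-length prefix followed by one sign bit.
clear = ["110", "01001", "111", "00100"]
key = [1, 0, 1]                             # toy key; the cited example uses 128 bits

ciphered = cipher_signs(clear, key)
print(ciphered)                             # ['111', '01001', '110', '00101']
print(cipher_signs(ciphered, key) == clear) # True: XOR with the same key deciphers
```

Because only the sign bit changes, every ciphered word keeps its length and its prefix, so the decoder still finds the same codeword boundaries.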
This selective cipher method, which operates on a small part of the data stream, requires fewer computing resources than those required by the methods for fully ciphering the stream; on the other hand, the darkening of the ciphered images is relatively low.
According to the aforementioned Changgui Shi et al. document, sync points, which are added to the data stream, tell a decoder that has the key from which position in the ciphered stream it must restart the deciphering key; these sync points are added at the start of each image group, at the start of each type-I image or at the start of a predetermined number of images.
Annex E of the ISO/IEC 14496-2 standard defines several useful tools or modes to minimize the negative consequences of errors in the transmission of a compressed data stream: i) the synchronization markers; ii) the separation between the texture data on the one hand, the header and motion data on the other hand; iii) the use of reversible codes of variable length for the coding of the texture data.
In a “video packet synchronization” mode, a periodic synchronization marker can be created at the end of a macroblock when the number of bits since the previous marker is higher than a certain threshold; a video packet (part of the stream between two successive markers) therefore has a variable number of macroblocks.
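The marker insertion rule of this mode can be sketched as follows, assuming that each macroblock is represented only by its coded size in bits. The function name, the sizes and the threshold are hypothetical.

```python
def packetize(macroblock_bits, threshold):
    """Group macroblock sizes (in bits) into video packets.

    A marker is placed at the end of the macroblock that makes the number of
    bits since the previous marker exceed the threshold.
    """
    packets, current, count = [], [], 0
    for size in macroblock_bits:
        current.append(size)
        count += size
        if count > threshold:        # threshold exceeded: close the packet here
            packets.append(current)
            current, count = [], 0
    if current:                      # last, possibly short, packet
        packets.append(current)
    return packets


sizes = [300, 250, 600, 40, 60, 100, 400, 120]   # hypothetical macroblock sizes
packets = packetize(sizes, threshold=512)
print(packets)                    # [[300, 250], [600], [40, 60, 100, 400], [120]]
print([len(p) for p in packets])  # [2, 1, 4, 1] macroblocks per packet
```

The packets have a comparable number of bits but a variable number of macroblocks, as stated above.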
When the data stream is partially ciphered with a block cipher algorithm, like the DES (64-bit block) and a fortiori the AES (128-bit block) standards for example, the number of data bits to be ciphered inside this video packet can be lower than the number of bits of the cipher block, in particular when the packet contains the motion vectors associated with the P and B-type images; in this case, this packet will be transmitted without ciphering, and the darkening of the sequence will be reduced.
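The effect of the block size can be shown with simple arithmetic. The sketch below assumes that only whole cipher blocks are ciphered inside a packet and that no padding is added, so that the stream length is unchanged; the bit counts and the function name are hypothetical.

```python
def cipherable_blocks(eligible_bits_per_packet, block_bits):
    """Number of whole cipher blocks that fit in the cipherable bits of each packet."""
    return [n // block_bits for n in eligible_bits_per_packet]


# Hypothetical number of cipherable bits (e.g. sign bits) in four video packets.
eligible = [300, 90, 40, 130]

des = cipherable_blocks(eligible, 64)    # DES: 64-bit block
aes = cipherable_blocks(eligible, 128)   # AES: 128-bit block
print(des)  # [4, 1, 0, 2]
print(aes)  # [2, 0, 0, 1]

# Packets with zero whole blocks would be transmitted without ciphering.
print([i for i, n in enumerate(aes) if n == 0])  # [1, 2]
```

With the larger block, more packets fall below one block and remain in plain text, which is why the problem is more pronounced with the AES than with the DES.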
The U.S. Pat. No. 6,505,299-B1 (Zeng et al.) describes different methods for ciphering quantized (partially compressed) video data before their coding by Huffman tables, by RLE encoding, arithmetic coding or other entropy coding: a spatial frequency transform is applied to the image, which generates a map of transform coefficients; these coefficients are then ciphered, either by scrambling their sign bits, by scrambling their least significant bits, by mixing blocks of the map, or by mixing coefficients corresponding to a spatial frequency band of the map.
This document further proposes ciphering the motion vectors of the P and B-type images; this increases the darkening of the ciphered images.
A disadvantage of these cipher methods is that they reduce the efficiency of the data compression obtained by quantization; another disadvantage is that they require computing means that are more significant than those required for a cipher after Huffman coding or equivalent.
The patent application US-2002/0018565 (Luttrell et al.) describes a method for selectively ciphering an MPEG-4 data stream that preserves the coding syntax; according to this method, the indexes (of fixed length equal to n) of a table of 2^n words of variable length are ciphered, and for each index of the table, the word (in plain text) corresponding to this index is replaced with the word (in plain text) corresponding to the ciphered index; this method does not enable the relation of the table in plain text between the length of a codeword and the frequency of occurrence of the corresponding symbol in a data stream to be kept; consequently, it reduces the data compression by coding using the ciphered table.
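The loss of compression caused by ciphering the table indexes can be illustrated on a toy table. In the sketch below, the table, the symbol probabilities and the use of XOR as the index cipher are assumptions made for illustration; they are not taken from the cited application.

```python
# Hypothetical table of 2**n words of variable length, with n = 2.
WORDS = ["0", "10", "110", "111"]
# Hypothetical symbol probabilities: short words are assigned to frequent symbols.
PROBS = [0.5, 0.25, 0.125, 0.125]


def ciphered_table(words, key):
    """Replace the word at each index with the word at the ciphered index.

    XOR with an n-bit key stands in for the index cipher.
    """
    return [words[i ^ key] for i in range(len(words))]


def mean_length(words, probs):
    """Average number of bits per coded symbol."""
    return sum(p * len(w) for w, p in zip(words, probs))


table = ciphered_table(WORDS, key=0b11)
print(table)                       # ['111', '110', '10', '0']
print(mean_length(WORDS, PROBS))   # 1.75 bits per symbol with the plain table
print(mean_length(table, PROBS))   # 2.625 bits per symbol with the ciphered table
```

Every word of the ciphered table still belongs to the original table, so the syntax is preserved, but the most frequent symbol now receives one of the longest words and the average code length increases.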
Some of the known methods for ciphering a video data stream are insensitive to the loss of a data packet; on the other hand, these methods are sensitive to the isolated loss of one or more data bits, which frequently occurs in wireless transmission systems in particular.
Furthermore, these known methods are not compatible with the methods for adapting the stream to a variable bandwidth provided by the MPEG-4 FGS standard (“fine granularity scalability”, MPEG-4 Video Verification Model version 18.0, January 2001), in which, to adapt a data stream to a reduced-bandwidth transmission channel, the data stream is truncated at arbitrary positions, as soon as the number of bits allocated to the channel has been reached, in the middle of an image for example.