The present invention generally relates to a method and a system for embedding binary data sequences into video bitstreams which may be utilized in a Digital Rights Management (DRM) system. Further, the present invention relates to a method and a system for embedding binary data into compressed standard compliant video bitstreams without having to first decompress the video bitstream and then re-compress the manipulated video sequence with data embedded. The present invention is especially suitable for bitstreaming to set-top boxes, wireless telephones, handheld devices, multimedia servers or gateways, where the computational and memory requirements for real-time compression of video materials are prohibitively expensive. In accordance with the present invention, the embedded data can be any binary data sequence. In particular, it may be utilized in a Digital Rights Management system for purposes such as, for example, authentication, access control, creating signatures, watermarking, and/or in-band signaling.
Demands for full motion video in such applications as video telephony, video conferencing, and/or multimedia applications have required the introduction of standards for motion video on computers and related systems. Such applications have further required development of compression techniques that can reduce the amount of data required to represent a moving image and corresponding sound to manageable lengths to, for example, facilitate data transmission using conventional communications equipment with limited transmission capabilities.
One set of standards for compression of motion picture video images for transmission or storage is known as the Moving Picture Experts Group (“MPEG”) family of standards. Each MPEG standard is an international standard for the compression of motion video pictures and audio. The MPEG standards allow motion picture video to be compressed along with corresponding high quality sound and provide other features such as single frame advance, reverse motion, and still-frame video.
Two versions of the MPEG video standard which have received widespread adoption are commonly known as the MPEG-1 and MPEG-2 standards. In general, the MPEG-2 standard supports higher resolution and quality than the MPEG-1 standard and enables broadcast transmission at a rate of 4–6 Mbps. In addition to the MPEG-1 and MPEG-2 standards, the MPEG-4 standard is now standardized by the International Organization for Standardization (“ISO”) and the International Electrotechnical Commission (“IEC”). The MPEG-4 standard is intended to facilitate, for example, content-based interactivity and certain wireless applications.
Another family of video compression standards is standardized by the International Telecommunication Union, Telecommunication Standardization Sector (“ITU”). The ITU family of video coding standards has evolved from the original H.261 for video conferencing applications over ISDN to H.262 (same as MPEG-2) and now the latest H.263 version 3, which supports many advanced optional operation modes.
The video codecs specified by the MPEG and ITU standards are very similar and provide compression of a digital video sequence by utilizing a block motion-compensated Discrete Cosine Transform (“DCT”). In a first, block-matching step of this process, an algorithm estimates the motion that occurs between two temporally adjacent frames. The frames are then compensated for the estimated motion and compared to form a difference image. By taking the difference between the two temporally adjacent frames, all existing temporal redundancy is removed. The only information that remains is new information that could not be compensated for by the motion estimation and compensation algorithm.
In a second step, this new information is transformed into the frequency domain using the DCT. The DCT has the property of compacting the energy of this new information into a few low frequency components. Further compression of the video sequence is obtained by limiting the amount of high frequency information encoded through quantization and entropy coding of information after quantization.
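The energy compaction and quantization described above can be illustrated with a minimal sketch. It assumes an orthonormal 8×8 DCT-II computed directly from its definition and a plain uniform quantizer with a single step size; the standards define their own quantizers, so the `quantize` function, the step value, and the sample block are illustrative assumptions only.

```python
import math

N = 8  # block size used by the standards discussed above


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block (direct, unoptimized form)."""
    def scale(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):          # vertical frequency index
        for v in range(N):      # horizontal frequency index
            total = 0.0
            for y in range(N):
                for x in range(N):
                    total += (block[y][x]
                              * math.cos((2 * y + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * x + 1) * v * math.pi / (2 * N)))
            coeffs[u][v] = scale(u) * scale(v) * total
    return coeffs


def quantize(coeffs, step):
    """Uniform quantizer (an assumption; the standards define their own)."""
    return [[int(round(c / step)) for c in row] for row in coeffs]


# A smooth 8x8 block: a gentle diagonal ramp of pixel values.
block = [[10 + x + y for x in range(N)] for y in range(N)]
levels = quantize(dct_2d(block), step=10)

# Positions (u, v) of the coefficients that survive quantization.
nonzero = [(u, v) for u in range(N) for v in range(N) if levels[u][v] != 0]
print(nonzero)
```

For this smooth block, only the DC term and the two lowest-frequency AC terms remain non-zero after quantization, which is the behavior the entropy coder subsequently exploits.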
In the MPEG and H.263 motion-compensated transform coding based standards, the basic unit for motion compensation and transform is the “block,” a non-overlapping region of 8×8 pixels. Four spatially adjacent blocks form a macroblock (“MB”) which has a size of 16×16 pixels. All pixels contained in an MB are usually assumed to have the same motion. The DCT transform is performed independently on each 8×8 block in the MB. A motion vector is associated with each MB, and the motion vector for an MB of the present frame of a video sequence may be found by searching over a predetermined search area in the previous temporally adjacent frame for a best match to the MB.
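The search for a best match can be sketched as an exhaustive block-matching loop. This is a simplified illustration rather than the procedure of any particular encoder: the sum of absolute differences (SAD) as the matching criterion, the ±7 pixel search range, and the function names are all assumptions, and the standards leave the search strategy to the encoder.

```python
import random


def sad(cur, ref, cx, cy, rx, ry, size):
    """Sum of absolute differences between a block of cur and a block of ref."""
    total = 0
    for dy in range(size):
        for dx in range(size):
            total += abs(cur[cy + dy][cx + dx] - ref[ry + dy][rx + dx])
    return total


def find_motion_vector(cur, ref, cx, cy, size=16, search=7):
    """Exhaustive search around (cx, cy); returns ((mvx, mvy), best_sad)."""
    height, width = len(ref), len(ref[0])
    best_mv = (0, 0)
    best_cost = sad(cur, ref, cx, cy, cx, cy, size)
    for mvy in range(-search, search + 1):
        for mvx in range(-search, search + 1):
            rx, ry = cx + mvx, cy + mvy
            # Skip candidate blocks that fall outside the reference frame.
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = sad(cur, ref, cx, cy, rx, ry, size)
            if cost < best_cost:
                best_mv, best_cost = (mvx, mvy), cost
    return best_mv, best_cost


# Synthetic 32x32 frames: the current frame is the reference frame with its
# content displaced, so the MB at (8, 8) matches the reference block at (5, 10).
rng = random.Random(0)
ref = [[rng.randrange(256) for _ in range(32)] for _ in range(32)]
cur = [[ref[(y + 2) % 32][(x - 3) % 32] for x in range(32)] for y in range(32)]

mv, cost = find_motion_vector(cur, ref, cx=8, cy=8)
print(mv, cost)
```

An exhaustive search is the simplest to reason about but also the most expensive, which is part of why real-time encoding is so demanding on computation.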
Utilizing the estimated motion vectors, a copy of the previous frame may be altered by each vector to produce a prediction of the current frame. This operation is referred to as motion compensation. As described above, each predicted MB may be subtracted from the current MB to produce a differential MB whose four blocks are transformed independently into the frequency domain by the DCT. These coefficients are quantized and entropy encoded using variable length codes (“VLCs”) to provide further compression of the original video sequence. Both the motion vectors and the DCT coefficients are transmitted to the decoder wherein an inverse operation is performed to produce the decoded video sequence.
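Motion compensation and the formation of the differential MB can be sketched as follows. The sketch uses 2×2 blocks on 4×4 frames for readability and omits the DCT, quantization, and entropy coding steps, so the reconstruction shown is exact; the function names and sample values are hypothetical.

```python
def predict_block(ref, x, y, mv, size=16):
    """Copy the block of the previous frame that the motion vector points at."""
    mvx, mvy = mv
    return [[ref[y + mvy + dy][x + mvx + dx] for dx in range(size)]
            for dy in range(size)]


def subtract_blocks(cur_block, pred_block):
    """Encoder side: differential block = current block - prediction."""
    return [[c - p for c, p in zip(cur_row, pred_row)]
            for cur_row, pred_row in zip(cur_block, pred_block)]


def add_blocks(pred_block, diff_block):
    """Decoder side: reconstruction = prediction + differential block."""
    return [[p + d for p, d in zip(pred_row, diff_row)]
            for pred_row, diff_row in zip(pred_block, diff_block)]


# Previous frame (4x4) and the current 2x2 block located at (x=1, y=1).
ref = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12],
       [13, 14, 15, 16]]
cur_block = [[7, 8],
             [11, 13]]      # matches ref at (2, 1) except for one new pixel value

mv = (1, 0)                 # hypothetical motion vector for this block
pred = predict_block(ref, x=1, y=1, mv=mv, size=2)
diff = subtract_blocks(cur_block, pred)
recon = add_blocks(pred, diff)
print(diff, recon)
```

Only the single pixel that the prediction could not explain produces a non-zero entry in the differential block, which is the "new information" that goes on to the transform stage.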
MBs coded with the above described process are called “INTER” MBs since inter-frame correlation was utilized in the compression of these MBs. All parts of the INTER coding process are lossless except for the quantization which introduces unrecoverable distortion to the reconstructed video. Because of the predictive coding process of INTER MBs in MPEG and H.263, quantization distortion accumulates from frame-to-frame and eventually makes prediction inefficient. In addition, scenarios, such as scene changes or large motion video, may also make prediction unsatisfactory for some MBs in a video frame. For these MBs, MPEG and H.263 offer an option of coding image data independently that is similar to block-based still image compression with 8×8 block based DCT and entropy coding of quantized DCT coefficients. MBs coded in this manner are termed “INTRA” MBs. The information on whether an MB is coded as “INTER” or “INTRA” may be transmitted as part of the MB “header information.”
For both INTER and INTRA MBs, many of the DCT coefficients are zero after quantization. Therefore, to achieve better entropy coding efficiency, the VLCs do not encode every quantized coefficient individually; instead, they encode each non-zero coefficient together with the number of consecutive zero coefficients preceding it. More specifically, before entropy coding, the quantized DCT coefficients in an 8×8 block are run-length coded by mapping them to EVENTs, which are triplets of the form EVENT=(RUN, LEVEL, LAST), where RUN designates the number of consecutive DCT coefficients that are quantized to zero since the last non-zero DCT coefficient; LEVEL designates the amplitude and the sign of the current non-zero coefficient; and LAST designates whether the current non-zero coefficient is the last non-zero coefficient in the block. All EVENTs generated by non-zero coefficients in an 8×8 block have LAST=0 except for the last EVENT in a block, for which LAST=1.
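The mapping from quantized coefficients to EVENTs can be sketched as below. The sketch assumes the 64 coefficients of a block have already been arranged into a one-dimensional scan order and treats every coefficient alike; any special handling a standard applies to particular coefficients is not modeled, and the function names are hypothetical.

```python
def coefficients_to_events(coeffs):
    """Map scanned, quantized coefficients to (RUN, LEVEL, LAST) triplets."""
    last_index = max((i for i, c in enumerate(coeffs) if c != 0), default=-1)
    events = []
    run = 0
    for i, c in enumerate(coeffs):
        if c == 0:
            run += 1                      # count zeros since the last non-zero
            continue
        last = 1 if i == last_index else 0
        events.append((run, c, last))
        run = 0
    return events


def events_to_coefficients(events, size=64):
    """Inverse mapping: rebuild the scanned coefficient list from EVENTs."""
    coeffs = [0] * size
    pos = 0
    for run, level, _last in events:
        pos += run                        # skip the zeros counted by RUN
        coeffs[pos] = level
        pos += 1
    return coeffs


# A sparse block: four non-zero coefficients followed by trailing zeros.
coeffs = [14, -2, 0, 0, 3, 0, 0, 0, -1] + [0] * 55
events = coefficients_to_events(coeffs)
print(events)
```

The trailing zeros after the final non-zero coefficient generate no EVENT at all; LAST=1 on the final triplet tells the decoder that the rest of the block is zero.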
As computers and computer networks become faster and more ubiquitous and publication and distribution of digital content via the internet become more widespread, the ability to manage the usage rights to this content more securely is increasingly significant. Such management of usage rights is commonly referred to as Digital Rights Management (DRM). The use of DRM techniques in video or multimedia delivery networks may involve embedding data into video bitstreams for providing access control and authentication, watermarking or other in-band signaling features. Although the issue of access control and authentication in image and video communications has been considered by some algorithms, many of these algorithms operate on uncompressed image or video intensity field data. For pre-encoded or legacy bitstreams, this requires decoding of the pre-encoded data, data embedding, and then re-encoding. Because real-time video encoding is extremely memory and computation intensive, this type of approach is not suitable for large scale video servers and/or gateways that connect to a large number of clients and serve a large set of different compressed video materials at different sizes and rates under a usually tight delay-time budget. Such decoding and re-encoding is also not suitable for applications such as hand-held devices where memory and computation power resources are very limited.
An additional concern of DRM techniques or systems is the issue of error resiliency when DRM systems are deployed in wireless networks. Wireless networks often suffer from packet loss and/or bit errors during data transmission. As a result, to deliver content over wireless channels in a manner that preserves the utility of the DRM system, it is necessary for the DRM system to be error resilient. Unfortunately, security and error resiliency have contradictory requirements. A good encryption scheme has to be a good randomizer, while redundancy is needed to achieve resiliency. One existing error resilient DRM scheme involves channel coding. Channel coding consists of adding redundancy to the data in order to protect it from the effects of errors. However, channel coding is in general not very effective in dealing with bursty errors and often involves significant overhead and/or delay.
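The trade-off between redundancy and burst errors can be illustrated with the simplest possible channel code. The three-fold repetition code below is chosen purely for illustration and is not the scheme used by any particular DRM system; the function names and sample message are hypothetical.

```python
def encode_repetition(bits, n=3):
    """Add redundancy by transmitting every bit n times."""
    return [b for b in bits for _ in range(n)]


def decode_repetition(coded, n=3):
    """Recover each bit by majority vote over its n copies."""
    return [1 if 2 * sum(coded[i:i + n]) > n else 0
            for i in range(0, len(coded), n)]


def flip_bits(coded, positions):
    """Simulate channel errors by inverting the bits at the given positions."""
    out = list(coded)
    for p in positions:
        out[p] ^= 1
    return out


message = [1, 0, 1, 1]
coded = encode_repetition(message)            # 12 bits sent for 4 bits of data

# One isolated bit error is corrected by the majority vote.
isolated = decode_repetition(flip_bits(coded, [4]))

# A burst of two adjacent errors inside one codeword defeats the code.
burst = decode_repetition(flip_bits(coded, [3, 4]))

print(isolated == message, burst == message)
```

Even this toy code triples the transmitted data, yet a two-bit burst is enough to corrupt the decoded message, which mirrors the overhead and burst-error limitations noted above.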
A need, therefore, exists for an improved error resilient DRM method and system for providing authentication and access control in video delivery over wireless networks.