Videos can be compressed using MPEG-4 compression. MPEG-4 compression incorporates many compression features of MPEG-1 and MPEG-2 such as frame types (I-frames, B-frames, and P-frames), motion compensation, group of pictures, and macroblocks. MPEG-4 also introduces the concept of an object. Objects are parts of scenes that can be coded as separate video objects. For example, a person in a scene can be coded as a video object that is separate from the coding of the background of the scene. The separate coding of objects allows different parts of a scene to be coded with different resolutions. For example, the object representing a person can be coded at a higher resolution than the background.
MPEG-4 codes a video hierarchically. A video object may be sampled at each frame of a scene to generate a video object plane (“VOP”). A VOP may be coded using various frame types or using motion compensation. A sequence of VOPs can be grouped together into a group of VOPs (“GOV”). GOVs, like VOPs, can be coded independently. MPEG-4 organizes the VOPs or GOVs into video object layers (“VOLs”). The VOLs for a video object are further organized into a video object (“VO”) level, which includes all the bitstreams for that video object. The video object levels of a scene are organized into a video session (“VS”) for the scene.
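The hierarchy described above can be pictured as nested containers. The following Python sketch is illustrative only: the class names, field names, and the count_vops helper are hypothetical and do not correspond to MPEG-4 syntax elements, and for simplicity every VOL holds GOVs rather than bare VOPs.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VOP:
    """Video object plane: one video object sampled at one frame time."""
    frame_index: int
    coding_type: str  # "I", "P", or "B" (hypothetical labels)


@dataclass
class GOV:
    """Group of VOPs."""
    vops: List[VOP] = field(default_factory=list)


@dataclass
class VOL:
    """Video object layer (simplified here to hold only GOVs)."""
    layer_id: int
    govs: List[GOV] = field(default_factory=list)


@dataclass
class VO:
    """Video object level: all layers, and thus all bitstreams, of one object."""
    name: str
    vols: List[VOL] = field(default_factory=list)


@dataclass
class VS:
    """Video session: all video objects of one scene."""
    objects: List[VO] = field(default_factory=list)


def count_vops(session: VS) -> int:
    """Walk VS -> VO -> VOL -> GOV -> VOP and count the VOPs."""
    return sum(len(gov.vops)
               for vo in session.objects
               for vol in vo.vols
               for gov in vol.govs)


# A scene with two separately coded objects: a person and a background.
person = VO("person", [VOL(0, [GOV([VOP(0, "I"), VOP(1, "P"), VOP(2, "B")])])])
background = VO("background", [VOL(0, [GOV([VOP(0, "I"), VOP(1, "P")])])])
session = VS([person, background])
```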
The spatial and temporal scalability of MPEG-4 is provided at the VOL level. MPEG-4 provides scalability using a base layer and enhancement layers. The base layer represents the lowest quality supported by a bitstream, and each enhancement layer provides increasingly higher quality. To provide spatial scalability, each VOP is converted from its original resolution to a lower resolution to form the base layer, and the difference between the lower resolution and the original resolution is represented in the enhancement layers. When a device receives an MPEG-4 video, it can present the video using the base layer alone or using the base layer and one or more enhancement layers. Similarly, when a routing device receives an MPEG-4 video, it can forward the base layer only or forward the base layer along with one or more of the enhancement layers.
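The base-plus-enhancement idea behind spatial scalability can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the MPEG-4 coding process: it uses 2x2 averaging and sample repetition as stand-ins for the resampling filters, a single enhancement layer, raw sample differences with no DCT or motion compensation, and images with even dimensions. All function names are hypothetical.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]


def downsample(img: Image) -> Image:
    """Halve the resolution by averaging each 2x2 block (integer division)."""
    return [[(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) // 4
             for x in range(0, len(img[0]), 2)]
            for y in range(0, len(img), 2)]


def upsample(img: Image) -> Image:
    """Double the resolution by repeating each sample in a 2x2 block."""
    out: Image = []
    for row in img:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def split_layers(original: Image) -> Tuple[Image, Image]:
    """Return (base layer, enhancement layer) for one picture."""
    base = downsample(original)
    predicted = upsample(base)
    enhancement = [[o - p for o, p in zip(orow, prow)]
                   for orow, prow in zip(original, predicted)]
    return base, enhancement


def reconstruct(base: Image, enhancement: Optional[Image] = None) -> Image:
    """Present the base layer alone, or the base plus the enhancement layer."""
    predicted = upsample(base)
    if enhancement is None:
        return predicted
    return [[p + e for p, e in zip(prow, erow)]
            for prow, erow in zip(predicted, enhancement)]


original = [[10, 12, 20, 22],
            [14, 16, 24, 26],
            [30, 32, 40, 42],
            [34, 36, 44, 46]]
base, enhancement = split_layers(original)
```

A receiver or router that keeps only base obtains a coarse picture, while one that also keeps enhancement recovers the original samples exactly in this lossless toy setting.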
MPEG-4 was amended to allow for Fine Grain Scalability (“FGS”) to support environments (e.g., streaming media) where scalability based on base and enhancement layers is too coarse and does not provide the needed flexibility, or where the coders and decoders for multiple enhancement layers are too complex and thus too expensive. FGS provides a base layer and one enhancement layer. The base layer is encoded with a non-scalable coder to provide the lowest quality and bitrate for a scalable codestream. The enhancement layer is coded into bitplanes from the most significant bitplane to the least significant bitplane. In particular, the difference between the original VOP and the VOP reconstructed from the base layer is encoded bitplane-wise from the most significant bitplane to the least significant bitplane. Each bitplane of a macroblock's discrete cosine transform (“DCT”) coefficients is zigzag ordered, converted to run and end-of-plane (RUN, EOP) symbols, and coded with variable-length coding to produce an enhancement layer codestream. RUN is the number of consecutive zeros before a non-zero value, and EOP indicates whether any non-zero values are left on the current bitplane for the block. For FGS Temporal (“FGST”) VOPs, which do not have corresponding base layer VOPs, the bitplane coding is applied to all of the DCT coefficients of the VOP. MPEG-4 FGS provides very fine grain scalability to allow near rate-distortion (“RD”) optimal bitrate truncation for a large range of bitrates. An FGS video can be truncated to the base layer or any bitplane of the enhancement layer depending on channel capacity or display device capability.
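The conversion of one block's bitplanes into (RUN, EOP) symbols, and the effect of truncating the enhancement layer after a given bitplane, can be sketched as follows. This is a simplified illustration with hypothetical function names: it takes the coefficient magnitudes already in zigzag order, sets EOP to 1 on the last non-zero value of a bitplane, represents an all-zero bitplane as an empty list, and omits sign bits and the variable-length coding step.

```python
from typing import List, Tuple

Symbol = Tuple[int, int]  # (RUN, EOP)


def bitplane_symbols(abs_coeffs: List[int]) -> List[List[Symbol]]:
    """Return (RUN, EOP) symbols per bitplane, most significant plane first."""
    num_planes = max(abs_coeffs).bit_length()
    planes: List[List[Symbol]] = []
    for plane in range(num_planes - 1, -1, -1):
        bits = [(c >> plane) & 1 for c in abs_coeffs]
        ones_left = sum(bits)
        symbols: List[Symbol] = []
        run = 0
        for bit in bits:
            if ones_left == 0:
                break  # only zeros remain on this bitplane
            if bit == 0:
                run += 1
                continue
            ones_left -= 1
            symbols.append((run, 1 if ones_left == 0 else 0))
            run = 0
        planes.append(symbols)
    return planes


def decode_planes(planes: List[List[Symbol]], total_planes: int,
                  length: int) -> List[int]:
    """Rebuild magnitudes from the bitplanes received (possibly truncated)."""
    values = [0] * length
    for i, symbols in enumerate(planes):
        weight = 1 << (total_planes - 1 - i)
        pos = 0
        for run, _eop in symbols:
            pos += run            # skip the zeros counted by RUN
            values[pos] += weight
            pos += 1
    return values


# Zigzag-ordered magnitudes of a hypothetical block (4 bitplanes).
coeffs = [10, 0, 6, 0, 0, 3, 0, 2, 2, 0, 0, 2, 0, 0, 1, 0]
planes = bitplane_symbols(coeffs)
coarse = decode_planes(planes[:2], 4, len(coeffs))  # truncated after 2 planes
```

Decoding all four bitplanes returns the original magnitudes, while decoding only the first two yields a coarser approximation, which is the sense in which an FGS codestream can be cut at any bitplane.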
MPEG-4 FGS groups video data into Video Packets (“VPs”) that contain independently coded data. Each VP is delimited by unique resynchronization markers to prevent error propagation to other VPs. Information is inserted after a resynchronization marker to enable resuming decoding in the event that a VP is damaged in transmission. For the enhancement layer, both the bitplane start marker (i.e., fgs_bp_start_code) and the resynchronization marker (i.e., fgs_resync_marker) are used as VP delimiters. The fgs_bp_start_code is 32 bits, starting with 23 binary zeros followed by 0xA and five bits indicating to which bitplane the data belongs. The fgs_resync_marker is 22 binary zeros followed by a binary one. The number of the first macroblock is inserted after each fgs_resync_marker. The VP boundary is aligned with a macroblock. If an error occurs in coded bitplane data, the bitplane data of the current and subsequent blocks of that bitplane cannot be correctly decoded and will be discarded. The lower bitplane data of those affected blocks is also discarded because the alignment of the sign bits cannot be determined. In particular, a sign bit for a DCT coefficient is encoded with the bitplane that has the most significant “1” for that DCT coefficient. So, if an error occurs in the bitplane, the sign bits for lower bitplanes become misaligned and cannot be properly decoded. The size of a VP can be determined at encoding time based on different scenarios. For example, if the video is being transmitted on a highly reliable channel, a large VP may be used because errors and corrupted VPs will be rare. In contrast, if the channel is unreliable and prone to errors (e.g., a wireless channel), a small VP may be used so that little video information is lost with each of the more frequent errors.
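The bit layouts of the two enhancement layer delimiters described above can be written out directly. The sketch below builds them as strings of "0" and "1" characters for readability; the helper names are hypothetical, and the five-bit field is treated as an opaque identifier because the mapping from field value to bitplane is not stated here.

```python
def fgs_bp_start_code(bitplane_id: int) -> str:
    """32 bits: 23 zeros, then 0xA (binary 1010), then a 5-bit bitplane field."""
    if not 0 <= bitplane_id < 32:
        raise ValueError("bitplane_id must fit in five bits")
    return "0" * 23 + format(0xA, "04b") + format(bitplane_id, "05b")


# 23 bits: 22 zeros followed by a single one.
FGS_RESYNC_MARKER = "0" * 22 + "1"


def parse_bp_start_code(bits: str) -> int:
    """Return the 5-bit bitplane field, or raise if bits is not a start code."""
    if len(bits) != 32 or bits[:27] != "0" * 23 + "1010":
        raise ValueError("not an fgs_bp_start_code")
    return int(bits[27:], 2)


code = fgs_bp_start_code(3)
```

Read as a 32-bit integer, the layout places the bitplane field in the low five bits of 0x00000140, so the example value above is 0x00000143.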
Encryption can be applied to MPEG-4 FGS video codestreams to protect videos from unauthorized access or usage. An important requirement for encryption of scalable codestreams is that the encrypted codestream should preserve as fine a granularity as possible for scalability so that it can be truncated directly by an encryption-unaware device without decryption. In other words, as a video is processed by intermediaries, it is desirable that those intermediaries can reduce the scale (e.g., resolution) of the video without having to decrypt the video. A desirable requirement for MPEG-4 FGS encryption is that an encrypted codestream remains compliant with the MPEG-4 FGS syntax and that encrypted data does not emulate any MPEG-4 FGS delimiters, to avoid erroneous parsing or synchronization, especially under error-prone transmissions.
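The delimiter-emulation requirement can be made concrete with a simple check on an encrypted payload. Both enhancement layer delimiters described earlier begin with at least 22 binary zeros, so a payload that contains no run of 22 consecutive zero bits cannot contain either of them. The sketch below is a conservative, hypothetical test built on that observation: it ignores byte alignment and does not consider runs of zeros that span the boundary between the payload and neighboring data.

```python
def emulates_delimiter(payload: bytes) -> bool:
    """Return True if payload contains a run of at least 22 zero bits.

    Such a run is the common prefix of fgs_resync_marker (22 zeros, then
    a one) and fgs_bp_start_code (23 zeros, then 0xA), so its absence
    rules out both delimiters inside the payload.
    """
    bits = "".join(format(byte, "08b") for byte in payload)
    return "0" * 22 in bits


safe = emulates_delimiter(b"\xff\x00\x00\x80")    # longest zero run: 16 bits
unsafe = emulates_delimiter(b"\x80\x00\x00\x02")  # zero run of 29 bits
```

An encryption scheme could use a check of this kind to detect ciphertext that would confuse a parser and then handle it in some scheme-specific way; how that is done is outside the scope of this sketch.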