Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless communication devices such as radio telephone handsets, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, digital cameras, digital recording devices, video gaming devices, video game consoles, and the like. Digital video devices implement video compression techniques, such as MPEG-2, MPEG-4, or H.264/MPEG-4, Part 10, Advanced Video Coding (AVC), to transmit and receive digital video more efficiently. Video compression techniques perform spatial and temporal prediction to reduce or remove redundancy inherent in video sequences.
Block-based video compression techniques generally perform spatial prediction and/or temporal prediction. Intra-coding commonly relies on spatial prediction to reduce or remove spatial redundancy between video blocks within a given coded unit, which may comprise a video frame, a slice of a video frame, or any independently decodable unit. In contrast, inter-coding relies on temporal prediction to reduce or remove temporal redundancy between video blocks of temporally successive coded units of a video sequence. For intra-coding, a video encoder may perform spatial prediction to compress data based on other data within the same coded unit. For inter-coding, the video encoder may perform motion estimation and motion compensation to track the movement of corresponding video blocks of two or more adjacent coded units.
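The following is a minimal sketch of block-matching motion estimation for the inter-coding step. It assumes frames are lists of lists of integer luma samples. It runs an exhaustive (full) search that minimizes the sum of absolute differences (SAD). This is one common textbook approach, not a method the text prescribes, and the function names `sad` and `full_search` are hypothetical.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at
    (bx, by) and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n, search_range):
    """Exhaustive block matching: return the motion vector (dx, dy) within
    +/- search_range that minimizes SAD, and that SAD. Candidates that would
    fall outside the reference frame are skipped. Assumes the block at
    (bx, by) itself lies inside the frame."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), sad(cur, ref, bx, by, 0, 0, n)
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            x0, y0 = bx + dx, by + dy
            if x0 < 0 or y0 < 0 or x0 + n > width or y0 + n > height:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost


# Usage: a synthetic 8x8 reference frame with unique sample values, and a
# current frame built so that its block at (2, 2) matches the reference
# block displaced by (+2, +1).
ref = [[8 * y + x for x in range(8)] for y in range(8)]
cur = [[ref[y + 1][x + 2] if y + 1 < 8 and x + 2 < 8 else 0
        for x in range(8)] for y in range(8)]
print(full_search(cur, ref, bx=2, by=2, n=4, search_range=3))  # ((2, 1), 0)
```

Full search is simple but costly. Practical encoders typically use faster search patterns that test fewer candidate displacements.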
A coded video block may be represented by two components: prediction information, which comprises a prediction mode and a predictive block size, and a residual block of data indicative of differences between the block being coded and a predictive block. In the case of inter-coding, one or more motion vectors are used to identify the predictive block of data. For intra-coding, the prediction mode can be used to generate the predictive block.
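The relationship between the predictive block and the residual block can be sketched as follows. As an illustrative assumption, the prediction mode is a DC-style intra mode. It fills the block with the rounded mean of the neighboring pixels above and to the left. The helper names are hypothetical, and the pixel values are made up for the example.

```python
def dc_predict(top, left):
    """Illustrative DC intra prediction: fill an n x n block with the rounded
    mean of the n reconstructed neighbors above and the n to the left."""
    n = len(top)
    dc = (sum(top) + sum(left) + n) // (2 * n)
    return [[dc] * n for _ in range(n)]


def residual(block, pred):
    """Residual block: block being coded minus predictive block, per pixel."""
    return [[b - p for b, p in zip(brow, prow)]
            for brow, prow in zip(block, pred)]


def reconstruct(pred, resid):
    """Decoder side: add the residual back onto the same predictive block."""
    return [[p + r for p, r in zip(prow, rrow)]
            for prow, rrow in zip(pred, resid)]


top = [100, 102, 104, 106]   # reconstructed pixels above the block
left = [98, 100, 102, 104]   # reconstructed pixels to the left
block = [[101, 103, 105, 107],
         [99, 101, 103, 105],
         [100, 102, 104, 106],
         [98, 100, 102, 104]]
pred = dc_predict(top, left)
res = residual(block, pred)
print(res[0])  # [-1, 1, 3, 5]
```

This sketch skips transform and quantization, so adding the residual back to the same prediction recovers the block exactly. Once the residual is quantized, the decoder's reconstruction is only an approximation.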
After block-based prediction coding, the video encoder may apply transform, quantization, and entropy coding processes to further reduce the bit rate associated with communication of a residual block. Transform techniques may comprise discrete cosine transforms or conceptually similar processes, integer transforms, or other types of transforms. In a discrete cosine transform (DCT) process, as an example, the transform process converts a set of residual pixel values into transform coefficients, which may represent the energy of the pixel values in the frequency domain.
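The function below sketches how a transform concentrates energy. It computes an orthonormal two-dimensional DCT-II in floating point. This is an illustrative assumption: standards such as H.264/AVC specify integer approximations of the DCT rather than this exact floating-point form. The name `dct_2d` is hypothetical.

```python
import math


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block (floating point).

    coeffs[0][0] is the DC term; larger u and v indices correspond to higher
    vertical and horizontal spatial frequencies, respectively.
    """
    n = len(block)

    def alpha(k):
        # Normalization that makes the transform orthonormal.
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    coeffs = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for y in range(n):
                for x in range(n):
                    s += (block[y][x]
                          * math.cos((2 * y + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * x + 1) * v * math.pi / (2 * n)))
            coeffs[u][v] = alpha(u) * alpha(v) * s
    return coeffs


flat = [[10] * 4 for _ in range(4)]
coeffs = dct_2d(flat)
print(round(coeffs[0][0], 6))  # 40.0
```

For a flat block, all of the energy lands in the DC coefficient (0, 0), and the remaining coefficients are zero up to rounding error. Because the transform is orthonormal, the sum of squared coefficients equals the sum of squared input values.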
Quantization is applied to the transform coefficients, and generally involves a process that limits the number of bits associated with any given transform coefficient. Entropy coding comprises one or more processes that collectively compress a sequence of quantized transform coefficients. Scanning techniques, such as zig-zag scanning, may be performed on the quantized transform coefficients to serialize two-dimensional blocks into one-dimensional vectors of coefficients. The scanned coefficients are then entropy coded, e.g., via context adaptive variable length coding (CAVLC), context adaptive binary arithmetic coding (CABAC), or another entropy coding process.
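The sketch below covers the quantization and scanning steps together. The quantizer is a plain uniform scalar quantizer with rounding half away from zero; real encoders often add a dead zone and scaling matrices. The zig-zag order is generated by walking the anti-diagonals in alternating directions. For a 4x4 block, this yields the zig-zag sequence used for 4x4 frame blocks in H.264/AVC. Function names and coefficient values are hypothetical.

```python
def zigzag_order(n):
    """(row, col) positions of an n x n block in zig-zag order, starting at
    the DC coefficient and walking anti-diagonals in alternating directions."""
    order = []
    for s in range(2 * n - 1):  # s = row + col identifies the anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)  # even diagonals run bottom-left to top-right
        order.extend((r, s - r) for r in rows)
    return order


def quantize(coeffs, step):
    """Uniform scalar quantization with rounding half away from zero."""
    def q(c):
        level = int(abs(c) / step + 0.5)
        return level if c >= 0 else -level
    return [[q(c) for c in row] for row in coeffs]


def zigzag_scan(block):
    """Serialize a 2-D block of quantized levels into a 1-D list."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


coeffs = [[52, -13, 4, 0],
          [-10, 6, 0, 1],
          [3, 0, 0, 0],
          [0, 1, 0, 0]]
levels = quantize(coeffs, step=4)
scan = zigzag_scan(levels)
print(scan)  # [13, -3, -3, 1, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```

The nonzero levels cluster at the start of the scanned vector, and the tail is a run of zeros. CAVLC, CABAC, and run-length schemes exploit this kind of structure. The entropy coding stage itself is not shown here.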
In many cases, a video sequence may be coded into a base layer and one or more enhancement layers. In this case, the base layer may define a base level of video quality, and one or more enhancement layers may enhance the quality of the decoded video signal. Enhancement layers may improve the video quality in a variety of ways, e.g., by providing spatial or signal-to-noise enhancements to base layer frames, or by providing temporal enhancements to the decoded video by inserting frames between the base layer frames. In any case, the encoded video may be transmitted to a video decoding device, which performs the reciprocal process of the video encoder in order to reconstruct the video sequence.
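A signal-to-noise enhancement layer can be illustrated with a two-stage quantizer. The base layer quantizes values coarsely, and the enhancement layer quantizes the remaining error more finely. This is a simplified sketch under stated assumptions. It operates on a plain list of values rather than transform coefficients, and the step sizes and names are hypothetical. It does not reproduce any particular scalable coding standard, and spatial or temporal enhancement layers work differently.

```python
def quantize(x, step):
    """Uniform scalar quantizer index, rounding half away from zero."""
    level = int(abs(x) / step + 0.5)
    return level if x >= 0 else -level


def encode_two_layers(values, base_step, enh_step):
    """Base layer: coarse levels. Enhancement layer: finer levels that
    quantize the error left after base-layer reconstruction."""
    base = [quantize(v, base_step) for v in values]
    base_rec = [q * base_step for q in base]
    enh = [quantize(v - r, enh_step) for v, r in zip(values, base_rec)]
    return base, enh


def decode(base, base_step, enh=None, enh_step=None):
    """Reconstruct from the base layer alone, or refine it with the
    enhancement layer when one is received."""
    rec = [q * base_step for q in base]
    if enh is not None:
        rec = [r + q * enh_step for r, q in zip(rec, enh)]
    return rec


values = [37, -22, 9, 4]
base, enh = encode_two_layers(values, base_step=16, enh_step=4)
print(decode(base, 16))          # [32, -16, 16, 0]
print(decode(base, 16, enh, 4))  # [36, -24, 8, 4]
```

A decoder that receives only the base layer reconstructs values with errors of up to 7 in this example. A decoder that also receives the enhancement layer reduces the largest error to 2.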