Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless communication devices, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, digital cameras, digital recording devices, video gaming devices, video game consoles, cellular or satellite radio telephones, and the like. Digital video devices implement video compression techniques, such as Moving Picture Experts Group (MPEG)-2, MPEG-4, or International Telecommunication Union Telecommunication Standardization Sector (ITU-T) H.264/MPEG-4, Part 10, Advanced Video Coding (AVC) (hereinafter “H.264/MPEG-4 Part 10 AVC” standard), to transmit and receive digital video more efficiently. Video compression techniques perform spatial and temporal prediction to reduce or remove redundancy inherent in video sequences.
Video compression typically includes spatial prediction and/or motion estimation and motion compensation to generate a prediction video block. Intra-coding relies on spatial prediction to reduce or remove spatial redundancy among video blocks within a given coding unit, e.g., frame or slice. In other words, a video encoder performs spatial prediction to compress data based on other data within the same coding unit. In contrast, inter-coding relies on temporal prediction to reduce or remove temporal redundancy among video blocks of successive video frames of a video sequence. Thus, for inter-coding, the video encoder performs motion estimation and motion compensation to track the movement of matching video blocks between two or more adjacent coding units.
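The motion estimation step described above can be illustrated with a minimal sketch of full-search block matching. This is only one possible approach, assuming a sum-of-absolute-differences (SAD) cost and frames stored as lists of rows of integer samples. The names `sad`, `get_block`, `motion_search` and `make_frame` are hypothetical and are not taken from any standard or codec.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def get_block(frame, top, left, size):
    """Extract a size x size block whose top-left sample is (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]


def motion_search(cur, ref, top, left, size, search_range):
    """Full search over +/- search_range samples (an assumed strategy).

    Returns (dy, dx, cost): the displacement into the reference frame
    that gives the lowest SAD for the block at (top, left) in `cur`.
    """
    target = get_block(cur, top, left, size)
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            # Skip candidates that fall outside the reference frame.
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue
            cost = sad(target, get_block(ref, y, x, size))
            if best is None or cost < best[2]:
                best = (dy, dx, cost)
    return best


def make_frame(top, left):
    """Toy 8x8 frame: zero background with one 2x2 patch (illustrative data)."""
    frame = [[0] * 8 for _ in range(8)]
    patch = [[10, 20], [30, 40]]
    for r in range(2):
        for c in range(2):
            frame[top + r][left + c] = patch[r][c]
    return frame


ref = make_frame(2, 3)   # patch position in the reference frame
cur = make_frame(3, 5)   # patch moved down 1 and right 2 in the current frame
mv = motion_search(cur, ref, top=3, left=5, size=2, search_range=2)
print(mv)  # (-1, -2, 0)
```

In this toy case the motion vector points back to the patch location in the reference frame with zero cost, so motion compensation would reproduce the block exactly. Practical encoders generally use faster search strategies and sub-sample precision.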
After spatial or temporal prediction, a block of residual coefficients (referred to as a residual block or residual information) is generated by subtracting the prediction video block from the original video block that is being coded. The residual block may be a two-dimensional matrix of coefficient values that quantify the differences between the prediction video block and the original block. The video encoder may apply transform, quantization and entropy coding processes to the residual block to further reduce the bit rate associated with communication of the residual block. The transform techniques may comprise discrete cosine transforms (DCTs), wavelet transforms, integer transforms, or other types of transforms.
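As a small worked example of the subtraction described above, the following sketch forms a residual block from an original block and a prediction block, then shows that adding the residual back to the prediction recovers the original. The function names and sample values are assumptions chosen only for illustration.

```python
def residual_block(original, prediction):
    """Element-wise difference: original minus prediction."""
    return [[o - p for o, p in zip(o_row, p_row)]
            for o_row, p_row in zip(original, prediction)]


def reconstruct(prediction, residual):
    """Decoder-side inverse: prediction plus residual."""
    return [[p + r for p, r in zip(p_row, r_row)]
            for p_row, r_row in zip(prediction, residual)]


# Illustrative 2x2 sample values (not taken from any real sequence).
original = [[52, 55], [61, 59]]
prediction = [[50, 54], [60, 60]]

residual = residual_block(original, prediction)
print(residual)  # [[2, 1], [1, -1]]
```

A good prediction leaves residual values close to zero, which is what makes the later transform, quantization and entropy coding stages effective.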
In a DCT, for example, the transform process converts a set of pixel-domain coefficients into transform coefficients that represent the energy of the pixel-domain coefficients in the frequency, or transform, domain. Quantization is applied to the transform coefficients to generate quantized transform coefficients. Quantization generally limits the number of bits associated with any given coefficient. The video encoder then entropy encodes the quantized transform coefficients to compress them further. The video encoder may entropy encode the coefficients using variable length coding (VLC), arithmetic coding, fixed length coding or a combination thereof. A video decoder may perform inverse operations to reconstruct the video sequence.
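The transform and quantization stages can be sketched as follows. This is a minimal illustration, assuming an orthonormal floating-point DCT-II applied separably to rows and columns, and a uniform scalar quantizer with a single step size. Standards such as H.264 specify integer transforms and more elaborate quantization, so none of the names below (`dct_2d`, `idct_2d`, `quantize`, `dequantize`) should be read as the method of any particular standard.

```python
import math


def _scale(k, n):
    """Normalization factor that makes the DCT-II orthonormal."""
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def dct_1d(values):
    n = len(values)
    return [_scale(k, n) * sum(v * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                               for i, v in enumerate(values))
            for k in range(n)]


def idct_1d(coeffs):
    n = len(coeffs)
    return [sum(_scale(k, n) * c * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for k, c in enumerate(coeffs))
            for i in range(n)]


def _apply_2d(block, transform_1d):
    """Apply a 1-D transform to every row, then to every column."""
    rows = [transform_1d(row) for row in block]
    cols = [transform_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def dct_2d(block):
    return _apply_2d(block, dct_1d)


def idct_2d(coeffs):
    return _apply_2d(coeffs, idct_1d)


def quantize(coeffs, step):
    """Uniform scalar quantization (assumed scheme): coefficient -> integer level."""
    return [[round(c / step) for c in row] for row in coeffs]


def dequantize(levels, step):
    return [[level * step for level in row] for row in levels]


# A flat 4x4 block has all of its energy in the DC coefficient.
flat = [[8] * 4 for _ in range(4)]
levels = quantize(dct_2d(flat), step=4)
restored = idct_2d(dequantize(levels, step=4))
print(levels[0])  # [8, 0, 0, 0]
```

For the flat block, only one nonzero level survives quantization, which is the kind of sparsity that entropy coding then exploits. A larger `step` discards more detail in exchange for fewer bits.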
Some video coding standards, such as MPEG-2, encode video at a relatively constant quality, bit rate or spatial resolution. Such a technique may be sufficient to provide video applications to devices having similar decoder capabilities (e.g., memory or processing resources) and/or connection qualities. However, more modern video transmission systems typically include devices with varying decoder capabilities and/or connection qualities. In such systems, transmitting video encoded at a relatively constant quality, bit rate or spatial resolution means that the video applications work only for devices that have appropriate decoder capabilities and/or connection qualities. In the wireless context, for example, devices located closer to a source of the video transmission may have a higher quality connection than devices located farther from the source. As such, the devices located farther from the source may not be able to receive the encoded video transmitted at the constant quality, bit rate or spatial resolution.
Other video coding standards make use of scalable coding techniques to overcome these issues. Scalable video coding (SVC), e.g., in accordance with an extension of ITU-T H.264/MPEG-4, Part 10, AVC, refers to video coding in which the video sequence is encoded as a base layer and one or more scalable enhancement layers. For SVC, the base layer typically carries video data with a base spatial, temporal and/or quality level. One or more enhancement layers carry additional video data to support higher spatial, temporal and/or quality levels. Enhancement layers may, for example, add spatial resolution to frames of the base layer, or may add frames to increase the overall frame rate. In some instances, the base layer may be transmitted in a manner that is more reliable than the transmission of enhancement layers. As such, devices located farther from the source of the encoded video or with lower decoder capabilities may be able to receive the base layer, and thus the video sequence, albeit at the lowest spatial, temporal and/or quality level.
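The layering idea can be illustrated for the temporal case with a deliberately simplified sketch. It assumes that the base layer carries every other frame and that a single enhancement layer carries the remaining frames. Actual SVC bitstreams use prediction structures and layer signaling that are not modeled here, and the names `split_temporal_layers` and `decode_layers` are hypothetical.

```python
def split_temporal_layers(frames):
    """Assumed split: even-indexed frames form the base layer,
    odd-indexed frames form one enhancement layer.
    Each entry keeps its display index so layers can be merged later."""
    base = [(i, f) for i, f in enumerate(frames) if i % 2 == 0]
    enhancement = [(i, f) for i, f in enumerate(frames) if i % 2 == 1]
    return base, enhancement


def decode_layers(base, enhancement=None):
    """Merge whichever layers were received, in display order."""
    received = list(base) + list(enhancement or [])
    return [frame for _, frame in sorted(received, key=lambda item: item[0])]


frames = ["f0", "f1", "f2", "f3", "f4"]   # placeholder frame labels
base, enhancement = split_temporal_layers(frames)

# A device that receives only the base layer plays at half the frame rate.
print(decode_layers(base))               # ['f0', 'f2', 'f4']
# A device that also receives the enhancement layer plays every frame.
print(decode_layers(base, enhancement))  # ['f0', 'f1', 'f2', 'f3', 'f4']
```

The same pattern extends to spatial or quality scalability, where the enhancement layer would carry refinement data for frames already present in the base layer rather than extra frames.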