Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless communication devices, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, digital cameras, digital recording devices, video gaming devices, video game consoles, cellular or satellite radio telephones, and the like. Digital video devices implement block-based video compression techniques, such as those defined in MPEG-2, MPEG-4, or H.264/MPEG-4, Part 10, Advanced Video Coding (AVC), to transmit and receive digital video in an efficient manner.
Block-based coding techniques generally perform spatial prediction and/or temporal prediction in order to achieve data compression in video sequences. Intra-coding relies on spatial prediction to reduce or remove spatial redundancy between video blocks within a given coded unit, which may comprise a video frame, a slice of a video frame, or another independently decodable unit of data. In contrast, inter-coding relies on temporal prediction to reduce or remove temporal redundancy between video blocks of successive coded units of a video sequence. For intra-coding, a video encoder performs spatial prediction to compress data based on other data within the same coded unit. For inter-coding, the video encoder performs motion estimation and motion compensation to track the movement of corresponding video blocks of two or more adjacent coded units.
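The motion estimation described above can be illustrated with a minimal sketch, assuming an exhaustive (full-search) block-matching strategy and a sum of absolute differences (SAD) cost. The function names `sad`, `get_block` and `motion_search`, the frame representation (lists of rows of integer pixel values) and the search strategy are all assumptions chosen for illustration; practical encoders generally use faster search patterns and sub-pixel refinement.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


def get_block(frame, top, left, size):
    """Extract a size x size block whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]


def motion_search(ref, cur, top, left, size, search_range):
    """Full-search block matching (hypothetical helper).

    Returns (dy, dx, cost): the displacement into the reference frame that
    minimizes SAD for the current block, and the SAD at that displacement.
    Candidates that fall outside the reference frame are skipped.
    """
    target = get_block(cur, top, left, size)
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue
            cost = sad(target, get_block(ref, y, x, size))
            if best is None or cost < best[2]:
                best = (dy, dx, cost)
    return best
```

In this sketch the returned displacement plays the role of a motion vector: it identifies the predictive block in the reference frame, and the remaining cost indicates how much residual energy is left to code.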
A coded video block may be represented by prediction information that can be used to create or identify a predictive block, and a residual block of data indicative of differences between the block being coded and the predictive block. In the case of inter-coding, one or more motion vectors are used to identify the predictive block of data, while in the case of intra-coding, the prediction mode can be used to generate the predictive block. Both intra-coding and inter-coding may define several different prediction modes, which may define different block sizes and/or prediction techniques used in the coding. Additional types of syntax elements may also be included as part of encoded video data in order to control or define the coding techniques or parameters used in the coding process. A 16 by 16 area of pixels is typically represented by sub-partitioned luminance (luma) blocks and two different downsampled 8 by 8 chrominance (chroma) blocks. Each of the different video blocks may be predictively coded.
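The relationship between a predictive block and a residual block can be sketched as follows. This is an illustrative example only: it assumes a simple DC-style intra prediction in which every predicted pixel is the rounded mean of the neighboring pixels above and to the left of the block. The helper names `dc_predict`, `residual` and `reconstruct` are hypothetical and are not taken from any standard.

```python
def dc_predict(above, left, size):
    """Build a size x size predictive block filled with the rounded mean
    of the neighboring pixels above and to the left (assumed DC mode)."""
    neighbors = list(above) + list(left)
    dc = (sum(neighbors) + len(neighbors) // 2) // len(neighbors)
    return [[dc] * size for _ in range(size)]


def residual(block, pred):
    """Residual block: per-pixel difference between the block being coded
    and its predictive block."""
    return [[b - p for b, p in zip(row_b, row_p)]
            for row_b, row_p in zip(block, pred)]


def reconstruct(pred, resid):
    """Decoder side: add the residual back to the predictive block."""
    return [[p + r for p, r in zip(row_p, row_r)]
            for row_p, row_r in zip(pred, resid)]
```

When the prediction is good, the residual values are small in magnitude, which is what makes the subsequent transform, quantization and entropy coding stages effective.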
After block-based prediction coding, the video encoder may apply transform, quantization and entropy coding processes to further reduce the bit rate associated with communication of a residual block. Transform techniques may comprise discrete cosine transforms or conceptually similar processes, such as wavelet transforms, integer transforms, or other types of transforms. In a discrete cosine transform (DCT) process, as an example, the transform process converts a set of pixel values into transform coefficients, which may represent the energy of the pixel values in the frequency domain. Quantization is applied to the transform coefficients, and generally involves a process that limits the number of bits associated with any given transform coefficient. Entropy coding comprises one or more processes that collectively compress a sequence of quantized transform coefficients, such as context adaptive variable length coding (CAVLC) or context adaptive binary arithmetic coding (CABAC).
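As an illustration of the transform and quantization stages, the following sketch applies a floating-point, orthonormal two-dimensional DCT to a block and then quantizes the coefficients with a single uniform step size. Both choices are assumptions made for clarity; standards such as H.264/AVC specify integer transforms and more elaborate quantization, and the names `dct_1d`, `dct_2d`, `quantize` and `dequantize` are hypothetical. Entropy coding is omitted.

```python
import math


def dct_1d(values):
    """Orthonormal one-dimensional DCT (type II) of a list of numbers."""
    n = len(values)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        total = sum(
            values[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i in range(n)
        )
        out.append(scale * total)
    return out


def dct_2d(block):
    """Separable 2-D DCT: transform each row, then each column."""
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def quantize(coeffs, step):
    """Uniform quantization: map each coefficient to an integer level,
    which limits the number of bits needed to represent it."""
    return [[int(round(c / step)) for c in row] for row in coeffs]


def dequantize(levels, step):
    """Approximate inverse of quantize; the rounding error is not recovered."""
    return [[level * step for level in row] for row in levels]
```

For a flat block, all of the energy lands in the single lowest-frequency coefficient, so after quantization only one non-zero level remains to be entropy coded.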
Some block-based video coding and compression makes use of scalable techniques. Scalable video coding (SVC) refers to video coding in which a base layer and one or more scalable enhancement layers are used. For SVC, a base layer typically carries video data with a base level of quality. One or more enhancement layers carry additional video data to support higher spatial, temporal and/or signal-to-noise ratio (SNR) levels. In some cases, the base layer may be transmitted in a manner that is more reliable than the transmission of enhancement layers.
Some types of SVC schemes are scalable based on bitdepths of pixel values. In these cases, the base layer may define pixel values at a base level of quality according to a first bitdepth, and the enhancement layer may add additional data such that the base and enhancement layers together define pixel values at a higher level of quality, e.g., according to a second bitdepth that is larger than the first bitdepth. Bitdepth scalability is becoming increasingly desirable due to the emergence of higher resolution display capabilities, which support pixel reproduction based on higher bitdepths than conventional displays.
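One simple way to picture bitdepth scalability is sketched below. The example assumes an 8-bit base layer, a 10-bit combined output, and a bit-shift relationship between the two; these values, the shift-based prediction, and the names `split_layers` and `merge_layers` are illustrative assumptions only and do not describe any particular SVC scheme.

```python
def split_layers(sample, base_bits=8, full_bits=10):
    """Split a full-bitdepth pixel value into a base-layer value and an
    enhancement-layer refinement (assumed bit-shift scheme)."""
    shift = full_bits - base_bits
    base = sample >> shift              # value at the first (lower) bitdepth
    enhancement = sample - (base << shift)  # detail the base layer cannot carry
    return base, enhancement


def merge_layers(base, enhancement, base_bits=8, full_bits=10):
    """Combine base and enhancement data into a full-bitdepth pixel value."""
    shift = full_bits - base_bits
    return (base << shift) + enhancement
```

A decoder that receives only the base layer can display the 8-bit value directly, while a decoder that also receives the enhancement data can recover the original 10-bit value.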